This is a small example of how to build an MNIST classifier with MXNet and Julia, using executors together with custom optimizer and accuracy functions.
MXNet is a deep learning framework that is very well suited for parallel computation over several GPUs, which makes it a trustworthy ally when deploying and training large-scale neural networks. MXNet also lets you switch easily between an imperative and a symbolic API when building deep learning models. While this is well documented with smooth and clear tutorials in bigger languages like Python and R, it is not as easy to find in LakeTide’s favourite language, Julia. Since this is one of the main features of the framework, I want to give you a quick walk-through of how to build a simple classifier for the MNIST data set using both the symbolic API and custom objects. It is entirely possible to use MXNet through the symbolic API alone, see here, but that does not give you the same freedom to customize things such as calculating your own gradients and designing your own optimization functions.
Start by obtaining data providers for the MNIST data set. To create your own data providers, see the documentation.
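Something along these lines works, assuming the MNIST helpers that ship with MXNet.jl (mx.get_mnist_ubyte and the auto-generated mx.MNISTProvider); exact keyword names can differ between package versions:

```julia
using MXNet

batch_size = 100

# Download the MNIST ubyte files (if needed) and wrap them in data providers.
# flat=true flattens each 28x28 image into a 784-dimensional vector.
filenames = mx.get_mnist_ubyte()
train_provider = mx.MNISTProvider(image=filenames[:train_data],
                                  label=filenames[:train_label],
                                  batch_size=batch_size, shuffle=true, flat=true)
eval_provider  = mx.MNISTProvider(image=filenames[:test_data],
                                  label=filenames[:test_label],
                                  batch_size=batch_size, shuffle=false, flat=true)
```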

Here we define a tiny network with two hidden layers using symbolic nodes as usual.
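For example, a small multi-layer perceptron could look like this (the layer sizes are my own choice; in older versions of MXNet.jl the input is passed as a keyword, e.g. mx.FullyConnected(data=data, ...)):

```julia
# 784 -> 128 -> 64 -> 10 multi-layer perceptron
data = mx.Variable(:data)
fc1  = mx.FullyConnected(data, name=:fc1, num_hidden=128)
act1 = mx.Activation(fc1, name=:relu1, act_type=:relu)
fc2  = mx.FullyConnected(act1, name=:fc2, num_hidden=64)
act2 = mx.Activation(fc2, name=:relu2, act_type=:relu)
fc3  = mx.FullyConnected(act2, name=:fc3, num_hidden=10)
net  = mx.SoftmaxOutput(fc3, name=:softmax)
```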
 
Normally, if we were just using symbols, we would define a feed-forward model and then train it with a pre-defined optimizer such as gradient descent. In this case we instead bind the symbolic graph to an executor with mx.simple_bind, specifying the shape of the input data. This creates an executor object through which we can run forward and backward propagation ourselves.
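A minimal sketch, assuming the symbol is called net and batch_size is defined as above (Julia is column-major, so the batch dimension comes last):

```julia
# Bind the symbolic graph to a device and allocate all arrays it needs,
# including gradient buffers for every trainable argument.
exec = mx.simple_bind(net, mx.cpu(), data=(784, batch_size))
```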

The next step is to create dictionaries as placeholders for all the parameters of your network.
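One way to do this is to walk over the executor's argument arrays and keep everything except the data and label placeholders (the names param_dict and grad_dict are my own):

```julia
arg_names  = mx.list_arguments(net)
param_dict = Dict{Symbol,mx.NDArray}()   # trainable weights and biases
grad_dict  = Dict{Symbol,mx.NDArray}()   # the corresponding gradient buffers
for (name, arg, grad) in zip(arg_names, exec.arg_arrays, exec.grad_arrays)
    # skip the input and label placeholders, keep only trainable parameters
    if name != :data && name != :softmax_label
        param_dict[name] = arg
        grad_dict[name]  = grad
    end
end
```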

Now we initialize the weights for all the layers before training.
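A sketch using one of MXNet.jl's built-in initializers (any reasonable initialization scheme would do here):

```julia
# Fill every trainable parameter with small uniform random values.
init = mx.UniformInitializer(0.01)
for (name, arr) in param_dict
    mx.init(init, name, arr)
end
```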

It is fairly easy to implement any fancy update function you want for your weights. Here I’ve chosen to implement stochastic gradient descent (SGD). Note that this particular example is not generalized to arbitrary network structures.
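A bare-bones version could look like this (sgd_update! is my own name; the update is applied in place on the executor's NDArrays):

```julia
# Plain SGD: w <- w - lr * gradient, for every parameter in the network.
function sgd_update!(params, grads, lr)
    for (name, w) in params
        w[:] -= lr * grads[name]   # in-place update of the executor's NDArray
    end
end
```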

We also need to create a function that calculates the accuracy of a prediction.
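For instance, a simple arg-max accuracy over a mini-batch (MNIST labels are 0–9, hence the -1 to go from Julia's 1-based indices to class labels):

```julia
# preds has shape (n_classes, batch_size), labels has shape (batch_size,)
function accuracy(preds, labels)
    correct = 0
    for i in 1:length(labels)
        correct += (findmax(preds[:, i])[2] - 1 == round(Int, labels[i]))
    end
    return correct / length(labels)
end
```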
 
Now we are ready to run the forward and backward passes required for training. Let us begin by running over the data set for 15 epochs.
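Putting it all together, a training loop might look like the sketch below. It assumes the names introduced above (exec, param_dict, grad_dict, sgd_update!, accuracy, train_provider); the data-loading calls (mx.eachbatch, mx.get_data, mx.get_label) follow MXNet.jl's data provider interface and may differ slightly between versions:

```julia
lr = 0.1

for epoch in 1:15
    acc_sum, n_batches = 0.0, 0
    for batch in mx.eachbatch(train_provider)
        # copy the current mini-batch into the executor's input and label arrays
        copy!(exec.arg_dict[:data], mx.get_data(train_provider, batch)[1])
        copy!(exec.arg_dict[:softmax_label], mx.get_label(train_provider, batch)[1])

        mx.forward(exec, is_train=true)    # forward pass: compute predictions
        mx.backward(exec)                  # backward pass: fill the gradient buffers
        sgd_update!(param_dict, grad_dict, lr)

        acc_sum   += accuracy(copy(exec.outputs[1]), copy(exec.arg_dict[:softmax_label]))
        n_batches += 1
    end
    println("Epoch $epoch: training accuracy = $(acc_sum / n_batches)")
end
```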
 

As you can see, the network converges and the accuracy is reported for every epoch. In a few simple steps we have implemented our own optimizer and accuracy calculator. In this example, because the network ends with a softmax output and nothing else is specified, the executor uses a built-in log-loss to calculate the gradient. That would be a good next piece to rewrite if you want to continue learning the imperative API of MXNet. Good luck, have fun!