Robert Luciani & Hao Huang

Fairly recently we came across a paper by Li et al. on generating loss surfaces. The plots looked so cool that we immediately wanted to try it ourselves. Unfortunately, the paper’s associated GitHub repo had so much Python code for this one task that there was no way we were going to read through it all. Instead, with the research paper as a guide and a bit of hacking, we ended up with a few dozen lines of Julia code and some sweet-looking plots. That is not to say the two implementations are equivalent, but still, rendering a 51×51 pixel surface is comparatively “interactive” with our code:

Reference implementation: 60 minutes on 4x GTX 1080 Ti
This implementation:       4 minutes on GTX 1080

A QUICK PRIMER ON GENERATING LOSS LANDSCAPES

We begin by producing a Gaussian random vector of weights ϕ, similar in shape and scale to the weights θ of a trained model. We do the same for another random vector ψ. These two vectors act as the x and y axes of the final plot. We then do piecewise linear interpolation between the three vectors and plot the loss at those points. That’s basically it!
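To make the recipe concrete, here is a minimal sketch on a toy problem. It is not the code from our notebook: the linear model, the mean-squared-error loss, the `rescale` helper and the [-1, 1] grid range are all assumptions chosen for illustration, and rescaling each direction to the norm of θ is just one simple reading of “similar in scale”.

```julia
using Random, LinearAlgebra

Random.seed!(0)

# Toy stand-in for a trained network: a linear model fit by least squares.
X = randn(100, 5)                            # 100 samples, 5 features
y = X * randn(5)                             # noiseless targets
loss(w) = sum(abs2, X * w - y) / length(y)   # mean squared error

θ = X \ y                                    # "trained" weights

# Hypothetical helper: give direction d the same norm as the weights w.
rescale(d, w) = d .* (norm(w) / norm(d))

# Two Gaussian random directions with the same shape and norm as θ.
ϕ = rescale(randn(length(θ)), θ)
ψ = rescale(randn(length(θ)), θ)

# Evaluate the loss on a 51×51 grid of points θ + α*ϕ + β*ψ.
αs = range(-1, 1; length = 51)
βs = range(-1, 1; length = 51)
surface = [loss(θ + α * ϕ + β * ψ) for α in αs, β in βs]
```

The resulting `surface` is a 51×51 matrix whose center entry is the loss at θ itself. It can be handed to any plotting package as a surface or contour plot. For a real network, `loss` would evaluate the model on a batch of data with the perturbed weights loaded in.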

An intriguing aspect of this method is that each time the code is run, the generated landscape looks nearly identical. It’s a neat result because the vectors that we map to plot axes are chosen essentially at random, with no consideration given to perpendicularity. In theory, assuming the weights of the network are not degenerate, one might be able to solve for two perfectly orthogonal vectors so that the resulting landscape would be more “true”. In practice though, the nature of high-dimensional space seems to render such a step unnecessary, as any set of random vectors is “orthogonal enough”, hence the consistent results.
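The “orthogonal enough” claim is easy to check numerically. The sketch below is an illustration only (the helper names `cosine` and `mean_abs_cosine`, the dimensions and the trial count are our assumptions): it measures the average absolute cosine between pairs of independent Gaussian vectors, which shrinks roughly like 1/√n as the dimension n grows. It also shows the single Gram-Schmidt step one could use to force exact orthogonality.

```julia
using Random, LinearAlgebra

Random.seed!(1)

# Cosine of the angle between two vectors; 0 means exactly orthogonal.
cosine(u, v) = dot(u, v) / (norm(u) * norm(v))

# Average |cosine| between independent Gaussian vectors of dimension n.
function mean_abs_cosine(n; trials = 200)
    total = 0.0
    for _ in 1:trials
        total += abs(cosine(randn(n), randn(n)))
    end
    return total / trials
end

for n in (10, 1_000, 100_000)
    println("n = ", n, "  mean |cos| ≈ ", round(mean_abs_cosine(n); digits = 4))
end

# Optional: make ψ exactly orthogonal to ϕ with one Gram-Schmidt step.
ϕ, ψ = randn(1_000), randn(1_000)
ψ_orth = ψ - (dot(ψ, ϕ) / dot(ϕ, ϕ)) * ϕ
```

Since a network typically has far more weights than the largest dimension tried here, the random axes are already very close to perpendicular, which fits with the landscapes looking the same from run to run.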

THE JULIA PROGRAMMING LANGUAGE

The combined expressiveness and high performance of Julia make it very well suited for machine learning work. You can grab mathematical pseudocode from a science book and basically copy it verbatim in Julia, and on the first run it will execute both as intended and blazingly fast. Ultimately, it lets programmers move seamlessly between being mathematicians and computer engineers.
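As a small taste of what “copying the math verbatim” can look like, here are a few textbook formulas written as one-line Julia definitions. The specific functions and numbers are our own illustrative picks, not excerpts from the notebook.

```julia
# Logistic function σ(x) = 1 / (1 + e^(-x)), written much as in a textbook.
σ(x) = 1 / (1 + exp(-x))

# Mean squared error: (1/n) Σᵢ (ŷᵢ - yᵢ)²
mse(ŷ, y) = sum((ŷ .- y) .^ 2) / length(y)

# A one-layer prediction ŷ = σ(Wx + b); the dot applies σ elementwise.
predict(W, b, x) = σ.(W * x .+ b)

W = [0.5 -0.25; 1.0 2.0]
b = [0.0, -1.0]
x = [2.0, 4.0]
ŷ = predict(W, b, x)
```

Unicode names such as `σ` and `ŷ` are ordinary identifiers in Julia, and the dot syntax broadcasts a scalar function over an array, so the code stays close to the notation on the page.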

THE CODE

Feel free to try the code out yourself. It is conveniently hosted in a notebook on our GitHub. Have fun!