Hi everyone! It’s been over a year since I first blogged about Julia and why its mathematical syntax, metaprogramming abilities, and blazing speed make it an awesome language for data scientists. In this post I want to follow up by comparing it directly with R and Python, currently the most popular languages for data science. Now, it’s no secret these two are not the fastest languages around, but is it obvious exactly how much performance is sacrificed for the cause? As for their mathematical or scientific expressiveness… I’ll leave that one up for debate.
Code was run on my laptop, a Dell XPS with a Core i7-7700HQ @ 2.80GHz and 32GB of RAM.
The example is a golden oldie, namely the Escape Time Algorithm for rendering the Julia set using the quadratic polynomial f(z) = z² + c and the complex parameter c = (φ − 2) + (φ − 1)i, where φ is the golden ratio.
The algorithm finished in a bit over 2 seconds.
Highly legible. No special optimizations. Sweet!
Let’s try the same thing in Python (and Numpy).
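For readers who have not met the escape time idea before, here is a minimal sketch in plain Python. It is an illustration only, not the listing that was benchmarked: the names `escape_time` and `julia_grid`, the iteration cap of 255, the escape radius of 2, the viewport of [-1.5, 1.5] on both axes, and the tiny grid size are all assumptions chosen to keep the example short.

```python
import math

PHI = (1 + math.sqrt(5)) / 2      # golden ratio
C = complex(PHI - 2, PHI - 1)     # roughly -0.38 + 0.61i, the parameter used in this post


def escape_time(z, c=C, max_iter=255, radius=2.0):
    """Count how many times z -> z*z + c can be applied before |z| exceeds radius.

    max_iter and radius are assumed values, not taken from the benchmarked code.
    """
    for n in range(max_iter):
        if abs(z) > radius:
            return n
        z = z * z + c
    return max_iter


def julia_grid(width, height, span=1.5, max_iter=255):
    """Escape times on a width x height grid covering [-span, span] on both axes."""
    rows = []
    for j in range(height):
        y = span - 2 * span * j / (height - 1)        # top row is +span
        row = []
        for i in range(width):
            x = -span + 2 * span * i / (width - 1)    # left column is -span
            row.append(escape_time(complex(x, y), max_iter=max_iter))
        rows.append(row)
    return rows


# Deliberately tiny grid so the sketch runs instantly.
grid = julia_grid(60, 30)
```

Each grid cell holds the number of iterations its starting point survived, which is what gets mapped to a color when the set is rendered. The timings discussed in this post refer to the full-size run, not to this small sketch.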
Python took about 3 minutes. It was ~72x slower than Julia. Hmm…
The code is fairly legible in this example, though we see in the doubly nested loop a preview of the sort of object-oriented syntax that quickly obfuscates mathematical formulas. Now, we could probably start tweaking it to make it faster, right?
We are all taught that to get good performance in R and Python, the entire algorithm needs to be refactored so that it’s “vectorized”. That way hot code can be offloaded to a faster language (like BLAS functions written in Fortran) without having to make an expensive function call for every element of a data set. This sort of vectorization is not always simple or possible, and when it is possible, it can force you to express algorithms in a contrived way. But before we compare vectorized code with regular code, let’s try running the algorithm again in R.
I guess that’s it then.
Take note that the chart below uses a logarithmic scale.
1 – Is Python slow? Yes.
2 – R is absurdly slow
R code… wow. It’s not just slowish… it’s positively glacial compared to other languages. If you’re hoping to work with anything bigger than your average Excel file, you can pretty much forget about it. Some vendors even have products that spread R code across multiple servers to achieve “high scalability”. I’m sorry, but that just means you have peanut-butter-code running on all your machines instead of one. It would be one thing if R were syntactically elegant, but that’s not the case either.
3 – Python and R do not speak mathematics
Neither Python nor R is very expressive when it comes to mathematical concepts, and a large part of that has to do with them not being homoiconic. The clumsy way they handled complex types in the code above is one example. Which code looks better to you?
julia > c = (φ-2) + (φ-1)im
R > c <- complex(real=-0.38, imaginary=0.61)
Their inability to conveniently work with high-precision arithmetic is another example, which might not matter when drawing fractals but is a big deal in finance. As a result, the following code in R can never return TRUE:
R > 0.1 * 3 == 0.3
FALSE
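To see where that FALSE comes from, here is a small sketch in Python using only the standard library. Python floats are binary doubles just like R's numerics, so the same comparison fails for the same reason. The variable names are illustrative, and reaching for `Fraction` or `Decimal` is one assumed workaround, not something the comparison above used.

```python
from decimal import Decimal
from fractions import Fraction
import math

# Binary doubles cannot represent 0.1 or 0.3 exactly, so the product drifts.
product = 0.1 * 3
naive_equal = product == 0.3
print(naive_equal)        # False
print(repr(product))      # 0.30000000000000004

# Exact arithmetic is available, but only through explicit wrapper types.
exact_rational = Fraction(1, 10) * 3 == Fraction(3, 10)
exact_decimal = Decimal("0.1") * 3 == Decimal("0.3")
print(exact_rational)     # True
print(exact_decimal)      # True

# A tolerance-based comparison is the usual fallback when staying with floats.
close_enough = math.isclose(product, 0.3)
print(close_enough)       # True
```

The exact results need every value to be wrapped in a constructor call first, which is the kind of ceremony that makes money-style arithmetic awkward to write.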
I find it intolerable that code which I know can be run in 2 seconds in Julia, should take nearly a quarter of an hour to run in Python and R. It would be one thing if their code were such a pleasure to read that it’d keep you busy for that whole time, but that’s not the case. R and Python do have one upside though – lots of questions on StackOverflow. 😉
At LakeTide we use Julia for almost everything, from number crunching on DC/OS clusters to controlling robots with RaspberryPi. Fast code is important, but so is having fun! Fortunately you don’t have to choose. Give Julia a try! http://www.julialang.org