UPDATE 1/15/2014: This blog is no longer in service.
This post is now located at: http://slendermeans.org/julia-loops.html
Although I agree with many of your points, I think that it’s problematic to teach young programmers to dislike for/while loops. How would you express a sequential, memory-in-place algorithm like the Gibbs sampler for the Ising model using list comprehensions? I suspect it can be done by a sufficiently clever person, but why should we go to great lengths to make algorithms harder to express?
For me, list comprehensions and other higher-order structures are very powerful tools, but are best used as complements to the basic iteration constructs that modern computers were designed to perform efficiently — rather than treated as replacements. There are many problems for which explicit iteration with mutating state is the most natural formulation. What makes Julia so appealing to me is precisely that it embraces every school of programming.
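To make the Gibbs sampler example concrete, here is a minimal sketch of one in-place sweep over a 2D Ising lattice, written with plain loops. It is in Python rather than Julia, and several details are my assumptions rather than anything specified above: the function names `gibbs_sweep` and `magnetization`, a coupling constant of 1, no external field, periodic boundaries, and the 8x8 lattice size. Each update reads neighbours that may already have changed earlier in the same sweep, which is what makes the algorithm awkward to express as a comprehension.

```python
import math
import random


def gibbs_sweep(spins, beta, rng):
    """Run one in-place Gibbs sweep over a 2D lattice of +1/-1 spins."""
    n_rows = len(spins)
    n_cols = len(spins[0])
    for i in range(n_rows):
        for j in range(n_cols):
            # Sum of the four nearest neighbours with periodic boundaries.
            # Some of them were already updated earlier in this sweep,
            # so the order of the updates matters.
            neighbours = (
                spins[(i - 1) % n_rows][j]
                + spins[(i + 1) % n_rows][j]
                + spins[i][(j - 1) % n_cols]
                + spins[i][(j + 1) % n_cols]
            )
            # Conditional probability that this spin is +1 given its
            # neighbours (assumed: coupling J = 1, no external field).
            p_up = 1.0 / (1.0 + math.exp(-2.0 * beta * neighbours))
            spins[i][j] = 1 if rng.random() < p_up else -1
    return spins


def magnetization(spins):
    """Average spin over the whole lattice, a value in [-1, 1]."""
    total = sum(sum(row) for row in spins)
    return total / (len(spins) * len(spins[0]))


# Usage: start from a random lattice and run a few sweeps.
rng = random.Random(0)
lattice = [[rng.choice((-1, 1)) for _ in range(8)] for _ in range(8)]
for _ in range(50):
    gibbs_sweep(lattice, beta=0.6, rng=rng)
```

The state lives in one mutable array and the loop body is three lines of arithmetic, which is about as direct a statement of the algorithm as I can think of.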
I think it’s problematic to teach young programmers to dislike for loops too! But I think it’s a shame that many OLD scientific programmers never learn to take advantage of functional constructs and more sophisticated data structures to make better code. They learn a certain imperative style and they use it everywhere.
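As a small, made-up illustration of the habit I mean (the data and function names here are hypothetical, and the example is in Python only because it is short), compare an index-driven version of a group mean with one that leans on a dictionary and a comprehension:

```python
from collections import defaultdict
from statistics import mean

# Hypothetical measurements: (group label, value) pairs.
observations = [("a", 1.0), ("b", 4.0), ("a", 3.0), ("b", 6.0), ("c", 5.0)]


def group_means_imperative(pairs):
    """Index-driven style that tracks labels, sums and counts in parallel lists."""
    labels = []
    sums = []
    counts = []
    for k in range(len(pairs)):
        label, value = pairs[k]
        found = False
        # Linear search for the label in the list of labels seen so far.
        for m in range(len(labels)):
            if labels[m] == label:
                sums[m] += value
                counts[m] += 1
                found = True
                break
        if not found:
            labels.append(label)
            sums.append(value)
            counts.append(1)
    return {labels[m]: sums[m] / counts[m] for m in range(len(labels))}


def group_means_with_dict(pairs):
    """Same result using a dictionary of lists and a comprehension."""
    grouped = defaultdict(list)
    for label, value in pairs:
        grouped[label].append(value)
    return {label: mean(values) for label, values in grouped.items()}
```

Both functions return the same mapping from label to mean. The second still uses a loop, but the data structure does the bookkeeping that the first version spells out by hand.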
I wasn’t trying to say that you should never use for loops (or even that you should rarely use for loops). For loops are important! I don’t think, for example, that a pure FP framework that prevents looping over changing state would be right for most scientific computing problems.
So I also prefer languages that don’t enforce one specific paradigm: this is a big part of what makes Python and R so great. People often complain about the limits and warts of OO and FP tools in these languages, but the tools are there together, and you can mix and match them with imperative code, which I think is a good tradeoff for scientific programmers.
My main fear (and it’s not a big one; smarter folks than me are dealing with these things) is that Julia gets sold short as a language that will speed up someone’s crappy 5-deep for loop in R. First, I think that severely underestimates what is powerful about Julia. And secondly, it excuses bad programming; I’d rather those programmers think about how to better express their problem than just throw it at a better compiler.
This doesn’t apply to you or to a lot of other people. I didn’t mean to pick on your code (which I don’t really have any problems with); I think I just had a visceral reaction to the message of “Hey everybody! Fast loops!” You appreciate that power because you know what talented programmers can do with it; I was griping because I was thinking about what untalented programmers will do with it (because I’m a bastard). I think Julia looks like Matlab on the LLVM to some people, and they’ll just carry over awful Matlab habits, especially when there’s no performance penalty for doing so.
I’d be interested to hear how you’d suggest pitching Julia’s more advanced capabilities to novice programmers. Many of Julia’s best features (e.g., type inference, macros) seem to require a mindset shift that’s difficult to articulate in short soundbites. I’ve often chosen things like loops or recursion because they’re easy to understand and dramatic in their effects.
Why is Julia its own thing? Couldn’t the development efforts be put into R, or would that change too much of what fundamentally makes R what it is?
I think Julia has enough new ideas, both inside and out, to justify existing on its own. I don’t know how feasible it is to get R to perform like Julia, so that you can write new parts of R in R and not suffer on speed. But Julia’s type system is richer than R’s, for example, and it’s got macros, and other things that I think would be too much of a change to R’s DNA. In other words, by the time you’re changing that much, the question becomes, “Why are we doing all this to R, why not just use Julia?”
Your question, which is a good and common one, I think gets to my discomfort with Julia marketing. If your sales pitch is “It’s fast R!” then the obvious question is, “Why not just make R faster?”
We may all be coding in Julia 5 years from now, but for the time being I am still very skeptical of the language and don’t think speed alone or loops are a strong selling point. Regarding R: nowhere in the Morandat paper is it mentioned that it would be nearly impossible to implement it efficiently (possibly with a type system). Namely, from page 18: “We believe that with some effort it should be possible to improve both time and space usage, but this would likely require a full rewrite of the implementation”. By the way, there is not much wrong with breaking backward compatibility for R and freezing R 3.x. Packages can be ported more easily than they can be rewritten in Julia.
Julia’s main value proposition seems to be avoiding the two-language problem. But either speed (with or without loops) is not a problem, or it’s the only problem, in which case no scientific programmer would go for anything less than the trustworthiness and speed of FORTRAN and C++. Julia is stuck in the middle. The programmer’s value is convex on the language simplex (his/her language mix), not concave, so the corners of the simplex beat the middle. That’s partly why the number of actively maintained languages grows over time.
I think that focusing on writing an R targeting LLVM with a few syntax additions and an optional type system would have won over the huge user community and facilitated package porting. It would have probably secured a preeminent place in open-source scientific computing. Julia is a valiant effort but I can’t see that happening. Python and R have too many compelling advantages in the high-level camp, and FORTRAN and C++ in the low-level one.
Several good points here. My responses are in random order.
I think a re-implementation of R targeting the LLVM and addressing some of the issues discussed in the Purdue paper would have gotten a lot of positive attention from R users (if not R Core). I’m also not sure why that would be an insurmountable undertaking: Julia and, e.g., PyPy were written by teams with relatively modest resources in manpower and money, and got by with just having some compiler geniuses.
Though, I think that project would have to basically be a fork of R. I can’t imagine R Core going this route ever. Even really basic performance patches to Base R get a bristly reception. The attitude seems to be “It’s Fast Enough; What’s Your Hurry?” or “Let Them Eat C++.”
On whether Julia’s speed is a sufficient sales proposition: I don’t know. I think it’s more likely to pick up newer programmers than convert someone who’s been working at an advanced level in R or Python. Fixing the annoyance of dropping into C or Cython isn’t really enough to get people to switch away from libraries they know and legacy code they can leverage. Like I mentioned in the post, though, if I were a Matlab coder whose license ran out and was cast into the wilderness, Julia would be a natural replacement. And if I were an MIT student who had used it in a course, I’d probably keep hacking on it afterwards.
I think your description about how scientific programmers think about speed is not quite right — there’s certainly a large number of programmers who would trade multiple-hour simulation runs for 20 minute simulation runs, even if pushing the thing to C or C++ would get that down to 5 minutes. Much of scientific programming (well, programming in general, really) is an effort to hit the sweet spot in the [Flexibility, Programmer Resources, Machine Resources] simplex. I think what you say is right for some population of coders, but not all.
I feel like I’m sounding negative on Julia, and I’m not. The fact that there’s a low barrier to entry for writing high performance libraries is a huge deal. As are built-in parallelism and meta-programming. I think those types of features are more likely to get me interested in the language than more microbenchmarks. I’d like to hear more sales pitches that go: “Here’s some neat code that you’d have a hard time writing in R/Matlab/Python” rather than, “You know that code you write in R/Matlab/Python? Here it is going faster.”
“I think your description about how scientific programmers think about speed is not quite right.”
Maybe, but I know quite a few scientific programmers, and they think about speed in a very similar way, and you’re probably biased by single-workstation experience. Some examples: i) the entirety of computational fluid dynamics libraries are written in FORTRAN; ii) all ab initio simulators for molecular dynamics (e.g., Amber) are written in C++; iii) all finite element analysis software is in highly optimized FORTRAN; iv) the entire workload on the Blue Gene cluster was generated by C++ and FORTRAN code. The reason is that when execution time is binding on a large job, even a 2x improvement makes all the difference. You can object that the majority of scientific programmers run opportunistic code on some multicore machine, and that those happen to be simulations (e.g., some MCMC). I have seen Julia advocates mention this use case several times. But there’s already some very fast C++ code that addresses the issue and is callable from R. Maybe not as flexible, but most likely faster. Perhaps there’s a benefit to these users in switching to Julia, but I think it’s not a compelling enough reason. My bet is that Python will take over most traditional MATLAB users and Julia will join the ranks of niche languages with a devoted user base.
With languages such as R, IDL, Matlab, etc., there lies a strength of speed… but not so much the speed of computation as the speed of expression. This is exactly where these languages succeed and others, such as Python, C#, and Java, fail miserably.
The Python group surely makes for wonderful programming, but if your focus is on exploration it’s the wrong tool for the job.