
Differences between Python and R

Python & pandas have horrible assignment semantics. If you pass lists & data frames in a natural way then you will start to notice unexpected changes in the calling functions’ variables. After a few such experiences, you start cluttering your code with copy & deepcopy. R conceptually makes deep copies (of just about everything except environments) more efficiently than whatever reasonably safe code you end up writing in Python. (MATLAB had an even more efficient approach decades ago, but didn’t have to worry about circular references.) Even when you have it right, there can still be confusing warning messages. I sometimes wish for pass-by-reference behaviour in R (maybe a function such that whatever you do to ref(x) also happens to x), but not as often as I have to work around the semantics of Python.
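
A minimal sketch of the aliasing problem, using plain lists and the standard library only. The function names add_total_in_place and add_total_safely are made up for illustration.

```python
import copy

def add_total_in_place(rows):
    # rows is just another name for the caller's list, so append changes it.
    rows.append(sum(rows))
    return rows

def add_total_safely(rows):
    # Work on a deep copy so the caller's list is left untouched.
    rows = copy.deepcopy(rows)
    rows.append(sum(rows))
    return rows

data = [1, 2, 3]
safe = add_total_safely(data)       # data is still [1, 2, 3]
snapshot = list(data)               # record the state before the unsafe call
unsafe = add_total_in_place(data)   # data is now [1, 2, 3, 6]
```

The second call returns the very same object that was passed in, which is why the caller's variable changes.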

Python has unlimited integers (up to some fraction of available memory). I haven’t found a data-science application, but it’s fun to compute the 10000th Fibonacci number.
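
For the curious, a short sketch of that computation. It assumes the convention fib(0) = 0 and fib(1) = 1; the function name fib is my own.

```python
def fib(n):
    # Iterative Fibonacci; Python integers grow as needed, so nothing overflows.
    a, b = 0, 1
    for _ in range(n):
        a, b = b, a + b
    return a

big = fib(10000)
digits = len(str(big))  # number of decimal digits in the result
```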

R is more flexible about the order of named & unnamed arguments (in function calls) and about the order of formals with & without defaults. For example, paste(collapse=',', ….) seems natural because it says exactly what I am doing to the strings that follow. On the other hand, *args & **kwargs are more convenient than ... & do.call.
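
A small sketch of *args & **kwargs in use. Both join_strings and call_with are hypothetical helpers written for this example, not library functions.

```python
def join_strings(*args, sep=","):
    # Positional strings are collected into the tuple args;
    # sep comes after *args, so it can only be passed by keyword.
    return sep.join(args)

def call_with(func, *args, **kwargs):
    # Forward whatever was received to func, unchanged.
    return func(*args, **kwargs)

pieces = ["a", "b", "c"]
joined = call_with(join_strings, *pieces, sep="-")
default_sep = join_strings("x", "y")
```

Note that writing a plain positional argument after a keyword argument in a call, such as join_strings(sep="-", "a"), is a syntax error in Python.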

I haven’t yet found the Python equivalent of environments. The debugger allows me to move up & down the call stack, but it’s not easy to move information from one frame to another. If I’m debugging in R and I’m not sure where the code blows up then I can save the environment of each function call. If there is an error then I can attempt a fix in the environment of the call that failed, compute a value, put the value into the appropriate variable in the caller, run the remaining code in the caller, and so on to top level. Moreover, variable scope is clearer to me in R than in Python.

Both R and Python allow items in one package to mask items in previously imported packages. At least R gives warnings and provides scope-resolution syntax.
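
A sketch of silent masking in Python, using two standard-library modules that both export a function called sqrt. Referring to the function through its module name is one way to stay unambiguous.

```python
from math import sqrt
from cmath import sqrt  # rebinds the name sqrt; Python issues no warning

import math

masked = sqrt(4)           # this is cmath.sqrt, so the result is complex
qualified = math.sqrt(4)   # module-qualified name: always math's version
```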

R’s delayed argument evaluation is confusing at first, but it leads to more efficiency, more flexibility in default arguments, and clearer (to those who understand the idiom) function prototypes. For example “function(x=g(y), y=h(z), z=stop('x or y or z must be provided.'))” suggests that the function needs x but can compute it from y and can compute y from z. Presumably y is ignored if x is provided.
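
For comparison, a rough Python sketch of the same prototype. Python evaluates default expressions once, when the function is defined, so the chain has to be spelled out in the body. The bodies of g and h are placeholders invented for this example, and None is assumed to mean "not supplied".

```python
def g(y):
    # Placeholder: derive x from y.
    return y + 1

def h(z):
    # Placeholder: derive y from z.
    return z * 2

def f(x=None, y=None, z=None):
    # Resolve the defaults lazily, in the order x <- y <- z.
    if x is None:
        if y is None:
            if z is None:
                raise ValueError("x or y or z must be provided.")
            y = h(z)
        x = g(y)
    return x

from_z = f(z=3)   # y = h(3), then x = g(y)
from_y = f(y=1)   # z is never looked at
```

The intent is far less visible in the signature than in the R version, which is the point the paragraph makes.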

PEP8 is ugly and seems to have been designed to make code harder to read. On the other hand, the ‘canonical’ format in which the R debugger shows code is even worse. At least Python uses backslash for line continuation. (However, I shouldn’t need backslash in the header of an if, while, or for statement: the absence of a colon indicates that I haven’t finished the header.)
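
A sketch of the two ways to continue an if header across lines in Python: the backslash, and wrapping the condition in parentheses, which also lets the expression span lines. The variable names are arbitrary.

```python
a, b, c = 1, 2, 3
backslash_result = False
paren_result = False

# Explicit continuation with a backslash at the end of the line.
if a < b and \
   b < c:
    backslash_result = True

# Implicit continuation: an open parenthesis keeps the header going.
if (a < b and
        b < c):
    paren_result = True
```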

Braces around code blocks make R files easier to navigate (e.g. in emacs) than Python files.

Left-to-right assignment (available in R but not Python) cleans up some code. Moreover, R has assignment syntax that is clearly distinguishable from argument binding. This makes exploration & debugging easier, but I wouldn’t recommend anything like “setdiff(y = …. -> small_set, ….)” in production code. (It’s hard to read, and the assignment won’t happen if for any reason setdiff doesn’t use y.)

Data science seems to be an afterthought for Python. Random numbers, vectors, arrays, & data frames all require packages that are not loaded automatically. The syntaxes of pandas & numpy are just different enough to be annoying.

pandas seems more verbose than R: for example, I might have to say .loc or .iloc when R would just check whether I’m using logical, integer or character. On the other hand, chains of methods (with C++-like syntax) are easier to follow than the equivalent nesting of functions (whether dynamically or statically resolved) in R.

R & Python use negative indices very differently. I wish that R allowed indices offset from the end, as in MATLAB. (That would do what Python’s negative indices do, and R could still use negative indices to omit specific entries.)
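
A sketch of the Python side of that comparison, using a plain list. The last two lines show one way to get the effect of R's omit-by-negative-index, which Python has no direct syntax for.

```python
values = [10, 20, 30, 40, 50]

last = values[-1]        # counts back from the end
last_two = values[-2:]   # slice starting two from the end

# R's x[-1] drops the first element; in Python a slice does that.
without_first = values[1:]

# Dropping an arbitrary position (here the third entry) needs a comprehension.
without_third = [v for i, v in enumerate(values) if i != 2]
```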

Python starts indexing at zero, which I think is better than starting at 1.

Most of the usual models are available in R & Python. A specialised or leading-edge model might be available in one or the other.

I’ll probably think of more differences, but this is what I have for now.
