Stop saying you can't put R in prod.
I made a blog post with my thoughts and reactions.
https://josiahparry.com/posts/2023-07-06-r-is-still-fast.html
@josiah How about maintainability?
Never met a developer who liked R, and to put things into production, you have to work with devs.
The new people coming into data science all know python. Even among older data scientists, you rarely find one that prefers R (unless they're a statistician working in academia).
Who is going to maintain that code after the person who wrote it leaves for another company?
@jrosell quarto also supports python. So it's a hard sell.
The thing R has got going for it is the plots, they are pretty.
I'm sorry, but the rest just lags behind, from the syntax to the ecosystem (except very specific stats packages).
Just having array indexes starting at 1 will make a Dev's skin crawl.
@milesmcbain @jrosell I never said that no ideas started in R, or that there are no specific fields that use R (especially in academia).
But if you talk with data scientists working in most companies, it's very rare to see R used in production (and when you do see it, it's usually legacy code).
EDIT: and by production, I don't mean some one-off analysis that you made for a client, I mean something that's handling requests on a kubernetes cluster.
Well, in those cases, R is not really doing any live computation or handling requests. You only use it to generate HTML files, and those are the ones that are in the production server.
The blogpost was about having R being used to handle requests in a production server.
EDIT: I'm not saying that a setup with R doing these things is impossible, but it's rare, which means finding people to maintain it is hard. This has to be a main consideration when putting something in prod.
@orizuru @jrosell I think you’re probably right in that using R for the back-end of high load online ML systems is rare. But I’d also argue that businesses where those kinds of systems are viable are also quite rare when you consider the universe of businesses that are trying to derive value from data.
The definition of ‘production’ being used in the original blog post and this conversation is quite narrow.
Regarding nightly runs, yes R can do that, and if you have a DS team that favours R, then go for it.
However, I don't think R has that much of an advantage on this (besides syntax preference). Python has a very mature ecosystem, e.g. pandas, sklearn, pytorch, etc. And if you need some extra speed in a data pipeline, you could always go with pyspark (to distribute across machines) or polars (to run on the same machine in parallel).
Yes, that still proves my point: R does not have any functionality that Python does not have in this scenario of nightly runs.
The choice to use R over Python is basically because the team is used to it. It's just syntax preference.
I admit that I haven't been keeping up to date with deep learning in R. But I would also guess it's not as common as pytorch (in case you want to work with / extend other people's models).
In my opinion, if you want to use a less-common language / framework, it should have a very strong selling point, since you are introducing constraints on the number of people you could hire to maintain the infrastructure.