Mastodawn

Josiah

Jul 6, 2023

Stop saying you can't put R in prod.

I made a blog post with my thoughts and reactions.

#rstats #rust #putRinprod

https://josiahparry.com/posts/2023-07-06-r-is-still-fast.html

Josiah Parry - R is still fast: a salty reaction to a salty blog post

orizuru Jul 6, 2023

@josiah How about maintainability?

Never met a developer who liked R, and to put things into production, you have to work with devs.
The new people coming into data science all know python. Even among older data scientists, you rarely find one that prefers R (unless they're a statistician working in academia).

Who is going to maintain that code after the person who wrote it leaves for another company?

Jordi Rosell Jul 6, 2023

@orizuru show them quarto, they will love R for generating static websites https://r4ds.hadley.nz/quarto-formats.html#websites-and-books

R for Data Science (2e) - 30 Quarto formats

orizuru Jul 6, 2023

@jrosell quarto also supports python. So it's a hard sell.

The thing R has got going for it is the plots, they are pretty.
I'm sorry, but the rest just lags behind, from the syntax to the ecosystem (except very specific stats packages).

Just having array indexes start at 1 will make a dev's skin crawl.

MilesMcBain Jul 6, 2023

@orizuru @jrosell funny how many ideas that started in our dead language in just the last 5 years keep getting ported to these mainstream ones.

orizuru Jul 7, 2023

@milesmcbain @jrosell never said there have not been any ideas that started in R, or that there are no specific fields that use R (especially in academia).
But if you talk with data scientists working in most companies, it's very rare to see R used in production (and when you do see it, it's usually legacy code).

EDIT: and by production, I don't mean some one-off analysis that you made for a client, I mean something that's handling requests on a kubernetes cluster.

Jordi Rosell Jul 7, 2023

@orizuru @milesmcbain "production" can also mean code that generates HTML files that are later served on a kubernetes cluster

orizuru Jul 7, 2023

@jrosell @milesmcbain

Well, in those cases, R is not really doing any live computation or handling requests. You only use it to generate HTML files, and those are the ones that are in the production server.

The blog post was about using R to handle requests on a production server.

EDIT: I'm not saying that a setup with R doing these things is impossible, but it's rare, which means finding people to maintain it is hard. This has to be a main consideration when putting something in prod.

MilesMcBain Jul 7, 2023

@orizuru @jrosell I think you’re probably right in that using R for the back-end of high load online ML systems is rare. But I’d also argue that businesses where those kinds of systems are viable are also quite rare when you consider the universe of businesses that are trying to derive value from data.

The definition of ‘production’ being used in the original blog post and this conversation is quite narrow.

MilesMcBain Jul 7, 2023

@orizuru @jrosell like, there would be manyfold more businesses where nightly batch runs for preds / recs / ranks, which are then fed into a database and distributed from there to front-end apps via standard queries, would be more than sufficient.

orizuru

@milesmcbain @jrosell

Regarding nightly runs, yes R can do that, and if you have a DS team that favours R, then go for it.

However, I don't think R has that much of an advantage on this (besides syntax preference). Python has a very mature ecosystem, e.g. pandas, sklearn, pytorch, etc. And if you need some extra speed in a data pipeline, you could always go with pyspark (to distribute across machines) or polars (to run on the same machine in parallel).

Jordi Rosell Jul 7, 2023

@orizuru @milesmcbain or use h2o or dplyr with a sparklyr backend in R

orizuru Jul 7, 2023

@jrosell @milesmcbain

Yes, that still proves my point: R does not have any functionality that Python does not have in this scenario of nightly runs.
The choice to use R over Python is basically because the team is used to it. It's just syntax preference.

I admit that I haven't been keeping up to date with deep learning in R. But I would also guess it's not as common as pytorch (in case you want to work with / extend other people's models).

orizuru Jul 7, 2023

@jrosell @milesmcbain

In my opinion, if you want to use a less-common language / framework, it should have a very strong selling point, as you are introducing constraints on the number of people you could hire to maintain the infrastructure.

Jordi Rosell Jul 7, 2023

@orizuru @milesmcbain dplyr enables changing the backend, like Ibis does in Python. But Ibis is not as popular as dplyr