fcase — fcase • data.table

Exploring a parallel syntax with #RDataTable

```
data |>
_[, .(x = async(long(x))), by = .(group1, group2)] |>
collect_async()
```

Is syntax sugar for

```
data |>
_[, .(x = list(future::future(long(x)))), by = .(group1, group2)] |>
_[, x := future::value(x[[1]]), by = .(group1, group2)] |>
_[]
```

#RStats
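
A minimal runnable sketch of the desugared version above, assuming the `future` package is installed. `long()` and the toy `data` are hypothetical stand-ins, and the second step builds a fresh result with `.()` rather than `:=`, so the list column of futures is not overwritten in place with a different type.

```r
library(data.table)
library(future)  # assumption: the 'future' package is installed

# Background R sessions give real parallelism; the default sequential
# plan would run the same code serially.
plan(multisession, workers = 2)

# Hypothetical stand-in for an expensive per-group computation
long <- function(x) {
  Sys.sleep(0.1)
  sum(x)
}

# Hypothetical toy data with two grouping columns
data <- data.table(
  group1 = rep(c("a", "b"), each = 4),
  group2 = rep(c(1L, 2L), times = 4),
  x      = 1:8
)

# Step 1: launch one future per group and keep it in a list column
pending <- data[, .(x = list(future(long(x)))), by = .(group1, group2)]

# Step 2: resolve each future; value() blocks until its result is ready
res <- pending[, .(x = value(x[[1]])), by = .(group1, group2)]

plan(sequential)  # release the background workers
print(res)
```

Launching all futures before collecting any value is what lets the groups overlap; resolving inside the same `j` call would serialise them again.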

data challenge: rolling median

```
library(data.table)
set.seed(108)
x = rnorm(1e8)
n = 1000
frollmedian(x, n) |> system.time()
#    user  system elapsed
#   8.439   0.727   3.212
```

#pandas #polars #data #datascience #rdatatable #rstats

Look under your tree! 🌲 🎁

There's a major #rstats #rdatatable release waiting! @r_data_table

A tremendous thanks to all involved and especially those contributing to some major (performance-maintaining!!) rewrites around the non-API issue.

Thanks also to the CRAN team and Luke for their patience+working with us to get it over the line.

https://cran.r-project.org/web/packages/data.table/index.html

data.table: Extension of 'data.frame'

Fast aggregation of large data (e.g. 100GB in RAM), fast ordered joins, fast add/modify/delete of columns by group using no copies at all, list columns, friendly and fast character-separated-value read/write. Offers a natural and flexible syntax, for faster development.

@kernpanik Usually, I also try to stick to base #rstats or lightweight packages (#tinyplot, #tinytable, #rdatatable, ...). Methinks, since most tutorials promote the tidyverse, some do not know the base equivalents. However, base data frame operations may require more careful handling of row order, factor levels, and preserving the data frame structure. dplyr maintains a consistent behavior across grouped operations.

@pglpm The only reason I don't use {collapse} is because usually what I want is already covered by #RDataTable

With #macos #tahoe, is there out-of-the-box #Multithreading for #rdatatable? Or is manual compilation with obscure flags still necessary? #rstats

Switching from #rmarkdown and #rdatatable to #quarto and #polars is a bit cumbersome. I just want to compile a document with tables to pdf.

If I print a polars table, I get the data type with it. If I convert it to pandas df, I get an index. If I set_tbl_hide_column_data_types, my strings get quotes. Is there no #knitr kable equivalent in #Python /Quarto?

So far, I have been a very enthusiastic user of #rstats, #rdatatable, #rmarkdown and #ggplot2 in #RStudio. I am looking for an equally effective, modern #python based setup. So far, I think I will go for #polars, #quarto and #plotnine in #vscode with #uv as package manager. Does anybody have suggestions about pitfalls in switching or advice for the setup? Or are there potentially better alternatives?

data.table giving bizarre results on my system when compiled with the Intel compiler. This is just a simple mean by time. The GForce version goes all wacky. Using base::mean() returns to sanity.

#RDataTable #RStats
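
A hedged sketch of how such a discrepancy could be checked. The table `dt` and its columns are hypothetical. It relies on the post's observation that writing `base::mean()` sidesteps the GForce path, which may differ between data.table versions.

```r
library(data.table)

set.seed(1)
# Hypothetical data: a value measured repeatedly at each time point
dt <- data.table(time = rep(1:5, each = 100), value = rnorm(500))

# A plain mean() in j with by= is eligible for the GForce optimisation
gforce <- dt[, .(m = mean(value)), by = time]

# Per the post, calling base::mean() explicitly avoids the GForce path
plain <- dt[, .(m = base::mean(value)), by = time]

# On a healthy build the two should agree up to floating-point tolerance
print(all.equal(gforce$m, plain$m))

# verbose = TRUE reports which optimisations were applied to the query
invisible(dt[, .(m = mean(value)), by = time, verbose = TRUE])
```

If the comparison fails on only one compiler, the verbose output helps confirm that the optimised path is the one involved.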