I'd really like #awk with #dplyr select semantics. Or, I guess what I really want is a good way of streaming through a file in #Rstats without running into memory issues when I'm doing basic filtering. It feels like this should be doable in R using #arrow or #duckdb or even read_delim_chunked or something, but I haven't gotten it to work reliably yet. What are y'all's best tips to handle this?
Maybe #qsv #xsv or #csvkit are suitable tools for this? Though a solution based on R would be neat...
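For the plain-awk flavor of this, a minimal sketch of streaming row filtering plus column selection (the file name, columns, and threshold here are all made up for illustration) — awk reads one line at a time, so memory use stays constant no matter how large the CSV is:

```shell
# Hypothetical demo file; replace with your real CSV.
printf 'id,name,score\n1,a,50\n2,b,150\n3,c,200\n' > /tmp/demo.csv

# Keep the header, keep rows where column 3 > 100,
# and select only columns 1 and 3 (roughly dplyr's filter + select).
awk -F, 'NR==1 || $3 > 100 { print $1 "," $3 }' /tmp/demo.csv
```

qsv/xsv/csvkit give you the same streaming behavior with proper CSV quoting, which plain awk does not handle.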
@nikostr {multidplyr} is often forgotten about, but I'm not sure how different the workflow would be from using duckdb or arrow. Actually it's probably closer to {sparklyr}. https://multidplyr.tidyverse.org/
A Multi-Process dplyr Backend

Partition a data frame across multiple worker processes to provide simple multicore parallelism.

@nikostr It is a great trick, and I wrote a few SO answers using it, such as the one below. The `pipe()` connection really is wonderful in combination with the standard csv readers; `data.table::fread` can also read directly from commands. #rstats

https://stackoverflow.com/questions/18877120/reading-specific-row-of-file-depending-upon-the-first-column-value/18879449#18879449

Reading specific row of file depending upon the first column value

I have a file which have different record type in different row and this can be identified using the first column value of the row, a sample data set is given below V1 V2 V3 V4 1 ABC DEF...

Stack Overflow
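The trick in that answer boils down to letting a shell command do the row filtering before R ever reads the data. A hedged sketch of the command side, using a made-up file shaped like the SO question (record type in the first column); in R, this same command string would be wrapped as something like `read.table(pipe("awk '$1 == 1' /tmp/records.txt"))`, so only the matching rows ever reach R's memory:

```shell
# Hypothetical sample: first column is the record type.
printf '1 ABC DEF\n2 GHI JKL\n1 MNO PQR\n' > /tmp/records.txt

# Stream only the type-1 records; this is the command R's pipe() would run.
awk '$1 == 1' /tmp/records.txt
```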