I have a very large .csv file containing a numerical matrix. For each row, I need to calculate the mean of many selections of values (e.g. the mean of the values at indexes 1, 3, 52, 123; then, in the same row, the mean of the values at 2, 3, 12, 29, 67; and so on).

My file is HUGE, even in row length (a row has like, 8000+ items), so I need this to be fast. I know the indexes I need to average at the start of the computation, but don't have enough memory to load the whole file at once.

I want to do this in #rust but I don't know how to do it fast enough. Any tips?

@MrHedmad If I interpret your problem correctly, you probably don't want to use Parquet or Arrow for this (as others have suggested), because your operation is fundamentally row-based and not column-based.

You can use the `csv` crate in a "streaming" fashion by reusing the same `StringRecord` in every iteration of row parsing:

https://docs.rs/csv/latest/csv/struct.Reader.html#method.read_record

Something like this:
https://play.rust-lang.org/?version=stable&mode=debug&edition=2021&gist=3f6ba982764f8e97a20bf718d4546f25

It will basically only require memory for one row at a time.
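In case the playground link goes stale, here is a rough std-only sketch of the same row-at-a-time idea. It is not the `csv` crate version: it assumes plain numeric fields separated by commas, with no quoting and no header, and that every index is in range. The name `for_each_row_means` and the callback shape are made up for illustration. With the `csv` crate you would replace the `read_line` and `split` part with `read_record` on a reused `StringRecord`.

```rust
use std::io::{self, BufRead};

/// Streams `input` one row at a time and calls `on_row(row_index, means)`
/// with one mean per entry of `selections`.
///
/// Assumptions (hypothetical, for illustration): unquoted numeric fields,
/// comma separated, no header, all indexes in range, no empty selection.
fn for_each_row_means<R, F>(
    mut input: R,
    selections: &[Vec<usize>],
    mut on_row: F,
) -> io::Result<()>
where
    R: BufRead,
    F: FnMut(usize, &[f64]),
{
    // These buffers are allocated once and reused for every row,
    // so memory use stays at roughly one row.
    let mut line = String::new();
    let mut values: Vec<f64> = Vec::new();
    let mut means: Vec<f64> = Vec::with_capacity(selections.len());
    let mut row = 0;

    loop {
        line.clear();
        if input.read_line(&mut line)? == 0 {
            break; // end of input
        }
        let trimmed = line.trim_end_matches(|c: char| c == '\n' || c == '\r');
        if trimmed.is_empty() {
            continue; // skip blank lines
        }

        // Parse the whole row once, then index into it for each selection.
        values.clear();
        for field in trimmed.split(',') {
            let v = field
                .trim()
                .parse::<f64>()
                .map_err(|e| io::Error::new(io::ErrorKind::InvalidData, e))?;
            values.push(v);
        }

        means.clear();
        for sel in selections {
            let sum: f64 = sel.iter().map(|&i| values[i]).sum();
            means.push(sum / sel.len() as f64);
        }

        on_row(row, &means);
        row += 1;
    }
    Ok(())
}
```

For a real file you would pass a `BufReader` wrapped around the `File` as `input`, and write the means out in the callback instead of collecting them.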

1/2

#Rust #RustLang #CSV #Performance


@MrHedmad If you want to go even further, you might be able to split your file into chunks and have threads operate on those chunks individually. However, this will be more complicated: how do you find boundaries to split on without cutting a row in half and ending up with invalid CSV?

Here is a very interesting discussion in the Rust forum about this (with a plain text file, though, not CSV):

https://users.rust-lang.org/t/reading-a-file-4x-faster-using-4-threads-works-threaded-is-faster/41180
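For a purely numeric matrix, one possible way to get safe boundaries is to guess evenly spaced byte offsets and then move each guess forward to the start of the next line. The sketch below does only that. It is an assumption-heavy illustration, not something taken from the forum thread: the name `line_aligned_chunks` is made up, and it relies on rows being separated by `\n` with no quoted fields containing newlines (fine for numbers, not for general CSV).

```rust
use std::io::{self, BufRead, BufReader, Read, Seek, SeekFrom};

/// Splits the byte range `[0, len)` of `source` into at most `n_chunks`
/// ranges `(start, end)` such that every range begins at the start of a line.
///
/// Assumption (hypothetical, for illustration): a `\n` byte always ends a row,
/// i.e. there are no quoted fields with embedded newlines.
fn line_aligned_chunks<R: Read + Seek>(
    source: R,
    len: u64,
    n_chunks: u64,
) -> io::Result<Vec<(u64, u64)>> {
    if len == 0 {
        return Ok(Vec::new());
    }
    let mut reader = BufReader::new(source);
    let mut starts = vec![0u64];
    let mut scratch = Vec::new();

    for k in 1..n_chunks {
        let guess = len * k / n_chunks;
        if guess == 0 {
            continue;
        }
        // Start one byte early, so a guess that already sits right after
        // a newline is kept as it is.
        reader.seek(SeekFrom::Start(guess - 1))?;
        scratch.clear();
        let consumed = reader.read_until(b'\n', &mut scratch)? as u64;
        let start = guess - 1 + consumed;

        // Drop boundaries that hit the end of the data or repeat the last one
        // (this happens when a single row is longer than a chunk).
        if start < len && start > *starts.last().unwrap() {
            starts.push(start);
        }
    }

    // Each chunk ends where the next one starts; the last ends at `len`.
    let mut ranges = Vec::with_capacity(starts.len());
    for (i, &start) in starts.iter().enumerate() {
        let end = starts.get(i + 1).copied().unwrap_or(len);
        ranges.push((start, end));
    }
    Ok(ranges)
}
```

Each thread would then open its own file handle, seek to its `start`, and process rows until it reaches its `end`. Whether this is actually faster depends on whether parsing or disk I/O is the bottleneck, so it is worth measuring against the single-threaded version first.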

2/2
