I have come to the conclusion that the best data format for 'raw' scientific timeseries data is good old plain CSV (comma-separated values):

- ➕ new data easily appendable to files
- 🔎 searchable with command-line tools (grep, sed, awk, perl, etc.)
- 📝 easily modifiable as well
- 🗜️ compresses like crazy ('xz -v -9' is insane, plain text compresses even better than already-dense binary measurement data in my experience)
- 📤 readable by any data analysis language
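To make the append / compress / read-back points above concrete, here is a minimal sketch using only Python's standard library. The function names, the `time`/`value` header and the file layout are my own assumptions for illustration, not a fixed convention, and `lzma` with `preset=9` is only meant to be similar in spirit to `xz -9`.

```python
import csv
import lzma
from pathlib import Path


def append_rows(path, rows, header=("time", "value")):
    """Append rows to a CSV file; write the header only if the file is new or empty.

    The default header is a hypothetical example, adapt it to your measurements.
    """
    path = Path(path)
    is_new = not path.exists() or path.stat().st_size == 0
    with path.open("a", newline="") as f:
        writer = csv.writer(f)
        if is_new:
            writer.writerow(header)
        writer.writerows(rows)


def compress_xz(path):
    """Write an xz-compressed copy next to the original and return its path."""
    path = Path(path)
    target = path.with_name(path.name + ".xz")
    with path.open("rb") as src, lzma.open(target, "wb", preset=9) as dst:
        dst.write(src.read())  # fine for a sketch; stream in chunks for huge files
    return target


def read_rows(path):
    """Read a plain or xz-compressed CSV back as a list of dicts (values stay strings)."""
    opener = lzma.open if str(path).endswith(".xz") else open
    with opener(path, "rt", newline="") as f:
        return list(csv.DictReader(f))
```

Typical use would be calling `append_rows("experiment.csv", [(t, v)])` whenever a new measurement arrives, and `compress_xz("experiment.csv")` once a run is finished and the file is archived.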

All the other solutions I tried (HDF5, NetCDF4, SQLite3, DIY binary, ...) have their own huge downsides, be it the single-huge-file problem, bad appendability and bad compressibility in combination, weird library problems, inconsistency between languages, bad scalability, etc.

I also don't like the whole database thing. Files are dependency-less and don't complain, they're just there and work.

I'm a #plaintext guy, I will keep saving my experiment data as #plaintext! 😌

Also, using simple files for data storage enables usage of
#gitannex for backup, tracking and synchronization 🔄

https://git-annex.branchable.com/

#git #gitannex

For post-processed data though, I think #NetCDF4 is the best format with its multiple structured and indexed array data fields which can have arbitrary metadata attached and work flawlessly with #Python's #xarray package:

https://xarray.dev/

#Python #dataanalysis

@nobodyinperson Cosigned! Plus then it's trivial to pull into Jupyter/Pandas for further analysis!
@nobodyinperson what do you think of Zarr?
@yuvipanda Didn't know it until now, thanks for the recommendation. Though I would only use it as an intermediate format to temporarily work around limitations of the other formats mentioned (e.g. thread-safety, compression speed, etc.). A language-specific data format is something I'd avoid for reproducibility and interoperability reasons.
@nobodyinperson Zarr definitely has multi-language support (unlike, say, pickle), but I generally agree with you that if you can solve your problem with compressed CSV, you should!
@nobodyinperson in particular something like Zarr probably won’t have the command line tooling support that gz + csv (or even jsonl) will
@nobodyinperson I would agree but there is no support for complex numbers :(
@nobodyinperson Shriek! Please, NC4 if possible. As a dataviz coder, space-wasting CSV is absolutely NOT something that my app will ever import. (But if Excel is your go-to tool, then whoot!) Trying to update my code to work with Zarr, but library dependencies, sigh.
@roadskater Compressed #CSV is (in my case) actually smaller than #NetCDF4. Trade IO speed for size and data approachability.
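If you want to check the size claim on your own data, a rough sketch like the following measures how much a CSV file shrinks under xz. The helper name `xz_ratio` is hypothetical, and the result depends entirely on the data, so treat it as a way to measure, not as a benchmark.

```python
import lzma
from pathlib import Path


def xz_ratio(path, preset=9):
    """Return compressed size divided by raw size for one file (smaller is better)."""
    raw = Path(path).read_bytes()
    if not raw:
        raise ValueError("file is empty, ratio is undefined")
    return len(lzma.compress(raw, preset=preset)) / len(raw)
```

Comparing `xz_ratio("experiment.csv")` times the CSV size against the size of the same data stored as NetCDF4 gives the trade-off for your particular dataset.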