Did you know that in the recent Python-Blosc2 4.2.0 release we shipped extremely efficient indexing engines whose data is stored in... (guess what) ...a compressed state?

As a result, much larger tables can be indexed. Look at how this compares with the indexing engine in DuckDB (ART) in the plots below.

Also, the new indexing facility lets you pick whichever indexing engine (bucket, partial, opsi, or full) best fits your needs.

Enjoy data surfing!

#TabularData #ColumnarStorage #Fast

Python-Blosc2 4.2.0 is out: get ready for data surfing! 🏄‍♂️

Have fun with the new CTable object:

* columnar storage format
* SOTA compression
* transparent in-memory and disk-based operation
* support for missing/null values
* multiple indexing engines for fast queries in large tables
* easy interoperability with Parquet, Arrow, and pandas-like ecosystems

Compress Better, Analyze Faster, Integrate Easier

Release notes: https://github.com/Blosc/python-blosc2/releases/tag/v4.2.0

#TabularData #ColumnarStorage #Fast #EnjoyData

Behold! A revolutionary revelation: columnar storage isn’t an alien conspiracy but merely a close cousin to relational databases! 🤯 Because, who knew that rearranging data could still count as normalizing? Next week: 🥁 water is wet!
https://buttondown.com/jaffray/archive/columnar-storage-is-normalization/ #columnarstorage #relationaldatabases #datarearrangement #normalization #technews #HackerNews #ngated
Columnar Storage is Normalization

Something I didn't understand for a while is that the process of turning row-oriented data into column-oriented data isn't a totally bespoke, foreign concept...

NULL BITMAP by Justin Jaffray
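
For anyone who wants to see what "turning row-oriented data into column-oriented data" looks like in practice, here is a minimal Python sketch. It is not taken from the linked post. The sample records and the helper names `to_columns` and `to_rows` are made up for illustration, and it assumes every row has the same fields. Each column is kept as its own list, and the list position plays the role of a shared row id, so rebuilding the rows amounts to joining the columns on that position.

```python
# Row-oriented layout: one record per entry (sample data, purely illustrative).
rows = [
    {"name": "ada", "age": 36},
    {"name": "bob", "age": 41},
    {"name": "cy", "age": 29},
]


def to_columns(rows):
    """Split rows into one list per field; list position acts as the row id.

    Hypothetical helper: assumes all rows share the fields of the first row.
    """
    fields = list(rows[0]) if rows else []
    return {field: [row[field] for row in rows] for field in fields}


def to_rows(columns):
    """Rebuild rows by 'joining' the columns on their shared position."""
    names = list(columns)
    return [dict(zip(names, values)) for values in zip(*columns.values())]


columns = to_columns(rows)

# Each column can now be scanned (or compressed) on its own...
assert columns["age"] == [36, 41, 29]
# ...and the round trip through the columnar layout loses nothing.
assert to_rows(columns) == rows
```

Read this way, each column behaves like a small two-field table of (position, value), which is one way to interpret the "normalization" in the post's title.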