julienledem

Architect, Founder, Angel, Advisor, OSS: OpenLineage Marquez, Apache: Parquet Arrow Iceberg 🐖 
he/him
https://julien.ledem.net on 🦋
@J_ on Twitter
Column Storage for the AI era

(illustration hand generated in 1958) “Column Storage for the AI era” © 2025 by Julien Le Dem is licensed under CC BY-NC-SA 4.0. To view a copy of this license, visit https://creativecommons.org/licenses/by-nc-sa/4.0/ Julien Le Dem Column Storage for the AI era M.C. Escher, Be...

Google Docs

In the past few years, we’ve seen a Cambrian explosion of new columnar formats, challenging the hegemony of Parquet. Presumably, the design of yore is not going to cut it moving forward. I spent some time trying to better understand how things have actually changed.

https://sympathetic.ink/2025/12/11/Column-Storage-for-the-AI-era.html

Column Storage for the AI Era

In the past few years, we’ve seen a Cambrian explosion of new columnar formats, challenging the hegemony of Parquet: Lance, FastLanes, Nimble, Vortex, AnyBlox, F3 (File Format for the Future). The thinking is that the context has changed so much that the design of yore (the previous decade) is not going to cut it moving forward. This seemed a bit intriguing to me, especially since the main contribution of Parquet has been to provide a standard for columnar storage. Parquet is not simply a file format. As an open source project hosted by the ASF, it acts as a consensus-building machine for the industry. Creating six new formats is not going to help with interoperability. I spent some time trying to better understand how things have actually changed and how Parquet needs to adapt to meet the demands of this new era. In this post I’ll discuss my findings.

The Sympathetic Ink Blog
If you follow me on here, you should follow me on there. https://bsky.app/profile/julien.ledem.net
Julien Le Dem (@julien.ledem.net)

Principal Engineer, Founder, Angel, Advisor, OSS. LFAI&data: OpenLineage, Marquez, ASF: Parquet, Arrow, Iceberg, 🐖 he/him. Me: https://julien.ledem.net/ Blog: https://sympathetic.ink

Bluesky Social
[Chill Data Summit SF 2024] Iceberg and the deconstructed database

Iceberg and the Deconstructed Database Julien Le Dem: Principal Engineer at Datadog @J_ The advent of the Open Data Lake

Google Docs
The hardest problem in computer science is neither naming things nor off-by-one errors. It’s talking to other people.
I have the 14-inch MacBook Pro at the new job and my main complaint is there is less room for stickers than on the 16-inch.
It’s happened! The Apache Parquet Java implementation repo is now called parquet-java. Thank you Andrew Lamb for the nudge! This further clarifies that Parquet is used far beyond the Hadoop ecosystem. Maybe whoever created this repo could have thought of this to start with.
New laptop, new stickers.

I am happy to announce that I have started a new position!

The cat is out of the bag but one question lingers… Who let the dogs out?
Woof! Woof!

I am now Principal Engineer at Datadog.
Who could resist the puppy-dog eyes of this data platform?

I wrote a new post on the Sympathetic Ink blog to sum up The Deconstructed Database and what makes it composable.

Learn more about the role of Parquet, Arrow, DataFusion, Iceberg, Calcite, and OpenLineage.

https://sympathetic.ink/2024/04/29/The-Deconstructed-Database.html

The Deconstructed Database

By Julien Le Dem

The Sympathetic Ink Blog