#ApacheHudi 1.0 is now generally available!

The release introduces new features aimed at transforming data lakehouses into what the project community considers a fully-fledged "Data Lakehouse Management System" (DLMS).

Details on #InfoQ 👉 https://bit.ly/3E5AXZi

#AI #DataLake #opensource #DataAnalytics

Apache Hudi 1.0 Now Generally Available

The Apache Software Foundation has recently announced the general availability of Apache Hudi 1.0, the transactional data lake platform with support for near real-time analytics. Initially introduced

InfoQ
When to use Apache Xtable or Delta Lake Uniform for Data Lakehouse Interoperability

The value of the lakehouse model, along with the concept of “shifting left” by moving more data modeling and processing from the data warehouse to the data lake, has seen significant buy-in and…

Data, Analytics & AI with Dremio

More good stuff from Grab - this time they write about how they are building a real-time data lake with tools including #apacheFlink, #apacheHudi, #apacheSpark and #TrinoDB

https://engineering.grab.com/enabling-near-realtime-data-analytics

#dataEngineering #dataArchitectures #openSource

Enabling near real-time data analytics on the data lake

As the data lake landscape matures over the years, it presents opportunities to unlock more business value from the data. This correlates with the increased demand for flexible ad-hoc usage of fresh data. This article explores how we implemented data ingestion in Hudi table formats using Flink to meet this business demand.

Grab Tech

❓❓❓HOW LAKEHOUSE TABLE FORMAT WORKS❓❓❓

1. Engine reads the table format's metadata
2. Builds a list of files containing relevant data, based on that metadata
3. Scans only those files and executes the query
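The three steps above can be sketched in a few lines. This is a toy, in-memory illustration (the metadata entries, file names, and stats are hypothetical); real table formats like Iceberg, Hudi, and Delta persist this metadata in manifest or log files, but the pruning idea is the same:

```python
# 1. Engine reads table format metadata: each entry maps a data file
#    to column statistics the engine can prune on (hypothetical values).
table_metadata = [
    {"file": "part-001.parquet", "min_id": 1,   "max_id": 100},
    {"file": "part-002.parquet", "min_id": 101, "max_id": 200},
    {"file": "part-003.parquet", "min_id": 201, "max_id": 300},
]

def plan_scan(metadata, predicate_min, predicate_max):
    """2. Build the list of files whose stats overlap the query predicate."""
    return [
        entry["file"]
        for entry in metadata
        if entry["max_id"] >= predicate_min and entry["min_id"] <= predicate_max
    ]

# 3. Scan only those files and execute the query (the scan itself is
#    stubbed out here; a real engine would read the Parquet files).
files_to_scan = plan_scan(table_metadata, predicate_min=150, predicate_max=250)
print(files_to_scan)  # only the two overlapping files are read
```

The payoff is that a query touching ids 150-250 never opens part-001.parquet at all.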

#DataEngineering #DataAnalytics #BigData #DataLakehouse #ApacheIceberg #ApacheHudi #DeltaLake

How to Implement Write-Audit-Publish (WAP)

How to implement Write-Audit-Publish (WAP) on Apache Iceberg, Apache Hudi, Delta Lake, Project Nessie, and lakeFS

Git for Data - lakeFS
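For a feel of what WAP means, here is a minimal sketch of the pattern in plain Python. The staging/published dicts and the audit rule are made up for illustration; real implementations use Iceberg branches, Hudi pre-commit validators, or lakeFS branches to get the same isolation:

```python
# Live table state visible to readers, and an isolated staging area.
published = {"orders": [{"id": 1, "amount": 10.0}]}
staging = {}

def write(table, rows):
    """Write: land new data in staging, invisible to readers of the live table."""
    staging[table] = published.get(table, []) + rows

def audit(table):
    """Audit: validate the staged snapshot (hypothetical rule: no negative amounts)."""
    return all(row["amount"] >= 0 for row in staging[table])

def publish(table):
    """Publish: atomically swap the audited snapshot into the live table."""
    if not audit(table):
        raise ValueError(f"audit failed for {table}; staged data not published")
    published[table] = staging.pop(table)

write("orders", [{"id": 2, "amount": 25.5}])
publish("orders")
print(len(published["orders"]))  # readers now see both rows
```

Until publish() runs, readers keep seeing the old snapshot, which is the whole point of the pattern.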

Get a detailed overview of #DeltaLake, #ApacheHudi, and #ApacheIceberg as we discuss their data storage, processing capabilities, and deployment options https://dzone.com/articles/delta-hudi-and-iceberg-the-data-lakehouse-trifecta

#analytics #spark

Delta, Hudi, and Iceberg: The Data Lakehouse Trifecta - DZone

Get a detailed overview of Delta Lake, Apache Hudi, and Apache Iceberg as we discuss their data storage, processing capabilities, and deployment options.

dzone.com
How Infomedia built a serverless data pipeline with change data capture using AWS Glue and Apache Hudi | Amazon Web Services

This is a guest post co-written with Gowtham Dandu from Infomedia. Infomedia Ltd (ASX:IFM) is a leading global provider of DaaS and SaaS solutions that empowers the data-driven automotive ecosystem. Infomedia’s solutions help OEMs, NSCs, dealerships and 3rd party partners manage the vehicle and customer lifecycle. They are used by over 250,000 industry professionals, across […]

Amazon Web Services

This blog from Onehouse about #ApacheHudi is interesting.

My eye was caught by the chart showing which organisations and companies contribute to the #opensource projects. We all know that Databricks dominates Delta Lake contributions. I wonder if the balance on the other two will stay over time or if Onehouse and Tabular (circled) will start to grow.

https://www.onehouse.ai/blog/apache-hudi-vs-delta-lake-vs-apache-iceberg-lakehouse-feature-comparison

Apache Hudi vs Delta Lake vs Apache Iceberg - Lakehouse Feature Comparison

A thorough comparison of Apache Hudi, Delta Lake, and Apache Iceberg across features, community, and performance benchmarks.

My Medium adventure enters a new phase: my first post for a Medium publication, Plumbers of Data Science, just got published :)

It's also more technical than my previous writings. The point is to introduce Apache Hudi more gently than the official documentation currently does. So if you're interested in getting started with Hudi, look no further :)

#apachehudi #apachespark #dataengineering

https://medium.com/plumbersofdatascience/apache-hudi-copy-on-write-explained-563f1d23d34f

Apache Hudi: Copy-on-Write Explained - Plumbers Of Data Science - Medium

You are responsible for handling batch data updates. Your current Apache Spark solution reads in and overwrites the entire table/partition with each update, even for the slightest change. It sucks…

Plumbers Of Data Science
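To see why copy-on-write beats the "overwrite the whole table" approach the preview describes, here is a toy illustration (not Hudi's actual implementation; the table is just a dict of immutable "files" with made-up rows): an update rewrites only the files that contain affected keys and leaves every other file untouched.

```python
# Hypothetical table stored as immutable "files" keyed by name.
files = {
    "f1": [{"id": 1, "v": "a"}, {"id": 2, "v": "b"}],
    "f2": [{"id": 3, "v": "c"}, {"id": 4, "v": "d"}],
}

def copy_on_write_update(table, updates):
    """Rewrite only the files holding an updated key; report which were rewritten."""
    by_id = {row["id"]: row for row in updates}
    rewritten = []
    for name, rows in table.items():
        if any(row["id"] in by_id for row in rows):
            # Copy the file, swapping in updated rows; untouched files stay as-is.
            table[name] = [by_id.get(row["id"], row) for row in rows]
            rewritten.append(name)
    return rewritten

rewritten = copy_on_write_update(files, [{"id": 2, "v": "B"}])
print(rewritten)  # only the file holding id=2 was rewritten
```

With plain Spark overwrites, changing one row of id=2 would rewrite f1 and f2 both; here f2 is never touched.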