Webinair Dataviz et Logiciels Libres

https://peertube.aukfood.net/w/vEjUHGWciWp2iHiD82a2c6

Webinair Dataviz et Logiciels Libres

PeerTube
Stop the "Small File Syndrome" in your Data Lake. Learn how to implement Compaction, Z-Ordering, and automated maintenance in Databricks and Snowflake. https://hackernoon.com/the-silent-killer-of-data-lakes-solving-the-small-file-problem #datalake
The Silent Killer of Data Lakes: Solving the Small File Problem | HackerNoon

Stop the "Small File Syndrome" in your Data Lake. Learn how to implement Compaction, Z-Ordering, and automated maintenance in Databricks and Snowflake.

#Uber’s HiveSync team optimized Hadoop Distcp for multi-petabyte replication across hybrid cloud and on-prem data lakes.

✅ Task parallelization
✅ Uber jobs for small transfers
✅ Improved observability

Result: 5× replication capacity & seamless on-prem-to-cloud migration.

Read more: https://bit.ly/4bwUUFt

#InfoQ #SoftwareArchitecture #DistributedSystems #Observability #DataLake

Most ML issues are not model problems. They are data problems.

I retrained the same churn model twice.
Same code. Same path to the data.
Different result.

Why? Because of mutable data references.

 I wrote a small Data Lake vs Data Lakehouse demo showing why versioned data makes ML debugging reproducible: https://tinyurl.com/lake-vs-lakehouse-medium

 Friend-Link: https://medium.com/towards-artificial-intelligence/from-data-lake-to-data-lakehouse-why-ai-changes-the-rules-for-data-platforms-c78feab48e1c?sk=405811cbc10baa4622bcfcad90736ed4

#ai #machinelearning #data #lakehouse #warehouse #python #datalake #technology #regression

Процедурное SQL-расширение в Lakehouse-платформе – новые возможности для работы с данными

Вас приветствует команда Data Sapience, и в сегодняшней публикации мы расскажем о реализации процедурного расширения для работы с MPP-движками Lakehouse-платформы данных Data Ocean Nova, которое стало доступным для пользователей. В материале пойдет речь о возможностях, применимости и сценариях использования процедурного языка в аналитической платформе данных и примеры реализации решения типовых задач.

https://habr.com/ru/companies/datasapience/articles/987006/

#lakehouse #impala #starrocks #bigdata #dwh #datalakehouse #datalake #bi

Процедурное SQL-расширение в Lakehouse-платформе – новые возможности для работы с данными

Вас приветствует команда Data Sapience, и в сегодняшней публикации мы расскажем о реализации процедурного расширения для работы с MPP-движками Lakehouse-платформы данных Data Ocean Nova, которое стало...

Хабр

Shifting Left delivers clean, reliable, and accessible data to everyone who needs it - right when they need it.

The result? Less complexity, lower overhead, and far less break-fix work, freeing teams to focus on higher-value problems.

At the core of a #ShiftLeft strategy are Data Products. They form the backbone of healthy data communication and ensure quality is built in - not patched on later.

📖 Great insights from this #InfoQ article on rethinking the Medallion Architecture: https://bit.ly/3WHjxsf

#SoftwareArchitecture #DataMesh #DataEngineering #DataLake #DataPipelines

via #Microsoft : Microsoft announces acquisition of Osmos to accelerate autonomous data engineering in Fabric

https://ift.tt/MpyJ38g
#Microsoft #Osmos #DataEngineering #AI #AutonomousAI #MicrosoftFabric #DataAnalytics #DataWorkflows #DataIntegration #BigData #DataLake #OneLak

Microsoft announces acquisition of Osmos to accelerate autonomous data engineering in Fabric - The Official Microsoft Blog

Today, Microsoft is announcing the acquisition of Osmos, an agentic AI data engineering platform designed to help simplify complex and time-consuming data workflows. Microsoft + Osmos: Extending Microsoft Fabric with agentic AI for data engineering Organizations today face a common challenge: data is everywhere, but making it actionable is often manual, slow and expensive. Many...

The Official Microsoft Blog
Data Lake – Wikipedia

Data lakes are typically thought of as simple warehouses. But they don't have to be! 👀 In Graylog 7.0 data lakes function as pressure release valves for #security teams overwhelmed by storage costs, investigation delays, and cloud data sprawl — where analysts can get direct access to long term data, and more.

Our data lake provides inexpensive storage where logs stay searchable, preview-able, and recoverable. Learn more about getting cloud scale without cloud surprises, and why this is a truly practical stance on managing data volume.

https://graylog.org/post/how-to-use-data-lakes-to-reduce-siem-costs-and-strengthen-investigations/ #CyberSecurity #SEIM #DataLake #TDIR