Webinar: Dataviz and Free Software

#Uber’s HiveSync team optimized Hadoop DistCp for multi-petabyte replication across hybrid cloud and on-prem data lakes.
✅ Task parallelization
✅ Uber jobs for small transfers
✅ Improved observability
Result: 5× replication capacity & seamless on-prem-to-cloud migration.
Read more: https://bit.ly/4bwUUFt
#InfoQ #SoftwareArchitecture #DistributedSystems #Observability #DataLake
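The post does not describe HiveSync's implementation. The following is therefore only a hypothetical Python sketch of two of the ideas it names, not Uber's code. Files are balanced across parallel copy tasks, and small transfers run as a single task, in the spirit of Hadoop's "uber" mode, which runs a small MapReduce job's tasks inside the ApplicationMaster's JVM. `plan_copy`, `CopyPlan`, and both small-job thresholds are assumptions made for illustration.

```python
import heapq
from dataclasses import dataclass, field


@dataclass
class CopyPlan:
    mode: str  # "uber": one in-process task; "distributed": parallel copy tasks
    tasks: list = field(default_factory=list)  # each entry: file paths for one task


def plan_copy(files, num_tasks, small_job_bytes, small_job_files):
    """Assign files (path -> size in bytes) to copy tasks (hypothetical planner).

    Small transfers skip parallel scheduling entirely. Larger ones are
    balanced greedily so no single task ends up with most of the bytes.
    """
    total_bytes = sum(files.values())
    if total_bytes <= small_job_bytes and len(files) <= small_job_files:
        return CopyPlan("uber", [sorted(files)])

    # Min-heap of (bytes assigned so far, task index): the lightest task is on top.
    loads = [(0, i) for i in range(num_tasks)]
    heapq.heapify(loads)
    buckets = [[] for _ in range(num_tasks)]
    # Largest file first, each to the currently lightest task (LPT heuristic).
    for path, size in sorted(files.items(), key=lambda kv: (-kv[1], kv[0])):
        load, i = heapq.heappop(loads)
        buckets[i].append(path)
        heapq.heappush(loads, (load + size, i))
    # Drop tasks that received no files.
    return CopyPlan("distributed", [b for b in buckets if b])


if __name__ == "__main__":
    sizes = {"a": 100, "b": 90, "c": 50, "d": 40, "e": 10}
    print(plan_copy(sizes, num_tasks=2, small_job_bytes=100, small_job_files=10))
    # CopyPlan(mode='distributed', tasks=[['a', 'd', 'e'], ['b', 'c']])
```

Greedy "largest first" balancing was chosen here because it is simple and keeps per-task byte counts close. In this example the two tasks carry 150 and 140 bytes.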
What are a Data Lake and a Data Warehouse? Learn the difference between these data repositories
Most ML issues are not model problems. They are data problems.
I retrained the same churn model twice.
Same code. Same path to the data.
Different result.
Why? Because of mutable data references.
I wrote a small Data Lake vs Data Lakehouse demo showing why versioned data makes ML debugging reproducible: https://tinyurl.com/lake-vs-lakehouse-medium
#ai #machinelearning #data #lakehouse #warehouse #python #datalake #technology #regression
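The linked demo is not reproduced here. The following is an independent, minimal Python sketch of the same idea, not the author's code. With a mutable reference, retraining on "the same path" can silently pick up rewritten data. With versioned snapshots, you can pin the exact version a model was trained on. `MutableStore`, `VersionedStore`, `train_churn_model`, and the dataset path are all hypothetical. Real lakehouse table formats such as Delta Lake or Apache Iceberg track table versions in their metadata rather than in an in-memory dict.

```python
import hashlib
import json


class MutableStore:
    """A path always points at whatever was written there last."""

    def __init__(self):
        self._files = {}  # path -> serialized rows

    def write(self, path, rows):
        # Overwrites in place: earlier contents of `path` are lost.
        self._files[path] = json.dumps(rows, sort_keys=True)

    def read(self, path):
        return json.loads(self._files[path])


class VersionedStore:
    """Every write creates an immutable snapshot addressed by a content hash."""

    def __init__(self):
        self._snapshots = {}  # version id -> serialized rows
        self._history = {}    # path -> ordered list of version ids

    def write(self, path, rows):
        payload = json.dumps(rows, sort_keys=True)
        version = hashlib.sha256(payload.encode("utf-8")).hexdigest()[:12]
        self._snapshots[version] = payload
        self._history.setdefault(path, []).append(version)
        return version

    def read(self, path, version=None):
        versions = self._history[path]
        if version is None:
            version = versions[-1]  # unpinned read: latest snapshot
        elif version not in versions:
            raise KeyError(f"{version!r} is not a version of {path!r}")
        return json.loads(self._snapshots[version])


def train_churn_model(rows):
    """Toy stand-in for training: the observed churn rate."""
    return sum(row["churned"] for row in rows) / len(rows)


if __name__ == "__main__":
    path = "lake/churn/customers"  # hypothetical dataset path
    v1 = [{"id": 1, "churned": 1}, {"id": 2, "churned": 0}]
    v2 = v1 + [{"id": 3, "churned": 1}, {"id": 4, "churned": 1}]

    lake = MutableStore()
    lake.write(path, v1)
    first = train_churn_model(lake.read(path))
    lake.write(path, v2)  # an upstream job rewrites the same path
    second = train_churn_model(lake.read(path))
    print("mutable:", first, second)  # same code, same path, different result

    lakehouse = VersionedStore()
    pinned = lakehouse.write(path, v1)
    lakehouse.write(path, v2)
    print("versioned:", train_churn_model(lakehouse.read(path, pinned)))
```

Recording the pinned version id alongside the trained model is what lets a later debugging session reload exactly the data that model saw.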
Procedural SQL extension in a Lakehouse platform: new capabilities for working with data
Greetings from the Data Sapience team. In today's post we describe our implementation of a procedural extension for the MPP engines of the Data Ocean Nova Lakehouse data platform, which is now available to users. The article covers the capabilities, applicability, and usage scenarios of a procedural language in an analytical data platform, along with example solutions to typical tasks.
https://habr.com/ru/companies/datasapience/articles/987006/
#lakehouse #impala #starrocks #bigdata #dwh #datalakehouse #datalake #bi
Shifting Left delivers clean, reliable, and accessible data to everyone who needs it, right when they need it.
The result? Less complexity, lower overhead, and far less break-fix work, freeing teams to focus on higher-value problems.
At the core of a #ShiftLeft strategy are Data Products. They form the backbone of healthy data communication and ensure quality is built in, not patched on later.
📖 Great insights from this #InfoQ article on rethinking the Medallion Architecture: https://bit.ly/3WHjxsf
#SoftwareArchitecture #DataMesh #DataEngineering #DataLake #DataPipelines
via #Microsoft: Microsoft announces acquisition of Osmos to accelerate autonomous data engineering in Fabric
https://ift.tt/MpyJ38g
#Microsoft #Osmos #DataEngineering #AI #AutonomousAI #MicrosoftFabric #DataAnalytics #DataWorkflows #DataIntegration #BigData #DataLake #OneLak…

Today, Microsoft is announcing the acquisition of Osmos, an agentic AI data engineering platform designed to help simplify complex and time-consuming data workflows.
Microsoft + Osmos: Extending Microsoft Fabric with agentic AI for data engineering
Organizations today face a common challenge: data is everywhere, but making it actionable is often manual, slow and expensive. Many...
Data lakes are typically thought of as simple warehouses. But they don't have to be! 👀 In Graylog 7.0, data lakes act as pressure-release valves for #security teams overwhelmed by storage costs, investigation delays, and cloud data sprawl, giving analysts direct access to long-term data and more.
Our data lake provides inexpensive storage where logs stay searchable, previewable, and recoverable. Learn more about getting cloud scale without cloud surprises, and why this is a truly practical stance on managing data volume.
https://graylog.org/post/how-to-use-data-lakes-to-reduce-siem-costs-and-strengthen-investigations/ #CyberSecurity #SIEM #DataLake #TDIR