Mastodawn

Bo...
#APIresponses #BoxLang #BoxLang1140 #bxquery #CFML #ColdFusion #DATAENGINEERING #DATAPIPELINES #datatransformation #databases #Developertools #domainobjects #dynamiclanguage #Java #Javainteroperability #jdbc #JDBCmetadata #JVM #languagefeatures #Lucee #objecthydration #OrtusSolutions #pagination #queries #queryresults #querytransformers #queryExecute #RESTAPIs #resultmapping #SQL #tabulardata
https://foojay.io/today/boxlang-1-14-0-query-transformers-take-full-control-of-your-query-results/

BoxLang 1.14.0: Query Transformers - Take Full Control of Your Query Results - foojay

BoxLang 1.14.0 ships a lot of exciting features - Dynamic Sets, Ranges, Inner Classes, JSONPath navigation - but one quietly powerful addition will - by Cristobal Escobar

foojay

KillBait News Jun 16

Data Pipeline Architecture: Core Layers, Design Patterns, and Best Practices for Modern Data Systems

📰 Original title: What is data pipeline architecture?

🤖 AI: It's not clickbait ✅
👥 Users: It's not clickbait ✅

View full AI summary https://en.killbait.com/data-pipeline-architecture-core-layers-design-patterns-and-best-practices-for-modern-data-systems.html?utm_source=mastodon_world&utm_medium=social&utm_campaign=killbait.mastodon_world

#computing #datapipelines #dataengineering #lakeh...

Data Pipeline Architecture: Core Layers, Design Patterns, and Best Practices for Modern Data Systems

This article explains data pipeline architecture, the framework that defines how data is collected, processed, stored, and delivered to users, applications, and AI systems. Rather than focusing on a single technology, pipeline architecture describes the overall blueprint governing data movement and transformation. The article distinguishes between logical architecture, which defines pipeline stages and functions, and physical architecture, which specifies the technologies used to implement them. Modern pipelines typically consist of four core layers: ingestion, transformation, storage, and serving. Data can be ingested in batches or as real-time streams, transformed through cleaning and enrichment processes, stored in data lakes, warehouses, or lakehouses, and finally delivered to analysts, business users, machine learning models, or operational applications.

The article reviews common architectural patterns including batch, streaming, Lambda, Kappa, and Medallion architectures, highlighting the strengths and trade-offs of each. It also explains the evolution from ETL (Extract, Transform, Load) to ELT (Extract, Load, Transform), noting that modern cloud platforms make it practical to store raw data first and transform it later.

Best practices include separating ingestion from transformation, designing idempotent processes, implementing data quality checks, handling schema changes, using open storage formats, maintaining governance controls, and monitoring pipelines end-to-end. The article emphasizes that selecting an architecture should depend on business requirements such as latency, cost, scalability, and reliability. Databricks presents its own platform approach, which combines ingestion, orchestration, storage, governance, and both batch and streaming processing into a unified lakehouse environment.

KillBait
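
One of the best practices listed in the summary, designing idempotent processes, can be illustrated with a minimal Python sketch. This is not taken from the linked article: the `events` table, the `load_batch` and `snapshot` helpers, and the sample rows are all hypothetical, and SQLite stands in for whatever storage layer a real pipeline would use. The idea is only that a load step keyed on a stable identifier can be replayed without changing the result.

```python
import sqlite3


def load_batch(conn, rows):
    """Write rows keyed by id; replaying the same batch leaves the table unchanged."""
    # Hypothetical table: the primary key is what makes the replay safe.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS events (id TEXT PRIMARY KEY, value INTEGER)"
    )
    # INSERT OR REPLACE overwrites an existing row with the same id
    # instead of appending a duplicate.
    conn.executemany(
        "INSERT OR REPLACE INTO events (id, value) VALUES (?, ?)", rows
    )
    conn.commit()


def snapshot(conn):
    """Return the table contents in a deterministic order."""
    return conn.execute("SELECT id, value FROM events ORDER BY id").fetchall()


conn = sqlite3.connect(":memory:")
batch = [("a", 1), ("b", 2)]  # illustrative data only

load_batch(conn, batch)
first = snapshot(conn)

load_batch(conn, batch)  # simulate a retry of the same batch
second = snapshot(conn)

print(first == second)
```

A plain `INSERT` in the same position would either fail on the second run or, without the primary key, double the row count, which is the failure mode idempotent design is meant to avoid.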

InfoQ Jun 15

#Slack replaced SSH-based execution in Amazon EMR pipelines with Quarry, a REST-driven orchestration layer.

The migration covered 700+ Airflow operators, improving security, reliability & observability while removing direct SSH access to production clusters and enabling server-side job lifecycle management.

🔗 Learn more: https://bit.ly/4eJs0V6

#InfoQ #Observability #Migration #DataPipelines #SoftwareArchitecture

Mahmoud Zalt May 29

Lazy Pipelines, Fast Backends digs into how to keep data pipelines easy to write while still hitting serious performance in the backend.

👉 https://zalt.me/blog/2026/05/lazy-pipelines-fast-backends

#datapipelines #backend #performance

InfoQ May 8

#LinkedIn has launched a unified integrations platform to standardize & reconcile hiring data across systems.

• 72% faster onboarding
• Improved data consistency and completeness
• Scalable AI-driven hiring enabled via standardized schemas, orchestration workflows, and centralized data processing

Learn more: https://bit.ly/48KFwof

#SoftwareArchitecture #EvolutionaryArchitecture #DataPipelines #DataAnalytics #InfoQ

InfoQ May 7

#Confluent introduces a new approach in #ApacheKafka that moves schema IDs from message payloads to record headers.

✅ Simplify schema governance & evolution
✅ Improve compatibility across serialization formats
✅ Reduce coupling between data & metadata in event-driven architectures

Read the deep dive on #InfoQ ⇨ https://bit.ly/4tF7Fot

#ML #EventStreamProcessing #ProtocolBuffers #DataPipelines #DataAnalytics
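
To make the difference concrete, here is a minimal Python sketch contrasting the two placements of a schema ID. The payload-prefix framing (one zero "magic" byte followed by a 4-byte big-endian schema ID) follows the wire format long documented for Confluent serializers. The header-based half is an assumption for illustration only: the header key `example-schema-id` and its 4-byte encoding are hypothetical and are not the names or encoding Confluent actually uses, and no Kafka client is involved.

```python
import struct

MAGIC_BYTE = 0
# Hypothetical header key, chosen for this sketch only.
HEADER_KEY = "example-schema-id"


def encode_prefixed(schema_id, payload):
    """Prefix the payload with a magic byte and a 4-byte big-endian schema ID."""
    return struct.pack(">bI", MAGIC_BYTE, schema_id) + payload


def decode_prefixed(message):
    """Split a prefixed message back into (schema_id, payload)."""
    magic, schema_id = struct.unpack(">bI", message[:5])
    if magic != MAGIC_BYTE:
        raise ValueError("unexpected magic byte")
    return schema_id, message[5:]


def encode_with_header(schema_id, payload):
    """Carry the schema ID in a record header and leave the payload untouched."""
    headers = [(HEADER_KEY, struct.pack(">I", schema_id))]
    return headers, payload


def decode_with_header(headers, payload):
    """Look up the schema ID in the headers; the payload needs no parsing."""
    for key, value in headers:
        if key == HEADER_KEY:
            return struct.unpack(">I", value)[0], payload
    raise KeyError(HEADER_KEY)


prefixed = encode_prefixed(42, b"hi")
headers, body = encode_with_header(42, b"hi")
print(decode_prefixed(prefixed), decode_with_header(headers, body))
```

In the header variant the payload bytes are exactly what the serializer produced, so a consumer that does not care about the schema registry can read them without stripping a prefix first.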

amah_codes Apr 30

🎉 Milestone Unlocked: Finished the Data Engineering Zoomcamp!

In 10 weeks, I moved from scripting to architecting systems. We built real production-grade infrastructure using Spark, Kafka, Airflow, and Kestra—not just hobby projects.

Capstone: A Storage Hard Drive Dashboard using real failure data from Backblaze
Stack: Terraform + Docker infra, Airflow orchestration, dbt modeling, Streamlit viz.

Key Lessons:
✅️ "It works on my laptop" isn't a strategy.
✅ Need IaC, partitioning, clustering, and strict error handling.
✅ dbt ensures reproducible, tested models.
✅ Infra is invisible work—if it breaks, your code fails.

Take the leap! It's challenging, but by week 10 the pieces click into place. Seeing my pipeline run autonomously felt like crossing the finish line. 🏁

Thanks, DataTalks.Club team! On to the next challenge!

My project: https://github.com/ammartin8/hard_drive_analytics_dashboard

#mastodon #fediverse #data #spark #dataengineering #ai #technology #datatools #datapipelines #fedihire #thursday #sql #observability #etl #python #github