Hi all! I started posting my data engineering learning journey and thought it would be great to share it here as well!
🚀 Week 3 of the Data Engineering Zoomcamp by DataTalksClub is complete! I'm really enjoying how hands-on and practical the course has been so far!
This week I focused on data warehousing with #Google #BigQuery. Coming from the world of #Microsoft Azure, it was a great experience to get familiar with BigQuery's serverless architecture and how it manages and processes big data at scale. Here's what I learned:
✅️ Created external tables from GCS bucket data sources (CSV/Parquet)
✅️ Used partitioning/clustering to cut query cost & speed up SQL queries
✅️ Used both #Docker & #Kestra to orchestrate the extraction, transfer, and loading of 20+ million NYC taxi records into a GCS bucket
✅️ Learned the advantages of columnar storage and query optimization
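For anyone curious, here is a minimal sketch of what the external-table and partitioning/clustering steps above might look like, written as small Python helpers that only build the BigQuery DDL strings. The project, dataset, table, and bucket names, as well as the columns `tpep_pickup_datetime` and `VendorID`, are placeholders assumed for illustration and are not necessarily what the linked repo uses.

```python
def external_table_ddl(table: str, uris: list[str], fmt: str = "PARQUET") -> str:
    """Build DDL for a BigQuery external table over files in a GCS bucket."""
    uri_list = ", ".join(f"'{u}'" for u in uris)
    return (
        f"CREATE OR REPLACE EXTERNAL TABLE `{table}`\n"
        f"OPTIONS (format = '{fmt}', uris = [{uri_list}]);"
    )


def partitioned_table_ddl(
    table: str, source: str, partition_col: str, cluster_cols: list[str]
) -> str:
    """Build DDL that materializes `source` as a partitioned, clustered table."""
    return (
        f"CREATE OR REPLACE TABLE `{table}`\n"
        f"PARTITION BY DATE({partition_col})\n"
        f"CLUSTER BY {', '.join(cluster_cols)} AS\n"
        f"SELECT * FROM `{source}`;"
    )


if __name__ == "__main__":
    # All identifiers below are hypothetical placeholders.
    ext = "my_project.my_dataset.yellow_taxi_external"
    print(external_table_ddl(ext, ["gs://my-bucket/yellow/*.parquet"]))
    print(
        partitioned_table_ddl(
            "my_project.my_dataset.yellow_taxi_partitioned",
            ext,
            "tpep_pickup_datetime",
            ["VendorID"],
        )
    )
```

The generated statements can be pasted into the BigQuery console. Queries that filter on the partition column should then scan only the matching partitions, which is where the cost savings come from.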
Check out my work here: https://github.com/ammartin8/data_engineering_zoom_camp/blob/main/modules/module_3/project_03/README.md
#googlecloud #dataengineering #microsoft #cloud #bigdata #dataanalytics #fedihire #linux #data
Looking for a cheaper or free/self-hosted alternative to #zapier.
Anyone got some direction to point me to?
I’ve seen #n8n, #Automatisch, and #kestra, and they all seem nice, but each has varying and very limited integration with other services (the one big area where Zapier wins, but the price is just ridiculous for a private individual).
Spun up an LXC and installed #kestra, an “Open Source, Declarative Orchestration Platform” (https://kestra.io/). The little YouTube influencer videos and Kestra’s own tutorial videos didn’t indicate how many little feature locks 🔐 were littering the application…
Like there aren’t enough orchestration engines… sheesh.
In my new blog article, I show how the open-source orchestration tool Kestra helps make complex processes efficient: YAML-based, flexible, and cloud-ready.
🔧 Ideal for DevOps, data pipelines, or as a replacement for an AI agent.
📘 Read it now: https://www.marcogriep.de/posts/kestra-orchestrierungs-tool-zur-optimierung-von-workflows/
Data pipelines need to be not only robust and scalable, but also easy to manage and maintain. This is exactly where Kestra comes in: an orchestration tool that simplifies creating and managing workflows.