Diving deep into Spark batch processing!⚡️
Learned how to:
✅ Optimize data pipelines with filtering, repartitioning & grouping
✅ Design efficient ETL pipelines with Spark
✅ Understand when and how to use partitioning strategies
✅ Use Google Cloud Storage (GCS) as a data source for Spark applications and configure Spark to read Parquet and other formats from GCS
✅ Visualize execution plans for efficient coding
✅ Review the Spark UI for performance monitoring
💡 Key takeaway: One thing that amazes me about distributed computing is how we've gone from struggling with massive datasets to generating insights in near real-time. As an analyst who has dealt with long processing wait times, Spark saves me so much time — I get results faster and can make data-driven decisions more quickly.
Review my work here: https://github.com/ammartin8/data_engineering_zoom_camp/blob/main/modules/module_6/project_06/README.md
#mastodon #fediverse #data #spark #dataengineering #ai #technology #opensource #datatools #datapipelines #fedihire #wednesday #sql #observability #etl #python