⚡ Module 6 of Data Engineering Zoomcamp done!

- Batch processing with Spark 🔥
- PySpark & DataFrames
- Parquet file optimization
- Spark UI on port 4040

My solution: https://github.com/tnotstar/data-engineering-zoomcamp-2026-06-batch

Free course by @DataTalksClub: https://github.com/DataTalksClub/data-engineering-zoomcamp

#dezoomcamp

Module 5 of Data Engineering Zoomcamp done! 🫡

- Data Platforms with Bruin
- End-to-end ELT pipelines
- Data quality & lineage
- Deployment to BigQuery

Free course by @DataTalksClub: https://github.com/DataTalksClub/data-engineering-zoomcamp/

#dezoomcamp

GitHub - DataTalksClub/data-engineering-zoomcamp: Data Engineering Zoomcamp is a free 9-week course on building production-ready data pipelines. The next cohort starts in January 2026. Join the course here 👇🏼

Just wrapped up Module 4 of the #dezoomcamp with @DataTalksClub!

Dived deep into Analytics Engineering with dbt:
* Built production-ready models for NYC taxi data
* Managed lineage for 43M+ FHV records
* Wrote data tests to catch schema drift

The "T" in ELT is powerful! 🚀

#DataEngineering #dbt #learninginpublic

What a hard day!! Now submitting my first homework for #dezoomcamp by @DataTalksClub 💯

Reviewed another project as part of the project peer review in #DEZoomcamp @DataTalksClub. The project is appreciated, but there is scope for improvement: the documentation could be more detailed and better structured, and the tables could have been partitioned and clustered.

Reviewed Christian Ichebi's project as part of the project peer review in #DEZoomcamp @DataTalksClub. The project is a data pipeline that lands transformed data in an Azure SQL Server data platform and sends email alerts when the full pipeline succeeds. The project is appreciated, but the documentation could be more detailed and better structured. The analytics/dashboard part should also have been built to fulfil the project criteria.

📊 Final product: 3 dashboards in Looker Studio with key insights on SF bike usage in 2023–2024.

From messy CSVs to visual stories — loving this data journey 🚴‍♀️

#dezoomcamp #DataTalksClub #dataengineering

🔍 Project Goals:
• Avg trip time & distance
• Most common bike type
• Most active user type
• Peak ride hours
• Most popular stations

Happy to say: mission accomplished ✅

#dezoomcamp #DataTalksClub #dataengineering

📦 Raw bike trip data and Bay Area county data were loaded into GCS, transformed with dbt, and stored in BigQuery.

Every piece is automated with Kestra flows, and the infrastructure is managed as code with Terraform 💪

#dezoomcamp #DataTalksClub #dataengineering

⚡ E-bikes are on the rise in SF!

This project revealed fascinating insights about how different users move around the city on bikes.

Infrastructure: Terraform
Orchestration: Kestra
Transformations: dbt
Warehouse: BigQuery

#dezoomcamp #DataTalksClub #dataengineering