🚀 Just finished my #DEZoomcamp project! I built an end-to-end pipeline to process population frequencies from the European Variation Archive (EVA).

The Stack:
🛠️ Orchestration: #Bruin (Asset-based & lightweight)
📥 Ingestion: On-the-fly Python filtering
⚡ DWH: #ManticoreSearch for sub-second variant lookups
📊 UI: #Gradio dashboard
🐳 Env: #Docker, Codespaces & Cloud Run

Efficiency > Big Budgets. 🧬

🔗 https://github.com/tnotstar/data-engineering-zoomcamp-2026-project-attempt-1

#DataEngineering #Python #OpenSource #LearningInPublic #DataTalksClub
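
To illustrate the "on-the-fly Python filtering" idea from the stack above, here is a minimal sketch, not the project's actual code. It assumes the input is VCF-style tab-separated text with an `AF` key in the INFO column; the function name `iter_variants`, the `min_af` threshold, and the output field names are all hypothetical.

```python
import io
from typing import Iterable, Iterator


def iter_variants(lines: Iterable[str], min_af: float = 0.01) -> Iterator[dict]:
    """Yield one dict per VCF-style record whose AF is at least min_af.

    Lines are consumed one at a time, so the whole file never has to
    be held in memory. The column layout is an assumption.
    """
    for line in lines:
        # Skip blank lines and header/meta lines.
        if not line.strip() or line.startswith("#"):
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 8:
            continue  # malformed record
        chrom, pos, vid, ref, alt, _qual, _filt, info = cols[:8]
        # INFO is a ';'-separated list of KEY=VALUE pairs (flags have no '=').
        fields = dict(kv.split("=", 1) for kv in info.split(";") if "=" in kv)
        try:
            af = float(fields["AF"])
        except (KeyError, ValueError):
            # No AF, or a multi-valued AF such as "0.1,0.2": skipped in this sketch.
            continue
        if af >= min_af:
            yield {"chrom": chrom, "pos": int(pos), "id": vid,
                   "ref": ref, "alt": alt, "af": af}


if __name__ == "__main__":
    sample = (
        "##fileformat=VCFv4.2\n"
        "1\t100\trs1\tA\tG\t.\t.\tAF=0.25;AN=100\n"
        "1\t200\trs2\tC\tT\t.\t.\tAF=0.001\n"
    )
    for row in iter_variants(io.StringIO(sample)):
        print(row)
```

Because the function is a generator, it can wrap any line iterator (an open file, a decompressed stream, an HTTP response) and pass only the matching records downstream.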

🧪 Phase 7: Integration Testing
The final touch: integration testing for all pipelines! Ensuring smooth functionality between training, deployment, and monitoring 🚦. #MLOpsZoomcamp #DataTalksClub
📊 Phase 6: Model Monitoring
Set up the monitoring pipeline for detecting data and model drift with Evidently 🔎. Grafana dashboards are live, and I’m tracking model performance in real time! #MLOpsZoomcamp #DataTalksClub
🚀 Phase 5: Deploying the Model
Deploying the model using BentoML 🧑‍💻. Serving it as a scalable API in Docker for production. Everything is automated and ready to go live! #MLOpsZoomcamp #DataTalksClub
⚙️ Phase 4: Model Training Pipeline
Finalizing the training pipeline for my optimized model 🏅. It’s now tracked in MLflow and promoted to production! Let’s prepare it for deployment. #MLOpsZoomcamp #DataTalksClub
🛠️ Phase 3: Tech Stack Setup
All set up with ZenML, MLflow, Optuna, and Docker-Compose for this MLOps project 🧰. Now the integration begins for seamless pipeline orchestration and experiment tracking! #MLOpsZoomcamp #DataTalksClub
🔧 Phase 2: Model Training & HPO
Feature engineering and model training in full swing! Using XGBRegressor for bike trip predictions 🚴 and optimizing with Optuna to get the best-performing model. #MLOpsZoomcamp #DataTalksClub
🌍 Phase 1: Data Exploration
Kicking off my MLOps journey by exploring CitiBike and weather data in NYC 🌧️. Performing some EDA and cleaning the dataset to build the foundation for my prediction model of bike trips. #MLOpsZoomcamp #DataTalksClub
Putting everything to the test by evaluating a dataset using hit rate, MRR, and different approaches like Minsearch and Qdrant. 🧪 Can't wait to see how they perform! 📊 #LLMZoomcamp #DataTalksClub
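
For readers new to the two metrics named above, here is a minimal sketch of how hit rate and MRR (mean reciprocal rank) can be computed. It assumes each query's results are already reduced to a ranked list of booleans (True where the expected document appears); the function names are illustrative and not part of Minsearch or Qdrant.

```python
from typing import Sequence


def hit_rate(relevance: Sequence[Sequence[bool]]) -> float:
    """Fraction of queries with the expected document anywhere in the results."""
    return sum(any(row) for row in relevance) / len(relevance)


def mrr(relevance: Sequence[Sequence[bool]]) -> float:
    """Mean of 1/rank of the first relevant result (0 when there is none)."""
    total = 0.0
    for row in relevance:
        for rank, is_hit in enumerate(row, start=1):
            if is_hit:
                total += 1.0 / rank
                break  # only the first relevant result counts
    return total / len(relevance)


if __name__ == "__main__":
    # Three queries, top-3 results each (hypothetical data).
    results = [
        [False, True, False],   # hit at rank 2 -> reciprocal rank 1/2
        [True, False, False],   # hit at rank 1 -> reciprocal rank 1
        [False, False, False],  # no hit        -> reciprocal rank 0
    ]
    print(hit_rate(results))  # 2 of 3 queries have a hit
    print(mrr(results))       # (1/2 + 1 + 0) / 3 = 0.5
```

Hit rate only asks whether the right document was retrieved at all, while MRR also rewards placing it near the top, so two search backends with the same hit rate can still differ on MRR.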
Experimenting with Qdrant to perform efficient vector searches. 🔎 It’s a game-changer for building fast and accurate search pipelines in LLMs. ⚡ Time to compare this approach with others! #LLMZoomcamp #DataTalksClub