Rini M

@rm8989
Built an end-to-end data pipeline using GCP, Airflow, PySpark, and BigQuery to analyze thermal anomaly data (India 🇮🇳 vs USA 🇺🇸).
Uncovered patterns in fire frequency, intensity, and seasonality through interactive dashboards.
#DataEngineering #GCP #Airflow #BigQuery #PySpark

Homework Objective

The original pipelines processed NYC Taxi data for **2019 and 2020**.
The task was to **extend the existing flows to include data for 2021**, specifically:

- **January 2021 – July 2021**
- **Both Yellow and Green Taxi datasets**

Data Engineering Zoomcamp 2026 – Module 2 Homework

The focus of this module is workflow orchestration using **Kestra**, including:
- Building parameterized ETL pipelines
- Using variables and templating
- Scheduling workflows
- Backfilling historical data
- Extending pipelines
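As a small sketch of what the backfill grid for this homework looks like, the helper below enumerates one file per taxi type and month for January–July 2021. The URL pattern is a hypothetical stand-in modeled on the CSV releases used in the course; the actual inputs of the Kestra flow may differ.

```python
from itertools import product

# Hypothetical base URL for the per-month CSV files; adjust to whatever
# source the Kestra flow actually pulls from.
BASE = "https://github.com/DataTalksClub/nyc-tlc-data/releases/download"

def backfill_urls(taxis=("yellow", "green"), year=2021, months=range(1, 8)):
    """Enumerate one file URL per (taxi type, month) - the same grid a
    parameterized Kestra backfill iterates over."""
    return [
        f"{BASE}/{taxi}/{taxi}_tripdata_{year}-{month:02d}.csv.gz"
        for taxi, month in product(taxis, months)
    ]

urls = backfill_urls()
print(len(urls))  # 2 taxi types x 7 months = 14 files
```

In a real Kestra flow the same grid comes from flow inputs plus the schedule's trigger date, so a backfill over the 2021-01 to 2021-07 range produces exactly these executions.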
My homework Git repository: https://t.co/cDC511aeXg
GitHub - rinimondalgit/de-zoomcamp-hw1: Data Engineering Zoomcamp 2026 - Module 1 Homework


I am pursuing the Data Engineering Zoomcamp from DataTalks.Club. Here is what I learned in the first week.

The homework covers:
- Running Docker containers
- Setting up PostgreSQL and pgAdmin with Docker Compose
- Loading NYC Green Taxi data into Postgres
- Writing SQL queries to answer analytical questions
- Basic Terraform workflow concepts
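To illustrate the load-and-query steps, here is a minimal self-contained sketch. It uses an in-memory SQLite database as a stand-in for the Dockerized Postgres instance (the pattern of create table, bulk-insert CSV rows, run an analytical query is the same), and the two sample rows are made up for illustration; the homework itself loads the full Green Taxi CSV into Postgres.

```python
import csv
import io
import sqlite3

# Two made-up rows in the Green Taxi schema, standing in for the real CSV.
SAMPLE = """lpep_pickup_datetime,trip_distance,total_amount
2019-10-01 00:26:02,2.52,14.16
2019-10-01 00:18:11,0.97,7.3
"""

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE green_taxi ("
    "lpep_pickup_datetime TEXT, trip_distance REAL, total_amount REAL)"
)

# Bulk-insert the CSV rows; DictReader yields dicts that match the
# named placeholders below.
reader = csv.DictReader(io.StringIO(SAMPLE))
conn.executemany(
    "INSERT INTO green_taxi VALUES "
    "(:lpep_pickup_datetime, :trip_distance, :total_amount)",
    reader,
)

# An analytical query of the kind the homework asks for:
(avg_dist,) = conn.execute(
    "SELECT AVG(trip_distance) FROM green_taxi"
).fetchone()
print(avg_dist)  # average trip distance over the sample rows
```

Against the real Postgres container you would swap `sqlite3.connect` for a `psycopg2` connection (or `pandas.DataFrame.to_sql` with a SQLAlchemy engine), but the SQL itself carries over unchanged.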

https://www.instagram.com/p/DQicxPhjWzV/?utm_source=ig_web_copy_link

In this chapter of the ML Zoomcamp by DataTalks.Club (led by Alexey Grigorev), we dived into Decision Trees and Ensemble Learning, two core components of supervised machine learning that offer high interpretability and flexibility. The chapter covers decision trees, their structure and splitting methods, as well as ensemble techniques such as bagging, boosting, and stacking to improve model performance. Notable briefings:

**Decision Trees: Core Concepts and Learning**
The course presents decision trees as intuitive, rule-based algorithms that are effective yet prone to overfitting on complex datasets. Key topics:
- Splitting Criteria: Decision trees divide the data by choosing splits that minimize classification error. Measures of "impurity" such as Gini impurity and entropy guide the algorithm toward splits that reduce classification mistakes. Overfitting risks are discussed, particularly with deep trees that learn too much noise from the training data.
- Hyperparameter Tuning: Overfitting is controlled through hyperparameters like max_depth and min_samples_split, which limit the tree's depth or require a minimum number of data points before a split. This keeps the model generalizable.

**Random Forests: Reducing Variance with Bagging**
- Reduced Variance: By training many trees on bootstrapped samples and aggregating their predictions, Random Forests reduce the variance seen in individual decision trees; each tree votes, and the most common prediction is the final output.
- Feature Randomization: Besides sampling the data, each split considers only a random subset of features, which decorrelates the trees and further lowers overfitting risk.
- Hyperparameter Tuning: Important parameters include n_estimators (number of trees) and max_features (maximum features per split). Tuning them balances model performance against computational cost, demonstrated through hands-on Python coding examples.

**Boosting: Correcting Weak Learners**
Boosting techniques improve model accuracy by correcting…
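To make the splitting-criteria discussion concrete, here is a short Python sketch (not from the course materials) of Gini impurity, entropy, and the weighted split impurity a decision tree minimizes when choosing where to split.

```python
import math
from collections import Counter

def gini(labels):
    """Gini impurity: the chance two randomly drawn samples disagree."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def entropy(labels):
    """Shannon entropy in bits: 0 for a pure node, 1 for a 50/50 binary node."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def split_impurity(left, right, criterion=gini):
    """Weighted impurity of a candidate split - the quantity the tree
    minimizes over all candidate split points."""
    n = len(left) + len(right)
    return (len(left) / n) * criterion(left) + (len(right) / n) * criterion(right)

# A pure node has zero impurity; a perfectly mixed node is maximally impure:
print(gini(["a", "a"]))            # 0.0
print(gini(["a", "a", "b", "b"]))  # 0.5
```

scikit-learn's `DecisionTreeClassifier` exposes exactly this choice through its `criterion` parameter ("gini" or "entropy"), alongside the `max_depth` and `min_samples_split` knobs mentioned above.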

Orchestrating ML projects has never been easier. At the #MachineLearningZoomcamp, you’ll learn to integrate Docker and Pipenv into your workflow. Sign up now!
🛠️ Enhance your orchestration skills with Docker and Pipenv at the upcoming #MachineLearningZoomcamp. Build, test, and deploy with confidence!
Want to take your Machine Learning projects to the next level? Join the #MachineLearningZoomcamp and learn how to orchestrate with Docker and Pipenv. We’re waiting for you! 📦📚
📊 Orchestration is key for effective ML. At the #MachineLearningZoomcamp, we’ll explore how Docker and Pipenv help you achieve that. Don’t miss out! 🌟💻