siddharth277/Ensemble-weather-prediction

🌤️ Delhi Weather Predictor — ML Pipeline

End-to-end machine learning pipeline for forecasting Delhi's daily mean temperature using XGBoost, LightGBM, Bidirectional LSTM, ARIMA, SARIMA, and a tuned Ensemble — served via a Streamlit web app.


📁 Project Structure

ai_ml_wheather_prediction-main/
│
├── app/
│   └── main.py                          ← Streamlit web app (all models)
│
├── data/
│   ├── raw/
│   │   ├── Train.csv                    ← Raw training data (2013–2016)
│   │   └── Test.csv                     ← Raw test data (2017)
│   ├── processed/                       ← Auto-generated cleaned CSVs
│   └── predictions/                     ← Per-model prediction CSVs
│
├── models/                              ← Saved model files (generated by notebooks)
│   ├── xgboost_model.pkl
│   ├── lightgbm_model.pkl
│   ├── lstm_model.keras
│   ├── lstm_scaler.pkl                  ← MinMaxScaler fitted on LSTM training data
│   ├── lstm_features.pkl                ← Feature column list for LSTM inference
│   ├── arima_model.pkl
│   ├── sarima_model.pkl
│   ├── ensemble_weights.pkl
│   └── feature_meta.pkl
│
├── notebooks/
│   ├── 02_eda_cleaning.py               ← EDA, outlier removal, clean CSV export
│   ├── 03_feature_engineering.py        ← 40+ lag, rolling, time, interaction features
│   ├── 03_feature_engineering_baseline.py
│   ├── 04_model_train_evaluate.py       ← XGBoost + LightGBM training + SHAP
│   ├── 04_model_train_baseline.py
│   ├── 05_lstm_model.py                 ← BiLSTM + MultiHeadAttention training
│   ├── 05_arima_model.py                ← ARIMA + SARIMA training
│   └── 06_ensemble.py                   ← Combine all predictions, evaluate ensemble
│
├── reports/                             ← Auto-generated plots and figures
│   ├── figure/
│   └── shap_plots/
│
├── src/                                 ← Reserved for utility modules
├── RUN_IN_COLAB.ipynb                   ← ⭐ Open this in Google Colab to train models
├── requirements.txt
├── .gitignore
└── README.md

⚙️ Pipeline Overview

Run the notebooks in order in Google Colab before launching the app.

| Step | File | What it does |
|------|------|--------------|
| 1 | 02_eda_cleaning.py | Load raw CSVs, fix outliers, impute missing values, save cleaned data |
| 2 | 03_feature_engineering.py | Create 40+ lag, rolling, EMA, seasonal, and cross-features |
| 3 | 04_model_train_evaluate.py | Train XGBoost + LightGBM, generate SHAP plots, save models |
| 4 | 05_lstm_model.py | Train Bidirectional LSTM + MultiHeadAttention, save model + scaler + feature list |
| 5 | 05_arima_model.py | Fit ARIMA and SARIMA on temperature series, save models |
| 6 | 06_ensemble.py | Merge all predictions, compute weighted ensemble, evaluate all models |
| 7 | app/main.py | Run the Streamlit app |

🚀 Running the App

pip install -r requirements.txt
streamlit run app/main.py

Make sure all model files are present in models/ before launching.
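
A quick pre-launch check can be sketched as below. The file names come from the project structure above; the helper name `missing_model_files` is hypothetical and is not part of the repository.

```python
from pathlib import Path

# File names as listed under models/ in the project structure.
EXPECTED_FILES = [
    "xgboost_model.pkl",
    "lightgbm_model.pkl",
    "lstm_model.keras",
    "lstm_scaler.pkl",
    "lstm_features.pkl",
    "arima_model.pkl",
    "sarima_model.pkl",
    "ensemble_weights.pkl",
    "feature_meta.pkl",
]


def missing_model_files(models_dir="models"):
    """Return the expected model files that are not present in models_dir."""
    root = Path(models_dir)
    return [name for name in EXPECTED_FILES if not (root / name).is_file()]


if __name__ == "__main__":
    missing = missing_model_files()
    if missing:
        print("Missing model files:", ", ".join(missing))
    else:
        print("All model files found.")
```

Running it from the repository root before `streamlit run app/main.py` shows which notebooks still need to be executed.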


🧠 Model Architecture

XGBoost & LightGBM

  • Trained on 40+ engineered features: lag temperatures, rolling means/std, EMA, heat index, pressure delta, seasonal cyclical encodings
  • Feature importance analysed with SHAP
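
As a rough illustration of the feature families named above, the sketch below builds a few lag, rolling, EMA, and cyclical day-of-year columns with pandas. The lag choices, window lengths, and column names are assumptions made for the example; the actual 40+ features are defined in 03_feature_engineering.py.

```python
import numpy as np
import pandas as pd


def add_basic_features(df, target="meantemp"):
    """Add illustrative lag, rolling, EMA and cyclical features.

    Expects a DataFrame with a DatetimeIndex and a `target` column.
    """
    out = df.copy()

    # Lag features: the target value 1, 2 and 7 days earlier.
    for lag in (1, 2, 7):
        out[f"{target}_lag{lag}"] = out[target].shift(lag)

    # Shift by one day first so rolling statistics only use past values.
    past = out[target].shift(1)
    out[f"{target}_roll7_mean"] = past.rolling(7).mean()
    out[f"{target}_roll7_std"] = past.rolling(7).std()
    out[f"{target}_ema7"] = past.ewm(span=7, adjust=False).mean()

    # Cyclical encoding of the day of year, so late December sits next to early January.
    doy = out.index.dayofyear.to_numpy()
    out["doy_sin"] = np.sin(2 * np.pi * doy / 365.25)
    out["doy_cos"] = np.cos(2 * np.pi * doy / 365.25)

    # Drop the warm-up rows that lack a full history.
    return out.dropna()
```

Shifting before the rolling and EMA calculations keeps the current day's temperature out of its own features, which avoids leaking the target into the inputs.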

LSTM — Bidirectional + Multi-Head Attention

  • Architecture: BiLSTM(256) → BiLSTM(128) + Residual → MultiHeadAttention(4 heads) + LSTM(64) → Dense(128→64→32→1)
  • Input: 30-day sliding window of 40+ features (velocity, momentum, z-score, interaction terms)
  • Trained with Huber loss, Adam optimizer, EarlyStopping (patience=35), ReduceLROnPlateau
  • Important: lstm_scaler.pkl and lstm_features.pkl are saved alongside the model and must be present for correct inference
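
The 30-day sliding-window input can be sketched with NumPy as follows. This is a minimal illustration that assumes the features have already been scaled; the function name `make_windows` is hypothetical and the notebook may build its windows differently.

```python
import numpy as np


def make_windows(features, target, window=30):
    """Turn a (n_days, n_features) array into LSTM-ready windows.

    Returns X with shape (n_samples, window, n_features) and y with shape
    (n_samples,), where y[i] is the target on the day right after window i.
    """
    features = np.asarray(features, dtype=float)
    target = np.asarray(target, dtype=float)
    n_samples = len(features) - window
    if n_samples <= 0:
        raise ValueError("Need more rows than the window length.")

    # Window i covers days i .. i + window - 1.
    X = np.stack([features[i:i + window] for i in range(n_samples)])
    # The label is the target on day i + window.
    y = target[window:]
    return X, y
```

At inference time the same scaler and the same column order used in training have to be applied before windowing, which is why the scaler and feature-list files need to accompany the model.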

ARIMA & SARIMA

  • Statistical time-series models fitted on the training temperature series
  • Used for one-step-ahead forecasting appended to the observed series

Ensemble

  • Tuned manual weights: XGBoost 35% · LightGBM 35% · LSTM 15% · ARIMA 8% · SARIMA 7%
  • Weights auto-renormalise if any model file is missing
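
The weighting and renormalisation described above can be sketched as follows. The weights are the ones listed in this section; the function name and the dictionary-based interface are assumptions for illustration and may differ from what app/main.py does.

```python
# Ensemble weights as stated in this README.
WEIGHTS = {
    "xgboost": 0.35,
    "lightgbm": 0.35,
    "lstm": 0.15,
    "arima": 0.08,
    "sarima": 0.07,
}


def ensemble_predict(predictions, weights=WEIGHTS):
    """Weighted average over the models that actually produced a prediction.

    `predictions` maps model name to a predicted temperature; a missing key
    or a None value means that model is unavailable.
    """
    available = {
        name: w for name, w in weights.items()
        if predictions.get(name) is not None
    }
    total = sum(available.values())
    if total == 0:
        raise ValueError("No model predictions available.")

    # Dividing by the sum of the remaining weights renormalises them to 1.
    return sum(predictions[name] * w / total for name, w in available.items())
```

For example, with only XGBoost and LightGBM available, each ends up with an effective weight of 0.5, so predictions of 20 and 30 °C combine to 25 °C.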


🖥️ App Features

  • Live sliders for today's temperature, humidity, wind speed, and pressure
  • Lag inputs for yesterday's and 2-days-ago temperature
  • Individual prediction cards for all 5 models + ensemble
  • Bar chart comparison across all models
  • Summary table with RMSE and R² reference values
  • Graceful degradation — ensemble still runs if optional models (LSTM/ARIMA/SARIMA) are not loaded

📦 Requirements

See requirements.txt. Key dependencies:

streamlit
pandas
numpy
scikit-learn
xgboost
lightgbm
tensorflow
statsmodels
joblib
matplotlib
shap

📝 Dataset

  • Source: Delhi Climate Data 2013–2017
  • Target: meantemp — daily mean temperature in °C
  • Features: humidity, wind_speed, meanpressure + 40+ engineered features
  • Train: 2013–2016 | Test: 2017 (114 rows)

👤 Authors

Divyansh Prakash | GitHub: @DivyanshPrakashIIT

Siddharth Shukla | GitHub: @Siddharth
