An end-to-end data engineering pipeline on Databricks that leverages the publicly available Rainfall API (NDAP). This project covers:
- Data Ingestion (Bronze)
- Data Processing & Cleaning (Silver)
- Data Quality & Delivery (Gold)
Medallion Layers:
| Layer | Purpose |
|---|---|
| Bronze | Ingest raw data from API into Parquet |
| Silver | Clean, dedupe, enrich; enforce schemas with PySpark |
| Gold | Split the data into Fact & Dimension tables |
Bronze Layer:
- Sources
  - "District-wise Rainfall Distribution" as the API request
  - Source: "https://ndap.niti.gov.in/dataset/7319"
- Storage
  - All raw ingestions stored as Parquet in the `rainfall_data/bronze_layer` container (see the ingestion sketch below)
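A minimal sketch of the Bronze ingestion, assuming the dataset page exposes a JSON endpoint whose response carries records under a `data` key; the endpoint behaviour, the `fetch_rainfall_records` helper, and the response layout are assumptions for illustration, not the confirmed NDAP API contract.

```python
import requests
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Dataset page from the README; the JSON endpoint behind it is an assumption.
API_URL = "https://ndap.niti.gov.in/dataset/7319"
BRONZE_PATH = "rainfall_data/bronze_layer"


def fetch_rainfall_records(url: str) -> list:
    """Call the rainfall API and return the raw records as a list of dicts."""
    response = requests.get(url, timeout=30)
    response.raise_for_status()
    payload = response.json()
    # Assumed response layout: {"data": [...records...]}
    return payload.get("data", [])


# Land the raw records untouched in the bronze Parquet container.
records = fetch_rainfall_records(API_URL)
raw_df = spark.createDataFrame(records)
raw_df.write.mode("append").parquet(BRONZE_PATH)
```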
Silver Layer:
- Transformations
  - Split multi-valued columns (e.g., Daily Actual)
  - Remove duplicates
  - Cast data types for analytics readiness
- Storage
  - Cleaned Parquet files in the `rainfall_data/silver_layer` container (see the cleaning sketch below)
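A minimal sketch of the Silver cleaning step, assuming a `Daily Actual` column delimited by `|`; the column names and the split convention are assumptions based on the bullets above, not the exact source schema.

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.getOrCreate()

bronze_df = spark.read.parquet("rainfall_data/bronze_layer")

silver_df = (
    bronze_df
    # Split the multi-valued "Daily Actual" column (delimiter is an assumption).
    .withColumn("daily_actual", F.split(F.col("Daily Actual"), r"\|").getItem(0))
    # Drop exact duplicate rows.
    .dropDuplicates()
    # Cast the measure to a numeric type for analytics readiness.
    .withColumn("daily_actual", F.col("daily_actual").cast("double"))
)

silver_df.write.mode("overwrite").parquet("rainfall_data/silver_layer")
```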
Gold Layer:
- Transformations
  - Remove unnecessary columns
  - Split the dataset into Fact & Dimension tables
  - Rename columns for ease of use
- Output
  - Aggregated the data into `rainfall_data.rain_fact_table` and `workspace.rainfall_data.state_table` (see the sketch below)
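A minimal sketch of the Gold split, assuming the Silver output carries `state` and `daily_actual` columns; only the two target table names come from this README, the rest is illustrative.

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.getOrCreate()

silver_df = spark.read.parquet("rainfall_data/silver_layer")

# Dimension table: one row per state with a surrogate key.
state_dim = (
    silver_df.select("state").distinct()
    .withColumn("state_id", F.monotonically_increasing_id())
)

# Fact table: measurements keyed by state_id, with friendlier column names.
rain_fact = (
    silver_df
    .join(state_dim, on="state", how="left")
    .drop("state")                                          # remove the now-redundant column
    .withColumnRenamed("daily_actual", "daily_actual_mm")   # rename for ease of use
)

# Persist using the table names from the README.
state_dim.write.mode("overwrite").saveAsTable("workspace.rainfall_data.state_table")
rain_fact.write.mode("overwrite").saveAsTable("rainfall_data.rain_fact_table")
```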
Tech Stack:
| Component | Purpose |
|---|---|
| Databricks | Spark-based ETL & Delta Live Tables |
| Unity Catalog | Governance and managed storage for the layer tables |
| Python / PySpark | Data transformation logic |
Future Improvements:
- Implement concurrency to reduce the time taken by the API calls (see the sketch after this list).
- Move to external storage instead of Unity Catalog managed storage.
- Automate the whole process with a Databricks Jobs pipeline.
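A minimal sketch of the planned concurrency improvement, assuming the API can be paged with a `page` query parameter; both the endpoint behaviour and the parameter name are assumptions for illustration.

```python
from concurrent.futures import ThreadPoolExecutor

import requests

# Dataset page from the README; a paged JSON endpoint is an assumption.
API_URL = "https://ndap.niti.gov.in/dataset/7319"


def fetch_page(page: int) -> list:
    """Fetch one page of rainfall records (assumed 'page' query parameter)."""
    response = requests.get(API_URL, params={"page": page}, timeout=30)
    response.raise_for_status()
    return response.json().get("data", [])


# Issue the page requests in parallel instead of one after another.
with ThreadPoolExecutor(max_workers=8) as pool:
    pages = list(pool.map(fetch_page, range(1, 21)))

records = [record for page in pages for record in page]
```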