A modern data pipeline leveraging dbt, Airflow, and Snowflake to transform TPC-H sample data into analytics-ready models.
This project demonstrates a production-grade data transformation pipeline that:
- Sources TPC-H sample data from Snowflake
- Transforms raw data through staging and intermediate models
- Creates final fact tables for analytics
- Implements data quality tests
- Orchestrates the entire workflow using Airflow and Cosmos
The project follows a layered transformation approach:
- `stg_tpch_orders`: Standardizes orders data
- `stg_tpch_line_items`: Processes line item details
- `int_order_items`: Combines orders with line items
- `int_order_items_summary`: Aggregates order metrics
- `fct_orders`: Final fact table with order details and metrics
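To illustrate the layered approach, the final fact model could join the staging and intermediate layers roughly as follows. This is a hedged sketch, not the project's actual model: the column names (`order_key`, `gross_item_sales_amount`, etc.) and join logic are assumptions; the real definitions live in the repository's model files.

```sql
-- models/marts/fct_orders.sql -- illustrative sketch; actual columns may differ
select
    o.order_key,
    o.customer_key,
    o.order_date,
    o.status_code,
    s.gross_item_sales_amount,
    s.item_discount_amount,
    s.net_item_sales_amount
from {{ ref('stg_tpch_orders') }} as o
left join {{ ref('int_order_items_summary') }} as s
    on o.order_key = s.order_key
```

The `ref()` calls are what let dbt infer the staging → intermediate → mart lineage shown in the pipeline DAG.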
The project repository is organized as follows:
```
/Snowflake-Airflow-Date-Pipeline-Project
│
├── dags/                  # Airflow DAGs
│   └── tpch_dag.py        # Main DAG definition
│
├── dbt/                   # dbt project directory
│   ├── models/            # dbt models
│   │   ├── staging/       # Staging models
│   │   ├── intermediate/  # Intermediate models
│   │   └── marts/         # Mart models
│   └── dbt_project.yml    # dbt project configuration
│
├── docker/                # Docker setup files
│   └── Dockerfile         # Dockerfile for the project
│
├── readme_photos/         # Images for README
│   └── DAG.png            # Pipeline flow image
│
└── README.md              # Project README file
```
This structure ensures a clear separation of concerns, making it easier to manage and scale the project.
To set up the project, follow these steps:
- Clone the repository:

  ```bash
  git clone https://github.com/yourusername/Snowflake-Airflow-Date-Pipeline-Project.git
  cd Snowflake-Airflow-Date-Pipeline-Project
  ```

- Initialize the Astro project:

  ```bash
  astro dev init
  ```

- Start the Airflow environment:

  ```bash
  astro dev start
  ```

- Set up dbt profiles: create a `profiles.yml` file in the `~/.dbt/` directory with your Snowflake credentials.

- Install dbt dependencies:

  ```bash
  astro dev run dbt deps
  ```

- Run dbt seed to load seed data:

  ```bash
  astro dev run dbt seed
  ```

- Run dbt models:

  ```bash
  astro dev run dbt run
  ```
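The `profiles.yml` created during setup might look like the sketch below. The profile name and the placeholder values are assumptions: the profile name must match the `profile:` key in the project's `dbt_project.yml`, and each `<...>` placeholder should be replaced with your own Snowflake details.

```yaml
# ~/.dbt/profiles.yml -- illustrative sketch; replace placeholders with your values
dbt_project:              # assumed profile name; must match dbt_project.yml
  target: dev
  outputs:
    dev:
      type: snowflake
      account: <your_account_identifier>
      user: <your_user>
      password: <your_password>
      role: <your_role>
      database: <your_database>
      warehouse: <your_warehouse>
      schema: <your_schema>
      threads: 4
```

Keeping credentials in `profiles.yml` (outside the repository) is the standard dbt pattern for separating connection details from model code.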
Access the Airflow UI at http://localhost:8080, then:

- Enable the `dbt_dag` DAG
- Trigger the DAG manually or wait for the scheduled run
- Test dbt models:

  ```bash
  dbt test --profiles-dir /usr/local/airflow/include/dbt/
  ```
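The data quality tests run by `dbt test` are typically declared in a `schema.yml` alongside the models. A hedged example using dbt's built-in generic tests (the model and column names here are assumptions based on the models listed above):

```yaml
# models/marts/schema.yml -- illustrative example of dbt generic tests
version: 2

models:
  - name: fct_orders
    columns:
      - name: order_key
        tests:
          - unique
          - not_null
      - name: customer_key
        tests:
          - not_null
```

Each entry compiles to a SQL query that fails the test if any rows violate the constraint.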
