
Snowflake Data Pipeline Project

A modern data pipeline leveraging dbt, Airflow, and Snowflake to transform TPC-H sample data into analytics-ready models.

Pipeline Flow

(Pipeline flow diagram: readme_photos/DAG.png)

Tech Stack

Snowflake · Apache Airflow · dbt · Docker · Python

Project Overview

This project demonstrates a production-grade data transformation pipeline that:

  • Sources TPC-H sample data from Snowflake
  • Transforms raw data through staging and intermediate models
  • Creates final fact tables for analytics
  • Implements data quality tests
  • Orchestrates the entire workflow using Airflow and Cosmos
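The orchestration step relies on Cosmos rendering the dbt project as a DAG of Airflow tasks. A minimal sketch of what a Cosmos-based DAG definition (e.g. dags/tpch_dag.py) could look like follows; the paths, connection ID, profile name, and schedule are illustrative assumptions, not values taken from this repository:

```python
# Hypothetical sketch of a Cosmos DbtDag -- names and paths are assumptions
import os
from datetime import datetime

from cosmos import DbtDag, ProjectConfig, ProfileConfig, ExecutionConfig
from cosmos.profiles import SnowflakeUserPasswordProfileMapping

profile_config = ProfileConfig(
    profile_name="dbt_profile",       # must match the profile in profiles.yml
    target_name="dev",
    profile_mapping=SnowflakeUserPasswordProfileMapping(
        conn_id="snowflake_conn",     # Airflow connection holding Snowflake creds
        profile_args={"database": "dbt_db", "schema": "dbt_schema"},
    ),
)

dbt_dag = DbtDag(
    dag_id="dbt_dag",
    project_config=ProjectConfig("/usr/local/airflow/dbt"),  # dbt project dir
    profile_config=profile_config,
    execution_config=ExecutionConfig(
        dbt_executable_path=f"{os.environ['AIRFLOW_HOME']}/dbt_venv/bin/dbt",
    ),
    schedule_interval="@daily",
    start_date=datetime(2024, 1, 1),
    catchup=False,
)
```

Cosmos turns each dbt model into its own Airflow task (with the model's tests as downstream checks), so model-level retries and visibility come for free in the Airflow UI.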

Data Models

The project follows a layered transformation approach:

Staging Layer

  • stg_tpch_orders: Standardizes orders data
  • stg_tpch_line_items: Processes line item details

Intermediate Layer

  • int_order_items: Combines orders with line items
  • int_order_items_summary: Aggregates order metrics

Mart Layer

  • fct_orders: Final fact table with order details and metrics
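The staging → intermediate → mart flow above can be sketched in plain Python over a few simplified TPC-H-shaped rows (column names here are illustrative; the real transformations live in the dbt SQL models):

```python
# "Staging": raw rows with standardized column names
orders = [
    {"order_key": 1, "order_date": "1995-01-02"},
    {"order_key": 2, "order_date": "1995-01-03"},
]
line_items = [
    {"order_key": 1, "extended_price": 100.0, "discount": 0.10},
    {"order_key": 1, "extended_price": 50.0, "discount": 0.00},
    {"order_key": 2, "extended_price": 200.0, "discount": 0.05},
]

# "Intermediate" (int_order_items): join orders with line items and
# compute the discounted item price
order_items = [
    {
        "order_key": o["order_key"],
        "order_date": o["order_date"],
        "discounted_price": li["extended_price"] * (1 - li["discount"]),
    }
    for o in orders
    for li in line_items
    if li["order_key"] == o["order_key"]
]

# "Mart" (fct_orders): aggregate per-order metrics into fact-table rows
fct_orders = {}
for row in order_items:
    agg = fct_orders.setdefault(
        row["order_key"], {"order_date": row["order_date"], "item_sales": 0.0}
    )
    agg["item_sales"] += row["discounted_price"]

print(fct_orders[1]["item_sales"])  # 140.0  (90.0 + 50.0)
print(fct_orders[2]["item_sales"])  # 190.0
```

Each layer only reads from the layer before it, which is what makes the dbt project easy to test and extend model by model.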

Project Structure

The project repository is organized as follows:

/Snowflake-Airflow-Date-Pipeline-Project
│
├── dags/                   # Airflow DAGs
│   └── tpch_dag.py         # Main DAG definition
│
├── dbt/                    # dbt project directory
│   ├── models/             # dbt models
│   │   ├── staging/        # Staging models
│   │   ├── intermediate/   # Intermediate models
│   │   └── marts/          # Mart models
│   └── dbt_project.yml     # dbt project configuration
│
├── docker/                 # Docker setup files
│   └── Dockerfile          # Dockerfile for the project
│
├── readme_photos/          # Images for README
│   └── DAG.png             # Pipeline flow image
│
└── README.md               # Project README file

This structure ensures a clear separation of concerns, making it easier to manage and scale the project.

Setup

To set up the project, follow these steps:

  1. Clone the repository:

    git clone https://github.com/yourusername/Snowflake-Airflow-Date-Pipeline-Project.git
    cd Snowflake-Airflow-Date-Pipeline-Project
  2. Initialize the Astro project:

    astro dev init
  3. Start the Airflow environment:

    astro dev start
  4. Set up dbt profiles:

    • Create a profiles.yml file in the ~/.dbt/ directory with your Snowflake credentials.
  5. Install dbt dependencies:

    astro dev run dbt deps
  6. Run dbt seed to load seed data:

    astro dev run dbt seed
  7. Run dbt models:

    astro dev run dbt run
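Step 4 above mentions a profiles.yml with Snowflake credentials. A minimal example is sketched below; the profile name and every value are placeholders to replace with your own account details:

```yaml
# ~/.dbt/profiles.yml -- placeholder values, replace with your credentials
dbt_profile:
  target: dev
  outputs:
    dev:
      type: snowflake
      account: your_account_identifier
      user: your_username
      password: your_password
      role: your_role
      database: dbt_db
      warehouse: dbt_wh
      schema: dbt_schema
      threads: 4
```

The profile name (dbt_profile here) must match the profile referenced in dbt_project.yml, or dbt will fail to resolve a connection.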

Usage

Access the Airflow UI at http://localhost:8080

Running the Pipeline

  • Enable the dbt_dag DAG
  • Trigger the DAG manually or wait for the next scheduled run

Local Development

  • Test dbt models:
    dbt test --profiles-dir /usr/local/airflow/include/dbt/

About

Data Engineering tutorial project that uses Snowflake, dbt, and Airflow.
