
End-to-End Data Engineering Projects

Arslan Ali is a skilled Data Engineer with extensive experience in data analysis, engineering, and cleaning, particularly in the banking sector, proficient in Python and Power BI. He has worked on various projects involving end-to-end data pipelines, data lake implementations, and ETL processes using Azure technologies, Apache Spark, and machine learning models. Arslan holds a Bachelor of Science in Computer Engineering and relevant certifications, including Databricks Data Engineer Associate Certification.


Arslan Ali

arslanmushtaq4343@[Link] | +923106119450 | [Link]/in/arslanali434343 | [Link]
Career Objective
Skilled Data Engineer with extensive experience in data analysis, data engineering, and data
cleaning within the banking sector. Proficient in Python and Power BI for data validation and
transformation. Adept at leveraging cloud-based technologies to enhance data-driven
decision-making.
Skills
• Databricks Certified • Apache Spark • ETL • SSIS • Hadoop • Python • SQL • Data Lake
• Data Warehouse • Lakehouse • Airbyte • Airflow • Jenkins • Git • ML Forecasting Model
• Image Recognition Models • Text Recognition Models • SQL Server • Oracle • Power BI
• Microsoft Azure • AWS • Azure Data Lake Storage • Azure Data Factory • Azure Databricks • Azure Data
Lake Analytics • Azure Active Directory • Azure Key Vault

Experience

Techlogix, Lahore
Software Engineer (Data Engineer)

Project: End-to-End Data Pipeline for Financial Data Processing May 2024 – Present

• Designed an end-to-end data pipeline for processing financial data, orchestrated using Azure Data Factory for
task automation.
• Implemented data ingestion and processing using Azure Databricks with Apache Spark, efficiently handling
large-scale financial datasets in a Docker-containerized environment.
• Enforced security measures for data encryption and access control using Azure Key Vault to ensure compliance
with data governance standards.
• Processed and ingested data using Spark and delivered it to Azure Data Lake Storage for distributed storage and
parallel processing.
• Utilized Azure Data Lake Analytics to analyze the processed data, enabling further analysis and reporting.
• Integrated the database with external systems through Azure API Management, allowing front-end platform
services to fetch data in real time for business intelligence and customer interaction.
• Leveraged Azure Kubernetes Service (AKS) for managing Docker containers throughout the pipeline, enhancing
portability, scalability, and efficient resource management.
• Tools and Technologies: Azure Data Factory, Azure Databricks, Azure Key Vault, Azure Data Lake Storage, Azure
Data Lake Analytics, Azure API Management, Azure Kubernetes Service (AKS), Docker.
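The ingest → transform → store flow described above can be sketched in plain Python. The stage names, record layout, and sample data below are illustrative only; in production these stages ran as Azure Data Factory activities invoking Databricks/Spark jobs:

```python
# Illustrative sketch of the ingest -> transform -> store stages of a
# financial-data pipeline. Function names and record shapes are
# hypothetical, not the production Azure code.

def ingest(raw_rows):
    """Parse raw CSV-like rows into typed records."""
    records = []
    for row in raw_rows:
        account, amount = row.split(",")
        records.append({"account": account, "amount": float(amount)})
    return records

def transform(records):
    """Aggregate transaction amounts per account."""
    totals = {}
    for rec in records:
        totals[rec["account"]] = totals.get(rec["account"], 0.0) + rec["amount"]
    return totals

def store(totals, sink):
    """Write aggregated results to a sink (stand-in for Data Lake Storage)."""
    sink.update(totals)
    return sink

raw = ["acc1,100.0", "acc2,50.0", "acc1,-25.0"]
sink = {}
store(transform(ingest(raw)), sink)
print(sink)  # {'acc1': 75.0, 'acc2': 50.0}
```

Keeping each stage a pure function like this is what makes the pipeline easy to orchestrate and unit-test independently.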

Project: Data Lake Implementation for Scalable Data Processing Jan 2024 – Apr 2024

• Developed a comprehensive Azure Data Lake architecture to support scalable data ingestion and processing
workflows for various data sources, including structured and unstructured data.
• Implemented data ingestion pipelines using Azure Data Factory, enabling seamless movement of data from on-
premises and cloud sources into Azure Data Lake Storage.
• Employed Azure Databricks for data processing, leveraging Apache Spark to transform and analyze large
datasets efficiently.
• Utilized Azure Data Lake Analytics to perform analytics directly on data stored in the Data Lake, allowing for
flexible querying and data manipulation.
• Established data governance and security measures by integrating Azure Active Directory and Azure Key Vault
to control access and manage sensitive information.
• Created a series of data visualizations using Power BI, connecting directly to Azure Data Lake Storage to enable
real-time insights and reporting.
• Tools and Technologies: Azure Data Lake Storage, Azure Data Factory, Azure Databricks, Azure Data Lake
Analytics, Azure Active Directory, Azure Key Vault, Power BI.

Project: Spark Optimization and SQL to Spark Migration with Polar POC Oct 2023 – Dec 2023

• Optimized local Spark jobs by tuning executor memory, shuffle partitions, and parallelism settings for
efficient resource utilization.
• Improved Spark memory management using RDD persistence, broadcast variables, and in-memory
caching strategies.
• Configured key Spark parameters such as spark.executor.memory, [Link], and
spark.sql.shuffle.partitions to enhance performance and reduce costs.
• Developed PySpark unit tests using pytest to validate data transformations and logic integrity.
• Conducted a POC comparing Apache Spark with PolarDB, achieving 50x faster query execution on
Polar for specific workloads.
• Led the migration from SQL to PySpark using the DataFrames API and Spark SQL, ensuring optimized
query performance.
• Tools and Technologies: Spark, PolarDB, PySpark, SQL, Unit Testing (pytest), Spark Configuration, RDD
Caching, Memory Management.
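The sizing arithmetic behind that tuning can be sketched as a small helper. The rules of thumb used here (roughly 2–3 tasks per core for shuffle partitions, per-executor memory carved out of node memory minus overhead) are common guidance, and the function names are hypothetical:

```python
# Hypothetical helpers illustrating common Spark sizing rules of thumb,
# not an official API. The actual tuning was done via Spark config keys
# such as spark.sql.shuffle.partitions and spark.executor.memory.

def suggest_shuffle_partitions(total_cores, tasks_per_core=3):
    """spark.sql.shuffle.partitions ~= total cores * tasks per core."""
    return total_cores * tasks_per_core

def suggest_executor_memory_gb(node_memory_gb, executors_per_node,
                               overhead_fraction=0.10):
    """spark.executor.memory: split node memory, reserving OS/overhead."""
    usable = node_memory_gb * (1 - overhead_fraction)
    return usable / executors_per_node

# Example: 4 worker nodes x 16 cores, 64 GB per node, 2 executors per node.
print(suggest_shuffle_partitions(4 * 16))           # 192
print(round(suggest_executor_memory_gb(64, 2), 1))  # 28.8
```

Starting from values like these and then profiling actual shuffle sizes is usually cheaper than guessing per-job.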

Project: Data Management and ETL for Blue Cross Blue Shield (BCBS) Jun 2023 – Sep 2023

• Managed petabyte-scale data for Blue Cross Blue Shield (BCBS), utilizing Databricks for advanced data
processing and analytics.
• Used Stonebranch for workflow orchestration and AWS S3 for secure data storage.
• Designed and implemented a robust ETL pipeline following the Medallion architecture.
• Employed optimization techniques, including Delta operations, to reduce DBU costs.
• Created comprehensive documentation to ensure clarity and ease of understanding for stakeholders
and team members.
• Tools and Technologies: Databricks, Stonebranch, AWS S3, Medallion architecture, Delta operations.
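The Medallion layering (bronze → silver → gold) can be illustrated with pandas as a lightweight stand-in for the Delta tables used in Databricks; the claim records and column names below are made up for the sketch:

```python
import pandas as pd

# Illustrative medallion flow: bronze (raw as landed) -> silver (cleaned,
# deduplicated) -> gold (business aggregate). In the actual project these
# layers were Delta tables in Databricks; pandas only stands in here.

# Bronze: raw claims, including a duplicate and a row with a missing member.
bronze = pd.DataFrame({
    "claim_id": [1, 2, 2, 3],
    "member":   ["a", "b", "b", None],
    "amount":   [100.0, 250.0, 250.0, 75.0],
})

# Silver: deduplicate on the business key and drop rows failing basic checks.
silver = bronze.drop_duplicates(subset="claim_id").dropna(subset=["member"])

# Gold: business-level aggregate consumed by reporting.
gold = silver.groupby("member", as_index=False)["amount"].sum()
print(gold)
```

The same separation is what lets each layer be reprocessed independently when upstream data is corrected.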

Project: Data Ingestion and Integration for End-to-End ELT Pipeline Jan 2023 – May 2023

• Led integration of heterogeneous data sources, including Oracle DB, CSV, and Excel (~58 GB, 77 million
rows), using SQL Server Integration Services (SSIS) to perform ETL operations.
• Utilized Python (Pandas, NumPy) for data cleaning and validation, rectifying discrepancies in banking
data. Jupyter Notebooks were used for validation workflows and exploratory data analysis, along with
SQL-based checks for regulatory compliance.
• Implemented data segmentation through Python scripts, integrated into the ETL process for real-time
processing.
• Established validation checkpoints within the ETL pipeline to ensure data quality pre- and post-
transformation. Performed complex data transformations using aggregations, CTEs, and window
functions for balance calculations.
• Automated validation scripts in Python to ensure ETL execution integrity, performing before-and-after
comparisons for data accuracy. Streamlined the validation process with automated quality checks.
• Tools and Technologies: SSIS, Python, Pandas, NumPy, SQL, Jupyter Notebooks, Oracle DB, CSV, Excel.

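A validation checkpoint plus a running-balance calculation (the pandas analogue of a SQL window function) can be sketched as follows; column names, sample rows, and the checkpoint rule are illustrative:

```python
import pandas as pd

# Illustrative pre/post-transformation checkpoint and a running balance
# per account, the pandas equivalent of
# SUM(amount) OVER (PARTITION BY account ORDER BY row) in SQL.
# Sample data and column names are hypothetical.

txns = pd.DataFrame({
    "account": ["A", "A", "B", "A", "B"],
    "amount":  [100.0, -40.0, 200.0, 10.0, -50.0],
})

rows_before = len(txns)
clean = txns.dropna(subset=["account", "amount"])

# Checkpoint: cleaning must not silently drop rows on this sample.
assert len(clean) == rows_before, "row count changed during cleaning"

clean = clean.assign(balance=clean.groupby("account")["amount"].cumsum())
print(clean["balance"].tolist())  # [100.0, 60.0, 200.0, 70.0, 150.0]
```

Wiring checks like the row-count assertion into the pipeline itself is what makes before-and-after comparisons automatic rather than manual.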
Project: Data Modeling for BOP CC and BOP RF for Customer Analysis Sep 2022 – Dec 2022

• Created multiple dashboards for customer analysis using DAX and SQL.
• Modeled measures and applied binning and sorting using DAX.
• Worked with Excel and CSV files as data sources.
• Used PowerPoint for presentations and analysis summaries.
• Tools and Technologies: DAX, SQL, Excel, CSV, PowerPoint.

Project: Feature Risk May 2022 – Aug 2022

• Applied machine learning techniques to develop and enhance forecasting models.
• Developed forecasting models including XGBoost, ARIMA, and Facebook Prophet.
• Focused on optimization for balanced predictions.
• Tools and Technologies: Machine Learning, Python, SQL, Jupyter Notebook.
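The evaluation loop around such models can be sketched in pure Python. The project used XGBoost, ARIMA, and Prophet; here a naive last-value forecast stands in for those models so the sketch stays dependency-free, and the series values are made up:

```python
# Illustrative forecast-evaluation sketch. A naive last-value forecast
# stands in for the real models (XGBoost, ARIMA, Prophet); the metric
# loop is the same regardless of which model produces the predictions.

def naive_forecast(history, horizon):
    """Forecast every future step as the last observed value."""
    return [history[-1]] * horizon

def mape(actual, predicted):
    """Mean absolute percentage error, a common forecast-balance metric."""
    errors = [abs(a - p) / abs(a) for a, p in zip(actual, predicted)]
    return 100.0 * sum(errors) / len(errors)

series = [120.0, 130.0, 125.0, 140.0, 150.0, 160.0]
train, test = series[:4], series[4:]
preds = naive_forecast(train, horizon=len(test))
print(round(mape(test, preds), 2))  # 9.58
```

Benchmarking every candidate model against a naive baseline like this is a standard sanity check before trusting a more complex forecaster.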

Broadstone (Python Developer), Lahore June 2021 – Dec 2021


• Developed and maintained databases using Oracle Database.
• Created and managed APIs for seamless data integration and extraction.
• Automated data validation and cleaning processes using Python, enhancing data quality for financial
reporting.

Education
Bachelor of Science in Computer Engineering Sep 2018 – June 2022
University of Engineering and Technology, Lahore

Major Courses: Database Systems, Data Mining

Certifications
Databricks Data Engineer Associate Certification Sep 2018 – June 2022

Apache Airflow Fundamentals Jul 2024 – Jul 2026


The Apache Airflow Fundamentals certification demonstrates the core skills
needed to create, manage, and monitor DAGs effectively in Apache Airflow.
