AVIRAL BHARDWAJ
♦ +91-9752227743 ♦ [email protected] ♦ GitHub: https://github.com/aviral-bhardwaj
♦ LinkedIn: https://www.linkedin.com/in/aviralb/
PROFESSIONAL SUMMARY
Experienced Data Engineer with 6 years in Big Data engineering and an open-source contributor to
projects such as AWS, MarkitDown by Microsoft, and Unity Catalog by the Linux Foundation AI &
Data (previously Databricks); specializes in Databricks products; certified as Databricks Data
Engineer Professional, Spark Associate, ML Associate, Data Analyst Associate, and Data Engineer
Associate; proficient in AWS, PySpark, Python, and SQL.
CERTIFICATIONS
Databricks Certified Associate Developer for Apache Spark 3.0
Databricks Certified Data Engineer Associate
Databricks Certified Data Engineer Professional
Databricks Certified Generative AI Engineer Associate
Databricks Certified Data Analyst Associate
Databricks Certified Machine Learning Associate
SKILLS
Git version control
ETL development
Big data processing
Python programming
Spark development
SQL programming
Databricks
PROFESSIONAL EXPERIENCE
Senior Data Engineer, 07/2024 - Current
Coforge Limited
Project 6 - Senior AWS Databricks Data Engineer
• Implemented Databricks products including Unity Catalog, Service Principals, and Cluster Policies
• Developed data pipelines to deliver cost-usage notifications to key stakeholders
• Collaborated with the Fivetran team to integrate audit logs into S3 storage
• Successfully migrated Databricks jobs across dev, test, and production workspaces, ensuring
seamless deployment
• Collaborating with a major HR client (Trient) to implement Databricks solutions
• Developing and optimizing Databricks AI/BI dashboards for enhanced data visualization and insights
• Leveraging the Databricks ecosystem to streamline big data processing and analytics
Senior Data Engineer, 01/2024 - 07/2024
Gulf Marketing Group International
Project 5 - GMG Retail OMS - Senior AWS Databricks Data Engineer
• Successfully completed the GMG Retail project for Order Management System (OMS) within the given
timelines.
• Utilized the Fluent Commerce API to gather data within OMS, including sales, customers, delivery, and
return orders.
• Created data pipelines to accumulate and process data in Unity Catalog across bronze, silver, and gold
layers.
• Validated information from the business and created gold views in Unity Catalog according to the
requirements.
• Utilized PySpark and Python to write APIs and code, using Postman to view and validate API responses.
• Developed historical and incremental data pipelines to load data into the OMS data lake.
Project 4 - GMG HealthCare O9 - Senior AWS Databricks Data Engineer
• Pioneered the Data Engineering division at GMG's Technology Center in Gurgaon, hiring and leading
a dynamic 5-member team.
• Utilized Databricks for efficient data migration from SAP HANA DW to the Data Lake, showcasing
technical acumen and strategic data management.
• Leveraged core AWS services including S3, EC2, AWS Glue, and IAM Roles to build a robust and
scalable data infrastructure, enhancing system reliability and security.
• Advanced Databricks utilization, mastering Unity Catalog, Delta Live Tables, and Spark APIs to craft
sophisticated data processing solutions with incremental and historical loads.
Senior Data Engineer, 09/2019 - 01/2024
Knowledge Lens
Project 3 - Coca-Cola - Azure Senior Databricks Data Engineer
• Collaborated with the Coca-Cola team to transition their on-premises Oracle database to Azure cloud
storage
• Developed Azure Data Factory (ADF) pipelines with linked services to transfer data from Oracle to
Azure
• Leveraged batch processing within ADF to facilitate data flow and created a Databricks workflow that
used PySpark to transform the data according to business requirements
• Managed the Databricks flow to store data in Azure Blob Storage as part of the data ingestion process
Project 2 - Amgen Inc (Contract of ZS Associates) - Databricks & AWS Senior Data Engineer
• Collaborated with Amgen Inc. to develop data pipelines for various medical streams such as oncology,
respiratory, COVID-19, and other medicinal data
• Loaded data from vendors in batches into AWS S3 storage and built ETL pipelines in AWS Glue and
Databricks using PySpark and Python
• Used SQL for data validation
• Implemented open-source Airflow on AWS EC2 machines to orchestrate data pipelines
• Extensively used Databricks to implement features such as E2 Workspace migration, Unity Catalog,
Databricks Secrets, Databricks Jobs, Databricks Workflows, Databricks SQL Warehouses, and Delta
Lake functionalities
• Optimized Databricks configurations for peak performance and cost efficiency, while ensuring data
quality through Delta Lake implementations
• Conducted performance tuning of PySpark and Databricks to enhance pipeline flow efficiency
• Developed an AWS Lambda function to automate EC2 machines, enabling cost savings by shutting
down instances every Friday and starting them up each Monday morning
• Engineered IAM roles and JSON policies for Databricks and AWS to enhance system security and
strengthen data protection measures
Project 1 - AstraZeneca Inc (Contract of ZS Associates) - AWS and Databricks Data Engineer
• Collaborated with the AstraZeneca client, utilizing AWS services to execute data loads
• Leveraged various AWS services, including AWS EC2 for Airflow orchestration, IAM roles for
access control, AWS S3 for data storage, AWS Athena for querying data in S3 after data processing,
AWS EMR for data processing, and AWS CloudWatch and CloudTrail for user tracking and log
monitoring
• Conducted data validation using Excel
EDUCATION
Post-Graduate Diploma: Big Data Analytics, 09/2019
CDAC - Pune
Engineering: Electronics and Telecommunication, 01/2016
Rustam Ji Institute of Technology - Gwalior