ARIMA GUPTA
[email protected]
+91-9021162660 / 8009986224
228, Sankata Devi, Lakhimpur Kheri, Uttar Pradesh, India - 262701
https://www.linkedin.com/in/arima-gupta/

SUMMARY
Data Engineer with around five years of experience in the Finance, Manufacturing & Analytics, Automotive and Healthcare domains.
Google Cloud Certified – Professional Data Engineer, Professional Cloud Architect, Associate Cloud Engineer; AWS Public Cloud Associate-Professional

PROFESSIONAL EXPERIENCE
BIG DATA TSE (Google Cloud, Bangalore, IN / April 2022 – Present)
Client: GCP Premium Customers
Project Goal: To deliver the best support experience to our customers by fixing their production data pipeline issues with product expertise, efficiency and accuracy.
Roles & Responsibilities
Troubleshoot users’ issues, identify the cause and collaborate with the PSO team, escalation managers and product SWEs/SREs to fix them.
Develop supportability tools, write knowledge base articles and update internal playbooks to provide efficient support and reduce the customer contact rate.
Analyze daily health check reports of ongoing & resolved APAC Dataproc issues.
Tech Stack: Cloud Dataproc (Specialized), BigQuery, PubSub, Datafusion, Dataflow
DATA ENGINEER CONSULTANT (KPMG, Bangalore, IN / May 2021 – April 2022)
Client: Capri Global Corporate Limited
Project Goal: To deliver a reporting suite for the business to track KPIs in their operations by designing, developing and deploying an enterprise-wide data lake.
Roles & Responsibilities
Worked on the project from scratch: requirement gathering, documentation and design.
Data volume: 90 TB of historical load and 5 GB of daily incremental load.
Created an end-to-end pipeline from disparate sources such as RDS Oracle DB (1200+ master and transaction tables), views from SQL Server, tables from SAP and encrypted XML files to the Redshift data warehouse.
Tech stack: AWS Glue, Dynamo DB, Lambda, CloudWatch, Redshift, Pyspark
Client: Adani Ports and SEZ (Jul 2021 – Sep 2021)
Project Goal: To perform analytics on business units - ports, logistics and dry cargo.
Roles & Responsibilities
Translated business requirements into pipelines from different sources such as Oracle DB, flat files, Mercury DB and API endpoints to PostgreSQL in RDS.
Enhanced & optimized existing pipelines to reduce turnaround time.
Developed, scheduled & automated near-real-time, daily and monthly jobs.
Tech stack: Python, Pandas, Lambda, EC2, SNS, CloudWatch, S3, RDS, PostgreSQL
Client: Maruti Suzuki Limited (May 2021 - Jul 2021)
Project Goal: To combine historical observations, current forecasts and statistical weather forecasts into a single dataset via API calls for the data scientist team.
Roles & Responsibilities
Worked with Visual Crossing APIs to fetch hourly and daily CSV/JSON data for 1700+ cities.
Developed an automated pipeline to transform the data & write it to the curated layer in S3.
Tech stack: Pyspark, Pandas, Step Function, EMR, CloudWatch, Jupyter Notebook
SENIOR SYSTEMS ENGINEER (Infosys Ltd., Pune, IN / Jul 2018 – May 2021)
Client: Semiconductor Manufacturing Company
Project Goal: To set up a near-real-time data lake to analyze the manufactured chips.
Roles & Responsibilities
Worked as a developer and on documentation: runbook manual, flowchart design, code guidelines checklist and technical design document.
Worked on challenges such as performance tuning (80 GB of data daily), rollback strategies, stored procedure fixes, duplicate records and Structured Streaming issues.
Tech Stack: Pyspark, EMR, Snowflake, S3, Step Function, Structured Streaming, SNS

PROFESSIONAL SKILLS
Languages: C, Python, SQL
Big Data Technologies: Spark, Pyspark, Hadoop
Database/Datawarehouse: MySQL, Oracle, Snowflake
GCP Services: Dataproc, BigQuery, PubSub, Datafusion, Dataflow, GCS
Amazon Web Services: S3, Lambda, EMR, CloudWatch, SNS, Step function, AWS Glue

EDUCATION
Bachelor of Technology – Computer Science (85.6%)
PSIT College of Engineering, Uttar Pradesh (IN), 208020
2014 - 2018
Higher Secondary (90.6%)
St. Don Bosco College, Uttar Pradesh (IN), 262701
2012 - 2014
Secondary School (93.5%)
St. Don Bosco College, Uttar Pradesh (IN), 262701
2010 - 2012