Sankalp Dekate

Data Engineer
Email: [email protected]
Phone: +1 678-798-8956 Ext: 104

SUMMARY

 9+ years of professional experience in Analysis, Design, Development, and Implementation as a Data Engineer and Data Analyst.
 Strong experience in data analysis and data mining with large sets of structured and unstructured data, including Data Acquisition, Data Validation, Predictive Modeling, Statistical Modeling, and Data Visualization.
 Expertise in designing, developing, and optimizing cloud-native data pipelines on AWS.
 Experience in working with ETL, Big Data, Python/Scala, Relational Database Management
Systems (RDBMS), and enterprise-level cloud-based computing and applications.
 Skilled in ETL/ELT workflows, big data processing, and data warehousing using services like
Glue, Redshift, S3, EMR, Lambda, and Athena.
 Experience in developing enterprise-level solutions using batch processing and streaming frameworks like Spark Streaming, Apache Kafka, and Apache Flink.
 Designed and implemented scalable enterprise monitoring systems by applying continuous integration; performed maintenance and troubleshooting of enterprise Red Hat OpenShift systems.
 Worked to continuously improve the speed, efficiency, and scalability of OpenShift systems.
 Experience in analyzing data using Python, SQL, Microsoft Excel, Hive, PySpark, Spark SQL for
Data Mining, Data Cleansing and Machine Learning.
 Experienced in AWS Cloud infrastructure services like EC2, VPC, S3, SNS, SQS, IAM, RDS, CloudWatch, CloudFront, Elastic Load Balancers, and CloudTrail.
 Hands-on experience working with Amazon Web Services (AWS), using Elastic MapReduce (EMR), Redshift, and EC2 for data processing.
 Hands-on experience with unified data analytics in Databricks, including the Databricks Workspace user interface and managing Databricks notebooks with Python and Spark SQL.
 Experience in developing Spark applications using Spark SQL in Databricks for data extraction, transformation, and aggregation from multiple file formats, analyzing and transforming the data to uncover insights into customer usage patterns.
 Experience querying data in a Cassandra cluster using CQL (Cassandra Query Language).
 Designed and developed real-time streaming data pipelines using Apache Flink to process high-
volume event streams from Kafka.
 Experience working with Snowflake multi-cluster and virtual warehouses.
 Expertise in creating Spark Applications using Python (PySpark) and Scala.
 Good understanding of data modeling (Dimensional & Relational) concepts like Star-Schema
Modeling, Snowflake Schema Modeling, Fact and Dimension tables.
 Exceptional skills in SQL server reporting services, analysis services, Tableau, and data
visualization tools.
 Set up data in AWS using S3 buckets and configured instance backups to S3.
 Expert in working with the Hive data warehouse tool: creating tables, distributing data by implementing partitioning and bucketing, and writing and optimizing Hive SQL queries (a minimal sketch follows this list).
 Experience with the MapReduce programming model, Pig, and the installation and configuration of Hadoop, HBase, ETL, Sqoop, and Flume using Unix commands.
 Responsible for building scalable distributed data solutions in both batch and streaming mode on BigQuery using Kafka, Spark, and Core Java.
 Substantial experience working with big data infrastructure tools such as Python and Redshift; also proficient in Scala, Spark, and Spark Streaming.
 Migrated legacy Spark Streaming jobs to Apache Flink for better fault tolerance and scalability.
 Strong analytical, presentation, communication, and problem-solving skills, with the ability to work independently as well as in a team and to follow the best practices and principles defined for the team.
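
A minimal PySpark sketch of the partitioning and bucketing pattern referenced above. The table name, column names, and S3 path are hypothetical placeholders, not taken from any specific project.

from pyspark.sql import SparkSession

# Hypothetical example: write a Hive-managed table partitioned by date and
# bucketed on customer_id so joins and point lookups prune efficiently.
spark = (SparkSession.builder
         .appName("hive-partition-bucket-sketch")
         .enableHiveSupport()
         .getOrCreate())

df = spark.read.parquet("s3://example-bucket/raw/sales/")  # placeholder path

(df.write
   .mode("overwrite")
   .partitionBy("order_date")       # one directory per order_date value
   .bucketBy(16, "customer_id")     # 16 buckets hashed on customer_id
   .sortBy("customer_id")
   .saveAsTable("sales_bucketed"))  # placeholder table name

# Partition-pruned query: only the matching order_date partition is scanned.
spark.sql("""
    SELECT customer_id, SUM(amount) AS total
    FROM sales_bucketed
    WHERE order_date = '2024-01-01'
    GROUP BY customer_id
""").show()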

TECHNICAL SKILLS:

Databases: MySQL, NoSQL, SQL Server, MongoDB, SQL*Loader, Cassandra, PostgreSQL, Oracle
BI/Reporting Tools: Tableau, Power BI
Programming/Scripting Languages: SQL, T-SQL, Python, Java, C++, Spark (PySpark, Scala), Shell, JSON
Big Data Ecosystem: HDFS, Hive, HBase, MapReduce, Kafka, Sqoop
Cloud Computing Tools: Amazon AWS (EMR, EC2, S3, RDS, Redshift, Glue, Elasticsearch, Kinesis), Microsoft Azure (Data Lake, Data Storage, Databricks, Azure Data Factory, Machine Learning, data pipelines, data analytics), Snowflake, SnowSQL
Web Technologies: HTML, XML, JSON, CSS, jQuery, JavaScript

PROFESSIONAL EXPERIENCE

Client: Alcon Mar 2024 – Present


Role: Lead Data Engineer
Responsibilities:
 Managed BI and Data Warehousing teams and designed the units to scale accordingly.
 Defined the processes needed to achieve operational excellence in all areas, including project management and system reliability.
 Designed and implemented serverless ETL pipelines using AWS Glue and Lambda to process 2TB+ of daily data from multiple sources (a minimal Glue sketch follows this list).
 Built cross-functional relationships with data scientists, PMs, and software engineers to understand data needs.
 Drove the design, building, and launching of new data models and data pipelines in production.
 Responsible for 100% of all data quality across product verticals and related business areas.
 Worked on EMR cluster to run PySpark Jobs for Data Ingestion.
 Developed a data pipeline using Airflow and Python to ingest current and historical data into the data staging area.
 Involved in the development of real-time streaming applications using PySpark, Apache Flink, Kafka, and Hive on a distributed Hadoop cluster.
 Built data lake on S3 with Glue catalog integration and enabled cross-account data sharing using
Lake Formation.
 Responsible for implementing monitoring solutions in Terraform, Docker, and Jenkins.
 Designed and implemented a test environment on AWS.
 Designed and developed Flink pipelines to consume streaming data from Kafka and applied business logic to messages to transform and serialize raw data.
 Collaborated with Data Scientists to deliver feature-rich datasets for ML model training.
 Integrated Java with databases (SQL and NoSQL) for data retrieval and storage.
 Developed and maintained Python scripts and libraries for data extraction, transformation, and
loading (ETL) processes.
 Developed common Flink module for serializing and deserializing AVRO data by applying schema.
 Used OpenShift to create new projects and services for load balancing, added them to Routes for external access, and troubleshot pods through SSH and logs.
 Created E1 Test and E2 Test environments for Unit, Sanity, Functional, and Performance Testing, and replicated all apps in OpenShift.
 Created Flink DataStream APIs for real-time ETL and downstream analytics.
 Developed Spark streaming code in Databricks for dynamic data reception, preprocessing,
enrichment, transformation, and validation.
 Performed post-implementation troubleshooting of new applications and application upgrades.
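
A minimal sketch of the serverless AWS Glue ETL pattern referenced in the Glue/Lambda bullet above. The Glue job boilerplate follows the standard PySpark job structure; the database, table, column, and S3 bucket names are hypothetical placeholders.

import sys
from awsglue.utils import getResolvedOptions
from awsglue.context import GlueContext
from awsglue.job import Job
from pyspark.context import SparkContext

# Standard Glue job setup; JOB_NAME is supplied by the Glue service at run time.
args = getResolvedOptions(sys.argv, ["JOB_NAME"])
glue_context = GlueContext(SparkContext())
job = Job(glue_context)
job.init(args["JOB_NAME"], args)

# Read raw data registered in the Glue Data Catalog (hypothetical names).
raw = glue_context.create_dynamic_frame.from_catalog(
    database="raw_db", table_name="daily_events")

# Example transformation: drop rows missing key fields before curating.
cleaned = raw.toDF().dropna(subset=["event_id", "event_ts"])

# Land curated, partitioned Parquet on S3 for Athena/Redshift Spectrum queries.
(cleaned.write
    .mode("append")
    .partitionBy("event_date")
    .parquet("s3://example-curated-bucket/events/"))  # placeholder bucket

job.commit()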

Client: Johnson Controls Inc. Jan 2022 – Feb 2024


Role: Sr. Data Engineer
Responsibilities:
 Worked on EMR cluster to run PySpark Jobs for Data Ingestion.
 Developed a data pipeline using Airflow and Python to ingest current and historical data into the data staging area (a minimal Airflow sketch follows this list).
 Wrote PySpark scripts for data ingestion into AWS Redshift tables.
 Developed Airflow DAGs for data ingestion from source systems to the data warehouse.
 Developed PySpark code to convert .csv files to Parquet.
 Developed frameworks involving PySpark code to bring data from different source systems into AWS S3 and to move data from the AWS S3 staging area to the curated area.
 Implemented a RESTful web service to interact with the Redis cache framework.
 Converted Talend Joblets to support the Snowflake functionality.
 Built different visualizations and reports in Tableau using Snowflake data.
 Recreated existing SQL Server objects in Snowflake.
 Implemented a one-time data migration of multi-state-level data from SQL Server to Snowflake using Python and SnowSQL.
 Extensively worked on Spark Streaming and Apache Kafka to fetch live stream data.
 Responsible for applying machine-learning techniques (regression/classification) to predict outcomes.
 Converted Scripts from Oracle to Teradata.
 Experience in writing queries in SQL and R to extract, transform and load (ETL) data from large
datasets using Data Staging.
 Implemented CI/CD pipelines using Jenkins to build and deploy the applications.
 Completed a highly immersive Data Science program involving Data Manipulation & Visualization, Machine Learning, Python programming, SQL, Git, Unix Commands, NoSQL, MongoDB, and Hadoop.
 Converted SQL code to Spark code using Scala, PySpark, and Spark SQL for faster testing and processing of data.
 Good knowledge in setting up batch intervals, split intervals, and window intervals in Spark
Streaming.
 Expertise in building end-to-end ETL solutions using different source systems like SQL, Oracle, and Filers; performed different types of transformations using Python modules and functions; and loaded the data into the global data warehouse (Hadoop and Teradata) and data marts.
 Developed Informatica mappings, sessions, and workflows to load transformed data into EDW from various source systems such as SQL Server, Teradata, and Flat Files.
 Developed, configured, and monitored Apache Hadoop, HDFS, and SQL databases, administering and implementing performance tuning of functions and database queries on distributed systems.
 Modernized the data analytics environment by using a cloud-based Hadoop platform and the Git version control system.
 Performed post-implementation troubleshooting of new applications and application upgrades.
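
A minimal Airflow sketch of the ingestion DAG pattern referenced in the Airflow bullet above. The DAG id, task names, and target systems are hypothetical placeholders; the callables only stand in for the real ingest and load logic.

from datetime import datetime
from airflow import DAG
from airflow.operators.python import PythonOperator

def ingest_to_staging(**context):
    # Placeholder: pull the source extract and land it in the S3 staging area.
    print("ingesting batch for", context["ds"])

def load_to_redshift(**context):
    # Placeholder: run the PySpark/COPY step that loads staged data into Redshift.
    print("loading batch for", context["ds"])

with DAG(
    dag_id="daily_ingestion",          # hypothetical DAG id
    start_date=datetime(2022, 1, 1),
    schedule_interval="@daily",
    catchup=False,
) as dag:
    ingest = PythonOperator(task_id="ingest_to_staging", python_callable=ingest_to_staging)
    load = PythonOperator(task_id="load_to_redshift", python_callable=load_to_redshift)
    ingest >> load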

Environment: Spark SQL, Python, Scala, Tableau, AWS, NoSQL, R, ETL, MongoDB, Hadoop, Docker, Jenkins, GitHub, MapReduce, Snowflake, Teradata.

Client: Travellers Insurance Nov 2020 – Dec 2021


Role: Data Engineer
Responsibilities:
 Involved in designing and deploying multi-tier applications using AWS services (EC2, S3, RDS, DynamoDB, SNS, SQS, IAM), focusing on high availability, fault tolerance, and auto-scaling via AWS CloudFormation.
 Using Spark, performed various transformations and actions; the resulting data was saved back to HDFS and from there loaded into the target database, Snowflake.
 Migrated an existing on-premises application to AWS; used AWS services like EC2 and S3 for processing and storage of small data sets, and maintained the Hadoop cluster on AWS EMR.
 Good command of CQL to run queries on the data present in the Cassandra cluster (a minimal sketch follows this list).
 Worked in building ETL pipeline for data ingestion, data transformation, and data validation on
cloud service AWS, working along with data steward under data compliance.
 Worked on scheduling all jobs using Airflow scripts written in Python, adding different tasks to DAGs, and AWS Lambda.
 Implemented a CI/CD pipeline using Jenkins and Airflow for containers on Docker and Kubernetes.
 Experience in moving high- and low-volume data objects from Teradata and Hadoop to Snowflake.
 Improved performance of tables through load testing using the Cassandra stress tool.
 Used Data Build Tool (dbt) for transformations in the ETL process, along with AWS Lambda and AWS SQS.
 Experience in developing Spark applications using Spark SQL in Databricks for data extraction, transformation, and aggregation from multiple file formats, analyzing and transforming the data to uncover insights into customer usage patterns.
 Responsible for implementing monitoring solutions in Terraform, Docker, and Jenkins.
 Designed and implemented a test environment on AWS.
 Designed AWS CloudFormation templates to create VPCs and subnets to ensure successful deployment of web applications and database templates.
 Responsible for estimating the cluster size and monitoring and troubleshooting the Spark Databricks cluster.
 Implemented Apache Spark code to read multiple tables from the real-time records and filter the data based on the requirement.
 Troubleshot and resolved complex production issues while providing data analysis and data validation.
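
A minimal sketch of running CQL against a Cassandra cluster from Python, as referenced in the CQL bullet above. It assumes the DataStax cassandra-driver package; the contact points, keyspace, table, and column names are hypothetical placeholders.

from cassandra.cluster import Cluster

# Hypothetical contact points and keyspace; in practice these come from config.
cluster = Cluster(["10.0.0.11", "10.0.0.12"])
session = cluster.connect("claims_ks")

# Parameterized CQL against the partition key avoids full-cluster scans.
rows = session.execute(
    "SELECT claim_id, status, updated_at FROM claims WHERE policy_id = %s",
    ("POL-12345",),
)
for row in rows:
    print(row.claim_id, row.status, row.updated_at)

cluster.shutdown()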

Environment: SQL Server, Hadoop, ETL operations, Data Warehousing, Data Modeling, Teradata, Cassandra, Snowflake, AWS Cloud computing architecture, EC2, S3, Python, Spark, Scala, Spark SQL.

Client: Cloudspace, Charlotte, NC Aug 2018 – May 2020


Role: Data Engineer
Responsibilities:
 Extracted, transformed, and loaded data from source systems to Azure Data Storage services using Azure Data Factory, T-SQL, Spark SQL, and U-SQL (Azure Data Lake Analytics).
 Primarily involved in data migration planning using SQL, SQL Azure, Azure Storage, Azure Data
Factory, SSIS, and PowerShell.
 Designed, developed, and deployed data pipelines using Azure services, including HDInsight, Data
Lake, Blob Storage, Data Factory, Synapse, Cosmos DB, Data Warehouse, Key Vault, SQL DB, and
DevOps.
 Set up a Spark cluster to process over 2 TB of data, implementing various Spark jobs for data transformations and actions.
 Developed a fault-tolerant Kafka producer using NiFi and Spark Streaming for data ingestion from a RESTful web service into an RDBMS.
 Designed custom PySpark functions (UDFs) for specialized data transformations and operations (a minimal sketch follows this list).
 Integrated Java with databases (SQL and NoSQL) for data retrieval and storage.
 Developed and maintained Python scripts and libraries for data extraction, transformation, and
loading (ETL) processes.
 Implemented data cleaning and preprocessing routines using Python to ensure data quality and
consistency.
 Developed Spark streaming code in Databricks for dynamic data reception, preprocessing,
enrichment, transformation, and validation.
 Implemented data synchronization processes and ETL pipelines using Spring Boot.
 Achieved a 20% improvement in data loading and transformation efficiency within Snowflake,
including data ingestion and SQL-based data transformations.
 Conducted in-depth data analysis of production issues using SQL, Python, and Tableau, identifying
root causes, and providing recommendations for resolution.
 Conducted testing of data pipelines to ensure accuracy, completeness, and data quality.
 Packaged the code into Azure DevOps Git Repo.
 Migrated the code from DEV to QA and PROD environments. Deployed the code using Azure CI/CD
pipelines and configured builds using GitHub Actions.
 Maintained code by executing it using Azure Pipelines. Ensured data readiness for downstream systems, including ML model building and dynamic Power BI visualization and dashboard creation.
 Provided ongoing support for data processing pipelines and resolved production issues.
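
A minimal sketch of the custom PySpark UDF pattern referenced above. The column name, cleanup rule, and storage paths are hypothetical placeholders.

from pyspark.sql import SparkSession
from pyspark.sql import functions as F
from pyspark.sql.types import StringType

spark = SparkSession.builder.appName("udf-sketch").getOrCreate()

@F.udf(returnType=StringType())
def normalize_device_id(raw_id):
    # Hypothetical cleanup rule: strip whitespace and upper-case the identifier.
    return raw_id.strip().upper() if raw_id else None

# Placeholder paths; in the Azure setup these would be ADLS/Blob mount locations.
df = spark.read.parquet("/mnt/raw/devices/")
df = df.withColumn("device_id", normalize_device_id(F.col("device_id")))
df.write.mode("overwrite").parquet("/mnt/curated/devices/")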

IBM India Pvt Ltd, Pune, India July 2016 – July 2018
Role: Data Engineer
Responsibilities:
 Developed tools using Python to automate some of the menial tasks; interfaced with supervisors, artists, systems administrators, and production to ensure production deadlines were met.
 Used Python and Django for graphics creation, XML processing, data exchange, and business logic implementation.
 Utilized PyUnit, the Python unit test framework, for all Python applications (a minimal sketch follows this list).
 Used the Amazon EC2 command-line interface along with Bash/Python to automate repetitive work.
 Used Python-based GUI components for the front-end functionality such as selection criteria.
 Designed and managed API system deployment using a fast HTTP server and Confidential AWS architecture.
 Worked on Amazon Web Services (AWS) Cloud services such as EC2, EBS, S3, VPC, CloudWatch, and Elastic Load Balancer.
 Set up databases in AWS using RDS and configured backups to the S3 bucket.
 Worked on Ad hoc queries, Indexing, Replication, Load balancing, and Aggregation in
MongoDB.
 Helped the big data analytics team with the implementation of Python scripts for Sqoop, Spark, and Hadoop batch data streaming.
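
A minimal PyUnit (unittest) sketch of the test structure referenced above; the helper function under test is a hypothetical placeholder.

import unittest

def parse_frame_range(spec):
    # Hypothetical helper: turn a spec like "10-12" into an inclusive frame list.
    start, end = (int(part) for part in spec.split("-"))
    return list(range(start, end + 1))

class ParseFrameRangeTest(unittest.TestCase):
    def test_simple_range(self):
        self.assertEqual(parse_frame_range("10-12"), [10, 11, 12])

    def test_single_frame(self):
        self.assertEqual(parse_frame_range("7-7"), [7])

if __name__ == "__main__":
    unittest.main()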

Environment: Python, SQL Server, AWS Cloud services, Web services, GUI, MongoDB, Django, PySpark.

Omegasoft Technologies, Pune, India Dec 2015 – June 2016


Role: Jr. Data Analyst
Responsibilities:

 Created new reporting tools in Python that tracked website usage statistics in multiple countries, replacing the existing system with 87% greater efficiency.
 Designed and initiated automated tests for the application using the Python testing framework pytest and the Selenium WebDriver API for browser automation, which increased test coverage (a minimal sketch follows this list).
 Developed processes for data mapping, compliance, and statement validation by mapping merchants in a new backend software system.
 Tracked the site traffic and extracted monthly traffic reports.
 Created a data warehouse using SQL Server and Excel to store, organize, and analyze large amounts of data.
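
A minimal pytest/Selenium sketch of the browser-automation tests referenced above; the URL, expected title, and use of a local ChromeDriver are hypothetical assumptions.

import pytest
from selenium import webdriver

@pytest.fixture
def browser():
    # Assumes a local ChromeDriver is available on PATH.
    driver = webdriver.Chrome()
    yield driver
    driver.quit()

def test_homepage_loads(browser):
    browser.get("https://example.com")   # placeholder URL
    assert "Example" in browser.title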
