
Kiran

GCP/Hive/Python/Spark/Informatica/Sqoop and Tableau


Data Engineer

Professional Summary
 Innovative and experienced developer with 14+ years of experience in Data Warehouse projects.
 Strong experience with major components of the Big Data ecosystem, including Hadoop, Hive, Sqoop, Spark with Python, Google Dataproc, Google BigQuery, and other GCP components.
 Experience in data management and implementation of Big Data applications using Airflow; developed custom operators by extending existing Airflow operators.
 Certified GCP Professional Data Engineer and GCP Associate Cloud Engineer.
 Developed data pipelines and migrated existing legacy pipelines to GCP Composer
(Airflow).
 Migrated SAS programs to Airflow & BigQuery.
 Experience in spinning up ephemeral GCP Dataproc clusters for PySpark jobs.
 Configured and managed IAM policies at the resource level using Deployment Manager.
 Deployed GCP components (Cloud Functions, BigQuery datasets, Dataproc) using GCP Deployment Manager.
 Worked on building a data lake, extracting data from traditional databases into the HDFS environment and performing data transformations using Hive.
 Experience in importing and exporting data using Sqoop from RDBMS to HDFS and
vice-versa.
 In-depth understanding of Hadoop Architecture including YARN and various
components such as HDFS, Resource Manager, Node Manager, Name Node, Data Node.
 Developed Spark Streaming applications using Kafka (see the sketch after this summary).
 Experienced in developing visualization reports using Tableau.
 Handled different data formats such as Avro, Parquet, and ORC.
 Experienced in Design, Development, Testing and Maintenance of various Data
Warehousing and Business Intelligence (BI) applications in complex business environments
using Informatica.
 Well versed in Conceptual, Logical/Physical, Relational, and Multi-dimensional
modeling, Data analysis and Data Transformation (ETL).
 Extensively worked on ETL mappings, analysis, and documentation of OLAP report requirements. Solid understanding of OLAP concepts and challenges, especially with large data sets.
 Implemented complex business rules by developing robust mappings/mapplets
using various transformations like Unconnected and Connected lookups, Normalizer,
Source Qualifier, Router, Filter, Expression, Aggregator, Joiner, Update Strategy etc.
 Proficient in developing Entity-Relationship diagrams, Star/Snowflake Schema
Designs, and expert in modeling Transactional Databases and Data Warehouse.
 Excellent technical, logical, code debugging and problem-solving capabilities.
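
For illustration, below is a minimal PySpark Structured Streaming sketch of the Kafka-based streaming pattern referenced in this summary. The broker address, topic name, event schema, and output paths are assumed placeholders rather than details from any specific project, and the job assumes the spark-sql-kafka connector is available.

from pyspark.sql import SparkSession
from pyspark.sql.functions import col, from_json
from pyspark.sql.types import DoubleType, StringType, StructType

# Minimal sketch: consume a Kafka topic with Structured Streaming and persist
# the parsed events as Parquet. All names and paths are hypothetical.
spark = SparkSession.builder.appName("sales-events-stream").getOrCreate()

event_schema = (StructType()
                .add("order_id", StringType())
                .add("store_id", StringType())
                .add("amount", DoubleType()))

raw = (spark.readStream
       .format("kafka")
       .option("kafka.bootstrap.servers", "broker-1:9092")  # assumed broker
       .option("subscribe", "sales_events")                  # assumed topic
       .option("startingOffsets", "latest")
       .load())

events = (raw
          .select(from_json(col("value").cast("string"), event_schema).alias("e"))
          .select("e.*"))

query = (events.writeStream
         .format("parquet")
         .option("path", "/data/streams/sales_events")                 # assumed output path
         .option("checkpointLocation", "/data/checkpoints/sales_events")
         .outputMode("append")
         .start())

query.awaitTermination()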

Technical Skills Summary


 Big Data: GCP BigQuery, Cloud Functions, Spark (PySpark),
Airflow, Hive, Sqoop.
 Programming: Python, Shell scripting, PL/SQL.
 RDBMS: Oracle 10g & 11g, Teradata.
 ETL: Informatica, DataStage.
 Visualization Tools: Tableau.

WORK EXPERIENCE
08/2020 – Present
Data Engineer

Client: Verizon
November 2021 to Present
PROJECT DESCRIPTION:
Developing data pipelines to process sales, offers, and customer data to build analytical tables.

Responsibilities:
 Worked closely with Product Owners (POs) on requirements, Jira stories, and source-to-target mappings (STMs); analyzed STMs against the source databases.
 Developed custom Airflow operators by extending existing Airflow operators to perform validations and execute multiple BigQuery SQL statements with a single operator (see the sketch after this list).
 Created Airflow operators (in Python) to fetch data via API calls into GCS buckets and then load the data into BigQuery.
 Developed common reusable python functions to utilize in Airflow DAGs.
 Developing Airflow (GCP Composer) jobs to process BigQuery data.
 Migrating Teradata Stored Procedures to Airflow.
 Migrated oozie workflows to GCP Airflow.
 Migrated Hive scripts to BigQuery.
 Implemented BigQuery policy tags to restrict access to PII fields.
 Processed privacy data and built analytical tables for privacy teams.
 Reviewed and approved code changes developed by peers.
 Prepared design documents and documented all work and changes.
 Prepared scripts to automatically validate data and identify data quality issues.
 Developed Airflow DAGs to get the data from APIs to GCS (Google Cloud Storage)
buckets.
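
Below is a minimal sketch of the custom-operator pattern described in this list: an operator that runs several BigQuery SQL statements and an optional validation query in a single task. The class name, parameters, and validation rule are hypothetical illustrations, not the project's actual code.

from airflow.models.baseoperator import BaseOperator
from google.cloud import bigquery


class MultiBigQuerySqlOperator(BaseOperator):
    """Sketch: run a list of BigQuery SQL statements, then an optional validation query."""

    def __init__(self, sql_statements, validation_sql=None, project_id=None, **kwargs):
        super().__init__(**kwargs)
        self.sql_statements = sql_statements
        self.validation_sql = validation_sql
        self.project_id = project_id

    def execute(self, context):
        client = bigquery.Client(project=self.project_id)
        for sql in self.sql_statements:
            self.log.info("Running BigQuery SQL: %s", sql)
            client.query(sql).result()  # block until the query job finishes
        if self.validation_sql:
            rows = list(client.query(self.validation_sql).result())
            if not rows:
                raise ValueError("Validation query returned no rows")
            self.log.info("Validation passed with %d row(s)", len(rows))

In a DAG, such an operator would be given a task_id, the list of SQL statements, and an optional validation query, replacing several chained BigQuery tasks with one.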

Environment: Google Cloud Platform (GCP) BigQuery, Airflow, GitHub, Python.

Client: BestBuy
August 2020 to November 2021
PROJECT DESCRIPTION:
Developing data pipelines to ingest data using API calls and process the data to build the feature
store for AI/ML needs.
Responsibilities:
 Developed PySpark jobs to process sales and offers data and prepare tables for analyzing campaigns and their impact.
 Developed Cloud Functions (Python) to validate feature-definition files and then push them into GCS buckets (see the sketch after this list).
 Worked closely with Data Scientists to build the feature store.
 Created Airflow operators (in Python) to fetch data using API calls, load it into GCS buckets, and then into BigQuery.
 Developed logging for Airflow DAG executions using Cloud Functions and Pub/Sub topics.
 Migrated SAS programs to Airflow DAGs and BigQuery.
 Prepared deployment scripts to deploy and manage GCP components using GCP Deployment Manager.
 Built Tableau reports per the business team's requirements.
 Developed Airflow DAGs to get the data from APIs to GCS (Google Cloud Storage)
buckets.
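
Below is a minimal sketch of the file-validation Cloud Function described in this list: a background function triggered by an upload to a staging bucket that checks a feature-definition CSV and copies valid files to a curated bucket. The bucket names, required columns, and trigger shape are assumptions for illustration.

import csv
import io

from google.cloud import storage

REQUIRED_COLUMNS = {"feature_name", "data_type", "source_table"}  # assumed schema
CURATED_BUCKET = "feature-store-curated"                          # assumed bucket


def validate_feature_file(event, context):
    """Sketch of a GCS-triggered background Cloud Function (1st-gen event signature)."""
    client = storage.Client()
    src_bucket = client.bucket(event["bucket"])
    blob = src_bucket.blob(event["name"])

    # Read only the header row of the uploaded CSV and check the required columns.
    header = next(csv.reader(io.StringIO(blob.download_as_text())), [])
    missing = REQUIRED_COLUMNS - {h.strip() for h in header}
    if missing:
        # Leave the bad file in staging; the error surfaces in the function logs.
        raise ValueError(f"Feature definition {event['name']} missing columns: {missing}")

    # Copy the validated file into the curated bucket for downstream pipelines.
    src_bucket.copy_blob(blob, client.bucket(CURATED_BUCKET), new_name=event["name"])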

Environment: Google Cloud Platform (GCP) BigQuery, Airflow, GitHub, Python.

WIPRO TECHNOLOGIES 11/2012 – 07/2020


Data Engineer

Client: Kohl’s, CA, USA.


February 2018 to July 2020
PROJECT DESCRIPTION:
The scope of this project is to process sales and customer data to analyze customer behavior (Customer 360), offer impacts (loyalty programs), and to generate data for campaigns.
Responsibilities:

 Drove end-to-end data analytics solutions, including gathering business requirements, implementing solutions, developing reports, and deploying and monitoring data pipelines.
 Analyzed and processed customer data to provide a Customer 360 view for the marketing team using Hive.
 Processed transaction data sourced from the Mosaic (e-com) and SalesHub systems to generate analytical sales reports using Spark and Python.
 Processed the demand, verified, and offers data to analyze the various loyalty programs using Spark.
 Implemented an incremental model for multiple databases to ingest data from GCP MySQL sources into GCS buckets using Sqoop.
 Developed applications that spin up ephemeral Google Dataproc clusters for running PySpark jobs (see the sketch after this list).
 Generated Tableau reports from the aggregated data to provide visual representations.
 Migrated the aggregated data from Hive to BigQuery for faster query responses for the business teams.
 Performed testing and led the QA team.
 Analyzed and fixed code to optimize flows, avoiding long processing times and wasted cluster resources.
 Analyzed YARN to understand the major bottlenecks.
 Developed POCs on Airflow and then migrated the Data Pipelines to Airflow.
 Collaborated with the infrastructure, network, database, application, and BI teams
to ensure data quality and availability.
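
Below is a minimal sketch of the ephemeral-cluster pattern referenced in this list: create a short-lived Dataproc cluster, submit one PySpark job, and delete the cluster afterwards. The project ID, region, machine types, and job URI are placeholder assumptions.

from google.cloud import dataproc_v1

PROJECT = "my-project"                                  # assumed project
REGION = "us-central1"                                  # assumed region
CLUSTER_NAME = "ephemeral-pyspark"
PYSPARK_URI = "gs://my-bucket/jobs/process_sales.py"    # assumed job file

endpoint = {"api_endpoint": f"{REGION}-dataproc.googleapis.com:443"}
cluster_client = dataproc_v1.ClusterControllerClient(client_options=endpoint)
job_client = dataproc_v1.JobControllerClient(client_options=endpoint)

cluster = {
    "project_id": PROJECT,
    "cluster_name": CLUSTER_NAME,
    "config": {
        "master_config": {"num_instances": 1, "machine_type_uri": "n1-standard-4"},
        "worker_config": {"num_instances": 2, "machine_type_uri": "n1-standard-4"},
    },
}

# Create the cluster and wait until it is ready.
cluster_client.create_cluster(
    request={"project_id": PROJECT, "region": REGION, "cluster": cluster}
).result()

try:
    # Submit the PySpark job and wait for it to complete.
    job = {
        "placement": {"cluster_name": CLUSTER_NAME},
        "pyspark_job": {"main_python_file_uri": PYSPARK_URI},
    }
    job_client.submit_job_as_operation(
        request={"project_id": PROJECT, "region": REGION, "job": job}
    ).result()
finally:
    # Always tear the cluster down so it stays ephemeral.
    cluster_client.delete_cluster(
        request={"project_id": PROJECT, "region": REGION, "cluster_name": CLUSTER_NAME}
    ).result()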

Environment: Google Cloud Platform (GCP) BigQuery, HDFS, Hive, Sqoop, Bitbucket, Jenkins, Linux shell, Teradata, Python, Agile and Scrum model.

Client: Walmart, AR, USA.


January 2016 to January 2018
PROJECT DESCRIPTION:
The scope of this project is to move data from RDBMS to a Hadoop cluster and convert existing ETL jobs to Hive scripts and Spark applications for data analysis.
Responsibilities:
 Used Sqoop to import data from RDBMS (Informix, Oracle, and Teradata) into HDFS and later analyzed the data using various Hadoop components; also automated the steps to import data from the various databases.
 Extensively worked on creating Hive external and internal tables and then applied
HiveQL to aggregate the data.
 Migrated ETL jobs to Hive scripts for transformations, joins, aggregations.
 Implemented partitioning, dynamic partitions, and buckets in Hive to increase performance and organize data logically.
 Involved in converting Hive/SQL queries into Spark transformations using Spark SQL (see the sketch after this list).
 Handled importing of data from various data sources, performed data control checks
using Spark and loaded data into HDFS.
 Developed basic reports in Tableau.
 Worked with datasets in Hive, creating, loading, and saving datasets using different dataset operations.
 Worked with JIRA and Git.
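
Below is a minimal sketch of the Hive-to-Spark conversion mentioned in this list: the kind of aggregation a HiveQL script would run, expressed through Spark SQL with Hive support and written back as a partitioned table. Database, table, and column names are assumed for illustration.

from pyspark.sql import SparkSession

# Sketch: run a HiveQL-style aggregation through Spark SQL and save the result
# as a partitioned Hive table. All object names are hypothetical.
spark = (SparkSession.builder
         .appName("hive-to-spark-sql")
         .enableHiveSupport()
         .getOrCreate())

daily_sales = spark.sql("""
    SELECT store_id,
           sales_date,
           SUM(amount)              AS total_amount,
           COUNT(DISTINCT order_id) AS order_count
    FROM   retail.sales_transactions
    GROUP BY store_id, sales_date
""")

(daily_sales.write
    .mode("overwrite")
    .partitionBy("sales_date")
    .saveAsTable("retail.daily_sales_agg"))
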
Environment: HDFS, Sqoop, Hive, Informix, Oracle, pySpark, Tableau.

Client: Schneider Electric, Hyderabad, India.


November 2012 to December 2015
PROJECT DESCRIPTION:
The scope of this project is to migrate data from the legacy systems to SAP. As part of the migration, the data has to be prepared per the SAP structure and validated before being loaded into the SAP system.
RESPONSIBILITIES:
As part of data preparation (using DataStage), I fulfilled the following responsibilities:
 Prepared UNIX scripts to validate the source files and move them to their respective source folders.
 Scheduled the UNIX scripts to run the required jobs at the scheduled times.
 Developed PL/SQL programs to perform validations at the database level.
 Tuned the PL/SQL programs to improve performance.
 Designed and developed jobs using DataStage Designer to load data from different source files into the target database.
 Extensively worked on performance tuning of DataStage jobs and sessions.
 Ran DataStage jobs to process the data extracted (biweekly) from the legacy systems and make it suitable for loading into SAP systems.
 Monitored and validated the load process for each source extract and fixed any issues that arose.
 Archived files and maintained the development environment using shell scripts.

As part of data validation (using Informatica), I fulfilled the following responsibilities:
 Developed PL/SQL procedures to perform file validation checks.
 Involved in Informatica development, administration, and fixing production issues.
 Designed and developed Informatica 8.x mappings to extract, transform, and load data into Oracle 10g target tables.
 Worked on Informatica PowerCenter client tools such as Designer, Workflow Manager, Workflow Monitor, and Repository Manager.
 Implemented the slowly changing dimensions (SCD) methodology and developed mappings to keep track of historical data.
 Involved in performance tuning of the Informatica mappings.
 Expertise in using TOAD and SQL for accessing the Oracle database.
 Used TOAD to run SQL queries and validate the data in Data warehouse and Data
mart.

Environment: Informatica PowerCenter v9.1, DataStage v8.5, flat files, Oracle 10g & 11g, PL/SQL, UNIX shell programming.
Tata Consultancy Services Ltd 09/2011 – 10/2012
Senior Software Engineer

Client: Agilent Technologies


September 2011 to October 2012
PROJECT DESCRIPTION:
The project comprises three data marts. As part of the data mart team, the main responsibility is to work on user requests to enhance the functionality of Informatica mappings. It also involves new development of mappings and enhancement of the existing mappings as per the client's requirements.
RESPONSIBILITIES:
 Designed and developed Informatica mappings to extract, transform, and load data; both source and target were based on Oracle.
 Used various transformations such as Source Qualifier, Aggregator, Expression, Joiner, Connected and Unconnected Lookup, Filter, Sequence Generator, Router, Update Strategy, Union, and Stored Procedure to develop the mappings.
 Developed several Mappings and Mapplets using corresponding Source, Targets and
Transformations.
 Performed Unit Testing and wrote various test cases and precise documentation to
outline the dataflow for the mappings.
 Created various DDL Scripts for creating the tables with indexes and partitions.
 Created PL/SQL packages, Stored Procedures and Triggers for data transformation
on the data warehouse.
 Effectively worked on the performance tuning of the mappings for better
performance. Followed standard rules for performance tuning.
 Migrated the mappings across environments: development, testing, UAT, and production.
 Used parameter files to provide the details of the source and target databases and other parameters.
 Prepared daily, weekly, and monthly reports.

Environment: Informatica v8.x, CSV files, Oracle 9i, PL/SQL programming.

Infosys Technologies 07/2008 – 08/2011
Senior System Engineer

Client: Microsoft
July 2008 to August 2011
PROJECT DESCRIPTION:
The scope of this project is to integrate various source systems and develop the Datastage jobs to
implement the Business rules while loading data into the target systems.
RESPONSIBILITIES:
 Automated repetitive tasks using shell scripts.
 Design and development of jobs using DataStage Designer to load data from
different heterogeneous source files to target databases.
 Implemented many transformation activities in DataStage before loading the data
into various dimensions and fact tables.
 Used Transformer, Aggregator, Merge, Sequential File, and Sort stages in designing jobs.
 Used Row Generator and Peek stages while testing the job designs.
 Worked with Local and Shared Containers.
 Used parallel processing capabilities, Session-Partitioning and Target Table
partitioning utilities.
 Created Reusable Transformations using Shared Containers.
 Developed PL/SQL stored procedures.
 Identified bottlenecks and performance tuned PL/SQL programs.

Environment: DataStage v7.x, flat files, Oracle 9i, PL/SQL, UNIX shell scripting.

Education
Bachelor of Engineering 2004 - 2008
CBIT, Osmania University, India
