
Senior Data Engineer

Name: Rohith T
Email: [email protected]
Contact: +1 (972)-646-6110
SUMMARY
● Experienced Data Engineer with 9 years of expertise in Data Applications, Big Data implementations, Hadoop,
and ETL Data Warehouse Analysis.
● Proficient in Big Data Ecosystems including Hadoop, Pig, Hive, HDFS, MapReduce, Sqoop, Storm, Spark, Airflow,
Snowflake, Teradata, Flume, Kafka, Yarn, Oozie, and Zookeeper.
● Deep knowledge of Big Data technologies and the Hadoop ecosystem, with a strong grasp of MapReduce and Hadoop infrastructure.
● Skilled in developing comprehensive data processing jobs for data analysis using MapReduce, Spark, and Hive.

● Experienced with the Apache Spark ecosystem, including Spark Core, Spark SQL, DataFrames, and RDDs, with familiarity in Spark MLlib.
● Expert in Spark Streaming, with experience creating RDDs (Resilient Distributed Datasets) using Scala, PySpark, and the Spark shell.
● Proficient in data manipulation using Python, including data loading and extraction as well as analysis and numerical computations with libraries such as NumPy, SciPy, and Pandas (see the illustrative PySpark sketch following this list).
● Experienced in using Pig scripts for data transformations, event joins, filtering, and pre-aggregation before data is stored in HDFS.
● Strong knowledge of Hive analytical functions and experience extending Hive functionality through custom UDFs.
● Expertise in writing MapReduce jobs in Python to process large sets of structured, semi-structured, and unstructured data and store them in HDFS.
● Good understanding of data modeling (Dimensional & Relational) concepts like Star-Schema Modeling,
Snowflake Schema Modeling, Fact and Dimension tables.
● Used Amazon Web Services Elastic Compute Cloud (AWS EC2) to launch cloud instances.

● Hands-on experience setting up workflows with Apache Airflow and the Oozie workflow engine for managing and scheduling Hadoop jobs.
● Strong experience working in UNIX/Linux environments and writing shell scripts.

● Excellent knowledge of J2EE architecture, design patterns, and object modeling using various J2EE technologies and frameworks, with comprehensive experience building web-based applications using J2EE frameworks such as Spring, Hibernate, Struts, and JMS.
● Worked with various file formats, including delimited text files, clickstream log files, Apache log files, Avro files, JSON files, and XML files.
● Experienced working across the SDLC under both Agile and Waterfall methodologies.
● Strong analytical, presentation, communication, and problem-solving skills, with the ability to work independently or as part of a team and to follow the best practices and principles defined for the team.
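
A minimal, illustrative PySpark sketch of the RDD/DataFrame and Python data-manipulation work summarized above; the session name, sample data, and column names are invented for the example, and PySpark with pandas/NumPy installed is assumed:

```python
# Minimal PySpark sketch: create an RDD, promote it to a DataFrame,
# aggregate with Spark SQL functions, and hand the small result to pandas/NumPy.
import numpy as np
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("summary-sketch").getOrCreate()

# RDD of (event, amount) pairs built from a local collection (illustrative data only).
rdd = spark.sparkContext.parallelize([("click", 1.0), ("view", 3.5), ("click", 2.5)])

# Convert the RDD to a DataFrame with named columns and aggregate per event type.
df = rdd.toDF(["event", "amount"])
agg = df.groupBy("event").agg(F.sum("amount").alias("total"))

# Collect the (small) aggregate to pandas for further numerical work with NumPy.
pdf = agg.toPandas()
print(pdf, np.mean(pdf["total"].to_numpy()))

spark.stop()
```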

TECHNICAL SKILLS:
● Hadoop/Big Data Technologies: Hadoop 3.0, HDFS, MapReduce, Druid, HBase 1.4, Apache Pig, Hive 2.3, Sqoop 1.4, Apache Impala 2.1, Oozie 4.3, YARN, NiFi, Apache Flume 1.8, Kafka 1.1, Zookeeper
● Cloud Platforms: Amazon AWS, EC2, S3, Aurora, GCP, MS Azure, Azure SQL Database, Azure SQL Data Warehouse, Azure Analysis Services, HDInsight, Azure Data Lake, Data Factory
● Hadoop Distributions: Cloudera, Hortonworks, MapR

● Programming Languages: Java, Scala, Python 3.6, SQL, PL/SQL, Shell Scripting, Storm 1.0

● Databases: Oracle 12c/11g, SQL, PostgreSQL

● Operating Systems: Linux, Unix, Windows 10/8/7

● IDEs and Tools: Eclipse 4.7, NetBeans 8.2, Quantexa, IntelliJ IDEA, Maven

● NoSQL Databases: HBase 1.4, Cassandra 3.11, MongoDB, Accumulo

● Web/Application Server: Apache Tomcat 9.0.7, WSDL

● SDLC Methodologies: Agile, Waterfall

● Version Control: GIT, SVN, CVS

● Other Tools: Visual Studio 2010, Business Intelligence Studio 2008, Attunity, Power BI, SQL Server Integration
Services (SSIS) 2005/2008, ANSI SQL, SQL Server Reporting Services (SSRS) 2008, SQL Server 2008 R2

PROFESSIONAL EXPERIENCE

Role: Sr. Data Engineer


Client: Zolon Tech Inc, Herndon, Virginia. Duration: Apr 2022 – Present
Responsibilities
● Led big data analytics, predictive analytics, and machine learning initiatives, ensuring effective execution and
integration.
● Implemented a proof of concept by deploying solutions on AWS S3 and Snowflake, demonstrating practical
application and feasibility.
● Leveraged AWS services to design and optimize big data architectures, analytics solutions, enterprise data
warehouses, and business intelligence frameworks, focusing on architecture, scalability, flexibility,
availability, and performance to enhance decision-making.
● Developed Scala scripts and UDFs using Spark DataFrames, SQL, and RDDs for data aggregation, querying,
and writing results back into S3 buckets.
● Performed data cleansing and mining, ensuring high-quality data for analysis.

● Crafted and executed programs in Apache Spark with Scala to perform ETL jobs on ingested data.

● Utilized Spark Streaming to process streaming data in micro-batches with the Spark engine.
● Created Spark applications for data validation, cleansing, transformation, and custom aggregation, using
Spark SQL for data analysis and providing insights for data scientists.
● Automated data ingestion processes using Python and Scala from various sources such as APIs, AWS S3,
Teradata, and Snowflake.
● Designed and developed Spark workflows in Scala to pull data from AWS S3 and Snowflake, applying
necessary transformations.
● Applied Spark RDD transformations to align business analysis with data processing actions.

● Automated workflows and scripts using Apache Airflow and shell scripting to ensure daily execution in
production environments.
● Developed scripts in Python to read CSV, JSON, and Parquet files from S3 buckets and load them into AWS
S3, DynamoDB, and Snowflake.
● Implemented AWS Lambda functions to execute scripts in response to events from Amazon DynamoDB
tables, S3 buckets, or HTTP requests via Amazon API Gateway.
● Migrated data from AWS S3 to Snowflake by creating custom read/write utility functions using Scala.

● Worked with Snowflake schemas and data warehousing, processing batch and streaming data pipelines using
Snowpipe and Matillion from AWS S3 data lakes.
● Profiled structured, unstructured, and semi-structured data from various sources to identify patterns and
implemented data quality metrics using queries or Python scripts.
● Installed and configured Apache Airflow for S3 and Snowflake data warehouses, creating DAGs for
automated workflows.
● Developed DAGs using the Email Operator, Bash Operator, and Spark Livy Operator to execute tasks on EC2 instances (see the illustrative DAG sketch following this list).
● Deployed code to EMR via CI/CD pipelines using Jenkins.

● Utilized Code Cloud for version control, managing code check-ins and check-outs effectively.
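
A minimal sketch of the kind of Airflow DAG described above, using the Bash and Email operators (the Livy-based Spark task is omitted here); the DAG id, script path, schedule, and mailbox are placeholders, and Airflow 2.x operator imports are assumed:

```python
# Hypothetical daily DAG: submit the Spark load script, then notify the team.
# DAG id, paths, schedule, and mailbox are placeholders, not production values.
from datetime import datetime

from airflow import DAG
from airflow.operators.bash import BashOperator
from airflow.operators.email import EmailOperator

with DAG(
    dag_id="daily_s3_to_snowflake_sketch",
    start_date=datetime(2023, 1, 1),
    schedule_interval="@daily",
    catchup=False,
) as dag:
    # Submit the PySpark job that pulls files from S3 and loads Snowflake.
    run_load = BashOperator(
        task_id="run_spark_load",
        bash_command="spark-submit /opt/jobs/s3_to_snowflake_load.py",
    )

    # Send a completion notice once the load task succeeds.
    notify = EmailOperator(
        task_id="notify_team",
        to="data-eng-oncall@example.com",
        subject="S3 -> Snowflake load finished",
        html_content="Daily load completed for {{ ds }}.",
    )

    run_load >> notify
```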

Environment: Agile Scrum, MapReduce, Snowflake, Pig, Spark, Scala, Hive, Kafka, Python, Airflow, JSON, Parquet, CSV, Code Cloud, AWS.

Role: Sr. Data Engineer


Client: Medical Informatics Engineering, Fort Wayne, Indiana Duration: Jan 2019 – Apr 2022
Responsibilities
● Designed and implemented scalable, efficient data pipelines on Google Cloud Platform (GCP) to ingest,
process, and store large datasets.
● Developed, managed, and optimized data warehouses using BigQuery, ensuring efficient data storage,
retrieval, and query performance.
● Built and maintained ETL/ELT processes using GCP services such as Dataflow, Dataproc, and Cloud Composer,
integrating data from diverse sources.
● Set up and managed cloud infrastructure, including IAM roles, VPCs, and networking, to ensure secure and
efficient data operations.
● Ensured data accuracy, consistency, and security by implementing data quality checks, validation processes,
and governance policies.
● Worked with Cloud Storage, Cloud Pub/Sub, and Cloud SQL for efficient data storage, streaming, and
relational database management.
● Automated data pipeline deployments and monitoring using CI/CD tools like Jenkins, Cloud Build, or GitLab
CI, reducing manual intervention.
● Utilized Terraform and Deployment Manager for Infrastructure as Code (IaC) to automate resource
provisioning and management.
● Implemented real-time data processing solutions using Apache Beam and Google Cloud Dataflow, handling high-throughput data streams (see the illustrative pipeline sketch following this list).
● Developed and optimized SQL queries and used BigQuery for large-scale data analysis and reporting,
ensuring performance at scale.
● Managed and orchestrated data workflows using Apache Airflow on Cloud Composer, integrating complex
data processing tasks.
● Employed machine learning models and integrated them into data pipelines for predictive analytics and
automated decision-making.
● Collaborated with data scientists, analysts, and stakeholders to translate business requirements into
technical solutions.
● Ensured compliance with industry standards and regulations for data privacy and security, implementing
encryption and access controls.
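
A minimal sketch of the kind of Apache Beam streaming pipeline described above, reading messages from Pub/Sub and streaming rows into BigQuery; the project, topic, table, and schema are placeholders, not the client's actual resources, and runner/project flags are assumed to come from the command line:

```python
# Hypothetical Beam streaming sketch: Pub/Sub JSON events -> parsed rows -> BigQuery.
import json

import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions


def parse_event(message: bytes) -> dict:
    """Decode one Pub/Sub message into a BigQuery-ready row."""
    event = json.loads(message.decode("utf-8"))
    return {"user_id": event["user_id"], "event_type": event["event_type"]}


# Runner, project, region, etc. are supplied as command-line flags when launched.
options = PipelineOptions(streaming=True)

with beam.Pipeline(options=options) as pipeline:
    (
        pipeline
        | "ReadFromPubSub" >> beam.io.ReadFromPubSub(topic="projects/my-project/topics/events")
        | "ParseJson" >> beam.Map(parse_event)
        | "WriteToBigQuery" >> beam.io.WriteToBigQuery(
            table="my-project:analytics.events",
            schema="user_id:STRING,event_type:STRING",
            write_disposition=beam.io.BigQueryDisposition.WRITE_APPEND,
        )
    )
```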

Environment: Google Cloud Platform (GCP), BigQuery, Dataflow, Apache Beam, Dataproc, Terraform, Jenkins, GitLab
CI, Apache Airflow, HBase, Kafka, Python, Storm, JSON, Parquet, GIT, JSON SerDe, Cloudera.

Role: Data Engineer


Client: Cigna, Philadelphia, Pennsylvania Duration: Dec 2017 – Jan 2019
Responsibilities
● Participated in designing and deploying multi-tier applications using a range of AWS services, including EC2,
Route 53, S3, RDS, DynamoDB, SNS, SQS, and IAM, with a focus on high availability, fault tolerance, and auto-
scaling through AWS CloudFormation.
● Supported continuous storage solutions in AWS, utilizing Elastic Block Storage, S3, and Glacier.

● Created and managed volumes and configured snapshots for EC2 instances.

● Used the DataFrame API in Scala to work with distributed collections of data organized into named columns, and developed predictive analytics using the Apache Spark Scala APIs.
● Developed Scala scripts leveraging DataFrames, SQL, Datasets, and RDDs/MapReduce in Spark for data
aggregation, querying, and writing data back into OLTP systems via Sqoop.
● Constructed Hive queries to preprocess data required for business processes.

● Designed HBase tables to manage large sets of structured, semi-structured, and unstructured data from
UNIX, NoSQL, and various portfolios.
● Implemented generalized solution models using AWS SageMaker.

● Demonstrated extensive expertise with core Spark APIs and data processing on EMR clusters.

● Developed and deployed AWS Lambda functions for ETL migration services, creating serverless data pipelines that interact with the Glue Catalog and can be queried from Athena (see the illustrative handler sketch following this list).
● Programmed in Hive, Spark SQL, Java, C#, and Python to streamline incoming data, build data pipelines, and
orchestrate data processes for useful insights.
● Managed ETL pipelines to source tables and deliver calculated ratio data from AWS to Data Mart (SQL Server)
and Credit Edge servers.
● Gained experience in tuning relational databases (e.g., Microsoft SQL Server, Oracle, MySQL) and columnar
databases (e.g., Amazon Redshift, Microsoft SQL Data Warehouse).
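
A hypothetical sketch of the serverless pattern described above: an AWS Lambda handler that reacts to an S3 object notification and starts a Glue crawler so the data is (re)registered in the Glue Data Catalog and queryable from Athena; the crawler name is invented, and a standard S3 put notification event shape is assumed:

```python
# Hypothetical Lambda handler: new S3 object -> start a Glue crawler so Athena
# can query the refreshed Glue Catalog tables. Crawler name is a placeholder.
import boto3

glue = boto3.client("glue")


def lambda_handler(event, context):
    # S3 put notifications carry the bucket/key of the object that triggered us.
    record = event["Records"][0]["s3"]
    bucket, key = record["bucket"]["name"], record["object"]["key"]
    print(f"New object s3://{bucket}/{key}; refreshing catalog")

    # Crawling updates the table definitions that Athena reads from the Glue Catalog.
    glue.start_crawler(Name="claims_landing_crawler")
    return {"status": "crawler_started", "object": f"s3://{bucket}/{key}"}
```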

Environment: Hortonworks, Hadoop, HDFS, AWS Glue, AWS Athena, EMR, Pig, Sqoop, Hive, NoSQL, HBase, Shell
Scripting, Scala, Spark, Spark SQL, AWS, SQL Server, Tableau.

Role: Data Engineer


Client: Swiss Re, Hyderabad, India Duration: Nov 2016 – Oct 2017
Responsibilities
● Developed logical and physical data models that represent current and future state data elements and data
flows using Erwin 4.5.
● Designed and built data marts according to specified requirements.

● Extracted data from diverse sources, including data files and customized tools such as Meridian and Oracle.
● Worked with views, stored procedures, triggers, and SQL queries for data loading (staging) to enhance and maintain existing functionality (an illustrative staging load appears following this list).
● Conducted analysis of source systems, requirements, and the existing OLTP system to identify necessary
dimensions and facts from the database.
● Created a Data Acquisition and Interface System Design Document.

● Designed the dimensional model for the data warehouse, ensuring alignment with source data layouts and
requirements.
● Extracted data from IBM mainframes, including fixed-width, delimited, and line-sequential files in binary
formats, for integration into an enterprise data warehouse.
● Extensively used Oracle ETL processes for address data cleansing.

● Developed and optimized ETL processes for handling affiliations from data sources using Oracle and
Informatica, tested with high data volumes.
● Responsible for the development, support, and maintenance of ETL (Extract, Transform, Load) processes
using Oracle and Informatica PowerCenter.
● Created common reusable objects for the ETL team and ensured adherence to coding standards.

● Reviewed high-level design specifications, ETL coding, and mapping standards.

● Designed new database tables to meet business information needs and developed mapping documents as
guidelines for ETL coding.
● Used ETL to extract files for external vendors and coordinated these efforts.

● Migrated mappings from development to testing environments and from testing to production.

● Performed unit testing and tuning to improve performance.

● Developed Informatica jobs to replace mainframe jobs for loading data into the data warehouse.

● Created various documents, including source-to-target data mapping documents and unit test cases
documents.
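
An illustrative staging-load sketch in Python with the python-oracledb driver; the actual loads in this role were built with Informatica and Oracle, so the table, columns, rows, and connection details below are invented placeholders:

```python
# Hypothetical bulk insert into an Oracle staging table ahead of the warehouse merge.
import oracledb

rows = [("C100", "2017-01-15", 1250.00), ("C101", "2017-01-16", 980.50)]

with oracledb.connect(user="stg_user", password="***", dsn="dwhost/ORCLPDB1") as conn:
    with conn.cursor() as cur:
        # Bind rows positionally and insert them in one round trip.
        cur.executemany(
            "INSERT INTO stg_policy (policy_id, issue_dt, premium) "
            "VALUES (:1, TO_DATE(:2, 'YYYY-MM-DD'), :3)",
            rows,
        )
    conn.commit()
```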

Environment: Informatica PowerCenter 8.1/7.1.2, Erwin 4.5, Oracle 10g/9i, Teradata V2R5, XML, PL/SQL, SQL Server 2005/2000 (Enterprise Manager, Query Analyzer), Sybase, SQL*Loader, SQL*Plus, Autosys, OLAP, Windows XP/NT/2000, Sun Solaris UNIX, MS Office 2003, Visio Project, Shell scripts.

Role: Big Data Application Engineer


Client: DXC Technology, Hyderabad, India Duration: May 2015 – Nov 2016
Responsibilities
● Applied Agile methodologies throughout the software development lifecycle (SDLC) for effective project
management.
● Participated in the collection and analysis of requirements, reviewed existing design documents, and handled
the planning, development, and testing phases of applications.
● Leveraged PySpark to execute data transformations, deploying these processes on Azure HDInsight for data
ingestion, cleansing, and identity resolution tasks.
● Created a file-based data lake infrastructure utilizing Azure Blob Storage, Azure Data Factory, and Azure
HDInsight, and employed HBase for data storage and retrieval.
● Implemented business rules for contact deduplication using Spark transformations with both Spark Scala and
PySpark.
● Developed graph database nodes and relationships with Cypher language for improved data management.

● Engineered Spark jobs using Spark DataFrames to convert JSON documents into flat file formats.

● Developed microservices with AWS Lambda to facilitate API interactions with third-party vendors, including
Melissa and StrikeIron.
● Designed and managed data processing and scheduling pipelines using Azure Data Factory.

● Built Azure ML Studio pipelines incorporating Python code to run Naïve Bayes and XGBoost classification
algorithms for persona mapping.
● Facilitated data ingestion from Salesforce, SAP, SQL Server, and Teradata into Azure Data Lake using Azure
Data Factory.
● Created a module to handle contact deduplication for sales and marketing data.

● Ran Hive queries on Parquet tables stored within Hive to conduct data analysis.

● Developed REST APIs using the Flask framework (Python) to be consumed by frontend user interfaces.

● Tested REST API functionality using Python scripts and Postman.

● Wrote Azure Automation Runbook scripts to manage the scaling and operational control of the HDInsight
Cluster.
● Saved results from REST API calls in a Redis database to enable efficient retrieval for repeated queries (see the illustrative Flask/Redis sketch following this list).
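
A minimal sketch combining the Flask REST API and Redis caching described above; the route, Redis host, TTL, and the stand-in lookup function are assumptions made for illustration, not the production service:

```python
# Hypothetical Flask endpoint that caches lookup results in Redis so repeated
# queries for the same contact are served without redoing the expensive work.
import json

import redis
from flask import Flask, jsonify

app = Flask(__name__)
cache = redis.Redis(host="localhost", port=6379, db=0)


def expensive_lookup(contact_id: str) -> dict:
    # Stand-in for the real deduplication/enrichment call.
    return {"contact_id": contact_id, "status": "resolved"}


@app.route("/contacts/<contact_id>")
def get_contact(contact_id: str):
    cached = cache.get(f"contact:{contact_id}")
    if cached:
        return jsonify(json.loads(cached))  # served from Redis on repeat queries

    result = expensive_lookup(contact_id)
    cache.setex(f"contact:{contact_id}", 3600, json.dumps(result))  # 1-hour TTL
    return jsonify(result)


if __name__ == "__main__":
    app.run(port=5000)
```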

Environment: Spark, Spark Streaming, Spark SQL, HDFS, Hive, Apache Kafka, Sqoop, Java, Scala, Linux, Azure SQL Database, Azure ML Studio, Jenkins, Flask Framework, IntelliJ, PyCharm, Eclipse, Git, Azure Data Factory, Tableau, MySQL, Postman, Agile Methodologies, AWS Lambda, Azure Cloud, Docker.
