ARUNABHA GUPTA
arunabhagupta11@[Link]
Contact No - 9903827152
Permanent Address: The Preserve, Flat 101, Belathur Colony, Bangalore-560067
My Belief: To learn continuously, contribute effectively and grow consistently, both as an individual and as part of a team.
Past Experience:
TCS - 4.75 Years, Bangalore (Swiss Reinsurance (Life & Health), Deutsche Bank (Private & Commercial Clients (PCC) International))
IBM - 11 Months, Kolkata (DHL (Customer Analytics), XL (Claims Analytics))
Cognizant - 4.83 Years, Kolkata (Travelers (Claims Analytics), BNY Mellon, Nike (Order Fulfillment), Walmart, Belk)
PROFILE
Design and build data pipelines for business use cases, which in turn populate use-case-specific data lakes
Design and build data utilities for data cleaning, harmonization and anonymization
Design and build write back enabled applications with interactive widgets
Carry out exploratory analysis on data science use cases
Hands-on experience in Sentiment Analysis, Text Mining, NLP, and unsupervised and supervised techniques such as
regression, classification, clustering and other applied predictive techniques
TECHNICAL DETAILS
Languages: Python, PySpark, SQL, YAML, PostgreSQL, Spark ML
Analytical Skills: Exploratory Data Analysis, Hypothesis Testing
Supervised and unsupervised machine learning techniques such as Linear Regression, Logistic Regression,
Ensemble Techniques (Bagging, Boosting, Random Forest), Natural Language Processing (Sentiment Analysis,
Topic Mining), Neural Networks, KNN, PCA
Tools: Palantir Foundry (SLATE, CONTOUR, HUBBLE, CODE_REPO, CODE_WORKBOOK)
Cognos, Azure ML, Databricks, Jupyter Notebook, PowerBI, GCP, AWS
PROFESSIONAL EXPERIENCE
Data Science:
Claims Customer International Address Parsing:
Multilingual tokenization (Using Hidden Markov Model)
Abbreviation expansion (Using Wordnet)
Address language classification (Using the FTRL-Proximal method to induce sparsity)
Numeric expression parsing
Health-Care Claims Customer insights:
Analyzing customer feedback records for topic modelling (using a hierarchical Bayesian model: Latent Dirichlet Allocation).
Emotion mining and sentiment scoring.
Twitter Customer Sentiment Analysis:
Sourcing data via Spark Streaming and the Twitter API; data cleaning
Data understanding and text pre-processing: translating non-English comments using Google Translate, handling
emoji, etc.
Identifying different aspects in text data using analytical techniques such as topic modelling
Industry Categorization:
Gathering data from a client-provided API via a customized scraper
Data cleaning and pre-processing
Data visualization and EDA to determine variable importance
Feature engineering to create variables that were used in training model
Model selection (used Random Forest and Stochastic Gradient Descent models)
Building machine learning data pipelines
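The model-selection step above can be sketched as a cross-validated comparison. This is a sketch under assumptions: `make_classification` stands in for the engineered industry features, and the hyperparameters shown are illustrative.

```python
# Illustrative sketch: comparing Random Forest against an SGD classifier
# via cross-validation. Synthetic data stands in for the real features.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import SGDClassifier
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=300, n_features=10, random_state=0)

models = {
    "random_forest": RandomForestClassifier(n_estimators=100, random_state=0),
    "sgd": SGDClassifier(random_state=0),
}
for name, model in models.items():
    # Mean 5-fold cross-validation accuracy for each candidate model
    score = cross_val_score(model, X, y, cv=5).mean()
    print(f"{name}: {score:.3f}")
```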
Clustering - Statistically grouping furnaces with similar efficiency & finding factors affecting efficiency:
Pre-processing and cleaning data
Dimensionality reduction using PCA/LDA
Determining optimal number of clusters using Elbow curve
Internal Cluster Validation using Silhouette score
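The clustering workflow above (dimensionality reduction, elbow scan, silhouette validation) can be sketched as follows. This is illustrative: `make_blobs` stands in for the furnace-efficiency features, and the component/cluster counts are assumptions.

```python
# Illustrative sketch: PCA + k-means elbow scan + silhouette validation.
from sklearn.datasets import make_blobs
from sklearn.decomposition import PCA
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

X, _ = make_blobs(n_samples=200, centers=3, n_features=6, random_state=0)

# Dimensionality reduction to 2 principal components before clustering
X_red = PCA(n_components=2).fit_transform(X)

# Elbow curve: inertia (within-cluster sum of squares) per candidate k
inertias = {
    k: KMeans(n_clusters=k, n_init=10, random_state=0).fit(X_red).inertia_
    for k in range(2, 7)
}

# Internal validation of the chosen k with the silhouette score
labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X_red)
sil = silhouette_score(X_red, labels)
print(sil)  # closer to 1.0 means better-separated clusters
```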
Data Utilities:
Designing data utilities to ensure seamless ingestion of data from unstructured raw files.
Data cleaning and data harmonization
Data anonymization and combining multiple data files into a single source
Snake casing using a customized library
Dataset homogeneity utility
Dataset formatting utility
Palantir Foundry platform-specific tools used: CODE_WORKBOOK for automation
Build tool: Gradle
Versioning: GIT
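A snake-casing helper like the one mentioned above can be sketched as below. The regex rules are an assumption; the actual customized library may apply different conventions.

```python
# Illustrative sketch of a column-name snake-casing utility.
import re

def to_snake_case(name: str) -> str:
    """Convert a column name like 'PolicyHolderID' to 'policy_holder_id'."""
    # Split an acronym run from a following capitalized word (e.g. "IDValue")
    s = re.sub(r"([A-Z]+)([A-Z][a-z])", r"\1_\2", name)
    # Split lowercase/digit-to-uppercase boundaries (e.g. "policyHolder")
    s = re.sub(r"([a-z0-9])([A-Z])", r"\1_\2", s)
    # Normalize spaces and hyphens to underscores
    s = re.sub(r"[\s\-]+", "_", s)
    return s.lower()

print(to_snake_case("PolicyHolderID"))   # policy_holder_id
```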
Data Engineering:
Building data pipelines on a scalable data platform
Sourcing data from unstructured/structured raw files
Cleaning data as per industry standards
Building data pipelines for the underlying business logic
Optimizing the code for efficient memory utilization
Explaining the technical implications in an understandable manner to end business users
Designing Dashboards for business users to validate the changes
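The pipeline stages above (source, clean, apply business logic) can be sketched as chained transform functions. This is illustrative only: pandas stands in for the platform's dataset API, and the column names and sample records are assumptions.

```python
# Illustrative sketch of a staged data pipeline: ingest -> clean -> aggregate.
import pandas as pd

def ingest() -> pd.DataFrame:
    # Source step: raw records as they might arrive from a file drop
    return pd.DataFrame({
        "policy_id": ["P1", "P2", "P2", "P3"],
        "premium": ["100", "250", "250", None],
    })

def clean(df: pd.DataFrame) -> pd.DataFrame:
    # Cleaning step: drop duplicates and nulls, enforce numeric types
    df = df.drop_duplicates().dropna(subset=["premium"]).copy()
    df["premium"] = df["premium"].astype(float)
    return df

def aggregate(df: pd.DataFrame) -> pd.DataFrame:
    # Business-logic step: total premium per policy
    return df.groupby("policy_id", as_index=False)["premium"].sum()

result = aggregate(clean(ingest()))
print(result)
```

Keeping each stage a pure function makes the pipeline easy to test stage-by-stage and to re-run incrementally.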
Data pipeline business use cases I have worked on so far:
Premium Validation and Anomaly detection
Mortality Analysis
Medical Reinsurance
Smart Underwriting Risk Assessment
Valuations
Covid-19 Response
Covid-19 What If analysis – In partnership with Palantir
Palantir platform-specific tools used: CONTOUR/HUBBLE/CODE_REPO
Build Tool: Gradle
Versioning: GIT
Dashboard Designing/Application Design:
Building applications on scalable analytics platform
Primarily responsible for developing interactive dashboards using Cognos/PowerBI for end business users.
Additionally, I scaled up to developing front-end applications on SLATE (Palantir Foundry)
Applications Contributed In:
Actual vs. Expected & Data Freshness dashboard for SwissRe Life and Health
Dashboards Contributed In:
Private and Commercial Clients International for Deutsche Bank Spain
Claims dashboard for XL Catlin Group
Order Fulfillment System dashboard for Nike US
Insurance dashboard for Travelers Inc.
Store Expansion Dashboard for Walmart US
Tools: Cognos/PowerBI/Palantir Foundry SLATE
Build Tool: Gradle
Versioning: GIT
AWARDS AND RECOGNITION
TCS: [Link]
IBM: Best Performer (Internal)
CTS: Pearls of Wisdom (Internal)
ACADEMIC PROFILE
Degree | University/Institution | Year | Score
Bachelor in Computer Applications | WBUT | 2010 | 83%