Swapna
Professional Summary
AI/ML Engineer with 11 years of experience transforming business requirements into analytical models,
designing algorithms, building models, and developing data mining and reporting solutions that scale across
massive volumes of structured and unstructured data.
Built scalable ML pipelines using tools like MLflow, Airflow, and Docker for continuous integration and
deployment (CI/CD) of AI models.
Expert in the Data Science process life cycle: Data Acquisition, Data Preparation, Modeling (Feature
Engineering, Model Evaluation), and Deployment.
Experienced in applying statistical techniques including hypothesis testing, Principal Component Analysis
(PCA), ANOVA, sampling distributions, chi-square tests, time-series analysis, discriminant analysis,
Bayesian inference, and multivariate analysis.
Proficient in data pre-processing, including data cleaning, correlation analysis, imputation, visualization,
feature scaling, and dimensionality reduction, using Python data science packages (Scikit-learn, Pandas,
NumPy).
Applied text pre-processing and normalization techniques, such as tokenization, POS tagging, and parsing.
Expert in NLP techniques (BOW, TF-IDF, Word2Vec) and toolkits such as NLTK, Gensim, and SpaCy.
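As a hedged sketch of the BOW and TF-IDF representations mentioned above, using scikit-learn's vectorizers on an invented toy corpus:

```python
from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer

# Toy corpus (hypothetical example documents)
corpus = [
    "the model predicts churn",
    "the model predicts revenue",
    "customers churn when service degrades",
]

# Bag-of-words: raw term counts per document
bow = CountVectorizer()
counts = bow.fit_transform(corpus)

# TF-IDF: down-weights terms that appear in many documents
tfidf = TfidfVectorizer()
weights = tfidf.fit_transform(corpus)

print(counts.shape)               # (documents, unique terms)
print(sorted(bow.vocabulary_)[:3])
```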
Experienced in tuning models using Grid Search, Randomized Grid Search, and K-Fold Cross Validation.
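A minimal sketch of grid search with K-fold cross-validation as mentioned above (synthetic data; the parameter grid values are illustrative):

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV

# Synthetic binary-classification data (illustrative only)
X, y = make_classification(n_samples=300, n_features=10, random_state=0)

# 5-fold cross-validated grid search over the regularization strength C
grid = GridSearchCV(
    LogisticRegression(max_iter=1000),
    param_grid={"C": [0.01, 0.1, 1.0, 10.0]},
    cv=5,
    scoring="accuracy",
)
grid.fit(X, y)
print(grid.best_params_, round(grid.best_score_, 3))
```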
Strong understanding of artificial neural networks, convolutional neural networks, and deep learning.
Skilled in using statistical methods including exploratory data analysis, regression analysis,
regularized linear models, time-series analysis, cluster analysis, goodness of fit, Monte Carlo simulation,
sampling, cross-validation, ANOVA, A/B testing, etc.
Expertise in building various machine learning models using algorithms such as Linear Regression, Logistic
Regression, Naive Bayes, Support Vector Machines (SVM), Decision trees, KNN, K-means Clustering,
Ensemble methods (Bagging, Gradient Boosting).
Experience in Text mining, Topic modeling, Natural Language Processing (NLP), Content Classification,
Sentiment analysis, Market Basket Analysis, Recommendation systems, Entity recognition etc.
Working experience in Natural Language Processing (NLP) and a deep understanding of statistics, linear
algebra, calculus, and optimization algorithms such as gradient descent.
Familiar with key data science concepts (statistics, data visualization, machine learning, etc.). Experienced in
Python, MATLAB, SAS, and PySpark programming for statistical and quantitative analysis.
Exposure to AI and deep learning platforms such as TensorFlow, Keras, and AWS ML.
Experience working with Big Data tools such as Hadoop (HDFS, MapReduce), HiveQL, Sqoop, Pig Latin,
and Apache Spark (PySpark).
Extensive experience working with RDBMS such as SQL Server, MySQL, and NoSQL databases such as
MongoDB, HBase.
Knowledge of time-series analysis using AR, MA, ARIMA, GARCH, and ARCH models.
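As a sketch of the AR family listed above: an AR(1) coefficient can be estimated without a specialized library by least-squares regression of each observation on its predecessor (synthetic series; in practice a package such as statsmodels would be used):

```python
import numpy as np

rng = np.random.default_rng(42)

# Simulate an AR(1) process: x_t = 0.8 * x_{t-1} + noise
n, phi = 2000, 0.8
x = np.zeros(n)
for t in range(1, n):
    x[t] = phi * x[t - 1] + rng.normal()

# Estimate phi by ordinary least squares of x_t on x_{t-1}
lagged, current = x[:-1], x[1:]
phi_hat = (lagged @ current) / (lagged @ lagged)
print(round(phi_hat, 2))  # close to the true 0.8
```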
Experience in building production quality and large-scale deployment of applications related to natural
language processing and machine learning algorithms.
Experience with high-performance computing (cluster computing on AWS with Spark/Hadoop) and building
real-time analyses with Kafka and Spark Streaming. Knowledge of Qlik, Tableau, and Power BI.
Generated data visualizations using tools such as Tableau, Python (Matplotlib, Seaborn), and R.
Experienced working in Agile environments, including the Scrum process, using project management tools
such as ProjectLibre and Jira and version control tools such as Git/GitHub.
Technical Skills
Data Sources: AWS Snowflake, PostgreSQL, MS SQL Server, MongoDB, MySQL, HBase, Amazon Redshift, Databricks, Teradata
Statistical Methods: Hypothesis Testing, ANOVA, Principal Component Analysis (PCA), Time Series, Correlation (Chi-square test, covariance), Multivariate Analysis, Bayes' Law
Machine Learning: Linear Regression, Logistic Regression, Naive Bayes, Decision Trees, Random Forest, Support Vector Machines (SVM), K-Means Clustering, K-Nearest Neighbors (KNN), Gradient Boosting Trees, AdaBoost, PCA, LDA, Natural Language Processing
Deep Learning: Artificial Neural Networks, Convolutional Neural Networks, RNNs, Deep Learning on AWS, Keras API, CI/CD pipelines
Hadoop Ecosystem: Hadoop, Spark, MapReduce, HiveQL, HDFS, Sqoop, Pig Latin
Data Visualization: Tableau, Python (Matplotlib, Seaborn), R (ggplot2), Power BI, QlikView
Languages: Python (NumPy, Pandas, Scikit-learn, Matplotlib, Seaborn), R, SQL, MATLAB, Spark, Java, C#
Operating Systems: UNIX Shell Scripting (via PuTTY client), Linux, Windows, macOS
Other Tools and Technologies: TensorFlow, Keras, AWS ML, NLTK, SpaCy, Gensim, MS Office Suite, Google Analytics, GitHub, AWS (EC2/S3/Redshift/EMR/Lambda/Snowflake)
Certifications: Deep Learning with Python (DataCamp)
Business Intelligence & Decision Science: Predictive Analytics, A/B Testing, Forecasting, Decision Trees, Reinforcement Learning
Professional Experience
Maxim Healthcare - Columbia, MD Mar 2022 to Present
Role: Data Scientist / Machine Learning Engineer
Roles & Responsibilities:
Built classification models based on advisor performance and derived knowledge rules for high- and
medium-performing advisors.
Derived treatment plans for advisors, drawing on the interactions of high-performing advisors.
Prioritized leads and created nurturing journeys based on learnings from previous modeling practices.
Identified data vendors to optimize the performance of existing models and pave the way for building new models.
Collaborated with data engineers and operation team to implement the ETL process, wrote and optimized SQL
queries to perform data extraction to fit the analytical requirements (Data Ingestion).
Performed data analysis by retrieving the data from the Hadoop cluster.
Performed univariate and multivariate analysis on the data to identify any underlying pattern in the data and
associations between the variables.
Under the PMO organization, led the company-wide Agile transformation initiative in conjunction with the Agile
sponsor team, represented by the Business, Technology, Marketing, HR, Compliance, and IT areas.
As the overall Agile transformation lead, developed, implemented, and promoted Agile best practices and
standards across the enterprise and Agile teams. Drove the organization-wide Agile adoption strategy and
rollout plans. Provided solutions for scaling Agile across projects, programs, and portfolios, improving
application delivery.
Explored and analyzed the customer specific features by using Matplotlib in Python and ggplot2 in R.
Performed data imputation using Scikit-learn package in Python.
Participated in feature engineering such as feature generation, PCA, feature normalization, and label encoding
with Scikit-learn pre-processing.
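A hedged sketch of the Scikit-learn pre-processing steps described above (scaling, PCA, label encoding) on invented toy data:

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import LabelEncoder, StandardScaler

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))                       # toy numeric features
labels = ["high", "medium", "low", "medium"] * 25   # toy target labels

# Normalize features to zero mean / unit variance before PCA
X_scaled = StandardScaler().fit_transform(X)

# Reduce to 2 principal components
X_2d = PCA(n_components=2).fit_transform(X_scaled)

# Encode string labels as integers
y = LabelEncoder().fit_transform(labels)

print(X_2d.shape, sorted(set(y)))
```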
Used Python (NumPy, SciPy, pandas, Scikit-learn, seaborn) and R to develop a variety of models and
algorithms for analytic purposes.
Worked on Natural Language Processing with the NLTK module in Python and developed NLP models for
sentiment analysis, including LLM integration.
Experimented and built predictive models including ensemble models using machine learning algorithms such
as Logistic regression, Random Forests, and KNN to predict customer churn.
Conducted analysis of customer behaviors and discovered customer value with RFM analysis; applied
customer segmentation with clustering algorithms such as K-Means Clustering, Gaussian Mixture Models, and
Hierarchical Clustering.
Used F-Score, AUC/ROC, Confusion Matrix, Precision, and Recall to evaluate different models' performance.
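The evaluation metrics listed above can be sketched on a toy set of predictions (all values invented for illustration):

```python
from sklearn.metrics import (confusion_matrix, f1_score, precision_score,
                             recall_score, roc_auc_score)

# Toy ground truth, hard predictions, and predicted probabilities
y_true  = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred  = [1, 0, 1, 0, 0, 1, 1, 0]
y_score = [0.9, 0.2, 0.8, 0.4, 0.1, 0.6, 0.7, 0.3]

print(confusion_matrix(y_true, y_pred))
print(precision_score(y_true, y_pred))   # TP / (TP + FP)
print(recall_score(y_true, y_pred))      # TP / (TP + FN)
print(f1_score(y_true, y_pred))          # harmonic mean of precision and recall
print(roc_auc_score(y_true, y_score))    # area under the ROC curve
```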
Designed and implemented a recommendation system that leveraged Google Analytics data and machine
learning models, using collaborative filtering techniques to recommend courses to different customers.
Used LangChain memory components to preserve conversation context across multiple analytical steps,
improving model coherence and performance.
Integrated LangChain with SQL and Pandas agents, allowing LLMs to directly analyze tabular data and deliver
insights from relational databases.
Built modular, reusable LangChain components for use in time-series forecasting, anomaly detection, and
classification tasks powered by LLMs.
Designed rich data visualizations to model data into human-readable form with Tableau and Matplotlib.
Technology Stack: Hadoop, HDFS, Python, R, Tableau, Machine Learning (Logistic regression/ Random
Forests/ KNN/ K-Means Clustering/ Hierarchical Clustering/ Ensemble methods/ Collaborative filtering), JIRA,
GitHub, Agile/ SCRUM, GCP
Global Logic, Hyderabad, India Dec 2019 to Jan 2022
Machine Learning Engineer
Roles & Responsibilities:
Designed and deployed scalable machine learning models (e.g., XGBoost, CNNs, transformers) for
classification and regression tasks, improving prediction accuracy by 25%.
Built end-to-end deep learning pipelines using TensorFlow and PyTorch for NLP and computer vision
applications, reducing manual processing time by 40%.
Developed custom recommendation systems using collaborative filtering and neural networks, increasing user
engagement by 18%.
Communicated and coordinated with the end client to collect data and performed ETL to define a uniform
standard format. Queried and retrieved data from Oracle database servers to obtain the dataset.
In the pre-processing phase, used Pandas to remove or replace all missing data and balanced the dataset
by over-sampling the minority label class and under-sampling the majority label class.
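A minimal pandas sketch of the class-balancing step described above (toy frame; in practice a library such as imbalanced-learn would typically be used):

```python
import pandas as pd

# Toy imbalanced dataset: 8 negatives, 2 positives
df = pd.DataFrame({"x": range(10), "label": [0] * 8 + [1] * 2})

minority = df[df["label"] == 1]
majority = df[df["label"] == 0]

# Over-sample the minority class (with replacement) up to the majority size
oversampled = pd.concat([
    majority,
    minority.sample(len(majority), replace=True, random_state=0),
])

# Under-sample the majority class down to the minority size
undersampled = pd.concat([
    majority.sample(len(minority), random_state=0),
    minority,
])

print(oversampled["label"].value_counts().to_dict())
print(undersampled["label"].value_counts().to_dict())
```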
Used PCA and other feature engineering, feature scaling, and Scikit-learn pre-processing techniques to reduce
the high-dimensional data built from the entire patient visit history, proprietary comorbidity flags, and
comorbidity scoring from over 12 million EMR and claims records.
Experimented with predictive models including Logistic Regression, Support Vector Machine (SVM), Gradient
Boosting and Random Forest using Python Scikit-learn to predict whether a patient might be readmitted.
Designed and implemented Cross-validation and statistical tests including ANOVA, Chi-square test to verify
the models’ significance.
Implemented, tuned and tested the model on AWS EC2 with the best performing algorithm and parameters.
Set up a data pre-processing pipeline to guarantee consistency between the training data and incoming
data.
Deployed the model on AWS Lambda. Collected the feedback after deployment, retrained the model and
tweaked the parameters to improve the performance.
Designed, developed and maintained daily and monthly summary, trending and benchmark reports in Tableau
Desktop.
Used Agile methodology and the Scrum process for project development.
Technology Stack: AWS (EC2, S3, Lambda), Oracle DB, Linux, Python (Scikit-learn/NumPy/Pandas/Matplotlib),
Machine Learning (Logistic Regression/Support Vector Machine/Gradient Boosting/Random Forest), Tableau.
Higate Infosystems Pvt, Hyderabad, India Feb 2012 to Nov 2019
Data Scientist
Responsibilities:
Worked on the project from gathering requirements to developing the entire application.
Worked in the Anaconda Python environment: created, activated, and programmed in Anaconda environments.
Wrote programs for performance calculations using NumPy and SQLAlchemy.
Wrote Python routines to log into websites and fetch data for selected options.
Used the Python modules urllib, urllib2, and Requests for web crawling.
Extensive experience in text analytics, developing statistical machine learning and data mining solutions to
various business problems, and generating data visualizations using R, Python, and Tableau.
Involved in the development of web services using SOAP for sending and receiving data from the external
interface in XML format, using packages such as Beautiful Soup for data parsing.
Worked on the development of SQL queries and stored procedures on MySQL.
Analyzed the code thoroughly and reduced code redundancy to an optimal level.
Designed and built a text classification application using different text classification models.
Used Jira for defect tracking and project management.
Wrote and read data in CSV and Excel file formats.
Involved in Sprint planning sessions and participated in the daily Agile SCRUM meetings.
Conducted daily scrums as part of the Scrum Master role.
Developed the project in Linux environment.
Worked on resulting reports of the application.
Performed QA testing on the application.
Held meetings with the client and delivered the entire project with limited client assistance.
Environment: Python, Anaconda, Spyder (IDE), Windows 7, Teradata, Requests, urllib, urllib2, Beautiful Soup,
Tableau, Python libraries such as NumPy, SQLAlchemy, MySQL.