Anik Manik
Data Scientist at Cigniti Technologies, Hyderabad.
Ex-Associate Systems Engineer (Analytics) at IBM, Bangalore.
______________________________________________________________________________________
CONTACT
Email: [email protected]
Phone: (+91) 9477672426
Address: Bengaluru, Karnataka - 560045
DOB: 27/07/1994
GitHub: https://github.com/anikmanik04
Medium: https://medium.com/@iamanik4
LinkedIn: https://www.linkedin.com/in/anikmanik04/
PROFILE SUMMARY
• Experienced IT professional with 6+ years of overall experience in Analytics and Data Science.
• Currently working as a Data Scientist at Cigniti Technologies with more than 2.6 years of experience.
• Formerly worked with IBM as a Data Analyst in the Health Care Insurance domain.
• Good understanding of Statistics, Data Analytics, Machine Learning and Deep Learning algorithms.
• Experienced in solving Time Series Forecasting problems using advanced statistical models, machine
learning algorithms, and data visualization techniques to deliver accurate predictions and
actionable insights.
• Experienced in building and deploying Machine Learning models on cloud platforms such as AWS
and Azure, using the Flask and Django frameworks for seamless integration and scalability.
• Project experience in solving an Insurance Amount prediction regression problem.
• Worked on personal projects such as Text Classification using Natural Language Processing (NLP)
and Healthcare Image Segmentation and Classification (Computer Vision) using neural
networks.
• Worked on research projects such as Multivariate Data Drift detection using Dimensionality
Reduction and Reconstruction.
• Passionate about learning Generative AI - Prompt Engineering for LLMs.
SKILLS
• Python, Data Cleaning, Exploratory Data Analysis, Data Modelling, Machine Learning, Deep
Learning, good knowledge of Probability and Statistics.
• Hands-on experience with scikit-learn, SciPy, NumPy, Pandas, Gensim, XGBoost, TensorFlow,
Keras, OpenCV etc.
• Hands-on experience with Machine Learning algorithms such as Logistic Regression, Naive
Bayes, Decision Trees, Random Forest, PCA, Linear Regression, ARIMA, SARIMA, SVMs, Bagging,
Boosting, KNN, K-Means Clustering, CNN.
• Working knowledge of the Flask framework and experience with version control using Git.
• Working knowledge of data visualization tools like Tableau and Power BI.
• Good understanding of relational databases and working knowledge of data extraction from
Microsoft SQL Server.
• Proficient in Excel and PowerPoint.
• Self-directed, self-disciplined and a team worker with excellent communication skills.
PROFESSIONAL EXPERIENCE
Data Scientist at RoundSqr (Part of Cigniti Technologies), Hyderabad
November 2021 to Present
• Worked closely with stakeholders on a Client-wise Aggregated Network-Care-Use Time-Series Forecasting problem for one of the largest
childcare and eldercare providers in the US.
• Worked on a challenging project to build a deep learning solution for Digital Fingerprinting of Images of HVAC coils for a leading climate
control solution provider in the US.
• Worked on Multivariate Data Drift detection using dimensionality reduction and reconstruction techniques.
• Built a QA generation tool for Cigniti using the OpenAI (ChatGPT) API.
Associate Systems Engineer (Analytics) at IBM, Bangalore
May 2018 - November 2021
• Worked on data collection and data analysis for a leading US-based global Healthcare Insurance provider.
• Also worked on POC projects such as a Health Insurance Premium Prediction regression problem and a Demand Forecast problem for the
same client.
ON-THE-JOB PROJECTS
Project Name: Build a QA generation tool from PDF files using Retrieval Augmented Generation (RAG).
Summary: Generate assessment questions for a given category and sub-category from PDFs using the OpenAI API, and find answers to these
questions.
Role:
• Collect the topic and sub-topic and feed them to the OpenAI API for question generation.
• Feed those same questions back for answer generation.
• Convert PDF files to a list of strings, preprocess the text, and convert it to Document objects.
• Convert the documents to vectors using OpenAI Embeddings and store them in a FAISS vector database.
• Perform similarity search and augment the prompt with the search results before feeding it to the LLM generation model (GPT-3.5-Turbo).
• Build a Streamlit app and deploy it on AWS.
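The retrieve-then-augment step in this role can be illustrated with a minimal, dependency-free sketch. It is not the project code: the bag-of-words `embed` function stands in for OpenAI Embeddings, the in-memory list stands in for the FAISS index, and the sample chunks, the function names and the prompt wording are all hypothetical.

```python
import math
from collections import Counter


def embed(text):
    """Toy stand-in for an embedding model: a bag-of-words count vector."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[token] * b[token] for token in a if token in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query, chunks, k=2):
    """Return the k chunks most similar to the query (the similarity search step)."""
    query_vec = embed(query)
    ranked = sorted(chunks, key=lambda c: cosine(query_vec, embed(c)), reverse=True)
    return ranked[:k]


def build_prompt(question, context_chunks):
    """Augment the question with the retrieved context before calling an LLM."""
    context = "\n".join(f"- {chunk}" for chunk in context_chunks)
    return (
        "Answer using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}"
    )


# Hypothetical chunks, standing in for preprocessed PDF text.
chunks = [
    "Regression testing verifies that existing features still work after a change.",
    "Load testing measures system behaviour under expected peak traffic.",
    "Unit testing checks individual functions in isolation.",
]
question = "What does regression testing verify?"
top_chunks = retrieve(question, chunks, k=1)
prompt = build_prompt(question, top_chunks)
```

The resulting `prompt` string is what would be sent to the generation model; swapping `embed` and `retrieve` for real embeddings and a vector index leaves `build_prompt` unchanged.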
Project Name: Client-wise Aggregated Network-Care-Use Time Series Forecasting.
Summary: Forecast Network Care Use (Demand) for the next year at a client-wise aggregated level for three types of clients: Ramped Up,
Ramping Up and New Clients.
Role:
• Understand the Entity Relationship diagram, extract data from Microsoft SQL Server, and clean and preprocess the data.
• Outlier detection, data imputation and feature engineering.
• Test the series for stationarity using the ADF test, convert it into a stationary series if needed, and identify any seasonality.
• Build univariate models like Simple Moving Average, Exponential Moving Average, ARIMA, SARIMA etc.
• Build multivariate models like Linear Regression, Random Forest, XGBoost, LSTM etc.
• Choose an error metric and perform error analysis on the forecasts from the above models.
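As an illustration of the simplest univariate baselines and the error-metric step listed above, here is a minimal sketch using only the standard library. The demand values, the window size, the choice of MAPE as the metric and the `walk_forward` helper are assumptions made for the example, not details of the project.

```python
def simple_moving_average_forecast(series, window):
    """Forecast the next value as the mean of the last `window` observations."""
    if window <= 0 or window > len(series):
        raise ValueError("window must be between 1 and len(series)")
    return sum(series[-window:]) / window


def exponential_moving_average_forecast(series, alpha):
    """Forecast the next value with simple exponential smoothing."""
    if not 0 < alpha <= 1:
        raise ValueError("alpha must be in (0, 1]")
    level = series[0]
    for value in series[1:]:
        level = alpha * value + (1 - alpha) * level
    return level


def walk_forward(series, n_test, forecaster):
    """One-step-ahead forecasts for the last n_test points, using only past data."""
    start = len(series) - n_test
    return [forecaster(series[:i]) for i in range(start, len(series))]


def mape(actual, predicted):
    """Mean absolute percentage error (actual values must be non-zero)."""
    errors = [abs((a - p) / a) for a, p in zip(actual, predicted)]
    return 100 * sum(errors) / len(errors)


# Hypothetical monthly demand for one client.
demand = [100, 110, 120, 130, 140, 150]
sma_preds = walk_forward(demand, 2, lambda h: simple_moving_average_forecast(h, 2))
sma_error = mape(demand[-2:], sma_preds)
ema_next = exponential_moving_average_forecast(demand[:4], alpha=0.5)
```

Evaluating every candidate model with the same walk-forward split keeps the error comparison fair, because each forecast only sees data that precedes the point being predicted.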
Project Name: POC on Health Insurance Premium Prediction Regression problem.
Summary: Predict Health Insurance Premium based on different features and insurance types such as LTD, STD, Vision, Dental etc.
Role:
• Understand the Entity Relationship diagram and extract data from the SQL database.
• Exploratory Data Analysis, Outlier detection, data imputation and feature engineering.
• Apply different machine learning regression models like linear regression, decision tree regressor, random forest regressor etc. and
choose the best model based on the error metric.
• Build the data pipeline and perform error analysis.
Project Name: Digital Fingerprinting of Images of HVAC Coils to backtrack them and minimize production failure.
Summary: Given images of moving HVAC coils captured at different stages of the manufacturing process on the assembly line, backtrack the journey
of the coils to identify the batch of defective coils that failed the ‘leak test’, using techniques like Motion Detection, Object Detection, Object Tracking, Feature
Descriptors and Image Matching, with model deployment using Django.
Role:
• Collect image data from FLIR IP camera video feeds located at different sections of the manufacturing process and annotate it.
• Detect motion using background subtraction, then mark and crop the area of interest (object detection) using Faster R-CNN
with ResNet-50 as the backbone.
• Uniquely identify the coils as they move along the conveyor belt to remove duplicates using the dlib correlation tracker (object tracking).
• Generate key points/feature descriptors using ORB and match them against images from the previous stage using the Hamming distance.
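The descriptor-matching step can be sketched without any imaging library. ORB produces binary descriptors, so matching reduces to counting differing bits. The sketch below is a plain brute-force nearest-neighbour matcher over byte strings, not the project code; the 2-byte descriptors (real ORB descriptors are 32 bytes), the function names and the `max_distance` threshold are illustrative assumptions.

```python
def hamming_distance(a: bytes, b: bytes) -> int:
    """Number of differing bits between two equal-length binary descriptors."""
    if len(a) != len(b):
        raise ValueError("descriptors must have the same length")
    return sum(bin(x ^ y).count("1") for x, y in zip(a, b))


def match_descriptors(query, train, max_distance):
    """Brute-force matching: for each query descriptor, find the closest
    train descriptor and keep the pair if it is within max_distance.

    Returns a list of (query_index, train_index, distance) tuples.
    """
    matches = []
    if not train:
        return matches
    for qi, q in enumerate(query):
        ti, dist = min(
            ((i, hamming_distance(q, t)) for i, t in enumerate(train)),
            key=lambda pair: pair[1],
        )
        if dist <= max_distance:
            matches.append((qi, ti, dist))
    return matches


# Hypothetical 2-byte descriptors from the current stage (query)
# and the previous stage (train).
current_stage = [bytes([0b11110000, 0b00001111]), bytes([0b10101010, 0b10101010])]
previous_stage = [
    bytes([0b10101010, 0b10101011]),
    bytes([0b11110000, 0b00001110]),
    bytes([0b11111111, 0b11111111]),
]
matches = match_descriptors(current_stage, previous_stage, max_distance=2)
```

An image pair with many low-distance matches is a candidate for being the same coil seen at two stages; the threshold would need tuning on annotated data.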
Project Name: Multivariate Data Drift detection using Dimensionality Reduction and Reconstruction.
Summary: Detect multivariate drift in multi-dimensional data (that cannot be detected with univariate approaches) using dimensionality reduction
and reconstruction techniques like PCA.
Role:
• Data Generation and Data Collection.
• Univariate Drift detection followed by Multivariate Drift detection using dimensionality reduction and reconstruction techniques (PCA).
• Selection of Error metric and Error Calculation.
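A minimal sketch of the reconstruction idea follows, assuming a single principal component fitted by power iteration and mean Euclidean reconstruction error as the metric. The data points, the function names and the factor-of-3 threshold are hypothetical and chosen only to show a drift that leaves each feature's marginal distribution unchanged.

```python
import math


def fit_first_component(data, iterations=100):
    """Return (mean, unit vector) of the first principal component.

    Uses power iteration on the sample covariance matrix; assumes the start
    vector of ones is not orthogonal to the leading component.
    """
    n, d = len(data), len(data[0])
    mean = [sum(row[j] for row in data) / n for j in range(d)]
    centered = [[row[j] - mean[j] for j in range(d)] for row in data]
    cov = [
        [sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(d)]
        for i in range(d)
    ]
    vec = [1.0] * d
    for _ in range(iterations):
        vec = [sum(cov[i][j] * vec[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(v * v for v in vec))
        if norm == 0:
            raise ValueError("power iteration collapsed; try another start vector")
        vec = [v / norm for v in vec]
    return mean, vec


def reconstruction_error(data, mean, component):
    """Mean distance between each point and its projection onto the component."""
    total = 0.0
    for row in data:
        centered = [x - m for x, m in zip(row, mean)]
        score = sum(c * v for c, v in zip(centered, component))
        residual = [c - score * v for c, v in zip(centered, component)]
        total += math.sqrt(sum(r * r for r in residual))
    return total / len(data)


# Reference data: the two features move together.
reference = [(-2, -2.1), (-1, -0.9), (0, 0.1), (1, 0.9), (2, 2.1)]
# New data: same values per feature, but the relationship is reversed.
new_batch = [(-2, 2.1), (-1, 0.9), (0, 0.1), (1, -0.9), (2, -2.1)]

mean, component = fit_first_component(reference)
reference_error = reconstruction_error(reference, mean, component)
new_error = reconstruction_error(new_batch, mean, component)
drift_detected = new_error > 3 * reference_error  # hypothetical threshold
```

Each feature in `new_batch` takes exactly the same set of values as in `reference`, so a per-feature comparison sees nothing, while the reconstruction error rises sharply because the correlation structure has changed.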
Project Name: Data Analysis and Visualization of customer data for Healthcare Insurance provider client.
Summary: Collect, clean and preprocess data. Summarize the story the data tells and present it in plots and charts.
Role:
• Understand the ER diagram and collect data from the SQL database.
• Clean and preprocess the data using Python. Prepare charts using the Matplotlib and Seaborn Python libraries.
• Summarize the data, prepare a report, and present the insights to the client.
CASE STUDIES AND PROJECTS
Project Name: Healthcare Provider Fraud Detection Analysis using Machine Learning.
Summary: Build a binary classification model based on the claims filed by the provider, along with Inpatient data, Outpatient data and Beneficiary
details, to predict Healthcare Provider Fraud.
GitHub Link: https://github.com/anikmanik04/healthcare-provider-fraud-detection
Medium Blog Link: https://bit.ly/3vTB0DE
Role:
• Understand the Entity Relationship Diagram, clean and preprocess the data.
• Plot different charts and draw conclusions from them.
• Understand the business problem, choose a performance metric, and formulate the ML solution.
• Train binary classification models from scikit-learn and build a custom bagging model.
• Error analysis, pipeline building and deployment using Flask on AWS.
Project Name: Detection and Semantic Segmentation of Pneumothorax Disease from X-Ray Images using Deep Learning.
Summary: Build a binary image classification model to detect if the x-ray image contains pneumothorax. If yes, then pass it through a semantic
segmentation model to identify and mark the affected part.
GitHub Link: https://github.com/anikmanik04/SIIM-ACR-Pneumothorax-Segmentation
Medium Blog Link: https://bit.ly/4cPYHxn
Role:
• Collect and extract information from the X-ray (DICOM format) images. Perform EDA on the extracted data.
• Build a binary classifier using transfer learning with the VGG19 architecture and pre-trained ImageNet weights to detect whether the
image contains pneumothorax.
• Build a semantic segmentation model on the positive pneumothorax images and their corresponding masks, using a U-Net
architecture whose encoder is replaced with a DenseNet121 backbone pre-trained on ImageNet while the decoder is kept
unchanged.
• Perform error analysis and build the final data pipeline.
Project Name: Amazon Review Classification - NLP.
Summary: Build a binary classification model based on customer reviews from the Amazon e-commerce website.
GitHub Link: https://github.com/anikmanik04/amazon-review-classification-with-BERT
Role:
• Collect, clean, and preprocess the text data.
• Plot different charts and draw conclusions from them.
• Train binary classifier ML and DL models.
• Create a BERT model using TensorFlow and extract embedding vectors from it.
• Train a neural network on the embeddings to classify the reviews.
• Perform error analysis and build a pipeline for the trained model.
ACADEMIC DETAILS AND CERTIFICATIONS
• Applied AI Machine Learning course and certificate of completion, Hyderabad (April 2020 - March 2021)
• B.Tech (2017) in EE, Jalpaiguri Government Engineering College, Jalpaiguri, West Bengal (DGPA: 8.11)
• WBCHSE (12th) 2011, Indas High School, Bankura, West Bengal (Score: 83.60%)
• WBBSE (10th) 2009, Patit High School, Bankura, West Bengal (Score: 84.62%)
• PCEP - Certified Entry Level Python Programmer from Python Institute – October 2020
• Python for Data Science from IBM – November 2020
• Data Analysis Using Python from IBM – November 2020
• Data Visualization Using Python from IBM – November 2020
• Machine Learning with Python (Level 1) from IBM – November 2020
• Applied Data Science with Python (Level 2) from IBM – November 2020
• Microsoft Certified Azure Fundamentals – June 2021
BLOGS
Topic: Credit Scoring: Model Validation and Best Practices
Link: Credit Scoring: Model Validation and Best Practices | Anik Manik | RoundSqr (Part of Cigniti)
Cigniti: Credit Scoring Model Validation and Best Practices
Topic: Healthcare Provider Fraud Detection Analysis using Machine Learning.
Link: https://bit.ly/3vTB0DE
Topic: Detection and Semantic Segmentation of Pneumothorax Disease from X-Ray Images using Deep Learning.
Link: https://bit.ly/4cPYHxn
HOBBIES AND INTERESTS
• Making toys and gadgets from recycled materials.
• Swimming