DS ML Python
Shubhankar Agrawal
Abstract
This document is a quick refresher for Data Science and Machine Learning interviews. It covers the Python concepts required for Machine
Learning and Data Science, and assumes the reader has foundational knowledge from tertiary education in the field. It contains
material for revising key concepts that are tested in interviews.
Contents
1 Python Fundamentals
1.1 Variables
1.2 Data Types
    Primitives • Non-Primitives
1.3 Concepts
    Typing • Object Reference • Evaluation • Garbage Collection • Styling

Table 2. Primitive Data Types
Type      Bytes (Header)   Example   Reference
int       4 (24)           42        Variable size
float     8 (16)           12.46     C Double
bool      4 (24)           True      Integer
complex   16 (16)          1+2j      Two Floats
bytes     n (24)           b'01'     Variable size
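The sizes in Table 2 are specific to CPython and can differ between versions and platforms. As a rough check, the sketch below uses the standard sys.getsizeof call to report the total size (header plus payload) of one sample value per type. The sample values are illustrative assumptions for this example, and no exact byte counts are hard-coded because they depend on the interpreter build.

```python
import sys

# Illustrative sample values, one per primitive type (chosen for this sketch).
samples = {
    "int": 42,
    "float": 12.46,
    "bool": True,
    "complex": 1 + 2j,
    "bytes": b"01",
}

# sys.getsizeof returns the total object size in bytes, header included.
sizes = {name: sys.getsizeof(value) for name, value in samples.items()}

for name, size in sizes.items():
    print(f"{name:<8} {size} bytes")
```

Running this on the interpreter you will actually use in an interview setting is a quick way to confirm the numbers rather than memorizing them.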
Python: Data Science and ML Refresher
2.2. Variants
2.2.1. Tuple
Variants of tuples with added functionalities.
collections.namedtuple: Access elements by name.
dataclasses.dataclass: Class decorator. Mutable.
numpy.recarray: Variant of ndarray with named fields.
2.2.2. List
Variants of lists with added functionalities.
collections.deque: Double ended queue implemented as a doubly linked list.
numpy.array: List with single data type. Memory efficient.
numpy.ndarray: Multidimensional array with vectorized operations.
pandas.Series: List with custom index mapping to each element.
2.2.3. Set
Variants of sets with added functionalities.
frozenset: Immutable set.
blist.sortedset: Maintains elements in sorted order with tree.

Binary Search (log 𝑛) more efficient than Linear.
Dynamic Programming
• Fibonacci
• Knapsack (0-1, Repeated, Double Knapsack)
• Longest Common Subsequence
• Longest Increasing Subsequence
• Coin Change
• Edit Distance (Levenshtein)
Other algorithms to be familiar with:
• Huffman Encoding
• N Queens
• Non Overlapping Activities
• Subset sum
• Trie Build and Search
• Fast Exponentiation
• Sliding Window contiguous subsequence problems
• Reservoir Sampling
• Bit manipulation (AND &, OR |, XOR ^)
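As one illustration of the dynamic programming problems listed above, the following is a minimal bottom-up sketch of Coin Change in its "fewest coins" form. The function name min_coins and its signature are assumptions made for this example; other formulations, such as counting the number of ways to make the amount, are equally common.

```python
def min_coins(coins: list[int], amount: int) -> int:
    """Return the fewest coins needed to make `amount`, or -1 if impossible."""
    INF = float("inf")
    # best[a] holds the fewest coins that sum to a; best[0] needs no coins.
    best = [0] + [INF] * amount
    for a in range(1, amount + 1):
        for coin in coins:
            # Extend the best solution for (a - coin) by one more coin.
            if coin <= a and best[a - coin] + 1 < best[a]:
                best[a] = best[a - coin] + 1
    return -1 if best[amount] == INF else best[amount]


print(min_coins([1, 2, 5], 11))  # 3 (5 + 5 + 1)
print(min_coins([2], 3))         # -1
```

The table has one entry per amount and each entry tries every coin, so the running time is proportional to amount times the number of coins.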
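The standard-library variants from sections 2.2.1 to 2.2.3 can be compared side by side. This is only a sketch: the names Point and MutablePoint are hypothetical, and the numpy, pandas, and blist variants are left out so the example runs without third-party packages.

```python
from collections import deque, namedtuple
from dataclasses import dataclass

# namedtuple: tuple whose fields can be read by name; still immutable.
Point = namedtuple("Point", ["x", "y"])
p = Point(x=1, y=2)
print(p.x, p[1])  # 1 2


# dataclass: class decorator that generates __init__ and __repr__; mutable by default.
@dataclass
class MutablePoint:
    x: int
    y: int


mp = MutablePoint(1, 2)
mp.x = 10  # allowed, unlike with a namedtuple

# deque: appends and pops at both ends.
dq = deque([1, 2, 3])
dq.appendleft(0)
dq.append(4)
print(list(dq))  # [0, 1, 2, 3, 4]

# frozenset: immutable, hence hashable and usable as a dict key.
fs = frozenset({1, 2})
lookup = {fs: "pair"}
print(lookup[frozenset({2, 1})])  # pair
```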
Inherit attributes from super-class.
Use the super() method to call super-class methods.
Composition: Have a class instance as an attribute.
Abstract Base Class: ABC to define skeleton.
3.3. Method Decorators
Instance methods take self as their first parameter.
@classmethod: Receives the class (cls). Shared amongst instances.
@staticmethod: Don’t depend on instance.
@abstractmethod: Implemented by subclass.
4. Advanced Topics
4.1. Pythonic Functionalities
Here are some functionalities that help in writing maintainable Python code.
4.1.1. List Comprehension
Concise, clear Pythonic implementation of loops.
Example: [x[:3] for x in items if isinstance(x, str)]
4.1.2. Lambda
Anonymous functions to be used within modules.
Example: lambda x: x+1
4.1.3. Context Manager
Safely open, operate on, and close files. Uses the with keyword.
4.1.4. Decorators
Add additional functionality to a function by wrapping it with more code.
# Using the decorator with a custom message
# (log_function_calls is assumed to be defined beforehand)
@log_function_calls("Logging")
def say_hello(name):
    print(f"Hello, {name}!")
Code 2. Decorator Usage

Yield values one at a time. Maintain state. Lazy evaluation.
def my_generator():
    yield 1
    yield 2
    yield 3

gen = my_generator()
print(next(gen))  # 1
Code 5. Generating
4.3. Concurrency
The Global Interpreter Lock (GIL) prevents threads from executing Python bytecode in parallel in standard CPython builds.
Threading: IO Bound Tasks
Multiprocessing: CPU Bound Tasks
Co-Routine: async functions with await keyword.
Executor: ThreadPool and ProcessPool executors.
4.4. Type Hinting
Type hints enable static checks before runtime with a library like MyPy.
def greet(name: str, age: int) -> str:
    return f"Hello, {name}. You are {age} years old."

# Example usage
message = greet("Alice", 30)
print(message)
Code 6. Type Hints
5. Python Project
5.1. Dependency Management
Create a virtual environment to isolate the dependencies and avoid polluting the global Python installation.
• virtualenv
• conda
• mamba
• uv
Use a requirements.txt to pin all dependency versions.
5.2. Relevant Files
Important files to have.
__init__.py: Mark directory as a package and summarize imports.
conftest.py: Shared pytest fixtures and pre-test setup.
5.3. Testing
unittest: Built-in module.
pytest: Third-party framework with simple assert-based checks.
6. Libraries
Relevant Python libraries for Data Science / Machine Learning roles.
Table 8. Libraries
Type               Name              Purpose
General            json              JSON conversions
                   time              Time profiling
                   datetime          datetime parsing
DSA                heapq             Heaps
                   collections       Collections
Scientific         scipy             Scientific
                   numpy             Numeric
                   pandas            DataFrames
Visualization      matplotlib        MatPlotLib
                   seaborn           Seaborn
Big Data           pyarrow           Arrow
                   polars            Polars
                   duckdb            DuckDB
                   pyhive            Hive
                   impyla            Impala
                   pymongo           MongoDB
                   sqlalchemy        DB Toolkit
Machine Learning   scikit-learn
                   statsmodels       Time Series
                   autoarima         ARIMA
                   lightgbm          LightGBM
                   xgboost           XGBoost
Deep Learning      torch             PyTorch
                   tensorflow        TensorFlow
                   keras             Keras
LLMs               langchain         LangChain
                   llamaindex        LlamaIndex
Concurrency        pyspark           PySpark
                   ray               Ray
                   asyncio           Asynchronous
                   threading         Thread
                   multiprocessing   Multi-Processing
                   concurrent        Concurrency
API                django            Django
                   flask             Flask Server
                   fastapi           FastAPI Server
Typing             pydantic          Models
                   mypy              Typehint Check
7. Contact me
You can contact me through these methods:
Personal Website - astronights.github.io
Email - [email protected]
LinkedIn - linkedin.com/in/shubhankar-agrawal