
DS ML Python

This document is a refresher guide for Data Science and Machine Learning interviews, focusing on essential Python concepts. It covers topics such as Python fundamentals, data structures, object-oriented programming, advanced topics, and project management. The material is designed for individuals with a foundational knowledge in the field, providing key concepts and practical examples relevant for interviews.

Uploaded by

Albeniz
Copyright
© © All Rights Reserved

Python: Data Science and ML Refresher

Shubhankar Agrawal

Abstract
This document serves as a quick refresher for Data Science and Machine Learning interviews. It covers the Python concepts required for Machine Learning and Data Science. It assumes the reader has foundational knowledge of the field at the level of tertiary education. This PDF contains material for revising the key concepts that are tested in interviews.

Contents
1 Python Fundamentals
1.1 Variables
1.2 Data Types
Primitives • Non-Primitives
1.3 Concepts
Typing • Object Reference • Evaluation • Garbage Collection • Styling
2 Data Structures and Algorithms
2.1 Data Structures
2.2 Variants
Tuple • List • Set • Dictionary
2.3 Algorithms
3 Object Oriented Programming
3.1 Dunder Methods
3.2 Inheritance
3.3 Method Decorators
4 Advanced Topics
4.1 Pythonic Functionalities
List Comprehension • Lambda • Context Manager • Decorators
4.2 Control Flow
Iterating • Generating
4.3 Concurrency
4.4 Type Hinting
5 Python Project
5.1 Dependency Management
5.2 Relevant Files
5.3 Testing
6 Libraries
7 Contact me

1. Python Fundamentals
Python is an interpreted language: source code is compiled to bytecode, which the interpreter (CPython by default) then executes.

1.1. Variables
Variables are references to objects. The id function returns an object's identity.
CPython caches the integers in the range [-5, 256] at startup, so every occurrence of such a value refers to the same object.

Table 1. Object Headers (Overhead)
Field       Bytes  Description         Notes
ob_refcnt   8      Reference count
ob_type     8      Pointer to class
ob_size     8      Number of elements  Variable-length objects only

Mutability: whether an object can be changed in place after creation. Built-in immutable types (int, str, frozenset, and tuples whose elements are hashable) are hashable; built-in mutable containers (list, set, dict) are not.

Table 2. Primitive Data Types
(Bytes is the payload size; the object header size is in parentheses.)
Type     Bytes (Header)  Example  Reference
int      4 (24)          42       Variable size
float    8 (16)          12.46    C double
bool     4 (24)          True     Integer
complex  16 (16)         1+2j     Two floats
bytes    n (24)          b'01'    Variable size

1.2. Data Types
1.2.1. Primitives
Floating Point Precision
Python's float is an IEEE 754 double-precision number.

Table 3. Floating Point Precision
Type              Bits  Sign  Exponent  Significand
Single            32    1     8         23
Double (default)  64    1     11        52

1.2.2. Non-Primitives
Most of these are collection-based objects, which must be reallocated when their contents outgrow the allocated memory.
Tuple: Immutable.
List: Dynamic array of references to objects; indices map to positions in that array. Mutable.
Set: Hash set. Mutable.
Dictionary: Hash map. Mutable.

Table 4. Non-Primitive Data Types
Type       Bytes (Header)  Example  Additional (per element)
bytearray  32 (24)         b''      1
tuple      16 (24)         (a, b)   8
list       32 (24)         [a, b]   8
set        192 (24)        {a, b}   Hash table
dict       208 (24)        {a: b}   16 + Hash table

System Configuration
All experiments have been run on:
System: Windows 64-bit
Python: 3.10

1.3. Concepts
1.3.1. Typing
Strong vs Weak: Python is strongly typed; types matter during operations (a str cannot be added to an int without an explicit conversion).
Static vs Dynamic: Python is dynamically typed; types are checked at runtime, and a name bound to a str can later be rebound to an int.
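Both typing properties can be observed in a few lines. This is a minimal sketch; the helper names strong_typing_demo and dynamic_typing_demo are illustrative and do not come from the source. Mixing str and int raises TypeError, while one name can be rebound to objects of different types.

```python
def strong_typing_demo() -> str:
    """Try to add a str and an int; return the name of the exception raised."""
    try:
        "1" + 1  # Python never coerces str and int implicitly
    except TypeError as exc:
        return type(exc).__name__
    return "no error"


def dynamic_typing_demo() -> list[str]:
    """Rebind a single name to objects of different types and record each type."""
    seen = []
    value = "forty-two"          # the name refers to a str object
    seen.append(type(value).__name__)
    value = 42                   # the same name now refers to an int object
    seen.append(type(value).__name__)
    return seen


print(strong_typing_demo())   # TypeError
print(dynamic_typing_demo())  # ['str', 'int']
print(int("1") + 1)           # 2: an explicit conversion makes the addition valid
```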

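Table 3 explains why decimal fractions such as 0.1 can only be approximated by Python's float. The following minimal sketch uses only the standard library and assumes exact equality is not actually required. It compares floats with a tolerance through math.isclose and switches to decimal.Decimal when exact base-10 arithmetic matters.

```python
import math
import sys
from decimal import Decimal

# Python's float is a C double: 53 bits of precision (52 stored + 1 implicit bit).
print(sys.float_info.mant_dig)       # 53

total = 0.1 + 0.2
print(total == 0.3)                  # False: neither value is exactly representable
print(total)                         # 0.30000000000000004
print(math.isclose(total, 0.3))      # True: compares with a relative tolerance

# Decimals built from strings keep exact base-10 values.
print(Decimal("0.1") + Decimal("0.2") == Decimal("0.3"))  # True
```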

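You can check the object-reference points from Section 1.1 with id() and the is operator. This sketch describes CPython behaviour only, because the small-integer cache is an implementation detail rather than a language guarantee. The values are built with int() at runtime so the compiler cannot merge identical constants. same_object is a hypothetical helper.

```python
def same_object(text: str) -> bool:
    """Build two ints from the same string at runtime and compare identity."""
    a = int(text)
    b = int(text)
    return a is b  # same as id(a) == id(b) while both objects are alive


print(same_object("256"))  # True in CPython: 256 is in the cached range [-5, 256]
print(same_object("257"))  # False in CPython: each int() call creates a new object

x = [1, 2]
y = x                      # a second reference to the same list, not a copy
y.append(3)
print(x, id(x) == id(y))   # [1, 2, 3] True
```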
1.3.2. Object Reference
Python passes arguments by object reference ("call by sharing"). If a function changes a mutable argument in place, the caller sees the change, which behaves like call by reference. An immutable argument cannot be changed in place, so rebinding it inside the function does not affect the caller, which behaves like call by value.
Use the nonlocal keyword to assign to a variable in an enclosing function's scope, and the global keyword to assign to a module-level (global) variable.

1.3.3. Evaluation
Eager: Evaluate everything immediately (e.g. a list comprehension builds the whole list).
Lazy: Evaluate only what is necessary, when it is requested (e.g. generators, range, map).

1.3.4. Garbage Collection
CPython destroys an object when its reference count drops to 0. Only strong references are counted; weak references (weakref module) do not keep an object alive. A cyclic garbage collector (gc module) also reclaims objects that only reference each other.

1.3.5. Styling
Python Enhancement Proposal 8 (PEP 8) is a comprehensive style guide for Python code.
Casing:
• camelCase: only where it is already the prevailing style
• PascalCase: class names
• snake_case: function, variable and module names

2. Data Structures and Algorithms
2.1. Data Structures
Python provides tuples, lists, sets and dictionaries as built-in data structures. Other structures are available from the standard library and third-party packages:
heapq: Min heap with O(log n) push and pop.
bintrees.FastRBTree: Red-black tree.
networkx.Graph: Graph with nodes and edges.
Thread-safe queue classes:
• queue.Queue
• queue.LifoQueue
• queue.PriorityQueue

Table 5. Time Complexity - Big O (set and dictionary costs are average case)
Function         Tuple  List  Set  Dictionary
x in s           n      n     1    1
Get item         1      1     -    1
Append/add item  -      1     1    1
Delete item      -      n     1    1

2.2. Variants
2.2.1. Tuple
Variants of tuples with added functionality:
collections.namedtuple: Access elements by name.
dataclasses.dataclass: Class decorator. Mutable by default (frozen=True makes instances immutable).
numpy.recarray: Variant of ndarray with named fields.

2.2.2. List
Variants of lists with added functionality:
collections.deque: Double-ended queue implemented as a doubly linked list (of fixed-size blocks in CPython), with O(1) appends and pops at both ends.
numpy.array: Array with a single data type. Memory efficient.
numpy.ndarray: Multidimensional array with vectorized operations.
pandas.Series: List with a custom index mapping to each element.

2.2.3. Set
Variants of sets with added functionality:
frozenset: Immutable set.
blist.sortedset: Maintains elements in sorted order with a tree.

2.2.4. Dictionary
Variants of dictionaries with added functionality:
collections.OrderedDict: Maintains insertion order (a plain dict also preserves insertion order since Python 3.7).
collections.defaultdict: Returns a default value for missing keys.
collections.Counter: Frequency dictionary.
collections.ChainMap: Groups several dictionaries into a single view; lookups search them in order.
frozendict: Immutable dict (third-party package).
blist.sorteddict: Maintains keys in sorted order with a tree.
pandas.DataFrame: Two-dimensional tabular data.
The collections module provides variants of dicts, tuples and queues.

2.3. Algorithms
Common algorithms to know, with their time complexities:

Table 6. Sorting
Name           Big-O Time                       Space
Tim (default)  n log n                          n
Bubble         n²                               1
Insertion      n²                               1
Selection      n²                               1
Merge          n log n                          n
Quick          n² worst case (n log n average)  log n
Heap           n log n                          1
Count          n + k                            n + k
Radix          n · k                            n + k

Table 7. Graph
Name              Big-O Time     Purpose
DFS               V + E          Traverse graph
BFS               V + E          Traverse graph
Dijkstra          (V + E) log V  Shortest paths S -> all nodes (non-negative weights)
Bellman-Ford      V · E          Shortest paths (negative weights allowed)
Floyd-Warshall    V³             Shortest paths (all pairs of vertices)
Kruskal           E log E        Minimum spanning tree
Prim              (V + E) log V  MST grown from an arbitrary node
Topological Sort  V + E          Order DAG so all edges point forward
Tarjan            V + E          Strongly connected components
A* Search         E log V        Shortest path guided by heuristics
Union-Find        α(V)           Merge connected components (amortized per operation)

Binary Search (log n) is more efficient than linear search (n), but requires sorted input.

Dynamic Programming
• Fibonacci
• Knapsack (0-1, Repeated, Double Knapsack)
• Longest Common Subsequence
• Longest Increasing Subsequence
• Coin Change
• Edit Distance (Levenshtein)

Other algorithms to be familiar with:
• Huffman Encoding
• N Queens
• Non Overlapping Activities
• Subset Sum
• Trie Build and Search
• Fast Exponentiation
• Sliding Window contiguous subsequence problems
• Reservoir Sampling
• Bit manipulation (AND &, OR |, XOR ^)
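The "log n versus n" claim for binary search assumes sorted input. Below is a minimal iterative binary search, cross-checked against the standard library's bisect.bisect_left. The function names binary_search and bisect_search are illustrative.

```python
from bisect import bisect_left


def binary_search(items: list[int], target: int) -> int:
    """Return an index of target in the sorted list items, or -1 if absent.

    Each iteration halves the search interval, so at most O(log n) steps.
    """
    lo, hi = 0, len(items) - 1
    while lo <= hi:
        mid = (lo + hi) // 2
        if items[mid] == target:
            return mid
        if items[mid] < target:
            lo = mid + 1   # target can only be in the right half
        else:
            hi = mid - 1   # target can only be in the left half
    return -1


def bisect_search(items: list[int], target: int) -> int:
    """Same contract, using the standard library's bisect_left."""
    i = bisect_left(items, target)  # leftmost position where target would go
    return i if i < len(items) and items[i] == target else -1


data = [2, 3, 5, 7, 11, 13]
print(binary_search(data, 7), bisect_search(data, 7))  # 3 3
print(binary_search(data, 4), bisect_search(data, 4))  # -1 -1
```

With duplicate values, bisect_left always returns the leftmost match, while the hand-written loop may return any matching index.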

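Dijkstra's algorithm from Table 7 is usually built on the heapq min-heap listed in Section 2.1. This minimal sketch assumes the graph is a plain dict that maps each node to a list of (neighbour, weight) pairs with non-negative weights. That representation is an assumption; the source does not specify one. heapq has no decrease-key operation, so the sketch pushes duplicate entries and skips stale ones when they are popped.

```python
import heapq


def dijkstra(graph: dict[str, list[tuple[str, float]]], source: str) -> dict[str, float]:
    """Shortest distances from source to every reachable node.

    A binary heap of (distance, node) pairs gives O((V + E) log V).
    """
    dist = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        d, node = heapq.heappop(heap)          # closest node not yet settled
        if d > dist.get(node, float("inf")):
            continue                           # stale entry: a shorter path was found
        for neighbour, weight in graph.get(node, []):
            nd = d + weight
            if nd < dist.get(neighbour, float("inf")):
                dist[neighbour] = nd
                heapq.heappush(heap, (nd, neighbour))
    return dist


graph = {
    "A": [("B", 1), ("C", 4)],
    "B": [("C", 2), ("D", 5)],
    "C": [("D", 1)],
    "D": [],
}
print(dijkstra(graph, "A"))  # {'A': 0.0, 'B': 1.0, 'C': 3.0, 'D': 4.0}
```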

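As one worked instance from the dynamic programming list, here is a bottom-up Edit Distance (Levenshtein) with unit costs for insertion, deletion and substitution. Keeping only the previous row of the table is one common space optimization, not the only formulation.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance between a and b in O(len(a) * len(b)) time.

    prev[j] holds the distance between the current prefix of a and b[:j];
    only one previous row is kept, so extra space is O(len(b)).
    """
    prev = list(range(len(b) + 1))         # distance from "" to b[:j] is j
    for i, ca in enumerate(a, start=1):
        curr = [i] + [0] * len(b)          # distance from a[:i] to "" is i
        for j, cb in enumerate(b, start=1):
            curr[j] = min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free if characters match)
            )
        prev = curr
    return prev[-1]


print(edit_distance("kitten", "sitting"))  # 3
print(edit_distance("", "abc"))            # 3
```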
3. Object Oriented Programming


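The following compact sketch combines the pieces this section summarizes: an abstract base class, inheritance with super(), the __repr__, __eq__ and __hash__ dunder methods, and @classmethod, @staticmethod and @abstractmethod. The Shape and Square classes are hypothetical examples, not taken from the source.

```python
from abc import ABC, abstractmethod


class Shape(ABC):
    """Abstract base class: subclasses must implement area()."""

    def __init__(self, name: str):
        self.name = name

    @abstractmethod
    def area(self) -> float: ...

    @staticmethod
    def describe_units() -> str:
        # Depends on neither the instance nor the class.
        return "square units"


class Square(Shape):
    def __init__(self, side: float):
        super().__init__("square")  # call the superclass initializer
        self.side = side

    @classmethod
    def unit(cls) -> "Square":
        # Alternative constructor: receives the class, not an instance.
        return cls(1.0)

    def area(self) -> float:
        return self.side ** 2

    def __repr__(self) -> str:
        return f"Square(side={self.side!r})"

    def __eq__(self, other: object) -> bool:
        if not isinstance(other, Square):
            return NotImplemented
        return self.side == other.side

    def __hash__(self) -> int:
        # Objects that compare equal must have equal hashes.
        return hash(self.side)


sq = Square(3.0)
print(repr(sq), sq.area())              # Square(side=3.0) 9.0
print(Square.unit() == Square(1.0))     # True
print(len({Square(2.0), Square(2.0)}))  # 1
print(Shape.describe_units())           # square units
try:
    Shape("generic")                    # abstract classes cannot be instantiated
except TypeError:
    print("Shape is abstract")
```

If a class defines __eq__ without __hash__, its instances become unhashable, which is why the two methods are defined together here.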
Classes form a key structure in Python.

class Person:
    def __init__(self, name: str, age: int):
        self.name = name  # instance variable
        self.age = age  # instance variable

    def greet(self) -> str:
        return f"Hello, my name is {self.name} and I am {self.age} years old."


# Creating an object (instance) of the class
person = Person("Alice", 30)
print(person.greet())  # Output: Hello, my name is Alice and I am 30 years old.

Code 1. Classes

3.1. Dunder Methods
Double underscore ("dunder") methods:
__init__: Initializer, commonly called the constructor.
__repr__: Unambiguous representation, ideally an expression that evaluates back to an equal object.
__str__: Readable string representation, used by print() and str().
__eq__: Check equality (==).
__hash__: Generate a hash, required for use as a set element or dictionary key.

3.2. Inheritance
Inherit attributes and methods from a superclass. Use super() to call superclass methods.
Composition: Hold an instance of another class as an attribute instead of inheriting from it.
Abstract Base Class: Subclass abc.ABC to define a skeleton that subclasses must fill in.

3.3. Method Decorators
Instance methods take self as their first parameter.
@classmethod: Receives the class (cls) instead of an instance; shared amongst instances.
@staticmethod: Receives neither self nor cls; does not depend on the instance.
@abstractmethod: Must be implemented by a subclass.

4. Advanced Topics
4.1. Pythonic Functionalities
Here are some functionalities that help in writing maintainable Python code.

4.1.1. List Comprehension
Concise, clear Pythonic implementation of loops.
Example: [x[:3] for x in items if isinstance(x, str)]

4.1.2. Lambda
Anonymous single-expression functions, typically passed inline, e.g. as the key argument of sorted().
Example: lambda x: x + 1

4.1.3. Context Manager
Safely acquire and release resources, such as opening, operating on and closing files, even if an exception is raised. Uses the with keyword.

4.1.4. Decorators
Add functionality to a function by wrapping it with more code.

# Using the decorator with a custom message
@log_function_calls("Logging")
def say_hello(name):
    print(f"Hello, {name}!")

Code 2. Decorator Usage

import functools


def log_function_calls(msg):
    def decorator(fn):
        @functools.wraps(fn)  # keep fn's __name__ and docstring on the wrapper
        def wrapper(*args, **kwargs):
            print(f"{msg}: {fn.__name__}.")
            return fn(*args, **kwargs)  # call the wrapped function
        return wrapper
    return decorator

Code 3. Decorator

4.2. Control Flow
• if, elif, else
• for
• continue, break
• match case

4.2.1. Iterating
Iterator: Object that lets you traverse a collection one element at a time (via next()).
Iterable: Collection that returns an iterator (via iter()).

iterable = [1, 2, 3]
iterator = iter(iterable)
print(next(iterator))  # 1

Code 4. Iterating

4.2.2. Generating
Generators yield values one at a time, keep their state between calls and are evaluated lazily.

def my_generator():
    yield 1
    yield 2
    yield 3


gen = my_generator()
print(next(gen))  # 1

Code 5. Generating

4.3. Concurrency
In the standard CPython build, the Global Interpreter Lock (GIL) allows only one thread to execute Python bytecode at a time, so multi-threading does not speed up CPU-bound Python code.
Threading: I/O-bound tasks.
Multiprocessing: CPU-bound tasks (each process has its own interpreter and GIL).
Co-Routine: async functions that use the await keyword.
Executor: ThreadPoolExecutor and ProcessPoolExecutor (concurrent.futures).

4.4. Type Hinting
Type hints are not enforced at runtime; a static type checker such as mypy checks them before the code runs.

def greet(name: str, age: int) -> str:
    return f"Hello, {name}. You are {age} years old."


# Example usage
message = greet("Alice", 30)
print(message)

Code 6. Type Hints

5. Python Project
5.1. Dependency Management
Create a virtual environment to isolate a project's dependencies and avoid polluting the global Python installation. Tools include:
• virtualenv
• conda
• mamba
• uv

Use a requirements.txt file to pin the versions of all dependencies.

5.2. Relevant Files
Important files to have:
__init__.py: Marks a directory as a package and can summarize its imports.
conftest.py: pytest configuration for shared fixtures and pre-test setup.

5.3. Testing
unittest: Built-in module.
pytest: Third-party framework with simple assert-based checks.

6. Libraries
Relevant Python libraries for Data Science / Machine Learning roles:

Table 8. Libraries
Type              Name             Purpose
General           json             JSON conversions
                  time             Time profiling
                  datetime         Datetime parsing
DSA               heapq            Heaps
                  collections      Collections
Scientific        scipy            Scientific computing
                  numpy            Numeric arrays
                  pandas           DataFrames
Visualization     matplotlib       MatPlotLib
                  seaborn          Seaborn
Big Data          pyarrow          Arrow
                  polars           Polars
                  duckdb           DuckDB
                  pyhive           Hive
                  impyla           Impala
                  pymongo          MongoDB
                  sqlalchemy       DB toolkit
Machine Learning  scikit-learn     General ML
                  statsmodels      Time series
                  autoarima        ARIMA
                  lightgbm         LightGBM
                  xgboost          XGBoost
Deep Learning     torch            PyTorch
                  tensorflow       TensorFlow
                  keras            Keras
LLMs              langchain        LangChain
                  llamaindex       LlamaIndex
Concurrency       pyspark          PySpark
                  ray              Ray
                  asyncio          Asynchronous I/O
                  threading        Threads
                  multiprocessing  Multi-processing
                  concurrent       Concurrency (executors)
API               django           Django
                  flask            Flask server
                  fastapi          FastAPI server
Typing            pydantic         Data models
                  mypy             Type hint checking

7. Contact me
You can contact me through these methods:
Personal Website - astronights.github.io
Email - [email protected]
LinkedIn - linkedin.com/in/shubhankar-agrawal
