Comprehensive EDA Code Snippets with Descriptions
Pandas Snippets (30+ Operations)
• Display first 5 rows of DataFrame:
df.head()
• Display last 5 rows of DataFrame:
df.tail()
• Get summary statistics:
df.describe()
• Find the mean of a column:
df['column_name'].mean()
• Find the median of a column:
df['column_name'].median()
• Find the mode of a column:
df['column_name'].mode()[0]
• Calculate the variance of a column:
df['column_name'].var()
• Find the standard deviation of a column:
df['column_name'].std()
• Find the covariance matrix of the numeric columns:
df.cov(numeric_only=True)
• Calculate the correlation matrix of the numeric columns:
df.corr(numeric_only=True)
• Find unique values in a column:
df['column_name'].unique()
• Find value counts in a column:
df['column_name'].value_counts()
• Rename a column:
df.rename(columns={'old_name': 'new_name'}, inplace=True)
• Filter rows based on a condition:
df[df['column_name'] > 10]
• Group by a column and compute the mean of the numeric columns:
df.groupby('column_name').mean(numeric_only=True)
• Create a new column based on an operation:
df['new_col'] = df['col1'] + df['col2']
• Drop rows with missing values:
df.dropna(inplace=True)
• Fill missing values with the column mean:
# Assign the result back; inplace=True on a selected column may not update df
df['column_name'] = df['column_name'].fillna(df['column_name'].mean())
• Filter rows by multiple conditions:
df[(df['col1'] > 10) & (df['col2'] == 'value')]
• Reset index of DataFrame:
df.reset_index(drop=True, inplace=True)
• Sort DataFrame by column:
df.sort_values(by='column_name', ascending=False)
• Check for missing values:
df.isnull().sum()
• Convert column to datetime:
df['column_name'] = pd.to_datetime(df['column_name'])
• Create pivot table:
df.pivot_table(values='col1', index='col2', columns='col3')
• Find duplicates in a column:
df[df.duplicated(['column_name'])]
• Drop duplicates:
df.drop_duplicates(inplace=True)
• Apply function to a column:
df['new_column'] = df['column_name'].apply(lambda x: x * 2)
• Create dummy variables for categorical columns:
pd.get_dummies(df['category_column'], drop_first=True)
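Several of the Pandas snippets above are usually chained during cleaning. The following is a minimal sketch of one such sequence; the DataFrame and its column names (city, sales) are hypothetical toy data, not part of the original snippets.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data; the column names are illustrative only.
df = pd.DataFrame({
    "city": ["A", "B", "A", "B", "A"],
    "sales": [10.0, np.nan, 30.0, 40.0, np.nan],
})

# Count missing values per column before cleaning.
missing_before = df.isnull().sum()

# Fill missing sales with the column mean, assigning the result back.
df["sales"] = df["sales"].fillna(df["sales"].mean())

# Mean sales per city after imputation.
city_means = df.groupby("city")["sales"].mean()

# One-hot encode the categorical column, dropping the first level.
dummies = pd.get_dummies(df["city"], drop_first=True)

print(missing_before)
print(city_means)
print(dummies.columns.tolist())
```

Mean imputation is only one possible choice here; the median or dropping the rows may suit skewed data better.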
NumPy Snippets (30+ Operations)
• Create an array:
np.array([1, 2, 3])
• Create a zeros array:
np.zeros((3, 3))
• Create an identity matrix:
np.eye(3)
• Generate random numbers:
np.random.rand(3, 3)
• Generate random integers:
np.random.randint(0, 100, size=(5, 5))
• Find the mean of an array:
np.mean(arr)
• Find the median of an array:
np.median(arr)
• Find the variance of an array:
np.var(arr)
• Find the standard deviation:
np.std(arr)
• Reshape an array:
np.reshape(arr, (rows, cols))
• Find the dot product of two arrays:
np.dot(arr1, arr2)
• Transpose an array:
arr.T
• Find the inverse of a matrix:
np.linalg.inv(arr)
• Find eigenvalues and eigenvectors:
np.linalg.eig(arr)
• Find the determinant of a matrix:
np.linalg.det(arr)
• Sort an array:
np.sort(arr)
• Concatenate arrays:
np.concatenate((arr1, arr2), axis=0)
• Find the cumulative sum:
np.cumsum(arr)
• Find the cumulative product:
np.cumprod(arr)
• Get array of unique values:
np.unique(arr)
• Find indices of non-zero elements:
np.nonzero(arr)
• Check if any values in array are true:
np.any(arr)
• Check if all values in array are true:
np.all(arr)
• Find max element in array:
np.max(arr)
• Find min element in array:
np.min(arr)
• Get a random permutation of an array:
np.random.permutation(arr)
• Generate random samples from a normal distribution:
np.random.normal(loc=0.0, scale=1.0, size=(3, 3))
• Find the 90th percentile of an array:
np.percentile(arr, 90)
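One detail worth checking when mixing the NumPy and Pandas statistics above: np.var and np.std divide by n by default (ddof=0), while the Pandas var() and std() methods divide by n - 1 (ddof=1). The sketch below illustrates this on small hypothetical arrays (arr and m are made up for the example) and also exercises the percentile, determinant, and inverse snippets.

```python
import numpy as np

# Hypothetical sample values chosen so the results are easy to verify by hand.
arr = np.array([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])

pop_var = np.var(arr)             # population variance (ddof=0, the NumPy default)
sample_var = np.var(arr, ddof=1)  # sample variance (matches the pandas default)
pop_std = np.std(arr)             # population standard deviation
p90 = np.percentile(arr, 90)      # linear interpolation between sorted values

# Hypothetical 2 x 2 matrix for the linear algebra snippets.
m = np.array([[2.0, 1.0], [1.0, 3.0]])
det = np.linalg.det(m)
inv = np.linalg.inv(m)

print(pop_var, sample_var, pop_std, p90)
print(det)
print(inv @ m)  # approximately the identity matrix
```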
Pandas Snippets (continued)
• Select specific columns:
df[['column1', 'column2']]
• Calculate cumulative sum:
df['column_name'].cumsum()
• Create a rolling average:
df['column_name'].rolling(window=5).mean()
• Join two DataFrames on a common column:
pd.merge(df1, df2, on='common_column')
• Concatenate two DataFrames column-wise:
pd.concat([df1, df2], axis=1)
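The merge and rolling snippets above have defaults that are easy to overlook: pd.merge performs an inner join unless how is given, and a rolling window yields NaN until it is full. A minimal sketch, assuming two hypothetical tables (orders, customers) and a short made-up series:

```python
import pandas as pd

# Hypothetical tables; names and values are illustrative only.
orders = pd.DataFrame({"customer_id": [1, 2, 2, 3], "amount": [10, 20, 30, 40]})
customers = pd.DataFrame({"customer_id": [1, 2, 4], "name": ["Ann", "Bob", "Dee"]})

# Default inner join: only customer_id values present in both tables survive.
inner = pd.merge(orders, customers, on="customer_id")

# Left join: every order is kept; unmatched orders get a missing name.
left = pd.merge(orders, customers, on="customer_id", how="left")

# Rolling mean: the first window - 1 entries are NaN because the window is not full.
series = pd.Series([1, 2, 3, 4, 5])
rolling = series.rolling(window=3).mean()
cumulative = series.cumsum()

print(len(inner), len(left))
print(rolling.tolist())
print(cumulative.tolist())
```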
NumPy Snippets (continued)
• Find the index of the maximum value:
np.argmax(arr)
• Find the index of the minimum value:
np.argmin(arr)
• Create an array of ones:
np.ones((3, 3))
• Flatten an array:
arr.flatten()
• Find the shape of an array:
arr.shape
• Find the rank of a matrix:
np.linalg.matrix_rank(arr)
• Find the trace of a matrix:
np.trace(arr)
• Tile an array (repeat the whole array in a 2 x 2 grid):
np.tile(arr, (2, 2))
• Slice an array:
arr[1:3]
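Two of the snippets above behave in ways that can surprise on 2-D input: np.argmax returns an index into the flattened array unless an axis is given, and np.tile repeats the whole array rather than each element (np.repeat does the latter). A minimal sketch on a hypothetical 2 x 2 array:

```python
import numpy as np

# Hypothetical 2 x 2 array for illustration.
arr = np.array([[1, 5], [7, 3]])

flat_idx = np.argmax(arr)                         # index into the flattened array
row, col = np.unravel_index(flat_idx, arr.shape)  # convert back to (row, column)
per_column = np.argmax(arr, axis=0)               # row index of the max in each column

tiled = np.tile(np.array([1, 2]), 2)        # whole array repeated
repeated = np.repeat(np.array([1, 2]), 2)   # each element repeated
grid = np.tile(arr, (2, 2))                 # 2 x 2 grid of copies of arr

print(flat_idx, (int(row), int(col)), per_column)
print(tiled, repeated, grid.shape)
print(np.trace(arr), np.linalg.matrix_rank(arr))
```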
Matplotlib Snippets (30+ Visualizations)
• Create a simple line plot:
plt.plot(x, y)
plt.show()
• Set plot title and labels:
plt.title('Title')
plt.xlabel('X-axis')
plt.ylabel('Y-axis')
• Create a bar chart:
plt.bar(x, y)
• Create a scatter plot:
plt.scatter(x, y)
• Create a histogram:
plt.hist(data, bins=10)
• Create a box plot:
plt.boxplot(data)
• Set axis limits:
plt.xlim(0, 10)
plt.ylim(0, 100)
• Display grid on the plot:
plt.grid(True)
• Create a subplot:
plt.subplot(2, 1, 1)
plt.plot(x, y)
• Save a plot as image:
plt.savefig('plot.png')
• Change line style and color:
plt.plot(x, y, linestyle='--', color='r')
• Create a pie chart:
plt.pie(sizes, labels=labels)
• Change figure size:
plt.figure(figsize=(8, 6))
• Create a filled plot:
plt.fill_between(x, y1, y2)
• Create a heatmap:
plt.imshow(data, cmap='hot')
• Add legend to the plot:
plt.legend(['Label1', 'Label2'])
• Annotate a point on plot:
plt.annotate('Point', xy=(x, y), xytext=(x + 1, y + 10),
arrowprops=dict(facecolor='black'))
• Create a violin plot:
plt.violinplot(data)
• Create a stacked bar chart:
plt.bar(x, y1, label='Y1')
plt.bar(x, y2, bottom=y1, label='Y2')
plt.legend()
• Set logarithmic scale:
plt.xscale('log')
• Change marker style:
plt.plot(x, y, marker='o')
• Plot a function:
x = np.linspace(0, 10, 100)
plt.plot(x, np.sin(x))
• Set axis aspect ratio:
plt.gca().set_aspect('equal', adjustable='box')
• Fill under a line plot:
plt.fill_between(x, y)
• Create a polar plot:
plt.subplot(projection='polar')
plt.plot(theta, r)
• Create a quiver plot:
plt.quiver(x, y, u, v)
• Create a contour plot:
plt.contour(X, Y, Z)
• Add text to plot:
plt.text(1, 1, 'Text', fontsize=12)
• Draw a horizontal line:
plt.axhline(y=0.5, color='r')
• Draw a vertical line:
plt.axvline(x=0.5, color='g')
• Create a 3D plot:
ax = plt.axes(projection='3d')
ax.plot3D(x, y, z)
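The Matplotlib snippets above use the pyplot state interface one call at a time. As a rough sketch of how several of them combine into one figure, the example below uses the object-oriented interface (plt.subplots and Axes methods) instead, which is a substitution on my part rather than the style used above. The data is randomly generated for illustration, and the Agg backend is selected so the code runs without a display.

```python
import io

import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so no window is needed
import matplotlib.pyplot as plt
import numpy as np

# Hypothetical data: 200 normal samples and a sine curve.
rng = np.random.default_rng(0)
data = rng.normal(loc=0.0, scale=1.0, size=200)
x = np.linspace(0, 10, 100)

# One figure with two side-by-side axes.
fig, (ax_hist, ax_line) = plt.subplots(1, 2, figsize=(8, 4))

ax_hist.hist(data, bins=10)
ax_hist.set_title("Histogram")

ax_line.plot(x, np.sin(x), linestyle="--", color="r", label="sin(x)")
ax_line.axhline(y=0.0, color="g")  # horizontal reference line
ax_line.set_xlabel("x")
ax_line.legend()

fig.tight_layout()

# Save to an in-memory buffer instead of a file on disk.
buffer = io.BytesIO()
fig.savefig(buffer, format="png")
png_bytes = buffer.getvalue()
plt.close(fig)
```

To write a file instead, pass a path such as 'plot.png' to fig.savefig.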
Seaborn Snippets (30+ Visualizations)
• Create a seaborn scatter plot:
sns.scatterplot(x='col1', y='col2', data=df)
• Create a seaborn line plot:
sns.lineplot(x='col1', y='col2', data=df)
• Create a seaborn bar plot:
sns.barplot(x='col1', y='col2', data=df)
• Create a seaborn box plot:
sns.boxplot(x='col1', y='col2', data=df)
• Create a seaborn histogram:
sns.histplot(df['column'], kde=True)
• Save a figure:
plt.savefig('figure.png')
• Create a pie chart with percentage labels:
plt.pie(sizes, labels=labels, autopct='%1.1f%%')
• Create a 3D scatter plot:
from mpl_toolkits.mplot3d import Axes3D
fig = plt.figure()
ax = fig.add_subplot(111, projection='3d')
ax.scatter(x, y, z)
• Create a heatmap with a colorbar:
plt.imshow(data, cmap='hot', interpolation='nearest')
plt.colorbar()
• Change figure size:
plt.figure(figsize=(10, 5))
• Add annotations:
plt.annotate('Point', xy=(x, y), xytext=(x + 1, y + 1),
arrowprops=dict(facecolor='black', arrowstyle='->'))
• Create a pair plot using Seaborn:
import seaborn as sns
sns.pairplot(df)
• Customize tick marks:
plt.xticks(rotation=45)
• Create a polar plot:
plt.polar(theta, r)
• Create a histogram with density:
plt.hist(data, density=True, bins=10)
• Create a filled area plot:
plt.fill_between(x, y1, y2)
• Overlay multiple plots:
plt.plot(x, y1, label='Y1')
plt.plot(x, y2, label='Y2')
plt.legend()
• Show the plot:
plt.show()
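The overlay, annotation, tick, and legend snippets above can be combined on a single set of axes. The following is a minimal sketch using the same pyplot state interface; the sine and cosine curves are hypothetical stand-ins for y1 and y2, and the Agg backend is chosen so that no display is required (plt.show() would be used in an interactive session instead of saving to a buffer).

```python
import io

import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so no window is needed
import matplotlib.pyplot as plt
import numpy as np

# Hypothetical curves standing in for y1 and y2.
x = np.linspace(0, 2 * np.pi, 50)
y1 = np.sin(x)
y2 = np.cos(x)

plt.figure(figsize=(10, 5))
plt.plot(x, y1, label="Y1")
plt.plot(x, y2, label="Y2")

# Annotate the highest point of the first curve.
peak = int(np.argmax(y1))
plt.annotate("Peak", xy=(x[peak], y1[peak]), xytext=(x[peak] + 1, y1[peak]),
             arrowprops=dict(facecolor="black", arrowstyle="->"))

plt.xticks(rotation=45)
plt.legend()

# Inspect the current axes to confirm what was drawn.
ax = plt.gca()
line_count = len(ax.get_lines())
legend_labels = [text.get_text() for text in ax.get_legend().get_texts()]

# Save to an in-memory buffer instead of a file on disk.
buffer = io.BytesIO()
plt.savefig(buffer, format="png")
plt.close()
```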