Pandas Cheat Sheet

Import convention
>>> import pandas as pd

Creating data

Creating Series
>>> s = pd.Series([3, -5, 7, 4],
                  index=['a', 'b', 'c', 'd'])

Creating Dataframe
>>> df = pd.DataFrame(
        {"name" : ["Ram", "Rahul", "Ravi"],
         "age" : [51, 28, 19],
         "weight" : [69.3, 44.6, 36.9]})
>>> p_df = pd.DataFrame(
        {"name" : ["Jack Smith", 'Jane Lodge'],
         "place" : ["HYD", "DEL"]})

Loading Data
>>> df = pd.read_csv('[Link]')
- Loads the data from a csv file into Python.

Updating Rows/Columns
>>> df.rename(columns={'age':'Age'})
- Renames the given columns.
>>> df.replace(to_replace=[51, 69.3], value=58)
- Replaces the values 51 and 69.3 with 58 in the whole dataframe.
>>> df.loc[2, ['age', 'weight']] = [35, 89.1]
- Updates row values at the given columns.
>>> df['age'].replace({51:58})
- Replaces the value 51 in the 'age' column with 58.
>>> df["age"].apply(lambda x: x + 5)
- Updates the column values as per the lambda function.
>>> df.apply(max)
- Applies the given function on the dataframe.
>>> p_df.applymap([Link])
- Applies the function to every element (DataFrame.applymap is deprecated since pandas 2.1 in favour of DataFrame.map).
>>> df['name'].map({'Rahul':'Raghu'})
- Maps values of the Series according to the input correspondence.
>>> p_df[['first', 'last']] = p_df['name'].str.split(' ', expand=True)
- Splits a column into several columns.
>>> p_df.append(
        {'name':'Jim lake',
         'first':'Jim',
         'last':'lake'}, ignore_index=True)
- Appends rows to p_df and returns a new object (DataFrame.append was removed in pandas 2.0; use pd.concat there).
>>> age_df = pd.DataFrame(
        {"age": [35, 17]})
>>> pd.concat([p_df, age_df], axis=1)
- Concatenates pandas objects along an axis.
>>> p_df.drop(labels='last', axis='columns')
- Removes rows or columns by specifying label names and the corresponding axis.
>>> df2 = pd.DataFrame(
        {"place" : ["HYD", "DEL"],
         "state" : ["TEL", "UP"]})
>>> p_df.merge(df2, on="place")
- Merges DataFrame or Series objects, similar to a SQL join operation.
>>> df.sort_values(by='age')
- Sorts by the values of the given column.
>>> df['age'].nlargest(2)
- Returns the 2 largest values of the column, in descending order.
>>> df['age'].nsmallest(2)
- Returns the 2 smallest values of the column, in ascending order.
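As a rough illustration of the update calls above, the sketch below runs a few of them on the sheet's sample data. Most of these methods return a new object rather than modifying `df`, which the final check is meant to show. The names `renamed`, `older`, `replaced`, `mapped`, `updated`, `new_row`, `appended`, `merged` and `top2` are hypothetical, and the row append is written with `pd.concat` on the assumption that a recent pandas version without `DataFrame.append` is installed.

```python
import pandas as pd

df = pd.DataFrame({
    "name": ["Ram", "Rahul", "Ravi"],
    "age": [51, 28, 19],
    "weight": [69.3, 44.6, 36.9],
})

# These calls return new objects; df itself is left unchanged.
renamed = df.rename(columns={"age": "Age"})
older = df["age"].apply(lambda x: x + 5)
replaced = df["age"].replace({51: 58})
mapped = df["name"].map({"Rahul": "Raghu"})  # names missing from the dict become NaN

# .loc assignment modifies the frame in place, so work on a copy here.
updated = df.copy()
updated.loc[2, ["age", "weight"]] = [35, 89.1]

p_df = pd.DataFrame({"name": ["Jack Smith", "Jane Lodge"],
                     "place": ["HYD", "DEL"]})
p_df[["first", "last"]] = p_df["name"].str.split(" ", expand=True)

# Row append via pd.concat (assumed stand-in for DataFrame.append).
new_row = pd.DataFrame([{"name": "Jim lake", "first": "Jim", "last": "lake"}])
appended = pd.concat([p_df, new_row], ignore_index=True)

df2 = pd.DataFrame({"place": ["HYD", "DEL"], "state": ["TEL", "UP"]})
merged = p_df.merge(df2, on="place")  # inner join on the shared 'place' column

top2 = df["age"].nlargest(2)

print(df.loc[2, "age"])  # the original frame still holds the old value
```

Note that `map` with a dict leaves unmatched values as NaN, whereas `replace` keeps them as they were; that difference is usually what decides between the two.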
Properties of Dataframe
>>> df.head(n)       First n rows
>>> df.tail(n)       Last n rows
>>> df.shape         Shape of df
>>> df.columns       Column labels
>>> df.dtypes        Datatypes of columns
>>> df.describe()    Summary statistics

Accessing Data
>>> df.loc[0]                               Row by label
>>> df.loc[[0, 2], ['age', 'weight']]       Group of rows and columns by label(s)
>>> df.iloc[[0, 1]]                         Group of rows and columns by indices.
>>> df.at[1, 'weight']                      Single value for a row-column label pair.

Filtering Based on Criteria
>>> df[df['age'] > 50]                      Extracts rows that meet logical criteria.
>>> df.query('age < weight & age >= 11')    DataFrame resulting from the provided query expression.
>>> filter = df['name'].[Link]('Rah')       Series resulting from the provided string query expression.
>>> df[filter]
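The accessors and filters above can be tried on the sheet's sample frame as in the sketch below. The string method behind the `'Rah'` filter is not recoverable from the source, so `str.contains` is used here purely as an assumption. The variable names (`row0`, `sub`, `first_two`, `w`, `over_50`, `q`, `mask`, `rah`) are hypothetical, and the query expression is parenthesised for readability.

```python
import pandas as pd

df = pd.DataFrame({
    "name": ["Ram", "Rahul", "Ravi"],
    "age": [51, 28, 19],
    "weight": [69.3, 44.6, 36.9],
})

# Label-based access
row0 = df.loc[0]                          # Series holding the row labelled 0
sub = df.loc[[0, 2], ["age", "weight"]]   # rows 0 and 2, two columns
w = df.at[1, "weight"]                    # single scalar value

# Position-based access
first_two = df.iloc[[0, 1]]

# Boolean-mask filtering
over_50 = df[df["age"] > 50]

# Query-string filtering; every sample row satisfies both conditions
q = df.query("(age < weight) & (age >= 11)")

# String filtering; str.contains is an assumed choice of method
mask = df["name"].str.contains("Rah")
rah = df[mask]
```

On this frame the label and the position of each row coincide because the index is the default 0, 1, 2; after `set_index` or a sort, `loc` and `iloc` would no longer select the same rows.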
Display Options
>>> pd.set_option('display.max_rows', n)
- Sets the maximum number of rows displayed for a dataframe.
>>> pd.reset_option('display')
- Resets all the display options.

Changing the Index
>>> df.set_index('name')
- Sets the 'name' column as the index.
>>> df.reset_index()
- Resets the index of df to the default one.
>>> df.sort_index(axis=0)
- Sorts the object by labels (along an axis).
>>> pd.read_csv('[Link]', index_col='column_name')
- Sets the index while reading the csv file.

Grouping and Aggregation
>>> df.groupby(by=['age', 'name'])
- Returns a GroupBy object grouped by the values in the given columns.
>>> df['name'].value_counts()
- Counts the number of times each value is repeated.
>>> df.groupby('name')['age'].mean()
- Splits into groups based on 'name' and aggregates 'age' with the mean.
>>> df.count()
- Counts non-NA cells for each column or row.
>>> df['age'].min()
- Returns the minimum of the values.
>>> df.agg(['sum', 'min', 'mean'])
- Aggregates the data using the functions 'sum', 'min', 'mean'.
>>> df['age'].cumsum()
- Returns the cumulative sum of a Series or DataFrame.

Cleaning Data

Handling Missing Values
>>> import numpy as np
>>> nan_df = pd.DataFrame({"A" : [1.0, -3.0, 1.0],
                           "B" : [1.0, np.nan, 1.0],
                           "C" : [3.0, -2.0, 3.0],
                           "D" : [1.0, -3.0, 1.0]})
>>> nan_df.isna()
- Returns a boolean same-sized object indicating if the values are NA.
>>> nan_df.fillna(2)
- Fills NA/NaN values with the given value.
>>> nan_df.dropna()
- Returns a DataFrame with the rows containing NaN entries dropped.
>>> nan_df.replace('NA', np.nan, inplace=True)
- Handles other missing-value markers (here the string 'NA') by replacing them with NaN.
>>> df.first_valid_index()
- Index of the first non-NA/null value.

Handling Duplicates
>>> nan_df.duplicated()
- Returns a boolean Series marking each duplicated row.
>>> nan_df.drop_duplicates()
- Returns a dataframe with the duplicated rows removed.

Changing Datatypes
>>> df['weight'].astype('int64')
- Converts the 'weight' column to integers.
>>> df.astype('string')
- Converts every element in the df to string.
>>> data = pd.DataFrame(
        {'year': [2015, 2016],
         'month': [2, 3], 'day': [4, 5]})
>>> datetime_df = pd.to_datetime(data)
- Converts into datetime datatype.
>>> datetime_df.dt.month
- Returns the months in the timestamps.
>>> datetime_df.dt.year
- Returns the years in the timestamps.
>>> datetime_df.dt.day_name()
- Returns the weekday names of the timestamps.
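The sketch below strings together some of the grouping, missing-value, duplicate and datatype calls above on the sheet's sample data. It is only an illustration: the result names (`mean_age`, `stats`, `running`, `filled`, `dropped`, `dups`, `deduped`, `dates`, `day_names`) are hypothetical, and the aggregation is restricted to the numeric columns on the assumption that `'mean'` should not be applied to the string column `name`.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "name": ["Ram", "Rahul", "Ravi"],
    "age": [51, 28, 19],
    "weight": [69.3, 44.6, 36.9],
})

# Grouping and aggregation
mean_age = df.groupby("name")["age"].mean()                # one group per name
stats = df[["age", "weight"]].agg(["sum", "min", "mean"])  # numeric columns only
running = df["age"].cumsum()

# Missing values: row 1 holds a NaN in column B
nan_df = pd.DataFrame({
    "A": [1.0, -3.0, 1.0],
    "B": [1.0, np.nan, 1.0],
    "C": [3.0, -2.0, 3.0],
    "D": [1.0, -3.0, 1.0],
})
filled = nan_df.fillna(2)    # NaN replaced by 2
dropped = nan_df.dropna()    # row 1 removed

# Duplicates: row 2 repeats row 0
dups = nan_df.duplicated()
deduped = nan_df.drop_duplicates()

# Datatypes: year/month/day columns assembled into timestamps
data = pd.DataFrame({"year": [2015, 2016], "month": [2, 3], "day": [4, 5]})
dates = pd.to_datetime(data)   # a datetime64 Series, one timestamp per row
day_names = dates.dt.day_name()
```

`pd.to_datetime` on a frame with `year`, `month` and `day` columns returns a Series rather than a DataFrame, which is why the `.dt` accessor is available on the result.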