Data manipulation in Python mainly involves creating, modifying and analyzing datasets using Pandas. It helps clean and prepare data for further tasks like analysis or machine learning.
Creating Sample DataFrame
A DataFrame is the core data structure in pandas, used to store data in a tabular form. In this section, we create a DataFrame manually by assigning values to columns, which is useful for small datasets or testing.
import pandas as pd
df = pd.DataFrame()
df['Name'] = ['John', 'Emma', 'Liam', 'Olivia']
df['Age'] = [20, 19, 21, 18]
df['Student'] = [True, True, False, True]
print(df)
Output
     Name  Age  Student
0    John   20     True
1    Emma   19     True
2    Liam   21    False
3  Olivia   18     True
Explanation:
- pd.DataFrame() creates an empty DataFrame
- df['Name'] = [...] adds Name column
- df['Age'] = [...] adds Age column
- df['Student'] = [...] adds Student column
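As an alternative to adding columns one at a time, the same table can be built in a single step by passing a dictionary to the constructor. The sketch below uses the variable name df_from_dict, which is chosen here for illustration only.

```python
import pandas as pd

# Build the same table in one step: each dict key becomes a column name
df_from_dict = pd.DataFrame({
    'Name': ['John', 'Emma', 'Liam', 'Olivia'],
    'Age': [20, 19, 21, 18],
    'Student': [True, True, False, True],
})

print(df_from_dict)
print(df_from_dict.dtypes)  # one data type per column
```

Both approaches give the same columns in the same order; the dictionary form simply keeps the whole definition in one place.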
Adding Row to DataFrame
To add new records to an existing DataFrame, use the pd.concat() function, which combines two or more DataFrames. The older DataFrame.append() method was removed in pandas 2.0, so pd.concat() is the recommended way to add rows in current versions.
new_row = pd.DataFrame([['Sophia', 22, False]], columns=['Name', 'Age', 'Student'])
df = pd.concat([df, new_row], ignore_index=True)
print(df)
Output
     Name  Age  Student
0    John   20     True
1    Emma   19     True
2    Liam   21    False
3  Olivia   18     True
4  Sophia   22    False
Explanation:
- columns=[...] defines column names
- pd.concat([df, new_row]) combines old and new data
- ignore_index=True renumbers the index from 0, so the new row gets index 4 instead of keeping its own 0
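When several rows need to be added, it is usually simpler to collect them first and call pd.concat() once rather than once per row. The following is a minimal sketch; the names in extra_records are made-up sample data, not part of the dataset above.

```python
import pandas as pd

df = pd.DataFrame({'Name': ['John', 'Emma'], 'Age': [20, 19], 'Student': [True, True]})

# Hypothetical extra records, gathered as plain dicts first
extra_records = [
    {'Name': 'Noah', 'Age': 23, 'Student': False},
    {'Name': 'Ava', 'Age': 20, 'Student': True},
]

# One concat call for all new rows instead of one call per row
df = pd.concat([df, pd.DataFrame(extra_records)], ignore_index=True)
print(df)
```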
Getting Shape and Info
To quickly understand the structure of a dataset, Pandas provides the .shape attribute and the .info() method. Together they show the number of rows and columns, the data type of each column and how many non-null values each column holds, which reveals missing values.
print(df.shape)
df.info()  # info() prints its report directly and returns None, so no print() is needed
Output

Explanation:
- df.shape returns (rows, columns)
- df.info() shows column types and non-null values
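To count missing values explicitly rather than reading them off the info() report, isna().sum() can be used. The sketch below assumes a small made-up table named people with one missing Age value.

```python
import pandas as pd

# Small hypothetical table with one missing Age value
people = pd.DataFrame({
    'Name': ['John', 'Emma', 'Liam'],
    'Age': [20, None, 21],
})

n_rows, n_cols = people.shape            # shape is a (rows, columns) tuple
missing_per_column = people.isna().sum() # number of missing values in each column

print(n_rows, n_cols)
print(missing_per_column)
```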
Statistical Summary
For numerical analysis, Pandas provides the .describe() method. It generates key statistical measures like mean, standard deviation, minimum, maximum and quartiles, helping in understanding data distribution.
print(df.describe())
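Output
             Age
count   5.000000
mean   20.000000
std     1.581139
min    18.000000
25%    19.000000
50%    20.000000
75%    21.000000
max    22.000000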
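Explanation: df.describe() gives count, mean, std, min, max and quartiles for numeric columns. Here only Age is summarized, because the text column Name and the boolean column Student are skipped by default.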
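To summarize the non-numeric columns as well, describe() accepts an include argument. The following sketch rebuilds the five-row table and passes include='all'; the variable name summary is chosen here for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['John', 'Emma', 'Liam', 'Olivia', 'Sophia'],
    'Age': [20, 19, 21, 18, 22],
    'Student': [True, True, False, True, False],
})

# include='all' adds rows such as unique, top and freq for non-numeric columns;
# cells that do not apply to a column are filled with NaN
summary = df.describe(include='all')
print(summary)
```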
Dropping Columns
The .drop() method with axis=1 removes unwanted columns from a DataFrame. It returns a new DataFrame and leaves the original unchanged, which is why the result is stored in df2 here. This is useful when certain features are not required for analysis or modeling.
df2 = df.drop('Age', axis=1)
print(df2)
Output
     Name  Student
0    John     True
1    Emma     True
2    Liam    False
3  Olivia     True
4  Sophia    False
Explanation:
- drop('Age', axis=1) removes column
- axis=1 specifies column operation
- print(df2) displays updated data
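The drop() method also accepts a columns= keyword, which avoids having to remember the axis number and makes it easy to drop several columns at once. A minimal sketch, using the illustrative name trimmed for the result:

```python
import pandas as pd

df = pd.DataFrame({'Name': ['John', 'Emma'], 'Age': [20, 19], 'Student': [True, True]})

# columns=[...] is equivalent to passing the same labels with axis=1
trimmed = df.drop(columns=['Age', 'Student'])

print(trimmed.columns.tolist())
print(df.columns.tolist())  # the original still has all three columns
```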
Dropping Rows
The .drop() method with axis=0 removes specific rows by their index label. This is commonly used to eliminate incorrect or irrelevant records from the dataset.
df2 = df2.drop(2, axis=0)
print(df2)
Output
     Name  Student
0    John     True
1    Emma     True
3  Olivia     True
4  Sophia    False
Explanation:
- drop(2, axis=0) removes the row whose index label is 2; the remaining labels are not renumbered
- axis=0 specifies row operation
- print(df2) displays updated data
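Because dropping rows leaves gaps in the index, it is common to renumber afterwards with reset_index(drop=True). The sketch below uses the illustrative names dropped and renumbered.

```python
import pandas as pd

df = pd.DataFrame({'Name': ['John', 'Emma', 'Liam', 'Olivia'], 'Age': [20, 19, 21, 18]})

# Remove the rows labelled 1 and 2; index= is equivalent to axis=0
dropped = df.drop(index=[1, 2])

# Relabel the remaining rows 0..n-1 and discard the old labels
renumbered = dropped.reset_index(drop=True)

print(dropped)
print(renumbered)
```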
Selecting Data from DataFrame
Pandas provides multiple ways to access data: column names inside [] and label-based indexing with .loc[]. These retrieve a single column, several columns, or individual rows for analysis.
print(df['Name'])
print(df[['Name', 'Age']])
print(df.loc[0])
Output

Explanation:
- df['Name'] selects single column
- df[['Name', 'Age']] selects multiple columns
- df.loc[0] selects the row whose index label is 0
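Since .loc[] works on labels, it can differ from position-based access once rows have been removed. The sketch below contrasts it with .iloc[], which selects by position; the variable names are illustrative.

```python
import pandas as pd

df = pd.DataFrame({'Name': ['John', 'Emma', 'Liam', 'Olivia'], 'Age': [20, 19, 21, 18]})
df = df.drop(0)  # the remaining index labels are now 1, 2, 3

first_by_position = df.iloc[0]  # first row by position, whatever its label
row_labelled_1 = df.loc[1]      # row whose index label is 1
emma_age = df.loc[1, 'Age']     # single cell: row label, then column name

print(first_by_position['Name'], row_labelled_1['Name'], emma_age)
```

Here df.loc[0] would raise a KeyError, because no row carries the label 0 any more.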
Filtering Data
Filtering in Pandas is done by placing a boolean condition inside df[...]. Only the rows for which the condition is True are kept, which makes it useful for focused data analysis.
print(df[df['Age'] > 19])
Output
     Name  Age  Student
0    John   20     True
2    Liam   21    False
4  Sophia   22    False
Explanation:
- df['Age'] > 19 creates condition
- df[...] filters rows based on condition
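Conditions can be combined with & (and), | (or) and ~ (not). The following sketch rebuilds the five-row table and applies two combined filters; the result names are chosen for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['John', 'Emma', 'Liam', 'Olivia', 'Sophia'],
    'Age': [20, 19, 21, 18, 22],
    'Student': [True, True, False, True, False],
})

# Each comparison needs its own parentheses, because & and | bind
# more tightly than > and <
older_students = df[(df['Age'] > 18) & df['Student']]
young_or_non_student = df[(df['Age'] < 19) | ~df['Student']]

print(older_students)
print(young_or_non_student)
```

Python's and / or keywords do not work on whole columns, which is why the element-wise operators are used instead.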
Sorting Data
The .sort_values() method is used to arrange data based on column values. It helps in organizing the dataset in ascending or descending order for better readability and analysis.
print(df.sort_values(by='Age'))
Output
     Name  Age  Student
3  Olivia   18     True
1    Emma   19     True
0    John   20     True
2    Liam   21    False
4  Sophia   22    False
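Explanation: sort_values(by='Age') returns a new DataFrame sorted by the Age column in ascending order, which is the default. Pass ascending=False for descending order.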
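Sorting can also run in descending order or use more than one column. A minimal sketch on the same five-row table, with illustrative result names:

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['John', 'Emma', 'Liam', 'Olivia', 'Sophia'],
    'Age': [20, 19, 21, 18, 22],
    'Student': [True, True, False, True, False],
})

# Oldest first
by_age_desc = df.sort_values(by='Age', ascending=False)

# Sort by Student first (False before True), then by Age within each group
by_student_then_age = df.sort_values(by=['Student', 'Age'])

print(by_age_desc)
print(by_student_then_age)
```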