Data Manipulation in Python using Pandas

Last Updated : 11 Apr, 2026

Data manipulation in Python mainly involves creating, modifying and analyzing datasets using Pandas. It helps clean and prepare data for further tasks like analysis or machine learning.

Creating Sample DataFrame

A DataFrame is the core data structure in pandas, used to store data in a tabular form. In this section, we create a DataFrame manually by assigning values to columns, which is useful for small datasets or testing.

Python
import pandas as pd
df = pd.DataFrame()

df['Name'] = ['John', 'Emma', 'Liam', 'Olivia']
df['Age'] = [20, 19, 21, 18]
df['Student'] = [True, True, False, True]

print(df)

Output
     Name  Age  Student
0    John   20     True
1    Emma   19     True
2    Liam   21    False
3  Olivia   18     True

Explanation:

  • pd.DataFrame() creates an empty DataFrame
  • df['Name'] = [...] adds Name column
  • df['Age'] = [...] adds Age column
  • df['Student'] = [...] adds Student column 
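The column-by-column approach above can also be written as a single call. The following is a minimal sketch, assuming the same sample values; the names data and df_from_dict are introduced here only for illustration.

```python
import pandas as pd

# Same sample data as above, supplied as a dict that maps
# column names to lists of values.
data = {
    "Name": ["John", "Emma", "Liam", "Olivia"],
    "Age": [20, 19, 21, 18],
    "Student": [True, True, False, True],
}

# Build the whole DataFrame in one call instead of adding columns one by one.
df_from_dict = pd.DataFrame(data)

print(df_from_dict)
```

Both approaches produce the same table; the dict form keeps all the data in one place, which can be easier to read for small test datasets.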

Adding Row to DataFrame

To add new records to an existing DataFrame, use the pd.concat() function. By default it stacks DataFrames on top of each other, row-wise. Since DataFrame.append() was deprecated in pandas 1.4 and removed in pandas 2.0, pd.concat() is the recommended way to add rows in modern Pandas versions.

Python
new_row = pd.DataFrame([['Sophia', 22, False]], columns=['Name', 'Age', 'Student'])
df = pd.concat([df, new_row], ignore_index=True)
print(df)

Output

New row added to the dataframe

Explanation:

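  • columns=[...] gives the new row the same column names as df, so the values line up
  • pd.concat([df, new_row]) combines old and new data
  • ignore_index=True discards the original index labels and renumbers the rows from 0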
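The same pattern extends to several rows at once. The sketch below assumes the sample data from this article; the second record ("Noah") and the names new_rows and combined are hypothetical, added only for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["John", "Emma", "Liam", "Olivia"],
    "Age": [20, 19, 21, 18],
    "Student": [True, True, False, True],
})

# Each dict is one record; the keys become column names.
# "Noah" is a made-up record used only for this example.
new_rows = pd.DataFrame([
    {"Name": "Sophia", "Age": 22, "Student": False},
    {"Name": "Noah", "Age": 23, "Student": True},
])

# One concat call adds both rows and renumbers the index from 0.
combined = pd.concat([df, new_rows], ignore_index=True)

print(combined)
```

Collecting new records first and concatenating once is usually preferable to calling pd.concat() inside a loop, because every call copies the data into a new DataFrame.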

Getting Shape and Info

To quickly understand the structure of a dataset, Pandas provides the .shape attribute and the .info() method. Together they show the number of rows and columns, the data type of each column and how many non-null values each column holds, which reveals missing data.

Python
print(df.shape)
df.info()  # prints the summary itself and returns None, so no print() is needed

Output

Displays shape and detailed information

Explanation:

  • df.shape returns (rows, columns)
  • df.info() prints each column's dtype, non-null count and memory usage
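When the same information is needed as values rather than printed text, .shape can be unpacked and missing values can be counted directly. This is a minimal sketch, assuming the five-row sample DataFrame from this article; the variable names are chosen for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["John", "Emma", "Liam", "Olivia", "Sophia"],
    "Age": [20, 19, 21, 18, 22],
    "Student": [True, True, False, True, False],
})

# .shape is a plain tuple, so it can be unpacked.
n_rows, n_cols = df.shape
print(f"{n_rows} rows, {n_cols} columns")

# .dtypes lists the data type of each column.
print(df.dtypes)

# isna() marks missing cells; summing counts them per column.
missing_per_column = df.isna().sum()
print(missing_per_column)
```

Unlike df.info(), these expressions return objects that can be stored, compared or used in later checks.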

Statistical Summary

For numerical analysis, Pandas provides the .describe() method. By default it summarizes only the numeric columns (here, Age) and reports the count, mean, standard deviation, minimum, maximum and quartiles, which helps in understanding how the data is distributed.

Python
print(df.describe())

Output

Displays statistical summary

Explanation: df.describe() gives count, mean, std, min, max, quartiles
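To summarize non-numeric columns as well, describe() accepts an include argument. The sketch below assumes the five-row sample DataFrame from this article; numeric_stats and all_stats are names introduced here for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["John", "Emma", "Liam", "Olivia", "Sophia"],
    "Age": [20, 19, 21, 18, 22],
    "Student": [True, True, False, True, False],
})

# Default behaviour: only numeric columns are summarized (Age here).
numeric_stats = df.describe()
print(numeric_stats)

# include="all" adds the text and boolean columns; statistics that do
# not apply to a column are shown as NaN.
all_stats = df.describe(include="all")
print(all_stats)

# Individual statistics can be read from the result by label.
print(numeric_stats.loc["mean", "Age"])
```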

Dropping Columns

The .drop() method with axis=1 removes unwanted columns from a DataFrame. It returns a new DataFrame and leaves the original unchanged, which is why the result is assigned to df2. This is useful when certain features are not required for analysis or modeling.

Python
df2 = df.drop('Age', axis=1)
print(df2)

Output

Displays dataframe without Age column

Explanation:

  • drop('Age', axis=1) removes column
  • axis=1 specifies column operation
  • print(df2) displays updated data
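An equivalent spelling uses the columns keyword instead of axis=1, which some readers find easier to scan. This is a minimal sketch, assuming the five-row sample DataFrame from this article; df_no_age and df_names_only are illustrative names.

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["John", "Emma", "Liam", "Olivia", "Sophia"],
    "Age": [20, 19, 21, 18, 22],
    "Student": [True, True, False, True, False],
})

# columns=[...] is equivalent to passing the labels with axis=1.
df_no_age = df.drop(columns=["Age"])

# Several columns can be dropped in one call.
df_names_only = df.drop(columns=["Age", "Student"])

print(df_no_age)
print(df_names_only)

# drop() returned new DataFrames; the original still has all three columns.
print(list(df.columns))
```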

Dropping Rows

The .drop() method with axis=0 removes specific rows identified by their index labels. This is commonly used to eliminate incorrect or irrelevant records from the dataset.

Python
df2 = df2.drop(2, axis=0)
print(df2)

Output

Displays dataframe after removing row

Explanation:

  • drop(2, axis=0) removes the row whose index label is 2 (Liam)
  • axis=0 specifies row operation
  • print(df2) displays updated data
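Dropping rows leaves gaps in the index labels. The sketch below, assuming the five-row sample DataFrame from this article, drops two rows by label and then renumbers the index with reset_index(); trimmed and renumbered are illustrative names.

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["John", "Emma", "Liam", "Olivia", "Sophia"],
    "Age": [20, 19, 21, 18, 22],
    "Student": [True, True, False, True, False],
})

# index=[...] is equivalent to passing the labels with axis=0.
trimmed = df.drop(index=[1, 3])
print(trimmed)  # index labels 0, 2 and 4 remain

# drop=True discards the old labels instead of keeping them as a column.
renumbered = trimmed.reset_index(drop=True)
print(renumbered)
```

Whether to renumber depends on the use case: keeping the original labels makes it possible to trace rows back to the source data.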

Selecting Data from DataFrame

Pandas provides multiple ways to access data: square brackets with column names select columns, and the label-based indexer .loc[] selects rows by their index labels. These methods help retrieve a single column, multiple columns, or individual rows for analysis.

Python
print(df['Name'])
print(df[['Name', 'Age']])
print(df.loc[0])

Output

Displays selected column, multiple columns and first row

Explanation:

  • df['Name'] selects single column
  • df[['Name', 'Age']] selects multiple columns
  • df.loc[0] selects the row with index label 0 and returns it as a Series
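Because .loc[] works with labels, it behaves differently from the position-based .iloc[] once the index no longer starts at 0. The following sketch assumes the five-row sample DataFrame from this article; the variable names are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["John", "Emma", "Liam", "Olivia", "Sophia"],
    "Age": [20, 19, 21, 18, 22],
    "Student": [True, True, False, True, False],
})

# After dropping label 0, the first remaining row has label 1.
subset = df.drop(index=0)

first_by_position = subset.iloc[0]  # position 0, whatever its label
same_row_by_label = subset.loc[1]   # label 1
print(first_by_position["Name"], same_row_by_label["Name"])

# .loc also accepts a row label and a column label together.
cell = df.loc[2, "Name"]
print(cell)

# Label slices with .loc include both endpoints (rows 1, 2 and 3 here).
rows_cols = df.loc[1:3, ["Name", "Age"]]
print(rows_cols)
```

Here subset.loc[0] would raise a KeyError, since label 0 no longer exists, while subset.iloc[0] still returns the first remaining row.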

Filtering Data

Filtering in Pandas is done with boolean indexing: a condition on a column produces a Series of True/False values, and passing that Series inside df[...] keeps only the rows where it is True. This makes it useful for focused data analysis.

Python
print(df[df['Age'] > 19])

Output

Displays rows where Age > 19

Explanation:

  • df['Age'] > 19 creates a boolean Series that is True where Age is above 19
  • df[...] keeps only the rows where that Series is True
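Conditions can be combined with & (and), | (or) and ~ (not). Each comparison needs its own parentheses because these operators bind more tightly than > or <. This is a minimal sketch, assuming the five-row sample DataFrame from this article; the result names are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["John", "Emma", "Liam", "Olivia", "Sophia"],
    "Age": [20, 19, 21, 18, 22],
    "Student": [True, True, False, True, False],
})

# AND: older than 18 and currently a student.
older_students = df[(df["Age"] > 18) & (df["Student"])]
print(older_students)

# OR combined with NOT: younger than 19, or not a student.
young_or_non_student = df[(df["Age"] < 19) | (~df["Student"])]
print(young_or_non_student)

# isin() checks membership in a list of values.
selected = df[df["Name"].isin(["Emma", "Liam"])]
print(selected)
```

Python's own and / or keywords do not work element-wise on a Series and raise a ValueError in this context.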

Sorting Data

The .sort_values() method arranges rows based on the values in one or more columns. It sorts in ascending order by default, accepts ascending=False for descending order, and returns a new DataFrame rather than changing the original.

Python
print(df.sort_values(by='Age'))

Output

Displays dataframe sorted by Age

Explanation: sort_values(by='Age') sorts data based on Age column
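Descending order and sorting by more than one column use the same method. The sketch below assumes the five-row sample DataFrame from this article; by_age_desc and by_student_then_age are illustrative names.

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["John", "Emma", "Liam", "Olivia", "Sophia"],
    "Age": [20, 19, 21, 18, 22],
    "Student": [True, True, False, True, False],
})

# ascending=False puts the oldest person first.
by_age_desc = df.sort_values(by="Age", ascending=False)
print(by_age_desc)

# Sort by Student first (True before False), then by Age ascending
# within each group; one ascending flag per column.
by_student_then_age = df.sort_values(
    by=["Student", "Age"], ascending=[False, True]
)
print(by_student_then_age)

# The original row order is unchanged because sort_values returns a copy.
print(df["Name"].tolist())
```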
