Data Manipulation in Python using Pandas

Last Updated : 11 Apr, 2026

Data manipulation in Python with Pandas covers creating, modifying and analyzing datasets. It is how data is cleaned and prepared for later tasks such as analysis or machine learning.

Creating Sample DataFrame

A DataFrame is the core data structure in pandas, used to store data in a tabular form. In this section, we create a DataFrame manually by assigning values to columns, which is useful for small datasets or testing.

Python

import pandas as pd
df = pd.DataFrame()

df['Name'] = ['John', 'Emma', 'Liam', 'Olivia']
df['Age'] = [20, 19, 21, 18]
df['Student'] = [True, True, False, True]

print(df)

Output

     Name  Age  Student
0    John   20     True
1    Emma   19     True
2    Liam   21    False
3  Olivia   18     True

Explanation:

pd.DataFrame() creates an empty DataFrame
df['Name'] = [...] adds Name column
df['Age'] = [...] adds Age column
df['Student'] = [...] adds Student column
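
The same table can also be built in a single call by passing a dictionary that maps column names to lists. This is a minimal sketch using the sample values from above; the variable name df_from_dict is only illustrative.

```python
import pandas as pd

# Each dictionary key becomes a column; all lists must have the same length.
df_from_dict = pd.DataFrame({
    'Name': ['John', 'Emma', 'Liam', 'Olivia'],
    'Age': [20, 19, 21, 18],
    'Student': [True, True, False, True],
})

print(df_from_dict)
```

Building the DataFrame in one step keeps the column definitions together, which tends to be easier to read than assigning columns one by one.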

Adding Row to DataFrame

To add new records to an existing DataFrame, use the pd.concat() function. It combines two or more DataFrames into a new one and is the recommended way to add rows, since the older DataFrame.append() method was removed in pandas 2.0.

Python

new_row = pd.DataFrame([['Sophia', 22, False]], columns=['Name', 'Age', 'Student'])
df = pd.concat([df, new_row], ignore_index=True)
print(df)

Output

[Image: new row added to the DataFrame]

Explanation:

columns=[...] gives the new row the same column names as df
pd.concat([df, new_row]) combines old and new data into a new DataFrame
ignore_index=True renumbers the index of the result starting from 0
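
When several records need to be added, one option is to collect them first and call pd.concat() once. The sketch below rebuilds the sample data so it runs on its own; the second record ('Noah') is a hypothetical value added purely for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['John', 'Emma', 'Liam', 'Olivia'],
    'Age': [20, 19, 21, 18],
    'Student': [True, True, False, True],
})

# Extra records as a list of dictionaries; 'Noah' is a made-up example row.
new_rows = pd.DataFrame([
    {'Name': 'Sophia', 'Age': 22, 'Student': False},
    {'Name': 'Noah', 'Age': 23, 'Student': True},
])

# A single concat for all new rows avoids copying the data once per row.
combined = pd.concat([df, new_rows], ignore_index=True)
print(combined)
```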

Getting Shape and Info

To quickly understand the structure of a dataset, Pandas provides the .shape attribute and the .info() method. Together they show the number of rows and columns, the data type of each column and how many non-null values each column holds.

Python

print(df.shape)
df.info()  # info() prints its report directly and returns None

Output

[Image: shape and detailed information of the DataFrame]

Explanation:

df.shape returns (rows, columns)
df.info() shows column types and non-null values
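
The same structural facts can also be read into variables when they are needed later in a script. This is a small sketch on the sample data; the names n_rows, n_cols and missing are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['John', 'Emma', 'Liam', 'Olivia', 'Sophia'],
    'Age': [20, 19, 21, 18, 22],
    'Student': [True, True, False, True, False],
})

# shape is a plain tuple, so it can be unpacked directly.
n_rows, n_cols = df.shape
print(n_rows, n_cols)

# dtypes lists the data type of every column.
print(df.dtypes)

# isna().sum() counts the missing values in each column.
missing = df.isna().sum()
print(missing)
```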

Statistical Summary

For numerical analysis, Pandas provides the .describe() method. By default it summarizes only the numeric columns, reporting the count, mean, standard deviation, minimum, maximum and quartiles, which helps in understanding the data distribution.

Python

print(df.describe())

Output

[Image: statistical summary of the DataFrame]

Explanation: df.describe() gives count, mean, std, min, the quartiles (25%, 50%, 75%) and max
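
The result of describe() is itself a DataFrame, so individual statistics can be looked up by row and column label. A minimal sketch on the sample data, with illustrative variable names:

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['John', 'Emma', 'Liam', 'Olivia', 'Sophia'],
    'Age': [20, 19, 21, 18, 22],
    'Student': [True, True, False, True, False],
})

summary = df.describe()

# Rows of the summary are labelled by statistic, columns by source column.
age_mean = summary.loc['mean', 'Age']
age_min = summary.loc['min', 'Age']
age_max = summary.loc['max', 'Age']
print(age_mean, age_min, age_max)
```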

Dropping Columns

The .drop() method with axis=1 removes unwanted columns from a DataFrame. It returns a new DataFrame and leaves the original unchanged, which is why the result is assigned to df2. This is useful when certain features are not required for analysis or modeling.

Python

df2 = df.drop('Age', axis=1)
print(df2)

Output

[Image: DataFrame without the Age column]

Explanation:

drop('Age', axis=1) returns a copy without the Age column
axis=1 specifies column operation
print(df2) displays updated data
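
drop() also accepts a columns= keyword, which does the same job as axis=1 and can take several names at once. The sketch below uses the sample data; the name trimmed is illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['John', 'Emma', 'Liam', 'Olivia', 'Sophia'],
    'Age': [20, 19, 21, 18, 22],
    'Student': [True, True, False, True, False],
})

# columns=[...] is equivalent to passing the labels with axis=1.
trimmed = df.drop(columns=['Age', 'Student'])
print(trimmed.columns.tolist())

# The original DataFrame still has all three columns.
print(df.columns.tolist())
```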

Dropping Rows

The .drop() method with axis=0 removes specific rows by their index label. This is commonly used to eliminate incorrect or irrelevant records from the dataset.

Python

df2 = df2.drop(2, axis=0)
print(df2)

Output

[Image: DataFrame after removing the row]

Explanation:

drop(2, axis=0) removes the row whose index label is 2
axis=0 specifies row operation
print(df2) displays updated data
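
After a row is dropped, the remaining index labels keep their old values, which leaves a gap. The sketch below shows one way to renumber them with reset_index(); the variable names are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['John', 'Emma', 'Liam', 'Olivia', 'Sophia'],
    'Age': [20, 19, 21, 18, 22],
    'Student': [True, True, False, True, False],
})

# index=[...] is equivalent to passing the labels with axis=0.
remaining = df.drop(index=[2])
print(remaining.index.tolist())   # label 2 is now missing

# drop=True discards the old labels instead of keeping them as a column.
renumbered = remaining.reset_index(drop=True)
print(renumbered.index.tolist())
```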

Selecting Data from DataFrame

Pandas provides multiple ways to access data: column names in square brackets and label-based indexing with .loc[]. These methods help retrieve a single column, multiple columns, or individual rows for analysis.

Python

print(df['Name'])
print(df[['Name', 'Age']])
print(df.loc[0])

Output

[Image: selected column, multiple columns and first row]

Explanation:

df['Name'] selects single column
df[['Name', 'Age']] selects multiple columns
df.loc[0] selects the row whose index label is 0
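
Because .loc[] works on index labels, it can differ from position-based access with .iloc[] once rows have been removed. The following sketch illustrates the difference on the sample data, with illustrative variable names.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['John', 'Emma', 'Liam', 'Olivia', 'Sophia'],
    'Age': [20, 19, 21, 18, 22],
    'Student': [True, True, False, True, False],
})

# After dropping label 2, the index labels are 0, 1, 3, 4.
remaining = df.drop(index=[2])

by_label = remaining.loc[3]       # row whose index label is 3
by_position = remaining.iloc[2]   # third row, counting from position 0
print(by_label['Name'], by_position['Name'])

# .loc also takes a column label to pick out a single value.
emma_age = df.loc[1, 'Age']

# Label slices with .loc include both end points.
subset = df.loc[0:1, ['Name', 'Age']]
print(subset)
```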

Filtering Data

Filtering in Pandas is done with boolean indexing: a conditional expression produces a True or False value for each row, and passing it to the DataFrame keeps only the rows where it is True. This makes it useful for focused data analysis.

Python

print(df[df['Age'] > 19])

Output

[Image: rows where Age > 19]

Explanation:

df['Age'] > 19 creates a boolean Series with one True/False value per row
df[...] filters rows based on condition
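
Several conditions can be combined with & (and), | (or) and ~ (not). A minimal sketch on the sample data; each comparison is wrapped in parentheses because & and | bind more tightly than comparison operators.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['John', 'Emma', 'Liam', 'Olivia', 'Sophia'],
    'Age': [20, 19, 21, 18, 22],
    'Student': [True, True, False, True, False],
})

# Rows that are older than 18 AND are students.
older_students = df[(df['Age'] > 18) & (df['Student'])]
print(older_students)

# ~ inverts a boolean Series: rows that are NOT students.
non_students = df[~df['Student']]
print(non_students)
```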

Sorting Data

The .sort_values() method arranges rows by the values of one or more columns. It sorts in ascending order by default and returns a new DataFrame; pass ascending=False for descending order.

Python

print(df.sort_values(by='Age'))

Output

[Image: DataFrame sorted by Age]

Explanation: sort_values(by='Age') sorts rows by the Age column in ascending order
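
Descending order and sorting by more than one column use the ascending= parameter. This is a short sketch on the sample data, with illustrative variable names.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['John', 'Emma', 'Liam', 'Olivia', 'Sophia'],
    'Age': [20, 19, 21, 18, 22],
    'Student': [True, True, False, True, False],
})

# Oldest first; the original index labels travel with their rows.
oldest_first = df.sort_values(by='Age', ascending=False)
print(oldest_first)

# Students first (True sorts after False, hence descending),
# then oldest first within each group.
by_group = df.sort_values(by=['Student', 'Age'], ascending=[False, False])
print(by_group)
```

Neither call modifies df; each returns a new, sorted DataFrame.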
