Get first N records in Pandas DataFrame
Last Updated: 29 Nov, 2024
When working with large datasets in Python using the Pandas library, you often need to extract only the first few records of a DataFrame, or of a single column, in order to inspect or process the data. For instance, if you have a DataFrame df with a column A, you can get its first 10 values with df['A'].head(10).
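As a minimal sketch of the call mentioned above, the snippet below builds a hypothetical DataFrame with a single column A (the values 0 to 19 are made up for illustration) and takes the first 10 values of that column.

```python
import pandas as pd

# Hypothetical data: one column "A" holding the integers 0..19.
df = pd.DataFrame({"A": range(20)})

# Selecting the column gives a Series; head(10) keeps its first 10 values.
first_10 = df["A"].head(10)
print(first_10.tolist())
```

The result is still a Series, so it keeps the original index labels of those 10 rows.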
1. Using head() Method to Get the First n Values
The head() method is one of the simplest ways to retrieve the first few rows of a DataFrame or of a specific column. By default, it returns the first five rows, but you can request any number by passing it as an argument. It offers a quick way to preview data that is already held in a DataFrame without printing every row.
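The default and the argument described above can be checked with a small sketch. The 8-row frame and the column name x are assumptions chosen only so that the default of five rows is visible; the negative argument is an additional documented behavior of head(), not something the article relies on.

```python
import pandas as pd

# Hypothetical data: 8 rows, so the default of 5 is smaller than the frame.
df = pd.DataFrame({"x": range(8)})

default_rows = df.head()      # no argument: first 5 rows
three_rows = df.head(3)       # first 3 rows
all_but_last_2 = df.head(-2)  # negative n: every row except the last 2

print(len(default_rows), len(three_rows), len(all_but_last_2))
```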
Let us see how to fetch the first n records of a Pandas DataFrame. Here we fetch the first 3 rows:
Python
import pandas as pd
# Sample data (named `data` to avoid shadowing the built-in dict)
data = {'Name': ['Sumit Tyagi', 'Sukritin', 'Akriti Goel', 'Sanskriti', 'Abhishek Jain'], 'Age': [22, 20, 45, 21, 22], 'Marks': [90, 84, 33, 87, 82]}
df = pd.DataFrame(data)
print(df)
# Getting first 3 rows from the DataFrame
df_first_3 = df.head(3)
print(df_first_3)
Output:
[Image: first n records of a Pandas DataFrame]
2. Using iloc for Positional Selection
The iloc indexer selects data by integer position, which is useful when you need precise control over which rows to extract. It supports slicing and indexing in the same way as Python lists, which makes it intuitive for users familiar with Python's native data structures.
Python
import pandas as pd
# Sample data (named `data` to avoid shadowing the built-in dict)
data = {'Name': ['Sumit Tyagi', 'Sukritin', 'Akriti Goel', 'Sanskriti', 'Abhishek Jain'], 'Age': [22, 20, 45, 21, 22], 'Marks': [90, 84, 33, 87, 82]}
df = pd.DataFrame(data)
# Get the first 3 values of the Name column using iloc (rows 0-2, column by position)
first_3_values = df.iloc[:3, df.columns.get_loc('Name')]
print(first_3_values)
Output:
0 Sumit Tyagi
1 Sukritin
2 Akriti Goel
Name: Name, dtype: object
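Because iloc works on positions rather than labels, it returns the first rows even when the index does not start at 0. The sketch below assumes a hypothetical index of 10, 20, 30, 40 to make that visible.

```python
import pandas as pd

# Hypothetical frame whose index labels are not 0, 1, 2, ...
df = pd.DataFrame(
    {"Name": ["Sumit Tyagi", "Sukritin", "Akriti Goel", "Sanskriti"],
     "Marks": [90, 84, 33, 87]},
    index=[10, 20, 30, 40],
)

# iloc counts positions, so this is the first two rows whatever their labels.
# As with list slicing, the end position (2) is excluded.
first_2 = df.iloc[:2]
print(first_2)

# Columns can be given by position too: first two rows of column 0 ("Name").
first_2_names = df.iloc[:2, 0]
print(first_2_names.tolist())
```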
3. Using loc for Label-Based Selection
The loc indexer selects rows and columns by label. To get the first n values of a column, slice the row labels and pass the column name. Unlike iloc and list slicing, a loc slice includes its end label. Selecting by label often makes code more readable and descriptive, especially when working with labeled indices.
Python
import pandas as pd
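# Sample data (named `data` to avoid shadowing the built-in dict)
data = {'Name': ['Sumit Tyagi', 'Sukritin', 'Akriti Goel', 'Sanskriti', 'Abhishek Jain'], 'Age': [22, 20, 45, 21, 22], 'Marks': [90, 84, 33, 87, 82]}
df = pd.DataFrame(data)
# Get the first 3 values of the Marks column using loc.
# Label slices include the end label, so :2 covers labels 0, 1 and 2.
first_3_values = df.loc[:2, 'Marks']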
print(first_3_values)
Output:
0 90
1 84
2 33
Name: Marks, dtype: int64
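The difference between label-based and positional slicing is easy to get wrong, so here is a short sketch comparing the two. The second frame with letter labels is a hypothetical example, and slicing the index first is one possible way to get the first n rows with loc when the labels are not 0, 1, 2, and so on.

```python
import pandas as pd

df = pd.DataFrame({"Marks": [90, 84, 33, 87, 82]})

# loc slices by label and includes the end label: labels 0, 1 and 2.
by_label = df.loc[:2, "Marks"]

# iloc slices by position and excludes the end position: positions 0, 1 and 2.
by_position = df.iloc[:3, df.columns.get_loc("Marks")]

print(by_label.tolist())
print(by_label.equals(by_position))

# With arbitrary labels, take the first 3 labels from the index
# and pass them to loc.
df_letters = pd.DataFrame({"Marks": [90, 84, 33, 87, 82]}, index=list("abcde"))
first_3 = df_letters.loc[df_letters.index[:3], "Marks"]
print(first_3.tolist())
```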
We can also fetch the first n records of specific columns. For example:
Python
import pandas as pd
# Sample data (named `data` to avoid shadowing the built-in dict)
data = {'Name': ['Sumit Tyagi', 'Sukritin', 'Akriti Goel', 'Sanskriti', 'Abhishek Jain'], 'Age': [22, 20, 45, 21, 22], 'Marks': [90, 84, 33, 87, 82]}
df = pd.DataFrame(data)
# Getting first 2 rows of columns Age and Marks from df
df_first_2 = df[['Age', 'Marks']].head(2)
print(df_first_2)
Output:
Age Marks
0 22 90
1 20 84
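The same selection can also be written as a single indexing step instead of chaining a column selection with head(). The following is a sketch of two possible alternatives on the same sample data, not the only way to do it.

```python
import pandas as pd

data = {
    "Name": ["Sumit Tyagi", "Sukritin", "Akriti Goel", "Sanskriti", "Abhishek Jain"],
    "Age": [22, 20, 45, 21, 22],
    "Marks": [90, 84, 33, 87, 82],
}
df = pd.DataFrame(data)

# Chained form: pick the columns, then take the first 2 rows.
chained = df[["Age", "Marks"]].head(2)

# Single-step forms: rows and columns in one indexer.
by_position = df.iloc[:2, [1, 2]]                    # columns 1 and 2 by position
by_label = df.loc[df.index[:2], ["Age", "Marks"]]    # first 2 labels, columns by name

print(by_position)
print(by_label)
```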
4. Getting First n Records with the Slice Operator Directly
Using the slice operator ([:]) is one of the simplest ways to retrieve the first n records from a Pandas column or DataFrame. Applied directly to a DataFrame, an integer slice selects rows by position. Because it is the same syntax as basic Python list slicing, it is easy to pick up for anyone familiar with lists.
Python
import pandas as pd
# Sample data (named `data` to avoid shadowing the built-in dict)
data = {'Name': ['Sumit Tyagi', 'Sukritin', 'Akriti Goel', 'Sanskriti', 'Abhishek Jain'], 'Age': [22, 20, 45, 21, 22], 'Marks': [90, 84, 33, 87, 82]}
df = pd.DataFrame(data)
# Getting first 2 rows (all columns) from df
df_first_2 = df[:2]
print(df_first_2)
Output:
Name Age Marks
0 Sumit Tyagi 22 90
1 Sukritin 20 84
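One detail worth seeing in code: an integer slice on a DataFrame selects rows by position even when the index holds other labels, whereas a single integer inside the brackets is treated as a column name. The string index below is a hypothetical example.

```python
import pandas as pd

df = pd.DataFrame(
    {"Age": [22, 20, 45], "Marks": [90, 84, 33]},
    index=["a", "b", "c"],  # hypothetical string labels
)

# An integer slice inside [] picks rows by position, like a list slice.
first_2 = df[:2]
print(first_2)

# It returns the same rows as head(2) and iloc[:2].
print(first_2.equals(df.head(2)), first_2.equals(df.iloc[:2]))

# A single integer is not a row slice: df[0] looks for a column named 0.
try:
    df[0]
    single_int_failed = False
except KeyError:
    single_int_failed = True
print(single_int_failed)
```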
Choosing the Right Method for Extracting Data in Pandas
Method | When to Use | When Not to Use | Why Not to Use |
---|---|---|---|
head() Method | Ideal for quickly inspecting the first few rows of a DataFrame or Series. Useful for verifying that the data has been loaded correctly. | Avoid when complex slicing is needed or when data from positions other than the beginning is required. | Limiting for detailed dataset views and lacks filtering capabilities. |
iloc | Use when precise control over row positions with integer indexing is needed. Beneficial for extracting continuous subsets from any part of the DataFrame. | Not suitable for label-based selection or conditional filtering. | Requires knowledge of exact index positions, which may not be practical in dynamic datasets. |
Direct Slice Operator on DataFrame | Useful for quickly extracting a range of rows from the start without specifying column indices. Straightforward, with familiar Python list slicing syntax. | Avoid if specific columns are needed or if dealing with complex multi-indexing. | Lacks flexibility for non-continuous or condition-based selections, limiting its use in comprehensive data analysis scenarios. |
loc Method | Ideal for selecting rows and columns by labels, offering more readable and descriptive code. Useful when working with labeled indices. | Not suitable for integer-based indexing or when precise index positions are required without labels. | May be less efficient if labels are not well-defined or if the dataset lacks meaningful index labels. |