Pandas Interview Questions

Last Updated : 27 Sep, 2025

Pandas is an open-source Python library used for data manipulation and analysis. In interviews, questions on Pandas are often asked to assess your ability to work with structured data effectively. Below are some of the most frequently asked interview questions and answers covering key Pandas topics.

1. List Key Features of Pandas.

Pandas is used for efficient data analysis. Its key features are as follows:

  • Fast and efficient data manipulation and analysis
  • Provides time-series functionality
  • We can easily handle missing data
  • Faster data merging and joining
  • Flexible reshaping of data sets
  • Group by functionality
  • Data from different file objects can be loaded
  • Integrates with NumPy

2. What are the Different Types of Data Structures in Pandas?

The two data structures that are supported by Pandas are Series and DataFrames.

  • Series is a one-dimensional labelled array that can hold data of any type. It is mostly used to represent a single column or row of data.
  • DataFrame is a two-dimensional heterogeneous data structure. It stores data in a tabular form. Its three main components are data, rows and columns.

3. What is Series in Pandas?

A Series in Pandas is a one-dimensional labelled array, much like a single column of an Excel sheet, that can hold data of any type such as integers, strings and Python objects. Its axis labels are collectively known as the index. A Series contains homogeneous data; its values are mutable but its size is immutable. A series can be created from a Python tuple, list or dictionary. The syntax for creating a series is as follows:

Python
import pandas as pd
series = pd.Series(data)

4. What are the Different Ways to Create a Series?

In Pandas, a series can be created in many ways. They are as follows:

1. Creating a Series from a List

We can create a series by passing a Python list to the Series() constructor.

Python
import pandas as pd
list1 = ['g', 'e', 'e', 'k', 's']

print(pd.Series(list1))
  

Output:

0 g
1 e
2 e
3 k
4 s
dtype: object

2. Creating a Series from Dictionary

A Series can also be created from a Python dictionary. The keys of the dictionary are used to construct the index of the series.

Python
import pandas as pd

data = {'Geeks': 10, 'for': 20, 'geeks': 30}

print(pd.Series(data))
  

Output:

Geeks 10
for 20
geeks 30
dtype: int64

3. Creating a Series from Scalar Value

To create a series from a Scalar value, we must provide an index. The Series constructor will take two arguments, one will be the scalar value and the other will be a list of indexes. The value will repeat until all the index values are filled.

Python
import pandas as pd

ser = pd.Series(10, index=[0, 1, 2, 3, 4, 5])

print(ser)
  

Output:

0 10
1 10
2 10
3 10
4 10
5 10
dtype: int64

4. Creating a Series using NumPy Functions

The NumPy module's functions, such as numpy.linspace() and numpy.random.randn(), can also be used to create a Pandas series.

Python
import pandas as pd
import numpy as np

ser1 = pd.Series(np.linspace(3, 33, 3))
print(ser1)

ser2 = pd.Series(np.random.randn(3))
print("\n", ser2)

Output:

0 3.0
1 18.0
2 33.0
dtype: float64
0 -0.341027
1 -1.700664
2 0.364409
dtype: float64

5. Creating a Series using List Comprehension

Here, we will use a Python list comprehension to create a series. The range() function supplies the values, and a comprehension over a string supplies the index labels.

Python
import pandas as pd
ser = pd.Series(range(1, 20, 3),
                index=[x for x in 'abcdefg'])
print(ser)

Output:

a 1
b 4
c 7
d 10
e 13
f 16
g 19
dtype: int64

5. What is a DataFrame in Pandas?

A DataFrame in Pandas is a data structure used to store data in tabular form, that is in rows and columns. It is two-dimensional, size-mutable and heterogeneous in nature. The main components of a dataframe are the data, rows and columns. A dataframe can be created by loading a dataset from existing storage such as a SQL database, CSV file or Excel file. The syntax for creating a dataframe is as follows:

Python
import pandas as pd
dataframe = pd.DataFrame(data)

6. What are the Different ways to Create a DataFrame in Pandas?

In Pandas, a dataframe can be created in many ways. They are as follows:

1. Creating a DataFrame using a List

In order to create a DataFrame from a Python list, just pass the list to the DataFrame() constructor.

Python
import pandas as pd

lst = ['Geeks', 'For', 'Geeks', 'is',
       'portal', 'for', 'Geeks']

print(pd.DataFrame(lst))

Output:

0
0 Geeks
1 For
2 Geeks
3 is
4 portal
5 for
6 Geeks

2. Creating a DataFrame using a Dictionary

A DataFrame can be created from a Python dictionary by passing it to the DataFrame() constructor. The keys of the dictionary become the column names and the values of the dictionary become the data of the DataFrame.

Python
import pandas as pd

data = {'Name':['Tom', 'nick', 'krish', 'jack'], 'Age':[20, 21, 19, 18]}

print(pd.DataFrame(data))

Output:

Name Age
0 Tom 20
1 nick 21
2 krish 19
3 jack 18

3. Creating a DataFrame using a List of Dictionaries

Another way to create a DataFrame is from a Python list of dictionaries, passed to the DataFrame() constructor. The keys of each dictionary become the column names.

Python
import pandas as pd

lst = [{1: 'Geeks', 2: 'For', 3: 'Geeks'},
       {1: 'Portal', 2: 'for', 3: 'Geeks'}]

print(pd.DataFrame(lst))

Output:

1 2 3
0 Geeks For Geeks
1 Portal for Geeks

4. Creating a DataFrame from Pandas Series

A DataFrame in Pandas can also be created by using the Pandas series.

Python
import pandas as pd

ser = pd.Series(['Geeks', 'For', 'Geeks'])

print(pd.DataFrame(ser))

Output:

0
0 Geeks
1 For
2 Geeks

7. How to Read Data into a DataFrame from a CSV file?

We can create a dataframe from a CSV (Comma-Separated Values) file by using the read_csv() method, which takes the path of the CSV file as a parameter.

Python
pandas.read_csv(file_name)

Another way to do this is by using the read_table() method, which takes the CSV file and a delimiter value as parameters.

Python
pandas.read_table(file_name, delimiter)
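
As a quick, hedged illustration of read_csv() (the CSV content below is made up and read from an in-memory buffer rather than a file on disk):

```python
import io

import pandas as pd

# In-memory text standing in for a CSV file on disk (hypothetical data).
csv_text = "name,age\nTom,20\nNick,21\n"

df = pd.read_csv(io.StringIO(csv_text))
print(df)
```

The same call works with a file path, since read_csv() accepts either a path or any file-like object.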

8. How can a DataFrame be Converted to an Excel File?

A Pandas dataframe can be converted to an Excel file by using the to_excel() function which takes the file name as the parameter. We can also specify the sheet name in this function.

Python
DataFrame.to_excel(file_name)

9. How to Convert a DataFrame into a Numpy Array?

NumPy is a Python package used to perform large numerical computations. It processes multidimensional array elements to perform complicated mathematical operations.

Pandas dataframe can be converted to a NumPy array by using the to_numpy() method. We can also provide the datatype as an optional argument.

Python
Dataframe.to_numpy()

We can also use .values to convert the dataframe's values to a NumPy array, though to_numpy() is the recommended approach.

Python
df.values
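
A small sketch of both conversions, using a made-up two-column frame:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'A': [1, 2], 'B': [3, 4]})

arr = df.to_numpy()                    # ndarray of shape (2, 2)
arr_float = df.to_numpy(dtype=float)   # optional dtype argument
arr_values = df.values                 # the older .values route
```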

10. How to access the first few rows of a dataframe?

The first few records of a dataframe can be accessed by using the pandas head() method. It takes one optional argument n, which is the number of rows. By default, it returns the first 5 rows of the dataframe. The head() method has the following syntax:

Python
df.head(n)

Another way to do it is by using iloc, which is similar to the Python list-slicing technique. It has the following syntax:

Python
df.iloc[:n]
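
The equivalence of the two approaches can be checked on a small made-up frame:

```python
import pandas as pd

df = pd.DataFrame({'x': range(10)})

first5 = df.head()     # default n=5
first3 = df.iloc[:3]   # slicing gives the same rows as head(3)
```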

11. How to Select a Single Column of a DataFrame?

There are many ways to Select a single column of a dataframe. They are as follows:

By using the dot operator, we can access any column whose name is a valid Python identifier.

Python
Dataframe.column_name

Another way to select a column is by using the square brackets [].

Python
DataFrame[column_name]
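
Both selection styles return the same Series; a short sketch with made-up data (the dot form only works when the column name is a valid Python identifier):

```python
import pandas as pd

df = pd.DataFrame({'Name': ['Tom', 'Nick'], 'Age': [20, 21]})

by_attr = df.Name       # dot operator
by_key = df['Name']     # square brackets work for any column name
```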

12. How to Rename a Column in a DataFrame?

A column of the dataframe can be renamed by using the rename() function. We can rename a single as well as multiple columns at the same time using this method.

Python
DataFrame.rename(columns={'column1': 'COLUMN_1', 'column2':'COLUMN_2'}, inplace=True)

Another way is by using the set_axis() function, which takes the new column names and the axis to replace. Note that the inplace parameter of set_axis() was removed in pandas 2.0, so assign the result back instead:

Python
DataFrame = DataFrame.set_axis(['COLUMN_1', 'COLUMN_2'], axis=1)

In case we want to add a prefix or suffix to the column names, we can use the add_prefix() or add_suffix() methods.

Python
DataFrame.add_prefix(prefix='PREFIX_')
DataFrame.add_suffix(suffix='_suffix')
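
A minimal sketch of these renaming helpers on a made-up frame (rename() and the prefix/suffix methods all return a new DataFrame by default):

```python
import pandas as pd

df = pd.DataFrame({'name': [1], 'age': [2]})

renamed = df.rename(columns={'name': 'NAME'})   # rename one column
prefixed = df.add_prefix('col_')                # prefix every column name
suffixed = df.add_suffix('_raw')                # suffix every column name
```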

13. How to add Row or Column to an Existing Dataframe?

1. Adding Rows

df.loc[] is used to access a group of rows or columns and can also be used to add a row to a dataframe.

Python
DataFrame.loc[Row_Index]=new_row

We can also add multiple rows to a dataframe by using the pandas.concat() function, which takes a list of dataframes to be concatenated.

Python
pandas.concat([Dataframe1,Dataframe2])

2. Adding Columns

We can add a column to an existing dataframe by just declaring the column name and assigning the list or dictionary of values.

Python
DataFrame[column_name] = list_of_values

Another way to add a column is by using the df.insert() method, which takes the position at which the column should be inserted, the column name and the column values as parameters.

Python
DataFrameName.insert(col_index, col_name, value)

We can also add a column to a dataframe by using the df.assign() function, which returns a new dataframe with the added columns.

Python
DataFrame.assign(**kwargs)
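
The row and column additions above can be sketched together on a small made-up frame:

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2]})

df.loc[2] = 3                    # add a row at index label 2
df['B'] = [10, 20, 30]           # add a column from a list
df.insert(1, 'C', 0)             # insert column 'C' at position 1
df2 = df.assign(D=df['A'] * 2)   # assign() returns a new DataFrame
```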

14. How to Delete a Row or Column from an Existing DataFrame?

We can delete a row or a column from a dataframe by using the df.drop() method and providing the row index or column name as the parameter.

1. To delete a column

Python
DataFrame.drop(['Column_Name'], axis=1)

2. To delete a row

Python
DataFrame.drop([Row_Index_Number], axis=0)
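
Both deletions, sketched on a made-up frame (drop() returns a new DataFrame unless inplace=True is passed):

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]})

no_col = df.drop(['B'], axis=1)   # delete a column
no_row = df.drop([0], axis=0)     # delete a row by index label
```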

15. How to Merge Two DataFrames?

In pandas, we can combine two dataframes by using the pandas.merge() function, which joins them on common columns or indexes.

Python
import pandas as pd

df1 = pd.DataFrame({'A': [1, 2, 3],
                    'B': [4, 5, 6]},
                   index=[10, 20, 30])

df2 = pd.DataFrame({'C': [7, 8, 9],
                    'D': [10, 11, 12]},
                   index=[20, 30, 40])

result = pd.merge(df1, df2, left_index=True, right_index=True)
print(result)

Output:

A B C D
20 2 5 7 10
30 3 6 8 11

16. How to Sort a Dataframe?

A dataframe in pandas can be sorted in ascending or descending order according to a particular column. We can do so by using the sort_values() method and providing the column name by which we want to sort the dataframe. We can also sort by multiple columns.

To sort in descending order, we pass the additional parameter ascending set to False.

Python
DataFrame.sort_values(by='Age', ascending=False)
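
A short sketch of sorting, including by multiple columns, on made-up data:

```python
import pandas as pd

df = pd.DataFrame({'Name': ['Tom', 'Nick', 'Ann'],
                   'Age': [20, 21, 20]})

by_age = df.sort_values(by='Age', ascending=False)   # descending
by_both = df.sort_values(by=['Age', 'Name'])         # multiple columns
```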

17. How to Compute Mean, Median, Mode, Variance, Standard Deviation and Various Quantile Ranges in Pandas?

The mean, median, mode, variance, standard deviation and quantile ranges can be computed using the following methods:

  • DataFrame.mean(): To calculate the mean
  • DataFrame.median(): To calculate median
  • DataFrame.mode(): To calculate the mode
  • DataFrame.var(): To calculate variance
  • DataFrame.std(): To calculate the standard deviation
  • DataFrame.quantile(): To calculate quantile range, with range value as a parameter
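
All of these statistics can be computed on a small made-up Series (the expected values in the comments were worked out by hand):

```python
import pandas as pd

s = pd.Series([1, 2, 2, 4, 6])

mean = s.mean()        # 15 / 5 = 3.0
median = s.median()    # middle of the sorted values: 2.0
mode = s.mode()        # most frequent value: 2
var = s.var()          # sample variance: 16 / 4 = 4.0
std = s.std()          # square root of the variance: 2.0
q1 = s.quantile(0.25)  # 25th percentile
```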

18. Difference between Shallow Copy and Deep Copy?

In Pandas, there are two ways to create a copy of the Series. They are as follows:

1. Shallow Copy is a copy of the series object where the indices and the data of the original object are not copied. It only copies the references to the indices and data. This means any changes made to a series will be reflected in the other. A shallow copy of the series can be created by writing the following syntax:

Python
ser.copy(deep=False)

2. Deep Copy is a copy of the series object where it has its own indices and data. This means changes made to a copy of the object will not be reflected to the original series object. A deep copy of the series can be created by writing the following syntax:

Python
ser.copy(deep=True)

The default value of the deep parameter of the copy() function is set to True.
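
The deep-copy behaviour can be sketched as follows (the shallow-copy side is deliberately left unasserted, because under Copy-on-Write, the default from pandas 3.0, writing to a shallow copy no longer propagates to the original):

```python
import pandas as pd

ser = pd.Series([1, 2, 3])

deep = ser.copy()      # deep=True is the default
deep.iloc[0] = 99      # only the copy changes

shallow = ser.copy(deep=False)   # shares the underlying data
```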

19. How to Check and Remove Duplicate Values in Pandas?

In pandas, duplicate values can be checked by using the duplicated() method.

Python
DataFrame.duplicated()

To remove the duplicated values we can use the drop_duplicates() method.

Python
DataFrame.drop_duplicates()
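
A minimal sketch on made-up data; duplicated() flags every repeat of an earlier row:

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 1, 2], 'B': ['x', 'x', 'y']})

mask = df.duplicated()            # True only for the repeated row
unique_df = df.drop_duplicates()  # keeps the first occurrence
```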

20. How to Handle Missing Data in Pandas?

Generally, a dataset has some missing values, which can happen for a variety of reasons such as data collection issues, data entry errors or data not being available for certain observations. This can cause big problems. To handle these missing values, Pandas provides various functions.

These functions are used for detecting, removing and replacing null values in Pandas DataFrame:

  • isnull(): It returns True for NaN values or null values and False for present values
  • notnull(): It returns False for NaN values and True for present values
  • dropna(): It analyzes and drops Rows/Columns with Null values
  • fillna(): It lets the user replace NaN values with a value of their own
  • replace(): It is used to replace a string, regex, list, dictionary, series, number, etc.
  • interpolate(): It fills NA values in the dataframe or series.
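
A few of these functions, sketched on a made-up column with one missing value:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'A': [1.0, np.nan, 3.0]})

missing = df['A'].isnull()   # True where the value is NaN
dropped = df.dropna()        # drop rows containing NaN
filled = df.fillna(0)        # replace NaN with 0
```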

21. Difference between interpolate() and fillna()

The interpolate() and fillna() methods in pandas are used to handle missing or NaN (Not a Number) values in a DataFrame or Series. The following table shows the difference between interpolate() and fillna():

| Feature | interpolate() | fillna() |
| --- | --- | --- |
| Purpose | Estimates and fills missing values using interpolation techniques | Fills missing values using a specified constant or computed value |
| How it works | Calculates missing values based on existing surrounding data | Directly replaces NaN with given value(s) or strategies |
| Common methods supported | linear, polynomial, time, spline, etc. | 0, mean(), median(), mode(), forward-fill (ffill), back-fill (bfill), etc. |
| Data types supported | Mainly numeric and datetime data (where logical continuity exists) | Numeric, categorical and datetime data |
| Use case | When missing values depend on surrounding trends or a time sequence | When missing values can be filled with a fixed or computed known value |
| Return type | DataFrame/Series with estimated values replacing NaN | DataFrame/Series with the specified values replacing NaN |
| Example | df['col'].interpolate(method='linear') | df['col'].fillna(df['col'].mean()) |
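
On a series chosen so that both strategies happen to agree, the contrast is easy to see (made-up data; with linear interpolation the gap is estimated from its neighbours, while fillna() plugs in a precomputed value):

```python
import numpy as np
import pandas as pd

s = pd.Series([1.0, np.nan, 3.0])

by_interp = s.interpolate()    # linear estimate between 1.0 and 3.0
by_fill = s.fillna(s.mean())   # mean of the present values (also 2.0 here)
```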

22. Difference between map(), applymap() and apply()

The map(), applymap() and apply() methods are used in pandas for applying functions or transformations to elements in a DataFrame or Series (note that applymap() is deprecated since pandas 2.1 in favour of DataFrame.map()). The following table shows the difference between map(), applymap() and apply():

| Feature | map() | applymap() | apply() |
| --- | --- | --- | --- |
| Defined on | Series only | DataFrame only | Both Series and DataFrame |
| Works on | Each element of a Series | Each element of a DataFrame | Entire row/column (or whole Series) |
| Axis support | No axis parameter | No axis parameter | Has axis parameter (axis=0 for columns, axis=1 for rows) |
| Function application level | Element-wise | Element-wise | Row-wise or column-wise (can also be element-wise on a Series) |
| Typical use | Apply a function/dict to each element of a Series | Apply a function to each element of a DataFrame | Apply a function across rows/columns or to a whole Series |
| Example use case | Convert each name in a Series to uppercase | Square each element in a numeric DataFrame | Calculate the sum/mean of each row or column |
| Return type | Series | DataFrame | Series (if applied on DataFrame rows/columns) or scalar (if aggregated) |
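
A small sketch of map() and apply() on made-up data (applymap() is skipped here since it is deprecated in recent pandas versions):

```python
import pandas as pd

s = pd.Series(['tom', 'nick'])
df = pd.DataFrame({'A': [1, 2], 'B': [3, 4]})

upper = s.map(str.upper)           # element-wise on a Series
col_sums = df.apply(sum, axis=0)   # column-wise on a DataFrame
row_sums = df.apply(sum, axis=1)   # row-wise
```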

23. How to Set and Reset the Index in a Panda dataFrame?

1. Set Index: We can set the index to a Pandas dataframe by using the set_index() method, which is used to set a list, series or dataframe as the index of a dataframe.

Python
DataFrame.set_index('Column_Name')

2. Reset Index: The index of Pandas dataframes can be reset by using the reset_index() method. It can be used to simply reset the index to the default integer index beginning at 0.

Python
DataFrame.reset_index(inplace = True)
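
Both operations round-trip cleanly on a made-up frame:

```python
import pandas as pd

df = pd.DataFrame({'Name': ['Tom', 'Nick'], 'Age': [20, 21]})

indexed = df.set_index('Name')     # 'Name' becomes the index
restored = indexed.reset_index()   # back to the default 0..n-1 index
```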

24. What is Reindexing in Pandas?

Reindexing in Pandas, as the name suggests, means changing the index of the rows and columns of a dataframe. It can be done by using the Pandas reindex() method. For new index labels that are not present in the dataframe, reindex() fills in NaN.

Python
df.reindex(new_index)
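
A minimal sketch on made-up labels; the new label 'c' is absent from the original frame, so reindex() fills its row with NaN:

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2]}, index=['a', 'b'])

re_df = df.reindex(['a', 'b', 'c'])   # 'c' is new, so its row is NaN
```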

25. What is Multi-Indexing in Pandas?

Multi-indexing (also called hierarchical indexing) means having two or more index levels on the rows or columns of a pandas object. It lets you work with higher-dimensional data inside the familiar two-dimensional Series and DataFrame structures. A MultiIndex in Pandas can be created by using a number of functions such as:

  • MultiIndex.from_arrays
  • MultiIndex.from_tuples
  • MultiIndex.from_product
  • MultiIndex.from_frame
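
One of these constructors sketched on made-up data; MultiIndex.from_tuples() builds a two-level (Dept, Year) index, and selecting an outer label returns the inner level:

```python
import pandas as pd

idx = pd.MultiIndex.from_tuples(
    [('IT', 2023), ('IT', 2024), ('HR', 2023)],
    names=['Dept', 'Year'])

s = pd.Series([50000, 60000, 45000], index=idx)

it_only = s.loc['IT']   # select by the outer level
```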

26. What is the difference between loc and iloc in Pandas?

1. loc: It is label-based i.e you access rows and columns using their labels (row and column names).

Python
df.loc[row_labels, column_labels]

2. iloc: It is integer-position based and here you access rows and columns using their numeric index positions (row and column numbers).

Python
df.iloc[row_positions, column_positions]
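
The same cell fetched both ways on a made-up frame:

```python
import pandas as pd

df = pd.DataFrame({'Age': [20, 21]}, index=['Tom', 'Nick'])

by_label = df.loc['Tom', 'Age']   # label-based
by_position = df.iloc[0, 0]       # integer-position based
```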

27. What is the Significance of the Pandas describe() Command?

Pandas describe() is used to view basic statistical details of a dataframe or a series of numeric values, such as the count, mean, standard deviation and percentiles. It gives a different output (count, unique, top, freq) when applied to a series of strings.

Python
DataFrame.describe()

28. How to Find the Correlation Using Pandas?

The Pandas dataframe.corr() method computes the pairwise correlation of the columns of a dataframe, ignoring missing values. Non-numeric columns can be excluded by passing numeric_only=True (older pandas versions skipped them automatically).

Python
DataFrame.corr()
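
A quick sketch on made-up data where one column is an exact multiple of the other, so their Pearson correlation is 1:

```python
import pandas as pd

df = pd.DataFrame({'X': [1, 2, 3], 'Y': [2, 4, 6]})

corr = df.corr()   # pairwise correlation matrix
```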

29. What is groupby() in Pandas and how is it used?

The groupby() function in Pandas is used to split the data into groups based on one or more columns, then apply an operation (like aggregation, transformation or filtering) on each group separately.

Python
df.groupby(by_column)

For example:

Python
import pandas as pd

data = {'Dept': ['IT', 'IT', 'HR', 'HR'],
        'Salary': [50000, 60000, 45000, 55000]}
df = pd.DataFrame(data)

result = df.groupby('Dept')['Salary'].mean()
print(result)

Output:

Dept
HR 50000.0
IT 55000.0
Name: Salary, dtype: float64

30. How can we use Pivot table in Pandas?

In Pandas, pivot_table() is used to summarize and reshape data into a tabular format. It allows you to aggregate values like sum, mean, count, etc by specifying which columns become rows (index), which become columns and which contain the values to aggregate.

We can pivot the dataframe in Pandas by using the pivot_table() method. To unpivot the dataframe to its original form we can melt the dataframe by using the melt() method.
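
Pivoting and then melting back, sketched on a made-up salary table:

```python
import pandas as pd

df = pd.DataFrame({'Dept': ['IT', 'IT', 'HR'],
                   'Year': [2023, 2024, 2023],
                   'Salary': [50000, 60000, 45000]})

# Rows are departments, columns are years, cells are mean salaries.
pivoted = df.pivot_table(values='Salary', index='Dept',
                         columns='Year', aggfunc='mean')

# melt() unpivots the wide table back into long form.
long_form = pivoted.reset_index().melt(id_vars='Dept', value_name='Salary')
```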

31. What is the difference between pivot_table() and groupby()?

Both pivot_table() and groupby() are useful methods in pandas used for aggregating and summarizing data. The following table shows the difference between pivot_table() and groupby():

| Feature | pivot_table() | groupby() |
| --- | --- | --- |
| Purpose | Summarizes and aggregates data in a tabular (pivoted) format | Performs aggregation on grouped data of one or more columns |
| Reshaping | Used to reshape data based on column values | Used to group data based on categorical variables |
| Output structure | Returns a new reshaped DataFrame | Returns a GroupBy object which must be followed by aggregation functions |
| Multi-level grouping | Handles multiple levels of grouping via the index and columns parameters | Handles multiple levels of grouping via multiple column names in groupby() |
| Comparison across dimensions | Used to compare data across multiple dimensions | Used to summarize data within groups |
| Typical use case | Summarizing data with one axis as rows and another as columns | Grouping by one or more columns and then applying aggregation |

32. What is Data Aggregation in Pandas?

In Pandas, data aggregation refers to summarizing or reducing data in order to produce a statistical summary of one or more columns in a dataset. Aggregation functions are applied to groups or subsets of data to calculate statistical measures such as sum, mean, minimum, maximum and count.

The agg() function in Pandas is frequently used to aggregate data. It can apply one or more aggregation functions to one or more columns of a DataFrame or Series; both Pandas built-in functions and user-defined functions can be used.

Python
DataFrame.agg({'Col_name1': ['sum', 'min', 'max'], 'Col_name2': 'count'})
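
The dictionary form of agg() sketched on a made-up frame; each key names a column and each value names the function(s) to apply to it:

```python
import pandas as pd

df = pd.DataFrame({'Dept': ['IT', 'IT', 'HR'],
                   'Salary': [50000, 60000, 45000]})

summary = df.agg({'Salary': ['sum', 'min', 'max'], 'Dept': 'count'})
```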

33. Difference between join(), merge() and concat()

The following table shows the difference between join(), merge() and concat():

| Feature | join() | merge() | concat() |
| --- | --- | --- | --- |
| Purpose | Combines two DataFrames on their index or on a key column | Combines DataFrames using common columns or indices (like SQL joins) | Combines DataFrames along rows or columns |
| Default on | Joins on the index by default | Joins on common columns by default | Just stacks DataFrames without joining on keys |
| Join types | left, right, inner, outer | left, right, inner, outer | Not applicable (simply concatenates) |
| Axis support | Always horizontal (columns) | Always horizontal (columns) | Vertical (rows) or horizontal (columns) using axis |
| Typical use | Combine DataFrames by their index labels | Combine DataFrames based on matching column values | Stack multiple DataFrames into one |
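
concat() in both directions, sketched on two made-up frames:

```python
import pandas as pd

df1 = pd.DataFrame({'A': [1, 2]})
df2 = pd.DataFrame({'A': [3, 4]})

stacked = pd.concat([df1, df2], ignore_index=True)   # stack rows
side = pd.concat([df1, df2], axis=1)                 # place side by side
```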

34. What is Time Series in Pandas?

Time series is a collection of data points indexed by timestamps. It depicts the evolution of a quantity over time. Pandas provides various functions to handle time-series data efficiently: working with timestamps, resampling a time series to a different period, handling missing data, slicing data by timestamps, etc.

We have various time-series functions in pandas, such as:

| Pandas Built-in Function | Operation |
| --- | --- |
| pandas.to_datetime(DataFrame['Date']) | Convert the 'Date' column of a DataFrame to datetime dtype |
| DataFrame.set_index('Date', inplace=True) | Set 'Date' as the index |
| DataFrame.resample('H').sum() | Resample the time series to a different frequency (e.g. hourly, daily, weekly, monthly) |
| DataFrame.interpolate() | Fill missing values using linear interpolation |
| DataFrame.loc[start_date:end_date] | Slice the data based on timestamps |
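
Several of the functions above chained together on a tiny made-up sales log:

```python
import pandas as pd

df = pd.DataFrame({'Date': ['2023-01-01', '2023-01-01', '2023-01-02'],
                   'Sales': [10, 20, 30]})

df['Date'] = pd.to_datetime(df['Date'])   # strings -> datetime64
df = df.set_index('Date')

daily = df.resample('D').sum()                # daily totals
window = df.loc['2023-01-01':'2023-01-01']    # slice by timestamp
```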

35. How to convert a String to Datetime in Pandas?

A Python string can be converted to a DateTime object by using:

1. Pandas.to_datetime()

Python
import pandas as pd


date_string = '2023-07-17'
dateTime = pd.to_datetime(date_string)
print(dateTime)

Output:

2023-07-17 00:00:00

2. datetime.strptime

Python
from datetime import datetime

date_string = '2023-07-17'
dateTime = datetime.strptime(date_string, '%Y-%m-%d')
print(dateTime)

Output:

2023-07-17 00:00:00

36. What is Time Delta in Pandas?

A time delta is the difference between two dates or times; it represents a duration. A Timedelta object can be created by using the pandas.Timedelta() constructor and providing the number of weeks, days, hours, minutes, seconds, etc. as parameters.

Python
Duration = pandas.Timedelta(days=7, hours=4, minutes=30, seconds=23)

With the Timedelta data type you can easily perform arithmetic operations, comparisons and other time-related manipulations in different units such as days, hours, minutes, seconds, milliseconds and microseconds.
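
Timedelta arithmetic, sketched with made-up values (7 days, 4 hours, 30 minutes and 23 seconds total 621023 seconds):

```python
import pandas as pd

duration = pd.Timedelta(days=7, hours=4, minutes=30, seconds=23)

later = pd.Timestamp('2023-07-17') + duration   # shift a timestamp
seconds = duration.total_seconds()
```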

37. How to make Label Encoding using Pandas?

Label encoding is used to convert categorical data into numerical data so that a machine-learning model can fit it. To apply label encoding using pandas we can use:

1. pandas.Categorical().codes: It only gives codes.

  • pd.Categorical() converts the data into a Categorical type.
  • .codes gives the integer code for each category.
Python
import pandas as pd

data = pd.Series(['Red', 'Blue', 'Green', 'Blue'])
encoded = pd.Categorical(data).codes
print(encoded)

Output:

[2 0 1 0]

2. pandas.factorize(): It gives both codes and unique labels.

  • factorize() returns a tuple: (encoded_array, unique_categories).
  • The first array gives integer codes and the second gives the mapping of categories.
  • It automatically detects unique values and assigns codes in their first-seen order.
Python
import pandas as pd

data = pd.Series(['Red', 'Blue', 'Green', 'Blue'])
encoded, uniques = pd.factorize(data)
print(encoded)

Output:

[0 1 2 1]

38. How to make Onehot Encoding using Pandas?

One-hot encoding is a technique for representing categorical data as numerical values in a machine-learning model. It works by creating a separate binary variable for each category in the data. The value of the binary variable is 1 if the observation belongs to that category and 0 otherwise. It can improve the performance of the model.

To apply one-hot encoding, we create dummy columns for our dataframe by using the get_dummies() method.

Python
pd.get_dummies(data, columns=None)

For example:

Python
import pandas as pd

df = pd.DataFrame({'Color': ['Red', 'Blue', 'Green']})

encoded = pd.get_dummies(df, columns=['Color'])
print(encoded)

Output:

Color_Blue Color_Green Color_Red
0 0 0 1
1 1 0 0
2 0 1 0
