12 IP Pandas DataFrame - Question Bank
Som, a data analyst, has designed the DataFrame df that contains data about the Computer Olympiad,
with ‘CO1’, ‘CO2’, ‘CO3’, ‘CO4’, ‘CO5’ as index labels, shown below. Answer the following questions:
i. df.shape
The shape attribute of a DataFrame returns a tuple representing its dimensions (number of rows,
number of columns).
DataFrame df has 5 rows and 4 columns, so the output will be (5, 4).
ii. df[2:4]
An integer slice on a DataFrame selects rows by position, from position 2 (inclusive) to position 4
(exclusive). So, it will display the rows labelled 'CO3' and 'CO4'.
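A minimal sketch of both expressions. The original table is not reproduced here, so the fourth column name (School) and every value are assumptions made only for illustration; the index labels and the other three column names come from the questions.

```python
import pandas as pd

# Hypothetical data: School and all values are invented for this sketch.
df = pd.DataFrame(
    {
        "School": ["S1", "S2", "S3", "S4", "S5"],
        "Tot_students": [40, 30, 20, 18, 28],
        "Topper": [32, 18, 18, 10, 20],
        "First_Runnerup": [8, 12, 2, 8, 8],
    },
    index=["CO1", "CO2", "CO3", "CO4", "CO5"],
)

print(df.shape)  # (rows, columns) -> (5, 4)
print(df[2:4])   # rows at positions 2 and 3, labelled CO3 and CO4
```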
B. Write a Python statement to display the data of the Topper column of indexes CO2 to CO4.
df.loc['CO2':'CO4', 'Topper']
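This code uses the loc indexer to select the rows labelled 'CO2' to 'CO4' and the 'Topper' column.
Unlike positional slicing, a label slice with loc includes both end labels, so 'CO4' is part of the result.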
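The inclusive behaviour of label slicing can be checked with a small sketch. The values are assumptions; only the index labels and the Topper column name come from the question.

```python
import pandas as pd

# Hypothetical values for the Topper column.
df = pd.DataFrame(
    {"Topper": [32, 18, 18, 10, 20]},
    index=["CO1", "CO2", "CO3", "CO4", "CO5"],
)

subset = df.loc["CO2":"CO4", "Topper"]
print(subset)  # three rows: CO2, CO3 and CO4
```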
C. Write a Python statement to compute and display the difference of data of Tot_students column and
First_Runnerup column of the above given DataFrame.
print(df['Tot_students'] - df['First_Runnerup'])
This code subtracts the 'First_Runnerup' column from the 'Tot_students' column, row by row, and
displays the resulting Series of differences.
[['Divya','HR',95000],['Mamta','Marketing',97000],['Payal','IT',980000],['Deepak','Sales',79000]]
import pandas as pd
# Given list
data = [['Divya', 'HR', 95000],
        ['Mamta', 'Marketing', 97000],
        ['Payal', 'IT', 980000],
        ['Deepak', 'Sales', 79000]]
# Create a DataFrame
df = pd.DataFrame(data, columns=['Name', 'Department', 'Salary'])
This code will create a DataFrame with columns 'Name', 'Department', and 'Salary' and populate it
with the data from the given list.
import pandas as pd
df1 = pd.DataFrame(data)
(i) Write command to compute sum of every column of the data frame.
column_sums = df1.sum()
To compute the sum of every column of the data frame, call the sum() method on the DataFrame. It
returns a Series with one entry per column. Text columns are included in the operation by default;
pass numeric_only=True to restrict the result to numeric columns.
rainfall_mean = df1['RainFall'].mean()
To compute the mean of the 'RainFall' column, you can use the mean() method on that specific column.
maxtemp_median = df1['Maxtemp'].median()
To compute the median of the 'Maxtemp' column, you can use the median() method on that specific
column.
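The three statistics above can be tried on a small frame. The original data for df1 is not shown, so the values and the Mintemp column are assumptions; RainFall and Maxtemp are the column names used in the answers.

```python
import pandas as pd

# Hypothetical weather data standing in for the elided table.
df1 = pd.DataFrame(
    {
        "Maxtemp": [45, 34, 48, 32],
        "Mintemp": [30, 24, 34, 22],
        "RainFall": [20.0, 40.0, 30.0, 50.0],
    }
)

column_sums = df1.sum()                   # one sum per column
rainfall_mean = df1["RainFall"].mean()    # mean of a single column
maxtemp_median = df1["Maxtemp"].median()  # median of a single column

print(column_sums)
print(rainfall_mean, maxtemp_median)
```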
import pandas as pd
Genre = pd.DataFrame(data)
We add a new column 'Num_Copies' to the DataFrame Genre with the specified data.
ii. Add a new genre of type ‘Folk Tale' having code as “FT” and 600 number of copies.
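We add a new row for the genre 'Folk Tale' with code 'FT' and 600 copies. Older answers build a
pd.Series and pass it to DataFrame.append, but append was deprecated in pandas 1.4 and removed in
pandas 2.0. With current versions, use pd.concat or assign the row through loc.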
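A sketch of adding the row with pd.concat, which works on current pandas versions. The starting rows and the column name Type are assumptions, since the original Genre table is not shown; Code and Num_Copies are the names used in the answers.

```python
import pandas as pd

# Hypothetical starting data for the Genre DataFrame.
Genre = pd.DataFrame(
    {
        "Type": ["Fiction", "Non Fiction"],
        "Code": ["F", "NF"],
        "Num_Copies": [300, 290],
    }
)

# Build a one-row DataFrame and concatenate it below the existing rows.
new_row = pd.DataFrame([{"Type": "Folk Tale", "Code": "FT", "Num_Copies": 600}])
Genre = pd.concat([Genre, new_row], ignore_index=True)
print(Genre)
```

With a default integer index, Genre.loc[len(Genre)] = ['Folk Tale', 'FT', 600] is a shorter alternative.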
We rename the column 'Code' to 'Book_Code' using the rename method with a dictionary specifying
the column name change.
OR
We rename the column 'Code' to 'Book_Code' without using a dictionary by directly assigning new
column names to the columns attribute of the DataFrame.
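Both renaming approaches can be sketched as follows. The data and the Type column are assumptions; only the names Code and Book_Code come from the question.

```python
import pandas as pd

# Hypothetical Genre data with the column to be renamed.
Genre = pd.DataFrame({"Type": ["Fiction", "Non Fiction"], "Code": ["F", "NF"]})

# Option 1: rename with a dictionary; returns a new DataFrame.
renamed = Genre.rename(columns={"Code": "Book_Code"})

# Option 2: assign the full list of column names in place.
Genre.columns = ["Type", "Book_Code"]

print(renamed.columns.tolist())
print(Genre.columns.tolist())
```

The second form requires listing every column in order, so rename is usually safer when only one name changes.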
import pandas as pd
data = [{'a': 10, 'b': 20},{'a': 6, 'b': 32, 'c': 22}]
#with two column indices, values same as dictionary keys
df1 = pd.DataFrame(data, index=['first', 'second'], columns=['a', 'b'])
#With two column indices with one index with other name
df2 = pd.DataFrame(data, index=['first', 'second'], columns=['a', 'b1'])
print(df1)
print(df2)
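The provided code creates two Pandas DataFrames (df1 and df2) from the same list of dictionaries but
with different column names. In df1 the columns 'a' and 'b' match dictionary keys, so they are filled
with values. In df2 the column 'b1' does not match any key, so that column contains only NaN.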
Output:
# Output for df1
         a   b
first   10  20
second   6  32
6 Ekam, a Data Analyst with a multinational brand, has designed the DataFrame df that contains the four
quarters' sales data of different stores as shown below:
a. print(df.size)
The df.size attribute returns the total number of elements in the DataFrame, which is the product of the
number of rows and columns.
There are 3 rows and 5 columns in the DataFrame df, so the output will be 3 * 5 = 15
b. print(df[1:3])
This statement is using slicing to select rows from index 1 (inclusive) to index 3 (exclusive). So, it will
display rows 1 and 2 of the DataFrame df.
To delete the last row from the DataFrame, you can use the drop method with the index of the last row.
df = df.drop(df.index[-1])
iii. Write Python statement to add a new column Total_Sales which is the addition of all the 4 quarter
sales
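One way to add the Total_Sales column is sketched here. The sales table is not reproduced above, so the column names Store, Qtr1, Qtr2, Qtr3, Qtr4 and all values are assumptions; substitute the actual quarter column names.

```python
import pandas as pd

# Hypothetical sales data: 3 rows and 5 columns, matching the size given above.
df = pd.DataFrame(
    {
        "Store": ["Store1", "Store2", "Store3"],
        "Qtr1": [300, 350, 250],
        "Qtr2": [240, 340, 180],
        "Qtr3": [450, 403, 145],
        "Qtr4": [230, 210, 160],
    }
)

# Add the four quarter columns element-wise to get one total per store.
df["Total_Sales"] = df["Qtr1"] + df["Qtr2"] + df["Qtr3"] + df["Qtr4"]
print(df)
```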
import pandas as pd
data2 = {'mark1': [30, 20, 20, 50], 'mark2': [20, 25, 30, 30]}
df2 = pd.DataFrame(data2)
Write the commands to do the following operations on the dataframes given above :
(iii) To rename column mark1 as marks1 in both the dataframes df1 and df2.
(iv) To change the index label of df1 from 0 to zero and from 1 to one.
df1 = df1.rename(index={0: 'zero', 1: 'one'})
The integer index labels 0 and 1 in df1 are renamed to 'zero' and 'one', respectively, using the rename
method with the index parameter.
OR
df1.index = ['zero', 'one', 2, 3]
The index labels 0 and 1 in df1 are changed to 'zero' and 'one' without using a dictionary by directly
assigning a complete list of new index labels to df1.index. This form assumes df1 has exactly four rows.
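Parts (iii) and (iv) can be sketched together. The definition of df1 is not shown above, so its values are assumptions; df2 uses the data given in the question.

```python
import pandas as pd

# data1 is hypothetical; data2 is taken from the question.
data1 = {"mark1": [10, 40, 15, 40], "mark2": [15, 45, 30, 70]}
data2 = {"mark1": [30, 20, 20, 50], "mark2": [20, 25, 30, 30]}
df1 = pd.DataFrame(data1)
df2 = pd.DataFrame(data2)

# (iii) Rename column mark1 to marks1 in both DataFrames.
df1 = df1.rename(columns={"mark1": "marks1"})
df2 = df2.rename(columns={"mark1": "marks1"})

# (iv) Rename index labels 0 and 1 of df1; other labels are left unchanged.
df1 = df1.rename(index={0: "zero", 1: "one"})

print(df1)
print(df2)
```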
import pandas as pd
Year1={'Q1':5000,'Q2':8000,'Q3':12000,'Q4': 18000}
Year2={'A' :13000,'B':14000,'C':12000}
totSales={1:Year1, 2:Year2}
df=pd.DataFrame(totSales)
print(df)
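The index of the DataFrame df consists of the row labels, which are the combined keys of the two inner
dictionaries: 'Q1', 'Q2', 'Q3', 'Q4' from Year1 and 'A', 'B', 'C' from Year2. So, the index is:
['Q1', 'Q2', 'Q3', 'Q4', 'A', 'B', 'C']. The exact order of the labels can depend on the pandas version,
and any cell whose label is missing from a dictionary is filled with NaN.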
The column names of the DataFrame df are the keys from the totSales dictionary, which are 1 and 2 in
this case. So, the column names are: [1, 2].
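The index, the columns and the NaN filling can be inspected directly. This sketch reuses the dictionaries from the question and adds only the inspection calls.

```python
import pandas as pd

Year1 = {"Q1": 5000, "Q2": 8000, "Q3": 12000, "Q4": 18000}
Year2 = {"A": 13000, "B": 14000, "C": 12000}
totSales = {1: Year1, 2: Year2}
df = pd.DataFrame(totSales)

print(df.index.tolist())    # the seven row labels
print(df.columns.tolist())  # [1, 2]
print(df.isna().sum())      # count of missing values in each column
```

Column 1 has no values for 'A', 'B', 'C' and column 2 has none for the four quarters, which is why the two columns never hold a value in the same row.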
9 Write a Python code to create a DataFrame with appropriate column headings from the list given
below:
[[101,'Gurman',98],[102,'Rajveer',95],[103,'Samar' ,96],[104,'Yuvraj',88]]
import pandas as pd
data = [[101, 'Gurman', 98], [102, 'Rajveer', 95], [103, 'Samar', 96], [104, 'Yuvraj', 88]]
df = pd.DataFrame(data, columns=['StudentID', 'Name', 'Score'])
print(df)
This code will create a DataFrame with the specified column headings 'StudentID', 'Name', and 'Score'
from the given list.
Stock = pd.DataFrame(data)
We add a new column 'Special_Price' to the DataFrame with the specified data. This can be done by
creating a list of special prices and assigning it as a new column to the DataFrame.
ii. Add a new book named ‘The Secret' having price 800.
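We create a dictionary new_book_data representing the new row and add it to the DataFrame Stock.
Older answers use the append method for this, but append was removed in pandas 2.0; with current
versions, wrap the dictionary in a one-row DataFrame and use pd.concat. The new row carries data
for the 'Name' and 'Price' columns, and any other column is filled with NaN.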
We remove the 'Special_Price' column using the drop method with axis=1. This operation drops the
specified column from the DataFrame.
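The three Stock operations can be sketched in sequence. The original rows and special prices are not shown, so the starting data is an assumption; the column names and the new book come from the text above.

```python
import pandas as pd

# Hypothetical starting data for the Stock DataFrame.
Stock = pd.DataFrame({"Name": ["Book A", "Book B"], "Price": [500, 350]})

# Add a new column from a list with one value per existing row.
Stock["Special_Price"] = [450, 300]

# Add a new row; Special_Price is not given, so it becomes NaN.
new_book_data = {"Name": "The Secret", "Price": 800}
Stock = pd.concat([Stock, pd.DataFrame([new_book_data])], ignore_index=True)

# Remove the Special_Price column.
Stock = Stock.drop("Special_Price", axis=1)
print(Stock)
```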
import pandas as pd
Write commands to :
ii. Add a new row with values ( 5 , Mridula ,X, F , 9.8, Science)
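A possible answer for adding the row is sketched here. The original DataFrame is not shown, so every column name (Rollno, Name, Class, Gender, CGPA, Stream), the existing rows, and the reading of 5 as a column value rather than an index label are assumptions.

```python
import pandas as pd

# Hypothetical student data; column names are guesses based on the row values.
df = pd.DataFrame(
    {
        "Rollno": [1, 2],
        "Name": ["Arun", "Bela"],
        "Class": ["X", "X"],
        "Gender": ["M", "F"],
        "CGPA": [9.1, 8.7],
        "Stream": ["Commerce", "Science"],
    }
)

# With a default integer index, len(df) is the next unused label.
df.loc[len(df)] = [5, "Mridula", "X", "F", 9.8, "Science"]
print(df)
```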
12 Write a program in Python Pandas to create the following DataFrame batsman from a Dictionary:
import pandas as pd
df = pd.DataFrame(data)
2) Display the highest score in both Score1 and Score2 of the DataFrame.
hs1 = df['Score1'].max()
hs2 = df['Score2'].max()
print(hs1, hs2)
print(df)
13 Write a python code to create a dataframe with appropriate headings from the list given below :
[['S101', 'Amy', 70], ['S102', 'Bandhi', 69], ['S104', 'Cathy', 75], ['S105', 'Gundaho', 82]]
import pandas as pd
data = [['S101', 'Amy', 70], ['S102', 'Bandhi', 69], ['S104', 'Cathy', 75], ['S105', 'Gundaho', 82]]
df = pd.DataFrame(data, columns=['StudentID', 'Name', 'Score'])
print(df)
This code will create a DataFrame with the specified column headings 'StudentID', 'Name', and 'Score'
from the given list.
14 Write a small python code to create a dataframe with headings(a and b) from the list given below :
[[1,2],[3,4],[5,6],[7,8]]
import pandas as pd
df = pd.DataFrame([[1, 2], [3, 4], [5, 6], [7, 8]], columns=['a', 'b'])
This code will create a DataFrame with the specified column headings 'a' and 'b' from the given list.
15 Consider the following dataframe, and answer the questions given below:
import pandas as pd
df = pd.DataFrame({"Quarter1": [2000, 4000, 5000, 4400, 10000],
                   "Quarter2": [5800, 2500, 5400, 3000, 2900],
                   "Quarter3": [20000, 16000, 7000, 3600, 8200],
                   "Quarter4": [1400, 3700, 1700, 2000, 6000]})
(ii) Use sum() function to find the sum of all the values over the index axis.
To find the sum of the values over the index axis (i.e., adding down the rows so that each column gets
one total), you can use the sum() method with the axis parameter set to 0, which is also the default.
sum_over_index = df.sum(axis=0)
This will give you a Series containing the sum of values for each column (Quarter1, Quarter2,
Quarter3, and Quarter4).
To find the median of each column of the DataFrame df, you can use the median() method without
specifying the axis parameter, because the default axis=0 works column by column.
median_value = df.median()
This will give you a Series containing the median value for each column (Quarter1, Quarter2,
Quarter3, and Quarter4).
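The column-wise results can be checked against the data given in the question. The row-wise call with axis=1 is added here only for contrast and is not part of the original answer.

```python
import pandas as pd

df = pd.DataFrame({"Quarter1": [2000, 4000, 5000, 4400, 10000],
                   "Quarter2": [5800, 2500, 5400, 3000, 2900],
                   "Quarter3": [20000, 16000, 7000, 3600, 8200],
                   "Quarter4": [1400, 3700, 1700, 2000, 6000]})

sum_over_index = df.sum(axis=0)    # one total per column
sum_over_columns = df.sum(axis=1)  # one total per row, shown for contrast
median_value = df.median()         # one median per column

print(sum_over_index)
print(sum_over_columns)
print(median_value)
```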