
12 IP Pandas DataFrame - Question Bank

The document provides a comprehensive guide on various operations and manipulations using Pandas DataFrames in Python. It includes examples of creating DataFrames, performing calculations, adding and renaming columns, and manipulating data such as slicing and summing. Additionally, it covers specific tasks like appending new rows, deleting rows, and displaying data in a structured format.

Uploaded by predinoontheway
Copyright © All Rights Reserved

1 Mr. Som, a data analyst, has designed the DataFrame df that contains data about a Computer Olympiad, with 'CO1', 'CO2', 'CO3', 'CO4' and 'CO5' as indexes, as shown below. Answer the following questions:

A. Predict the output of the following python statement:

i. df.shape

The shape attribute of a DataFrame returns a tuple representing its dimensions (number of rows,
number of columns).

The DataFrame df has 5 rows (CO1 to CO5) and 4 columns, so the output will be (5, 4).

ii. df[2:4]

This statement uses slicing to select rows by position, from position 2 (inclusive) to position 4 (exclusive). So it displays the third and fourth rows, whose index labels are 'CO3' and 'CO4':

     School  Tot_Students  Topper  First_Runnerup
CO3     GPS            20      18               2
CO4     MPS            18      10               8

B. Write a Python statement to display the data of the Topper column of indexes CO2 to CO4.

df.loc['CO2':'CO4', 'Topper']

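This code uses the loc indexer to select the rows labelled 'CO2' to 'CO4' and the 'Topper' column. Unlike positional slicing, label-based slicing with loc includes both endpoints, so the rows CO2, CO3 and CO4 are displayed.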

C. Write a Python statement to compute and display the difference of data of the Tot_Students column and the First_Runnerup column of the above given DataFrame.

Diff = df['Tot_Students'] - df['First_Runnerup']
print(Diff)

This code subtracts the 'First_Runnerup' column from the 'Tot_Students' column row by row and displays the resulting Series of differences. The column name must match the DataFrame exactly ('Tot_Students', with a capital S); df['Tot_students'] would raise a KeyError.
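A minimal sketch for checking these answers, assuming the column names shown in the df[2:4] output. Only the CO3 and CO4 rows are known from this copy of the question, so the CO1, CO2 and CO5 rows are filled with None as placeholders; only the shape and the row labels (not the numbers) are meaningful for those rows.

```python
import pandas as pd

# CO3 and CO4 come from the output above; CO1, CO2 and CO5 are unknown here,
# so None is used as a placeholder (numeric columns become float with NaN).
df = pd.DataFrame(
    {
        'School': [None, None, 'GPS', 'MPS', None],
        'Tot_Students': [None, None, 20, 18, None],
        'Topper': [None, None, 18, 10, None],
        'First_Runnerup': [None, None, 2, 8, None],
    },
    index=['CO1', 'CO2', 'CO3', 'CO4', 'CO5'],
)

print(df.shape)                        # (rows, columns)
print(df[2:4])                         # positions 2 and 3; end position excluded
print(df.loc['CO2':'CO4', 'Topper'])   # labels CO2 to CO4; end label included

Diff = df['Tot_Students'] - df['First_Runnerup']
print(Diff)
```

The contrast between the two slices is the key point: df[2:4] counts positions and stops before 4, while loc counts labels and keeps the last one.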

2 Create a DataFrame in Python from the given list:

[['Divya', 'HR', 95000], ['Mamta', 'Marketing', 97000], ['Payal', 'IT', 980000], ['Deepak', 'Sales', 79000]]

Also give appropriate column headings as shown below:


import pandas as pd

# Given list
data = [['Divya', 'HR', 95000],
        ['Mamta', 'Marketing', 97000],
        ['Payal', 'IT', 980000],
        ['Deepak', 'Sales', 79000]]

# Create a DataFrame with column headings
df = pd.DataFrame(data, columns=['Name', 'Department', 'Salary'])

# Print the DataFrame
print(df)

This code will create a DataFrame with columns 'Name', 'Department', and 'Salary' and populate it
with the data from the given list.

3 Given a data frame df1 as shown below:

import pandas as pd

# Create the DataFrame df1
data = {'City': ['Delhi', 'Bengaluru', 'Chennai', 'Mumbai', 'Kolkata'],
        'Maxtemp': [40, 31, 35, 29, 39],
        'MinTemp': [32, 25, 27, 21, 23],
        'RainFall': [24.1, 36.2, 40.8, 35.2, 41.8]}

df1 = pd.DataFrame(data)

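(i) Write command to compute sum of every column of the data frame.

column_sums = df1.sum()
print(column_sums)

The sum() method adds up the values of each column and returns a Series. Note that df1.sum() also "sums" the text column City, which for strings means joining them into one long string; df1.sum(numeric_only=True) restricts the result to the numeric columns Maxtemp, MinTemp and RainFall.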

(ii) Write command to compute mean of column Rainfall.

rainfall_mean = df1['RainFall'].mean()

To compute the mean of the 'RainFall' column, you can use the mean() method on that specific column.

(iii) Write command to compute Median of the Maxtemp Column.

maxtemp_median = df1['Maxtemp'].median()

To compute the median of the 'Maxtemp' column, you can use the median() method on that specific
column.
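A short sketch that runs all three commands on df1 so the answers can be checked by hand. It passes numeric_only=True to sum(), an optional choice (not required by the question) so that only the numeric columns are totalled.

```python
import pandas as pd

data = {'City': ['Delhi', 'Bengaluru', 'Chennai', 'Mumbai', 'Kolkata'],
        'Maxtemp': [40, 31, 35, 29, 39],
        'MinTemp': [32, 25, 27, 21, 23],
        'RainFall': [24.1, 36.2, 40.8, 35.2, 41.8]}
df1 = pd.DataFrame(data)

# numeric_only=True skips the text column City instead of joining its strings
column_sums = df1.sum(numeric_only=True)
rainfall_mean = df1['RainFall'].mean()
maxtemp_median = df1['Maxtemp'].median()   # middle of 29, 31, 35, 39, 40

print(column_sums)
print(rainfall_mean)
print(maxtemp_median)
```

By hand: Maxtemp sums to 174, MinTemp to 128, the mean RainFall is 178.1 / 5 = 35.62, and the median Maxtemp is 35, the middle value once the five temperatures are sorted.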

4 Consider the given DataFrame ‘Genre’:

import pandas as pd

# Create the initial DataFrame
data = {'Type': ['Fiction', 'Non Fiction', 'Drama', 'Poetry'],
        'Code': ['F', 'NF', 'D', 'P']}

Genre = pd.DataFrame(data)

Write suitable Python statements for the following:

i. Add a column called Num_Copies with the following data: [300, 290, 450, 760].

Genre['Num_Copies'] = [300, 290, 450, 760]

We add a new column 'Num_Copies' to the DataFrame Genre with the specified data.

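ii. Add a new genre of type 'Folk Tale' having code 'FT' and 600 copies.

new_genre = pd.DataFrame([{'Type': 'Folk Tale', 'Code': 'FT', 'Num_Copies': 600}])
Genre = pd.concat([Genre, new_genre], ignore_index=True)

We wrap the new row in a one-row DataFrame and use pd.concat to add it below the existing rows; ignore_index=True renumbers the index so the new row gets the label 4. Older material uses Genre = Genre.append(new_genre, ignore_index=True) with new_genre as a dictionary, but DataFrame.append was deprecated in pandas 1.4 and removed in pandas 2.0, and even in older versions a dictionary can only be appended with ignore_index=True. Because Genre has the default integer index, Genre.loc[len(Genre)] = ['Folk Tale', 'FT', 600] also works.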

iii. Rename the column ‘Code’ to ‘Book_Code’.

Genre = Genre.rename(columns={'Code': 'Book_Code'})

We rename the column 'Code' to 'Book_Code' using the rename method with a dictionary specifying
the column name change.

OR

Genre.columns = ['Type', 'Book_Code', 'Num_Copies']

We rename the column 'Code' to 'Book_Code' without using a dictionary by assigning a complete list of new names to the columns attribute of the DataFrame. The list must name every column, in the existing order.
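A minimal sketch that applies the three steps in order to Genre, using pd.concat for the new row as described above, so the final layout can be checked:

```python
import pandas as pd

Genre = pd.DataFrame({'Type': ['Fiction', 'Non Fiction', 'Drama', 'Poetry'],
                      'Code': ['F', 'NF', 'D', 'P']})

# i. New column from a list (one value per row)
Genre['Num_Copies'] = [300, 290, 450, 760]

# ii. Append one row; pd.concat replaces DataFrame.append (removed in pandas 2.0)
new_genre = pd.DataFrame([{'Type': 'Folk Tale', 'Code': 'FT', 'Num_Copies': 600}])
Genre = pd.concat([Genre, new_genre], ignore_index=True)

# iii. Rename a single column with a dictionary
Genre = Genre.rename(columns={'Code': 'Book_Code'})

print(Genre)
```

The rename in step iii uses the dictionary form because it touches only one column and does not depend on column order.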

5 Find the output of the following code:

import pandas as pd
data = [{'a': 10, 'b': 20}, {'a': 6, 'b': 32, 'c': 22}]
# Column labels 'a' and 'b' both match dictionary keys
df1 = pd.DataFrame(data, index=['first', 'second'], columns=['a', 'b'])
# Column label 'b1' matches no dictionary key
df2 = pd.DataFrame(data, index=['first', 'second'], columns=['a', 'b1'])
print(df1)
print(df2)

The code creates two DataFrames (df1 and df2) from the same list of dictionaries; each dictionary becomes one row. The columns parameter selects which keys become columns: the key 'c' is not requested, so it is left out of both DataFrames, and 'b1' is not a key in either dictionary, so that column is filled with NaN.

Output:
# Output for df1
         a   b
first   10  20
second   6  32

# Output for df2
         a   b1
first   10  NaN
second   6  NaN
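The two behaviours that decide this output (an unrequested key is dropped, an unmatched column label is filled with NaN) can be confirmed directly with a small sketch:

```python
import pandas as pd

data = [{'a': 10, 'b': 20}, {'a': 6, 'b': 32, 'c': 22}]
df1 = pd.DataFrame(data, index=['first', 'second'], columns=['a', 'b'])
df2 = pd.DataFrame(data, index=['first', 'second'], columns=['a', 'b1'])

# 'c' was not requested, so it does not appear in df1
print('c' in df1.columns)
# 'b1' is not a key in any dictionary, so every value in it is missing
print(df2['b1'].isna().all())
```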

6 Ekam, a Data Analyst with a multinational brand, has designed the DataFrame df that contains the four quarters' sales data of different stores, as shown below:

Answer the following questions:

i. Predict the output of the following python statement:

a. print(df.size)

The df.size attribute returns the total number of elements in the DataFrame, which is the product of the
number of rows and columns.

There are 3 rows and 5 columns in the DataFrame df, so the output will be 3 * 5 = 15.

b. print(df[1:3])

This statement uses slicing to select rows by position, from position 1 (inclusive) to position 3 (exclusive). So it displays the second and third rows (index labels 1 and 2) of the DataFrame df:

    Store  Qtr1  Qtr2  Qtr3  Qtr4
1  Store2   350   340   403   210
2  Store3   250   180   145   160

ii. Delete the last row from the DataFrame.

To delete the last row from the DataFrame, you can use the drop method with the index of the last row.

df = df.drop(df.index[-1])

iii. Write a Python statement to add a new column Total_Sales, which is the sum of all four quarters' sales.

df['Total_Sales'] = df['Qtr1'] + df['Qtr2'] + df['Qtr3'] + df['Qtr4']

This code adds the columns 'Qtr1', 'Qtr2', 'Qtr3' and 'Qtr4' row by row and assigns the result to a new column 'Total_Sales'.
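A sketch for checking the answers to question 6, assuming the layout implied by df.size = 15: a Store column followed by Qtr1 to Qtr4, with the default index 0, 1, 2. The Store2 and Store3 rows come from the df[1:3] output; the Store1 row is not shown in this copy, so its values below are hypothetical placeholders.

```python
import pandas as pd

df = pd.DataFrame({
    'Store': ['Store1', 'Store2', 'Store3'],
    'Qtr1': [100, 350, 250],   # Store1 values are hypothetical placeholders
    'Qtr2': [100, 340, 180],
    'Qtr3': [100, 403, 145],
    'Qtr4': [100, 210, 160],
})

original_size = df.size
print(original_size)   # 3 rows * 5 columns
print(df[1:3])         # rows at positions 1 and 2

# ii. Drop the row whose label is the last label in the index
df = df.drop(df.index[-1])

# iii. Row-wise total of the four quarter columns
df['Total_Sales'] = df['Qtr1'] + df['Qtr2'] + df['Qtr3'] + df['Qtr4']
# Equivalent: df[['Qtr1', 'Qtr2', 'Qtr3', 'Qtr4']].sum(axis=1)
print(df)
```

For Store2 the total is 350 + 340 + 403 + 210 = 1303.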

7 Write the code in pandas to create the following dataframes:

import pandas as pd

# Create df1 and df2
data1 = {'mark1': [10, 40, 15, 40], 'mark2': [15, 45, 30, 70]}
df1 = pd.DataFrame(data1)

data2 = {'mark1': [30, 20, 20, 50], 'mark2': [20, 25, 30, 30]}
df2 = pd.DataFrame(data2)

Write the commands to do the following operations on the dataframes given above:

(i) To add dataframes df1 and df2.

result_add = df1 + df2

The addition of df1 and df2 is calculated and stored in result_add.

(ii) To subtract df2 from df1

result_subtract = df1 - df2

The subtraction of df2 from df1 is calculated and stored in result_subtract.

(iii) To rename column mark1 as marks1 in both the dataframes df1 and df2.

df1 = df1.rename(columns={'mark1': 'marks1'})
df2 = df2.rename(columns={'mark1': 'marks1'})

The column 'mark1' is renamed to 'marks1' in both df1 and df2.

(iv) To change the index label of df1 from 0 to zero and from 1 to one.

df1 = df1.rename(index={0: 'zero', 1: 'one'})

The integer index labels 0 and 1 in df1 are renamed to 'zero' and 'one', respectively, using the rename method with the index parameter; labels 2 and 3 are left unchanged.

OR
df1.index = ['zero', 'one', 2, 3]

Here the labels 0 and 1 are changed to 'zero' and 'one' without using a dictionary, by assigning a complete list of new labels to df1.index. Labels 2 and 3 must be repeated so that the list has one entry per row.
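A short sketch that runs the four operations in order so the arithmetic can be checked. The additions are done first: after step (iv) the index labels of df1 and df2 no longer line up, so df1 + df2 would give NaN for the unmatched labels.

```python
import pandas as pd

df1 = pd.DataFrame({'mark1': [10, 40, 15, 40], 'mark2': [15, 45, 30, 70]})
df2 = pd.DataFrame({'mark1': [30, 20, 20, 50], 'mark2': [20, 25, 30, 30]})

# Arithmetic is element-wise, matched on row and column labels
result_add = df1 + df2
result_subtract = df1 - df2
print(result_add)
print(result_subtract)

df1 = df1.rename(columns={'mark1': 'marks1'})
df2 = df2.rename(columns={'mark1': 'marks1'})
df1 = df1.rename(index={0: 'zero', 1: 'one'})
print(df1)
```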

8 Carefully observe the following code:

import pandas as pd
Year1 = {'Q1': 5000, 'Q2': 8000, 'Q3': 12000, 'Q4': 18000}
Year2 = {'A': 13000, 'B': 14000, 'C': 12000}
totSales = {1: Year1, 2: Year2}
df = pd.DataFrame(totSales)
print(df)

Created DataFrame will be:

          1        2
Q1   5000.0      NaN
Q2   8000.0      NaN
Q3  12000.0      NaN
Q4  18000.0      NaN
A       NaN  13000.0
B       NaN  14000.0
C       NaN  12000.0

i. List the index of the DataFrame df

The index of the DataFrame df consists of the row labels, which are the keys of the inner dictionaries Year1 ('Q1', 'Q2', 'Q3', 'Q4') and Year2 ('A', 'B', 'C'). So, the index is: ['Q1', 'Q2', 'Q3', 'Q4', 'A', 'B', 'C'].

ii. List the column names of DataFrame df.

The column names of the DataFrame df are the keys from the totSales dictionary, which are 1 and 2 in
this case. So, the column names are: [1, 2].
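The index and columns attributes return these labels directly. In this sketch, outer dictionary keys become column labels and inner dictionary keys become row labels; a row label missing from one inner dictionary gets NaN in that column, which is also why the sales figures are displayed as floats (5000.0).

```python
import pandas as pd

Year1 = {'Q1': 5000, 'Q2': 8000, 'Q3': 12000, 'Q4': 18000}
Year2 = {'A': 13000, 'B': 14000, 'C': 12000}
totSales = {1: Year1, 2: Year2}
df = pd.DataFrame(totSales)

print(df.index)     # row labels: keys of the inner dictionaries
print(df.columns)   # column labels: keys of the outer dictionary
print(df.isna().sum().sum())   # number of missing cells
```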

9 Write a Python code to create a DataFrame with appropriate column headings from the list given
below:

[[101, 'Gurman', 98], [102, 'Rajveer', 95], [103, 'Samar', 96], [104, 'Yuvraj', 88]]

import pandas as pd

data = [[101, 'Gurman', 98], [102, 'Rajveer', 95], [103, 'Samar', 96], [104, 'Yuvraj', 88]]

# Define column headings
columns = ['StudentID', 'Name', 'Score']

# Create the DataFrame
df = pd.DataFrame(data, columns=columns)

# Print the DataFrame
print(df)

This code will create a DataFrame with the specified column headings 'StudentID', 'Name', and 'Score'
from the given list.

10 Consider the given DataFrame ‘Stock’:


import pandas as pd

# Create the DataFrame
data = {'Name': ['Nancy Drew', 'Hardy Boys', 'Diary of a Wimpy Kid', 'Harry Potter'],
        'Price': [150, 180, 225, 500]}

Stock = pd.DataFrame(data)

Write suitable Python statements for the following:

i. Add a column called Special_Price with the following data: [135, 150, 200, 440].

special_prices = [135, 150, 200, 440]
Stock['Special_Price'] = special_prices

We add a new column 'Special_Price' to the DataFrame by creating a list of special prices and assigning it to a new column label. The list must have one value per row.

ii. Add a new book named 'The Secret' having price 800.

new_book = pd.DataFrame([{'Name': 'The Secret', 'Price': 800}])
Stock = pd.concat([Stock, new_book], ignore_index=True)

We create a one-row DataFrame for the new book and append it with pd.concat; ignore_index=True gives the new row the next integer label. No Special_Price is given for this book, so that cell is filled with NaN. DataFrame.append, used in older material, was removed in pandas 2.0, and even in older versions it accepted a dictionary only with ignore_index=True.

iii. Remove the column Special_Price.

Stock = Stock.drop('Special_Price', axis=1)

We remove the 'Special_Price' column using the drop method with axis=1. This operation drops the
specified column from the DataFrame.
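A minimal sketch running the three Stock operations in sequence; it follows the pd.concat approach above for the new row.

```python
import pandas as pd

Stock = pd.DataFrame({'Name': ['Nancy Drew', 'Hardy Boys', 'Diary of a Wimpy Kid', 'Harry Potter'],
                      'Price': [150, 180, 225, 500]})

# i. Add a column from a list
Stock['Special_Price'] = [135, 150, 200, 440]

# ii. Append a row; Special_Price is not given, so it becomes NaN
new_book = pd.DataFrame([{'Name': 'The Secret', 'Price': 800}])
Stock = pd.concat([Stock, new_book], ignore_index=True)

# iii. Remove a column (axis=1 means columns)
Stock = Stock.drop('Special_Price', axis=1)
print(Stock)
```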

11 Consider the following DataFrame, classframe:

import pandas as pd

# Create the DataFrame from a list of row lists
# (each row is a plain list; a trailing comma would turn it into a tuple)
L1 = [1, 'Aman', 'IX', 'E', 8.7, 'Science']
L2 = [2, 'Preeti', 'X', 'F', 8.9, 'Arts']
L3 = [3, 'Kartikey', 'IX', 'D', 9.2, 'Science']
L4 = [4, 'Lakshay', 'X', 'A', 9.4, 'Commerce']
data = [L1, L2, L3, L4]
index = ['St1', 'St2', 'St3', 'St4']
columns = ['Rollno', 'Name', 'Class', 'Section', 'CGPA', 'Stream']

classframe = pd.DataFrame(data=data, index=index, columns=columns)
print(classframe)

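Write commands to:

i. Add a new column 'Activity' to the DataFrame.

classframe['Activity'] = ['A1', 'A2', 'A3', 'A4']

The question does not give Activity values, so 'A1' to 'A4' are illustrative.

ii. Add a new row with values (5, Mridula, X, F, 9.8, Science).

classframe.loc['St5'] = [5, 'Mridula', 'X', 'F', 9.8, 'Science', None]

The new row is stored under the label 'St5', following the existing labels St1 to St4. Because part (i) added the Activity column, the row must supply seven values; the question gives no Activity value for Mridula, so None is used. Writing classframe.loc[len(classframe)] instead would label the row 4 rather than 'St5', and putting 'St5' inside the list would store it as a data value, not as the index label.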
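A runnable check that combines the corrected construction with both commands. As in the answer above, the Activity values 'A1' to 'A4' are illustrative, and None stands in for Mridula's Activity, which the question does not give.

```python
import pandas as pd

rows = [[1, 'Aman', 'IX', 'E', 8.7, 'Science'],
        [2, 'Preeti', 'X', 'F', 8.9, 'Arts'],
        [3, 'Kartikey', 'IX', 'D', 9.2, 'Science'],
        [4, 'Lakshay', 'X', 'A', 9.4, 'Commerce']]
classframe = pd.DataFrame(rows,
                          index=['St1', 'St2', 'St3', 'St4'],
                          columns=['Rollno', 'Name', 'Class', 'Section', 'CGPA', 'Stream'])

# i. New column (illustrative values)
classframe['Activity'] = ['A1', 'A2', 'A3', 'A4']

# ii. Assigning to a new label with loc enlarges the DataFrame;
# the list needs one value per existing column (7 here)
classframe.loc['St5'] = [5, 'Mridula', 'X', 'F', 9.8, 'Science', None]
print(classframe)
```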

12 Write a program in Python Pandas to create the following DataFrame batsman from a Dictionary:

import pandas as pd

# Create the DataFrame from a dictionary
data = {'B_NO': [1, 2, 3, 4],
        'Name': ['Sunil Pillai', 'Gaurav Sharma', 'Piyush Goel', 'Kartik Thakur'],
        'Score1': [90, 65, 70, 80],
        'Score2': [80, 45, 90, 76]}

df = pd.DataFrame(data)

Perform the following operations on the DataFrame:

1) Add both the scores of a batsman and assign to column “Total”

df['Total'] = df['Score1'] + df['Score2']

2) Display the highest score in both Score1 and Score2 of the DataFrame.

hs1 = df['Score1'].max()
hs2 = df['Score2'].max()

print("Highest Score in Score1:", hs1)
print("Highest Score in Score2:", hs2)

3) Display the DataFrame

print(df)
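The question names the DataFrame batsman, so this sketch uses that name rather than df. It also shows that the highest score in both columns can be obtained in one call by applying max() to a two-column selection, which returns one maximum per column.

```python
import pandas as pd

data = {'B_NO': [1, 2, 3, 4],
        'Name': ['Sunil Pillai', 'Gaurav Sharma', 'Piyush Goel', 'Kartik Thakur'],
        'Score1': [90, 65, 70, 80],
        'Score2': [80, 45, 90, 76]}
batsman = pd.DataFrame(data)

# 1) Row-wise total of the two scores
batsman['Total'] = batsman['Score1'] + batsman['Score2']

# 2) One maximum per selected column
highest = batsman[['Score1', 'Score2']].max()
print(highest)

# 3) Display the DataFrame
print(batsman)
```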

13 Write a Python code to create a DataFrame with appropriate headings from the list given below:
[['S101', 'Amy', 70], ['S102', 'Bandhi', 69], ['S104', 'Cathy', 75], ['S105', 'Gundaho', 82]]

import pandas as pd
data = [['S101', 'Amy', 70], ['S102', 'Bandhi', 69], ['S104', 'Cathy', 75], ['S105', 'Gundaho', 82]]

# Define column headings
columns = ['StudentID', 'Name', 'Score']

# Create the DataFrame
df = pd.DataFrame(data, columns=columns)

# Print the DataFrame
print(df)

This code will create a DataFrame with the specified column headings 'StudentID', 'Name', and 'Score'
from the given list.

14 Write a small Python code to create a DataFrame with headings (a and b) from the list given below:
[[1, 2], [3, 4], [5, 6], [7, 8]]

import pandas as pd

data = [[1, 2], [3, 4], [5, 6], [7, 8]]

# Define column headings
columns = ['a', 'b']

# Create the DataFrame
df = pd.DataFrame(data, columns=columns)

# Print the DataFrame
print(df)

This code will create a DataFrame with the specified column headings 'a' and 'b' from the given list.

15 Consider the following dataframe, and answer the questions given below:

import pandas as pd
df = pd.DataFrame({"Quarter1": [2000, 4000, 5000, 4400, 10000],
                   "Quarter2": [5800, 2500, 5400, 3000, 2900],
                   "Quarter3": [20000, 16000, 7000, 3600, 8200],
                   "Quarter4": [1400, 3700, 1700, 2000, 6000]})

(ii) Use sum() function to find the sum of all the values over the index axis.

Summing over the index axis means adding down the rows of each column, which is what sum() does with the axis parameter set to 0 (its default).

sum_over_index = df.sum(axis=0)

This gives a Series containing the total of each column (Quarter1, Quarter2, Quarter3 and Quarter4), not a single number. To reduce it to one total of every value in the DataFrame, apply sum() again: df.sum(axis=0).sum().

(iii) Find the median of the dataframe df.

To find the median of the DataFrame df, you can use the median() function; without an axis argument it works along axis 0, that is, column by column.

median_value = df.median()
This will give you a Series containing the median value for each column (Quarter1, Quarter2,
Quarter3, and Quarter4).
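Working through the numbers makes the role of the axis parameter concrete. In this sketch, axis=0 collapses the rows and leaves one value per column, axis=1 collapses the columns and leaves one value per row, and calling sum() twice reduces everything to one number.

```python
import pandas as pd

df = pd.DataFrame({'Quarter1': [2000, 4000, 5000, 4400, 10000],
                   'Quarter2': [5800, 2500, 5400, 3000, 2900],
                   'Quarter3': [20000, 16000, 7000, 3600, 8200],
                   'Quarter4': [1400, 3700, 1700, 2000, 6000]})

sum_over_index = df.sum(axis=0)    # one total per column
sum_per_row = df.sum(axis=1)       # one total per row
grand_total = df.sum().sum()       # single total of every value
median_value = df.median()         # one median per column

print(sum_over_index)
print(median_value)
print(grand_total)
```

By hand, the column totals are 25400, 19600, 54800 and 14800 (114600 overall), and the column medians are 4400, 3000, 8200 and 2000, the middle value of each sorted column.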
