Jupyter Notebook: Class Scores Analysis

Q: What are the benefits and limitations of using a custom function for creating and saving a DataFrame in a typical data analysis workflow?

Using a custom function for creating and saving a DataFrame offers modularity and reusability, and it lets you standardize and automate repetitive tasks. Encapsulating the steps in a named function also makes the code easier to maintain and understand. However, such a function can be inflexible when the data structure changes (for example, if the number of columns or the save path is hard-coded), and debugging becomes harder if it does not handle varied inputs or exceptions.

Q: Evaluate the effectiveness of using pandas for data manipulation tasks compared to handling data manually without libraries.

Using pandas for data manipulation is considerably more effective than manual handling because pandas provides high-level abstractions and built-in operations for reading and writing files, handling missing data, and computing statistics. These capabilities streamline workflows and reduce the errors that manual data handling tends to introduce. pandas also improves readability and maintainability by wrapping complex operations in simple, well-documented functions.

Q: What are the key steps involved in creating a DataFrame from raw data using a custom function in Python?

To create a DataFrame from raw data using a custom function, define a function such as 'DataFrameCreator' that receives a list of lists (one inner list per column) and a matching list of column names. Inside the function, an empty pandas DataFrame is created and each inner list is assigned to a column under the corresponding name. Finally, the DataFrame is returned, printed, or passed on for further processing, for example to a function that saves it to a file.
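The steps above can be sketched as follows. This is a minimal illustration, not the notebook's own implementation: the name `data_frame_creator`, the length check, and the two-student sample data are assumptions made for the example.

```python
import pandas as pd


def data_frame_creator(list_set, col_names):
    """Build a DataFrame from parallel lists, one inner list per column.

    Hypothetical helper for illustration; the notebook's DataFrameCreator
    assigns five columns one by one instead.
    """
    if len(list_set) != len(col_names):
        raise ValueError("need exactly one column name per list")
    # zip pairs each column name with its list of values
    return pd.DataFrame(dict(zip(col_names, list_set)))


table = data_frame_creator(
    [["Haga", "Hasgo"], [90, 85]],
    ["Student Names", "Maths Score"],
)
print(table)
```

Pairing names and lists with `zip` means the function works for any number of columns, which avoids the fixed indices 0 to 4.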

Q: Illustrate the process of aligning multiple datasets to compute a composite score using Python.

To compute a composite score from multiple datasets in Python, first make sure the data is aligned, either by a shared key (such as student names) or by position. Python's zip function iterates over several lists in parallel and pairs items by position, so the score lists must have the same length and the same student order. For each aligned entry, sum the subject scores and divide by the number of subjects, then store the result in a new list. Finally, add this list to a DataFrame as a new column so the composite score sits alongside the individual scores.
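When the lists are not guaranteed to be in the same order, aligning by key is safer than aligning by position. The sketch below assumes each subject's scores are held in a dictionary keyed by student name; the dictionary names are hypothetical and the values are the first three students from the notebook.

```python
# Scores keyed by student name; the key order differs on purpose
math_scores = {"Haga": 90, "Hasgo": 85, "Easy": 90}
chem_scores = {"Hasgo": 95, "Haga": 80, "Easy": 80}
bio_scores = {"Easy": 40, "Haga": 70, "Hasgo": 55}

# Look up every subject by the same key, so order no longer matters
composite = {
    name: (math_scores[name] + chem_scores[name] + bio_scores[name]) / 3
    for name in math_scores
}

for name, score in composite.items():
    print(name, round(score, 2))
```

A missing student raises a `KeyError` here, which surfaces a misaligned dataset instead of silently averaging the wrong rows.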

Q: How is the average score calculated for each student, and how is it incorporated into the DataFrame structure?

The average score for each student is calculated by iterating over the student's scores in three subjects: maths, chemistry, and biology. The mean is computed either with 'statistics.mean' or manually, by summing the scores and dividing by the number of subjects. Each average is appended to a list ('OverAllScore' for the 'statistics.mean' version, 'totalScore' for the manual version). When the DataFrame is created, the 'totalScore' list is added as a column labeled 'Average Score', so the computed statistic is stored next to the raw scores.
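As an alternative to building the list in a loop, pandas can compute the row-wise mean directly. The sketch below is an assumption about how this could be done, not what the notebook does; it uses the first three students' scores and the notebook's column labels.

```python
import pandas as pd

scores = pd.DataFrame(
    {
        "Maths Score": [90, 85, 90],
        "Chemistry Score": [80, 95, 80],
        "Biology Score": [70, 55, 40],
    }
)

subject_cols = ["Maths Score", "Chemistry Score", "Biology Score"]
# axis=1 averages across the columns of each row (one row per student)
scores["Average Score"] = scores[subject_cols].mean(axis=1)
print(scores)
```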

Q: Analyze the efficiency and readability differences between a for-loop and list comprehension while processing a list of scores.

A list comprehension is usually more concise than the equivalent for-loop, and often somewhat faster in CPython, because it avoids looking up and calling the list's append method on every iteration. It performs the same number of iterations as the loop, so the gain is a constant factor rather than a reduction in the amount of work. A filter-and-transform step that takes four lines as a loop (such as the notebook's doubling of scores below 50) fits in one line as a comprehension, which states the operation on the list directly. However, a comprehension with several conditions or nested loops can be harder to read than a well-commented loop.
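The two forms can be compared side by side. The sketch below reuses the notebook's 'mathsScore' values and its doubling of scores below 50; the snake_case variable names are chosen for the example.

```python
maths_score = [34, 21, 56, 78, 88]

# for-loop version: explicit accumulator and append
doubled_loop = []
for x in maths_score:
    if x < 50:
        doubled_loop.append(x * 2)

# list-comprehension version: same filter and transform in one expression
doubled_comp = [x * 2 for x in maths_score if x < 50]

print(doubled_loop)
print(doubled_comp)
```

Both lists hold the same values, so the choice between the two forms is about readability and a modest speed difference, not about the result.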

Q: What implications do conditional score adjustments have on data integrity and analysis?

Conditional score adjustments, such as enforcing a minimum score, affect data integrity because they alter the raw data and shift it away from its original distribution. In analysis, the adjusted values can misrepresent performance: raising low scores to the threshold inflates the class average and overstates the ability of the students whose scores were raised. Document every such transformation, and ideally keep the raw values, so that the analysis stays transparent and reproducible.
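One way to keep the adjustment transparent is to store the adjusted scores next to the raw ones and flag the changed rows. The sketch below is an illustration under that assumption; the column names are hypothetical and the data is the notebook's five-student maths list.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "Student": ["Enoch", "Francisca", "Lilian", "Agbo", "Emeka"],
        "Maths Raw": [34, 21, 56, 78, 88],
    }
)

# clip(lower=50) raises every value below 50 to 50 and leaves the rest alone
df["Maths Adjusted"] = df["Maths Raw"].clip(lower=50)
# record which rows were changed so the adjustment can be audited later
df["Was Adjusted"] = df["Maths Raw"] < 50

raw_mean = df["Maths Raw"].mean()
adjusted_mean = df["Maths Adjusted"].mean()
print(df)
print(raw_mean, adjusted_mean)
```

Comparing the two means shows how far the adjustment moves the class average for this data.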

Q: Describe the role of the modulus operation in generating a new list of remainders from a division operation on a set of numbers.

The modulus operator, written '%', returns the remainder left after dividing one number by another, which makes it the natural tool for building a list of remainders. In the notebook, a loop iterates over each element of 'testList', applies '%' with the divisor 3, and appends the resulting remainder to a new list, 'divBy3'. The technique is useful for problems that involve periodicity or cycles, such as checking divisibility or grouping items into repeating buckets.
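The relationship between the remainder and the quotient can be checked directly. This sketch uses the notebook's 'testList' values; `divmod` and the divisibility filter are additions for illustration.

```python
test_list = [56, 54, 76, 87, 23, 45, 78, 90]

# remainder of each item after division by 3
div_by_3 = [n % 3 for n in test_list]

# divmod returns (quotient, remainder); together they rebuild the number
for n in test_list:
    quotient, remainder = divmod(n, 3)
    assert n == 3 * quotient + remainder

# a remainder of 0 means the item is exactly divisible by 3
multiples_of_3 = [n for n in test_list if n % 3 == 0]

print(div_by_3)
print(multiples_of_3)
```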

Q: Explain how the function 'fileSaver' is utilized in the context of saving a DataFrame to a CSV file.

The 'fileSaver' function saves a DataFrame to a CSV file. It takes the DataFrame as its only parameter; in the notebook, the file name and the directory ('c:/filestore/') are set inside the function and joined to form the path. The function then calls the DataFrame's 'to_csv()' method to write the data to disk at that path. This gives the data persistence: it is stored in a structured format that can be reopened or shared later.
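A more flexible variant takes the directory and file name as parameters instead of hard-coding them. The sketch below is an assumption about how that could look, not the notebook's code: `file_saver`, its parameters, and the file name `scores_demo.csv` are hypothetical, and a temporary directory stands in for a real save location.

```python
import tempfile
from pathlib import Path

import pandas as pd


def file_saver(dataframe, save_dir, save_name):
    """Write the DataFrame to save_dir/save_name as CSV and return the path."""
    target = Path(save_dir) / save_name
    # index=False leaves the row index out of the file
    dataframe.to_csv(target, index=False)
    return target


frame = pd.DataFrame({"Student Names": ["Haga", "Hasgo"], "Maths Score": [90, 85]})

with tempfile.TemporaryDirectory() as tmp_dir:
    saved_path = file_saver(frame, tmp_dir, "scores_demo.csv")
    reloaded = pd.read_csv(saved_path)

print(reloaded)
```

Passing `index=False` is a design choice: without it the row index is written as an extra column, which is what appears as 'Unnamed: 0' when the notebook reads its saved file back.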

Q: How does the list comprehension method work for processing student math scores below a certain threshold?

To adjust scores that fall below a threshold, a list comprehension can set every score below 50 to 50 in a single expression. It iterates over the list and uses a conditional expression to choose the value kept for each score. For the 'mathsScore' list, '[50 if score < 50 else score for score in mathsScore]' builds a new list in which no score is below the threshold. Note that the comprehension in the notebook itself, '[(x*2) for x in mathsScore if x < 50]', does something different: it keeps only the scores below 50 and doubles them.
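The difference between a conditional expression and a filter is easy to miss, so a short sketch may help. It uses the notebook's maths scores; the variable names are chosen for the example.

```python
maths_score = [34, 21, 56, 78, 88]

# conditional expression BEFORE "for": every item is kept, low ones become 50
adjusted = [50 if score < 50 else score for score in maths_score]

# equivalent shortcut: max() returns the larger of the score and the floor
adjusted_with_max = [max(score, 50) for score in maths_score]

# "if" AFTER "for" is a filter: items that fail the test are dropped
only_low = [score for score in maths_score if score < 50]

print(adjusted)
print(only_low)
```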

The document summarizes hands-on exercises using Jupyter Notebook to work with lists, loops, functions, and pandas. It shows examples of list comprehension, for loops, defining functions, reading and saving CSV files, and creating a DataFrame from lists of student data including names, test scores, and calculated average scores. The key steps are: 1) defining lists of student names and test scores, 2) using loops and functions to calculate average scores, 3) passing data to a function to create a DataFrame, and 4) displaying the DataFrame with student data and average scores.

Uploaded by

Ridwan Dere

8/18/22, 11:10 AM Fifth Class Hands On - Jupyter Notebook

List Comprehension
In [ ]:

#Scenario: David is a teacher and has five students in his class.
#David wants to ensure no student scores below 50 in the Maths test.
#If a student scores below 50, David would like to adjust the score to 50.

In [5]:

studentNames = ['Enoch', 'Francisca', 'Lilian', 'Agbo', 'Emeka']
mathsScore = [34, 21, 56, 78, 88]

In [ ]:

In [2]:

for students, scores in zip(studentNames, mathsScore):
    if scores < 50:
        scores = 50
    print(students, scores)

Enoch 50

Francisca 50

Lilian 56

Agbo 78

Emeka 88

localhost:8889/notebooks/Desktop/Leraning Data Analysis/One Campus Academy/Workspace/Fifth Class Hands [Link] 1/11

In [3]:

for x in range(50):
    print(x)

In [29]:

#List Comprehension

mathsScore = [34, 21, 56, 78, 88]
myList = [(x*2) for x in mathsScore if x < 50]
myList

Out[29]:

[68, 42]

In [30]:

newList = []

for x in myList:
    if x < 50:
        newList.append(x)
print(newList)

[42]

In [31]:

#remainder operation: You use the % sign

95%6

Out[31]:

5

In [14]:

#To get how many whole times the divisor goes into the dividend, use the floor-division operator //

65//6

Out[14]:

10

In [32]:

testList = [56, 54, 76, 87, 23, 45, 78, 90]
#generate a new list containing the remainder of division of each item by 3
divBy3 = []
for test in testList:
    solve = test%3
    divBy3.append(solve)
print(divBy3)

[2, 0, 1, 0, 2, 0, 0, 0]

In [33]:

DIV_3 = [(a%3) for a in testList]
DIV_3

Out[33]:

[2, 0, 1, 0, 2, 0, 0, 0]

In [ ]:

#Revising what we did in Functions
#The SYNTAX IS:
def functionName():
    body
    return #note that this return statement is optional

In [22]:

#function that saves a file to disk

def fileSaver(dataframe):
    pass

def DataFrameCreator(listSet, colNamesSet):
    pass

In [23]:

#reading and saving files (csv and excel) to/from disk

import pandas as pd

In [ ]:

#read_csv() to read a csv file from disk
#read_excel() to read an excel file from disk
#read_table() to read a delimited text file from disk (pandas has no read_txt())

#to_excel() to save a DataFrame to an excel file
#to_csv() to save a DataFrame to a csv file

In [38]:

#Generate a 2D table of the below result using functions and pandas
#The overall score should be the average of the three scores
studNames = ['Haga', 'Hasgo', 'Easy', 'Online', 'Akom', 'Dere', 'Andu', 'Tunny', 'Eesha', 'Mori']
mathScore = [90,85,90,65,90,45,51,56,91,68]
chemScore = [80,95,80,75,60,45,61,86,57,56]
bioScore = [70,55,40,85,50,45,71,63,87,59]

OverAllScore = []
print(len(mathScore))
print(len(chemScore))
print(len(bioScore))

In [39]:

import statistics

for m,c,b in zip(mathScore,chemScore,bioScore):
    averageScore = statistics.mean([m,c,b])
    OverAllScore.append(averageScore)

In [40]:

1 print(OverAllScore)

[80, 78.33333333333333, 70, 75, 66.66666666666667, 45, 61, 68.33333333333333, 78.33333333333333, 61]

In [36]:

totalScore = []
for m,c,b in zip(mathScore,chemScore,bioScore):
    averageScore = (m+c+b)/3
    totalScore.append(averageScore)

In [41]:

1 print(totalScore)

[80.0, 78.33333333333333, 70.0, 75.0, 66.66666666666667, 45.0, 61.0, 68.33333333333333, 78.33333333333333, 61.0]

In [42]:

   Student Names  Maths Score  Chemistry Score  Biology Score  Average Score
0           Haga           90               80             70      80.000000
1          Hasgo           85               95             55      78.333333
2           Easy           90               80             40      70.000000
3         Online           65               75             85      75.000000
4           Akom           90               60             50      66.666667
5           Dere           45               45             45      45.000000
6           Andu           51               61             71      61.000000
7          Tunny           56               86             63      68.333333
8          Eesha           91               57             87      78.333333
9           Mori           68               56             59      61.000000

In [55]:

tableVals = [studNames, mathScore, chemScore, bioScore, totalScore]
tableVals

Out[55]:

[['Haga', 'Hasgo', 'Easy', 'Online', 'Akom', 'Dere', 'Andu', 'Tunny', 'Eesha', 'Mori'],
 [90, 85, 90, 65, 90, 45, 51, 56, 91, 68],
 [80, 95, 80, 75, 60, 45, 61, 86, 57, 56],
 [70, 55, 40, 85, 50, 45, 71, 63, 87, 59],
 [80.0, 78.33333333333333, 70.0, 75.0, 66.66666666666667, 45.0, 61.0, 68.33333333333333, 78.33333333333333, 61.0]]

In [43]:

dataTable = pd.DataFrame()
dataTable['Student Names'] = studNames
dataTable['Maths Score'] = mathScore
dataTable['Chemistry Score'] = chemScore
dataTable['Biology Score'] = bioScore
dataTable['Average Score'] = totalScore
dataTable

Out[43]:

   Student Names  Maths Score  Chemistry Score  Biology Score  Average Score
0           Haga           90               80             70      80.000000
1          Hasgo           85               95             55      78.333333
2           Easy           90               80             40      70.000000
3         Online           65               75             85      75.000000
4           Akom           90               60             50      66.666667
5           Dere           45               45             45      45.000000
6           Andu           51               61             71      61.000000
7          Tunny           56               86             63      68.333333
8          Eesha           91               57             87      78.333333
9           Mori           68               56             59      61.000000

In [44]:

colNames = ['Student Names', 'Maths Score', 'Chemistry Score', 'Biology Score', 'Average Score']
colNames[4]

Out[44]:

'Average Score'

In [64]:

import pandas as pd

studNames = ['Haga', 'Hasgo', 'Easy', 'Online', 'Akom', 'Dere', 'Andu', 'Tunny', 'Eesha', 'Mori']
mathScore = [90,85,90,65,90,45,51,56,91,68]
chemScore = [80,95,80,75,60,45,61,86,57,56]
bioScore = [70,55,40,85,50,45,71,63,87,59]

totalScore = []

for m,c,b in zip(mathScore,chemScore,bioScore):
    averageScore = (m+c+b)/3
    totalScore.append(averageScore)

dataTable = pd.DataFrame()
dataTable['Student Names'] = studNames
dataTable['Maths Score'] = mathScore
dataTable['Chemistry Score'] = chemScore
dataTable['Biology Score'] = bioScore
dataTable['Average Score'] = totalScore

#Task: pass the list and the column name to our DataFrameCreator function

tableVals = [studNames, mathScore, chemScore, bioScore, totalScore]
colNames = ['Student Names', 'Maths Score', 'Chemistry Score', 'Biology Score', 'Average Score']

def fileSaver(datatable):
    print('begin saving')
    saveName = '[Link]'
    savePath = 'c:/filestore/'
    datatable.to_csv(savePath+saveName)
    print('saved file')

def DataFrameCreator(tableVals, colNames):
    dataTable = pd.DataFrame()
    dataTable[colNames[0]] = tableVals[0]
    dataTable[colNames[1]] = tableVals[1]
    dataTable[colNames[2]] = tableVals[2]
    dataTable[colNames[3]] = tableVals[3]
    dataTable[colNames[4]] = tableVals[4]
    print(dataTable)
    print('calling function to save file')
    fileSaver(dataTable) #pass the DataFrame built above; names are case-sensitive

#call the function only after it has been defined
DataFrameCreator(tableVals, colNames)

   Student Names  Maths Score  Chemistry Score  Biology Score  Average Score
0           Haga           90               80             70      80.000000
1          Hasgo           85               95             55      78.333333
2           Easy           90               80             40      70.000000
3         Online           65               75             85      75.000000
4           Akom           90               60             50      66.666667
5           Dere           45               45             45      45.000000
6           Andu           51               61             71      61.000000
7          Tunny           56               86             63      68.333333
8          Eesha           91               57             87      78.333333
9           Mori           68               56             59      61.000000

calling function to save file

begin saving

saved file

In [66]:

fileName = '[Link]'
dirPath = 'c:/filestore/'
file = pd.read_csv(dirPath+fileName)
file

Out[66]:

   Unnamed: 0  Student Names  Maths Score  Chemistry Score  Biology Score  Average Score
0           0           Haga           90               80             70      80.000000
1           1          Hasgo           85               95             55      78.333333
2           2           Easy           90               80             40      70.000000
3           3         Online           65               75             85      75.000000
4           4           Akom           90               60             50      66.666667
5           5           Dere           45               45             45      45.000000
6           6           Andu           51               61             71      61.000000
7           7          Tunny           56               86             63      68.333333
8           8          Eesha           91               57             87      78.333333
9           9           Mori           68               56             59      61.000000
