Python Basics V3

Uploaded by

bogexi4626

10/10/2024, 23:31 Python Basics - Jupyter Notebook

Class 2 - September 24, 2024

Python Text
In [2]: # This is a comment, starts with '#'

Working with integer data type


In [3]: x = 10 # I'm assigning 10 to x

In [4]: x

Out[4]: 10

In [5]: X = 5

In [6]: X

Out[6]: 5

Working with float data type


In [7]: f = 10.3

In [8]: f

Out[8]: 10.3

Basic Arithmetic Operations in Python


In [9]: x = 10
y = 5

In [10]: # Addition
x + y

Out[10]: 15

In [11]: # Subtraction
x - y

Out[11]: 5


In [12]: # Multiplication
x * y

Out[12]: 50

In [13]: # Division: x / y returns the float 2.0; int() converts it to the integer 2
int(x / y)

Out[13]: 2

In [ ]: ​

In [14]: # This is a comment

Working with string data type


In [15]: firstname = "jaMes"
lastname = "mujeeb"

In [16]: my_name = firstname +' '+ lastname

In [17]: my_name

Out[17]: 'jaMes mujeeb'

In [18]: type(my_name)

Out[18]: str

In [19]: my_name.title() # change to title format

Out[19]: 'James Mujeeb'

In [20]: my_name.upper() # change to upper case

Out[20]: 'JAMES MUJEEB'

In [21]: my_name.lower()

Out[21]: 'james mujeeb'

In [ ]: ​

if statement in python
Creating Conditions in Python

Syntax:

We write if, followed by the condition (e.g., x == 10), then a colon (:).

The condition is formed using comparison operators. Examples of comparison operators in Python include:

==
Meaning: Equal to. Example: x == y checks if x is equal to y.

!=
Meaning: Not equal to. Example: x != y checks if x is not equal to y.

>
Meaning: Greater than. Example: x > y checks if x is greater than y.

<
Meaning: Less than. Example: x < y checks if x is less than y.

>=
Meaning: Greater than or equal to. Example: x >= y checks if x is greater than or equal to y.

<=
Meaning: Less than or equal to. Example: x <= y checks if x is less than or equal to y.

is
Meaning: Object identity (checks if both operands refer to the same object). Example: x is y checks if x and y refer to the same object in memory.

is not
Meaning: Negation of object identity (checks if operands refer to different objects). Example: x is not y checks if x and y refer to different objects.

in
Meaning: Membership (checks if a value exists within a sequence). Example: x in y checks if x is an element in y (where y could be a list, tuple, or string).

not in
Meaning: Negation of membership (checks if a value does not exist within a sequence). Example: x not in y checks if x is not an element in y.

if condition:
    print("Condition is true")
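To make the difference between equality, identity, and membership concrete, here is a minimal sketch. The variable names (a, b, c) and their values are illustrative assumptions and do not come from the notebook.

```python
# Illustrative values only; these names are not part of the class notebook.
a = [1, 2, 3]
b = [1, 2, 3]   # a separate list with the same contents
c = a           # a second name for the very same list object

print(a == b)             # True: the contents are equal
print(a is b)             # False: two distinct objects in memory
print(a is c)             # True: both names refer to one object
print(2 in a)             # True: 2 is an element of the list
print("py" in "python")   # True: substring membership for strings
print(5 not in a)         # True: 5 is not an element of the list

x = 10
if x >= 10:
    print("x is at least 10")
```

In everyday code, == is the usual choice for comparing values; is is mostly reserved for checks such as `value is None`.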

In [22]: x = 10 # Assign the value 10 to x
x == 10 # Check whether x equals 10; the result is True or False

Out[22]: True

Explanation of Comparison Operators

Comparison operators compare two values, and the result is either True or False.

In [23]: 1 == 2 # Check whether 1 is equal to 2

Out[23]: False

In [24]: # != means "not equal to"; this checks whether 1 is not equal to 2

In [25]: 1 != 2

Out[25]: True

In [26]: name = "James"

In [27]: if name == "James":
    print("Correct")

Correct

Assignment: Python Integer Variables and if Statements
Objective:
To assess your understanding of working with integer variables and conditional statements
using if in Python.

Instructions:
Create two integer variables k and v .
Assign any values of your choice to k and v .
Write an if statement to check if k is greater than 10:
If k > 10 , add k and v together and print the result.
If k <= 10 , subtract v from k and print the result.


Class 3 - October 1, 2024

Revision of our previous class

if , elif, else statement

list data type (indexing, slicing), when to use [] and ()

for loop

Simple python function

User input

Step by step approach to solving a problem.

In [12]: k = 9
v = 25

In [13]: # Check if k is greater than 10
if k > 10:
    answer = k + v # add k and v
    print(answer)

# Check if k is less than or equal to 10
elif k <= 10:
    answer = k - v # subtract v from k
    print(answer)

-16

Grading system with if, elif and else statement

Take the student's score and print the matching grade. Our grades are A, B, C, D, F.

In [3]: score = 50


In [4]: # Check if student score is greater than or equal to 80
if score >= 80:
    print("Grade is A")

# Check if student score is greater than or equal to 70
elif score >= 70:
    print("Grade is B")

# Check if student score is greater than or equal to 50
elif score >= 50:
    print("Grade is C")

# Check if student score is greater than or equal to 40
elif score >= 40:
    print("Grade is D")

# If none of the above conditions applies
else:
    print("You are not serious 🧐")

Grade is C

In [ ]: ​

More on String Operations


In [77]: text = "My country name is Nigeria"

In [71]: text.title()

Out[71]: 'My Country Name Is Nigeria'

In [72]: text.split()

Out[72]: ['My', 'country', 'name', 'is', 'Nigeria']

In [79]: text.split('name')

Out[79]: ['My country ', ' is Nigeria']

In [86]: names = "Paul-Adeoye,Faith-Jumoke,Glory-Ade"
names

Out[86]: 'Paul-Adeoye,Faith-Jumoke,Glory-Ade'

In [87]: names.split(',')

Out[87]: ['Paul-Adeoye', 'Faith-Jumoke', 'Glory-Ade']

In [88]: names.split(',')[0]

Out[88]: 'Paul-Adeoye'


In [89]: names.split(',')[0].split('-')

Out[89]: ['Paul', 'Adeoye']

In [90]: names.split(',')[0].split('-')[0]

Out[90]: 'Paul'

In [95]: names

Out[95]: 'Paul-Adeoye,Faith-Jumoke,Glory-Ade'

In [92]: names.find('Faith')

Out[92]: 12

In [96]: # .replace(old, new) - the text we want to replace, then the text to replace it with
names.replace('Ade', 'Dea')

Out[96]: 'Paul-Deaoye,Faith-Jumoke,Glory-Dea'

In [99]: len(names) # length

Out[99]: 34
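The split calls above can be combined to separate every first name and last name in one pass. This is a minimal sketch that reuses the names string from the cells above; the helper names first_names and last_names are assumptions, and it relies on a for loop and lists, which are covered later in these notes.

```python
names = "Paul-Adeoye,Faith-Jumoke,Glory-Ade"

first_names = []
last_names = []

# Split on ',' to get each person, then on '-' to separate the two parts
for full_name in names.split(","):
    first, last = full_name.split("-")
    first_names.append(first)
    last_names.append(last)

print(first_names)  # ['Paul', 'Faith', 'Glory']
print(last_names)   # ['Adeoye', 'Jumoke', 'Ade']
```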

Python Functions
In [ ]: # Create the functions
# Write the blocks of code to perform an action
# Call the function

In [15]: def grading_system(name, score):

    # Check if student score is greater than or equal to 80
    if score >= 80:
        print(name, "Grade is A".lower())

    # Check if student score is greater than or equal to 70
    elif score >= 70:
        print(name, "Grade is B".lower())

    # Check if student score is greater than or equal to 50
    elif score >= 50:
        print(name, "Grade is C".lower())

    # Check if student score is greater than or equal to 40
    elif score >= 40:
        print(name, "Grade is D".lower())

    # If none of the above conditions applies
    else:
        print(name, "You are not serious 🧐".lower())


In [16]: name = input("Enter your name: ")
score = int(input("Enter your score: "))

# Call the function; it takes two parameters, name and score
grading_system(name, score)

Enter your name: Ridwan


Enter your score: 65
Ridwan grade is c

In [16]: # [] - Mostly used for lists (creating a list, indexing, slicing)

In [17]: # () - Used when defining or calling a function in Python

List data types

We define a list by using [ ]

Items in a list are separated by commas


In [18]: names = ["Abayomi", "Musa", "Yahaya"]

In [19]: print(names)

['Abayomi', 'Musa', 'Yahaya']

How many items do we have in this list?

In [20]: len(names) ## len - check the length of our list

Out[20]: 3

In [21]: names[2].upper()

Out[21]: 'YAHAYA'

In [ ]: ​

In [22]: participants_age = [12, 15, 34, 20, 57, 29]

In [23]: len(participants_age)

Out[23]: 6


In [24]: participants_age[5] + participants_age[0]

Out[24]: 41

In [25]: participants_age[2]

Out[25]: 34

In [26]: type(participants_age[2])

Out[26]: int

In [27]: ## for loop - repeats an action for each item in a sequence such as a list

In [28]: for name in names:
    print(name)

Abayomi
Musa
Yahaya

In [29]: for x in names:
    print(x)

Abayomi
Musa
Yahaya

In [30]: for age in participants_age:
    print(age)

12
15
34
20
57
29

In [31]: ## range - generates integers from a start value (0 by default) up to, but not including, a stop value
range(5)

Out[31]: range(0, 5)

In [32]: range(1,10)

Out[32]: range(1, 10)
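A range object does not display its values until it is iterated over. As a small sketch, wrapping it in list() shows exactly which numbers it produces; the variable names below are illustrative assumptions.

```python
# list() expands a range so the generated numbers become visible
first_five = list(range(5))         # start defaults to 0, stop is excluded
one_to_nine = list(range(1, 10))    # explicit start, stop is excluded
even_numbers = list(range(0, 10, 2))  # the third argument is the step

print(first_five)    # [0, 1, 2, 3, 4]
print(one_to_nine)   # [1, 2, 3, 4, 5, 6, 7, 8, 9]
print(even_numbers)  # [0, 2, 4, 6, 8]
```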


In [33]: for i in range(10):
    print(i)

0
1
2
3
4
5
6
7
8
9

In [34]: for i in range(10):
    print(i+2)

2
3
4
5
6
7
8
9
10
11

In [35]: # Multiplying a string by an integer repeats the string
5 * '*'

Out[35]: '*****'

In [36]: for i in range(1,6):
    print(i * '*')

*
**
***
****
*****

In [ ]: ​

Python Function
In [37]: ## You first define a function
## write the function logic
## Call the function

In [38]: def yahaya():
    print("Good evening Mr. yahaya")


In [39]: yahaya()

Good evening Mr. yahaya

String formatting
In [40]: my_name = "Ade"
print(f"Welcome {my_name}")

Welcome Ade

In [41]: ## Function parameters
## I want the function to greet anybody that calls the function

In [42]: def yahaya(name):
    print(f"Good evening Mr. {name}")

In [43]: yahaya("Kelvin")

Good evening Mr. Kelvin

In [ ]: ​

Prompt user input


Note: After typing inside the input box, press Enter or return button
on your keyboard while your cursor is still inside the input box to
proceed

In [44]: input()

Out[44]: '7'

Note that user input always comes in as the string data type by default
In [8]: input("Please enter first number ")

Please enter first number 6

Out[8]: '6'


In [9]: # If we need the input to be an integer,
# wrap the input inside the int() function
int(input("Please enter first number "))

Please enter first number 3

Out[9]: 3

In [11]: # Prompt users to enter the first number
first_number = int(input("Please enter first number "))

# Prompt users to enter second number
second_number = int(input("Please enter second number "))

# Perform addition operation on the two numbers
add_number = first_number + second_number
print(f"The result of the addition is {add_number}")

Please enter first number 6


Please enter second number 4
The result of the addition is 10
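Because int() raises a ValueError when the text is not a whole number, a program that reads user input may want to guard the conversion. The following is a minimal sketch under that assumption; the helper name to_int is hypothetical, and it takes the text as an argument so it can be tried without typing into an input box.

```python
def to_int(text):
    """Return text converted to an integer, or None if it cannot be converted."""
    try:
        return int(text)
    except ValueError:
        # int() could not parse the text as a whole number
        return None

print(to_int("6"))     # 6
print(to_int(" 4 "))   # 4 (int() ignores surrounding whitespace)
print(to_int("six"))   # None
print(to_int("3.5"))   # None (int() does not parse decimal text)
```

In a notebook this could be used as to_int(input("Please enter first number ")), followed by an if check for None before doing the addition.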


Step-by-Step Approach to Solving a Problem


Objective: How to break down problems and think logically.

Key Points:

Understand the Problem: What is being asked?
Break It Down: Divide the problem into smaller steps.
Write Code for Each Step: Implement the solution step by step.
Test and Debug: Ensure your code works and fix errors.


Python List Operations and Methods Explained as Arrays or Vectors

A list is a versatile and mutable collection that can store multiple values, including different data types, much like an array in other programming languages. Python lists allow indexing, slicing, and various operations like adding or removing elements.

Key Concepts:
Indexing: Each element in a list is associated with an index, starting from 0.
1. Positive index: Starts from 0, moving left to right.
2. Negative index: Starts from -1, moving right to left.

In [ ]: my_list = ['apple', 'banana', 'cherry']
print(my_list[0]) # Output: apple
print(my_list[-1]) # Output: cherry

Slice from index 1 up to, but not including, index 4

In [2]: my_list = [0, 1, 2, 3, 4, 5, 6]
print(my_list[1:4]) # Output: [1, 2, 3]

[1, 2, 3]

In [3]: my_list[-7]

Out[3]: 0
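Slicing also accepts omitted bounds, negative indices, and a step. The sketch below reuses the same seven-element my_list from the cells above to show a few common forms.

```python
my_list = [0, 1, 2, 3, 4, 5, 6]

print(my_list[:3])    # [0, 1, 2]  from the start up to index 3 (excluded)
print(my_list[4:])    # [4, 5, 6]  from index 4 to the end
print(my_list[-2:])   # [5, 6]     the last two items
print(my_list[::2])   # [0, 2, 4, 6]  every second item
print(my_list[::-1])  # [6, 5, 4, 3, 2, 1, 0]  a reversed copy
```

A slice always returns a new list, so my_list itself is unchanged by any of these lines.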


Mutability
In Python, mutability refers to whether an object’s state can be
changed after it has been created. An object that can be changed
is called a mutable object, while an object that cannot be changed is
called an immutable object. Let's break this down with examples:

Immutable Variables

These variables cannot have their value or state modified. Once a value is assigned during declaration, it remains fixed and cannot be changed.

In [15]: 10=5 ##not possible

Cell In[15], line 1
    10=5 ##not possible
    ^
SyntaxError: cannot assign to literal here. Maybe you meant '==' instead of '='?

Tuple (tuple): Once a tuple is created, you cannot change its content (no adding,
removing, or modifying elements).

In [16]: my_tuple = (1, 2, 3)
my_tuple[0] = 10 # This will raise an error as tuples are immutable

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
Cell In[16], line 2
      1 my_tuple = (1, 2, 3)
----> 2 my_tuple[0] = 10

TypeError: 'tuple' object does not support item assignment
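Since a tuple cannot be edited in place, the usual way to get a "changed" tuple is to build a new one. The following is a minimal sketch of that idea; the name new_tuple is an assumption for illustration.

```python
my_tuple = (1, 2, 3)

# Catch the error instead of letting it stop the program
try:
    my_tuple[0] = 10
except TypeError as error:
    print("Cannot modify a tuple:", error)

# Build a new tuple: the new first item plus the remaining original items
new_tuple = (10,) + my_tuple[1:]

print(my_tuple)   # (1, 2, 3)  the original is untouched
print(new_tuple)  # (10, 2, 3)
```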


Mutable Variables
These variables can be changed after their creation. For a list, both the
elements and their order can be modified.

List: A list in Python is mutable. You can add, remove, or modify elements.

In [17]: # Example: Mutable List
my_list = [1, 2, 3]
my_list[0] = 10 # Changing the first element
print(my_list) # Output: [10, 2, 3]

[10, 2, 3]


Set: A set is also mutable, meaning you can add or remove elements. However, a
set does not allow duplicate elements, and the order of elements is not preserved.

In [18]: # Example: Mutable Set
my_set = {1, 2, 3}
my_set.add(4) # Adding an element
print(my_set) # Output: {1, 2, 3, 4}

{1, 2, 3, 4}

In [ ]: ​

Dictionary (dict): You can modify, add, or remove key-value pairs.

In [19]: my_dict = {'a': 1, 'b': 2}
my_dict['c'] = 3 # Adds a new key-value pair ('c': 3)
my_dict['a'] = 10 # Modifies the value of the key 'a'
print(my_dict)

{'a': 10, 'b': 2, 'c': 3}

Key Points to Remember

List: Mutable (Can change elements, add/remove items)

Set: Mutable (Can add/remove elements, but elements are unordered and unique)

Dictionary: Mutable (Can modify, add, or remove key-value pairs)

Tuple: Immutable (Cannot change, add, or remove elements)

A list is like a box with multiple slots, where you can replace or
remove any item. A set is like a basket that can hold unique
objects, where you can add or remove items, but you can't predict
the order. A tuple is like a sealed container where once the items
are placed inside, you cannot modify them.
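One practical consequence of mutability is that two names can refer to the same list, so a change made through one name is visible through the other. This is a minimal sketch of that behaviour; the names list_a, list_b, and list_c are illustrative assumptions.

```python
list_a = [1, 2, 3]
list_b = list_a          # list_b is another name for the SAME list
list_b.append(4)         # the change is visible through both names
print(list_a)            # [1, 2, 3, 4]

list_c = list_a.copy()   # an independent copy of the list
list_c.append(5)         # only the copy changes
print(list_a)            # [1, 2, 3, 4]
print(list_c)            # [1, 2, 3, 4, 5]
```

Using .copy() (or slicing with [:]) is a common way to avoid changing the original list by accident.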


Class 4 - October 8, 2024

Exercise

Practice Exercise 1:
Problem: Write a program that asks for two numbers and outputs their sum.

Steps:

1. Take user input for two numbers.
2. Convert the input to integers.
3. Add the numbers.
4. Print the result.

Practice Exercise 2:
Problem: We need to create a system that checks if a user can withdraw money based on
their account balance and the amount they want to withdraw.

Steps:
1. Ask the user for their current balance (data type - float).
2. Ask the user how much they want to withdraw (data type - float).
3. Check if the balance is sufficient for the withdrawal.
4. Print a message to approve or deny the withdrawal based on the balance.

Solution to Exercise 1
In [18]: def add(x, y):
    # This function takes in x, y

    # Change the data type to integer (number)
    # and perform addition operation using '+' sign
    z = int(x) + int(y)

    # Present the result clearly
    result = f"The answer is {z}"

    # return the result from this function
    return result

In [19]: x = input("Enter first number: ")
y = input("Enter second number: ")

# call the 'add' function above
add(x, y)

Enter first number: 5


Enter second number: 2

Out[19]: 'The answer is 7'


Solution to Exercise 2
In [21]: # Define a function 'withdraw' that takes in two arguments:
# 'current_balance' (the balance in the account)
# and 'withdraw_amount' (the amount to withdraw).

def withdraw(current_balance, withdraw_amount):

    ## Convert the input values to float, ensuring
    ## we are working with numbers that can have decimal points.
    current_balance = float(current_balance)
    withdraw_amount = float(withdraw_amount)

    # Check if the current balance is greater
    # than or equal to the withdrawal amount.
    if current_balance >= withdraw_amount:
        # If there is enough balance, subtract
        # the withdrawal amount from the current balance.
        new_balance = current_balance - withdraw_amount

        # Inform the user that the transaction was approved.
        return "Your transaction is approved"

    else:
        # If there are not enough funds, notify
        # the user that the transaction was not approved.
        return "Your transaction is not approved"

In [22]: current_balance = input("Enter current_balance: ")
withdraw_amount = input("Enter withdraw_amount: ")

## call withdraw function
withdraw(current_balance, withdraw_amount)

Enter current_balance: 500.45


Enter withdraw_amount: 9800.45

Out[22]: 'Your transaction is not approved'
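The withdraw function above computes a new balance but only returns a message. As one possible variation (not the class solution), the sketch below returns both the decision and the resulting balance. The function name withdraw_with_balance and the rule that rejects zero or negative amounts are assumptions added for illustration.

```python
def withdraw_with_balance(current_balance, withdraw_amount):
    """Return (approved, balance_after) for a withdrawal request."""
    current_balance = float(current_balance)
    withdraw_amount = float(withdraw_amount)

    # Assumption: a withdrawal must be a positive amount
    if withdraw_amount <= 0:
        return False, current_balance

    # Approve only when the balance covers the withdrawal
    if current_balance >= withdraw_amount:
        return True, current_balance - withdraw_amount

    # Not enough funds: the balance stays the same
    return False, current_balance

approved, balance = withdraw_with_balance("500.50", "100.25")
print(approved, balance)  # True 400.25

denied, unchanged = withdraw_with_balance("500.45", "9800.45")
print(denied, unchanged)  # False 500.45
```

Returning the values instead of printing them lets the caller decide how to present the result.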


1. Importing NumPy and Loading Data

Start by importing the numpy library and creating or loading an array.
In [38]: import numpy as np

# Example data: ages of individuals
ages = np.array([40, 70, 35, 25, 45, 80, 55, 23, 65, 30, 75, 50])
print(ages)

[40 70 35 25 45 80 55 23 65 30 75 50]

In [41]: ages.shape

Out[41]: (12,)

In [42]: ages.ndim

Out[42]: 1

2. Basic Descriptive Statistics

With NumPy, you can quickly calculate important statistics like mean, median, standard deviation, and more.
In [19]: # Calculate mean (average)
mean_age = np.mean(ages)
print("Mean Age:", mean_age)

Mean Age: 49.416666666666664

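In [20]: # Calculate median
median_age = np.median(ages)
print("Median Age:", median_age)

Median Age: 47.5

In [12]: # Calculate standard deviation
std_dev_age = np.std(ages)
print("Standard Deviation of Age:", std_dev_age)

Standard Deviation of Age: 17.260262647673315

In [13]: # Calculate variance
variance_age = np.var(ages)
print("Variance of Age:", variance_age)

Variance of Age: 297.9166666666667

Note: the outputs of cells In [12] and In [13] appear to come from an earlier run (see their lower execution counts) and do not match the ages array defined above. For that array, np.std(ages) is approximately 18.87 and np.var(ages) is approximately 356.24.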
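By default, np.var and np.std divide by n (population statistics); passing ddof=1 divides by n - 1 (sample statistics). As a NumPy-free cross-check, the sketch below uses Python's standard-library statistics module on the same twelve ages. The variable names are illustrative assumptions.

```python
import statistics

ages = [40, 70, 35, 25, 45, 80, 55, 23, 65, 30, 75, 50]

# Population variance divides by n, like np.var(ages)
population_var = statistics.pvariance(ages)
# Sample variance divides by n - 1, like np.var(ages, ddof=1)
sample_var = statistics.variance(ages)

# The standard deviation is the square root of the variance
population_std = statistics.pstdev(ages)

print(round(population_var, 2))  # about 356.24
print(round(sample_var, 2))      # about 388.63
print(round(population_std, 2))  # about 18.87
```

Which version to report depends on whether the data are the whole population of interest or a sample drawn from it.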


3. Data Manipulation with NumPy

You can perform various operations to manipulate the array data for deeper insights.

Sorting the Data - Sorting is useful to understand the range of values.
In [24]: # Sort the ages
sorted_ages = np.sort(ages)
print("Sorted Ages:", sorted_ages)

Sorted Ages: [23 25 30 35 40 45 50 55 65 70 75 80]

Finding Minimum and Maximum

It is essential to know the range of your data.
In [27]: # Find the minimum and maximum values
min_age = np.min(ages)
max_age = np.max(ages)

print("Minimum Age:", min_age)
print("Maximum Age:", max_age)

Minimum Age: 23
Maximum Age: 80

Filtering Data Based on Conditions

You can filter data to focus on specific subsets, such as finding people over a certain age.
In [28]: # Filter ages greater than 50
ages_above_50 = ages[ages > 50]
print("Ages above 50:", ages_above_50)

Ages above 50: [70 80 55 65 75]
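The expression ages > 50 produces an array of True/False values (a boolean mask), and indexing with that mask keeps only the True positions. The sketch below shows the mask itself and how two conditions can be combined; the names mask and between_30_and_50 are assumptions for illustration.

```python
import numpy as np

ages = np.array([40, 70, 35, 25, 45, 80, 55, 23, 65, 30, 75, 50])

# The comparison is applied to every element and returns booleans
mask = ages > 50
print(mask)
print(mask.sum())  # number of True values, i.e. how many ages are above 50

# Combine conditions with & (and) or | (or); each condition needs parentheses
between_30_and_50 = ages[(ages >= 30) & (ages <= 50)]
print(between_30_and_50)
```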


4. Mathematical Operations on Data

NumPy allows you to perform mathematical operations across the entire array.

Adding or Subtracting from Array

In [30]: # Increase all ages by 5
increased_ages = ages + 5
print("Increased Ages by 5:", increased_ages)

Increased Ages by 5: [45 75 40 30 50 85 60 28 70 35 80 55]

In [ ]: ​

Exploring Tabular Data with Pandas

Import pandas library


In [24]: import pandas as pd

Create a DataFrame with employee data


In [101]: df_employees = pd.DataFrame({
    'Name': ['Rahman', 'James', 'Joy', 'Kelvin', 'Faith'],
    'HoursWorked': [35, 40, 45, 30, None],
    'MonthlySalary': [3000, 3500, 4000, 2500, 2000]
})

# Display the DataFrame
df_employees

Out[101]:
   Name    HoursWorked  MonthlySalary
0  Rahman  35.0         3000
1  James   40.0         3500
2  Joy     45.0         4000
3  Kelvin  30.0         2500
4  Faith   NaN          2000


Loading Data from a File, downloaded from chisquares.com

Read the data from your local directory. Note that you need to copy the file from your Downloads folder into this working directory (folder), or pass the correct path to the file.

For the .csv file format, use read_csv.

For the .xlsx file format, use read_excel.

In [108]: data = pd.read_excel("dataset (13).xlsx")

# display the first five rows
data.head()

Out[108]:
   Unique_ID_Chisquares      Participant_ID_Chisquares  Collection_Wave_Chisquares  Data_Sou
0  670559c19e6dfaa1a37e2514  NaN                        1
1  670559c29e6dfaa1a37e2516  NaN                        1
2  670559c39e6dfaa1a37e2518  NaN                        1
3  670559c39e6dfaa1a37e2519  NaN                        1
4  670559c39e6dfaa1a37e251a  NaN                        1

We don't necessarily need to analyse all the columns.

Look for the columns needed for the analysis (hint: columns that start with 'X_').

In [109]: data = data[["X_1_25_1_0_name", "X_2_25_1_0_hoursworked",
             "X_3_25_1_0_monthlysalary"]]
data.head()

Out[109]:
   X_1_25_1_0_name   X_2_25_1_0_hoursworked  X_3_25_1_0_monthlysalary
0  Olusegun          3000.0                  500000
1  Kwame Manu        1500.0                  2300
2  J. K.             1600.0                  5000
3  Adam Kunle Lawal  NaN                     NaN
4  Amos              2000.0                  500000


Or you can use this easy method, given the hint above

In [110]: # Select columns whose names contain 'X_'
# (like= matches anywhere in the name; regex='^X_' would require the name to start with it)
data = data.filter(like='X_', axis=1)
data.head()

Out[110]:
   X_1_25_1_0_name   X_2_25_1_0_hoursworked  X_3_25_1_0_monthlysalary
0  Olusegun          3000.0                  500000
1  Kwame Manu        1500.0                  2300
2  J. K.             1600.0                  5000
3  Adam Kunle Lawal  NaN                     NaN
4  Amos              2000.0                  500000

In [ ]: ​

Let's rename the column names

For instance X_1_25_1_0_name

We can split this by '_' and keep the last item, which is 'name'

In [111]: # Define a function to get the
# last part after splitting the column name by '_'
def rename_column(col_name):
    return col_name.split('_')[-1]

# Rename the columns using the custom function
data = data.rename(columns=rename_column)

data.head()

Out[111]:
   name              hoursworked  monthlysalary
0  Olusegun          3000.0       500000
1  Kwame Manu        1500.0       2300
2  J. K.             1600.0       5000
3  Adam Kunle Lawal  NaN          NaN
4  Amos              2000.0       500000
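The renaming rule can be tried on plain strings before applying it to a DataFrame. This is a minimal sketch using the three column names shown earlier; the helper name last_part and the list names are assumptions for illustration.

```python
columns = [
    "X_1_25_1_0_name",
    "X_2_25_1_0_hoursworked",
    "X_3_25_1_0_monthlysalary",
]

def last_part(col_name):
    # Split on '_' and keep the final piece, whatever the number of pieces
    return col_name.split("_")[-1]

short_names = [last_part(col) for col in columns]
print(short_names)  # ['name', 'hoursworked', 'monthlysalary']

# rsplit with maxsplit=1 splits only once, starting from the right
print(columns[0].rsplit("_", 1))  # ['X_1_25_1_0', 'name']
```

Indexing with [-1] keeps working even if another column name has more or fewer underscore-separated parts.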


Summarizing Data: Get a quick overview of your dataset using the .describe() method, which provides statistics like mean, median (the 50% row), standard deviation, etc. By default it summarizes numeric columns only, so only hoursworked appears here:

In [112]: data.describe()

Out[112]:
       hoursworked
count  25.000000
mean   35982.800000
std    170116.833293
min    8.000000
25%    211.000000
50%    1500.000000
75%    2080.000000
max    852425.000000

Quickly get a technical summary of the dataframe


In [113]: data.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 49 entries, 0 to 48
Data columns (total 3 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 name 40 non-null object
1 hoursworked 25 non-null float64
2 monthlysalary 24 non-null object
dtypes: float64(1), object(2)
memory usage: 1.3+ KB


From the above summary, monthlysalary is also supposed to be numeric (float or int), so let's explore and clean it. 😁 In a real-world task, there's no guarantee that we will always receive clean data. 😉 This is why we are taking this course: to be able to handle this kind of data easily with Python. ✌️
In [114]: # select first 20 rows from monthlysalary column
data['monthlysalary'].head(20)

Out[114]: 0 500000
1 2300
2 5000
3 NaN
4 500000
5 £1500
6 R20000
7 30000
8 NaN
9 $3000
10 NaN
11 207,000
12 1000
13 2500
14 NaN
15 150,000
16 NaN
17 150,000
18 400000
19 145000
Name: monthlysalary, dtype: object

The problem here is that we have some string data mixed with numeric data, and we need to extract only the numeric values.

Regex
Regex (short for regular expression) is a sequence of characters that defines a search
pattern. It’s used for finding, matching, and manipulating text.

Here’s a simple breakdown:

Think of it as a search tool: You tell regex what pattern you're looking for in a string, and it finds it.
Common uses: Find and replace certain words, extract numbers, check if text fits a pattern (like an email or phone number).

Key parts of regex:

Literal characters: These are normal characters like a , b , 1 , etc., that match exactly.
Metacharacters (special characters):
. (dot): Matches any character except a newline.
\d : Matches any digit (0-9).
\w : Matches any letter, digit, or underscore.
\s : Matches any whitespace character (space, tab, newline).
* , + , ? : Modify how many times something appears (e.g., * means "zero or more times").
[ ] : Matches any one character inside the brackets (e.g., [ab] matches 'a' or 'b').

Example:
To match all numbers in a string, you might use the pattern \d+ , which means "one or
more digits."

In [115]: import re

text = "Price is $3000 and tax is 500"
pattern = r'\d+' # Match one or more digits

# Find all numbers
numbers = re.findall(pattern, text)
print(numbers) # Output: ['3000', '500']

['3000', '500']
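The metacharacters listed above can be tried one at a time with Python's re module. This is a small sketch on a made-up sentence; the text and variable names are assumptions for illustration.

```python
import re

text = "Order 66: 3 apples, 12 pears"

# \d+ : one or more digits in a row
numbers = re.findall(r"\d+", text)
# \w+ : one or more letters, digits, or underscores (i.e. "words")
words = re.findall(r"\w+", text)
# [aeiou] : any single character listed inside the brackets
vowels = re.findall(r"[aeiou]", "pears")
# \s+ : one or more whitespace characters, replaced here by a single space
tidy = re.sub(r"\s+", " ", "too    many   spaces")

print(numbers)  # ['66', '3', '12']
print(words)    # ['Order', '66', '3', 'apples', '12', 'pears']
print(vowels)   # ['e', 'a']
print(tidy)     # too many spaces
```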


Now let's apply this to our monthlysalary data


In [116]: # Remove every character that is not a digit or a decimal point
data['monthlysalary'] = \
    data['monthlysalary'].str.replace(r'[^\d.]', '', regex=True)

data.head(20)

Out[116]:
    name                       hoursworked  monthlysalary
0   Olusegun                   3000.0       500000
1   Kwame Manu                 1500.0       2300
2   J. K.                      1600.0       5000
3   Adam Kunle Lawal           NaN          NaN
4   Amos                       2000.0       500000
5   Inimfon                    1000.0       1500
6   Kwandokuhle Shabalala      2400.0       20000
7   Michael Oduor Otieno       2080.0       30000
8   Munkaila Tirmizhi Abubakr  NaN          NaN
9   DUNSIN                     960.0        3000
10  NaN                        NaN          NaN
11  Philip Ajibola Bankole     211.0        207000
12  Ola                        3000.0       1000
13  Irene Adinorkie Okutu      1500.0       2500
14  Herbert Onuoha             NaN          NaN
15  Masud                      1152.0       150000
16  NaN                        NaN          NaN
17  Emmanuel Oke               8.0          150000
18  Ummu                       12000.0      400000
19  Abraham Garpiya Galumje    10100.0      145000

In [94]: # Replace any remaining empty strings with NaN
# (assigning the result back avoids relying on inplace=True on a selected column)
data['monthlysalary'] = data['monthlysalary'].replace('', np.nan)

# Convert the column to float
data['monthlysalary'] = data['monthlysalary'].astype(float)
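The cleaning steps above can also be sketched in plain Python on a handful of raw values, which makes it easier to see what the pattern [^\d.] removes. The sample values are taken from the monthlysalary output shown earlier (plus one empty string); the helper name clean_salary is an assumption, and None stands in for the NaN that pandas would use.

```python
import re

raw_salaries = ["500000", "£1500", "R20000", "$3000", "207,000", ""]

def clean_salary(value):
    """Strip everything except digits and '.', then convert to float."""
    digits = re.sub(r"[^\d.]", "", value)
    # An empty result means there was no number to extract
    return float(digits) if digits else None

cleaned = [clean_salary(value) for value in raw_salaries]
print(cleaned)  # [500000.0, 1500.0, 20000.0, 3000.0, 207000.0, None]
```

Note that this simple rule assumes at most one decimal point per value; text such as "1.2.3" would still fail at the float() step.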


In [95]: data.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 49 entries, 0 to 48
Data columns (total 3 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 name 40 non-null object
1 hoursworked 25 non-null float64
2 monthlysalary 23 non-null float64
dtypes: float64(2), object(1)
memory usage: 1.3+ KB

🎉 Great news! 🎉
We've successfully transformed the monthlysalary column into the desired numeric
format! 💪
Now it's all cleaned up and ready for further analysis. 🚀
Accessing Data in a DataFrame

There are several ways to access data from a Pandas DataFrame.

You can access a row by its index label using .loc[ ]

In [75]: # Get data for index value 2
data.loc[2]

Out[75]: name J. K.
hoursworked 1600.0
monthlysalary 5000.0
Name: 2, dtype: object

You can access a range of rows by specifying a slice of index labels

In [77]: # Get rows with index labels 0 through 2 (with .loc, the end label is included)
data.loc[0:2]

Out[77]:
   name        hoursworked  monthlysalary
0  Olusegun    3000.0       500000.0
1  Kwame Manu  1500.0       2300.0
2  J. K.       1600.0       5000.0


You can also select specific rows and columns from the DataFrame using
.iloc[ ], which accesses data by integer position

In [78]: # Get hoursworked and monthlysalary
# for the first two rows (positions 0 and 1) and column positions 1 and 2
# [row, column]
data.iloc[0:2, [1, 2]]

Out[78]:
   hoursworked  monthlysalary
0  3000.0       500000.0
1  1500.0       2300.0
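The key difference between the two accessors is that a .loc slice includes its end label, while an .iloc slice excludes its end position, just like list slicing. The sketch below shows this on a small made-up DataFrame; the column names and values are assumptions for illustration, not the class dataset.

```python
import pandas as pd

# A small hypothetical DataFrame with the default 0..3 index
df = pd.DataFrame({
    "name": ["A", "B", "C", "D"],
    "hours": [10, 20, 30, 40],
})

by_label = df.loc[0:2]       # labels 0, 1 and 2 (end label included)
by_position = df.iloc[0:2]   # positions 0 and 1 (end position excluded)

print(len(by_label), len(by_position))  # 3 2

# Selecting a single value: label-based versus position-based
print(df.loc[2, "hours"])  # 30
print(df.iloc[2, 1])       # 30
```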

Filtering Data
For example, you can filter employees who work more than 1500 hours:

In [80]: # Filter employees who work more than 1500 hours
data[data['hoursworked'] > 1500]

Out[80]:
    name                     hoursworked  monthlysalary
0   Olusegun                 3000.0       500000.0
2   J. K.                    1600.0       5000.0
4   Amos                     2000.0       500000.0
6   Kwandokuhle Shabalala    2400.0       20000.0
7   Michael Oduor Otieno     2080.0       30000.0
12  Ola                      3000.0       1000.0
18  Ummu                     12000.0      400000.0
19  Abraham Garpiya Galumje  10100.0      145000.0
23  Solomon Oladimeji        1600.0       100000.0
36  Nana                     852425.0     3451.0


You can achieve the same result using the .query() method:

In [81]: # Using the query method to filter
# employees who work more than 1500 hours
data.query('hoursworked > 1500')

Out[81]:
    name                     hoursworked  monthlysalary
0   Olusegun                      3000.0       500000.0
2   J. K.                         1600.0         5000.0
4   Amos                          2000.0       500000.0
6   Kwandokuhle Shabalala         2400.0        20000.0
7   Michael Oduor Otieno          2080.0        30000.0
12  Ola                           3000.0         1000.0
18  Ummu                         12000.0       400000.0
19  Abraham Garpiya Galumje      10100.0       145000.0
23  Solomon Oladimeji             1600.0       100000.0
36  Nana                        852425.0         3451.0
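
As a rough sketch of what the two filtering styles do, the example below checks that a boolean mask and .query() return the same rows, then combines two conditions. The DataFrame `df` and the 10000 salary threshold are assumptions made up for illustration.

```python
import pandas as pd

# Hypothetical sample based on a few rows from the notebook's data.
df = pd.DataFrame(
    {
        "name": ["Olusegun", "Kwame Manu", "J. K.", "Amos"],
        "hoursworked": [3000.0, 1500.0, 1600.0, 2000.0],
        "monthlysalary": [500000.0, 2300.0, 5000.0, 500000.0],
    }
)

# The comparison produces a boolean Series; indexing with it keeps the True rows.
mask = df["hoursworked"] > 1500
filtered_mask = df[mask]

# .query() evaluates the same condition written as a string.
filtered_query = df.query("hoursworked > 1500")
print(filtered_mask.equals(filtered_query))

# Combine conditions with & (and) or | (or); each condition needs its own parentheses.
busy_and_well_paid = df[(df["hoursworked"] > 1500) & (df["monthlysalary"] > 10000)]
print(busy_and_well_paid["name"].tolist())
```

Note that the row with exactly 1500 hours is excluded, because `>` is a strict comparison. Use `>=` to include it.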

Sorting Data

In [82]: # Sort the employees by monthlysalary, highest first
data.sort_values(by='monthlysalary', ascending=False)

Out[82]:
    name                     hoursworked  monthlysalary
30  John Lee                      1500.0       700000.0
0   Olusegun                      3000.0       500000.0
4   Amos                          2000.0       500000.0
18  Ummu                         12000.0       400000.0
11  Philip Ajibola Bankole         211.0       207000.0
17  Emmanuel Oke                     8.0       150000.0
15  Masud                         1152.0       150000.0
19  Abraham Garpiya Galumje      10100.0       145000.0
20  Esther                          72.0       140000.0
34  Sylvester                        8.0       120000.0
23  Solomon Oladimeji             1600.0       100000.0
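
Two details of sort_values() are worth a small sketch: it returns a new DataFrame rather than changing the original, and it accepts several sort keys to break ties. The DataFrame `df` below is a hypothetical sample, and using hoursworked as the tie-breaker is only an illustration.

```python
import pandas as pd

# Hypothetical sample based on a few rows from the notebook's data.
df = pd.DataFrame(
    {
        "name": ["Olusegun", "Amos", "Kwame Manu", "J. K."],
        "hoursworked": [3000.0, 2000.0, 1500.0, 1600.0],
        "monthlysalary": [500000.0, 500000.0, 2300.0, 5000.0],
    }
)

# Sort by salary (highest first); break ties by hours worked (lowest first).
df_sorted = df.sort_values(
    by=["monthlysalary", "hoursworked"],
    ascending=[False, True],
)

# The original index labels travel with the rows; reset them if a clean 0..n-1 index is wanted.
df_sorted = df_sorted.reset_index(drop=True)

print(df_sorted["name"].tolist())
print(df["name"].tolist())  # the original DataFrame is unchanged
```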


Handling Missing Data

In [83]: # Check for missing values
data.isnull()

Out[83]:
     name  hoursworked  monthlysalary
0   False        False          False
1   False        False          False
2   False        False          False
3   False         True           True
4   False        False          False
5   False        False          False
6   False        False          False
7   False        False          False
8   False         True           True
9   False        False          False
10   True         True           True

In [84]: # Fill missing values with a default value (e.g., 0)
data.fillna(0, inplace=True)
data

Out[84]:
    name                       hoursworked  monthlysalary
0   Olusegun                        3000.0       500000.0
1   Kwame Manu                      1500.0         2300.0
2   J. K.                           1600.0         5000.0
3   Adam Kunle Lawal                   0.0            0.0
4   Amos                            2000.0       500000.0
5   Inimfon                         1000.0         1500.0
6   Kwandokuhle Shabalala           2400.0        20000.0
7   Michael Oduor Otieno            2080.0        30000.0
8   Munkaila Tirmizhi Abubakr          0.0            0.0
9   DUNSIN                           960.0         3000.0
10  0                                  0.0            0.0


In [96]: # Drop rows with missing values
# (run this on data that still contains NaN; after fillna(0)
# there is nothing left to drop)
data.dropna(inplace=True)
data

Out[96]:
    name                    hoursworked  monthlysalary
0   Olusegun                     3000.0       500000.0
1   Kwame Manu                   1500.0         2300.0
2   J. K.                        1600.0         5000.0
4   Amos                         2000.0       500000.0
5   Inimfon                      1000.0         1500.0
6   Kwandokuhle Shabalala        2400.0        20000.0
7   Michael Oduor Otieno         2080.0        30000.0
9   DUNSIN                        960.0         3000.0
11  Philip Ajibola Bankole        211.0       207000.0
12  Ola                          3000.0         1000.0
13  Irene Adinorkie Okutu        1500.0         2500.0
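
Filling every column with 0 also puts a 0 in the name column, as row 10 of the fillna output shows. The sketch below shows a few more targeted options. It uses a hypothetical three-row DataFrame `df` and avoids inplace=True so that each result can be compared with the original.

```python
import numpy as np
import pandas as pd

# Hypothetical sample with missing values, loosely modelled on the notebook's data.
df = pd.DataFrame(
    {
        "name": ["Olusegun", "Adam Kunle Lawal", None],
        "hoursworked": [3000.0, np.nan, np.nan],
        "monthlysalary": [500000.0, np.nan, np.nan],
    }
)

# Count missing values per column instead of reading a table of True/False.
missing_per_column = df.isnull().sum()
print(missing_per_column)

# Fill only the numeric columns; a dict maps column name -> fill value.
filled = df.fillna({"hoursworked": 0, "monthlysalary": 0})

# Drop rows only when the name is missing.
named_only = df.dropna(subset=["name"])

# Drop rows that have a missing value in any column.
complete = df.dropna()

print(len(filled), len(named_only), len(complete))
```

Whether to fill or drop depends on the analysis: a filled 0 is included in later averages, while a dropped row is not.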

In [97]: # Get the mean of one or more columns. Other methods include
# .sum(), .min(), .max(), .median(), .var() (variance),
# .quantile(), and .std() (standard deviation).
data[["hoursworked", "monthlysalary"]].mean()

Out[97]: hoursworked 39067.869565
monthlysalary 138450.043478
dtype: float64
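
The mean of hoursworked is pulled upward by a few very large entries, such as the 852425.0 seen earlier. As a small sketch, .agg() can compute several statistics at once so the mean and median can be compared side by side. The four-row DataFrame `df` is a hypothetical sample.

```python
import pandas as pd

# Hypothetical sample including one extreme hoursworked value from the notebook's data.
df = pd.DataFrame(
    {
        "hoursworked": [1500.0, 1600.0, 2000.0, 852425.0],
        "monthlysalary": [2300.0, 5000.0, 500000.0, 3451.0],
    }
)

# .agg() takes a list of method names and returns one row per statistic.
summary = df[["hoursworked", "monthlysalary"]].agg(["mean", "median"])
print(summary)
```

Here the median of hoursworked stays close to the typical values, while the mean is dominated by the single outlier.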


In [98]: # Define a function to classify the salary level
def classify_salary(salary):
    if salary > 3500:
        return 'Senior'
    elif salary > 2500:
        return 'Mid-Level'
    else:
        return 'Junior'

# Apply the function to the 'monthlysalary' column
# to create the 'level' column
data['level'] = data['monthlysalary'].apply(classify_salary)
data

Out[98]:
    name                    hoursworked  monthlysalary  level
0   Olusegun                     3000.0       500000.0  Senior
1   Kwame Manu                   1500.0         2300.0  Junior
2   J. K.                        1600.0         5000.0  Senior
4   Amos                         2000.0       500000.0  Senior
5   Inimfon                      1000.0         1500.0  Junior
6   Kwandokuhle Shabalala        2400.0        20000.0  Senior
7   Michael Oduor Otieno         2080.0        30000.0  Senior
9   DUNSIN                        960.0         3000.0  Mid-Level
11  Philip Ajibola Bankole        211.0       207000.0  Senior
12  Ola                          3000.0         1000.0  Junior
13  Irene Adinorkie Okutu        1500.0         2500.0  Junior
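
For simple threshold rules like these, pd.cut() is a possible alternative to .apply(). The sketch below is not the notebook's approach. It assumes the same two thresholds (2500 and 3500) and checks that both methods agree on a hypothetical sample, including the boundary values.

```python
import numpy as np
import pandas as pd


def classify_salary(salary):
    """Same rule as the notebook: > 3500 Senior, > 2500 Mid-Level, else Junior."""
    if salary > 3500:
        return "Senior"
    elif salary > 2500:
        return "Mid-Level"
    else:
        return "Junior"


# Hypothetical salaries, including the boundary values 2500 and 3500.
df = pd.DataFrame({"monthlysalary": [500000.0, 2300.0, 5000.0, 3000.0, 2500.0, 3500.0]})

# Row-by-row classification with .apply().
df["level"] = df["monthlysalary"].apply(classify_salary)

# Bin-based classification: intervals are (-inf, 2500], (2500, 3500], (3500, inf).
df["level_cut"] = pd.cut(
    df["monthlysalary"],
    bins=[-np.inf, 2500, 3500, np.inf],
    labels=["Junior", "Mid-Level", "Senior"],
)

print(df)
print((df["level"] == df["level_cut"].astype(str)).all())
```

One difference to keep in mind: a missing salary (NaN) falls into the else branch and becomes 'Junior' with .apply(), whereas pd.cut() leaves it as NaN.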

In [99]: # Group by the 'level' column and count the employees in each group
print(data.groupby("level")['name'].count())

level
Junior 7
Mid-Level 2
Senior 14
Name: name, dtype: int64

In [100]: # Calculate the mean monthlysalary for each level
data.groupby("level")["monthlysalary"].mean().reset_index()

Out[100]:
    level      monthlysalary
0   Junior       1557.142857
1   Mid-Level    3225.500000
2   Senior     226214.285714
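
The count and the mean can also be computed in a single groupby call. The sketch below uses pandas named aggregation for this. The output column names `employees` and `mean_salary` are made up for illustration, and `df` is a hypothetical five-row sample.

```python
import pandas as pd

# Hypothetical sample with the 'level' column already assigned.
df = pd.DataFrame(
    {
        "name": ["Olusegun", "Kwame Manu", "J. K.", "Inimfon", "DUNSIN"],
        "monthlysalary": [500000.0, 2300.0, 5000.0, 1500.0, 3000.0],
        "level": ["Senior", "Junior", "Senior", "Junior", "Mid-Level"],
    }
)

# Named aggregation: new_column=(source_column, aggregation_function).
summary = (
    df.groupby("level")
    .agg(
        employees=("name", "count"),
        mean_salary=("monthlysalary", "mean"),
    )
    .reset_index()
)

print(summary)
```

Groups are sorted by their key by default, so the rows come out as Junior, Mid-Level, Senior.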
