DS100-1 WS 2.5 Enrico, DM

This document contains code examples for cleaning and manipulating data in Python. It includes examples of importing CSV files, concatenating files, melting a dataframe, extracting columns from another column, and merging files. The examples demonstrate common data cleaning tasks like handling missing data, transforming column types, and joining datasets.

Uploaded by Analyn Enrico

Worksheet 2.5
DS100-1
CLEANING DATA IN PYTHON
APPLIED DATA SCIENCE
Name: Enrico, Dionne Marc L.

Write the code in a Jupyter notebook as required by each problem. Copy both the code and its output as a screenshot and paste
them here.

1 Import literacy_birth_rate.csv and assign it to a DataFrame named data_1. Write code that explores this
DataFrame. List at least 4 problems associated with it.
Code and Output

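import pandas as pd
# Load the file into the DataFrame name the problem asks for
data_1 = pd.read_csv("literacy_birth_rate.csv")
# info() prints its summary itself and returns None, so it is not wrapped in print()
data_1.info()
print(data_1.head())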
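
The checks below are a minimal sketch of how problems in a DataFrame can be surfaced. The frame `sample` is a hypothetical stand-in with invented column names and values, not the contents of literacy_birth_rate.csv, so the same calls would need to be run on data_1 to find its actual problems.

```python
import pandas as pd

# Hypothetical stand-in data; column names and values are invented for illustration.
sample = pd.DataFrame({
    "Country": ["A", "B", "B", None],
    "female literacy": ["90.5", "76.8", "76.8", "50.9"],  # numbers stored as text
    "fertility": [1.7, 2.6, 2.6, None],
})

# Missing values per column
missing_per_column = sample.isna().sum()
# Rows that repeat an earlier row exactly
duplicate_rows = int(sample.duplicated().sum())
# Columns that are not stored as numbers
non_numeric = [c for c in sample.columns
               if not pd.api.types.is_numeric_dtype(sample[c])]
# Column names containing spaces, which are awkward for attribute access
columns_with_spaces = [c for c in sample.columns if " " in c]

print(missing_per_column)
print("duplicate rows:", duplicate_rows)
print("non-numeric columns:", non_numeric)
print("column names with spaces:", columns_with_spaces)
```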

2 Import the following files: uber_apr.csv, uber_may.csv, uber_jun.csv. Concatenate these files into a single DataFrame,
uber. Print the first 6 lines of the resulting DataFrame. Ensure that the indexes are in order.
Code and Output

import pandas as pd
# Read in the csv files using the read_csv function
apr_df = pd.read_csv("uber_apr.csv")
may_df = pd.read_csv("uber_may.csv")
jun_df = pd.read_csv("uber_jun.csv")

# ignore_index=True renumbers the rows 0..n-1 so the index is in order
uber = pd.concat([apr_df, may_df, jun_df], ignore_index=True)

print(uber.head(6))
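
A small sketch of why the index matters when concatenating. The two frames here are hypothetical stand-ins for the monthly files (the `pickups` column is invented): without `ignore_index=True` each piece keeps its own row labels, so labels repeat.

```python
import pandas as pd

# Hypothetical monthly frames; the column name and values are invented.
apr = pd.DataFrame({"pickups": [10, 12]})
may = pd.DataFrame({"pickups": [7, 9]})

# Default behaviour: each frame keeps its original index labels
stacked = pd.concat([apr, may])
# ignore_index=True discards the old labels and numbers the rows afresh
renumbered = pd.concat([apr, may], ignore_index=True)

print(stacked.index.tolist())     # [0, 1, 0, 1]
print(renumbered.index.tolist())  # [0, 1, 2, 3]
```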

3 Import tuberculosis.csv. Print the first five lines. Melt the DataFrame, keeping the country and year columns fixed.
Print the last five lines of the melted DataFrame.
Code and Output

import pandas as pd
tb = pd.read_csv("tuberculosis.csv")
print(tb.head())
tb_melt = pd.melt(tb, id_vars=['country', 'year'])
print(tb_melt.tail())
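
To show what melting does to the shape of the data, here is a sketch on a tiny hypothetical frame (the values are invented and the layout is only assumed to resemble tuberculosis.csv). Every column not listed in `id_vars` is stacked into a `variable`/`value` pair.

```python
import pandas as pd

# Hypothetical wide-format data; values are invented for illustration.
wide = pd.DataFrame({
    "country": ["AD", "AE"],
    "year": [2000, 2000],
    "m014": [0, 2],
    "f014": [1, 3],
})

# country and year stay fixed; m014 and f014 become rows
long = pd.melt(wide, id_vars=["country", "year"])

print(long.columns.tolist())  # ['country', 'year', 'variable', 'value']
print(long.shape)             # (4, 4)
print(long)
```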

4 Use the melted DataFrame from the previous problem. Create (and populate) a gender and an age column from the variable
column. Print the first five lines of the resulting DataFrame. Convert the age column to a numeric data type. Hint: use
pd.to_numeric, with the errors parameter equal to 'coerce'. Show evidence that this column has indeed been
converted to a numeric type.
Code and Output

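import pandas as pd
tb = pd.read_csv("tuberculosis.csv")
print(tb.head())
tb_melt = pd.melt(tb, id_vars=['country', 'year'])
print(tb_melt.tail())
# The first character of the variable code is the gender; the rest is the age group
tb_melt['gender'] = tb_melt.variable.str[0]
tb_melt['age'] = tb_melt.variable.str[1:]
print(tb_melt.head())
# Values that cannot be parsed as numbers become NaN
tb_melt['age'] = pd.to_numeric(tb_melt['age'], errors='coerce')
# Evidence of the conversion: report the dtype of the age column
print("The age column is now of type", tb_melt['age'].dtype)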
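
The following sketch illustrates what `errors='coerce'` does. The three codes are hypothetical examples (in particular `mu` is an invented code with no numeric part): text that parses becomes a number, and anything else becomes NaN instead of raising an error.

```python
import pandas as pd

# Hypothetical variable codes; "mu" is invented to show a non-numeric remainder.
codes = pd.DataFrame({"variable": ["m014", "f1524", "mu"]})

# First character is the gender, the remainder is the age group
codes["gender"] = codes["variable"].str[0]
# Unparseable remainders (here "u") are turned into NaN
codes["age"] = pd.to_numeric(codes["variable"].str[1:], errors="coerce")

print(codes)
print(pd.api.types.is_numeric_dtype(codes["age"]))  # True
```

Note that leading zeros are dropped in the conversion ("014" becomes 14), so the age group boundaries are no longer recoverable from the numeric column alone.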

5 Merge the files site.csv and visited.csv into a single DataFrame. Join on the name column of site and the site
column of visited. Make sure that the index labels are in order. Print the resulting DataFrame.
Code and Output

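import pandas as pd
# Read the two csv files
site = pd.read_csv('site.csv')
visited = pd.read_csv('visited.csv')
# Join rows where site.name equals visited.site; merge returns a fresh 0..n-1 index
df = pd.merge(site, visited, left_on='name', right_on='site')
print(df)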
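
A merge differs from a concatenation: it matches rows on key columns instead of stacking the files on top of each other. The sketch below uses two small hypothetical frames (only the key column names `name` and `site` come from the problem statement; all other columns and values are invented).

```python
import pandas as pd

# Hypothetical stand-ins for site.csv and visited.csv; values are invented.
site = pd.DataFrame({
    "name": ["DR-1", "DR-3", "MSK-4"],
    "lat": [-49.85, -47.15, -48.87],
})
visited = pd.DataFrame({
    "ident": [619, 622, 734, 837],
    "site": ["DR-1", "DR-1", "DR-3", "MSK-4"],
})

# Match each visit to its site record: site.name == visited.site
merged = pd.merge(site, visited, left_on="name", right_on="site")

print(merged)
print(merged.index.tolist())  # [0, 1, 2, 3]
```

Because DR-1 appears twice in `visited`, its site record is repeated once per visit, which is the one-to-many behaviour a plain `pd.concat` would not give.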
