
Python Data Science Toolbox

Copyright: © All Rights Reserved

PYTHON DATA SCIENCE TOOLBOX (PART 2)

Iterators vs. Iterables

The environment has been pre-loaded with the variables flash1 and flash2. Try printing out their values with print() and next() to figure out which is an iterable and which is an iterator.

Both flash1 and flash2 are iterators.
Both flash1 and flash2 are iterables.
flash1 is an iterable and flash2 is an iterator.

Iterating over iterables (1)

• Create a for loop to loop over flash and print the values in the list. Use person as the loop variable.
• Create an iterator for the list flash and assign the result to superhero.
• Print each of the items from superhero using next() 4 times.

# Create a list of strings: flash
flash = ['jay garrick', 'barry allen', 'wally west', 'bart allen']

# Print each list item in flash using a for loop
for person in flash:
    print(person)

# Create an iterator for flash: superhero
superhero = iter(flash)

# Print each item from the iterator
print(next(superhero))
print(next(superhero))
print(next(superhero))
print(next(superhero))

Iterating over iterables (2)

• Create an iterator object small_value over range(3) using the function iter().
• Using a for loop, iterate over range(3), printing the value for every iteration. Use num as the loop variable.
• Create an iterator object googol over range(10 ** 100).

# Create an iterator for range(3): small_value
small_value = iter(range(3))

# Print the values in small_value
print(next(small_value))
print(next(small_value))
print(next(small_value))

# Loop over range(3) and print the values
for num in range(3):
    print(num)

# Create an iterator for range(10 ** 100): googol
googol = iter(range(10 ** 100))

# Print the first 5 values from googol
print(next(googol))
print(next(googol))
print(next(googol))
print(next(googol))
print(next(googol))

Iterators as function arguments
• Create a range object that would produce the values from 10 to 20 using range(). Assign the result to values.
• Use the list() function to create a list of values from the range object values. Assign the result to values_list.
• Use the sum() function to get the sum of the values from 10 to 20 from the range object values. Assign the result to values_sum.

# Create a range object: values
values = range(10, 21)

# Print the range object
print(values)

# Create a list of integers: values_list
values_list = list(values)

# Print values_list
print(values_list)

# Get the sum of values: values_sum
values_sum = sum(values)

# Print values_sum
print(values_sum)

Using enumerate

• Create a list of tuples from mutants and assign the result to mutant_list. Make sure you generate the tuples using enumerate() and turn the result into a list using list().
• Complete the first for loop by unpacking the tuples generated by calling enumerate() on mutants. Use index1 for the index and value1 for the value when unpacking the tuple.
• Complete the second for loop similarly to the first, but this time change the starting index to 1 by passing it as an argument to the start parameter of enumerate(). Use index2 for the index and value2 for the value when unpacking the tuple.

# Create a list of strings: mutants
mutants = ['charles xavier',
           'bobby drake',
           'kurt wagner',
           'max eisenhardt',
           'kitty pryde']

# Create a list of tuples: mutant_list
mutant_list = list(enumerate(mutants))

# Print the list of tuples
print(mutant_list)

# Unpack and print the tuple pairs
for index1, value1 in enumerate(mutants):
    print(index1, value1)

# Change the start index
for index2, value2 in enumerate(mutants, start=1):
    print(index2, value2)

Using zip

• Using zip() with list(), create a list of tuples from the three lists mutants, aliases, and powers (in that order) and assign the result to mutant_data.
• Using zip(), create a zip object called mutant_zip from the three lists mutants, aliases, and powers.
• Complete the for loop by unpacking the zip object you created and printing the tuple values. Use value1, value2, value3 for the values from each of mutants, aliases, and powers, in that order.

# edited/added
aliases = ['prof x', 'iceman', 'nightcrawler', 'magneto', 'shadowcat']
powers = ['telepathy', 'thermokinesis', 'teleportation', 'magnetokinesis', 'intangibility']
# Create a list of tuples: mutant_data
mutant_data = list(zip(mutants, aliases, powers))

# Print the list of tuples
print(mutant_data)

# Create a zip object using the three lists: mutant_zip
mutant_zip = zip(mutants, aliases, powers)

# Print the zip object
print(mutant_zip)

# Unpack the zip object and print the tuple values
for value1, value2, value3 in mutant_zip:
    print(value1, value2, value3)

Using * and zip to 'unzip'

• Create a zip object by using zip() on mutants and powers, in that order. Assign the result to z1.
• Print the tuples in z1 by unpacking them into positional arguments using the * operator in a print() call.
• Because the previous print() call exhausted the elements in z1, recreate the zip object you defined earlier and assign the result again to z1.
• 'Unzip' the tuples in z1 by unpacking them into positional arguments using the * operator in a zip() call. Assign the results to result1 and result2, in that order.
• The last two print() calls print the result of comparing result1 to mutants and result2 to powers, to check whether the unpacked result1 and result2 are equivalent to mutants and powers, respectively. Note that zip(*z1) returns tuples, so comparing them directly with the original lists prints False even though the elements match; list(result1) == mutants evaluates to True.

# Create a zip object from mutants and powers: z1
z1 = zip(mutants, powers)

# Print the tuples in z1 by unpacking with *
print(*z1)

# Re-create a zip object from mutants and powers: z1
z1 = zip(mutants, powers)

# 'Unzip' the tuples in z1 by unpacking with * and zip(): result1, result2
result1, result2 = zip(*z1)

# Check if unpacked tuples are equivalent to original lists
print(result1 == mutants)
print(result2 == powers)

Processing large amounts of Twitter data

• Initialize an empty dictionary counts_dict for storing the results of processing the Twitter data.
• Iterate over the 'tweets.csv' file by using a for loop. Use the loop variable chunk and iterate over the call to pd.read_csv() with a chunksize of 10.
• In the inner loop, iterate over the column 'lang' in chunk by using a for loop. Use the loop variable entry.

# edited/added
import pandas as pd

# Initialize an empty dictionary: counts_dict
counts_dict = {}

# Iterate over the file chunk by chunk
for chunk in pd.read_csv('tweets.csv', chunksize=10):

    # Iterate over the column in DataFrame
    for entry in chunk['lang']:
        if entry in counts_dict.keys():
            counts_dict[entry] += 1
        else:
            counts_dict[entry] = 1

# Print the populated dictionary
print(counts_dict)

Extracting information for large amounts of Twitter data

• Define the function count_entries(), which has 3 parameters. The first parameter is csv_file for the filename, the second is c_size for the chunk size, and the last is colname for the column name.
• Iterate over the file csv_file by using a for loop. Use the loop variable chunk and iterate over the call to pd.read_csv(), passing c_size to chunksize.
• In the inner loop, iterate over the column given by colname in chunk by using a for loop. Use the loop variable entry.
• Call the count_entries() function by passing to it the filename 'tweets.csv', the size of chunks 10, and the name of the column to count, 'lang'. Assign the result of the call to the variable result_counts.

# Define count_entries()
def count_entries(csv_file, c_size, colname):
    """Return a dictionary with counts of
    occurrences as value for each key."""

    # Initialize an empty dictionary: counts_dict
    counts_dict = {}

    # Iterate over the file chunk by chunk
    for chunk in pd.read_csv(csv_file, chunksize=c_size):

        # Iterate over the column in DataFrame
        for entry in chunk[colname]:
            if entry in counts_dict.keys():
                counts_dict[entry] += 1
            else:
                counts_dict[entry] = 1

    # Return counts_dict
    return counts_dict

# Call count_entries(): result_counts
result_counts = count_entries('tweets.csv', 10, 'lang')

# Print result_counts
print(result_counts)

Write a basic list comprehension

The following list has been pre-loaded in the environment.

doctor = ['house', 'cuddy', 'chase', 'thirteen', 'wilson']

What would a list comprehension that produces a list of the first character of each string in doctor look like? Note that the list comprehension uses doc as the iterator variable. What will the output be?

The list comprehension is [for doc in doctor: doc[0]] and produces the list ['h', 'c', 'c', 't', 'w'].
The list comprehension is [doc[0] for doc in doctor] and produces the list ['h', 'c', 'c', 't', 'w'].
The list comprehension is [doc[0] in doctor] and produces the list ['h', 'c', 'c', 't', 'w'].
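For the question above, a minimal sketch can make the comprehension syntax concrete. It builds the same list with [doc[0] for doc in doctor] and with an equivalent explicit for loop, then uses the built-in compile() to check that the first answer option is not valid Python. The names first_chars, first_chars_loop and option_is_valid are illustrative only.

```python
doctor = ['house', 'cuddy', 'chase', 'thirteen', 'wilson']

# List comprehension: [output expression for iterator variable in iterable]
first_chars = [doc[0] for doc in doctor]

# The same result built with an explicit for loop
first_chars_loop = []
for doc in doctor:
    first_chars_loop.append(doc[0])

print(first_chars)                      # ['h', 'c', 'c', 't', 'w']
print(first_chars == first_chars_loop)  # True

# The first answer option puts the loop before the output expression,
# which Python rejects at compile time
try:
    compile("[for doc in doctor: doc[0]]", "<option>", "eval")
    option_is_valid = True
except SyntaxError:
    option_is_valid = False
print(option_is_valid)                  # False
```

The third option, [doc[0] in doctor], is valid syntax but does something different. It raises a NameError if doc is undefined. Otherwise it evaluates to a one-element list that holds a single True/False membership result.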
List comprehension over iterables

You know that list comprehensions can be built over iterables. Given the objects below, which of these can we build list comprehensions over?

doctor = ['house', 'cuddy', 'chase', 'thirteen', 'wilson']

range(50)

underwood = 'After all, we are nothing more or less than what we choose to reveal.'

jean = '24601'

flash = ['jay garrick', 'barry allen', 'wally west', 'bart allen']

valjean = 24601

You can build list comprehensions over all the objects except the string of number characters jean.
You can build list comprehensions over all the objects except the lists of strings doctor and flash.
You can build list comprehensions over all the objects except range(50).
You can build list comprehensions over all the objects except the integer object valjean.

Writing list comprehensions

• Using the range of numbers from 0 to 9 as your iterable and i as your iterator variable, write a list comprehension that produces a list of numbers consisting of the squared values of i.

# Create list comprehension: squares
squares = [i**2 for i in range(0, 10)]

Nested list comprehensions

• In the inner list comprehension - that is, the output expression of the nested list comprehension - create a list of values from 0 to 4 using range(). Use col as the iterator variable.
• In the iterable part of your nested list comprehension, use range() to count 5 rows - that is, create a list of values from 0 to 4. Use row as the iterator variable; note that you won't need this variable to create values in the list of lists.

# Create a 5 x 5 matrix using a list of lists: matrix
matrix = [[col for col in range(5)] for row in range(5)]

# Print the matrix
for row in matrix:
    print(row)

Using conditionals in comprehensions (1)

• Use member as the iterator variable in the list comprehension. For the conditional, use len() to evaluate the iterator variable. Note that you only want strings with 7 characters or more.

# Create a list of strings: fellowship
fellowship = ['frodo', 'samwise', 'merry', 'aragorn', 'legolas', 'boromir', 'gimli']

# Create list comprehension: new_fellowship
new_fellowship = [member for member in fellowship if len(member) >= 7]

# Print the new list
print(new_fellowship)

Using conditionals in comprehensions (2)

• In the output expression, keep the string as-is if the number of characters is >= 7, else replace it with an empty string - that is, '' or "".

# Create a list of strings: fellowship
fellowship = ['frodo', 'samwise', 'merry', 'aragorn', 'legolas', 'boromir', 'gimli']

# Create list comprehension: new_fellowship
new_fellowship = [member if len(member) >= 7 else '' for member in fellowship]

# Print the new list
print(new_fellowship)

Dict comprehensions

Create a dict comprehension where the key is a string in fellowship and the value is the length of the string. Remember to use the syntax <key>: <value> in the output expression part of the comprehension to create the members of the dictionary. Use member as the iterator variable.

# Create a list of strings: fellowship
fellowship = ['frodo', 'samwise', 'merry', 'aragorn', 'legolas', 'boromir', 'gimli']

# Create dict comprehension: new_fellowship
new_fellowship = {member: len(member) for member in fellowship}

# Print the new dictionary
print(new_fellowship)

List comprehensions vs. generators

To help you compare the two, the following code has been pre-loaded in the environment:

# List of strings
fellowship = ['frodo', 'samwise', 'merry', 'aragorn', 'legolas', 'boromir', 'gimli']

# List comprehension
fellow1 = [member for member in fellowship if len(member) >= 7]

# Generator expression
fellow2 = (member for member in fellowship if len(member) >= 7)

Try to play around with fellow1 and fellow2 by figuring out their types and printing out their values. Based on your observations and what you can recall from the video, select from the options below the best description for the difference between list comprehensions and generators.

List comprehensions and generators are not different at all; they are just different ways of writing the same thing.
A list comprehension produces a list as output, a generator produces a generator object.
A list comprehension produces a list as output that can be iterated over, a generator produces a generator object that can't be iterated over.

Write your own generator expressions

• Create a generator object that will produce values from 0 to 30. Assign the result to result and use num as the iterator variable in the generator expression.
• Print the first 5 values by using next() appropriately in print().
• Print the rest of the values by using a for loop to iterate over the generator object.

# Create generator object: result
result = (num for num in range(31))

# Print the first 5 values
print(next(result))
print(next(result))
print(next(result))
print(next(result))
print(next(result))

# Print the rest of the values
for value in result:
    print(value)

Changing the output in generator expressions

• Write a generator expression that will generate the lengths of each string in lannister. Use person as the iterator variable. Assign the result to lengths.
• Supply the correct iterable in the for loop for printing the values in the generator object.

# Create a list of strings: lannister
lannister = ['cersei', 'jaime', 'tywin', 'tyrion', 'joffrey']

# Create a generator object: lengths
lengths = (len(person) for person in lannister)

# Iterate over and print the values in lengths
for value in lengths:
    print(value)

Build a generator

• Complete the function header for the function get_lengths() that has a single parameter, input_list.
• In the for loop in the function definition, yield the length of the strings in input_list.
• Complete the iterable part of the for loop for printing the values generated by the get_lengths() generator function. Supply the call to get_lengths(), passing in the list lannister.

# Create a list of strings
lannister = ['cersei', 'jaime', 'tywin', 'tyrion', 'joffrey']

# Define generator function get_lengths
def get_lengths(input_list):
    """Generator function that yields the
    length of the strings in input_list."""

    # Yield the length of a string
    for person in input_list:
        yield len(person)

# Print the values generated by get_lengths()
for value in get_lengths(lannister):
    print(value)

List comprehensions for time-stamped data

• Extract the column 'created_at' from df and assign the result to tweet_time. Fun fact: the extracted column in tweet_time here is a Series data structure!
• Create a list comprehension that extracts the time from each row in tweet_time. Each row is a string that represents a timestamp, and you will access the 12th to 19th characters in the string to extract the time. Use entry as the iterator variable and assign the result to tweet_clock_time. Remember that Python uses 0-based indexing!
# edited/added
df = pd.read_csv('tweets.csv')

# Extract the created_at column from df: tweet_time
tweet_time = df['created_at']

# Extract the clock time: tweet_clock_time
tweet_clock_time = [entry[11:19] for entry in tweet_time]

# Print the extracted times
print(tweet_clock_time)

Conditional list comprehensions for time-stamped data

• Extract the column 'created_at' from df and assign the result to tweet_time.
• Create a list comprehension that extracts the time from each row in tweet_time. Each row is a string that represents a timestamp, and you will access the 12th to 19th characters in the string to extract the time. Use entry as the iterator variable and assign the result to tweet_clock_time. Additionally, add a conditional that checks whether entry[17:19] is equal to '19'.

# Extract the created_at column from df: tweet_time
tweet_time = df['created_at']

# Extract the clock time: tweet_clock_time
tweet_clock_time = [entry[11:19] for entry in tweet_time if entry[17:19] == '19']

# Print the extracted times
print(tweet_clock_time)

Dictionaries for data science

• Create a zip object by calling zip() and passing to it feature_names and row_vals. Assign the result to zipped_lists.
• Create a dictionary from the zipped_lists zip object by calling dict() with zipped_lists. Assign the resulting dictionary to rs_dict.

# edited/added
feature_names = ['CountryName', 'CountryCode', 'IndicatorName', 'IndicatorCode', 'Year', 'Value']
row_vals = ['Arab World', 'ARB', 'Adolescent fertility rate (births per 1,000 women ages 15-19)', 'SP.ADO.TFRT', '1960', '133.56090740552298']

# Zip lists: zipped_lists
zipped_lists = zip(feature_names, row_vals)

# Create a dictionary: rs_dict
rs_dict = dict(zipped_lists)

# Print the dictionary
print(rs_dict)

Writing a function to help you

• Define the function lists2dict() with two parameters: first is list1 and second is list2.
• Return the resulting dictionary rs_dict in lists2dict().
• Call the lists2dict() function with the arguments feature_names and row_vals. Assign the result of the function call to rs_fxn.

# Define lists2dict()
def lists2dict(list1, list2):
    """Return a dictionary where list1 provides
    the keys and list2 provides the values."""

    # Zip lists: zipped_lists
    zipped_lists = zip(list1, list2)

    # Create a dictionary: rs_dict
    rs_dict = dict(zipped_lists)

    # Return the dictionary
    return rs_dict

# Call lists2dict: rs_fxn
rs_fxn = lists2dict(feature_names, row_vals)

# Print rs_fxn
print(rs_fxn)

Using a list comprehension

• Inspect the contents of row_lists by printing the first two lists in row_lists.
• Create a list comprehension that generates a dictionary using lists2dict() for each sublist in row_lists. The keys are from the feature_names list and the values are the row entries in row_lists. Use sublist as your iterator variable and assign the resulting list of dictionaries to list_of_dicts.
• Look at the first two dictionaries in list_of_dicts by printing them out.

# edited/added
import csv
with open('row_lists.csv', 'r', newline='') as csvfile:
    reader = csv.reader(csvfile)
    row_lists = [row for row in reader]

# Print the first two lists in row_lists
print(row_lists[0])
print(row_lists[1])

# Turn list of lists into list of dicts: list_of_dicts
list_of_dicts = [lists2dict(feature_names, sublist) for sublist in row_lists]

# Print the first two dictionaries in list_of_dicts
print(list_of_dicts[0])
print(list_of_dicts[1])

Turning this all into a DataFrame

• To use the DataFrame() function you need, first import the pandas package with the alias pd.
• Create a DataFrame from the list of dictionaries in list_of_dicts by calling pd.DataFrame(). Assign the resulting DataFrame to df.
• Inspect the contents of df by printing the head of the DataFrame, which can be accessed by calling df.head().

# Import the pandas package
import pandas as pd

# Turn list of lists into list of dicts: list_of_dicts
list_of_dicts = [lists2dict(feature_names, sublist) for sublist in row_lists]

# Turn list of dicts into a DataFrame: df
df = pd.DataFrame(list_of_dicts)

# Print the head of the DataFrame
print(df.head())

Processing data in chunks (1)

• Use open() to bind the csv file 'world_dev_ind.csv' as file in the context manager.
• Complete the for loop so that it iterates 1000 times to perform the loop body and process only the first 1000 rows of data of the file.
# Open a connection to the file
with open('world_dev_ind.csv') as file:

    # Skip the column names
    file.readline()

    # Initialize an empty dictionary: counts_dict
    counts_dict = {}

    # Process only the first 1000 rows
    for j in range(0, 1000):

        # Split the current line into a list: line
        line = file.readline().split(',')

        # Get the value for the first column: first_col
        first_col = line[0]

        # If the column value is in the dict, increment its value
        if first_col in counts_dict.keys():
            counts_dict[first_col] += 1

        # Else, add to the dict and set value to 1
        else:
            counts_dict[first_col] = 1

# Print the resulting dictionary
print(counts_dict)

Writing a generator to load data in chunks (2)

• In the function read_large_file(), read a line from file_object by using the method readline(). Assign the result to data.
• In the function read_large_file(), yield the line read from the file data.
• In the context manager, create a generator object gen_file by calling your generator function read_large_file() and passing file to it.
• Print the first three lines produced by the generator object gen_file using next().

# Define read_large_file()
def read_large_file(file_object):
    """A generator function to read a large file lazily."""

    # Loop indefinitely until the end of the file
    while True:

        # Read a line from the file: data
        data = file_object.readline()

        # Break if this is the end of the file
        if not data:
            break

        # Yield the line of data
        yield data

# Open a connection to the file
with open('world_dev_ind.csv') as file:

    # Create a generator object for the file: gen_file
    gen_file = read_large_file(file)

    # Print the first three lines of the file
    print(next(gen_file))
    print(next(gen_file))
    print(next(gen_file))

Writing a generator to load data in chunks (3)

• Bind the file 'world_dev_ind.csv' to file in the context manager with open().
• Complete the for loop so that it iterates over the generator from the call to read_large_file() to process all the rows of the file.

# Initialize an empty dictionary: counts_dict
counts_dict = {}

# Open a connection to the file
with open('world_dev_ind.csv') as file:

    # Iterate over the generator from read_large_file()
    for line in read_large_file(file):

        row = line.split(',')
        first_col = row[0]

        if first_col in counts_dict.keys():
            counts_dict[first_col] += 1
        else:
            counts_dict[first_col] = 1

# Print
print(counts_dict)

Writing an iterator to load data in chunks (1)

• Use pd.read_csv() to read in 'ind_pop.csv' in chunks of size 10. Assign the result to df_reader.
• Print the first two chunks from df_reader.

# Import the pandas package
import pandas as pd

# Initialize reader object: df_reader
df_reader = pd.read_csv('ind_pop.csv', chunksize=10)

# Print two chunks
print(next(df_reader))
print(next(df_reader))

Writing an iterator to load data in chunks (2)

• Use pd.read_csv() to read in the file 'ind_pop_data.csv' in chunks of size 1000. Assign the result to urb_pop_reader.
• Get the first DataFrame chunk from the iterable urb_pop_reader and assign this to df_urb_pop.
• Select only the rows of df_urb_pop that have a 'CountryCode' of 'CEB'. To do this, compare whether df_urb_pop['CountryCode'] is equal to 'CEB' within the square brackets in df_urb_pop[____].
• Using zip(), zip together the 'Total Population' and 'Urban population (% of total)' columns of df_pop_ceb. Assign the resulting zip object to pops.

# Initialize reader object: urb_pop_reader
urb_pop_reader = pd.read_csv('ind_pop_data.csv', chunksize=1000)

# Get the first DataFrame chunk: df_urb_pop
df_urb_pop = next(urb_pop_reader)

# Check out the head of the DataFrame
print(df_urb_pop.head())

# Check out specific country: df_pop_ceb
df_pop_ceb = df_urb_pop[df_urb_pop['CountryCode'] == 'CEB']

# Zip DataFrame columns of interest: pops
pops = zip(df_pop_ceb['Total Population'],
           df_pop_ceb['Urban population (% of total)'])

# Turn zip object into list: pops_list
pops_list = list(pops)

# Print pops_list
print(pops_list)

Writing an iterator to load data in chunks (3)

• Write a list comprehension to generate a list of values from pops_list for the new column 'Total Urban Population'. The output expression should be the product of the first and second element in each tuple in pops_list. Because the 2nd element is a percentage, you also need to either multiply the result by 0.01 or divide it by 100. In addition, note that the column 'Total Urban Population' should only be able to take on integer values. To ensure this, make sure you cast the output expression to an integer with int().
• Create a scatter plot where the x-axis values come from the 'Year' column and the y-axis values come from the 'Total Urban Population' column.

# edited/added
import matplotlib.pyplot as plt

# Code from previous exercise
urb_pop_reader = pd.read_csv('ind_pop_data.csv', chunksize=1000)
df_urb_pop = next(urb_pop_reader)
df_pop_ceb = df_urb_pop[df_urb_pop['CountryCode'] == 'CEB']
pops = zip(df_pop_ceb['Total Population'],
           df_pop_ceb['Urban population (% of total)'])
pops_list = list(pops)

# Use list comprehension to create new DataFrame column 'Total Urban Population'
df_pop_ceb['Total Urban Population'] = [int(tup[0] * tup[1] * 0.01) for tup in pops_list]

# Plot urban population data
df_pop_ceb.plot(kind='scatter', x='Year', y='Total Urban Population')
plt.show()

Writing an iterator to load data in chunks (4)

• Initialize an empty DataFrame data using pd.DataFrame().
• In the for loop, iterate over urb_pop_reader to be able to process all the DataFrame chunks in the dataset.
• Concatenate data and df_pop_ceb by passing a list of the DataFrames to pd.concat().

# Initialize reader object: urb_pop_reader
urb_pop_reader = pd.read_csv('ind_pop_data.csv', chunksize=1000)

# Initialize empty DataFrame: data
data = pd.DataFrame()

# Iterate over each DataFrame chunk
for df_urb_pop in urb_pop_reader:

    # Check out specific country: df_pop_ceb
    df_pop_ceb = df_urb_pop[df_urb_pop['CountryCode'] == 'CEB']

    # Zip DataFrame columns of interest: pops
    pops = zip(df_pop_ceb['Total Population'],
               df_pop_ceb['Urban population (% of total)'])

    # Turn zip object into list: pops_list
    pops_list = list(pops)

    # Use list comprehension to create new DataFrame column 'Total Urban Population'
    df_pop_ceb['Total Urban Population'] = [int(tup[0] * tup[1] * 0.01) for tup in pops_list]

    # Concatenate DataFrame chunk to the end of data: data
    data = pd.concat([data, df_pop_ceb])

# Plot urban population data
data.plot(kind='scatter', x='Year', y='Total Urban Population')
plt.show()

Writing an iterator to load data in chunks (5)

• Define the function plot_pop() that has two arguments: first is filename for the file to process and second is country_code for the country to be processed in the dataset.
• Call plot_pop() to process the data for country code 'CEB' in the file 'ind_pop_data.csv'.
• Call plot_pop() to process the data for country code 'ARB' in the file 'ind_pop_data.csv'.

# Define plot_pop()
def plot_pop(filename, country_code):

    # Initialize reader object: urb_pop_reader
    urb_pop_reader = pd.read_csv(filename, chunksize=1000)

    # Initialize empty DataFrame: data
    data = pd.DataFrame()

    # Iterate over each DataFrame chunk
    for df_urb_pop in urb_pop_reader:

        # Check out specific country: df_pop_ceb
        df_pop_ceb = df_urb_pop[df_urb_pop['CountryCode'] == country_code]

        # Zip DataFrame columns of interest: pops
        pops = zip(df_pop_ceb['Total Population'],
                   df_pop_ceb['Urban population (% of total)'])

        # Turn zip object into list: pops_list
        pops_list = list(pops)

        # Use list comprehension to create new DataFrame column 'Total Urban Population'
        df_pop_ceb['Total Urban Population'] = [int(tup[0] * tup[1] * 0.01) for tup in pops_list]

        # Concatenate DataFrame chunk to the end of data: data
        data = pd.concat([data, df_pop_ceb])

    # Plot urban population data
    data.plot(kind='scatter', x='Year', y='Total Urban Population')
    plt.show()

# Set the filename: fn
fn = 'ind_pop_data.csv'

# Call plot_pop for country code 'CEB'
plot_pop(fn, 'CEB')

# Call plot_pop for country code 'ARB'
plot_pop(fn, 'ARB')
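The chunked pipeline in these exercises relies on pandas (pd.read_csv with chunksize) and matplotlib. Below is a rough, stdlib-only sketch of the same idea. It reads with csv.DictReader in place of pandas, chunks with itertools.islice, filters by country code, and applies the same int(total * pct * 0.01) formula. It does not plot anything. The helper names read_in_chunks and urban_pop_by_year are hypothetical. The three sample rows are made-up values for illustration, not data from ind_pop_data.csv.

```python
import csv
import io
from itertools import islice


def read_in_chunks(iterable, chunk_size):
    """Yield successive lists of at most chunk_size items from iterable."""
    iterator = iter(iterable)
    while True:
        chunk = list(islice(iterator, chunk_size))
        if not chunk:  # the underlying iterator is exhausted
            return
        yield chunk


def urban_pop_by_year(csv_file, country_code, chunk_size=1000):
    """Return {Year: total urban population} for one country code,
    processing the CSV rows chunk_size at a time."""
    result = {}
    reader = csv.DictReader(csv_file)
    for chunk in read_in_chunks(reader, chunk_size):
        for row in chunk:
            if row['CountryCode'] != country_code:
                continue
            total = float(row['Total Population'])
            pct = float(row['Urban population (% of total)'])
            # Same formula as the exercises: int(total * pct * 0.01)
            result[row['Year']] = int(total * pct * 0.01)
    return result


# Hypothetical sample rows (made-up values, for illustration only)
SAMPLE_CSV = (
    "CountryCode,Year,Total Population,Urban population (% of total)\n"
    "CEB,1960,1000,50.0\n"
    "ARB,1960,2000,25.0\n"
    "CEB,1961,1200,50.0\n"
)

print(urban_pop_by_year(io.StringIO(SAMPLE_CSV), 'CEB', chunk_size=2))
# {'1960': 500, '1961': 600}
```

The generator only holds chunk_size rows at a time, which is the same memory argument behind chunksize in pd.read_csv(). Passing an open file object such as open('ind_pop_data.csv', newline='') instead of the io.StringIO sample should work as long as the real file uses these column names. That is an assumption here, not something the exercises confirm.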
