
Python Data Science Toolbox

Uploaded by

Anh Thư Trần Võ

PYTHON DATA SCIENCE TOOLBOX (PART 2)

Iterators vs. Iterables

The environment has been pre-loaded with the variables flash1 and flash2.
Try printing out their values with print() and next() to figure out which is
an iterable and which is an iterator.

Both flash1 and flash2 are iterators.

Both flash1 and flash2 are iterables.

flash1 is an iterable and flash2 is an iterator.

Iterating over iterables (1)

• Create a for loop to loop over flash and print the values in the list.
  Use person as the loop variable.
• Create an iterator for the list flash and assign the result to superhero.
• Print each of the items from superhero using next() 4 times.

# Create a list of strings: flash
flash = ['jay garrick', 'barry allen', 'wally west', 'bart allen']

# Print each list item in flash using a for loop
for person in flash:
    print(person)

# Create an iterator for flash: superhero
superhero = iter(flash)

# Print each item from the iterator
print(next(superhero))
print(next(superhero))
print(next(superhero))
print(next(superhero))

Iterating over iterables (2)

• Create an iterator object small_value over range(3) using the
  function iter().
• Using a for loop, iterate over range(3), printing the value for every
  iteration. Use num as the loop variable.
• Create an iterator object googol over range(10 ** 100).

# Create an iterator for range(3): small_value
small_value = iter(range(3))

# Print the values in small_value
print(next(small_value))
print(next(small_value))
print(next(small_value))

# Loop over range(3) and print the values
for num in range(3):
    print(num)

# Create an iterator for range(10 ** 100): googol
googol = iter(range(10 ** 100))

# Print the first 5 values from googol
print(next(googol))
print(next(googol))
print(next(googol))
print(next(googol))
print(next(googol))

Iterators as function arguments

• Create a range object that would produce the values from 10 to 20
  using range(). Assign the result to values.
• Use the list() function to create a list of values from the range
  object values. Assign the result to values_list.
• Use the sum() function to get the sum of the values from 10 to 20
  from the range object values. Assign the result to values_sum.

# Create a range object: values
values = range(10, 21)

# Print the range object
print(values)

# Create a list of integers: values_list
values_list = list(values)

# Print values_list
print(values_list)

# Get the sum of values: values_sum
values_sum = sum(values)

# Print values_sum
print(values_sum)

Using enumerate

• Create a list of tuples from mutants and assign the result
  to mutant_list. Make sure you generate the tuples
  using enumerate() and turn the result from it into a list using list().
• Complete the first for loop by unpacking the tuples generated by
  calling enumerate() on mutants. Use index1 for the index
  and value1 for the value when unpacking the tuple.
• Complete the second for loop similarly as with the first, but this time
  change the starting index to start from 1 by passing it in as an
  argument to the start parameter of enumerate(). Use index2 for the
  index and value2 for the value when unpacking the tuple.

# Create a list of strings: mutants
mutants = ['charles xavier',
           'bobby drake',
           'kurt wagner',
           'max eisenhardt',
           'kitty pryde']

# Create a list of tuples: mutant_list
mutant_list = list(enumerate(mutants))

# Print the list of tuples
print(mutant_list)

# Unpack and print the tuple pairs
for index1, value1 in enumerate(mutants):
    print(index1, value1)

# Change the start index
for index2, value2 in enumerate(mutants, start=1):
    print(index2, value2)

Using zip

• Using zip() with list(), create a list of tuples from the three
  lists mutants, aliases, and powers (in that order) and assign the result
  to mutant_data.
• Using zip(), create a zip object called mutant_zip from the three
  lists mutants, aliases, and powers.
• Complete the for loop by unpacking the zip object you created and
  printing the tuple values. Use value1, value2, value3 for the values
  from each of mutants, aliases, and powers, in that order.

# edited/added
aliases = ['prof x', 'iceman', 'nightcrawler', 'magneto', 'shadowcat']
powers = ['telepathy', 'thermokinesis', 'teleportation', 'magnetokinesis', 'intangibility']
# Create a list of tuples: mutant_data
mutant_data = list(zip(mutants, aliases, powers))

# Print the list of tuples
print(mutant_data)

# Create a zip object using the three lists: mutant_zip
mutant_zip = zip(mutants, aliases, powers)

# Print the zip object
print(mutant_zip)

# Unpack the zip object and print the tuple values
for value1, value2, value3 in mutant_zip:
    print(value1, value2, value3)

Using * and zip to ‘unzip’

• Create a zip object by using zip() on mutants and powers, in that order.
  Assign the result to z1.
• Print the tuples in z1 by unpacking them into positional arguments
  using the * operator in a print() call.
• Because the previous print() call would have exhausted the elements
  in z1, recreate the zip object you defined earlier and assign the result
  again to z1.
• ‘Unzip’ the tuples in z1 by unpacking them into positional arguments
  using the * operator in a zip() call. Assign the results
  to result1 and result2, in that order.
• The last print() statements print the output of
  comparing result1 to mutants and result2 to powers. Click Submit
  Answer to see if the unpacked result1 and result2 are equivalent
  to mutants and powers, respectively.

# Create a zip object from mutants and powers: z1
z1 = zip(mutants, powers)

# Print the tuples in z1 by unpacking with *
print(*z1)

# Re-create a zip object from mutants and powers: z1
z1 = zip(mutants, powers)

# 'Unzip' the tuples in z1 by unpacking with * and zip(): result1, result2
result1, result2 = zip(*z1)

# Check if unpacked tuples are equivalent to original tuples
# (result1 and result2 are tuples, so these print True only if mutants and
# powers are tuples as well; for lists, compare against tuple(mutants))
print(result1 == mutants)
print(result2 == powers)

Processing large amounts of Twitter data

• Initialize an empty dictionary counts_dict for storing the results of
  processing the Twitter data.
• Iterate over the 'tweets.csv' file by using a for loop. Use the loop
  variable chunk and iterate over the call to pd.read_csv() with
  a chunksize of 10.
• In the inner loop, iterate over the column 'lang' in chunk by using
  a for loop. Use the loop variable entry.

# edited/added
import pandas as pd

# Initialize an empty dictionary: counts_dict
counts_dict = {}

# Iterate over the file chunk by chunk
for chunk in pd.read_csv('tweets.csv', chunksize=10):

    # Iterate over the column in DataFrame
    for entry in chunk['lang']:
        if entry in counts_dict.keys():
            counts_dict[entry] += 1
        else:
            counts_dict[entry] = 1

# Print the populated dictionary
print(counts_dict)

Extracting information for large amounts of Twitter data

• Define the function count_entries(), which has 3 parameters. The first
  parameter is csv_file for the filename, the second is c_size for the
  chunk size, and the last is colname for the column name.
• Iterate over the file in csv_file by using a for loop. Use the loop
  variable chunk and iterate over the call to pd.read_csv(),
  passing c_size to chunksize.
• In the inner loop, iterate over the column given
  by colname in chunk by using a for loop. Use the loop variable entry.
• Call the count_entries() function by passing to it the
  filename 'tweets.csv', the size of chunks 10, and the name of the
  column to count, 'lang'. Assign the result of the call to the
  variable result_counts.

# Define count_entries()
def count_entries(csv_file, c_size, colname):
    """Return a dictionary with counts of
    occurrences as value for each key."""

    # Initialize an empty dictionary: counts_dict
    counts_dict = {}

    # Iterate over the file chunk by chunk
    for chunk in pd.read_csv(csv_file, chunksize=c_size):

        # Iterate over the column in DataFrame
        for entry in chunk[colname]:
            if entry in counts_dict.keys():
                counts_dict[entry] += 1
            else:
                counts_dict[entry] = 1

    # Return counts_dict
    return counts_dict

# Call count_entries(): result_counts
result_counts = count_entries('tweets.csv', 10, 'lang')

# Print result_counts
print(result_counts)

Write a basic list comprehension

The following list has been pre-loaded in the environment.

doctor = ['house', 'cuddy', 'chase', 'thirteen', 'wilson']

What would a list comprehension that produces a list of the first character of
each string in doctor look like? Note that the list comprehension uses doc as
the iterator variable. What will the output be?

The list comprehension is [for doc in doctor: doc[0]] and produces the
list ['h', 'c', 'c', 't', 'w'].

The list comprehension is [doc[0] for doc in doctor] and produces the
list ['h', 'c', 'c', 't', 'w'].

The list comprehension is [doc[0] in doctor] and produces the list ['h', 'c',
'c', 't', 'w'].
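
As a quick check of the syntax asked about in the question above, here is a minimal sketch that builds the list once with a comprehension and once with an equivalent for loop. The names first_chars and loop_result are illustrative only and are not part of the exercise.

```python
# The list from the exercise
doctor = ['house', 'cuddy', 'chase', 'thirteen', 'wilson']

# Output expression first, then the for clause
first_chars = [doc[0] for doc in doctor]

# The same result written as an explicit loop, for comparison
loop_result = []
for doc in doctor:
    loop_result.append(doc[0])

print(first_chars)
print(first_chars == loop_result)
```

The form [for doc in doctor: doc[0]] is not valid Python, and [doc[0] in doctor] only works if a variable doc already exists, in which case it yields a one-element list holding a boolean.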
List comprehension over iterables

You know that list comprehensions can be built over iterables. Given the
objects below, which of these can we build list comprehensions over?

doctor = ['house', 'cuddy', 'chase', 'thirteen', 'wilson']

range(50)

underwood = 'After all, we are nothing more or less than what we choose to reveal.'

jean = '24601'

flash = ['jay garrick', 'barry allen', 'wally west', 'bart allen']

valjean = 24601

You can build list comprehensions over all the objects except the string of
number characters jean.

You can build list comprehensions over all the objects except the string
lists doctor and flash.

You can build list comprehensions over all the objects except range(50).

You can build list comprehensions over all the objects except the integer
object valjean.

Writing list comprehensions

• Using the range of numbers from 0 to 9 as your iterable and i as your
  iterator variable, write a list comprehension that produces a list of
  numbers consisting of the squared values of i.

# Create list comprehension: squares
squares = [i**2 for i in range(0, 10)]

Nested list comprehensions

• In the inner list comprehension - that is, the output expression of the
  nested list comprehension - create a list of values
  from 0 to 4 using range(). Use col as the iterator variable.
• In the iterable part of your nested list comprehension, use range() to
  count 5 rows - that is, create a list of values from 0 to 4. Use row as
  the iterator variable; note that you won’t be needing this variable to
  create values in the list of lists.

# Create a 5 x 5 matrix using a list of lists: matrix
matrix = [[col for col in range(5)] for row in range(5)]

# Print the matrix
for row in matrix:
    print(row)

Using conditionals in comprehensions (1)

• Use member as the iterator variable in the list comprehension. For the
  conditional, use len() to evaluate the iterator variable. Note that you
  only want strings with 7 characters or more.

# Create a list of strings: fellowship
fellowship = ['frodo', 'samwise', 'merry', 'aragorn', 'legolas', 'boromir', 'gimli']

# Create list comprehension: new_fellowship
new_fellowship = [member for member in fellowship if len(member) >= 7]
# Print the new list
print(new_fellowship)

Using conditionals in comprehensions (2)

• In the output expression, keep the string as-is if the number of
  characters is >= 7, else replace it with an empty string - that is, '' or "".

# Create a list of strings: fellowship
fellowship = ['frodo', 'samwise', 'merry', 'aragorn', 'legolas', 'boromir', 'gimli']

# Create list comprehension: new_fellowship
new_fellowship = [member if len(member) >= 7 else '' for member in fellowship]

# Print the new list
print(new_fellowship)

Dict comprehensions

Create a dict comprehension where the key is a string in fellowship and the
value is the length of the string. Remember to use the syntax <key> :
<value> in the output expression part of the comprehension to create the
members of the dictionary. Use member as the iterator variable.

# Create a list of strings: fellowship
fellowship = ['frodo', 'samwise', 'merry', 'aragorn', 'legolas', 'boromir', 'gimli']

# Create dict comprehension: new_fellowship
new_fellowship = {member: len(member) for member in fellowship}

# Print the new dictionary
print(new_fellowship)

List comprehensions vs. generators

To help with that task, the following code has been pre-loaded in the
environment:

# List of strings
fellowship = ['frodo', 'samwise', 'merry', 'aragorn', 'legolas', 'boromir', 'gimli']

# List comprehension
fellow1 = [member for member in fellowship if len(member) >= 7]

# Generator expression
fellow2 = (member for member in fellowship if len(member) >= 7)

Try to play around with fellow1 and fellow2 by figuring out their types and
printing out their values. Based on your observations and what you can recall
from the video, select from the options below the best description for the
difference between list comprehensions and generators.

List comprehensions and generators are not different at all; they are just
different ways of writing the same thing.

A list comprehension produces a list as output, a generator produces a
generator object.

A list comprehension produces a list as output that can be iterated over, a
generator produces a generator object that can’t be iterated over.

Write your own generator expressions

• Create a generator object that will produce values from 0 to 30.
  Assign the result to result and use num as the iterator variable in the
  generator expression.
• Print the first 5 values by using next() appropriately in print().
• Print the rest of the values by using a for loop to iterate over the
  generator object.

# Create generator object: result
result = (num for num in range(31))

# Print the first 5 values
print(next(result))
print(next(result))
print(next(result))
print(next(result))
print(next(result))

# Print the rest of the values
for value in result:
    print(value)

Changing the output in generator expressions

• Write a generator expression that will generate the lengths of each
  string in lannister. Use person as the iterator variable. Assign the
  result to lengths.
• Supply the correct iterable in the for loop for printing the values in the
  generator object.

# Create a list of strings: lannister
lannister = ['cersei', 'jaime', 'tywin', 'tyrion', 'joffrey']

# Create a generator object: lengths
lengths = (len(person) for person in lannister)

# Iterate over and print the values in lengths
for value in lengths:
    print(value)

Build a generator

• Complete the function header for the function get_lengths() that has a
  single parameter, input_list.
• In the for loop in the function definition, yield the length of the strings
  in input_list.
• Complete the iterable part of the for loop for printing the values
  generated by the get_lengths() generator function. Supply the call
  to get_lengths(), passing in the list lannister.

# Create a list of strings
lannister = ['cersei', 'jaime', 'tywin', 'tyrion', 'joffrey']

# Define generator function get_lengths
def get_lengths(input_list):
    """Generator function that yields the
    length of the strings in input_list."""

    # Yield the length of a string
    for person in input_list:
        yield len(person)

# Print the values generated by get_lengths()
for value in get_lengths(lannister):
    print(value)

List comprehensions for time-stamped data

• Extract the column 'created_at' from df and assign the result
  to tweet_time. Fun fact: the extracted column in tweet_time here is a
  Series data structure!
• Create a list comprehension that extracts the time from each row
  in tweet_time. Each row is a string that represents a timestamp, and
  you will access the 12th to 19th characters in the string to extract the
  time. Use entry as the iterator variable and assign the result
  to tweet_clock_time. Remember that Python uses 0-based indexing!
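
The slicing described in the second bullet can be tried on a couple of hand-written timestamps before touching the real data. The strings below are made up for illustration and assume that the 'created_at' values look like 'Tue Mar 29 23:40:17 +0000 2016'; the actual column may be formatted differently. The names sample_times and clock_times are likewise hypothetical.

```python
# Made-up timestamps in an assumed Twitter-style format (illustration only)
sample_times = [
    'Tue Mar 29 23:40:17 +0000 2016',
    'Tue Mar 29 23:40:19 +0000 2016',
]

# The 12th to 19th characters are indices 11 through 18,
# so the slice is [11:19] (the end index is exclusive)
clock_times = [entry[11:19] for entry in sample_times]

print(clock_times)
```

Because the end index of a slice is excluded, [11:19] returns eight characters, which is exactly the width of an HH:MM:SS clock time.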
# edited/added
df = pd.read_csv('tweets.csv')

# Extract the created_at column from df: tweet_time
tweet_time = df['created_at']

# Extract the clock time: tweet_clock_time
tweet_clock_time = [entry[11:19] for entry in tweet_time]

# Print the extracted times
print(tweet_clock_time)

Conditional list comprehensions for time-stamped data

• Extract the column 'created_at' from df and assign the result
  to tweet_time.
• Create a list comprehension that extracts the time from each row
  in tweet_time. Each row is a string that represents a timestamp, and
  you will access the 12th to 19th characters in the string to extract the
  time. Use entry as the iterator variable and assign the result
  to tweet_clock_time. Additionally, add a conditional expression that
  checks whether entry[17:19] is equal to '19'.

# Extract the created_at column from df: tweet_time
tweet_time = df['created_at']

# Extract the clock time: tweet_clock_time
tweet_clock_time = [entry[11:19] for entry in tweet_time if entry[17:19] == '19']

# Print the extracted times
print(tweet_clock_time)

Dictionaries for data science

• Create a zip object by calling zip() and passing to
  it feature_names and row_vals. Assign the result to zipped_lists.
• Create a dictionary from the zipped_lists zip object by
  calling dict() with zipped_lists. Assign the resulting dictionary
  to rs_dict.

# edited/added
feature_names = ['CountryName', 'CountryCode', 'IndicatorName', 'IndicatorCode', 'Year', 'Value']
row_vals = ['Arab World', 'ARB', 'Adolescent fertility rate (births per 1,000 women ages 15-19)', 'SP.ADO.TFRT', '1960', '133.56090740552298']

# Zip lists: zipped_lists
zipped_lists = zip(feature_names, row_vals)

# Create a dictionary: rs_dict
rs_dict = dict(zipped_lists)

# Print the dictionary
print(rs_dict)

Writing a function to help you

• Define the function lists2dict() with two parameters: first is list1 and
  second is list2.
• Return the resulting dictionary rs_dict in lists2dict().
• Call the lists2dict() function with the
  arguments feature_names and row_vals. Assign the result of the
  function call to rs_fxn.

# Define lists2dict()
def lists2dict(list1, list2):
    """Return a dictionary where list1 provides
    the keys and list2 provides the values."""

    # Zip lists: zipped_lists
    zipped_lists = zip(list1, list2)

    # Create a dictionary: rs_dict
    rs_dict = dict(zipped_lists)

    # Return the dictionary
    return rs_dict

# Call lists2dict: rs_fxn
rs_fxn = lists2dict(feature_names, row_vals)

# Print rs_fxn
print(rs_fxn)

Using a list comprehension

• Inspect the contents of row_lists by printing the first two lists
  in row_lists.
• Create a list comprehension that generates a dictionary
  using lists2dict() for each sublist in row_lists. The keys are from
  the feature_names list and the values are the row entries in row_lists.
  Use sublist as your iterator variable and assign the resulting list of
  dictionaries to list_of_dicts.
• Look at the first two dictionaries in list_of_dicts by printing them out.

# edited/added
import csv
with open('row_lists.csv', 'r', newline='') as csvfile:
    reader = csv.reader(csvfile)
    row_lists = [row for row in reader]

# Print the first two lists in row_lists
print(row_lists[0])
print(row_lists[1])

# Turn list of lists into list of dicts: list_of_dicts
list_of_dicts = [lists2dict(feature_names, sublist) for sublist in row_lists]

# Print the first two dictionaries in list_of_dicts
print(list_of_dicts[0])
print(list_of_dicts[1])

Turning this all into a DataFrame

• To use the DataFrame() function you need, first import the pandas
  package with the alias pd.
• Create a DataFrame from the list of dictionaries in list_of_dicts by
  calling pd.DataFrame(). Assign the resulting DataFrame to df.
• Inspect the contents of df by printing the head of the DataFrame. The
  head of the DataFrame df can be accessed by calling df.head().

# Import the pandas package
import pandas as pd

# Turn list of lists into list of dicts: list_of_dicts
list_of_dicts = [lists2dict(feature_names, sublist) for sublist in row_lists]

# Turn list of dicts into a DataFrame: df
df = pd.DataFrame(list_of_dicts)

# Print the head of the DataFrame
print(df.head())

Processing data in chunks (1)

• Use open() to bind the csv file 'world_dev_ind.csv' as file in the
  context manager.
• Complete the for loop so that it iterates 1000 times to perform the
  loop body and process only the first 1000 rows of data of the file.
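
The idea behind these two steps, reading a fixed number of rows with readline() and counting first-column values, can be sketched on a tiny in-memory file. Everything in this block is an assumption made for illustration: io.StringIO stands in for 'world_dev_ind.csv', the rows are invented, the loop reads 3 rows instead of 1000, and dict.get() is swapped in for the if/else membership test.

```python
import io

# Hypothetical stand-in for 'world_dev_ind.csv'; the rows are invented
sample = io.StringIO(
    "CountryName,CountryCode,Year\n"
    "Arab World,ARB,1960\n"
    "Arab World,ARB,1961\n"
    "Caribbean small states,CSS,1960\n"
    "Euro area,EMU,1960\n"
)

# Skip the column names
sample.readline()

counts = {}

# Process only the first 3 data rows (the exercise uses 1000)
for _ in range(3):
    line = sample.readline().split(',')
    first_col = line[0]
    # dict.get() returns 0 for a key that has not been seen yet
    counts[first_col] = counts.get(first_col, 0) + 1

print(counts)
```

The fourth data row is never read, which is the point of bounding the loop: only as many lines as requested are pulled from the file.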
# Open a connection to the file
with open('world_dev_ind.csv') as file:

    # Skip the column names
    file.readline()

    # Initialize an empty dictionary: counts_dict
    counts_dict = {}

    # Process only the first 1000 rows
    for j in range(0, 1000):

        # Split the current line into a list: line
        line = file.readline().split(',')

        # Get the value for the first column: first_col
        first_col = line[0]

        # If the column value is in the dict, increment its value
        if first_col in counts_dict.keys():
            counts_dict[first_col] += 1

        # Else, add to the dict and set value to 1
        else:
            counts_dict[first_col] = 1

# Print the resulting dictionary
print(counts_dict)

Writing a generator to load data in chunks (2)

• In the function read_large_file(), read a line from file_object by using
  the method readline(). Assign the result to data.
• In the function read_large_file(), yield the line read from the file data.
• In the context manager, create a generator object gen_file by calling
  your generator function read_large_file() and passing file to it.
• Print the first three lines produced by the generator
  object gen_file using next().

# Define read_large_file()
def read_large_file(file_object):
    """A generator function to read a large file lazily."""

    # Loop indefinitely until the end of the file
    while True:

        # Read a line from the file: data
        data = file_object.readline()

        # Break if this is the end of the file
        if not data:
            break

        # Yield the line of data
        yield data

# Open a connection to the file
with open('world_dev_ind.csv') as file:

    # Create a generator object for the file: gen_file
    gen_file = read_large_file(file)

    # Print the first three lines of the file
    print(next(gen_file))
    print(next(gen_file))
    print(next(gen_file))

Writing a generator to load data in chunks (3)

• Bind the file 'world_dev_ind.csv' to file in the context manager
  with open().
• Complete the for loop so that it iterates over the generator from the
  call to read_large_file() to process all the rows of the file.

# Initialize an empty dictionary: counts_dict
counts_dict = {}

# Open a connection to the file
with open('world_dev_ind.csv') as file:

    # Iterate over the generator from read_large_file()
    for line in read_large_file(file):

        row = line.split(',')
        first_col = row[0]

        if first_col in counts_dict.keys():
            counts_dict[first_col] += 1
        else:
            counts_dict[first_col] = 1

# Print
print(counts_dict)

Writing an iterator to load data in chunks (1)

• Use pd.read_csv() to read in 'ind_pop.csv' in chunks of size 10.
  Assign the result to df_reader.
• Print the first two chunks from df_reader.

# Import the pandas package
import pandas as pd

# Initialize reader object: df_reader
df_reader = pd.read_csv('ind_pop.csv', chunksize=10)

# Print two chunks
print(next(df_reader))
print(next(df_reader))

Writing an iterator to load data in chunks (2)

• Use pd.read_csv() to read in the file 'ind_pop_data.csv' in chunks
  of size 1000. Assign the result to urb_pop_reader.
• Get the first DataFrame chunk from the iterable urb_pop_reader and
  assign this to df_urb_pop.
• Select only the rows of df_urb_pop that have
  a 'CountryCode' of 'CEB'. To do this, compare
  whether df_urb_pop['CountryCode'] is equal to 'CEB' within the
  square brackets in df_urb_pop[____].
• Using zip(), zip together the 'Total Population' and 'Urban population
  (% of total)' columns of df_pop_ceb. Assign the resulting zip object
  to pops.

# Initialize reader object: urb_pop_reader
urb_pop_reader = pd.read_csv('ind_pop_data.csv', chunksize=1000)

# Get the first DataFrame chunk: df_urb_pop
df_urb_pop = next(urb_pop_reader)

# Check out the head of the DataFrame
print(df_urb_pop.head())

# Check out specific country: df_pop_ceb
df_pop_ceb = df_urb_pop[df_urb_pop['CountryCode'] == 'CEB']

# Zip DataFrame columns of interest: pops
pops = zip(df_pop_ceb['Total Population'],
           df_pop_ceb['Urban population (% of total)'])

# Turn zip object into list: pops_list
pops_list = list(pops)

# Print pops_list
print(pops_list)

Writing an iterator to load data in chunks (3)

• Write a list comprehension to generate a list of values
  from pops_list for the new column 'Total Urban Population'.
  The output expression should be the product of the first and second
  element in each tuple in pops_list. Because the 2nd element is a
  percentage, you also need to either multiply the result by 0.01 or
  divide it by 100. In addition, note that the column 'Total Urban
  Population' should only be able to take on integer values. To ensure
  this, make sure you cast the output expression to an integer with int().
• Create a scatter plot where the x-axis values are from
  the 'Year' column and the y-axis values are from the 'Total Urban
  Population' column.

# edited/added
import matplotlib.pyplot as plt

# Code from previous exercise
urb_pop_reader = pd.read_csv('ind_pop_data.csv', chunksize=1000)
df_urb_pop = next(urb_pop_reader)
df_pop_ceb = df_urb_pop[df_urb_pop['CountryCode'] == 'CEB']
pops = zip(df_pop_ceb['Total Population'],
           df_pop_ceb['Urban population (% of total)'])
pops_list = list(pops)

# Use list comprehension to create new DataFrame column 'Total Urban Population'
df_pop_ceb['Total Urban Population'] = [int(tup[0] * tup[1] * 0.01) for tup in pops_list]

# Plot urban population data
df_pop_ceb.plot(kind='scatter', x='Year', y='Total Urban Population')
plt.show()

Writing an iterator to load data in chunks (4)

• Initialize an empty DataFrame data using pd.DataFrame().
• In the for loop, iterate over urb_pop_reader to be able to process all
  the DataFrame chunks in the dataset.
• Concatenate data and df_pop_ceb by passing a list of the DataFrames
  to pd.concat().

# Initialize reader object: urb_pop_reader
urb_pop_reader = pd.read_csv('ind_pop_data.csv', chunksize=1000)

# Initialize empty DataFrame: data
data = pd.DataFrame()

# Iterate over each DataFrame chunk
for df_urb_pop in urb_pop_reader:

    # Check out specific country: df_pop_ceb
    df_pop_ceb = df_urb_pop[df_urb_pop['CountryCode'] == 'CEB']

    # Zip DataFrame columns of interest: pops
    pops = zip(df_pop_ceb['Total Population'],
               df_pop_ceb['Urban population (% of total)'])

    # Turn zip object into list: pops_list
    pops_list = list(pops)

    # Use list comprehension to create new DataFrame column 'Total Urban Population'
    df_pop_ceb['Total Urban Population'] = [int(tup[0] * tup[1] * 0.01) for tup in pops_list]

    # Concatenate DataFrame chunk to the end of data: data
    data = pd.concat([data, df_pop_ceb])

# Plot urban population data
data.plot(kind='scatter', x='Year', y='Total Urban Population')
plt.show()

Writing an iterator to load data in chunks (5)

• Define the function plot_pop() that has two arguments: first
  is filename for the file to process and second is country_code for the
  country to be processed in the dataset.
• Call plot_pop() to process the data for country code 'CEB' in the
  file 'ind_pop_data.csv'.
• Call plot_pop() to process the data for country code 'ARB' in the
  file 'ind_pop_data.csv'.

# Define plot_pop()
def plot_pop(filename, country_code):

    # Initialize reader object: urb_pop_reader
    urb_pop_reader = pd.read_csv(filename, chunksize=1000)

    # Initialize empty DataFrame: data
    data = pd.DataFrame()

    # Iterate over each DataFrame chunk
    for df_urb_pop in urb_pop_reader:

        # Check out specific country: df_pop_ceb
        df_pop_ceb = df_urb_pop[df_urb_pop['CountryCode'] == country_code]

        # Zip DataFrame columns of interest: pops
        pops = zip(df_pop_ceb['Total Population'],
                   df_pop_ceb['Urban population (% of total)'])

        # Turn zip object into list: pops_list
        pops_list = list(pops)

        # Use list comprehension to create new DataFrame column 'Total Urban Population'
        df_pop_ceb['Total Urban Population'] = [int(tup[0] * tup[1] * 0.01) for tup in pops_list]

        # Concatenate DataFrame chunk to the end of data: data
        data = pd.concat([data, df_pop_ceb])

    # Plot urban population data
    data.plot(kind='scatter', x='Year', y='Total Urban Population')
    plt.show()

# Set the filename: fn
fn = 'ind_pop_data.csv'

# Call plot_pop for country code 'CEB'
plot_pop(fn, 'CEB')

# Call plot_pop for country code 'ARB'
plot_pop(fn, 'ARB')
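
The 'Total Urban Population' calculation used in the last three exercises can be checked by hand on a few numbers without loading any file. The values below are invented for illustration and are not taken from 'ind_pop_data.csv'; the names total_population, urban_percent and total_urban are hypothetical. This sketch divides by 100 rather than multiplying by 0.01, which the exercise text allows as an alternative.

```python
# Invented values for illustration only
total_population = [1000, 2500, 40000]
urban_percent = [50.0, 20.0, 12.5]

# Pair each population with its urban percentage, as zip() does
# for the two DataFrame columns in the exercises
pops_list = list(zip(total_population, urban_percent))

# Product of the two elements, scaled from percent, cast to int
total_urban = [int(total * pct / 100) for total, pct in pops_list]

print(pops_list)
print(total_urban)
```

Note that int() truncates toward zero rather than rounding, so a product such as 499.9 would be stored as 499.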

10 pages
Web Based Stationary Management System
67% (3)
Web Based Stationary Management System
7 pages
Batch_9_Review Final PPT
No ratings yet
Batch_9_Review Final PPT
35 pages
Academy Neovaristy Brochure
No ratings yet
Academy Neovaristy Brochure
24 pages
Do Not Change These Switch Setting.: Section Number Switch Settings 4 5 6 7 8
No ratings yet
Do Not Change These Switch Setting.: Section Number Switch Settings 4 5 6 7 8
8 pages
V51 Protocols English 4
No ratings yet
V51 Protocols English 4
23 pages
REU523 Broch 756111 LREna
No ratings yet
REU523 Broch 756111 LREna
2 pages
Liebert GXT3™, 5kVA-10kVA UPS: Compact UPS Systems For High Power Network Rack Applications
No ratings yet
Liebert GXT3™, 5kVA-10kVA UPS: Compact UPS Systems For High Power Network Rack Applications
8 pages
GE3151 Problem Solving and Python Programming Question Bank 1
No ratings yet
GE3151 Problem Solving and Python Programming Question Bank 1
6 pages
What Is An Interactive Whiteboard
No ratings yet
What Is An Interactive Whiteboard
4 pages
CTR Nsa Network Infrastructure Security Guide 20220615
No ratings yet
CTR Nsa Network Infrastructure Security Guide 20220615
60 pages
CND 2
No ratings yet
CND 2
7 pages
Motif-Rack Xs Editor Owner's Manual
No ratings yet
Motif-Rack Xs Editor Owner's Manual
53 pages
Standard Cell Library Validation Methodology
No ratings yet
Standard Cell Library Validation Methodology
5 pages
Black and White Bookmark Line Minimalist Resume
No ratings yet
Black and White Bookmark Line Minimalist Resume
1 page
Construction of Microcontroller Based Rj45 Terminated Network Cable Tester by Joseph Ishaku Chagwas
No ratings yet
Construction of Microcontroller Based Rj45 Terminated Network Cable Tester by Joseph Ishaku Chagwas
24 pages
How To Create an ISO Image From a CD (or DVD or BD)
No ratings yet
How To Create an ISO Image From a CD (or DVD or BD)
2 pages
Module 1
No ratings yet
Module 1
30 pages
Mondrian 2.3.2 Technical Guide
No ratings yet
Mondrian 2.3.2 Technical Guide
139 pages
Cybersecurity Standards Scorecard 2022 Edition
100% (1)
Cybersecurity Standards Scorecard 2022 Edition
50 pages
ScripProRinKivanaIchSon Scriptable
No ratings yet
ScripProRinKivanaIchSon Scriptable
1 page
Routing Algorithms 1663584102198
No ratings yet
Routing Algorithms 1663584102198
120 pages
AVEVA System Platform
No ratings yet
AVEVA System Platform
8 pages
Ebike User Guide 3 0 PDF
No ratings yet
Ebike User Guide 3 0 PDF
22 pages
CIS Controls v8 Mapping To GSMA FS.31 Baseline Security Controls v2.0
No ratings yet
CIS Controls v8 Mapping To GSMA FS.31 Baseline Security Controls v2.0
106 pages
Presentation 1
No ratings yet
Presentation 1
11 pages
Unit-3: 8085 Microprocessor: (MPI) GTU # 3160712
No ratings yet
Unit-3: 8085 Microprocessor: (MPI) GTU # 3160712
107 pages
Brain Computer Interface With Gus Bam P
No ratings yet
Brain Computer Interface With Gus Bam P
16 pages
Qatar Upda Process
No ratings yet
Qatar Upda Process
3 pages


Iterators vs. Iterables

# Create an iterator for flash: superhero
superhero = iter(flash)

# Print each item from the iterator
print(next(superhero))

Iterating over iterables (2)

print(next(googol))
print(next(googol))
print(next(googol))

Iterators as function arguments

doctor = ['house', 'cuddy', 'chase', 'thirteen', 'wilson']

range(50) Nested list comprehensions

 In the output expression, keep the string as-is if the number of

 In the function read_large_file(), read a line from file_object by using
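
The read_large_file() instruction above is cut off, but it appears to describe a generator that hands back a file's lines one at a time instead of loading the whole file. A minimal sketch of that idea follows, assuming the elided call is file_object.readline() (a guess, since the source line is truncated) and using an in-memory io.StringIO object as a hypothetical stand-in for a real file:

```python
import io


def read_large_file(file_object):
    """Generator that yields one line at a time from an open file-like object."""
    while True:
        # Assumption: readline() is the call the truncated instruction refers to.
        data = file_object.readline()
        # readline() returns an empty string at end of file, so stop there.
        if not data:
            break
        yield data


# Hypothetical stand-in for a large file on disk.
sample = io.StringIO("first\nsecond\nthird\n")

gen_file = read_large_file(sample)
first_line = next(gen_file)   # pull a single line on demand
remaining = list(gen_file)    # consume whatever is left
print(first_line, remaining)
```

Because the generator only holds one line in memory at a time, the same pattern should work on files too large to read in one call.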

 Use pd.read_csv() to read in the file in 'ind_pop_data.csv' in chunks

# Zip DataFrame columns of interest: pops
