Intermediate Python
print() the last item from both the year and the pop list to see what the
predicted population for the year 2100 is. Use two print() functions.
Before you can start, you should import matplotlib.pyplot as plt. pyplot is
a sub-package of matplotlib, hence the dot.
Use plt.plot() to build a line plot. year should be mapped on the
horizontal axis, pop on the vertical axis. Don’t forget to finish off
with the plt.show() function to actually display the plot.

# edited/added
import numpy as np
year=list(range(1950,2100+1))
pop=list(np.loadtxt('pop1.txt', dtype=float))
# Print the last item from year and pop
print(year[-1])
print(pop[-1])
# Import matplotlib.pyplot as plt
import matplotlib.pyplot as plt
# Make a line plot: year on the x-axis, pop on the y-axis
plt.plot(year, pop)
# Display the plot with plt.show()
plt.show()

Line Plot (2): Interpretation
Have another look at the plot you created in the previous exercise; it’s shown
on the right. Based on the plot, in approximately what year will there be
more than ten billion human beings on this planet?
2060
2085
2095

Line plot (3)
Print the last item from both the list gdp_cap, and the list life_exp; it
is information about Zimbabwe.
Build a line chart, with gdp_cap on the x-axis, and life_exp on the y-axis.
Does it make sense to plot this data on a line plot?
Don’t forget to finish off with a plt.show() command, to actually
display the plot.

# edited/added
gdp_cap=list(np.loadtxt('gdp_cap.txt', dtype=float))
life_exp=list(np.loadtxt('life_exp.txt', dtype=float))
# Print the last item of gdp_cap and life_exp
print(gdp_cap[-1])
print(life_exp[-1])
# Make a line plot, gdp_cap on the x-axis, life_exp on the y-axis
plt.plot(gdp_cap, life_exp)
# Display the plot
plt.show()
Change the line plot that’s coded in the script to a scatter plot.
A correlation will become clear when you display the GDP per capita
on a logarithmic scale. Add the line plt.xscale('log').
Finish off your script with plt.show() to display the plot.

# Change the line plot below to a scatter plot
plt.scatter(gdp_cap, life_exp)
# Put the x-axis on a logarithmic scale
plt.xscale('log')
# Show plot
plt.show()

Scatter plot (2)
Start from scratch: import matplotlib.pyplot as plt.
Build a scatter plot, where pop is mapped on the horizontal axis,
and life_exp is mapped on the vertical axis.
Finish the script with plt.show() to actually display the plot. Do you
see a correlation?

# edited/added
pop=list(np.loadtxt('pop2.txt', dtype=float))
# Import package
import matplotlib.pyplot as plt
# Build Scatter plot
plt.scatter(pop, life_exp)
# Show plot
plt.show()

Use plt.hist() to create a histogram of the values in life_exp. Do not
specify the number of bins; matplotlib will set the number of bins to 10
by default for you.
Add plt.show() to actually display the histogram. Can you tell which
bin contains the most observations?

# Create histogram of life_exp data
plt.hist(life_exp)
# Display histogram
plt.show()

Build a histogram (2): bins
Build a histogram of life_exp, with 5 bins. Can you tell which bin
contains the most observations?
Build another histogram of life_exp, this time with 20 bins. Is this
better?

# Build histogram with 5 bins
plt.hist(life_exp, bins = 5)
# Show and clear plot
plt.show()
plt.clf()
# Build histogram with 20 bins
plt.hist(life_exp, bins = 20)
# Show and clear plot again
plt.show()
plt.clf()

Scatter plot
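The scatter plot exercise above says a correlation becomes clear once GDP per capita is shown on a logarithmic scale. A minimal numeric sketch of that idea follows. It does not use the course's gdp_cap and life_exp files; the data is synthetic, and the relationship 20 + 12 * log10(GDP) is an assumption chosen only for illustration.

```python
import numpy as np

# Synthetic stand-in data (NOT the course's gdp_cap / life_exp files):
# GDP per capita spread evenly on a log scale, roughly 316 to 31,600
gdp_cap = np.logspace(2.5, 4.5, num=50)
# Hypothetical relationship: life expectancy is linear in log10(GDP)
life_exp = 20 + 12 * np.log10(gdp_cap)

# Pearson correlation on the raw scale and on the log scale
corr_raw = np.corrcoef(gdp_cap, life_exp)[0, 1]
corr_log = np.corrcoef(np.log10(gdp_cap), life_exp)[0, 1]

print("raw scale:", round(corr_raw, 3))
print("log scale:", round(corr_log, 3))
```

With data of this shape the correlation coefficient is higher after the log transform, which is what plt.xscale('log') makes visible in the plot.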
Additional Customizations
What can you say about the plot?
The countries in blue, corresponding to Africa, have both low
life expectancy and a low GDP per capita.
There is a negative correlation between GDP per capita and
life expectancy.
China has both a lower GDP per capita and lower life expectancy
compared to India.

Motivation for dictionaries
Use the index() method on countries to find the index of 'germany'.
Store this index as ind_ger.
Use ind_ger to access the capital of Germany from the capitals list.
Print it out.

# Definition of countries and capital
countries = ['spain', 'france', 'germany', 'norway']
capitals = ['madrid', 'paris', 'berlin', 'oslo']
# Get index of 'germany': ind_ger
ind_ger = countries.index('germany')
# Use ind_ger to print out capital of Germany
print(capitals[ind_ger])

Create dictionary
With the strings in countries and capitals, create a
dictionary called europe with 4 key:value pairs.
Beware of capitalization! Make sure you use
lowercase characters everywhere.
Print out europe to see if the result is what you
expected.

# Definition of countries and capital
countries = ['spain', 'france', 'germany', 'norway']
capitals = ['madrid', 'paris', 'berlin', 'oslo']
# From string in countries and capitals, create dictionary europe
europe = {'spain':'madrid', 'france':'paris', 'germany':'berlin', 'norway':'oslo'}
# Print europe
print(europe)

Access dictionary
Check out which keys are in europe by calling the keys() method
on europe. Print out the result.
Print out the value that belongs to the key 'norway'.

# Definition of dictionary
europe = {'spain':'madrid', 'france':'paris', 'germany':'berlin', 'norway':'oslo' }
# Print out the keys in europe
print(europe.keys())
# Print out value that belongs to key 'norway'
print(europe['norway'])
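The two-list lookup and the dictionary lookup from these exercises can be compared side by side. The sketch below builds europe with dict(zip(...)) rather than the literal the exercise asks for; that is an alternative construction used here for illustration, not the exercise's expected answer.

```python
countries = ['spain', 'france', 'germany', 'norway']
capitals = ['madrid', 'paris', 'berlin', 'oslo']

# List approach: find the position first, then index the second list
ind_ger = countries.index('germany')
capital_from_lists = capitals[ind_ger]

# Dictionary approach: pair the two lists once, then look up by key
# (dict(zip(...)) is an alternative to writing the literal by hand)
europe = dict(zip(countries, capitals))
capital_from_dict = europe['germany']

print(capital_from_lists)
print(capital_from_dict)
print(list(europe.keys()))
```

Both lookups return the same capital; the dictionary version needs no intermediate index and keeps each country tied to its capital.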
Dictionary Manipulation (1)
Hit Run Code to see that, indeed, the row labels are not correctly set.
Specify the row labels by setting cars.index equal to row_labels.
Print out cars again and check if the row labels are correct this time.

import pandas as pd
# Import the cars.csv data: cars
cars = pd.read_csv('cars.csv')
# Print out cars
print(cars)

CSV to DataFrame (2)
Run the code with Run Code and assert that the first column should
actually be used as row labels.
Specify the index_col argument inside pd.read_csv(): set it to 0, so
that the first column is used as row labels.
Has the printout of cars improved now?

# Import pandas as pd
import pandas as pd
# Fix import by including index_col
cars = pd.read_csv('cars.csv', index_col = 0)
# Print out cars
print(cars)

Square Brackets (1)
Use single square brackets to print out the country column of cars as a
Pandas Series.
Use double square brackets to print out the country column of cars as
a Pandas DataFrame.
Use double square brackets to print out a DataFrame with both
the country and drives_right columns of cars, in this order.

Square Brackets (2)
Select the first 3 observations from cars and print them out.
Select the fourth, fifth and sixth observation, corresponding to row
indexes 3, 4 and 5, and print them out.

# Import cars data
import pandas as pd
cars = pd.read_csv('cars.csv', index_col = 0)
# Print out first 3 observations
print(cars[0:3])
# Print out fourth, fifth and sixth observation
print(cars[3:6])

loc and iloc (1)
Use loc or iloc to select the observation corresponding to Japan as a
Series. The label of this row is JPN, the index is 2. Make sure to print
the resulting Series.
Use loc or iloc to select the observations for Australia and Egypt as a
DataFrame. You can find out about the labels/indexes of these rows
by inspecting cars in the IPython Shell. Make sure to print the
resulting DataFrame.
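The square-bracket and loc/iloc selections described above can be tried on a small hand-built frame. The three-row DataFrame below is a hypothetical stand-in for cars.csv, which is not reproduced in these notes. Its values are illustrative, and it has no Egypt row, so the two-row selection uses Australia and Japan instead.

```python
import pandas as pd

# Hypothetical miniature of the cars data (the real cars.csv is not shown here)
cars = pd.DataFrame(
    {
        "cars_per_cap": [809, 731, 588],
        "country": ["United States", "Australia", "Japan"],
        "drives_right": [True, False, False],
    },
    index=["US", "AUS", "JPN"],
)

country_series = cars["country"]                  # single brackets give a Series
country_frame = cars[["country"]]                 # double brackets give a DataFrame
two_columns = cars[["country", "drives_right"]]   # column order follows the list
first_two = cars[0:2]                             # a slice selects rows by position

japan_by_label = cars.loc["JPN"]                  # one row as a Series, by label
japan_by_position = cars.iloc[2]                  # the same row, by position
aus_jpn = cars.loc[["AUS", "JPN"]]                # a list of labels gives a DataFrame

print(type(country_series).__name__)
print(type(country_frame).__name__)
print(japan_by_label)
print(aus_jpn)
```

The pattern to notice is that a single label or column name returns a Series, while a list of labels or names returns a DataFrame.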
Import the numpy package under the local alias np.
Write a for loop that iterates over all elements in np_height and prints
out "x inches" for each element, where x is the value in the array.
Write a for loop that visits every element of the np_baseball array and
prints it out.

# edited/added
import pandas as pd
mlb = pd.read_csv('baseball.csv')

# Iterate over rows of cars
for lab, row in cars.iterrows() :
    print(lab)
    print(row)

Loop over DataFrame (2)
Using the iterators lab and row, adapt the code in the for loop such
that the first iteration prints out "US: 809", the second iteration
"AUS: 731", and so on.
The output should be in the form "country: cars_per_cap". Make sure
to print out this exact string (with the correct spacing).
You can use str() to convert your integer data to a string so
that you can print it in conjunction with the country label.

Use a for loop to add a new column, named COUNTRY, that contains
an uppercase version of the country names in the "country" column.
You can use the string method upper() for this.
To see if your code worked, print out cars. Don’t indent this code, so
that it’s not part of the for loop.

# Import cars data
import pandas as pd
cars = pd.read_csv('cars.csv', index_col = 0)
# Code for loop that adds COUNTRY column
for lab, row in cars.iterrows() :
    cars.loc[lab, "COUNTRY"] = row["country"].upper()
# Print cars
print(cars)

Add column (2)
Replace the for loop with a one-liner that uses .apply(str.upper). The
call should give the same result: a column COUNTRY should be
added to cars, containing an uppercase version of the country names.
As usual, print out cars to see the fruits of your hard labor.

# Import cars data
import pandas as pd
cars = pd.read_csv('cars.csv', index_col = 0)
# Use .apply(str.upper)
cars["COUNTRY"] = cars["country"].apply(str.upper)

Import numpy as np.
Use seed() to set the seed; as an argument, pass 123.
Generate your first random float with rand() and print it out.

# Import numpy as np
import numpy as np
# Set the seed
np.random.seed(123)
# Generate and print random float
print(np.random.rand())

Roll the dice
Use randint() with the appropriate arguments to randomly generate
the integer 1, 2, 3, 4, 5 or 6. This simulates a dice. Print it out.
Repeat the outcome to see if the second throw is different. Again,
print out the result.

# Import numpy and set seed
import numpy as np
np.random.seed(123)

Random Walk
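The solution code for the dice roll stops after setting the seed. A minimal sketch of the remaining steps could look like the following; the variable names first_throw, second_throw and replay are illustrative and do not come from the exercise.

```python
import numpy as np

np.random.seed(123)

# randint's upper bound is exclusive, so (1, 7) yields integers 1 through 6
first_throw = np.random.randint(1, 7)
second_throw = np.random.randint(1, 7)
print(first_throw)
print(second_throw)

# Re-seeding with the same value replays the same sequence of throws
np.random.seed(123)
replay = np.random.randint(1, 7)
print(replay == first_throw)
```

The last line prints True: with a fixed seed the "random" throws are reproducible, which is why the exercises set the seed first.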
Fill in the specification of the for loop so that the random walk is
simulated 10 times.
After the random_walk array is entirely populated, append the array
to the all_walks list.
Finally, after the top-level for loop, print out all_walks.

# NumPy is imported; seed is set
# Initialize all_walks (don't change this line)
all_walks = []
# Simulate random walk 10 times
for i in range(10) :
    # Code from before
    random_walk = [0]
    for x in range(100) :
        step = random_walk[-1]

    all_walks.append(random_walk)
# Print all_walks
print(all_walks)

Visualize all walks
Use np.array() to convert all_walks to a NumPy array, np_aw.
Try to use plt.plot() on np_aw. Also include plt.show(). Does it work
out of the box?
Transpose np_aw by calling np.transpose() on np_aw. Call the
result np_aw_t. Now every row in np_aw_t represents the
position after 1 throw for the 10 random walks.
Use plt.plot() to plot np_aw_t; also include a plt.show(). Does it look
better this time?

# numpy and matplotlib imported, seed set.
# initialize and populate all_walks