Lecture 3
Last week
Searching
Big O
Linear search, binary search
Searching with code
Structs
Sorting
Selection sort
Bubble sort
Recursion
Merge sort
Last week
We learned about tools to solve problems, or bugs, in our code. In particular, we discovered how to use a debugger, a tool that allows us
to step slowly through our code and look at values in memory while our program is running.
Another powerful, if less technical, tool is rubber duck debugging, where we try to explain what we’re trying to do to a rubber duck (or
some other object), and in the process realize the problem (and hopefully solution!) on our own.
We looked at memory, visualizing bytes in a grid and storing values in each box, or byte, with variables and arrays.
Searching
It turns out that, with arrays, a computer can’t look at all of the elements at once. Instead, a computer can only look at them one at a time, though the order can be arbitrary. (Recall in Week 0, David could only look at one page at a time in a phone book, whether he flipped through in order or in a more sophisticated way.)
Searching is how we solve the problem of finding a particular value. A simple case might have an input of some array of values, and the output might simply be a bool, whether or not a particular value is in the array.
Today we’ll look at algorithms for searching. To discuss them, we’ll consider running time, or how long an algorithm takes to run given
some size of input.
Big O
In week 0, we saw different types of algorithms and their running times:
Recall that the red line is searching linearly, one page at a time; the yellow line is searching two pages at a time; and the green line
is searching logarithmically, dividing the problem in half each time.
And these running times are for the worst case, or the case where the value takes the longest to find (on the last page, as opposed
to the first page).
The more formal way to describe each of these running times is with big O notation, which we can think of as “on the order of”. For
example, if our algorithm is linear search, it will take approximately O(n) steps, read as “big O of n” or “on the order of n”. In fact, even
an algorithm that looks at two items at a time and takes n/2 steps has O(n). This is because, as n gets bigger and bigger, only the
dominant factor, or largest term, n, matters. In the chart above, if we zoomed out and changed the units on our axes, we would see the
red and yellow lines end up very close together.
A logarithmic running time is O(log n), no matter what the base is, since this is just an approximation of what fundamentally happens to
running time if n is very large.
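For example, log₂ n and log₁₀ n differ only by a constant factor, since log₂ n = (log₁₀ n) / (log₁₀ 2) ≈ 3.32 × log₁₀ n, and big O notation ignores constant factors.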
There are some common running times:

O(n²)
O(n log n)
O(n)
O(log n)
O(1) (an algorithm that takes a constant number of steps, regardless of how big the problem is)
Computer scientists might also use big Ω, big Omega notation, which is the lower bound of number of steps for our algorithm. Big O is
the upper bound of number of steps, or the worst case.
And we have a similar set of the most common big Ω running times:

Ω(n²)
Ω(n log n)
Ω(n)
Ω(log n)
Ω(1) (searching in a phone book, since we might find our name on the first page we check)
Linear search, binary search

If we’re searching for a number behind n closed doors on stage, the pseudocode for linear search might look like:

For each door from 0 to n–1
    If number is behind door
        Return true
Return false

We label each of n doors from 0 to n–1, and check each of them in order.
“Return false” is outside the for loop, since we only want to do that after we’ve looked behind all the doors.
The big O running time for this algorithm would be O(n), and the lower bound, big Omega, would be Ω(1).
If we know that the numbers behind the doors are sorted, then we can start in the middle, and find our value more efficiently.
For binary search, our algorithm might look like:
If no doors
    Return false
If number behind middle door
    Return true
Else if number < middle door
    Search left half
Else if number > middle door
    Search right half
The upper bound for binary search is O(log n), and the lower bound is also Ω(1), if the number we’re looking for is in the middle, where we happen to start.
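Though this section keeps binary search in pseudocode, a minimal sketch in C might look like the following (the function name binary_search and the sample array are assumptions of ours, and the array must already be sorted):

#include <cs50.h>
#include <stdio.h>

// Returns true if value is somewhere in the sorted array of n numbers
bool binary_search(int numbers[], int n, int value)
{
    int low = 0;
    int high = n - 1;
    while (low <= high) // while there are still "doors" left to check
    {
        int middle = (low + high) / 2;
        if (numbers[middle] == value)
        {
            return true; // number behind middle door
        }
        else if (value < numbers[middle])
        {
            high = middle - 1; // search left half
        }
        else
        {
            low = middle + 1; // search right half
        }
    }
    return false; // no doors left
}

int main(void)
{
    int numbers[] = {1, 2, 3, 4, 5, 6, 7, 8};
    if (binary_search(numbers, 8, 5))
    {
        printf("Found\n");
        return 0;
    }
    printf("Not found\n");
    return 1;
}

Note that each iteration halves the range between low and high, which is where the O(log n) running time comes from.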
With 64 light bulbs, we notice that linear search takes much longer than binary search, which only takes a few steps.
We turned off the light bulbs at a frequency of one hertz, or cycle per second, and a processor’s speed might be measured in gigahertz, or
billions of operations per second.
Searching with code

Let’s implement linear search for a number in C:

#include <cs50.h>
#include <stdio.h>

int main(void)
{
    int numbers[] = {4, 6, 8, 2, 7, 5, 0};

    for (int i = 0; i < 7; i++)
    {
        if (numbers[i] == 0)
        {
            printf("Found\n");
            return 0;
        }
    }
    printf("Not found\n");
    return 1;
}
Here we initialize an array with some values in curly braces, and we check the items in the array one at a time, in order, to see if
they’re equal to zero (what we were originally looking for behind the doors on stage).
If we find the value of zero, we return an exit code of 0 (to indicate success). Otherwise, after our for loop, we return 1 (to indicate
failure).
We can do the same for names:
#include <cs50.h>
#include <stdio.h>
#include <string.h>

int main(void)
{
    string names[] = {"Bill", "Charlie", "Fred", "George", "Ginny", "Percy", "Ron"};

    for (int i = 0; i < 7; i++)
    {
        if (strcmp(names[i], "Ron") == 0)
        {
            printf("Found\n");
            return 0;
        }
    }
    printf("Not found\n");
    return 1;
}

Since we can’t compare two strings directly with ==, we use the strcmp function from the string.h library, which returns 0 if two strings are the same.
Structs

It turns out we can define our own data types in C, called structs. For a phone book that pairs names with numbers, we might create a person type:

typedef struct
{
    string name;
    string number;
}
person;
We use string for the number , since we want to include symbols and formatting, like plus signs or hyphens.
Our struct contains other data types inside it.
Let’s try to implement our phone book without structs first:
#include <cs50.h>
#include <stdio.h>
#include <string.h>

int main(void)
{
    string names[] = {"Brian", "David"};
    string numbers[] = {"+1-617-495-1000", "+1-949-468-2750"};

    for (int i = 0; i < 2; i++)
    {
        if (strcmp(names[i], "David") == 0)
        {
            printf("Found %s\n", numbers[i]);
            return 0;
        }
    }
    printf("Not found\n");
    return 1;
}
We’ll need to be careful to make sure that the first name in names matches the first number in numbers, and so on.
If the name at a certain index i in the names array matches who we’re looking for, we can return the phone number in the numbers array at the same index.
With structs, we can be a little more confident that we won’t have human errors in our program:
#include <cs50.h>
#include <stdio.h>
#include <string.h>

typedef struct
{
    string name;
    string number;
}
person;

int main(void)
{
    person people[2];

    people[0].name = "Brian";
    people[0].number = "+1-617-495-1000";

    people[1].name = "David";
    people[1].number = "+1-949-468-2750";

    for (int i = 0; i < 2; i++)
    {
        if (strcmp(people[i].name, "David") == 0)
        {
            printf("Found %s\n", people[i].number);
            return 0;
        }
    }
    printf("Not found\n");
    return 1;
}
We create an array of the person struct type, and name it people (as in int numbers[], though we could name it arbitrarily, like any other variable). We set the values for each field, or variable, inside each person struct, using the dot operator (.).
In our loop, we can now be more certain that the number corresponds to the name since they are from the same person struct.
We can also improve the design of our program with a constant, like const int NUMBER = 10;, and store our values not in our code but in a separate file or even a database, which we’ll soon see.
Soon too, we’ll write our own header files with definitions for structs, so they can be shared across different files for our program.
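As a preview, a minimal sketch of such a header (the file name person.h is hypothetical) might contain just the shared definition:

// person.h (hypothetical file name)
// Shared definition of the person type, so multiple .c files can use it
#include <cs50.h>

typedef struct
{
    string name;
    string number;
}
person;

Any file with #include "person.h" at the top could then declare variables of type person.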
Sorting
If our input is an unsorted list of numbers, there are many algorithms we could use to produce an output of a sorted list, where all the
elements are in order.
With a sorted list, we can use binary search for efficiency, but sorting the list first takes time of its own. So we’ll sometimes encounter a tradeoff between the time it takes a human to write a program and the time it takes a computer to run some algorithm. Other tradeoffs we’ll see might be time and complexity, or time and memory usage.
Selection sort
Brian is backstage with a set of numbers on a shelf, in unsorted order:
6 3 8 5 2 7 4 1
Taking some numbers and moving them to their right place, Brian sorts the numbers pretty quickly.
Going step-by-step, Brian looks at each number in the list, remembering the smallest one we’ve seen so far. He gets to the end, and sees
that 1 is the smallest, and he knows that must go at the beginning, so he’ll just swap it with the number at the beginning, 6:
6 3 8 5 2 7 4 1
– –
1 3 8 5 2 7 4 6
Now Brian knows at least the first number is in the right place, so he can look for the smallest number among the rest, and swap it with
the next unsorted number (now the second number):
1 3 8 5 2 7 4 6
– –
1 2 8 5 3 7 4 6
And he repeats this again, swapping the next smallest, 3, with the 8:
1 2 8 5 3 7 4 6
- -
1 2 3 5 8 7 4 6
Brian repeats this process until the whole list is sorted. In pseudocode, selection sort might look like:

For i from 0 to n–1
    Find smallest item between i'th item and last item
    Swap smallest item with i'th item

The first step in the loop is to look for the smallest item in the unsorted part of the list, which will be between the i’th item and last item, since we know we’ve sorted up to the “i-1’th” item.
Then, we swap the smallest item with the i’th item, which makes everything up to the item at i sorted.
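As a rough sketch in C (the array values and variable names are our own), selection sort might be implemented as:

#include <stdio.h>

int main(void)
{
    int numbers[] = {6, 3, 8, 5, 2, 7, 4, 1};
    int n = 8;

    for (int i = 0; i < n; i++)
    {
        // Find the smallest item between the i'th item and the last item
        int smallest = i;
        for (int j = i + 1; j < n; j++)
        {
            if (numbers[j] < numbers[smallest])
            {
                smallest = j;
            }
        }

        // Swap the smallest item with the i'th item
        int temp = numbers[i];
        numbers[i] = numbers[smallest];
        numbers[smallest] = temp;
    }

    // Print the now-sorted list
    for (int i = 0; i < n; i++)
    {
        printf("%i ", numbers[i]);
    }
    printf("\n");
}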
We look at a visualization online (https://www.cs.usfca.edu/~galles/visualization/ComparisonSort.html) with animations for how the
elements move for selection sort.
For this algorithm, we were looking at roughly all n elements to find the smallest, and making n passes to sort all the elements.
More formally, we can use some math formulas to show that the biggest factor is indeed n². We started with having to look at all n elements, then only n − 1, then n − 2:

n + (n − 1) + (n − 2) + ... + 1
= n(n + 1)/2
= (n² + n)/2
= n²/2 + n/2

Since n²/2 is the biggest, or dominant, term, we can say that the algorithm has a running time of O(n²).
Bubble sort
We can try a different algorithm, one where we swap pairs of numbers repeatedly, called bubble sort.
Brian will look at the first two numbers, and swap them so they are in order:
6 3 8 5 2 7 4 1
– –
3 6 8 5 2 7 4 1
The next pair, 6 and 8, are in order, so we don’t need to swap them.
The next pair, 8 and 5, need to be swapped:
3 6 8 5 2 7 4 1
– –
3 6 5 8 2 7 4 1
– –
3 6 5 2 8 7 4 1
– –
3 6 5 2 7 8 4 1
– –
3 6 5 2 7 4 8 1
– –
3 6 5 2 7 4 1 8
Our list isn’t sorted yet, but we’re slightly closer to the solution because the biggest value, 8, has been shifted all the way to the right.
And other bigger numbers have also moved to the right, or “bubbled up”.
Brian will make another pass through the list:
3 6 5 2 7 4 1 8
– –
3 6 5 2 7 4 1 8
– –
3 5 6 2 7 4 1 8
– –
3 5 2 6 7 4 1 8
– –
3 5 2 6 7 4 1 8
– –
3 5 2 6 4 7 1 8
- –
3 5 2 6 4 1 7 8
Now the two biggest values have bubbled up to the right. In pseudocode, bubble sort might look like:

Repeat n–1 times
    For i from 0 to n–2
        If i'th and i+1'th elements out of order
            Swap them

Since we are comparing the i’th and i+1’th elements, we only need to go up to n–2 for i. Then, we swap the two elements if they’re out of order.
And we can stop as soon as the list is sorted, since we can just remember whether we made any swaps. If not, the list must be sorted already:

Repeat n–1 times
    For i from 0 to n–2
        If i'th and i+1'th elements out of order
            Swap them
    If no swaps
        Quit
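And as a rough sketch in C (again with variable names of our own choosing), including the early exit when no swaps were made:

#include <stdbool.h>
#include <stdio.h>

int main(void)
{
    int numbers[] = {6, 3, 8, 5, 2, 7, 4, 1};
    int n = 8;

    // Repeat up to n - 1 times
    for (int pass = 0; pass < n - 1; pass++)
    {
        bool swapped = false;
        for (int i = 0; i < n - 1; i++)
        {
            // Swap the i'th and i+1'th elements if they're out of order
            if (numbers[i] > numbers[i + 1])
            {
                int temp = numbers[i];
                numbers[i] = numbers[i + 1];
                numbers[i + 1] = temp;
                swapped = true;
            }
        }
        // If no swaps were made, the list is already sorted
        if (!swapped)
        {
            break;
        }
    }

    // Print the now-sorted list
    for (int i = 0; i < n; i++)
    {
        printf("%i ", numbers[i]);
    }
    printf("\n");
}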
To determine the running time for bubble sort, we have n − 1 comparisons in the loop, and at most n − 1 passes, so we get (n − 1)(n − 1) = n² − 2n + 1 steps total. But the largest factor, or dominant term, is again n² as n gets larger and larger, so we can say that bubble sort has O(n²). So it turns out that fundamentally, selection sort and bubble sort have the same upper bound for running time.
The lower bound for running time here would be Ω(n), since we have to look at all of the elements once before we can conclude that the list is sorted.
So our upper bounds for running time that we’ve seen are:

O(n²) selection sort, bubble sort
O(n log n)
O(n) linear search
O(log n) binary search
O(1)

And the lower bounds:

Ω(n²) selection sort
Ω(n log n)
Ω(n) bubble sort
Ω(log n)
Ω(1) linear search, binary search
Recursion
Recursion is the ability for a function to call itself. We haven’t seen this in code yet, but we’ve seen something in pseudocode in week 0
that we might be able to convert:
1  Pick up phone book
2  Open to middle of phone book
3  Look at page
4  If Smith is on page
5      Call Mike
6  Else if Smith is earlier in book
7      Open to middle of left half of book
8      Go back to line 3
9  Else if Smith is later in book
10     Open to middle of right half of book
11     Go back to line 3
12 Else
13     Quit

Lines 8 and 11 send us back to line 3, repeating the same steps on a smaller half of the book. But searching half of a phone book is the same problem as searching the whole phone book, so instead we can just say “search” again:

1  Pick up phone book
2  Open to middle of phone book
3  Look at page
4  If Smith is on page
5      Call Mike
6  Else if Smith is earlier in book
7      Search left half of book
8
9  Else if Smith is later in book
10     Search right half of book
11
12 Else
13     Quit
This seems like a cyclical process that will never end, but we’re actually changing the input to the function and dividing the
problem in half each time, stopping once there’s no more book left.
In week 1, too, we implemented a “pyramid” of blocks in the following shape:
#
##
###
####
But notice that a pyramid of height 4 is actually a pyramid of height 3, with an extra row of 4 blocks added on. And a pyramid of height 3
is a pyramid of height 2, with an extra row of 3 blocks. A pyramid of height 2 is a pyramid of height 1, with an extra row of 2 blocks. And
finally, a pyramid of height 1 is just a single block.
With this idea in mind, we can write a recursive function to draw a pyramid, a function that calls itself to draw a smaller pyramid before
adding another row.
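A sketch of such a recursive function in C (the function name draw is our choice here; get_int comes from the CS50 library) might be:

#include <cs50.h>
#include <stdio.h>

void draw(int n);

int main(void)
{
    int height = get_int("Height: ");
    draw(height);
}

void draw(int n)
{
    // Base case: nothing left to draw
    if (n <= 0)
    {
        return;
    }

    // First draw the smaller pyramid of height n - 1
    draw(n - 1);

    // Then add one more row of n blocks
    for (int i = 0; i < n; i++)
    {
        printf("#");
    }
    printf("\n");
}

draw(n) first calls itself to print the pyramid of height n − 1, then prints one more row of n blocks on top of it.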
Merge sort
We can take the idea of recursion to sorting, with another algorithm called merge sort. The pseudocode might look like:

If only one number
    Return
Else
    Sort left half of numbers
    Sort right half of numbers
    Merge sorted halves
Let’s look at the merging step first, with a left half and a right half that are each already sorted:

3 5 6 8 | 1 2 4 7
We’ll merge the two lists for a final sorted list by taking the smallest element at the front of each list, one at a time:
3 5 6 8 | _ 2 4 7
1

The 1 on the right side is the smallest between 1 and 3, so we can start our sorted list with it.
3 5 6 8 | _ _ 4 7
1 2
_ 5 6 8 | _ _ 4 7
1 2 3
_ 5 6 8 | _ _ _ 7
1 2 3 4
_ _ 6 8 | _ _ _ 7
1 2 3 4 5
_ _ _ 8 | _ _ _ 7
1 2 3 4 5 6
_ _ _ 8 | _ _ _ _
1 2 3 4 5 6 7
_ _ _ _ | _ _ _ _
1 2 3 4 5 6 7 8
Now let’s sort the whole unsorted list with the same pseudocode:

If only one number
    Return
Else
    Sort left half of numbers
    Sort right half of numbers
    Merge sorted halves
6 3 8 5 2 7 4 1

There’s more than one number, so we first need to sort the left half:

6 3 8 5
Well, to sort that, we need to sort the left half of the left half first:
6 3
Now both of these halves just have one item each, so they’re both sorted. We merge these two lists together, for a sorted list:
_ _ 8 5 2 7 4 1
3 6
We’re back to sorting the right half of the left half, merging them together:
_ _ _ _ 2 7 4 1
3 6 5 8
Both halves of the left half have been sorted individually, so now we need to merge them together:
_ _ _ _ 2 7 4 1
_ _ _ _
3 5 6 8
We sort the right half, 2 7 4 1, the same way, first sorting each pair within it:

_ _ _ _ _ _ _ _
_ _ _ _ 2 7 1 4
3 5 6 8
Then we merge those pairs into a sorted right half:

_ _ _ _ _ _ _ _
_ _ _ _ _ _ _ _
3 5 6 8 1 2 4 7
Finally, we merge the two sorted halves together for the complete sorted list:

_ _ _ _ _ _ _ _
_ _ _ _ _ _ _ _
_ _ _ _ _ _ _ _
1 2 3 4 5 6 7 8
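A sketch of merge sort in C (the helper name merge_sort and the temporary array are our own design; the lecture only shows pseudocode) might be:

#include <stdio.h>
#include <string.h>

// Sort numbers[start] through numbers[end - 1], using temp as scratch space
void merge_sort(int numbers[], int temp[], int start, int end)
{
    // If only one number, it's already sorted
    if (end - start <= 1)
    {
        return;
    }

    // Sort left half, then right half
    int middle = (start + end) / 2;
    merge_sort(numbers, temp, start, middle);
    merge_sort(numbers, temp, middle, end);

    // Merge sorted halves, repeatedly taking the smaller front value
    int left = start;
    int right = middle;
    for (int i = start; i < end; i++)
    {
        if (left < middle && (right >= end || numbers[left] <= numbers[right]))
        {
            temp[i] = numbers[left];
            left++;
        }
        else
        {
            temp[i] = numbers[right];
            right++;
        }
    }

    // Copy the merged values back into the original array
    memcpy(&numbers[start], &temp[start], (end - start) * sizeof(int));
}

int main(void)
{
    int numbers[] = {6, 3, 8, 5, 2, 7, 4, 1};
    int temp[8];
    merge_sort(numbers, temp, 0, 8);
    for (int i = 0; i < 8; i++)
    {
        printf("%i ", numbers[i]);
    }
    printf("\n");
}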
Each number was moved from one shelf to another three times (since the list was divided from 8, to 4, to 2, and to 1 before merged back
together into sorted lists of 2, 4, and finally 8 again). And each shelf required all 8 numbers to be merged together, one at a time.
Each shelf required n steps, and there were only log n shelves needed, so we multiply those factors together. Our total running time for merge sort is O(n log n):
O(n²) selection sort, bubble sort
O(n log n) merge sort
O(n) linear search
O(log n) binary search
O(1)

(Since log n is greater than 1 but less than n, n log n is in between n (times 1) and n².)
The best case, Ω, for merge sort is still n log n, since we still have to sort each half first and then merge them together:

Ω(n²) selection sort
Ω(n log n) merge sort
Ω(n) bubble sort
Ω(log n)
Ω(1) linear search, binary search
If the upper bound and lower bound of an algorithm are the same, we can describe its running time with big Θ, Theta notation:

Θ(n²) selection sort
Θ(n log n) merge sort
Θ(n)
Θ(log n)
Θ(1)
We look at a final visualization (https://www.youtube.com/watch?v=ZZuD6iUe3Pc) of sorting algorithms with a larger number of inputs,
running at the same time.