
CSCI 3901: Software Development

Concepts

Mike McAllister
Fall 2020
CSCI 3901

Ensure that you are able to implement key elements of


CSCI 2110 – data structures & algorithms
CSCI 2132 – software development
CSCI 2141 – introduction to database systems
CSCI 3130 – software engineering

Know how to implement and design from the basics

2
Learning Outcomes

Data Structures and Algorithms


Use abstract data types (ADTs), including lists, stacks, queues, maps,
dictionaries.
Implement fundamental data structures, such as linked lists, trees,
graphs, and hash tables.
Implement traversals, recursive search and state-space exploration
algorithms.
Implement simple iterative and recursive algorithms to solve moderately
simple tasks.
Select the appropriate data structure to implement a given ADT under a
given set of constraints.
Select and use appropriate abstract data types, data structures, and
algorithms to solve real-world problems.

3
Learning Outcomes

Databases
Describe the properties of multiuser database transactions (ACID).
Describe the purpose, function, evolution, classification, and building
blocks of data models and data modeling.
Describe the basic components of a relational model and how relations
are implemented.
Derive business rules from requirements' specifications and translate
these rules into database table and relationship designs.
Use SQL data definition and manipulation operations.
Construct an entity relationship diagram (ERD).
Describe normalization and denormalization, and their role in database
design.
Perform normalization and denormalization on a database.

4
Learning Outcomes

Software Engineering
Design a software system and prepare detailed design documentation.
Implement moderate-sized programs individually and as a team.
Use general data representations like XML
Apply standard software processes for build and deployment
management.
Apply standard software processes for risk management.
Apply standard software processes for version control.
Apply concepts of software engineering to plan, execute and manage a
small software project.
Create a Test Plan for a software development project.
Effectively debug a program.

5
Grading

Labs – 10%
Assignments – 40% (6 assignments, equally weighted)
Module quizzes – 20%
Participation – 10%
Final project – 20%

No late assignments accepted


Use the university’s policy on self-declaration of short-term
illness
Can be used twice in the term for up to 3 days of illness

6
Important Dates

November 9 – 13
Fall study break
December 8
Last day of classes

October 2
Last day to add or drop courses without penalty
November 2
Last day to drop courses without academic penalty

7
Tentative Schedule

Consult syllabus

8
Course textbook

No official textbook
We cover so much material that you would need several

If I had to choose one:


Course text from csci 2110
“Data Structures Outside-In With Java”

9
Homework

Get Java working on your computer


Select and install an IDE for Java
Install a secure shell program and a file transfer program
Write a “hello world” program using the IDE that can also work
on timberlea.cs.dal.ca
Compile foo.java on timberlea with “javac foo.java”
Run foo.class on timberlea with “java foo”
Write a program that uses one loop to print the integers from 1
to 10 on individual lines and with a # symbol before the numbers
5 and 6
Read up on / review linked lists, stacks, queues, and abstract
data types
10
Academic Integrity Officer (AIO) Process
[Flowchart: Suspicion of infraction -> Send to Faculty AIO -> Sufficient evidence? (no: grading continues as usual; yes: Prior infraction?) -> Prior infraction? (yes: case sent to Senate Discipline Committee; no: Faculty-level hearing) -> Infraction happened? (yes: remedy sent to student) -> Student accepts? (yes: remedy implemented)]
11
Concluding Reminders

Accessibility

Responsible computing
Code of Conduct
Culture of Respect
Scent-free university

12
Lessons from last time

Understand the problem early


Ask early about parts that seem unclear
Test your program before submitting it
Document every program
Internal comments, not just one per method
External documentation that doesn’t just duplicate the assignment text
When submitting work
Do not upload your entire development directory

13
Lessons learned from last time

Best practice coding


Do not write huge methods
Avoid hard-coding constants
- Create static final variables to hold the values then use the variable names in
the code
Use braces around the body of all if and loop statements
Aim to have classes in their own files
- One class per file unless the class is a private support class
Adhere to the input and output specifications
Keep debugging print statements inactive
Don’t “improve” on the interface unless the assignment gives you that
leeway

14
Postage problem

Write a program that accepts the destination country


(Canada or US) and the weight of a standard envelope
and returns the needed postage.
Weight                      Canada   US
Up to 30g                   $0.85    $1.20
Over 30g and up to 50g      $1.20    $1.80
Over 50g and up to 100g     $1.80    $2.95
Over 100g and up to 200g    $2.95    $5.15
Over 200g and up to 300g    $4.10    $10.30
Over 300g and up to 400g    $4.70    $10.30
Over 400g and up to 500g    $5.05    $10.30
Postage data from Canada Post at
https://www.canadapost.ca/cpo/mc/personal/ratesprices/postalprices.js on Sept. 2015
15
Problem Solving – Starting the process

What comes in to the program?


Do different data or modes need to be handled differently?
What transformations do I need to make to the data?
Are there sub-problems or patterns that I can use?
What part of the data is processed right away?
What part of the data do I need to keep longer?
What tasks do I need to do with that longer-term data?
- How do I organize or store the data to make those tasks easy?
What goes out of the program?
Do different data or modes need to be handled differently?

16
Problem Solving – Starting the process

What assumptions can I make?


Are any given?
Can I reasonably make any of my own?
What constraints exist?
Are there strange cases to handle?
What is important for the solution to do?

Who are the users and how will they use it?
What is the target environment?
How stable are the requirements?

17
Postage Problem

What are all the starting parameters for the postage


problem?

18
Postage problem

What comes in to the program?


Country and weight from the keyboard
The table of postage rates (never changes, so can be part of program itself).
What transformations do I need to make to the data?
None
What part of the data is processed right away?
Answer as soon as we get input
What part of the data do I need to keep longer?
Nothing stored long-term
What goes out of the program?
The postage rate

19
Problem Solving – Starting the process

What assumptions can I make?


The country and weight are given as integers
What constraints exist?
None
Are there strange cases to handle?
Country or weight outside the table
What is important for the solution to do?
Nothing beyond the given output constraint

20
Evolution of solving problems

Often follow a sequence of solutions


Can a computer solve the problem at all?
- There are some problems that computers cannot solve
What is _a_ solution?
What is an efficient solution?
What is a practical solution?
What is a simple and practical solution?
What is an optimal solution?
What is a simple and optimal solution?
Experience lets you start at different points in the
sequence
21
Postage Problem

How many different solution styles can you create?

22
24
Postage Problem

The code must know that cases exist


Decide whether the cases appear in the code itself or in data
structures that the code navigates.
- Cases in the code: often easier to follow and ensure
- Cases in data structures: easier to change or expand; more likely to
treat the testing of all cases the same way

25
Encode Cases in the Code

One independent “if” statement for each case


Set of “if” statements and exploit previous failed tests
using “else” clauses
“if” statements could be nested or not

26
Part data structure, part code

Encode the boundaries in an array, search for the


position in the array for the weight, and encode the
answer for that solution into code

27
Data structures

Use a data structure (two-dimensional array is enough)


to store all of the rates.

28
Independent “if”

Get the country and weight


If (country is Canada and weight <= 30) report $0.85;
If (country is Canada and 30 < weight <= 50) report $1.20;
If (country is Canada and 50 < weight <= 100) report $1.80;
If (country is Canada and 100 < weight <= 200) report $2.95;
If (country is Canada and 200 < weight <= 300) report $4.10;
If (country is Canada and 300 < weight <= 400) report $4.70;
If (country is Canada and 400 < weight <= 500) report $5.05;
If (country is US and weight <= 30) report $1.20;
If (country is US and 30 < weight <= 50) report $1.80;

29
“if - else”

Get the country and weight


If (country is Canada and weight <= 30) report $0.85;
Else if (country is Canada and weight <= 50) report $1.20;
Else if (country is Canada and weight <= 100) report $1.80;
Else if (country is Canada and weight <= 200) report $2.95;
Else if (country is Canada and weight <= 300) report $4.10;
Else if (country is Canada and weight <= 400) report $4.70;
Else if (country is Canada and weight <= 500) report $5.05;
Else if (country is US and weight <= 30) report $1.20;
Else if (country is US and weight <= 50) report $1.80;

30
“if - else” nesting

Get the country and weight


If (country is Canada) {
If (weight <= 30) report $0.85;
Else if (weight <= 50) report $1.20;
Else if (weight <= 100) report $1.80;
Else if (weight <= 200) report $2.95;
Else if (weight <= 300) report $4.10;
Else if (weight <= 400) report $4.70;
Else if (weight <= 500) report $5.05;
} else if (country is US) {
If (weight <= 30) report $1.20;
Else if (weight <= 50) report $1.80;

}
31
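The nested "if - else" pseudocode could be written in Java roughly as follows. This is only a sketch: the class name PostageIfElse, the method rate, the encoding 0 = Canada and 1 = US, and the -1 error value for input outside the table are all assumptions. The US rates above 50g, which the slide leaves out, are taken from the rate table.

```java
class PostageIfElse {
    // Assumed encoding of the two countries; the slides do not fix one here.
    static final int CANADA = 0;
    static final int US = 1;
    // Assumed error value for input outside the rate table.
    static final double NO_RATE = -1.0;

    static double rate(int country, int weight) {
        if (weight <= 0 || weight > 500) {
            return NO_RATE;
        }
        if (country == CANADA) {
            if (weight <= 30) {
                return 0.85;
            } else if (weight <= 50) {
                return 1.20;
            } else if (weight <= 100) {
                return 1.80;
            } else if (weight <= 200) {
                return 2.95;
            } else if (weight <= 300) {
                return 4.10;
            } else if (weight <= 400) {
                return 4.70;
            } else {
                return 5.05;
            }
        } else if (country == US) {
            if (weight <= 30) {
                return 1.20;
            } else if (weight <= 50) {
                return 1.80;
            } else if (weight <= 100) {
                return 2.95;
            } else if (weight <= 200) {
                return 5.15;
            } else {
                return 10.30; // one US rate covers 201g to 500g
            }
        }
        return NO_RATE; // unknown country
    }

    public static void main(String[] args) {
        System.out.println(rate(CANADA, 31));
        System.out.println(rate(US, 250));
    }
}
```

Each "else" branch relies on the earlier tests having failed, which is the implicit information the nested form carries.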
“if - else” deeper nesting
Get the country and weight
If (country is Canada) {
    If (weight <= 200) {                        /* Canada and weight <= 200 */
        if (weight <= 50) {                     /* Canada and weight <= 50 */
            if (weight <= 30) report $0.85      /* Canada and weight <= 30 */
            else report $1.20                   /* Canada and 30 < weight <= 50 */
        } else {                                /* Canada and 50 < weight <= 200 */
            if (weight <= 100) report $1.80     /* Canada and 50 < weight <= 100 */
            else report $2.95                   /* Canada and 100 < weight <= 200 */
        }
    } else {                                    /* Canada and weight > 200 */
        if (weight <= 400) {                    /* Canada and 200 < weight <= 400 */
            if (weight <= 300) report $4.10     /* Canada and 200 < weight <= 300 */
            else report $4.70                   /* Canada and 300 < weight <= 400 */
        } else report $5.05                     /* Canada and 400 < weight */
    }
} else if (country is US) {

}
32
2d-array for rate classes

Get the country and weight

boundaries = array with values 30, 50, 100, 200, 300, 400, 500
Find index i such that boundaries[i-1] < weight <= boundaries[i]

rates = 2d array:
0.85, 1.20, 1.80, 2.95, 4.10, 4.70, 5.05
1.20, 1.80, 2.95, 5.15, 10.30, 10.30, 10.30

Report rates[country][i]
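A minimal Java sketch of this boundaries-plus-rates approach, using a linear search for the rate class. The class name PostageTable, the encoding 0 = Canada and 1 = US, and the -1 error value are assumptions made for the example.

```java
class PostageTable {
    // Upper weight limit, in grams, of each rate class.
    static final int[] BOUNDARIES = {30, 50, 100, 200, 300, 400, 500};

    // One row per country (assumed: row 0 = Canada, row 1 = US),
    // one column per rate class.
    static final double[][] RATES = {
        {0.85, 1.20, 1.80, 2.95, 4.10, 4.70, 5.05},
        {1.20, 1.80, 2.95, 5.15, 10.30, 10.30, 10.30}
    };

    // Assumed error value for input outside the table.
    static final double NO_RATE = -1.0;

    static double rate(int country, int weight) {
        if (country < 0 || country >= RATES.length || weight <= 0) {
            return NO_RATE;
        }
        // Linear search: the first boundary that is >= weight gives the class.
        for (int i = 0; i < BOUNDARIES.length; i++) {
            if (weight <= BOUNDARIES[i]) {
                return RATES[country][i];
            }
        }
        return NO_RATE; // heavier than the last boundary
    }

    public static void main(String[] args) {
        System.out.println(rate(0, 75));
        System.out.println(rate(1, 75));
    }
}
```

Adding a rate class or a country changes only the two arrays; the lookup code stays the same.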

33
Switch solution
Get the country and weight
boundaries = array with values 0, 30, 50, 100, 200, 300, 400, 500
Find index i such that boundaries[i] < weight <= boundaries[i+1]
/* We know that there are at most 7 rates, so combine the country and weight into one
integer: 10’s digit is country, unit digit is weight category. */
Class = country * 10 + i
Switch (class) {
10: report $0.85
11: report $1.20
12: report $1.80

20: report $1.20
21: report $1.80
22: report $2.95

}
34
Big table solution
// Create an array with 2 rows and 501 columns. Each row corresponds to a country
// and each row gives the postage rate for each weight, in grams, of the envelope

rates = 2d array {
{ 0, 0.85, 0.85, …, 0.85, 1.20, 1.20, …, 1.20, 1.80, 1.80, …, 1.80, 2.95, … } ,
{ 0, 1.20, 1.20, …, 1.20, 1.80, 1.80, …, 1.80, 2.95, 2.95, …, 2.95, 5.15, …}
}

get country (0 for Canada, 1 for US)


get weight
return rates[country][weight];
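The big table does not have to be typed in by hand. As a sketch, it can be filled once from the boundaries and per-class rates and then used for direct lookup. The names PostageBigTable, buildTable, and MAX_WEIGHT are assumptions; column 0 is left at 0 as in the pseudocode, and the caller is assumed to have validated country and weight already.

```java
class PostageBigTable {
    static final int MAX_WEIGHT = 500;
    static final int[] BOUNDARIES = {30, 50, 100, 200, 300, 400, 500};
    static final double[][] CLASS_RATES = {
        {0.85, 1.20, 1.80, 2.95, 4.10, 4.70, 5.05},     // Canada (row 0)
        {1.20, 1.80, 2.95, 5.15, 10.30, 10.30, 10.30}   // US (row 1)
    };

    // 2 rows, 501 columns: RATES[country][weight in grams].
    static final double[][] RATES = buildTable();

    static double[][] buildTable() {
        double[][] table = new double[CLASS_RATES.length][MAX_WEIGHT + 1];
        for (int country = 0; country < CLASS_RATES.length; country++) {
            int rateClass = 0;
            for (int weight = 1; weight <= MAX_WEIGHT; weight++) {
                // Move to the next class once the weight passes a boundary.
                if (weight > BOUNDARIES[rateClass]) {
                    rateClass++;
                }
                table[country][weight] = CLASS_RATES[country][rateClass];
            }
        }
        return table;
    }

    // Constant-time lookup; assumes 0 <= country <= 1 and 1 <= weight <= 500.
    static double rate(int country, int weight) {
        return RATES[country][weight];
    }
}
```

The loading loop is the up-front cost; afterwards every lookup is a single array access regardless of the weight.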

35
Trade-offs in the algorithms?

Lots of “if” statements


Lots of “if” statements, but centralized in functions
Cascaded “if … else” statements
Switch statement
Boundary and rates in arrays for us to search
Linear search
Binary search
Use a formula to calculate the array entry that we want
2d array of all the data

36
Trade-offs

One independent “if” statement for each case


Pro: quick to code; every statement is self-contained to check
Con: lots of repetition; an error in one case is easy to miss; more elements in the
“if” expressions
Set of “if - else” statements
Pro: can be made more efficient in terms of number of tests done; “if”
expressions can be simpler
Con: need to carry implicit information as you get into nested “if”s
Switch statement
Pro: boundaries are easily maintained; no complicated “if” structure; the
compiler makes the check efficient
Con: new cases still need expanded code; encoding of multiple criteria may not
seem natural

37
Trade-offs

Two-dimensional array for each country/rate combination


Pro: expands easily to any number of boundaries and rates; checking on correct
values has us look at the table values rather than search through the code
Con: code looks more complex; more space needed for variables to store the
rates; common rates aren’t combined; implicit connection between the different
arrays
Two-dimensional array for all possible country/weight
combination
Pro: very fast lookup of a rate; same lookup time for any weight (good for real-
time systems)
Con: uses lots of space; takes time to load up the initial table

38
Does our program work? Test cases

Any program that solves the problem should pass tests


based on the requirements
Don’t need an implementation to define the test
The test is meaningful no matter the implementation
Called blackbox tests

Some tests may probe specific aspects of your


implementation approach
The test is meaningful for one implementation but may not
seem meaningful for another implementation
Called whitebox tests
39
Does our program work? Test cases

Independent of implementation (blackbox)


Case boundaries
- Try values on either side of each weight boundary
- Try with each country
Input boundaries
- Country: each country, invalid country numbers
- Weight: negative, zero, 1-500, 501 or more
- Non-integer values when integers are expected
Output cases
- Value < $1, value with one integer digit, value with two integer digits

40
Does our program work? Test cases

Dependent on implementation (whitebox)


Multiple “if” statements
- Try each case and ensure appropriate output
Linear search in array
- Search for first, middle, and last entry
- Search for entry not in the array
Binary search in array
- Search for element requiring
Left – left search
Left – right search
Right – left search
Right – right search
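To make the binary-search test cases concrete, here is one possible binary search over the weight boundaries. It is a sketch under assumptions: the class BoundarySearch and the method rateClass are hypothetical names, and -1 is an assumed result for a weight outside the table.

```java
class BoundarySearch {
    static final int[] BOUNDARIES = {30, 50, 100, 200, 300, 400, 500};

    /**
     * Returns the index of the smallest boundary that is >= weight,
     * or -1 if the weight is outside 1..500.
     */
    static int rateClass(int weight) {
        if (weight <= 0 || weight > BOUNDARIES[BOUNDARIES.length - 1]) {
            return -1;
        }
        int low = 0;
        int high = BOUNDARIES.length - 1;
        while (low < high) {
            int mid = (low + high) / 2;
            if (weight <= BOUNDARIES[mid]) {
                high = mid;      // answer is at mid or to its left
            } else {
                low = mid + 1;   // answer is strictly to the right
            }
        }
        return low;
    }

    public static void main(String[] args) {
        int[] samples = {20, 75, 250, 450};
        for (int weight : samples) {
            System.out.println(weight + "g is in rate class " + rateClass(weight));
        }
    }
}
```

Whitebox tests for this version would pick weights that force different sequences of left and right moves, for example 20 (left at every step) and 450 (right at every step).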
41
Blackbox test cases
Valid cases
Weight   Canada   US
1        0.85     1.20
30       0.85     1.20
31       1.20     1.80
50       1.20     1.80
51       1.80     2.95
100      1.80     2.95
101      2.95     5.15
200      2.95     5.15
201      4.10     10.30
300      4.10     10.30
301      4.70     10.30
400      4.70     10.30
401      5.05     10.30
500      5.05     10.30

Bad cases
Weight   Canada   US
-1       ?        ?
0        ?        ?
501      ?        ?
String   ?        ?
Country:
0        --       ?
3        --       ?
String   --       ?

The problem doesn’t specify what to do in the bad cases, so document your assumption and ensure that your code does it.
In general, report an error condition.
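The blackbox cases can be run as a table-driven check against any implementation. The sketch below makes several assumptions: countries are numbered 1 = Canada and 2 = US (so that 0 and 3 are invalid, as in the test table), bad input is reported as -1, and the rate method shown is only a stand-in for the implementation under test. The non-integer ("String") cases belong to input parsing and are not covered here.

```java
class PostageBlackboxTests {
    static final int[] BOUNDARIES = {30, 50, 100, 200, 300, 400, 500};
    static final double[][] RATES = {
        {0.85, 1.20, 1.80, 2.95, 4.10, 4.70, 5.05},
        {1.20, 1.80, 2.95, 5.15, 10.30, 10.30, 10.30}
    };
    static final double ERROR = -1.0; // assumed error condition

    // Stand-in implementation under test (assumed: 1 = Canada, 2 = US).
    static double rate(int country, int weight) {
        if (country < 1 || country > 2 || weight < 1) {
            return ERROR;
        }
        for (int i = 0; i < BOUNDARIES.length; i++) {
            if (weight <= BOUNDARIES[i]) {
                return RATES[country - 1][i];
            }
        }
        return ERROR;
    }

    /** Runs every blackbox case and returns the number of failures. */
    static int runAll() {
        int[] weights = {1, 30, 31, 50, 51, 100, 101, 200, 201, 300, 301, 400, 401, 500};
        double[] canada = {0.85, 0.85, 1.20, 1.20, 1.80, 1.80, 2.95, 2.95,
                           4.10, 4.10, 4.70, 4.70, 5.05, 5.05};
        double[] us = {1.20, 1.20, 1.80, 1.80, 2.95, 2.95, 5.15, 5.15,
                       10.30, 10.30, 10.30, 10.30, 10.30, 10.30};
        int failures = 0;
        for (int i = 0; i < weights.length; i++) {
            if (rate(1, weights[i]) != canada[i]) {
                failures++;
            }
            if (rate(2, weights[i]) != us[i]) {
                failures++;
            }
        }
        int[] badWeights = {-1, 0, 501};
        for (int weight : badWeights) {
            if (rate(1, weight) != ERROR || rate(2, weight) != ERROR) {
                failures++;
            }
        }
        int[] badCountries = {0, 3};
        for (int country : badCountries) {
            if (rate(country, 30) != ERROR) {
                failures++;
            }
        }
        return failures;
    }

    public static void main(String[] args) {
        System.out.println("Failures: " + runAll());
    }
}
```

Because the cases come from the requirements, the same runAll method can be pointed at the "if" version, the array version, or the big-table version without changes to the expected values.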
42
43
Abstract Data Types and Data Structures

44
Abstract Data Type vs Data Structure

Abstract Data Type (ADT)


“a mathematical model for data types, where a data type is defined by its
behavior (semantics) from the point of view of a user of the data”
(https://en.wikipedia.org/wiki/Abstract_data_type, September 6, 2018)

Data Structure
“a data organization, management and storage format that
enables efficient access and modification. More precisely, a data structure is a
collection of data values, the relationships among them, and the functions or
operations that can be applied to the data.”
(https://en.wikipedia.org/wiki/Data_structure, September 6, 2018. Emphasis added.)

An ADT can typically be implemented using different data


structures.

45
Abstract Data Type and Data Structures

Abstract Data Type: What should the structure do / how should it behave?

Implemented by

Data Structure: How is the data organized?
- We know a maximum size for the data
- We’re willing to set a maximum size and incur a (potentially big) cost if we guess incorrectly
- We have no bound on the size of the data

46
Abstract Data Type and Data Structures

Abstract Data Type: Stack, Queue, Deque, Map, Set

Implemented by

Data Structure:
- Array, Hash Table
- Array List, Dynamic Hash Table
- Linked list, Binary Tree, Heap

47
Abstract Data Type vs. Data Structure
ADT
Stack
- Push, Pop, IsEmpty
Queue
- Enqueue, Dequeue, IsEmpty
PriorityQueue
- AddWithPriority, RemoveHighestPriority, IsEmpty
Deque
- AddHead, AddTail, RemoveHead, RemoveTail, IsEmpty
Map
- Store, Retrieve, Delete
Set
- Add, ElementOf, Union, Intersection, Complement, Cardinality

Data Structure
Array
Hash table
- Relationship between data and storage location
Linked list
Binary tree
- Relationship between children and parents
Heap
- Relationship between children and parents
48
[Figures: Stack, Queue, Map. Images from google images]
49
[Figures: Linked list, Array, Hash table, Binary tree, Heap. Images from google images]
50


Name mixing of ADT and data structures in Java

Java contains specific implementations of some ADTs.


Don’t confuse the implementation (which is a data
structure) with the data type.

For example, the Java classes of HashSet, TreeSet, and


LinkedHashSet are all implementations of the set ADT:
HashSet: set implemented using a hash table
TreeSet: set implemented using a binary search tree
LinkedHashSet: set implemented using a combination of
hash table and linked list.
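A short sketch of the same point: the code below is written against the Set interface and only the chosen implementation changes. The class name SetImplementations and the method fill are assumptions for the example.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

class SetImplementations {
    // Adds the same values to whichever set implementation is passed in
    // and returns the values in that set's iteration order.
    static List<String> fill(Set<String> set) {
        set.add("pear");
        set.add("apple");
        set.add("fig");
        set.add("apple"); // duplicate: a set keeps only one copy
        return new ArrayList<>(set);
    }

    public static void main(String[] args) {
        System.out.println(fill(new HashSet<>()));       // no promised order
        System.out.println(fill(new TreeSet<>()));       // sorted order
        System.out.println(fill(new LinkedHashSet<>())); // insertion order
    }
}
```

All three behave as the set ADT (one copy of each item); they differ only in the data structure underneath, which shows up here as different iteration orders.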

51
Stack

Behaviour
Stores a collection of items
Items retrieved from the stack in last in first out order (LIFO)
Cannot access random elements of the collection
Operations
Push( value )
Pop( ) -> value
IsEmpty( ) -> boolean
[Figure: a stack, with Push( value ) and Pop( ) both acting on the top]

52
Queue

Behaviour
Stores a collection of items
Items retrieved from the queue in first in first out order (FIFO)
Cannot access random elements of the collection
Operations
Enqueue( value )
Dequeue( ) -> value
IsEmpty( ) -> boolean
[Figure: a queue, with Enqueue( value ) at one end and Dequeue( ) at the other]
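The stack and queue behaviours can be tried out with Java's ArrayDeque, used through the Deque and Queue interfaces. This is a sketch; the class StackQueueDemo and its two methods are hypothetical names.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.Queue;

class StackQueueDemo {
    // Pushes 1, 2, 3 and pops until empty: last in, first out.
    static String lifo() {
        Deque<Integer> stack = new ArrayDeque<>();
        stack.push(1);
        stack.push(2);
        stack.push(3);
        StringBuilder order = new StringBuilder();
        while (!stack.isEmpty()) {
            order.append(stack.pop());
        }
        return order.toString();
    }

    // Enqueues 1, 2, 3 and dequeues until empty: first in, first out.
    static String fifo() {
        Queue<Integer> queue = new ArrayDeque<>();
        queue.offer(1);
        queue.offer(2);
        queue.offer(3);
        StringBuilder order = new StringBuilder();
        while (!queue.isEmpty()) {
            order.append(queue.poll());
        }
        return order.toString();
    }

    public static void main(String[] args) {
        System.out.println("stack: " + lifo());
        System.out.println("queue: " + fifo());
    }
}
```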

53
Priority Queue

Behaviour
Stores a collection of items where each item has a “priority”
to designate importance
Items retrieved in order of priority
Cannot access random elements of the collection
Operations
AddWithPriority( value, priority )
RemoveWithHighestPriority( ) -> value
IsEmpty( ) -> boolean
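One way to get these three operations in Java is to wrap java.util.PriorityQueue. The sketch below assumes that a larger number means a higher priority; the class SimplePriorityQueue and its nested Entry class are hypothetical names.

```java
import java.util.Comparator;
import java.util.PriorityQueue;

class SimplePriorityQueue {
    // Pairs a value with its priority.
    private static class Entry {
        final String value;
        final int priority;

        Entry(String value, int priority) {
            this.value = value;
            this.priority = priority;
        }
    }

    // java.util.PriorityQueue removes the smallest element first, so the
    // comparator is reversed to remove the largest priority first.
    private final PriorityQueue<Entry> heap = new PriorityQueue<>(
            Comparator.comparingInt((Entry e) -> e.priority).reversed());

    void addWithPriority(String value, int priority) {
        heap.add(new Entry(value, priority));
    }

    // Throws NoSuchElementException if the queue is empty.
    String removeWithHighestPriority() {
        return heap.remove().value;
    }

    boolean isEmpty() {
        return heap.isEmpty();
    }
}
```

Usage: after addWithPriority("email", 1), addWithPriority("fire alarm", 9), and addWithPriority("coffee", 5), the removals come out as "fire alarm", "coffee", "email".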

54
Priority Queue

Other ADTs might be seen as an instance of a priority


queue
Stack is a priority queue where
- The priority number is the time of insertion
- High priority means a high priority number
Queue is a priority queue where
- The priority number is the time of insertion
- High priority means a low priority number

55
Deque (double-ended queue)

Behaviour
Stores a collection of items
Concept of a queue where you can add or remove from either
end (but not from the middle)
Cannot access random elements of the collection
Operations
AddHead( value ) AddTail( value )
RemoveHead( ) -> value RemoveTail( ) -> value
IsEmpty( ) -> boolean

[Figure: a deque, with AddHead( value ) and RemoveHead( ) at one end and AddTail( value ) and RemoveTail( ) at the other]
56
Set

Behaviour
Stores a collection of items
Concept of a mathematical set
- Only one copy of each item allowed
Can access any item
No presumed order to the items
Operations
Add( value )
Remove( value )
ElementOf( value ) -> Boolean
Union( set1, set2 ) -> set
Intersection( set1, set2 ) -> set
Complement( universe, set ) -> set (optional)
Cardinality( set ) -> integer
57
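Java's Set interface has no union or intersection methods that return a new set, but they can be built from addAll, retainAll, and removeAll on a copy. A sketch, with SetOperations as a hypothetical helper class:

```java
import java.util.HashSet;
import java.util.Set;

class SetOperations {
    // Each method copies its first argument so the inputs are not modified.
    static <T> Set<T> union(Set<T> set1, Set<T> set2) {
        Set<T> result = new HashSet<>(set1);
        result.addAll(set2);
        return result;
    }

    static <T> Set<T> intersection(Set<T> set1, Set<T> set2) {
        Set<T> result = new HashSet<>(set1);
        result.retainAll(set2);
        return result;
    }

    // Everything in the universe that is not in the set.
    static <T> Set<T> complement(Set<T> universe, Set<T> set) {
        Set<T> result = new HashSet<>(universe);
        result.removeAll(set);
        return result;
    }

    static <T> int cardinality(Set<T> set) {
        return set.size();
    }
}
```

ElementOf corresponds to the existing contains method, and Add and Remove to add and remove.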
Map (or dictionary or associative array)

Behaviour
Associates some meaningful data (the ”key”) with a value
Stores and accesses values using the key
No specific ordering of the data
Operations:
Put (key, value)
Get (key) -> value
Size () -> integer
ContainsKey (key) -> Boolean
ContainsValue (value) -> Boolean (optional)
Remove (key) -> Boolean (optional)
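These map operations line up closely with java.util.Map. The sketch below wraps a HashMap to show the correspondence; the class NetIdDirectory and the sample IDs used in the usage note are made up for illustration.

```java
import java.util.HashMap;
import java.util.Map;

class NetIdDirectory {
    // Key: Banner ID, value: netid (both stored as strings here).
    private final Map<String, String> netIdByBannerId = new HashMap<>();

    void put(String bannerId, String netId) {
        netIdByBannerId.put(bannerId, netId);
    }

    // Returns null when the key is not present.
    String get(String bannerId) {
        return netIdByBannerId.get(bannerId);
    }

    int size() {
        return netIdByBannerId.size();
    }

    boolean containsKey(String bannerId) {
        return netIdByBannerId.containsKey(bannerId);
    }

    boolean containsValue(String netId) {
        return netIdByBannerId.containsValue(netId);
    }

    // True if an entry was actually removed.
    boolean remove(String bannerId) {
        return netIdByBannerId.remove(bannerId) != null;
    }
}
```

Usage: after put("B00000001", "ab123456"), get("B00000001") returns "ab123456" and get("B00000002") returns null. Putting a second value under the same key replaces the first.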
58
Map examples

Trivial map
Key is the sequence of integers 1, 2, 3, 4, …
Implementation: a standard array

More complex map


Key is your Banner ID
Value is your netid
Implementation: hash table

59
Recognizing a spot for a standard ADT

Stack
Doing a set of operations that might need undoing in the reverse order
Exploring options that involve backtracking (changing or removing the most
recent choice)
Recursion (implicit or explicit stack)
Situations where proper nesting is involved
Exploring connected problems that handle depth of coverage before breadth of
coverage
Queue
Simulations of scheduling with items arriving at different times
Processing a growing list of items in a way that ensures that each item is handled
in a “fair” timeframe
Exploring connected problems that handle breadth of coverage before depth of
coverage

60
Recognizing a spot for a standard ADT

Priority Queue
I need to store items and retrieve them in an order that I define (order can change)
Often used in scheduling
Set
I have a collection of items to store
- I just care about having one copy
I want to access the items randomly
I want to iterate over the set
I don’t have any particular order needed for the data
List
I have a collection of items to store
- I might have several copies of the same thing
I want to access items randomly
I want to be able to impose an order on the list by sorting
I want to iterate over the list in the sorted order
Map
Random access to key, value pairs when exact matches to keys are all that is needed
61
What is stored in an ADT?

Basic data types, like “int” (for integers)


Behave exactly as we expect them to

Objects
Important to understand exactly what is being stored

62
Is there a difference?

Basic data type:
int a;
int b;

a = 10;
b = a;
a = 20;

System.out.println( b );

Object:
myIntClass a;
myIntClass b;

a = new myIntClass( 10 );
b = a;
a.setValue( 20 );

System.out.println( b.getValue() );

public class myIntClass {
    int value;
    public myIntClass( int val ) {   // constructor used by new myIntClass( 10 )
        value = val;
    }
    public void setValue( int val ) {
        value = val;
    }
    public int getValue( ) {
        return value;
    }
}
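The two fragments can be run side by side. In this sketch the wrapper class is renamed MyInt and the two experiments are wrapped in hypothetical methods of a class called AliasDemo so that the results can be compared.

```java
class MyInt {
    private int value;

    MyInt(int value) {
        this.value = value;
    }

    void setValue(int value) {
        this.value = value;
    }

    int getValue() {
        return value;
    }
}

class AliasDemo {
    // With a basic type, b receives its own copy of the number.
    static int primitiveCopy() {
        int a = 10;
        int b = a;
        a = 20;   // does not affect b
        return b;
    }

    // With an object, b receives a copy of the reference, so a and b
    // refer to the same MyInt.
    static int referenceCopy() {
        MyInt a = new MyInt(10);
        MyInt b = a;
        a.setValue(20);   // visible through b as well
        return b.getValue();
    }

    public static void main(String[] args) {
        System.out.println(primitiveCopy());  // prints 10
        System.out.println(referenceCopy());  // prints 20
    }
}
```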
63
Elements of a Process

“static” variables in Java

Local variables

Objects created with “new”

64
Storing Objects
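static boolean test;   // declared at class level: Java has no static local variables

public int myMethod() {
    int someInt;
    OtherClass anObject;

    anObject = new OtherClass();

}

[Diagram: storage for test, for someInt, and for anObject (just space for a reference to an object), plus the “new” object]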

65
Storing Objects
static boolean test;   // declared at class level: Java has no static local variables

public int myMethod() {
    int someInt;
    OtherClass anObject;

    anObject = new OtherClass();

}

[Diagram: test, someInt, anObject]

66
Storing Objects
static boolean test;   // declared at class level: Java has no static local variables

public int myMethod() {
    int someInt;
    OtherClass anObject;

    anObject = new OtherClass();

    OtherClass anotherObject;
    anotherObject = anObject;
}

[Diagram: test, someInt, anObject, anotherObject]

When we assign object values, we are copying the reference to the object. We are not copying the content.

67
Storing Objects
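static boolean test;   // declared at class level: Java has no static local variables

public int myMethod() {
    int someInt;
    OtherClass anObject;

    anObject = new OtherClass();

    OtherClass anotherObject = new OtherClass();   // must exist before copy() is called on it
    anotherObject.copy(anObject);
}

[Diagram: test, someInt, anObject, anotherObject]

Classes often have a “copy” method to make an actual copy of the object instead.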

68
Storing Objects
static boolean test;   // declared at class level: Java has no static local variables

public int myMethod() {
    int someInt;
    OtherClass anObject;

    anObject = new OtherClass();

    OtherClass anotherObject = new OtherClass();   // must exist before copy() is called on it
    anotherObject.copy(anObject);
}

[Diagram: test, someInt, anObject, anotherObject]

An object of a class like OtherClass may reference other objects.
A “copy” method may not always copy the content of those references. That kind of copy is called a “shallow copy”.
69
Storing Objects
static boolean test;   // declared at class level: Java has no static local variables

public int myMethod() {
    int someInt;
    OtherClass anObject;

    anObject = new OtherClass();

    OtherClass anotherObject = new OtherClass();   // must exist before deepCopy() is called on it
    anotherObject.deepCopy(anObject);
}

[Diagram: test, someInt, anObject, anotherObject]

A copy method that copies all of the underlying objects is often called a “deep copy” of the object.
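The difference between a shallow and a deep copy shows up as soon as the copied object refers to another object. In this sketch, Pet and Owner are made-up classes, and shallowCopy and deepCopy are hypothetical method names rather than anything built into Java.

```java
class Owner {
    String name;

    Owner(String name) {
        this.name = name;
    }
}

class Pet {
    String petName;
    Owner owner;   // a reference to another object

    Pet(String petName, Owner owner) {
        this.petName = petName;
        this.owner = owner;
    }

    // Shallow copy: a new Pet, but it shares the same Owner object.
    Pet shallowCopy() {
        return new Pet(petName, owner);
    }

    // Deep copy: a new Pet that also gets its own copy of the Owner.
    Pet deepCopy() {
        return new Pet(petName, new Owner(owner.name));
    }

    public static void main(String[] args) {
        Pet original = new Pet("Rex", new Owner("Ana"));
        Pet shallow = original.shallowCopy();
        Pet deep = original.deepCopy();

        original.owner.name = "Ben";
        System.out.println(shallow.owner.name); // follows the change
        System.out.println(deep.owner.name);    // keeps the old name
    }
}
```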

70
What is stored in an ADT?

Take-away:
When you put an object into two ADTs, know whether you
expect to have each copy be shared or independent
- If shared, then put the reference into each ADT
- If independent then put a copy into each ADT
Understand if you need a shallow or a deep copy

71
Combine ADTs

You can combine ADTs to meet the need.


Example
You want to store all items in your house to be retrieved by
their colour.
- Store all items of the same colour in one set.
- Store these sets in a map where the key is the colour name and the
value is the set
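The house-items example could look like this in Java: a map whose values are sets. The class name HouseInventory and its method names are assumptions for the sketch.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

class HouseInventory {
    // Key: colour name. Value: the set of items of that colour.
    private final Map<String, Set<String>> itemsByColour = new HashMap<>();

    void add(String colour, String item) {
        // Create the set for this colour the first time the colour is seen.
        itemsByColour.computeIfAbsent(colour, c -> new HashSet<>()).add(item);
    }

    // Returns an empty set for a colour with no items.
    Set<String> itemsOf(String colour) {
        return itemsByColour.getOrDefault(colour, Collections.emptySet());
    }

    public static void main(String[] args) {
        HouseInventory house = new HouseInventory();
        house.add("red", "kettle");
        house.add("red", "scarf");
        house.add("blue", "mug");
        System.out.println(house.itemsOf("red"));
        System.out.println(house.itemsOf("green"));
    }
}
```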

72
Data structures with a fixed size

73
Array

A fixed-size linear sequence of items


Uses integers to identify the order of items in the
sequence
Start at index number 0 in many programming languages
- Historical context based on implementation efficiency

74
Declaring an array in Java

String[ ] anArray;
Creates a reference to an array, but there is no actual array to store data yet.

String[ ] anArray = new String[10];
Creates the space for 10 entries in an array. We see that an array is treated like an object of its own.

75
How would you create a 2d array?
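A few possible answers, as a sketch. In Java a 2d array is an array of arrays, so the rows can even have different lengths. The class and method names below are made up for the example.

```java
class TwoDArrayDemo {
    // Rectangular: 2 rows and 3 columns, every entry starts at 0.
    static int[][] rectangular() {
        int[][] grid = new int[2][3];
        grid[1][2] = 7;   // row 1, column 2
        return grid;
    }

    // Initialized in place with a literal.
    static double[][] literal() {
        return new double[][] {
            {0.85, 1.20},
            {1.20, 1.80}
        };
    }

    // Jagged: allocate the rows first, then each row separately.
    static int[][] jagged() {
        int[][] triangle = new int[3][];
        for (int row = 0; row < triangle.length; row++) {
            triangle[row] = new int[row + 1];
        }
        return triangle;
    }

    public static void main(String[] args) {
        System.out.println(rectangular()[1][2]);
        System.out.println(literal()[0][1]);
        System.out.println(jagged()[2].length);
    }
}
```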

76
Hash table

An organization of data in an array to let us search for an entry


quickly.
Key concept:
Use a formula to convert the key to store into an array index
- Called the “hash function”
Store the value in the array at the computed index value
Have rules to handle the case where two values are converted to the
same array index
- Called a “collision” in the hash table
In a moderately-filled array, you expect to find a search value in
constant time.

77
Hash table example

Array size: 13
Data stored: alphabetic lower-case strings
Hash function: the position in the alphabet of the first letter of
the string (starting at position 0)
Array index: take the hash value modulo 13

Expected collisions:
All the strings that start with the same letter end up at the same index
Two letters of the alphabet converge on the same index

78
Hash table example
[Figure: 13-entry array holding apple, pancake, density, gorilla, umbrella, and yoyo]

Add “quiet”
Hash value is 16 (‘q’ – ‘a’)
Index is 16 mod 13 = 3
Store “quiet” here

79
How to deal with hash table collisions

Have a data structure at each array index to catch all values that
belong at the index (called “open hashing”)
Linked list, binary tree, …
In-place: look for another “predictable” place in the array to
store the entry (called “closed hashing”)
Move forward k entries in the array until you find an entry spot
- Linear probing: k=1
- Quadratic probing: k follows a sequence 1², 2², 3², 4², …
- Double hashing: k is the result of applying a second hash function to the
value to be stored
More complex resolution schemes
- E.g. cuckoo hashing

80
Hash table example with linear probing
[Figure: 13-entry array holding apple, pancake, density, gorilla, umbrella, and yoyo]

Add “nutmeg”
Hash value is 13 (‘n’ – ‘a’)
Index is 13 mod 13 = 0
Try to store “nutmeg” here, but the entry is full
Linear probing: advance by 1 until we find an empty entry
Store “nutmeg” in this empty entry
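A small sketch of a closed-hashing table with linear probing, using the example's hash function (position of the first letter, starting at 0) and an array of size 13. The class name LinearProbingTable is an assumption, keys are assumed to be non-empty lower-case strings, and removal is left out because it needs extra care with probing.

```java
class LinearProbingTable {
    private final String[] slots = new String[13];
    private int count = 0;

    // Hash function from the example: 'a' -> 0, 'b' -> 1, ...
    static int hash(String key) {
        return key.charAt(0) - 'a';
    }

    /** Stores the key and returns the index used, or -1 if the table is full. */
    int add(String key) {
        if (count == slots.length) {
            return -1;
        }
        int index = hash(key) % slots.length;
        while (slots[index] != null) {
            if (slots[index].equals(key)) {
                return index;   // already stored
            }
            index = (index + 1) % slots.length;   // linear probing: k = 1
        }
        slots[index] = key;
        count++;
        return index;
    }

    boolean contains(String key) {
        int index = hash(key) % slots.length;
        // Stop at an empty slot or after looking at every slot once.
        for (int probes = 0; probes < slots.length && slots[index] != null; probes++) {
            if (slots[index].equals(key)) {
                return true;
            }
            index = (index + 1) % slots.length;
        }
        return false;
    }
}
```

Usage: after adding apple (index 0), a key such as "nutmeg" also hashes to index 0 because 'n' - 'a' = 13 and 13 mod 13 = 0, so linear probing places it in the next free entry, index 1.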

81
Hash table collisions

Other ways to handle hash table collisions


Linear probing (already seen)
Quadratic probing
Double hashing
- Use another hash function to tell you how much to jump ahead
Store a secondary data structure at each entry in the hash
table and put all items that map to entry into the secondary
structure
- Often use a linked list at each entry and call it “chaining”
Other specialized approaches, like cuckoo hashing and
Robin Hood hashing, also exist.
82
Collision management

[Figure: collision management with linear probing, quadratic probing, double hashing, and chaining]
83
Hash tables

Performance of in-place solutions varies with how full


the array is (called the load factor)
Small load factor (50% or less), can expect constant time for
a search in most cases.
Load factor of 80-85% or more can lead to excessive
collisions
- Implementations may trigger a rebuilding of the hash table into a
larger array in these cases.

84
Using Hash Tables in Java

Code from https://beginnersbook.com/2014/07/hashtable-in-java-with-example/, January 16, 2020
85


Using Hash Maps in Java

Code from https://beginnersbook.com/2013/12/hashmap-in-java-with-example/, January 16, 2020
86
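As an illustration of using both classes (this is a separate sketch, not the code from the linked pages), the example below counts words through the common Map interface and shows one practical difference: HashMap accepts a null key while Hashtable throws a NullPointerException. The class name HashMapVsHashtable and its methods are made up.

```java
import java.util.HashMap;
import java.util.Hashtable;
import java.util.Map;

class HashMapVsHashtable {
    // Works with either implementation because both implement Map.
    static Map<String, Integer> countWords(Map<String, Integer> counts, String... words) {
        for (String word : words) {
            counts.merge(word, 1, Integer::sum);   // add 1, or start at 1
        }
        return counts;
    }

    // Reports whether the given map implementation allows a null key.
    static boolean acceptsNullKey(Map<String, Integer> map) {
        try {
            map.put(null, 0);
            return true;
        } catch (NullPointerException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(countWords(new HashMap<>(), "to", "be", "or", "not", "to", "be"));
        System.out.println(countWords(new Hashtable<>(), "to", "be", "or", "not", "to", "be"));
        System.out.println(acceptsNullKey(new HashMap<>()));
        System.out.println(acceptsNullKey(new Hashtable<>()));
    }
}
```

Hashtable also synchronizes its methods, which HashMap does not.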


Dynamically-growing structures

Arrays and hash tables start with a fixed size.


The structure becomes dynamic if you take the following steps
when you try to add beyond the fixed size:
Allocate / create a new array or table that is bigger than the current one
Copy all items from the current array or table into the new one
- This is the costly step
Add in the new data item

Common trick: when creating a new array or table, make it twice as big
- Not reallocating the table all of the time to grow by 1 or 2
- Have at most half of the space in the new array or table as unused
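The grow-by-doubling steps can be sketched for an array of integers as follows. The class name GrowableIntArray and the starting capacity of 2 are assumptions chosen to keep the example small.

```java
class GrowableIntArray {
    private int[] items = new int[2];   // assumed small starting capacity
    private int size = 0;

    void add(int value) {
        if (size == items.length) {
            // Allocate an array twice as big and copy everything over.
            // The copy is the costly step, but doubling makes it rare.
            int[] bigger = new int[items.length * 2];
            System.arraycopy(items, 0, bigger, 0, size);
            items = bigger;
        }
        items[size] = value;
        size++;
    }

    int get(int index) {
        if (index < 0 || index >= size) {
            throw new IndexOutOfBoundsException("index " + index);
        }
        return items[index];
    }

    int size() {
        return size;
    }

    int capacity() {
        return items.length;
    }
}
```

Adding five values to a fresh GrowableIntArray triggers two reallocations (2 to 4, then 4 to 8), which leaves the capacity at 8 with 5 entries in use.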

87
Dynamically-growing structures
Given a hash table, why don’t we just copy over all the data directly?

The hash function for the larger table may not put each value in the same place.

[Figure: a table holding apple, pancake, density, gorilla, and umbrella copied into a larger table, with some positions in the larger table marked “??”]

88
Data structures without a fixed size

89
Basics

Typically create a class to store a single item


class Node {
Node Some_references_to_other_nodes;

SomeClass value;
}

Note that the class includes an attribute that refers back to


the same class.

Different data structures arise in how we link these


single items together

90
Linked List

Informally, it’s a chain of data values


Explicit storage of the sequence of values
- Each value points to the “next” value
Variants
Sorted vs unsorted
Singly-connected – only forward pointers
Doubly-connected – forward and backward pointers
Circular – the end points back to the front
- Can be singly- or doubly- connected
Linear progression through the list elements

91
Linked List

Can grow arbitrarily large


Has small additional cost for storing the order
information
Linear traversal incurs an efficiency loss for searching

[Figure: a chain of four nodes, each with a Value field and a Next field]

Partial linked list node class, as a sample:

class Node {
    Node next;
    int value;

    public void add_after ( int new_value ) {
        Node new_node = new Node();
        new_node.value = new_value;
        new_node.next = next;
        next = new_node;
    }
}
92
Linked List

A very common data structure, so learn how to


implement them
Add
Remove
Search
Traverse
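A compact sketch of the four operations for a singly-connected, unsorted list of integers. The class name IntLinkedList and the choice to add at the front are assumptions; other variants (sorted, doubly-connected, with a sentinel) change the details.

```java
class IntLinkedList {
    private static class Node {
        int value;
        Node next;

        Node(int value, Node next) {
            this.value = value;
            this.next = next;
        }
    }

    private Node head;   // null when the list is empty

    // Add: the new node becomes the front of the list.
    void addFirst(int value) {
        head = new Node(value, head);
    }

    // Search: linear walk along the next references.
    boolean contains(int value) {
        for (Node node = head; node != null; node = node.next) {
            if (node.value == value) {
                return true;
            }
        }
        return false;
    }

    // Remove: find the node and link its predecessor around it.
    boolean remove(int value) {
        Node previous = null;
        Node current = head;
        while (current != null && current.value != value) {
            previous = current;
            current = current.next;
        }
        if (current == null) {
            return false;               // not in the list
        }
        if (previous == null) {
            head = current.next;        // removing the first node
        } else {
            previous.next = current.next;
        }
        return true;
    }

    // Traverse: visit every node in order.
    String traverse() {
        StringBuilder text = new StringBuilder();
        for (Node node = head; node != null; node = node.next) {
            if (text.length() > 0) {
                text.append(" -> ");
            }
            text.append(node.value);
        }
        return text.toString();
    }
}
```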

93
Variants of linked lists

Design differences
Singly-connected linked list (forward only)

Doubly-connected linked list (forward and backward)

Circular linked list (can start anywhere)

Implementation differences
Sentinel or dummy node as the start (list never empty)
Track both ends of the list (append is quick)

94
Binary search tree

A sorted organization of data values to make searching quick.


Parallels the checks of a binary search in a data structure.
Stored as a set of nodes.
Simplistically, each node stores
- A value
- A reference to the sorted data before the value
- A reference to the sorted data after the value
More precisely, each node u stores a reference to one node whose value
precedes the value in u and a reference to one node whose value
follows the value in u
- Not necessarily the immediate predecessor or successor
- A node never references a node that another part of the data structure
already references

95
Binary search tree

[Figure: two binary search trees holding the values 16, 31, 39, 42, 80, 81, and 99]
Balanced tree: 42 at the root; 31 and 81 on the second level; 16, 39, 80, and 99 on the third level.
Unbalanced tree: 81 at the root; 31 and 99 on the second level; 16 and 39 below 31; 80 and then 42 on the two deepest levels.

96
Binary search tree

Locating an element in a binary search tree involves descending the levels
Fast (logarithmic) search time on balanced trees
Can be linear time on severely unbalanced trees

Often use recursion when working with binary trees, but that’s a convenience rather than a necessity.

97
Java type for a binary tree

public class BinaryTreeNode {
    private BinaryTreeNode left;
    private BinaryTreeNode right;
    private BinaryTreeNode parent; // optional

    private SomeDataType value;
}

public class BinaryTree {
    private BinaryTreeNode root;
}
98
Finding a value in a binary search tree
public boolean find ( int value ) {
    BinaryTreeNode current = root; // root from the class

    // Walk a path from the root to where the node should be
    while ((current != null) && (current.value != value)) {
        if (value < current.value) {
            current = current.left;
        } else {
            current = current.right;
        }
    }
    if (current == null) {
        // Went off the end of the tree, so not found
        return false;
    } else {
        return true;
    }
}
Common variation to track where you were
public boolean find ( int value ) {
    BinaryTreeNode current = root; // root from the class
    BinaryTreeNode previous = null; // node visited just before current

    // Walk a path from the root to where the node should be
    while ((current != null) && (current.value != value)) {
        previous = current;
        if (value < current.value) {
            current = current.left;
        } else {
            current = current.right;
        }
    }
    if (current == null) {
        // Went off the end of the tree, so not found
        return false;
    } else {
        return true;
    }
}
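
One place where tracking the previous node pays off is insertion: when the walk falls off the tree, the previous node is the parent of the new node. The sketch below assumes int values and no duplicates; IntSearchTree and TreeNode are hypothetical names, not classes from the slides.

```java
// Hypothetical sketch: unbalanced binary search tree of distinct int values.
class IntSearchTree {
    private static class TreeNode {
        int value;
        TreeNode left, right;
        TreeNode(int value) { this.value = value; }
    }

    private TreeNode root;

    // Adds the value; returns false if it was already in the tree
    boolean add(int value) {
        TreeNode current = root;
        TreeNode previous = null;
        // Same walk as find(), remembering the last node visited
        while (current != null && current.value != value) {
            previous = current;
            current = (value < current.value) ? current.left : current.right;
        }
        if (current != null) return false;            // duplicate value
        TreeNode node = new TreeNode(value);
        if (previous == null) root = node;            // tree was empty
        else if (value < previous.value) previous.left = node;
        else previous.right = node;
        return true;
    }

    boolean find(int value) {
        TreeNode current = root;
        while (current != null && current.value != value) {
            current = (value < current.value) ? current.left : current.right;
        }
        return current != null;
    }
}
```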
Binary search tree variants

Balanced binary tree
Always keep the height of the tree at its minimum
Heuristic balancing
Maintain properties in the tree that keep the tree mostly balanced
- Red-black trees
- Weight balanced trees
- Height balanced trees
- AVL trees
Restructure the tree to optimize frequent searches – self-adjusting trees
- When you search for an element in the tree and find it, restructure the tree to
make this node quicker to find next time – move the node towards the root of
the tree
Both rely on rotation operations

101
Sample Binary Search Tree Rotation

[Figure: a binary search tree over the nodes n, k, d, a, and m with subtrees b and c, shown before and after a rotation]

The tree levels of k, n, a, and d change


Sample Binary Search Tree Rotation
[Figure: a binary search tree over the elements k, m, n, a, b, c, and d, shown before and after a rotation]

The tree levels of m, n, b, and c change


Binary search tree rotations

Constantly doing binary tree rotations can be
Costly
More error prone, just for the complexity of the rotations
Limiting in multi-threaded trees

Rotations can lead to
A balanced tree, which makes searches faster
Balance that is robust to the order in which data is added to the tree
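
As a rough illustration of what a single rotation does, the sketch below performs a right rotation: a node n and its left child k swap levels while the sorted order is preserved. The names RotationDemo, rotateRight, and inOrder are hypothetical, and the caller is assumed to re-attach the returned subtree root to n's former parent.

```java
// Hypothetical sketch of one right rotation in a binary search tree.
class RotationDemo {
    static class Node {
        int value;
        Node left, right;
        Node(int value) { this.value = value; }
    }

    // The left child k moves up one level and n becomes k's right child.
    // Returns the new root of the subtree.
    static Node rotateRight(Node n) {
        Node k = n.left;
        n.left = k.right;   // k's right subtree holds values between k and n
        k.right = n;
        return k;
    }

    // In-order listing, used to confirm that the sorted order is unchanged
    static String inOrder(Node n) {
        if (n == null) return "";
        return inOrder(n.left) + n.value + " " + inOrder(n.right);
    }
}
```

Only two references change, so one rotation takes constant time; the cost mentioned above comes from doing many of them and from deciding when they are needed.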

104
Min-Heap

A heap is a data structure that allows you to remove the smallest element efficiently.
Not intended for you to search for elements
Not intended for you to remove arbitrary elements (although you can get the code to do it)

There is a variant called a Max-Heap to let you remove the largest element efficiently.

105
Min-Heap

Most often described / modeled as a binary tree:


The parent is smaller than both children
The binary tree remains balanced
The children are not necessarily stored in any particular order
[Figure: min-heap drawn as a binary tree]
        27
      /    \
    85      32
   /  \    /
 86    91 100

106
Min-Heap

The binary tree, being a complete binary tree, is often implemented through an array:
Store the root at array index 1
The children of the node at index x are found at indices 2x and 2x+1
The array looks like you have stored the levels of the binary tree one after the other

[Figure: the heap with levels 27 / 85 32 / 86 91 100 stored in an array]
Index: 0        1   2   3   4   5   6
Value: (unused) 27  85  32  86  91  100
107
Min-Heap

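To add an item:
Store it in the next spot in the bottom level
Continually swap it with its parent while it is smaller than the parent

[Figure: adding 18 to the heap, shown level by level]
Start:                27 / 85 32 / 86 91 100
Store 18 at the end:  27 / 85 32 / 86 91 100 18
Swap 18 with 32:      27 / 85 18 / 86 91 100 32
Swap 18 with 27:      18 / 85 27 / 86 91 100 32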
108
Min-Heap

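To remove an item:
Remove the top-most item.
Move the last item in the lowest level to the top
Continually compare this moved item to its two children and swap it with its smaller child while it is larger than that child

[Figure: removing 27 from the heap, shown level by level]
Start:                27 / 85 32 / 86 91 100
Move 100 to the top:  100 / 85 32 / 86 91
Swap 100 with 32:     32 / 85 100 / 86 91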

109
Min-Heap

A min-heap is a common implementation of a priority queue.
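
The add and remove steps can be sketched with the array layout described earlier (root at index 1, children of index x at 2x and 2x+1). This is an illustrative version only: IntMinHeap is a hypothetical name, and a java.util.ArrayList stands in for a fixed array so that the heap can grow.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: array-backed min-heap of int values.
class IntMinHeap {
    private final List<Integer> items = new ArrayList<>();

    IntMinHeap() {
        items.add(null);   // index 0 stays unused so the root sits at index 1
    }

    int size() { return items.size() - 1; }

    // Store at the next spot in the bottom level, then swap upward
    void add(int value) {
        items.add(value);
        int i = size();
        while (i > 1 && items.get(i) < items.get(i / 2)) {
            swap(i, i / 2);
            i = i / 2;
        }
    }

    // Remove the root, move the last item to the top, then swap downward
    int removeMin() {
        if (size() == 0) throw new IllegalStateException("heap is empty");
        int min = items.get(1);
        int last = items.remove(size());          // remove by index
        if (size() > 0) {
            items.set(1, last);
            int i = 1;
            while (2 * i <= size()) {
                int child = 2 * i;                // pick the smaller child
                if (child + 1 <= size() && items.get(child + 1) < items.get(child)) child++;
                if (items.get(i) <= items.get(child)) break;
                swap(i, child);
                i = child;
            }
        }
        return min;
    }

    private void swap(int a, int b) {
        Integer tmp = items.get(a);
        items.set(a, items.get(b));
        items.set(b, tmp);
    }
}
```

For everyday use, java.util.PriorityQueue already provides a heap-backed priority queue.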

110
Graph

An ADT for relations between elements.


Defined by a set of vertices V and a set of edges E
where E captures the relations (subset of V x V)
Operators include add/remove vertex or edge, access
adjacent vertex, test if an edge exists, and traverse

Example:
Graph of people who know one another
- V = set of people, E = edges between people who know each other

111
Graph

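Graph of people who know each other

[Figure: graph with the vertices Jill, Jack, Meg, Rob, and Todd, with an edge between each pair of people who know each other]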

112
Sample uses of graphs

Finite state machine


Transition model representation
Process flow
Computer network topology
Neural network
Language structure description

113
Graph representation

What data structures let you store a graph?


Adjacency matrix
- Have a |V| x |V| matrix, with each element of V represented in a row and a
column
- Put a value of 1 in the matrix where the element in the row and the column
are related
Incidence matrix
- Have a |V| x |E| matrix, with each element of V represented in a row and each
edge represented by a column
- Put a value of 1 in the matrix where the row is an endpoint of the edge that is
given by the column
Adjacency list
- For each vertex, store the list of vertices to which it is related
Why choose one over another?
114
Graph

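Graph of people who know each other

[Figure: graph with the vertices Jill, Jack, Meg, Rob, and Todd, with an edge between each pair of people who know each other]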

115
Adjacency Matrix

Jack Jill Meg Todd Rob


Jack 0 1 0 0 1
Jill 1 0 1 0 1
Meg 0 1 0 1 0
Todd 0 0 1 0 1
Rob 1 1 0 1 0

116
Incidence Matrix

E1 E2 E3 E4 E5 E6
Jack 1 1
Jill 1 1 1
Meg 1 1
Todd 1 1
Rob 1 1 1

117
Adjacency List

Vertex Neighbours
Jack {Jill, Rob}
Jill {Jack, Meg, Rob}
Meg {Jill, Todd}
Todd {Meg, Rob}
Rob {Jack, Jill, Todd}
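
A minimal sketch of this adjacency list in Java, assuming an undirected, unweighted graph with String vertex names. FriendGraph and its methods are hypothetical names; the sample() method loads the six edges implied by the neighbour table above.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical sketch: undirected graph stored as an adjacency list.
class FriendGraph {
    private final Map<String, Set<String>> neighbours = new LinkedHashMap<>();

    void addVertex(String v) {
        neighbours.putIfAbsent(v, new TreeSet<>());
    }

    // Undirected edge: record it in both vertices' lists
    void addEdge(String a, String b) {
        addVertex(a);
        addVertex(b);
        neighbours.get(a).add(b);
        neighbours.get(b).add(a);
    }

    boolean hasEdge(String a, String b) {
        return neighbours.containsKey(a) && neighbours.get(a).contains(b);
    }

    Set<String> adjacent(String v) {
        return neighbours.getOrDefault(v, new TreeSet<>());
    }

    // The graph of people who know each other
    static FriendGraph sample() {
        FriendGraph g = new FriendGraph();
        g.addEdge("Jack", "Jill");
        g.addEdge("Jack", "Rob");
        g.addEdge("Jill", "Meg");
        g.addEdge("Jill", "Rob");
        g.addEdge("Meg", "Todd");
        g.addEdge("Todd", "Rob");
        return g;
    }
}
```

Compared with the adjacency matrix, this stores only the edges that exist, at the price of a set lookup rather than an array access when testing for an edge.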

118
Types of Graphs

Edge influence
Undirected: you can traverse an edge in any direction
Directed: there is just one way by which you can follow an
edge
Connectivity
Connected: you can get from one vertex to any other vertex
Unconnected: there are some vertices that can’t reach one
another

119
Types of Graphs

Edge multiplicity
Simple: there are no edges back to the same vertex (a loop)
and at most one edge between pairs of vertices
Multi-edge: you can have loops and several edges between
the same pair of vertices.
Edge weight
Unweighted: no values associated with edges
Weighted: edges can have a “weight”, either as a cost to
traverse, a number of times that you can use it, a capacity for
the edge, …

120
Common graph problems

Traversals (depth-first or breadth-first)


Shortest path
Graph cycle detection
Minimum spanning tree
Connectivity
Network flow
Topological sorting

121
Text converter problem

Convert a character annotation of text into html


_ for italics; becomes <i>xxx</i>
* for bold; becomes <b>xxx</b>
- for a list item; becomes <li>xxx</li> within a <ul>…</ul> pair
A blank line represents a new paragraph
Remember in HTML that
tags must be properly nested
all text lies inside paragraph tags <p>
More tags could be added later

122
Text converter problem -- sample

The quick *brown* fox _jumps_ over the _*lazy*_ dog.

For another day or two.

Becomes (formatting aside)

<html>
<body>
<p>The quick <b>brown</b> fox <i>jumps</i> over the
<i><b>lazy</b></i> dog.</p>
<p>For another day or two.</p>
</body>
</html>
Spreadsheet problem

A spreadsheet is a 2-dimensional grid of cells. Each cell is
either blank, contains a number, contains a string, or contains a
formula. A formula can include references to other cells of the
spreadsheet. When a cell A contains a reference to another cell
B, we must ensure that cell B’s value is calculated before trying
to calculate the value of cell A.

Describe an efficient data structure to store the cells of a
spreadsheet. The data structure will primarily be used to
recalculate the value of each affected cell (and all other cells
that rely on its value) when one cell value changes.

124
Navigation simulation

The city of Halifax wants to assess the flow of traffic in the city.
They will provide a map of the city, the maximum speed of cars
on each street, and whether intersections are stop signs, traffic
lights, or roundabouts.

Given a rate at which cars enter the city from each road, and
having each car choose a destination on the peninsula
(weighted more to downtown destinations), we will simulate how
the traffic flows through the city and which roads or
intersections have the greatest or smallest delays.

125
Features of data structures

Linked list
Simple to implement
Lots of flexibility in deciding the order of items
- The burden is on you to maintain that order
Search time not too critical
Binary search tree
Data has some order to it
Want efficient searches
Ok with one of
- Complex code to keep the tree balanced
- Approximations to a balanced tree are ok
- You’re expecting the order of added data will keep the tree semi-balanced
Heap
The data is ordered and you only ever want the biggest/smallest item
Graph
You are representing items _and_ connections or relations between items
Data structure performance comparison

Image from https://medium.com/omarelgabrys-blog/data-structures-a-quick-comparison-6689d725b3b0


How do you choose a data structure?

Look at the operations that are needed


Look at the set of constraints or assumptions allowed
Look at the context of use
Are there any characteristics to the data to be stored?
Which operations are more important?
How much data will you store?
How do you define “best”?
Look at the short-term and long-term effort needed to create it
and to maintain it
Look at what else is available in the libraries, program, or
experience set

128
Definitions of “best”

Incomplete list:
Fastest / most efficient
Least memory
Ease of understanding
Ease of maintaining
Ease of implementing
Scalability
Parallelizability
Portability
Serializability
Flexibility

129
Java classes

Concrete class
Abstract class
Interface

When do you use each one?

130
Concrete class

Describes the complete implementation of a class


Essentially describes a data structure or organization of data
Can be instantiated
Can implement multiple interfaces

131
Concrete class syntax

public class Point {
    private float x, y;

    // Add the coordinates of p to this point
    void add (Point p) {
        x += p.x;
        y += p.y;
    }

    void print ( ) {
        System.out.println( "Point at " + x + ", " + y);
    }
}

132
The issue with concrete classes

They connect the “how” of the implementation with the “what” of the class
Don’t allow you to code other parts of the program with the idea of an abstract data type

133
Interface

Describes the method interfaces and constants of the class
Essentially describes an abstract data type
- Could still be type-specific
Cannot include instance variables
Cannot be instantiated
Can only extend other interfaces

134
Interface syntax

public interface int_queue {
    public void add( int value );
    public int remove ( );
}

Cannot say: int_queue varName = new int_queue();

Can say: int_queue varName;
         varName = new int_queue_implementation_class();
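
To make the last line concrete, here is one possible implementing class. ArrayIntQueue is a hypothetical name chosen for this sketch, and it delegates storage to java.util.ArrayDeque rather than managing its own array. The interface is repeated so the example stands alone.

```java
import java.util.ArrayDeque;

// The interface from the slide, repeated so the sketch is self-contained
interface int_queue {
    void add(int value);
    int remove();
}

// Hypothetical implementing class: first-in first-out queue of int values
class ArrayIntQueue implements int_queue {
    private final ArrayDeque<Integer> items = new ArrayDeque<>();

    @Override
    public void add(int value) {
        items.addLast(value);
    }

    @Override
    public int remove() {
        // removeFirst() throws NoSuchElementException on an empty queue
        return items.removeFirst();
    }
}
```

Code written against int_queue keeps working if ArrayIntQueue is later swapped for a different implementation.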

135
File compressor -- interfaces to the files

136
File compressor – implementation of a reader
class

137
File compressor with an interface

138
Abstract Class

Describes methods, method interfaces, constants, and variables of the class
More concrete than an abstract data type
Still leaves some implementation decisions to be decided
Cannot be instantiated
Can extend concrete classes and abstract classes, and can implement interfaces

139
Abstract class syntax
public abstract class tournament {
    private String tournament_name;

    public abstract void add_team ( String team_name);
    public abstract int number_of_teams ( );

    public void set_tournament_name ( String name ) {
        tournament_name = name;
    }

    public void print_signup_sheet ( ) {
        int i;
        System.out.println( "Tournament sign-up: " + tournament_name);
        for (i = 0; i < number_of_teams( ); i++) {
            System.out.println( i + ". _____________");
        }
    }
}
Example

Stack implemented using an array
    Concrete class
Stack definition of push(), pop(), and size() methods
    Interface
Stack definition of push(), pop(), size(), and print() methods with an implementation of print() that just uses push(), pop(), and size()
    Abstract class
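
The three stack variants could look like the sketch below. All names (IntStack, PrintableIntStack, ArrayIntStack, contents) are hypothetical. The helper contents() is an addition for this sketch so that the text print() writes can be inspected.

```java
import java.util.Arrays;

// Interface: only the method definitions
interface IntStack {
    void push(int value);
    int pop();
    int size();
}

// Abstract class: adds print(), written purely in terms of push(), pop(), size()
abstract class PrintableIntStack implements IntStack {
    public String contents() {
        int n = size();
        int[] saved = new int[n];
        for (int i = 0; i < n; i++) saved[i] = pop();      // top of the stack first
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < n; i++) sb.append(saved[i]).append(' ');
        for (int i = n - 1; i >= 0; i--) push(saved[i]);   // restore original order
        return sb.toString().trim();
    }

    public void print() {
        System.out.println(contents());
    }
}

// Concrete class: decides that the storage is an array
class ArrayIntStack extends PrintableIntStack {
    private int[] data = new int[4];
    private int count = 0;

    @Override
    public void push(int value) {
        if (count == data.length) data = Arrays.copyOf(data, 2 * count);
        data[count++] = value;
    }

    @Override
    public int pop() {
        if (count == 0) throw new IllegalStateException("stack is empty");
        return data[--count];
    }

    @Override
    public int size() {
        return count;
    }
}
```

A linked-list stack could extend PrintableIntStack in the same way and inherit print() unchanged.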

141
When do you use each class type?

142
Java Collection Framework

Image from https://en.wikipedia.org/wiki/Java_collections_framework, accessed September 18, 2018


143
Text converter problem

Convert a character annotation of text into html


_ for italics; becomes <i>xxx</i>
* for bold; becomes <b>xxx</b>
- for a list item; becomes <li>xxx</li> within a <ul>…</ul> pair
A blank line represents a new paragraph
Remember in HTML that
tags must be properly nested
all text lies inside paragraph tags <p>
More tags could be added later

144
Text converter problem -- sample

The quick *brown* fox _jumps_ over the _*lazy*_ dog.

For another day or two.

Becomes (formatting aside)

<html>
<body>
<p>The quick <b>brown</b> fox <i>jumps</i> over the
<i><b>lazy</b></i> dog.</p>
<p>For another day or two.</p>
</body>
</html>
Navigation simulation

The city of Halifax wants to assess the flow of traffic in the city.
They will provide a map of the city, the maximum speed of cars
on each street, and whether intersections are stop signs, traffic
lights, or roundabouts.

Given a rate at which cars enter the city from each road, and
having each car choose a destination on the peninsula
(weighted more to downtown destinations), we will simulate how
the traffic flows through the city and which roads or
intersections have the greatest or smallest delays.

146
Navigation simulation

In:
Map of Halifax
Street info like speed limit
Types of intersections
For entry streets to the city, the rate at which cars arrive and
a destination for each car

Out:
Simulate traffic flow
Ultimately: average travel time and bottlenecks

147
Iterators

A Java interface to process each element of a Collection


Methods:
hasNext()
next()
remove() (not always present)
forEachRemaining( Consumer action ) (not always present)
Specific iterators may have additional methods
Eg. ListIterator has hasPrevious() and previous()
Order isn’t usually guaranteed
Though order is usually predictable

148
Iterators – typical use

ArrayList<String> collection = new ArrayList<>(); // filled elsewhere

Iterator<String> iterate = collection.iterator();

while (iterate.hasNext()) {
    String data = iterate.next();
    // do something with "data"
}

for ( Iterator<String> it2 = collection.iterator(); it2.hasNext(); ) {
    String data2 = it2.next();
    // do something with "data2"
}

for ( String data3 : collection ) { // like an implicit iterator
    // do something with "data3"
}
Why use iterators?

Isolates your code from the implementation of an ADT


Uses a common practice that others will recognize

Limitations
Don’t want your data structure changing under you at the
same time
Generally uni-directional
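
The first limitation shows up when you delete elements while iterating. A hedged sketch of the usual workaround follows: make the change through the iterator's own remove() method, for iterators that support it. IteratorRemovalDemo and dropEmpty are hypothetical names.

```java
import java.util.Iterator;
import java.util.List;

// Hypothetical sketch: removing elements safely during iteration.
class IteratorRemovalDemo {
    // Removes every empty string using the iterator's own remove()
    static void dropEmpty(List<String> words) {
        Iterator<String> it = words.iterator();
        while (it.hasNext()) {
            if (it.next().isEmpty()) {
                it.remove();   // removes the element just returned by next()
            }
        }
    }
}
```

Calling words.remove(...) directly inside a for-each loop over an ArrayList will typically fail with a ConcurrentModificationException, because the list changed underneath its iterator.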

150
Anatomy of a process
memory

Region              Contents
Code                Your compiled program
Global variables    Global variables, static variables
Call stack          Local variables, history of method calls
Heap                Objects created with “new”

151
The call stack

A stack is a last-in first-out (LIFO) data structure.


Calling and returning from functions acts in a LIFO
manner
We can use a stack to track which function we are currently executing
All of the data relating to the execution of a function is
grouped into a stack frame
- The data is a picture of the state of the function
- The stack frame holds all the parts of the picture together

152
The call stack

When we call a function, we create a stack frame for it and add the frame to the call stack
The stack frame holds
The instruction to execute in the calling function when the function ends
Hardware information (register values) that the current function will
change and that needs to be restored for the calling function
The parameters for the function
Space for local variables for the function
When the function ends, the stack frame for the function is
removed from the call stack, so all the information disappears

153
The call stack
[Figure: three snapshots of the call stack as main() calls foo( a, b ) and foo( int i, int j ) calls bar( j ). X marks the instruction in main that follows the call to foo; Y marks the instruction in foo that follows the call to bar.]

Stack while main runs: main local variables
Stack while foo runs (top first): foo local variables, register values, value of a (i in foo), value of b (j in foo), return point X, main local variables
Stack while bar runs (top first): bar local variables, register values, value of j, return point Y, then the frames for foo and main as above
The call stack
[Figure: the call stack before and after bar( int n ) executes n = 42. The slot in bar's stack frame that held the value of j now holds 42; the frames for foo and main are unchanged.]

When we set a value in bar, we change the value in the stack frame of bar. When foo runs, its copy of the value isn’t changed!
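
The same effect can be observed in Java, where primitive parameters are copied into the callee's stack frame. The sketch reuses the function names foo and bar from the figure; the class name PassByValueDemo is hypothetical.

```java
// Hypothetical sketch: a parameter is a copy that lives in the callee's stack frame.
class PassByValueDemo {
    static void bar(int n) {
        n = 42;          // changes only the copy in bar's stack frame
    }

    static int foo(int i, int j) {
        bar(j);
        return j;        // j in foo's frame is untouched by bar
    }
}
```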
The call stack
[Figure: the call stack as bar( int n ) returns to foo at the return point Y. The frames for foo and main are unchanged; the memory above them still holds bar's old frame, including the value 42.]

When we return from bar, we restore the previous information from the stack.

All the information from bar is still lying around in memory, though.
The call stack
[Figure: the call stack as foo calls frotz( i, j ) with return point Z. The frame for frotz (local variables, register values, value of i, value of j, return point Z) is placed on top of foo's frame, in the memory that bar's frame occupied.]

The stack frame for frotz overlaps the space previously used by bar.

All the old values for bar are still in memory, so that’s what frotz sees as data unless you initialize your variables.
The call stack – recursive call
[Figure: the call stack as foo( int i, int j ), running with i = 10 and j = 42, makes the recursive call foo( i-1, j-1 ) with return point Z. A second foo frame holding i = 9 and j = 41 is pushed on top of the first foo frame, which still holds 10 and 42 and the return point X into main.]
The call stack – recursive call ends
[Figure: the call stack when the recursive call to foo returns to the return point Z. The frame holding 9 and 41 is removed; in the first foo frame, i = 10 and j = 42 remain from before.]
Problem solving

How would you approach getting a program to solve a maze?

160
Problem solving

Wander randomly and remember your path
Can wander over the same space
Clean up the path at the end
Use a rule to explore deterministically
A rule must exist that will succeed
Send people off in parallel to each take their own path and report back
Needs coordination among all of the people
Backtrack when you meet a dead end
Must remember past decisions so that you can change them when needed

161
Problem solving - recursion

Applicable when you have a problem to solve that can be decomposed into smaller problems of the same type
Needs a simplest case where it can stop
May need to remember some past data
May need to recombine some of the work from the smaller
problems after they are solved.

Programming parallel to mathematical induction

Examples?
162
Recursion

Appears in programs as a method that calls itself with a smaller instance of data

Canonical example: Fibonacci numbers

int fib (int n) {
    if (n <= 0) {
        return 0;
    } else if (n <= 2) {
        return 1;
    } else {
        return fib(n-1) + fib(n-2);
    }
}
Recursion

The calls “remember” the past data in the call stack

Recursive algorithms can also be written iteratively


With iteration:
- You are responsible for “remembering” all of the past data
- You are responsible for knowing where to keep going if you
backtrack

164
Common uses of recursion

Binary tree traversals


Binary search
Divide and conquer style of problem solving
Mergesort, quicksort
Backtracking
Lets the stack remember past decisions for you and “undo” changes to
local variables when you reach a dead-end
State space exploration
Easy to launch search in multiple directions
Want to remember the places where you have been so you don’t visit
them again

165
Binary tree traversal

Traversal means navigating through all nodes in the tree.

3 types:
- Pre-order traversal
- In-order traversal
- Post-order traversal

Differ in whether you process a node before its children, between its two children, or after its children

[Figure: binary tree with root d; b and e below d; a and c below b; g below e]

166
Binary tree pre-order traversal

Process a node before its children.

Pre-order( Node n ) {
    if (n != null) {
        process n
        Pre-order (n.left);
        Pre-order (n.right);
    }
}

Order on this tree:
dbacegf

[Figure: binary tree with root d; b and e below d; a and c below b; g below e; f below g]

167
Binary tree in-order traversal

Process a node between its children.

In-order( Node n ) {
    if (n != null) {
        In-order (n.left);
        process n
        In-order (n.right);
    }
}

Order on this tree:
abcdefg

[Figure: binary tree with root d; b and e below d; a and c below b; g below e; f below g]

168
Binary tree post-order traversal

Process a node after its children.

Post-order( Node n ) {
    if (n != null) {
        Post-order (n.left);
        Post-order (n.right);
        process n
    }
}

Order on this tree:
acbfged

[Figure: binary tree with root d; b and e below d; a and c below b; g below e; f below g]

169
Binary tree breadth-first traversal

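Process a node before its children.

BFT( Node n, Queue q ) {
    if (n != null) {
        process n
        if (n.left != null) q.add (n.left);
        if (n.right != null) q.add (n.right);
        BFT( q.isEmpty() ? null : q.remove(), q );
    }
}

Order on this tree:
dbeacgf

Example of tail recursion

[Figure: binary tree with root d; b and e below d; a and c below b; g below e; f below g]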


Binary tree breadth-first traversal
(non-recursive version)

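Process a node before its children.

BFT( Node n ) {
    Queue q = new empty queue;
    while (n != null) {
        process n
        if (n.left != null) q.add (n.left);
        if (n.right != null) q.add (n.right);
        n = q.isEmpty() ? null : q.remove();
    }
}

[Figure: binary tree with root d; b and e below d; a and c below b; g below e]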
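
The four traversals can be checked against the orders quoted on the slides with a small runnable sketch. TraversalDemo and its methods are hypothetical names. The exact shape of the sample tree (e has only a right child g, and f is the left child of g) is an assumption inferred from the traversal orders, not something the slides state.

```java
import java.util.ArrayDeque;
import java.util.Queue;

// Hypothetical sketch: the four traversals on the sample tree.
class TraversalDemo {
    static class Node {
        char label;
        Node left, right;
        Node(char label, Node left, Node right) {
            this.label = label;
            this.left = left;
            this.right = right;
        }
    }

    static void preOrder(Node n, StringBuilder out) {
        if (n == null) return;
        out.append(n.label);
        preOrder(n.left, out);
        preOrder(n.right, out);
    }

    static void inOrder(Node n, StringBuilder out) {
        if (n == null) return;
        inOrder(n.left, out);
        out.append(n.label);
        inOrder(n.right, out);
    }

    static void postOrder(Node n, StringBuilder out) {
        if (n == null) return;
        postOrder(n.left, out);
        postOrder(n.right, out);
        out.append(n.label);
    }

    // Breadth-first: a queue replaces the call stack; only real nodes are queued
    static String breadthFirst(Node root) {
        StringBuilder out = new StringBuilder();
        Queue<Node> q = new ArrayDeque<>();
        if (root != null) q.add(root);
        while (!q.isEmpty()) {
            Node n = q.remove();
            out.append(n.label);
            if (n.left != null) q.add(n.left);
            if (n.right != null) q.add(n.right);
        }
        return out.toString();
    }

    // Assumed shape: d(b(a, c), e(-, g(f, -)))
    static Node sampleTree() {
        Node f = new Node('f', null, null);
        Node g = new Node('g', f, null);
        Node e = new Node('e', null, g);
        Node b = new Node('b', new Node('a', null, null), new Node('c', null, null));
        return new Node('d', b, e);
    }
}
```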

171
Recursion

Pro
Code can look simpler
Fewer lines of code
The call stack manages data to remember
Naturally fits some problems

Con
Can consume lots of stack space
Typically less time-efficient than iterative solutions
Can inadvertently solve the same sub-problem multiple times
- Consider memoization
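
As a sketch of memoization, the Fibonacci method can cache each answer the first time it is computed so that no sub-problem is solved twice. MemoFib is a hypothetical name, and the return type is long here so that larger inputs fit.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: recursive Fibonacci with a cache of solved sub-problems.
class MemoFib {
    private final Map<Integer, Long> cache = new HashMap<>();

    long fib(int n) {
        if (n <= 0) return 0;
        if (n <= 2) return 1;
        Long known = cache.get(n);
        if (known != null) return known;     // already solved: reuse the answer
        long result = fib(n - 1) + fib(n - 2);
        cache.put(n, result);
        return result;
    }
}
```

With the cache, each value of n is computed once, so the number of calls grows linearly with n rather than exponentially.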
172
Keys to recursion

Have all of the stopping cases


Ensure that each recursive call does provide a smaller
problem instance

Practice

173
Sudoku

174
Defensive Programming

175
Defensive Programming

It’s about a programming style that buffers your implementation from errors in how other parts of the program may use your code or methods.

176
Defensive Programming for Robustness

Robustness: Ensure that your program as a whole continues to run no matter what bad information comes its way

Correctness: Ensure that your program never returns an inaccurate result

The two concepts are different!

177
Defensive Programming

Defensive programming comes at a cost


Run time cycles to check for odd cases
Memory if adding check information to data structures
Maintenance of defensive programming code
Potential for errors in the defense code

Find the degree of defensive programming that


matches your context

178
How can others influence your code?

User input
Parameter values
Resource permissions
Environment variables
Data read in
Files
Database
Network

179
Input Validation

Decide on a consistent model for how to handle bad input data
Pretend the method succeeded in a “vacuous” manner?
Have the method fail automatically?
Throw an exception?
Return an error code?

180
Input Validation – Check for unexpected data

Objects (eg. String, Integer, ArrayList, …)


Watch for null objects
Watch for objects with no data in them
Formatted data (eg. a date from a user in yyyy-mm-dd
format)
Double-check the format of the data coming in

181
Input Validation – Check for unexpected data

Data ranges or enumerated answers (eg. user response of “yes” or “no”; day number in a month)
If you’re expecting data to be in a range, check for that range

Special characters
Scan strings for any characters that might have a special meaning to other libraries where you plan to pass the data
- Eg. & character if you’re sending out HTML
- ; character in an SQL statement
- “ character in a string
182
Input Validation – Check for unexpected data

Test the length of input data, if it has a potential of making a difference to your code
Strings and buffers are notorious here.

Check that tables, arrays, or more complex data structures contain meaningful data on which to operate

183
Input Validation

Generally a pile of “if” statements in your method where input data comes in
Acts as preconditions to continue with the method

Often exploit short-circuit evaluation
In a big conjunction for an “if” statement, the conditions are evaluated left-to-right and stop as soon as one is false
- Consequence: when you reach a condition, you can assume that all the ones to the left of it in the expression are true
Sample use:
- if ( ( node != null ) && !node.word.equals( "" ) )
- The “node.word.equals” would crash if node were null, but that case is cleared with the earlier part of the expression
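
The sample condition can be tried out with a small sketch. WordNode and hasWord are hypothetical names, and an extra check for a null word is added as a further precondition.

```java
// Hypothetical sketch: guarding a dereference with left-to-right evaluation of &&.
class WordNode {
    String word;

    WordNode(String word) {
        this.word = word;
    }

    // Each condition may rely on every condition to its left being true
    static boolean hasWord(WordNode node) {
        return (node != null) && (node.word != null) && !node.word.equals("");
    }
}
```

Swapping the order of the conditions would make hasWord(null) fail with a NullPointerException.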

184
Return Codes

185
Return Codes

Have functions return information about how the


computation ended
Successfully
A category of error

Come in addition to returned information

186
Return Codes

Many return codes build structure or meaning into the codes
Eg. HTTP return codes
- 100-199 – informational response
- 200-299 – successful operation
- 300-399 – redirection response
- 400-499 – client-side error
- 500-599 – server-side error
Individual numbers give more information about the nature of an error.

187
Common Structure in C

Common for the return code to be the return value of the function while the function’s actual data returns as a pass-by-reference parameter

Eg. int myFunction ( int inParameter, char *outParameter );

Caller then does

if (myFunction( in, &out ) != OK) {   /* OK: constant to be defined elsewhere as the success return code */
    /* Do error handling */
} else {
    /* Continue with good case code */
}

188
Return Codes

Advantages
Portable concept across many languages
Easily recognized
Can structure the codes

Disadvantages
Error-handling merged with regular control flow
Need to coordinate the meaning of the return codes
Relies on the calling function to check for and act on errors

189
Exceptions

190
Exceptions

Report a not-uncommon problem to your calling method

Use exceptions for error situations that you anticipate and whose origin may be out of your scope
Eg. bad input from a user, a path to a file that doesn’t exist

191
Exceptions

Not all languages include exceptions
An updated way to report an error condition to a calling method
Generally an upgrade to return codes

Don’t use exceptions to just “pass the buck” to someone else to handle an error

192
Exceptions

Exceptions are objects like any other in the system


They store information
They belong to a hierarchy and can inherit data and methods from their superclass
You need to create one to send it back

193
Exception Hierarchy

Image from https://docstore.mik.ua/orelly/java/langref/ch09_04.htm


2 Parts to Exceptions in Java

Sending an exception out of one method
Declare that the method might send out an exception
Create an object of the exception type
Send out the exception object with the “throw” keyword
Receiving an exception in a calling method
Be prepared to receive an exception by placing the called code in a “try” block
List the exceptions that you will handle along with the code to handle them in “catch” blocks
Provide clean-up code in a “finally” block
Throwing an Exception in Java

public void myMethod ( ) throws IOException {

    if ( exceptional file case detected ) {
        throw new IOException( "Exception message" );
    }

}

196
Catching an Exception in Java

public void someMethod( ) {

    try {
        myMethod( );
    } catch ( IOException e ) {
        // Do something with IOException and data in object "e"
    } catch (Exception f) {
        // Do something else for another exception type
    } finally {
        // Do code that runs no matter how we end
    }

}
Java File Handling Example
//Java program to demonstrate FileNotFoundException
import java.io.File;
import java.io.FileNotFoundException;
import java.io.FileReader;

class File_notFound_Demo {

    public static void main(String args[]) {
        File file = null;
        try {
            // Following file does not exist
            file = new File("E://file.txt");
            FileReader fr = new FileReader(file);
        } catch (FileNotFoundException e) {
            System.out.println("File does not exist");
        }
    }
}
Doesn’t close the file!!

Code modified from https://www.geeksforgeeks.org/types-of-exception-in-java-with-examples/


Java File Handling Example – Extra Care
//Java program to demonstrate FileNotFoundException
import java.io.File;
import java.io.FileNotFoundException;
import java.io.FileReader;
import java.io.IOException;

class File_notFound_Demo {

    public static void main(String args[]) {
        FileReader fr = null;
        try {
            // Following file does not exist
            File file = new File("E://file.txt");
            fr = new FileReader(file);
        } catch (FileNotFoundException e) {
            System.out.println("File does not exist");
        } finally {
            // Close the reader; a File object itself has no close() method
            if (fr != null) {
                try {
                    fr.close();
                } catch (IOException e) {
                    System.out.println("Could not close the file");
                }
            }
        }
    }
}

Code modified from https://www.geeksforgeeks.org/types-of-exception-in-java-with-examples/


Java File Handling Example – Try With Resource
Example
//Java program to demonstrate FileNotFoundException
import java.io.FileNotFoundException;
import java.io.FileReader;
import java.io.IOException;

class File_notFound_Demo {

    public static void main(String args[]) {
        // Following file does not exist
        try (FileReader fr = new FileReader("E://file.txt")) {
            // Read from fr here
        } catch (FileNotFoundException e) {
            System.out.println("File does not exist");
        } catch (IOException e) {
            // close() can throw an IOException
            System.out.println("Could not close the file");
        }
    }
}

Will automatically invoke the close() method at the end

Code modified from https://www.geeksforgeeks.org/types-of-exception-in-java-with-examples/


Multiple Catch Statements

The catch statements are checked in order
The first one to match gets the exception
Consequence: have the specific exceptions before the general exceptions
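
The ordering rule can be seen in a small sketch where the specific FileNotFoundException is listed before its superclass IOException. ExceptionOrderDemo, open, and attempt are hypothetical names, and the ".locked" suffix is an invented trigger for the general case. No real file is touched.

```java
import java.io.FileNotFoundException;
import java.io.IOException;

// Hypothetical sketch: catch blocks are tried top to bottom.
class ExceptionOrderDemo {
    // Pretends to open a file; the failure conditions are invented for the demo
    static void open(String path) throws IOException {
        if (path == null || path.isEmpty()) {
            throw new FileNotFoundException("no path given");
        }
        if (path.endsWith(".locked")) {
            throw new IOException("cannot read " + path);
        }
    }

    static String attempt(String path) {
        StringBuilder log = new StringBuilder();
        try {
            open(path);
            log.append("opened;");
        } catch (FileNotFoundException e) {   // specific type first
            log.append("missing;");
        } catch (IOException e) {             // more general type second
            log.append("io;");
        } finally {
            log.append("done");               // runs however the try block ends
        }
        return log.toString();
    }
}
```

With the two catch blocks swapped, the Java compiler rejects the code because the FileNotFoundException block could never be reached.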

201
Exception Blocks Good Practices

Do not leave a catch block empty
Basically ignores that an error has happened, which doesn’t fix the problem
Include enough information in the exception to understand the error
You can create your own exceptions if existing ones don’t have enough information for you
Know which exceptions are thrown to your code
Standardize your project’s use of exceptions
Catch specific exceptions when you can
Can include a more general catch-all exception after the specific ones

202
Sizing Exceptions

How big should your try block be?


Only as much code as may fail in a consistent operation

How detailed should your catch parameter be?


Be as specific as you can reasonably be

203
How much is too much?

Some exceptions are very specific


Eg. Array index out of range

Does that mean you should have every array access within a try block in case you have a bad index?
No. Use a try block on code where there is some external
influence contributing to the error.
If your own logic is generating the error then find it in
debugging or use assertions.
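
A minimal sketch of the distinction, assuming a hypothetical IndexDemo class. The index in lookup comes from outside the program, so it gets a try block; the loop in sum is our own logic, so it gets an assertion instead.

```java
class IndexDemo {
    private static final int[] DATA = {10, 20, 30};

    // External influence: the index arrives as text from a user or a file
    static String lookup(String userInput) {
        try {
            int index = Integer.parseInt(userInput.trim());
            return "value " + DATA[index];
        } catch (NumberFormatException e) {
            return "not a number";
        } catch (ArrayIndexOutOfBoundsException e) {
            return "index out of range";
        }
    }

    // Internal logic: the loop bounds are under our control, so no try block.
    // The assertion documents the assumption and is checked when run with -ea.
    static int sum() {
        int total = 0;
        for (int i = 0; i < DATA.length; i++) {
            assert i >= 0 && i < DATA.length : "loop index escaped the array";
            total += DATA[i];
        }
        return total;
    }
}
```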

Programming paradigms
Procedural programming (C, Fortran, Cobol)
Generally focuses on the operations, steps, and transformations needed
to achieve an outcome
Object oriented programming (Java, C++, Python)
Focuses on the data, concepts, or elements around which computation is
happening
Functional programming (Lisp, ML, Haskell, OCaml)
Program flow modeled as a composition of function calls
Logic programming (Prolog)
Focus on the rules behind all the computation and let the running
environment look to combine rules as they apply to reach an answer.

Object oriented design

Design around the program’s data


Gather all of the methods or transformations on one
piece of data into one place
Call that grouping a “class”
Allow a hierarchical organization to abstractions of the
data
Extend classes and inheritance

Slides courtesy Dr. Alex Brodsky
Approach from Chapter 12 of Big Java: Late Objects by
Cay Horstmann
The Software Development Life Cycle

Requirements gathering
Design
Implementation
Quality Assurance
Deployment
Maintenance

[Diagram: the cycle Analyse, Design, Implement, Test, Deploy, Maintain]
Requirements Gathering
The first step of any software project is to figure out what it is that
you are building
- Functional requirements: What should it do?
- Nonfunctional requirements: Other
Requirements are gathered from:
- RFP: Request for Proposal
- Existing user base
- Prospective user base
- Management
- Etc.
A Software Requirements Specification (SRS) is a document that
specifies the software to be built: all functional and non-functional
requirements
Functional Requirements
Functional requirements describe
the functions of the proposed software
- What should it do?
- What resources can it use?
- What performance benchmarks should it meet?
- What other systems should it integrate with?
- What security standards should it meet?
Example: The assignment and practicum specifications:
- Input
- Processing
- Output
This does not typically specify the “how”, just the “what”
Nonfunctional Requirements

Nonfunctional requirements specify requirements not related
to software behaviour
- Language requirements
- Budgetary requirements
- Implementation deadlines
- Etc
Examples:
- You have 2 weeks to implement Assignment 6
- You have 90 minutes for Practicum 5
- You must use Java to implement your assignments and
practicums
Object Oriented Design

Once we have a requirements specification, what’s next?


Assumption: We will use Object Oriented Design
Hence, we will need to create classes and objects that
collaborate to meet the requirements
Hence, we will need to figure out
- What classes (and objects) do we need?
- What do these classes and objects do?
- How do these classes and objects collaborate?
Discovering Classes

Goals:
- Identify what classes / objects we can reuse
- Identify what new classes / objects we need
- Identify what methods the classes / objects need
- Identify how the classes / objects interact
The Noun-Verb Approach

Idea:
- Use nouns from the problem domain to identify classes
- Use verbs associated with the nouns to identify methods for the
classes
Example: The Bank Simulation:
- Nouns: Bank, Teller, Client, Person, Line-up (Queue)
- Verbs:
Bank: Open, Close
Person: Arrive, Leave
Teller: Serve Client
Client: Line-up, Get Served
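
A rough sketch of how the nouns and verbs above could map to Java classes. The method names and the choice to let Bank own the line-up are assumptions; Person is left out to keep the sketch short.

```java
import java.util.ArrayDeque;
import java.util.Queue;

// Noun: Client. Verb: get served.
class Client {
    private final String name;
    private boolean served = false;

    Client(String name) { this.name = name; }
    String getName() { return name; }
    void getServed() { served = true; }
    boolean isServed() { return served; }
}

// Noun: Teller. Verb: serve client.
class Teller {
    void serve(Client c) { c.getServed(); }
}

// Noun: Bank. Verbs: open, close. The Line-up noun maps to an existing Queue.
class Bank {
    private final Queue<Client> lineUp = new ArrayDeque<>();
    private final Teller teller = new Teller();
    private boolean open = false;

    void open() { open = true; }
    void close() { open = false; }
    boolean isOpen() { return open; }

    // Client verb "line up", routed through the bank that owns the queue
    void lineUp(Client c) {
        if (open) {
            lineUp.add(c);
        }
    }

    // Serves the client at the front of the line; returns null if nobody waits
    Client serveNext() {
        Client next = lineUp.poll();
        if (next != null) {
            teller.serve(next);
        }
        return next;
    }
}
```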
Design is…

Iterative
You will make mistakes and will revisit the design
About tradeoffs and priorities
Review your software quality objectives
Nondeterministic
Managing complexity and restrictions
Heuristic

Design – Manage Complexity

Accidental Complexity
Complexity that we inherit from the processes, environment,
choices
Often disconnected from the base problem itself

Eg. iOS look-and-feel for an iPhone


specifics of how SQL works in a mySQL database
libraries (or lack thereof) in our programming language
constraints on managing resources

Design – Manage Complexity

Essential Complexity
Complexity that arises from the problem or the interlocking
set of concepts in the solution
Arises no matter where or how we deploy the solution

Eg. dividing information among tables in a database


connection between a user interface and a simulation model
balancing a binary tree for an efficient search algorithm
details of Dijkstra’s algorithm for finding shortest paths

Design

Minimize the quantity of essential complexity that you
need to remember at one time

Prevent accidental complexity from proliferating
throughout the solution

Good Design?

Loose coupling and high cohesion


SOLID properties
Code Complete list
• Minimal complexity
• Ease of maintenance
• Loose coupling
• Extensibility
• Reusability
• High fan-in
• Low-to-medium fan-out
• Portability
• Leanness
• Stratification
• Standard techniques

Levels of Design

From biggest to smallest


Software system
Subsystems or packages
Architectures often
Common subsystems help here
- Business rules (later in the course)

- User interface
- Database access
- System dependencies
Classes
Methods
Method algorithms

Class-level Design Steps
Identify the objects and their attributes
Determine what can be done to each object
Determine what each object is allowed to do to other objects
Determine parts of each object that will be visible to other objects
Define each object’s public interface

Techniques: Mind Map, Noun-Verb Approach, CRC Method
Principles: Single Responsibility, Open / Closed Principle, Liskov Substitution Principle, Interface Segregation, Dependency Inversion

DSU Online Elections

The DSU Executive will call an online election. They
will appoint an election officer who will then validate
and publish a slate of candidates for each DSU position
as well as zero or more plebiscite yes/no questions.
The election officer will obtain a list of valid students
from the Registrar’s Office as candidate voters. At the
election, the election officer will open voting for a
pre-determined period of time. Candidate voters will
authenticate themselves to the voting system and then
cast a vote for each DSU position and plebiscite. After
the close of the election, the election officer will
tabulate and publish the results.
Brainstorm Elements

DSU Online Elections – Nouns and Verbs?

The DSU Executive will call an online election. They
will appoint an election officer who will then validate
and publish a slate of candidates for each DSU position
as well as zero or more plebiscite yes/no questions.
The election officer will obtain a list of valid students
from the Registrar’s Office as candidate voters. At the
election, the election officer will open voting for a
pre-determined period of time. Candidate voters will
authenticate themselves to the voting system and then
cast a vote for each DSU position and plebiscite. After
the close of the election, the election officer will
tabulate and publish the results.
Do’s and Don’ts of the Noun-Verb Method
Nouns should be concrete or conceptual from the problem
domain:
- E.g.,
Good classes: Student, Course, Grade, Grade Summary
Bad classes: Grade Sorter, Grade Summarizer
Existing Classes may be named differently
- E.g., Line-up vs Queue
Avoid turning actions into classes
- E.g., SortGrades
Don’t overdo it. Decide when various data items in your
program can be represented using basic types
What’s Next?

After identifying classes we need to figure out


- What they do: Responsibilities
- How they interact: Collaborate
This is an iterative process
- We consider each class and ask the above two questions
- We then revisit the classes we have looked at earlier and refine
our answers
We use the CRC method
The CRC Method

CRC = Class, Responsibilities, Collaborators


Idea:
- Use an index card for each identified class
- Divide card into three sections:
Class name
Responsibilities : The verbs / methods that the class is
responsible for implementing
Collaborators : Other classes / objects that will be used to
implement the responsibility
- Iterate through all the verbs / methods and add them to the
responsibilities section of the respective class
- Identify the collaborating classes
- Ensure that collaborators provide the necessary methods in their
responsibilities
DSU Online Elections – CRC

Class: DSU President
Responsibilities: Appoint election officer
Collaborators: Election

Class: Voter
Responsibilities: Authenticate; Vote for position; Vote on plebiscite
Collaborators: Election, Position slate, Position vote, Plebiscite, Plebiscite vote

Class: Election officer
Responsibilities: Create slates; Validate slates; Publish slates; Create plebiscite; Load valid students; Open voting; Close voting; Tabulate results; Publish results
Collaborators: Position slate, Plebiscite, Election

Class: Election
Responsibilities: Set position slate; Set plebiscite; Open; Close; Authenticate voter; Record position vote; Record plebiscite vote; Tabulate results
Collaborators: Election officer, Voter, Position slate, Plebiscite
DSU Online Election

[Diagram: preliminary class map connecting DSU President, Election, Election Officer, Voter, Voter list, Position slate, Position vote, Plebiscite, Plebiscite vote, and Registrar’s Office]

Messy, but it’s a start … not the end design.


CRC Outcome

Obtain a preliminary map of classes and relations


between classes.

Apply refinements to these classes


SOLID properties
Abstractions
Other heuristics

Heuristics for Design

Form abstractions
Encapsulate implementation details
Use inheritance
Hide secrets
Identify areas likely to change
Anticipate different degrees of change
Keep coupling loose
Look for common design patterns

Form Abstractions

Keep to abstractions to focus on the big picture.


Include abstractions whenever possible to allow for
Portability
Delaying the point when you need to commit to
implementation details

See SOLID Dependency Inversion

DSU Online Election

[Diagram: the class map with the abstractions Question and Person added to Position slate, Position vote, Plebiscite, Plebiscite vote, DSU President, Election, Voter, Voter list, Election Officer, and Registrar’s Office]

Messy, but it’s a start … not the end design.


Encapsulate implementation details

Resist exposing implementation details


Provide a consistent level of access across all public
methods of a class
Provide a consistent level of abstraction across all
classes

Complements abstraction

Use inheritance

Seek commonalities across classes


Gather the commonalities into a base class
Encode the common code and attributes as the base class
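
A minimal sketch using names from the election example. That Voter and ElectionOfficer share a Person base class is an assumption for illustration, as are the method names.

```java
// Base class gathers what every person in the election has in common
class Person {
    private final String name;

    Person(String name) { this.name = name; }
    String getName() { return name; }
}

// Subclasses keep only what is specific to them
class Voter extends Person {
    private boolean hasVoted = false;

    Voter(String name) { super(name); }
    void markVoted() { hasVoted = true; }
    boolean hasVoted() { return hasVoted; }
}

class ElectionOfficer extends Person {
    ElectionOfficer(String name) { super(name); }
    String publishResults() { return getName() + " published the results"; }
}
```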

Hide secrets – information hiding

Do not let other packages or classes access the details
of another class
Resist public (or even protected) attributes
Avoid having other classes rely on knowing which algorithm
you are using

Hide the complexity of the task or the solution
Hide or isolate areas that are more likely to change
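
A minimal sketch of information hiding, assuming a hypothetical VoterList class. Callers never learn that an ArrayList sits behind it, so the storage can change later without touching them.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

class VoterList {
    // Private attribute: the choice of ArrayList is a hidden detail
    private final List<String> studentIds = new ArrayList<>();

    void add(String studentId) {
        if (!studentIds.contains(studentId)) {
            studentIds.add(studentId);
        }
    }

    boolean isValid(String studentId) {
        return studentIds.contains(studentId);
    }

    // Read-only view, so callers cannot modify the internal list
    List<String> asReadOnlyList() {
        return Collections.unmodifiableList(studentIds);
    }
}
```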
Barriers (perceived or not) to hiding secrets

Some information is used everywhere / must be


distributed
Opportunity to redesign to simplify and centralize the key
distributed data
Design includes circular dependencies of information
Re-encapsulate data, but beware breach of single responsibility
Confusion between class data and global data
“Global data” probably belongs in a separate config class
Performance penalties (perceived or real)
Question whether it’s truly performance or convenience

Identify areas likely to change

Separate items likely to change from more-stable items


Isolate the change

Typical areas of change


Business rules
Hardware dependencies
Input and output
Non-standard language features
Tricky design or algorithm areas
Status variables
Anticipate different degrees of change

Design so that something perceived as a small change
should have a small scope of impact

Don’t let a small change become the 100 pound gorilla on
your back

Keep coupling loose

…said several times before in the class.


Nothing new to add.

Look for common design patterns

Transformations or actions that you do repeatedly


should be collected together
Would likely lead to refactoring later if you didn’t
Seek common “well known” solutions
- Boilerplate solution
- Company standard on how to address the problem
- Industry best practice “design pattern” solution
- Solution already encapsulated in a library

Design Considerations

Aim for high cohesion


Build hierarchies
Formalize class contracts
Assign responsibilities
Design for testing
Choose the binding time consciously
Make central points of control
Keep your design modular

Design practices

Iterate, iterate, iterate


Divide and conquer
Top-down and bottom-up design
Experimental prototyping
Collaborative design

Common Design Criteria -- Cohesion

Cohesion is a measure of relatedness to a single idea


or responsibility within a method, class, or package
A measure of how well everything sticks together
Aim for high cohesion

Low cohesion means that either


You need to look to many methods, classes, or packages to
get a task done because the pieces are fragmented or
One method, class, or package is trying to do a lot of
different things, which makes the code difficult to
understand
Cohesion

Bad cohesion Good cohesion


Too fragmented

Bad cohesion
Overloaded

Common Design Criteria – Coupling

Coupling is a measure of dependence between classes


or between packages
Aim for low or loose coupling

High coupling makes your code difficult


to change because of the ripple effect of
changes that must be carried through to
all the tightly coupled modules

Coupling

High Coupling / Low Coupling /


Tight Coupling Loose Coupling

Cohesion vs. Coupling

Image from StackOverflow, attributing it to
https://www.coursera.org/lecture/object-oriented-design/1-3-1-coupling-and-cohesion-q8wGt
Design Principles – SOLID

Single responsibility principle


Open / closed principle
Liskov substitution principle
Interface segregation principle
Dependency inversion principle

Single Responsibility Principle

“A class should only have a single responsibility, that
is, only changes to one part of the software's
specification should be able to affect the specification
of the class.” (https://en.wikipedia.org/wiki/SOLID)

Single Responsibility Principle

Principle kept

Principle not kept

Principle really not kept

Single Responsibility Principle – bad example

public class student {
    // Signatures only; method bodies omitted
    public String getName();
    public int getAge();
    public int getFeesOwing();
    public int[] getRegisteredCourses();
    public boolean hasCheckedOutLibraryBook();
}

Single Responsibility Principle – correct use

// Signatures only; method bodies omitted
public class studentInfo {
    public String getName();
    public int getAge();
}
public class studentFinances {
    public int getFeesOwing();
}
public class studentRegistration {
    public int[] getRegisteredCourses();
}
public class studentLibrary {
    public boolean hasCheckedOutLibraryBook();
}
Single Responsibility Principle – correct use

public class student {
    private studentInfo info;
    private studentFinances finances;
    private studentRegistration registrations;
    private studentLibrary libraryUse;
}

Each smaller class has a more direct responsibility


One aggregating class, if needed, to gather all the information
Don’t replicate the methods of the attributes to the “student” class
Allow others to get references to the specific objects instead
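
A minimal sketch of the last two points: the aggregating class hands out references instead of replicating methods. Class names are capitalized here per Java convention, only two of the four parts are shown, and the constructors and getter names are assumptions.

```java
class StudentInfo {
    private final String name;
    private final int age;

    StudentInfo(String name, int age) {
        this.name = name;
        this.age = age;
    }

    String getName() { return name; }
    int getAge() { return age; }
}

class StudentFinances {
    private final int feesOwing;

    StudentFinances(int feesOwing) { this.feesOwing = feesOwing; }
    int getFeesOwing() { return feesOwing; }
}

// Aggregating class: no getName() or getFeesOwing() copied up to this level
class Student {
    private final StudentInfo info;
    private final StudentFinances finances;

    Student(StudentInfo info, StudentFinances finances) {
        this.info = info;
        this.finances = finances;
    }

    StudentInfo getInfo() { return info; }
    StudentFinances getFinances() { return finances; }
}
```

A caller that only needs billing data asks for getFinances() and never depends on the personal information class.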

Open / Closed Principle

“‘Software entities ... should be open for extension, but
closed for modification.’” (https://en.wikipedia.org/wiki/SOLID)

Relates strongly to subclasses and inheritance:


Write classes expecting / hoping that others will extend it
- Better alternative than many others modifying your class
- Once the class is written, we hope to not change it much
Subclasses should add functionality rather than rewrite
methods from the superclass
- If you need to rewrite many methods then maybe you shouldn’t be
extending the class
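
A minimal sketch of extension without modification. Account and SavingsAccount are hypothetical classes; the subclass adds one method and leaves the inherited ones untouched.

```java
// Written once and then left alone (closed for modification)
class Account {
    protected int balance;

    Account(int balance) { this.balance = balance; }
    void deposit(int amount) { balance += amount; }
    int getBalance() { return balance; }
}

// Extension adds functionality rather than rewriting superclass methods
class SavingsAccount extends Account {
    private final int ratePercent;

    SavingsAccount(int balance, int ratePercent) {
        super(balance);
        this.ratePercent = ratePercent;
    }

    // New behaviour built on top of the inherited deposit()
    void addInterest() {
        deposit(balance * ratePercent / 100);
    }
}
```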

Open

[Diagram: two examples of subclasses extending a parent class]
First example: good and expected
Second example: bad since little of the parent is left

Closed

[Diagram: two examples of subclasses extending a parent class]
First example: good and expected
Second example: bad since the parent shouldn’t change

Closed – Effect of poor use

[Diagram: two subclasses extending a parent that has been modified]
What this object “sees” … now seeing bits of the peer extenders

Liskov Substitution Principle

“‘Objects in a program should be replaceable with
instances of their subtypes without altering the correctness
of that program’” (https://en.wikipedia.org/wiki/SOLID)

Java Collection Framework

Any code that accepts a List should work fine if
passed an object of type
- AbstractList
- ArrayList
- Vector
- Stack

Image from https://en.wikipedia.org/wiki/Java_collections_framework, accessed September 18, 2018


Sample code

Code uses
private class cell {

    protected Set<Character> possibleValues = new HashSet<>();

}
Allows us to change our minds on the implementation
later without searching _all_ the code for the class
name to change.
Rather than
private class cell {

    protected HashSet<Character> possibleValues = new HashSet<>();

}
Locks us in to one implementation.

Design by Contract

○ Preconditions cannot be strengthened by a subtype


○ You can’t expect more from the subclass than from the
superclass

○ Postconditions cannot be weakened by a subtype


○ The outcome of a subclass must be at least as dependable /
strong / reliable as the superclass

Design by Contract

○ Invariants of the supertype must be preserved in a


subtype
○ If we assert a property of the superclass then all its
subclasses must also have the property

○ History constraint: New or modified members of the
subclass should not modify the state of an object in
a manner not permitted by the superclass.
○ If the superclass wouldn’t let you make a change then the
subclass shouldn’t suddenly allow the change
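
A minimal sketch of the precondition rule, using hypothetical Counter classes. LoggingCounter honours the superclass contract; StrictCounter strengthens the precondition and so breaks client code written against Counter.

```java
class Counter {
    protected int count = 0;

    // Contract: accepts any step >= 0; afterwards count has grown by step
    void add(int step) {
        if (step < 0) {
            throw new IllegalArgumentException("step must be >= 0");
        }
        count += step;
    }

    int getCount() { return count; }
}

// Honours the contract: same precondition, same postcondition, one extra feature
class LoggingCounter extends Counter {
    private int calls = 0;

    @Override
    void add(int step) {
        super.add(step);
        calls++;
    }

    int getCalls() { return calls; }
}

// Breaks the contract: strengthens the precondition by rejecting step == 0
class StrictCounter extends Counter {
    @Override
    void add(int step) {
        if (step == 0) {
            throw new IllegalArgumentException("step must be > 0");
        }
        super.add(step);
    }
}

class ContractDemo {
    // Client code that relies only on the Counter contract
    static int addTwice(Counter c, int step) {
        c.add(step);
        c.add(step);
        return c.getCount();
    }
}
```

Calling addTwice with a step of 0 works for Counter and LoggingCounter but throws for StrictCounter, so StrictCounter cannot be substituted for its superclass.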

Liskov Substitution Principle

Ultimately leads to a bigger pattern called a “factory”


Objects are created by a factory class
- Determine the base class type in the factory
- Everybody else only uses the abstracted class type of the object
Only the rare instances that need the specific object type
are aware of the object’s base class

Factory Pattern Example in Java

Example from https://en.wikipedia.org/wiki/Factory_method_pattern
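
The slide's image is not reproduced in this text. As a stand-in, here is a separate minimal sketch of the idea (not the Wikipedia example): ListFactory and FactoryDemo are hypothetical names, and only the factory knows which concrete class is created.

```java
import java.util.ArrayList;
import java.util.LinkedList;
import java.util.List;

class ListFactory {
    // The only place that decides on the concrete class
    static List<String> create(boolean manyInsertionsAtFront) {
        if (manyInsertionsAtFront) {
            return new LinkedList<>();
        }
        return new ArrayList<>();
    }
}

class FactoryDemo {
    // Everybody else uses only the abstracted List type
    static int fill(List<String> names) {
        names.add("Ann");
        names.add(0, "Bob");
        return names.size();
    }

    public static void main(String[] args) {
        System.out.println(fill(ListFactory.create(true)));
    }
}
```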


Interface Segregation Principle

“‘Many client-specific interfaces are better than one
general-purpose interface.’” (https://en.wikipedia.org/wiki/SOLID, attributed to Robert
Martin)

A general-purpose interface has a refactoring “code smell”

Languages like Java only let you extend one other


class
…but they allow you to implement many interfaces

Interface Segregation Principle

Complement to Liskov Substitution Principle


Design with interfaces
Don’t make catch-all interfaces

Complements Single Responsibility Principle


The interface should reflect a single responsibility, not many
responsibilities
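
A minimal sketch with two small interfaces in place of one catch-all. Printable, Scannable, and the two classes are hypothetical names for illustration.

```java
// Each interface reflects a single responsibility
interface Printable {
    String print(String document);
}

interface Scannable {
    String scan();
}

// Implements only what it can do: no stub for scanning
class BasicPrinter implements Printable {
    public String print(String document) {
        return "printed: " + document;
    }
}

// A class may still implement several interfaces when it needs them
class OfficeMachine implements Printable, Scannable {
    public String print(String document) {
        return "printed: " + document;
    }

    public String scan() {
        return "scanned page";
    }
}
```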

Bad Interface Design
Big interface

implements implements
implements

Class 1 Class 2 Class 3

Means a stub / not used

Good Interface Design
Interface 1 Interface 2 Interface 3 Interface 4 Interface 5

implements implements

Class 1 Class 2 Class 3

Starting to have
too many
responsibilities?
Dependency Inversion Principle

“One should ‘depend upon abstractions, [not]
concretions.’” (https://en.wikipedia.org/wiki/SOLID, attributed to Robert Martin)

Use interfaces and abstract data types to create a


buffer between classes

Dependency Inversion Principle

[Diagram: class connected directly to class]
Classes are more tightly aware of all of each other’s methods.

[Diagram: class connected to class through interfaces that the classes implement]
Classes just know the methods in the interfaces. Provides more isolation.

Non-Coding Example – e-mail addresses

Every Internet service provider (ISP) gives you an e-mail


address
[email protected]
[email protected]
If you give everyone your ISP address then you need to notify
everyone when you change ISPs
Like using the classes directly
Instead, have a generic e-mail address that you redirect to your
ISP address
[email protected]
When you change ISP, you change the redirection and nobody
else needs to know.
Generic e-mail address is like using an interface
Dependency Inversion Principle

Design using abstract data types


Leads to easier changes later
Ensures that we aren’t coding with specific class side-effects
in mind
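
A minimal sketch of depending on an abstraction. MessageSender, its two implementations, and GradeNotifier are hypothetical; the notifier knows only the interface, so the sender can be swapped without changing it.

```java
// The abstraction that both sides depend on
interface MessageSender {
    String send(String recipient, String body);
}

class EmailSender implements MessageSender {
    public String send(String recipient, String body) {
        return "email to " + recipient + ": " + body;
    }
}

class SmsSender implements MessageSender {
    public String send(String recipient, String body) {
        return "sms to " + recipient + ": " + body;
    }
}

// Knows only the interface, never a concrete sender class
class GradeNotifier {
    private final MessageSender sender;

    GradeNotifier(MessageSender sender) {
        this.sender = sender;
    }

    String announce(String student) {
        return sender.send(student, "grades posted");
    }
}
```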

Using SOLID

Developing a design is an iterative process


Start with some design
Consider some or all the design under a SOLID property
Adjust the design to improve the quality relative to that
property
Assess if any other property became significantly worse that
isn’t worth the trade-off
If the change is sufficient to keep and is ok on cohesion and
coupling then
- Keep the change and do another iteration
Otherwise
- Call the design complete
Student Information System

Purely fictitious example


To demonstrate a sample application of the SOLID
principles

Creating a system at Dal that manages student


information (and other information) at the university.

Student Information System – Mind Map
[Mind map with the nodes Student, Program, Schedule, Courses, Registration, Academic record, Co-curricular record, Transcript, Grad level, Undergrad level, Engineering upper level, Finances, Tuition, Scholarships, Payments, Personal info, Name, Address, TA, Employee, Payroll, Web UI, and Mobile UI; collections are labelled ArrayList and HashSet]
Student Information System – Single Responsibility
[Mind map as on the previous slide, with Web UI and Mobile UI renamed to Web UI interaction and Mobile UI interaction; collections are still labelled ArrayList and HashSet]
Student Information System – Open/Closed
[Mind map as on the previous slide, with the nodes Activity record and Person added and a single Personal info node for Address and Name; collections are still labelled ArrayList and HashSet]
Student Information System – Liskov Substitution Principle
[Mind map as on the previous slide, with the collection labels ArrayList and HashSet replaced by List and Set]
Student Information System – Interface Segregation
[Mind map as on the previous slide, with Web UI interaction and Mobile UI interaction replaced by Web UI interface and Mobile UI interface]
Student Information System – Dependency Inversion
[Mind map as on the previous slide, with a Trackable activity interface node added]
Student Information System – Dependency Inversion
[Mind map as on the previous slide, including the Trackable activity interface node]
How is our cohesion?
How is our coupling?
Refine more original red?
Dependency

Relationship: “knows about”
Classes use other classes in their implementation
- Passed as parameters
- Instantiated
- Returned by methods
The classes used do not (usually) know about the
classes that use them
This is a unidirectional relationship, e.g.,
- Fuel station does not know about Car
- Road does not know about Car

[Diagram: Car depends on Fuel station and Road]
Dependency

public class class1 {



}

public class class2 {

public void some_method( class1 c1 ) {



}
}

Class class2 depends on class class1

Aggregation
Relationship: “has a”
This is a stronger version of dependency
Objects of one class contain objects of another class
A class has to have instance variable(s) that store objects
of the other class
E.g., a Car is an aggregation of an Engine, Tires, and
other parts
Note: A class may use a collection to store multiple
objects

[Diagram: Car, Engine, Tires, Piston, Spark Plug]
Aggregation

public class Car {


private Engine e;
private Tires t;

}

Engine and Tires objects are attributes inside the Car class

Nested Classes

Relationship: “has a”
Class contains another nested class
This is an aggregation of classes rather than objects
E.g. a LinkedList class defines a Node class within it

[Diagram: LinkedList, Node, Iterator]
Nested

public class LinkedList {

private class Node {



}

}

Node is nested inside LinkedList

Inheritance

This is an “is a” relationship
Between a more general class (superclass) and a
more specific class (subclass)
E.g.
- El Camino is a Car
- Porsche is a Car
- Car is a Vehicle

[Diagram: Vehicle; Car, Truck; El Camino, Porsche]
Inheritance

public class Vehicle {



}

public class Car extends Vehicle {



}

Car inherits everything from Vehicle

UML Relationship Symbols

Relationship              Line    Orientation  Arrow Tip
Dependency                Dashed  To           Open
Aggregation               Solid   From         Diamond
Nested Class              Solid   From         Circle-Plus
Inheritance               Solid   From         Triangle
Composition               Solid   From         Diamond
Interface Implementation  Dashed  From         Triangle
From https://www.uml-diagrams.org/software-licensing-domain-diagram-example.html, February 6, 2019
What is software engineering?

Software engineering is the study and an application of
engineering to the design, development, and
maintenance of software.
https://en.wikipedia.org/wiki/Software_engineering, September 25, 2018

Software engineering often brings in an element of a


process that can be managed to consistently deliver
software that addresses a user’s needs with high
quality and productivity for [large | medium | small]
scale problems that may be in the presence of change.
Software development – what you want
Software development – what you often get
Software development – what you can choose from
SE encompasses management of the process

[Graph: Productivity against Time]
Moments of brilliance
A lot can happen in the right time
Difficult to predict or plan for
Difficult to align a team

[Graph: Productivity against Time]
More predictable results
Coding stars can feel stifled
One size fits all for software engineering?

Entrepreneur
We are still seeking out a market, so you might choose to change the target of the software quickly
Corporate software
The target market is generally well-known as are the key requirements
Web software
The market reach is broad and you can have trouble identifying everyone using it. Quick feedback and quick turnaround on changes to the market.
Hobbyist
The focus is on the challenging part and the beautiful part
Consulting
You are hired by a specific client for a specific task. The task could change or adapt as the client learns more
Real-time systems
The task is well-defined, but failure is not an option.
Styles of processes

Core systems development 30 years ago
Requirements can be mostly identified up-front
The window of opportunity is relatively large or closes far off
…We have the time to do lots of planning in advance
…We’re typically developing everything in-house

Web development today
Functionality changes based on users’ reactions
Windows of opportunity open and close quickly
Quick audience attention and attrition
…We release bits incrementally
…We’re developing for an existing environment and re-using as much as we can.
SE Methodology

SE focuses mostly on processes for achieving the


goals

SE separates the process of developing the software


from the product that is developed (the software itself)

Quality and productivity are governed by the people,
processes, and technology.
How do you traverse the elements of software development?

Requirements Design Implementation Verification


Start here

Deploy and
maintain
here
Clients involved here
Plan-based development
eg. Waterfall method

Requirements Design Implementation Verification
Start at Requirements; deploy and maintain after Verification
Stage outputs: requirements document, design documents, executable code, proof of test results
Iterative development
eg. Agile methods

Requirements Design Implementation Verification


Start here

Deploy and
Each row is an iteration. maintain
Working (but limited) here
code at the end of each Clients involved here
iteration.
Tradeoffs

Plan-based
+ You get a full picture, so you optimize the design and
implementation
- Errors detected late in the process are costly to undo
- The client sees little until everything is built
Iterative
+ Continual client feedback to adjust the direction
+ Not as much backtracking when requirement or design
errors are detected
- Feature creep
- Rewriting / reorganizing code (called refactoring) is more
commonly needed
Agile development
Characteristics of agile development

Not any one specific method


Emphasizes teamwork (self-organizing, independent smaller
groups) over management
Values individual creativity and motivation
Anticipates and welcomes change
Planning minimized through continual client engagement, short
goal-focused iterations leading to releases of working software
High internal standards of quality
Devoted to simplicity (in design and process)
Agile interaction

Business value Agile development


proposition development team team
http://agilemanifesto.org
Agile manifesto

What the Agile manifesto does not assert:


Don’t bother with any process
Write all your code from scratch
The code is all the documentation you ever need
Whatever the customer says to do, whenever they ask it, just
do it
You don’t need to have a plan
“If I’m ahead of schedule then I should just add more
features to justify not having to do documentation and
planning because more (working) code is better.”
Agile Iteration Overview
Identify what would be nice and how you know when you’re done (called user stories and acceptance tests)
Estimate the value of features, how hard something is, and how risky something is
Decide on how to proceed (prioritize user stories)
For each story
- Identify tasks to get the feature done
- Identify thorough tests for the tasks
- Do and test the task
- Test all previous other tasks
- Show the user that the feature is done (acceptance tests pass)
Track performance (and velocity). Adjust expectations and scope
Agile development – User stories and acceptance tests
How requirements evolve (devolve?) when you aren’t careful?
What the client had in mind
What the client described
What the team heard
What the team assumed
Who the team expects the user to be / how to behave
What the team is interested in doing
User stories

A “proto-requirement” that drives the development


process
Very terse, often action oriented
Face-to-face elicitation of details with clients/users as
needed
Leads directly to a short-term development goal

Ensures that you’re delivering something of value to


the user
User stories

Answer WHO, WHAT, and WHY


Who needs/wants the functionality
What the functionality does in user- or client-centric terms
Why the functionality is important and/or what benefit is
derived from it

Why are these elements important?


User stories

Do not specify


How the functionality is delivered
- The algorithms are for the team to develop
What technologies are used behind the scenes to build the
solution
- All the end tools are irrelevant to the user. For example, the user
doesn’t care if you use a database or a flat file as long as the
functionality works
Where the solution appears in the product
- Assessing flow and usability is done separately
When the solution is built
- Prioritization is done separately
User story problems

Too high-level
Break it down
Describes a development task
Wrong “who”
Too specific in giving the method
Limiting the options and not focusing on the functionality
Reason not linked to generating value to the “who”
Creates a hodge podge of functionality
What makes a good user story

User stories should provide enough detail to make a


reasonable estimate about how long they should take
to realize.

User stories should lead directly to one or more


concrete (and often automated) acceptance tests that
will verify that the story has been realized.
What do you do with user stories?

User stories are prioritized in consultation with the


client.
Iteration plans contain the user stories to be realized in
a given iteration and define the scope of the iteration.
Iteration length varies, typically between 1 week and 1 month
What do you do with user stories?

Within an iteration, tasks are defined that, together,


realize the user story.
Some user stories could share common tasks.

User stories are given estimates for completion


Some use time estimates (1, 2, or 3 weeks)
Others use a rough “difficulty” scale (low, medium, hard)
Still others use a relative difficulty scale (1, 2, 3, 5)

An iteration then has a fixed amount of work that can
be assigned (time, difficulty, or relative scale)
Use Cases / User Stories

Use case information at


ftp://ftp.software.ibm.com/software/rational/web/whitepapers/RAW14023-USEN-00.pdf

Views differ on the format, length, and content of a use


case
All agree that they should be centred on what the user’s
experience is and be understandable to a user

Acceptance Tests

Contain more system detail, but remain client-facing


Often created with the client
Verify that the “WHAT” has been accomplished
Are validated by the client

Ensures that you deliver something useful and


complete to the user.
Acceptance Tests

Relate to actionable project engineering tasks, each


taking a few hours to a few days to complete
Ensure that only what is required to fulfill a user story is
implemented, and no more
The set of acceptance tests should cover all aspects of the
user story.
The set of acceptance tests should not cover elements that
aren’t asked for in the user story
Anatomy of an Acceptance Test

One template suggestion from the XP wiki web:

A [named user role] can [select/operate] [feature/function] so


that [output] is [visible/complete/…]

Should lead to a yes/no answer that everyone can


agree upon and verify easily.

In reality, the format of the test statement can vary.


Acceptance Test Guidelines

Thoroughly test one thing


Thoroughly: include multiple inputs, border cases, extreme
cases
One thing: some products may have > 1 feature to test.
Focus on one.
Clearly describe some aspect of the proper function of
the system.
Automatable (when possible) to use in regression
testing
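As an illustration of an automatable acceptance test, the sketch below checks one user story and produces a yes/no answer. The story, the Schedule class, and its methods are hypothetical and invented only for this example.

```python
# Hypothetical user story: "A student can add a course to their schedule
# so that the course is visible in the schedule."


class Schedule:
    """Minimal stand-in for the system under test (an assumption)."""

    def __init__(self):
        self._courses = []

    def add_course(self, course_code):
        # Ignore duplicates so adding twice does not list a course twice.
        if course_code not in self._courses:
            self._courses.append(course_code)

    def courses(self):
        return list(self._courses)


def acceptance_test_add_course():
    """Return True (pass) or False (fail): a yes/no answer."""
    schedule = Schedule()
    schedule.add_course("CSCI 3901")  # typical input
    schedule.add_course("CSCI 3901")  # border case: same course again
    schedule.add_course("CSCI 2110")  # a second, different course
    return schedule.courses() == ["CSCI 3901", "CSCI 2110"]
```

Because the test focuses on one thing and returns a plain pass/fail, it can be rerun unchanged as part of regression testing.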
Risk Management
Risk Management

What constitutes risk?


possibility of loss or injury
someone or something that creates or suggests a hazard
a : the chance of loss or the perils to the subject matter of an
insurance contract; also : the degree of probability of such loss
b : a person or thing that is a specified hazard to an insurer
c : an insurance hazard from a specified cause or source <war
risk>
the chance that an investment (as a stock or commodity) will lose
value
http://www.merriam-webster.com/dictionary/risk, January 2016

It’s about the potential for loss of value


Kinds of risk

Business risk
Are we developing something of value?
Technical risk
Is there something that could go wrong with the technology
- in development?
- in deployment?
Personnel risk

Need to learn which risks can be managed and are


worth worrying about and which should be
acknowledged and dealt with later.
Risk Management

Usually performed

At the start of a project

At the beginning of major project phases

When there are significant changes


Steps for Risk Management

Risk identification

Risk analysis

Risk management planning

Risk review
Risk Identification

Hold a brainstorming session to consider

Weak areas
- Eg. Unknown technology

Aspects that are critical to project success


- Eg. The timely delivery of a vendor’s database software, creation of
translators, user interface that meets the customer’s needs

Problems that have plagued past projects


- Eg. Loss of key staff, missed deadlines, error-prone software
Risk Analysis

Make each risk more specific.


“Lack of management buy-in” and “people might leave” are
too vague

Split the risk into smaller, specific risks

Set priorities
Risk Analysis – Overall Impact

What is the likelihood of the problem?


What is the cost if it arises?
How can we detect the problem?
How far in advance can we see the problem developing?
When might the problem arise?
Who is responsible to manage it?
Risk Analysis – Overall Impact

Risk Item | Likelihood of occurrence | Impact to the project | Priority (likelihood × impact)
New operating system may be unstable. | 10 | 10 | 100
Communication problems over system issues. | 8 | 9 | 72
We may not have the right requirements. | 9 | 6 | 54
Requirements may change late in the development cycle. | 7 | 7 | 49
Database software may arrive late. | 4 | 8 | 32
Key people might leave. | 2 | 10 | 20
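The ranking in the table can be reproduced with a few lines of code. This is a minimal sketch that assumes the priority is simply the product of likelihood and impact, which is consistent with the numbers in the table; the function name is made up for the example.

```python
# Risk items from the table: (description, likelihood, impact).
risks = [
    ("Key people might leave.", 2, 10),
    ("Database software may arrive late.", 4, 8),
    ("New operating system may be unstable.", 10, 10),
    ("We may not have the right requirements.", 9, 6),
    ("Communication problems over system issues.", 8, 9),
    ("Requirements may change late in the development cycle.", 7, 7),
]


def prioritize(risk_items):
    """Return (description, priority) pairs, highest priority first."""
    scored = [(desc, likelihood * impact)
              for desc, likelihood, impact in risk_items]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


ranked = prioritize(risks)
for description, priority in ranked:
    print(f"{priority:4d}  {description}")
```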
Risk Management Planning

Risk Item | Actions to reduce likelihood | Actions to reduce impact if risk occurs | Who should work on actions | When should actions be complete | Status of actions
New operating system may not be stable. | Test OS more | Identify second OS | Joe | Nov 15, 2018 |
Communication problems over system issues | Develop system interface document for critical interfaces | Schedule training session, assign developer to implement translation layers if needed | Cathy | Nov 27, 2018 |
We may not have the right requirements | Build prototype of UI | Limit initial product distribution | Phil | Dec 20, 2018 |
Risk Review

Review your risks periodically


Check how well mitigation is progressing
Change risk priorities as required
Identify new risks
Re-run the complete risk process if the project has
experienced significant changes
Incorporate risk review into other regularly scheduled
project reviews
Estimation in Agile projects

Incremental releases do not avoid cost overruns, but


cost issues (scheduling issues, scope issues) become
clear early in the project
An emphasis on complexity and customer engagement
should translate into more value for the customer
Estimation is embodied in the concept of velocity
Teams continually assess and revise their velocity measure
Iteration plans permit fine-grained estimation and feedback
Selection of features for an iteration plays a key role in
estimation and communicating costs with the customer
How do you do estimation?

Brainstorming and gathering techniques


Have all voices heard
Sometimes one person may see complexity or simplicity that
others don’t see
How do you do estimation?

As a group, order the stories by difficulty and then


assign complexity given the ordering
Individual estimates of the complexity
Rank 1-6; high, medium, low; or some other way
Compare results; talk about the complexity in extreme
situations
Play the dots game with them
Post all of the user stories
Give everyone a fixed number of red and green sticky dots
Have everyone place dots on cards for easy (green) or hard
(red) tasks; unmarked are medium
Rank the results by the number of dots
Monitor Your Estimates

Monitor the progress with which you complete tasks


Determine the consistency of your estimates
Determine a baseline for what your team can deliver

Measure your rate of completion to determine a team


velocity
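One common way to turn completed work into a velocity is to average the points finished per iteration. The sketch below assumes that definition and a relative point scale; the function names and the sample numbers are illustrative only.

```python
import math


def velocity(completed_points_per_iteration):
    """Average number of story points completed per iteration."""
    if not completed_points_per_iteration:
        return 0.0
    total = sum(completed_points_per_iteration)
    return total / len(completed_points_per_iteration)


def iterations_needed(remaining_points, team_velocity):
    """Estimate the iterations needed for the remaining work, rounded up."""
    if team_velocity <= 0:
        raise ValueError("velocity must be positive")
    return math.ceil(remaining_points / team_velocity)


history = [8, 10, 9]                 # points finished in past iterations
team_velocity = velocity(history)    # baseline for what the team delivers
forecast = iterations_needed(40, team_velocity)
```

Recomputing the velocity after every iteration is what lets the team check how consistent its estimates are.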

Software Architecture
What can you tell from this shelter?

Could be in many places.


Not much can be concluded…

Images from Google images, March 4, 2019
Can you get a better sense of location or area
from architectures?

Images from Google images, March 4, 2019
Architecture Definitions

“The complex or carefully designed structure of something”
http://www.oxforddictionaries.com/definition/english/architecture, November 13, 2015

“The conceptual structure and logical organization of a computer or computer-based system”
http://www.oxforddictionaries.com/definition/english/architecture, November 13, 2015

In short, it is a common way that we have organized our software as a big design. Knowing the architecture
tells you a bit about what to expect of the design and the behaviour of the program.
Sample Architectures

Software Architectures

Ways to organize entire designs of programs


Structure architectures
Messaging architectures
Distributed system architectures
Adaptable system architectures
Shared memory architectures

Hybrid architectures

(Categorization from
https://en.wikipedia.org/wiki/List_of_software_architecture_styles_and_patterns )
Structure architectures

Monolithic
Layered
Pipes and filters

Monolithic Architecture
Monolithic Architecture

A structure in which all elements of a system are


combined in one unit
The application performs all tasks of a particular bit of
functionality
Self-contained and independent of all other
applications

Monolithic Architecture

Generally used for a system that isn’t exploiting


modularity.

Historically seen in systems that were trying to


optimize for performance by providing direct access to
all system elements.
Often resulted in unmaintainable systems because of the
resulting tight coupling and low cohesion.

Layered Architecture
Layered Architecture

Divide the responsibilities of the system into distinct


layers.
Lower layers tend to be closer to the hardware, system, or
resources
Upper layers tend to provide high-level abstractions and are
closer to interacting with the user
Each layer interacts only with the layers
immediately above and immediately
below it.
API definitions are critical.
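A small sketch of the layering rule: each class below holds a reference only to the layer directly beneath it, so the top layer never touches storage. The three layer names and the registration example are assumptions made for illustration.

```python
class StorageLayer:
    """Lowest layer: closest to the resource (here, an in-memory dict)."""

    def __init__(self):
        self._rows = {}

    def write(self, key, value):
        self._rows[key] = value

    def read(self, key):
        return self._rows.get(key)


class DomainLayer:
    """Middle layer: business rules; talks only to the storage layer."""

    def __init__(self, storage):
        self._storage = storage

    def register(self, student, course):
        if not course:
            raise ValueError("course code required")
        self._storage.write(student, course)

    def course_for(self, student):
        return self._storage.read(student)


class PresentationLayer:
    """Top layer: user-facing; talks only to the domain layer."""

    def __init__(self, domain):
        self._domain = domain

    def register(self, student, course):
        self._domain.register(student, course)

    def show(self, student):
        course = self._domain.course_for(student)
        return f"{student}: {course if course else 'not registered'}"


ui = PresentationLayer(DomainLayer(StorageLayer()))
ui.register("Alice", "CSCI 3901")
```

Swapping StorageLayer for a database-backed class with the same two methods would leave the upper layers untouched, which is why the API between layers matters so much.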

Mac OSX

Computer Network Structure
Layered Architectures - Advantages

Easy to understand
Quick to locate where some tasks must be done
Allows us to change the implementation of one layer
without affecting the whole system
Strong cohesion in each layer and loose coupling
among layers

Layered Architectures - Disadvantages

Overhead in traversing layers when two far-apart layers
require some interaction
Deploying a new program in an existing layer
framework may require you to create “empty” layer
implementations just to fit the model
Locks in a particular structure and size of program
Introducing many new responsibilities may require a
refactoring of the architecture

Layered Architectures

Common when the system


Has a big range of responsibilities
Spans from low-level details to high-level user concepts

Pipe and Filter Architecture
Pipe-and-Filter Architecture

A sequence of special-purpose
programs (filters) that pass data
from one to the other sequentially
to achieve a result

Characterized by having the filters be small and


specific.

Allows you to quickly rearrange the filters to allow for


quick development of solutions.
UNIX pipes

UNIX shells make it easy to take the screen output


(stdout) of one program and supply it as the keyboard
input to another (stdin)
Mechanism is called a pipe

Example: ls | sort | more


- ls produces the list of files in a directory
- sort will sort whatever data is provided to it
- more will show one screen of data and then pause until you have
read it all before showing the next bit
- the pipes “|” connect the input and output of the programs

For the UNIX astute, ls will already sort, so the example is a bit contrived
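The same idea can be sketched inside one program. Here Python generators stand in for the filters and the chaining function stands in for the pipes; the filter names are invented for the example and are not UNIX tools.

```python
def strip_blank(lines):
    """Filter: drop empty lines and trim surrounding whitespace."""
    for line in lines:
        if line.strip():
            yield line.strip()


def to_upper(lines):
    """Filter: convert each line to upper case."""
    for line in lines:
        yield line.upper()


def pipeline(source, *filters):
    """Connect the filters in order, like 'a | b | c' in a shell."""
    stream = source
    for next_filter in filters:
        stream = next_filter(stream)
    return stream


result = list(pipeline(["alpha", "", "  beta "], strip_blank, to_upper))
```

Each filter handles one item at a time and knows nothing about its neighbours, so the filters can be reordered or replaced by changing only the call to pipeline.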
Business Process Workflows

Handling of data through a business process is often


modelled as a pipe-and-filter architecture
Basis of some business process modelling ( BPM )
languages

Data Stream Databases

Some new database systems let you do queries against a


stream of data
Stream data: never ending
sequence of values, so you
never have the complete
picture

Create an order in which


parts of the queries can be
answered efficiently
Use a pipe-and-filter
architecture to connect the
parts of the queries together
Pipe-and-Filter Architecture Implications

An ideal use of this architecture allows for parallelism


Each filter runs at its own pace
Each filter consumes data as soon as it’s available and sends
its output as soon as possible
Result is that one data item runs through the processing
quickly

Often used for big data applications where we can’t


afford to wait for all the data to do some processing

Pipe-and-Filter Architectures

Common when the system


Is handling data that arrives dynamically
Handles one data item without having much effect on later
data items, other than through summaries or aggregates
Wants to have flexibility in changing the transformations
done to the data
When we can order the types of transformations serially or
temporally

Messaging architectures

Event-driven
Finite state machines
Model-View-Controller
Publish-subscribe
Message queues

Event Driven Architecture
Event-Driven Architecture

A system has a list of functions to invoke each time that a


particular event happens

Creates a system that waits to react to the world


React to events as they happen
Everyone sees all the events
Provides a well-defined model on how and when to react

Event-Driven Architecture

Reacting to events is often guided by a finite state


machine.

The circles represent some state of the world


The edges represent how the world is allowed to
change
A lack of an edge between states says that a change isn’t
allowed
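A finite state machine of this kind is often stored as a table of allowed edges. The sketch below assumes a hypothetical order workflow (new, paid, shipped); any (state, event) pair missing from the table is treated as a change that is not allowed.

```python
# Edges of the state diagram: (current state, event) -> next state.
ORDER_TRANSITIONS = {
    ("new", "pay"): "paid",
    ("paid", "ship"): "shipped",
}


class StateMachine:
    """Reacts to events by following the edges in a transition table."""

    def __init__(self, initial, transitions):
        self.state = initial
        self._transitions = transitions

    def handle(self, event):
        key = (self.state, event)
        if key not in self._transitions:
            # No edge in the diagram: the change is not allowed.
            raise ValueError(
                f"event {event!r} not allowed in state {self.state!r}")
        self.state = self._transitions[key]
        return self.state
```

For example, StateMachine("new", ORDER_TRANSITIONS).handle("ship") raises an error because there is no edge from "new" on a "ship" event.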
Finite State Machines

Some machines specify actions to take as you react to


changes

Moore Machine Mealy Machine

Model-View-Controller Architecture

An event-driven architecture developed for user


interfaces.
3 main components:
The model does all of the calculations and manages the
business rules
The view shows information to the user
The controller monitors for events
from the user, from the model, or
externally, and adapts the
behavior of the model and the
view accordingly

Publish-Subscribe Architecture

Similar to event-driven work, except that agents in the


system may only want to see a subset of events.
The architecture allows agents to ask to receive
(subscribe) only specific events.
Agents that generate events
only distribute them to other
agents who have asked for the
events.
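A minimal in-process sketch of the subscribe and publish steps. The Broker class and the topic names are assumptions for illustration; a real system would normally rely on an existing framework rather than code like this.

```python
from collections import defaultdict


class Broker:
    """Keeps track of which callbacks asked for which topics."""

    def __init__(self):
        self._subscribers = defaultdict(list)

    def subscribe(self, topic, callback):
        self._subscribers[topic].append(callback)

    def publish(self, topic, event):
        # Only agents that subscribed to this topic receive the event.
        for callback in self._subscribers.get(topic, []):
            callback(event)


broker = Broker()
received = []
broker.subscribe("grades", received.append)
broker.publish("grades", "A+")
broker.publish("news", "ignored: nobody subscribed to this topic")
```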

Publish-Subscribe Architecture

Typically needs some framework to manage the


publishing and subscribing
Don’t want to build it on your own.
Early system called “Jabber” used pub-sub for a messaging
system

Message Queue Architecture

Agents interact by creating self-contained messages.


Leave the message for other agents in message
queues (mailboxes)
May have a queue for each agent or a common queue for
several agents
Agents retrieve
messages at a time
that is convenient
for them
(asynchronous
messaging)
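The mailbox idea can be sketched with the queue module from the Python standard library. The agent name and message contents are made up for the example, and a deployed system would typically use a queue that lives outside any one process.

```python
import queue

# One mailbox per agent (a common queue for several agents also works).
mailboxes = {"worker": queue.Queue()}


def send(recipient, message):
    """Leave a self-contained message in the recipient's mailbox."""
    mailboxes[recipient].put(message)


def poll(recipient):
    """Return all messages waiting for the recipient, oldest first."""
    messages = []
    box = mailboxes[recipient]
    while True:
        try:
            messages.append(box.get_nowait())
        except queue.Empty:
            return messages


send("worker", {"task": "resize", "file": "a.png"})
send("worker", {"task": "resize", "file": "b.png"})
pending = poll("worker")  # the worker picks these up when convenient
```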
Message Queue Architecture
Message Queue Architecture - Advantages

Simple
Portable
Agents handle data when it’s convenient for them
Long-lived style of communication, so available just
about everywhere

Message Queue Architecture - Disadvantages

May take a while for an agent to act on a message


Agents must remember to poll their queues

Distributed system architectures

Client-server
2-tier
3-tier (presentation, domain logic, data storage)
- Common for the web (client tier, web server tier, DBMS tier)
Peer-to-peer
Representational state transfer (REST)
Service-oriented
Precursor to cloud systems

Client Server Architecture
Client Server Architecture

Characterized by two different programs that


communicate across the network:
The server has the data or service to offer
The client wants the data or service

The server can speak with


multiple clients at the
same time.

Client Server Architecture

The server
Has all the data / services
Is always present on the network
Waits around to be contacted by clients
- Doesn’t initiate contact with clients
The client
Wants the data or service
- Has no data or service to offer to others
Initiates the contact with the server
Can come and go
- Is unreliable as far as the network is concerned
Client Server Advantages

Centralizes the data or services


Makes the service easy to locate
Provides authoritative data by the server
Clearly defined roles for the client and the server

Client Server Disadvantages

Server risks being overloaded by too many clients


Use a distribution scheme across servers
Loss of opportunity when you shut down the server for
maintenance
Use a replication system to mitigate

3 Tier Architecture

A variant of the Client Server architecture


Add a third element which is a database to store
information
The client never accesses the database directly
Deploy with
The database server in a network not accessible to the
general Internet
The server in a protected area (called
a demilitarized zone)
The client in the general Internet

Peer to Peer Architecture
Peer to Peer Architecture

Any network entity can


Ask other network entities for data or services
Provide data or services to other network entities
Like every node is both a client and a server at the
same time

Peer to Peer Architecture - Advantages

Scales easily
Shares the workload across many computers
Reconfigures itself automatically as nodes come and
go

Peer to Peer Architecture – Disadvantages

Can be tricky to find which node has the data or


service you want
Often build a client-server element to act as an index
Not as certain of the authority of information provided
A node can disappear part-way through providing you
with data or service
More complex to program than a client-server
architecture

REST Framework
REST Framework

Representational State Transfer


(REST) is more of a way of
interacting than a full architecture
Often call it a framework

At its core, it says that you have two entities


communicating. In that communication, neither entity
should need to store state about the other entity.
Every message should contain everything needed to answer
the message
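To make the "no stored state" idea concrete, here is a sketch of a handler that answers each request using only what the request contains. The request fields (offset, limit) and the item list are assumptions for this example and are not part of any particular web framework.

```python
# Data the server offers; it holds no per-client information.
ITEMS = [f"item-{n}" for n in range(1, 11)]


def handle_request(request):
    """Answer a request using only the data inside the request itself."""
    offset = request["offset"]
    limit = request["limit"]
    page = ITEMS[offset:offset + limit]
    # Tell the client where to continue; the client, not the server,
    # is responsible for remembering this value.
    more = offset + limit < len(ITEMS)
    next_offset = offset + limit if more else None
    return {"items": page, "next_offset": next_offset}


first = handle_request({"offset": 0, "limit": 4})
last = handle_request({"offset": 8, "limit": 4})
```

Because nothing is remembered between calls, the two requests could be served by different server processes and still give the same answers.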

REST Framework

Developed in the context of the web


Web servers shouldn’t need to remember information about
each client.
Greatly simplifies the web server

REST Framework

However, our interactions with the web appear to have


long-running state
Login sessions
Shopping carts
Recommendation systems

That state is managed outside the REST Framework


Cookies
Third tier of a database

Service-Oriented Architecture
Service Oriented Architecture

Start of the idea that you should be able to ask other


computers to do work for you.
Defines a system and protocols to:
Register your services so others
can find you
Find services that you want to
use
Invoke services on other computers

Service Oriented Architectures

Nice idea for computing


Didn’t catch on
Nobody was stepping up with hosting a good registry of
services
Matching services (and subsequent payments) for services
didn’t go as easily as expected
Deployed in smaller scales within company product
lines

Did lay the foundation for cloud computing


Cloud Architecture
Network “Architectures”

Cloud
Your entire program is run / serviced on someone else’s hardware

Infrastructure as a service (IaaS)


- eg. Amazon EC2, IBM Blue Cloud,
FlexiScale
Platform as a service (PaaS)
- eg. Google App Engine
Software as a service (SaaS)
- eg. Salesforce.com, Google Docs,
webmail
Cloud Computing

Definitions vary:
“A large-scale distributed computing paradigm that is driven by
economies of scale, in which a pool of abstracted virtualized,
dynamically-scalable, managed computing power, storage, platforms,
and services are delivered on demand to external customers over the
Internet” [Foster et al., 2008]
“A style of computing where scalable and elastic IT capabilities are
provided as a service to multiple external customers using Internet
technologies.” [Plummer et al., 2008]
“Cloud computing is a model for enabling convenient, on-demand
network access to a shared pool of configurable computing resources
(e.g. networks, servers, storage, applications, and services) that can be
rapidly provisioned and released with minimal management effort or
service provider interaction.” [Mell and Grance, 2010]
Cloud Computing Characteristics

Common elements:
Virtualization
- hardware can host many independent simulated servers
Multi-tenancy
- multiple clients can occupy the same physical hardware
Security
- clients are protected from each other and their data is secure
Elasticity
- resources can be added and removed in real-time, often at the request of
the client and without the intervention of the service provider
Cloud Computing Characteristics

Common elements:
Availability
- the service provider gives performance / QoS guarantees
Reliability
- failure of any piece still allows the services to be offered
Agility
- the resource allocations can adapt dynamically
Pay-as-you-go
- the client just pays for the resources used

Clients use the same way to get to the service no matter


how or where the service is deployed
Cloud Computing

[Figure: layers of cloud computing, from the client down to the hardware]
Client
the network
“the cloud”
- SaaS: Applications served over the Internet
- PaaS: Application development environment
- IaaS: Computing and storage resources
virtualization
Underlying hardware and host OS
Hybrid Architectures

Hybrid Architectures

Often, no single architecture fits what we want


System is big enough that sub-parts have different
characteristics
Some disadvantages of an architecture need to be overcome

We can combine architectures to create a hybrid


architecture
Use one architecture for the high level and
others for lower levels
Use two systems in parallel to complement
one another, each system with a different
architecture
Hybrid Architectures

Good hybrid architectures have the


elements complement one another
Eg. Client-server index for a peer-to-peer
system
Eg. Mac OS, which is layered, with each layer having its own sub-
architecture (microkernel for the core OS)

Bad hybrid architectures lose the clarity


of the component architectures and can
add confusion

Microkernel Architecture
Microkernel Architecture

The kernel of the system is the core functionality that


you always keep running and loaded.
In a microkernel system, you keep this kernel as small
as possible and let all other functionality communicate
with and through the kernel.

Microkernel Architecture

Often used when there are security or resource


differences between the kernel and the other programs.
Allows parts of the system to start and stop
dynamically and independently.

Microkernel

Commonly seen in operating systems

Shared memory architectures

Blackboard
Rule-based

Blackboard Architecture
Blackboard Architecture

Used in artificial intelligence systems


Concurrent tasks or “agents” synchronously update a
common memory data storage (the blackboard) and
react to changes on the blackboard made by other
agents

Blackboard - Advantages

Simple way to have agents share information


Allows agents to operate independently and at their
own pace
Minimizes communication requirements between
agents
Not limited by the number of agents

Blackboard - Disadvantages

Requires careful synchronization as we update


blackboard data
Requires agents to poll the blackboard if they are
waiting for information
Adds complexity if we want categories of data to be
created dynamically
If categories remain fixed then we need to get the categories
correct from the start
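The synchronization concern can be illustrated with a small sketch in which several agents update one blackboard category. The Blackboard class, the "count" category, and the agent function are assumptions made for the example; the lock is what keeps concurrent updates from overwriting each other.

```python
import threading


class Blackboard:
    """Shared data store; every access goes through one lock."""

    def __init__(self):
        self._lock = threading.Lock()
        self._data = {}

    def update(self, category, func, default=None):
        # Read-modify-write must happen as one step, hence the lock.
        with self._lock:
            current = self._data.get(category, default)
            self._data[category] = func(current)

    def read(self, category):
        with self._lock:
            return self._data.get(category)


def counting_agent(board, times):
    """An agent that repeatedly adds one to the shared counter."""
    for _ in range(times):
        board.update("count", lambda value: value + 1, default=0)


def run_agents(agent_count=4, times=1000):
    board = Blackboard()
    agents = [threading.Thread(target=counting_agent, args=(board, times))
              for _ in range(agent_count)]
    for agent in agents:
        agent.start()
    for agent in agents:
        agent.join()
    return board.read("count")
```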

Blackboard Architecture

Common when the system


Has many independent agents that must share data
Has well-defined categories of information to share
The timeliness of data for the agents isn’t critical
- Agents can keep working with old data until a new update arrives

Mostly seen in artificial intelligence applications

Rule-Based Architecture
Rule-Based Architecture

Define a system through a set of rules


Logical conditions and consequences
The system operates by fitting rules to a situation and
executing the consequence of the rule
The consequence can then trigger other rules

Rule-Based Architecture

Consists of
A set of rules (knowledge base)
A means by which we acquire data from users, agents,
or the environment
An inference engine that
matches new data against
the knowledge base
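A toy inference engine shows how the three parts fit together. This is a simple forward-chaining sketch, one of several possible matching strategies, and the rules and fact names are invented for illustration.

```python
# Knowledge base: (conditions that must all hold, fact to conclude).
RULES = [
    ({"has_prerequisite", "seat_available"}, "can_register"),
    ({"can_register", "fees_paid"}, "registered"),
]


def infer(initial_facts, rules):
    """Apply rules repeatedly until no rule adds a new fact."""
    facts = set(initial_facts)
    changed = True
    while changed:
        changed = False
        for conditions, consequence in rules:
            # A rule fires when all of its conditions are known facts.
            if conditions <= facts and consequence not in facts:
                facts.add(consequence)
                changed = True  # a new fact may trigger other rules
    return facts


facts = infer({"has_prerequisite", "seat_available", "fees_paid"}, RULES)
```

The second rule fires only because the first rule added "can_register", which is the chaining of consequences described above.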

Rule-Based Architecture

The rule framework takes care of distributing and


searching through the rules.
Programming focuses on getting the right “rules”

Seen more in artificial intelligence systems and in


some business process management systems

Rule-Based Architecture - Advantages

Lets you focus on the business element (the rules)


Rules are often easier for a client to understand than
program designs
Self-adapting to new situations due to the inference
engine

Rule-Based Architecture - Disadvantages

Performance
Searching the knowledge base can take time
Hard to get a full picture of the whole operation from
just the rules
Need to have the right set of rules
Should ensure that we don’t have contradictory rules in
the system

Databases
What are we dealing with?

Data
Stored representations of objects and events that have meaning and
importance in the user’s environment
Information
Data that has been processed in such a way as to increase the
knowledge of the person who uses the data
Metadata
Data that describe the properties or characteristics of end-user data, and
the context of that data.
Eg: name, type, range restrictions on numeric data, …
Database management system (DBMS)
A software system that is used to create, maintain, and provide
controlled access to user databases.
Definitions from “Modern Database Management” by Jeffrey Hoffer, Mary Prescott, and Heikki Topi, 9th edition
Database basics

Concerned with entities and the relations between the entities.


An entity is a person, place, object, event, or concept in the user
environment about which the organization wishes to maintain data.
We will focus on relational databases
A database that represents data as a collection of tables in which all data
relationships are represented by common values in related tables.

Sample table of data
(from excel)
Term | Subject | Course Number | Section | CRN | Schedule Type | Monday | Tuesday | Wednesday | Thursday | Friday | Begin Time | End Time | Building | Room
201810 | ACAD | 20 | 1 | 18188 | L | | | W | | | 935 | 1025 | HALEY INSTITUTE | 116
201810 | ACAD | 1050 | 1 | 18112 | L | M | | W | | | 1005 | 1125 | BANTING BUILDING | 32
201810 | ACSC | 4703 | 1 | 14095 | L | | T | | R | F | 1035 | 1125 | CHASE BLDG | 319
201810 | ACSC | 4720 | 1 | 14096 | L | M | | W | | F | 1135 | 1225 | CHASE BLDG | 319
201810 | ACSC | 4950 | 1 | 18296 | L | | | | | | | | |
201810 | AGRI | 1000 | 1 | 14097 | L | | T | | R | | 1005 | 1125 | AGRICULTURAL COX INSTITUTE | 24
201810 | AGRI | 1000 | B01 | 14098 | B | | | | | F | 1235 | 1425 | AGRICULTURAL COX INSTITUTE | 260
201810 | AGRI | 1000 | B01 | 14098 | B | | | | | F | 1235 | 1425 | AGRICULTURAL COX INSTITUTE | 261
201810 | AGRI | 1000 | B01 | 14098 | B | | | | | F | 1235 | 1425 | AGRICULTURAL COX INSTITUTE | 262
201810 | AGRI | 1000 | B02 | 14099 | B | | | | | F | 1435 | 1625 | AGRICULTURAL COX INSTITUTE | 260
201810 | AGRI | 1000 | B02 | 14099 | B | | | | | F | 1435 | 1625 | AGRICULTURAL COX INSTITUTE | 261
201810 | AGRI | 1000 | B02 | 14099 | B | | | | | F | 1435 | 1625 | AGRICULTURAL COX INSTITUTE | 262
201810 | AGRI | 4000 | 1 | 14102 | L | | | W | | | 1735 | 2025 | HALEY INSTITUTE | 110

Regard data about specific entities as relations


Represent data as tuples where each component of a tuple has some
domain
- Eg. (201810, ACAD, 20, 1, 18188, …) where
201810 is a string that represents the term
ACAD is a 4 character string that is one of a set of course subjects
20 is an integer as the course number
1 is a string (the section)
18188 is a 5 digit integer
Advantages of a DBMS

Program-data independence
Planned data redundancy (or removal of redundancy)
Improved data consistency
Improved data sharing
Increased productivity of application development
Enforcement of standards
Improved data quality
Improved data accessibility and responsiveness
Reduced program maintenance
Improved decision support
Disadvantages of a DBMS

New, specialized personnel


Installation and management cost and complexity
Conversion cost
Need for explicit backup and recovery
Organizational conflict

Some more terminology…

Schema
The structure that contains descriptions of objects created
by a user, such as base tables, views, and constraints, as
part of a database.
Catalog
A set of schemas that, when put together, constitute a
description of a database.

Schemas

External data model


The view of users of the database
- Some users may operate through a database view and not see all data.
Conceptual schema
A detailed, technology-independent specification of the overall structure
of the organizational data.
- Covers all external views of the data.
Internal schema
Logical schema
- The representation of a database for a particular data management
technology
Physical schema
- Specifications for how data from a logical schema are stored in a computer’s
secondary memory by a DBMS
Schemas – Relational Databases

Relations are stored in rows of tables.


Entity Relationships are represented by two rows in
different tables that share a common column value.

The schema includes a description of the tables, their


columns, and the data types of the columns.

Example table description - mysql
Mysql data types

Numeric
- Int (4 bytes)
- Tinyint (1 byte)
- Smallint (2 bytes)
- Mediumint (3 bytes)
- Bigint (8 bytes)
- Float
- Double
Enum
Set
Date and time
- Date
- Datetime
- Timestamp
- Time
- Year
String
- Char
- Varchar
- Blob / Text (Tinyblob, Blob, Mediumblob, Longblob)
Basic SQL operations

Insert
Query
“select” statement
Delete
Update

Insert basics

Insert into <table> (<column list>) values <tuples>

Omit (<column list>) when specifying all values


Insert into person values (NULL, "Jack", 30, 20000), (NULL,
"Kathy", 28, 25000);

Include <column list> if using the default values for all


other columns
Insert into person (name, age, salary) values ("Jack", 30,
20000), ("Kathy", 28, 25000);
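Both forms of insert can be tried without a MySQL server. The sketch below uses the sqlite3 module from the Python standard library as a stand-in, so small dialect differences apply, and the person table definition is an assumption because the slides do not show it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "create table person ("
    " person_id integer primary key,"  # generated when NULL is inserted
    " name text, age integer, salary integer)"
)

# No column list: a value is given for every column, in table order.
conn.execute(
    "insert into person values"
    " (NULL, 'Jack', 30, 20000), (NULL, 'Kathy', 28, 25000)"
)

# Column list: person_id is omitted and takes its generated value.
conn.execute(
    "insert into person (name, age, salary) values ('Doug', 41, 30000)"
)

rows = conn.execute(
    "select person_id, name from person order by person_id"
).fetchall()
```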

Query basics

Focus on basic set operations


Set restriction with a predicate
- Structure of a single “select” command
Typed set union
- Joining of the outputs of two “select” commands
Typed set intersection
- Joining of the outputs of two “select” commands
Typed set difference
- Joining of the outputs of two “select” commands

Set restriction with a predicate
Basic select statement

Select <column list> from <table> where <column criteria>;


Output Input Predicate

Example:

select person_id, name, `e-mail` from person
where name = "Mike" and city = "Canmore";

select * from person where city = "Halifax";
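A runnable version of the two queries, again with sqlite3 standing in for MySQL. The table layout, the sample rows, and the column name email (used in place of e-mail to avoid quoting) are assumptions for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "create table person ("
    " person_id integer primary key, name text, email text, city text)"
)
conn.executemany(
    "insert into person (name, email, city) values (?, ?, ?)",
    [
        ("Mike", "mike@example.com", "Canmore"),
        ("Mike", "mike2@example.com", "Halifax"),
        ("Ann", "ann@example.com", "Halifax"),
    ],
)

# Output columns, input table, and predicate, as in the template above.
canmore_mikes = conn.execute(
    "select person_id, name, email from person"
    " where name = ? and city = ?",
    ("Mike", "Canmore"),
).fetchall()

# '*' reports every column of the rows that satisfy the predicate.
halifax_people = conn.execute(
    "select * from person where city = 'Halifax'"
).fetchall()
```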

Basic select statement

Use a proposition to identify which elements to select from the


set
Use a list of columns to identify what data to report from that
selection
Column selection

Where…

Select “from” element – input specification

Identify the source set


One table
Multiple tables
- Use all row combinations of the multiple tables
- Called “joins”
4 variants for later: inner join, outer join, left join, right join
- For tables a, b, c creates the set a X b X c
Fabricated tables from subqueries
- Use the output of one SQL query as the input table for another query
- More on subqueries later
Create short names / aliases for tables
Useful for duplicated tables or fabricated tables
445
Select “from” element

Examples:

… from person …    single table alone

… from person as p …    single table with alias

… from person, course …    two tables

… from person as p, course as c …    two tables with aliases

446
Select column list – output specification

Identifies what to return from the query


Could be
A list of column names
- Just the name, if unique
- TableName.ColumnName or TableAlias.ColumnName if not unique
*
- Specifies all table columns
- Can be TableName.* or TableAlias.*
Transformations of columns
Added keywords
- Eg. DISTINCT
Can name outgoing columns

447
Select column list

Examples

select name, age from person where …

select * from person where …

select name as Full_Name from person where …

select person.name, course.name as course from person, course where …

448
Select column list - transformations

Avg()
Count()
Min()
Max()
Std()
Variance()
Sum()
Format()

449
Select column list - transformations

Concat()
Lcase() or lower()
Ucase() or upper()
Left(), Right(), or Mid()
Length()
Ltrim(), Rtrim(), Trim()
Lpad() or Rpad()

450
Select transformations

Examples

select count( name ) from person where …

select avg( age ) from person where …

select concat( name, " – ", age ) from person where …

select sum( fees ) from registration where …

select max( salary ) from person where …
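A small run makes the difference between aggregate and row-by-row transformations visible. This is a sketch under assumptions: Python's sqlite3 module and an in-memory SQLite database stand in for MySQL, the three rows are made up, and only functions that both systems share (count, avg, max, sum, upper, length) are used.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table person (name text, age int, salary int)")
conn.executemany(
    "insert into person values (?, ?, ?)",
    [("Jack", 30, 20000), ("Kathy", 28, 25000), ("Doug", 41, 36000)],
)

# Aggregates collapse the selected rows into a single output row.
summary = conn.execute(
    "select count(name), avg(age), max(salary), sum(salary) from person"
).fetchone()

# String transformations are applied to each row separately.
labels = conn.execute(
    "select upper(name), length(name) from person order by name"
).fetchall()

print(summary)
print(labels)
```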


451
Select “where” – selection predicate

Identifies which rows to keep from the input


Uses
Maintain the relation between tables
- Where person.person_id = registration.person_id
Select particular elements
- Where name = "Doug"
Allows for Boolean operators
…. And ….
…. Or ….
Not ….
Use parentheses to help with the Boolean logic
452
Selection predicates

Standard comparators
=, !=, <>, >, <, >=, <= (!< and !> exist in some other systems, not in mysql)
Numeric ranges – “between”
Select name from person where salary between 32000 and 50000
Set inclusion – “in”
Select person_id from registration where course_id in (1, 2, 3)
Select distinct person_id from registration where course_id in (1, 2, 3)
Near matches – “like”
% matches 0 or more characters, _ matches 1 character
Select name from person where name like "C%"
Works on numbers too: select * from person where salary like "3%"
NULL check – “is null”
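Each predicate form above can be checked against a tiny table. The following is a sketch, not course material: it assumes Python's sqlite3 module with an in-memory SQLite database in place of MySQL, made-up rows, and a hypothetical helper called names() that returns the matching names for a given predicate.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table person (name text, salary int, city text)")
conn.executemany(
    "insert into person values (?, ?, ?)",
    [("Carol", 32000, "Halifax"), ("Chen", 51000, None),
     ("Doug", 45000, "Canmore"), ("Mike", 30500, "Canmore")],
)

def names(where):
    """Return the sorted names of the rows that satisfy the predicate."""
    sql = "select name from person where " + where + " order by name"
    return [row[0] for row in conn.execute(sql)]

in_range = names("salary between 32000 and 50000")   # both bounds are included
in_set = names("city in ('Halifax', 'Canmore')")     # a NULL city never matches
starts_c = names("name like 'C%'")
numeric_like = names("salary like '3%'")             # number compared as text
no_city = names("city is null")

print(in_range, in_set, starts_c, numeric_like, no_city)
```

Chen has no city, so that row is found only by the “is null” test and not by the “in” test.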
453
Additional “select” specifications

Order by <column list> [ASC | DESC]


Allows you to sort the data
Group by <column list>
Collects similar records for aggregation transformations like count or
sum
Group by <column list> having <clause>
Like “group by” but lets you select a subset of groups
Limit n
Report only the first n records
“limit” for mysql, “top” for some other systems
Distinct
Only provide unique rows of output
Duplication can happen when you’re reporting a subset of columns
454
Database lab

[Figure: schema diagram of the sample database; the labels 7, 2996, 110, 23, 326, 122, 7, 273 appear on the diagram]

Sample database in csci3901 from http://www.mysqltutorial.org/mysql-sample-database.aspx

455
Examples

Select * from person order by name;

Select name, count(person_id)
from course, registration
where course.course_id = registration.course_id
group by course.course_id;

Select name, count(person_id)
from course, registration
where course.course_id = registration.course_id
group by course.course_id
having count(person_id) > 2;

456
Examples

Select name, count(person_id) as size
from course, registration
where course.course_id = registration.course_id
group by course.course_id
order by size desc
limit 1;
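The grouping examples can be run end to end on a few rows. This sketch assumes Python's sqlite3 module with an in-memory SQLite database standing in for MySQL; the course and registration contents are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table course (course_id int, name text);
    create table registration (person_id int, course_id int);
    insert into course values (1, 'Databases'), (2, 'Graphs'), (3, 'Testing');
    insert into registration values
        (10, 1), (11, 1), (12, 1), (10, 2), (11, 2), (12, 3);
""")

# Shared part of the three queries: one output row per course.
base = """
    select name, count(person_id) as size
    from course, registration
    where course.course_id = registration.course_id
    group by course.course_id
"""

sizes = conn.execute(base + " order by name").fetchall()
popular = conn.execute(base + " having count(person_id) > 2").fetchall()
largest = conn.execute(base + " order by size desc limit 1").fetchall()

print(sizes, popular, largest)
```

“having” filters whole groups after counting, while “order by ... limit 1” keeps only the group that sorts first.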

457
Typical join “shortcut”

Often join tables on a common set of columns and then ask to only show that common column once.
Currently shown using the “from” clause, the “where” clause, and the choice of columns

Modify the “from” clause to a “from ... join ... using (...)” clause:
“join” replacing the comma in the from clause
“using” lists the columns to have in common
The variant only keeps one copy of all the columns mentioned in the “using” part
458
Natural Join

When your join is going to use all common columns, you can use the “natural join” operator

Equivalent to join..using ( <all common column names> )

459
Example

Select course.course_id, name, person_id, fees
from course, registration
where course.course_id = registration.course_id;

becomes

Select *
from course join registration using (course_id);

or

Select * from course natural join registration;

460
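The three spellings of the course/registration join can be compared directly. This is a sketch under assumptions: Python's sqlite3 module and an in-memory SQLite database replace MySQL, and the table contents are made up. It relies on the two tables sharing only the course_id column.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table course (course_id int, name text);
    create table registration (person_id int, course_id int, fees int);
    insert into course values (1, 'Databases'), (2, 'Graphs');
    insert into registration values (10, 1, 500), (11, 1, 500), (10, 2, 450);
""")

with_where = conn.execute(
    "select course.course_id, name, person_id, fees"
    " from course, registration"
    " where course.course_id = registration.course_id"
    " order by course.course_id, person_id"
).fetchall()

# "using" keeps a single copy of course_id, so "select *" yields four columns.
with_using = conn.execute(
    "select * from course join registration using (course_id)"
    " order by course_id, person_id"
).fetchall()

with_natural = conn.execute(
    "select * from course natural join registration"
    " order by course_id, person_id"
).fetchall()

print(with_where == with_using == with_natural)
```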


Different kinds of join operators

Consider 4 join operators:


Inner join
- Returns records that have matching values in both tables
- Equi-join – inner join where the join condition is based on equality between
values in the common columns
- Natural join – inner join that restricts all same-named columns to match and
produces one instance of the common columns
Left join
- Return all records from the left table, and the matched records from the right table,
adding NULL values when a match isn’t present
Right join
- Return all records from the right table, and the matched records from the left table,
adding NULL values when a match isn’t present
Outer join
- Return all records from both tables, matched where possible, adding NULL values on whichever side has no match
461
Join examples
sample1 sample2

Equi-join

Natural join

462
Join examples
sample1 sample2

Left join

Right join

463
Join conditions

Outer joins often invoke “is null” or “is not null” in the
where condition to filter the results
sample1 sample2

What does the following produce?


Select sample2.id from sample1 right join sample2 on
sample1.id = sample2.id where sample1.id is null;
All ids in sample2 not in sample1
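The answer can be confirmed on two small id lists. In this sketch Python's sqlite3 module and an in-memory SQLite database stand in for MySQL, and the ids are made up. Older SQLite releases have no right join, so the query is written as "sample2 left join sample1", which is the same join with the tables swapped; MySQL accepts the slide's right join as written.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table sample1 (id int);
    create table sample2 (id int);
    insert into sample1 values (1), (2), (3);
    insert into sample2 values (2), (3), (4), (5);
""")

# Every sample2 row is kept; unmatched rows get NULL on the sample1 side.
joined = conn.execute(
    "select sample2.id, sample1.id from sample2 left join sample1"
    " on sample1.id = sample2.id order by sample2.id"
).fetchall()

# Keeping only the NULL-padded rows leaves the ids found in sample2 alone.
only_in_sample2 = conn.execute(
    "select sample2.id from sample2 left join sample1"
    " on sample1.id = sample2.id where sample1.id is null order by sample2.id"
).fetchall()

print(joined)
print(only_in_sample2)
```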

464
Query Execution

The DBMS creates an execution plan from your SQL select statement
Identifies the order in which to do evaluations
- From: combine small tables first
- Where: apply the most restrictive conditions first
Query optimizer can re-order elements of the query to increase its performance
- Estimate the size of the tables to be combined to help manage the total work

How you specify your query can influence performance.
465
Costs for select statements

Column selection
Identifies what to keep as you process. Low cost, unless you’re using
transformations
From clause
Generates combinations of records. High cost if you’re generating many records
that you will just throw away
Where clause
Does winnowing as you process. Can have complex sets of conditions to
evaluate. Medium cost
Group by clause
Need the final data to make this work. Throwing away most of the generated
records to create summaries.
Order by clause
Need the final data to execute. Cost relative to output size, not generated records
size.
466
Helping performance

1. Minimize the size and number of table combinations in the “from” clause
How??? We need a new tool -> subqueries
2. Include restrictive “where” elements if you can

467
Are select statements unique?

468
Subqueries

The output of an SQL statement is a table.
Use that output table in the place of any other table in a query.
Enclose the subquery in parentheses
Need to use the “as” keyword to give the output of the subquery a name

Use a subquery to reduce the size or number of tables to combine in the “from” clause

A subquery can have its own subquery


469
Subquery example

Select *
from person join registration using (person_id)
where person_id = 3;

versus

select *
from
(select * from person where person_id = 3) as interested
join registration using (person_id);

Is there a difference?

470
Subquery example

How many rows does each query create?

select *
from person as p, course as c, registration as r
where p.person_id = r.person_id
and c.course_id = r.course_id
and p.name like "A%";

versus

select * from
(
(select * from person where name like "A%") as p
join registration using (person_id)
) join course as c using (course_id);
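A toy run gives a feel for the row counts involved. This sketch assumes Python's sqlite3 module with an in-memory SQLite database in place of MySQL, made-up rows, and a hypothetical title column on course. The count of combinations is the size of the set the three-table “from” describes before filtering; a real optimizer may avoid building all of it. The nested form is written without the outer parentheses because joins associate left to right.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table person (person_id int, name text);
    create table course (course_id int, title text);
    create table registration (person_id int, course_id int);
    insert into person values (1, 'Ava'), (2, 'Ben'), (3, 'Cy'), (4, 'Di');
    insert into course values (1, 'Databases'), (2, 'Graphs');
    insert into registration values (1, 1), (1, 2), (2, 1), (3, 2);
""")

# Row combinations described by "from person, course, registration": 4 * 2 * 4.
combos = conn.execute(
    "select count(*) from person, course, registration"
).fetchone()[0]

flat = conn.execute(
    "select p.name, c.title from person as p, course as c, registration as r"
    " where p.person_id = r.person_id and c.course_id = r.course_id"
    " and p.name like 'A%' order by c.title"
).fetchall()

# The subquery shrinks person to the matching rows before anything is joined.
nested = conn.execute(
    "select name, title from"
    " (select * from person where name like 'A%') as p"
    " join registration using (person_id)"
    " join course as c using (course_id) order by title"
).fetchall()

print(combos, flat, nested)
```

Both forms return the same two rows; they differ only in how much intermediate data the query text asks for.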
471
Subqueries

Can also appear in the “where” clause

Extract one number for a comparison

select name, salary from person
where salary >= (select avg(salary) from person);

Extract a set of values for an “in” statement (mysql does not yet support “limit” inside an “in” subquery, so this example is waiting for mysql to catch up)

select name, salary from person
where salary in
(select distinct salary from person order by salary desc limit 2);
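Both kinds of “where” subquery can be exercised on a handful of salaries. This is a sketch under assumptions: Python's sqlite3 module with an in-memory SQLite database replaces MySQL, and the rows are made up. SQLite happens to accept “limit” inside an “in” subquery, which is why the second query runs here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table person (name text, salary int)")
conn.executemany(
    "insert into person values (?, ?)",
    [("Ann", 20000), ("Bob", 38000), ("Cat", 40000),
     ("Dev", 40000), ("Eve", 50000)],
)

# Scalar subquery: a single number (the average, 37600) feeds the comparison.
above_average = conn.execute(
    "select name from person"
    " where salary >= (select avg(salary) from person) order by name"
).fetchall()

# Set subquery: the two highest distinct salaries feed the "in" test.
top_two_levels = conn.execute(
    "select name from person where salary in"
    " (select distinct salary from person order by salary desc limit 2)"
    " order by name"
).fetchall()

print(above_average)
print(top_two_levels)
```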
472
Find everyone within 1 standard deviation of the
average age – shorter version from class
Select name, age from person where age between
(select avg(age) - std(age) from person)
and
(select avg(age) + std(age) from person);

473
Combining outputs of queries

(select …) union (select …)


(select …) intersect (select …)
(select ...) except (select …)

Use at the top-level query or in subqueries

The columns produced by the pair of select statements must be the same.

494
Union example

(select name from person where salary >= 30000) union (select name from person where age <= 20);

Could also be done with a “where” clause

select name from person where (salary >= 30000) or (age <= 20);

“union” often clearer when the where conditions become complex
495
Intersection in mysql

Intersection doesn’t exist in mysql before version 8.0.31

Simulate in mysql using inner join + distinct:

select distinct <column list> from <t1> join <t2> on <join criteria>

Simulate in mysql using where..in clause:

select id from t1 where t1.id in (select id from t2);

496
Intersection Example

(select name from person where salary >= 30000) intersect (select name from person where age > 20);

Simulate in mysql using inner join + distinct:

select distinct s1.name from (select name from person where salary >= 30000) as s1 join (select name from person where age > 20) as s2 on s1.name = s2.name;

497
Except / minus in mysql

Except or minus keywords don’t exist in mysql before version 8.0.31

Simulate in mysql using left join

select t1.id from t1 left join t2 on t1.id = t2.id where t2.id is null;

498
Except example

(select name from person where salary >= 30000) except (select name from person where age > 20);

Simulate in mysql using left join

select s1.name from (select name from person where salary >= 30000) as s1 left join (select name from person where age > 20) as s2 on s1.name = s2.name where s2.name is null;
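The join-based simulations can be checked against a system that has the real operators. This sketch assumes Python's sqlite3 module with an in-memory SQLite database, which supports intersect and except natively, and made-up rows. SQLite does not allow parentheses around the operands of intersect and except, so the native queries are written without them.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table person (name text, age int, salary int)")
conn.executemany(
    "insert into person values (?, ?, ?)",
    [("Ann", 19, 35000), ("Bob", 25, 31000),
     ("Cat", 40, 28000), ("Dev", 33, 30000)],
)

high = "select name from person where salary >= 30000"
older = "select name from person where age > 20"

def run(sql):
    """Run a query and return its rows in sorted order."""
    return sorted(conn.execute(sql).fetchall())

native_intersect = run(high + " intersect " + older)
joined_intersect = run(
    "select distinct s1.name from (" + high + ") as s1"
    " join (" + older + ") as s2 on s1.name = s2.name"
)

native_except = run(high + " except " + older)
joined_except = run(
    "select s1.name from (" + high + ") as s1"
    " left join (" + older + ") as s2 on s1.name = s2.name"
    " where s2.name is null"
)

print(native_intersect, joined_intersect)
print(native_except, joined_except)
```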

499
Views

A view creates an abstraction of rows from one (or more) tables
Can be all rows or a subset of them
Allows queries to use the view like a table
Shortens the syntax of some tables
Allows re-use of common table joins and restrictions
Allows individuals to see only the data that is relevant to
them (or permitted for them to see).

500
View syntax

Create view <viewName> as <select statement> [with check option]

<viewName> can then be used as a table in queries.

Including the “with check option” designation means that any insert or update requested through the view is rejected unless the resulting row still satisfies the view’s “where” clause

Delete the view with

drop view <viewName>

501


View example
Using the sales database from last week’s lab
create view London_Employees as select * from employees where officeCode = 7;

select * from London_Employees;

drop view London_Employees;

create view NA_Employees as select employees.*, territory from employees natural join offices where officeCode in (select officeCode from offices where territory = "NA");

select * from NA_Employees;

drop view NA_Employees;

502
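The life cycle of a view (create, query, drop) can be sketched on a cut-down employees table. Assumptions: Python's sqlite3 module with an in-memory SQLite database stands in for MySQL, the employees table has only three of its columns, and all rows are made up rather than taken from the lab database.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table employees (employeeNumber int primary key, lastName text, officeCode int);
    insert into employees values (1, 'Avery', 1), (2, 'Bott', 7), (3, 'Jones', 7);
    create view London_Employees as select * from employees where officeCode = 7;
""")

london = conn.execute(
    "select lastName from London_Employees order by lastName"
).fetchall()

# A view is a stored query, not a copy: new base-table rows show up through it.
conn.execute("insert into employees values (4, 'Patel', 7)")
london_after = conn.execute(
    "select lastName from London_Employees order by lastName"
).fetchall()

conn.execute("drop view London_Employees")
# sqlite_master is SQLite's catalog table; no views remain after the drop.
remaining_views = conn.execute(
    "select name from sqlite_master where type = 'view'"
).fetchall()

print(london, london_after, remaining_views)
```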


View example

What if I wanted to create a view where an employee only sees the employee records of people in the same territory as them?

Helper: user() is a function that returns the login name of the individual who is running the query.

503
Just the tip of the SQL iceberg

Other functionality to be aware of:


Case statements
- Allows “if…then” functionality in queries to change behavior
Variables
- Set @var = <expression>
- Select @var := <column> from …
For/while/repeat statements
- Allows looping over the results of a query within SQL
With statements
- Allows you to pull subqueries out of the main query and to not repeat the
subquery text

504
Just the tip of the SQL iceberg

Other functionality to be aware of:


Stored procedures
- Keeps a sequence of SQL commands in the DBMS that you can invoke with
one command
Gives flexibility, efficiency, shareability, applicability to more
than one database
Triggers
- SQL to run before, after, or replacing specific commands to the database

505
Case statement

Format case [when…then…]+ [else …] end

Example
Select city, case when territory = "NA" then "North America"
else territory end as Territory from offices;

506
With statement example

Represent
select EmployeeID, FirstName, LastName
from employees
where EmployeeID in (select distinct ReportsTo from employees);

as
with supervisorIDs as
( select distinct ReportsTo from employees )
select EmployeeID, FirstName, LastName
from employees
where EmployeeID in (select ReportsTo from supervisorIDs);
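The “with” form should return exactly what the inline subquery returns. The sketch below checks that on a four-person reporting chain. Assumptions: Python's sqlite3 module with an in-memory SQLite database replaces MySQL, the employees table is cut down to three columns, and the rows are made up. The named subquery holds the ReportsTo values and is read with its own select inside the “in” test.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table employees (EmployeeID int, LastName text, ReportsTo int)")
conn.executemany(
    "insert into employees values (?, ?, ?)",
    [(1, "Adams", None), (2, "Baker", 1), (3, "Cole", 1), (4, "Diaz", 2)],
)

subquery_form = conn.execute(
    "select EmployeeID, LastName from employees"
    " where EmployeeID in (select distinct ReportsTo from employees)"
    " order by EmployeeID"
).fetchall()

# The subquery text is pulled out, named once, and referenced by name.
with_form = conn.execute(
    "with supervisorIDs as (select distinct ReportsTo from employees)"
    " select EmployeeID, LastName from employees"
    " where EmployeeID in (select ReportsTo from supervisorIDs)"
    " order by EmployeeID"
).fetchall()

print(subquery_form, with_form)
```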

507
Changing records

Use the “update” command:


Update <tablename> set [<column>=<value>]+ where …

Can set the value of multiple columns at the same time

Same “where” understanding as in select


- Can use select subqueries to give a list

Values to set can be relative to the current value


- Use the column name in the value clause
- Will vary by row matched

508
Removing records

Use the “delete” command:


Delete from <tablename> where …

Same “where” understanding as in select


- Can use select subqueries to give a list

509
CRUD operations

Create
Insert into … values …
Read
Select … from … where …
Update
Update … set … where …
Delete
Delete from … where …
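The four CRUD statements can be followed through one short session. This is a sketch only: Python's sqlite3 module with an in-memory SQLite database stands in for MySQL, and the person rows and the amounts in the update are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table person (name text, age int, salary int)")

# Create
conn.execute("insert into person values ('Jack', 30, 20000), ('Kathy', 28, 25000)")

# Update: two columns at once, each new value relative to the row's current value.
conn.execute("update person set salary = salary + 1000, age = age + 1 where age >= 30")

# Delete: same kind of "where" predicate as in select.
conn.execute("delete from person where salary > 24000")

# Read
rows = conn.execute("select name, age, salary from person").fetchall()
print(rows)
```

Only Jack matches the update predicate, and only Kathy matches the delete predicate.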

510
Effect of timing

By default, MySQL operates in “auto commit” mode
Each statement is stored in the database as soon as it runs.

There may be times when you need 2 (or more) statements to be done together or not at all to avoid conflicting information in the database:
The two updates might both be needed, but others may be changing the database at the same time as you
- Eg. Change provincial and federal sales tax at the same time
Don’t want an invoice with inconsistent tax levels
If the second statement fails then you don’t want the first statement done
You’re trying out a change and may want to discard it if the process isn’t as you expected.
511
Transactions

A transaction is a construct where the SQL commands in the transaction are either all done or none are done
Need to take the database out of “auto commit” mode
Identify the start and end of the group of statements
Start:
- Start transaction
End:
- Commit – put all the outputs into the database
- Rollback – discard all the work of the transaction
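The sales tax case can be sketched as one all-or-nothing transaction. Assumptions: Python's sqlite3 module with an in-memory SQLite database replaces MySQL, the tax table, its rates, and the helper names change_rates() and rates() are made up, and a check constraint is used to force the second update to fail. SQLite starts a transaction with "begin" where the slide uses "start transaction".

```python
import sqlite3

# isolation_level=None leaves transaction control to the explicit statements below.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("create table tax (kind text primary key, rate real check (rate >= 0))")
conn.execute("insert into tax values ('provincial', 0.10), ('federal', 0.05)")

def change_rates(provincial, federal):
    """Apply both updates together, or neither if one of them fails."""
    conn.execute("begin")
    try:
        conn.execute("update tax set rate = ? where kind = 'provincial'", (provincial,))
        conn.execute("update tax set rate = ? where kind = 'federal'", (federal,))
        conn.execute("commit")
        return True
    except sqlite3.IntegrityError:
        conn.execute("rollback")   # discards the first update as well
        return False

def rates():
    return dict(conn.execute("select kind, rate from tax"))

first_ok = change_rates(0.08, -1.0)    # second update violates the check
after_failure = rates()
second_ok = change_rates(0.08, 0.07)
after_success = rates()

print(first_ok, after_failure)
print(second_ok, after_success)
```

After the failed attempt the provincial rate is back at its old value, so no reader can see one new rate paired with one old rate.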

512
ACID properties – key for a DBMS to maintain

Atomic
The transaction cannot be subdivided. It is either completely
done or no part is done.
Consistent
Any database constraint / property / relation that existed
before the transaction must also exist after the transaction
Isolated
Changes to the database are not revealed to users until the
transaction is committed
Durable
Changes are permanent
513
ACID properties

The “transaction” model of SQL lets the DBMS manage the ACID properties
Requires some locking and failure handling in a DBMS with concurrent users/transactions
Tricky to ensure over distributed or federated databases

Transaction designers must also be thinking of the ACID properties as they choose which commands to include
Must ensure that all business constraints are consistent at the end of the transaction

514
Describing the database

515
Entity Relation Diagrams (ER Diagrams or ERD)

A representation of the tables in a database and the relations between the tables
Derived from Entity Relation Models (ERM - business models)
ERD and ERM may be used synonymously in some contexts

Includes
Entities – a person, place, object, event, or concept in the user
environment about which the organization wishes to maintain data
Relations – a meaningful association between or among entities

516
Entities

Aim for:
A singular noun – helps to keep it to a single concept
Something specific to the organization
Something concise
Named for a result/artefact, not a process or procedure

Comprised of attributes

517
Attributes

A property or characteristic of an entity or relationship type that is of interest to the organization
Required attribute – must always have a value for every entity
Optional attribute – may not have a value for every entity

Atomic attribute – an attribute that cannot be broken down into smaller components that are meaningful to the organization
- Eg. age
Composite attribute – an attribute that has meaningful component parts
- Eg. Address has city, province, …
518
Attributes

A property or characteristic of an entity or relationship type that is of interest to the organization

Multivalued attribute – an attribute that may take on more than one value for a given entity
- Eg. Set of skills for a person
- Named in {…} brackets
Derived attribute – an attribute whose value can be calculated from related attribute values
- Named in […] brackets

519
Attributes

A property or characteristic of an entity or relationship type that is of interest to the organization

Identifier – an attribute (or combination of attributes) whose value distinguishes instances of an entity type
- Become our primary keys in the database
Composite identifier – an identifier that consists of a composite attribute
- Become composite keys

520
ERM symbols – entities
[Figure: three boxes labelled strong entity, weak entity, and associative entity; each box shows a table name above its column names]

PK – primary key (for this table)
FK – foreign key (primary key of another table)
Blank – not a key

Strong entity – stands on its own
Weak entity – doesn’t make sense if alone
Associative entity – associates the instances of one or more entity types and contains attributes that are peculiar to the relationship

521
Relations

Degree – number of entity types that participate


Unary – one
Binary – two
Ternary – three

Cardinality
The number of entities that participate in the relation

522
ERM symbols – relations

Optional one

Mandatory one

Optional many

Mandatory many

Typically connect the relation edges between the matching primary and foreign keys

523
Database lab

[Figure: schema diagram of the sample database; the labels 7, 2996, 110, 23, 326, 122, 7, 273 appear on the diagram]

Sample database in csci3901 from http://www.mysqltutorial.org/mysql-sample-database.aspx

524
Reverse-engineered by MySQL Workbench

525
Data Modeling

Defines
which entities you have,
how they are grouped,
the relation between entities, and
the cardinality of the relations.

Comes from your analysis of the business
Derived from explicit and implicit business rules

526
Creating tables

Create table [if not exists] <tablename> (
<columnName1> <datatype>,
<columnName2> <datatype>
);

Modifiers for after the data type:
Not null – prevent NULL values from being stored
Default X – set default value to X on inserts
Auto_increment – designates an increment field for surrogate keys (but doesn’t automatically make the column a key)
527
Relational keys

Primary key
An attribute (or combination of attributes) that uniquely
identifies each row in a relation
- The choice of primary key may not be unique

Composite key
A primary key that consists of more than one attribute.

528
Relational keys

Foreign key
An attribute in a relation that serves as the primary key of
another relation in the same database.

Surrogate key
A serial number or other system assigned primary key for a
relation
Often created to replace
- a complex or highly-composite primary key
- an expensive primary key (often big strings)
- a primary key that could be re-used over time

529
SQL Context

Endings to the “create table” command:
Primary key (<id> [, <id>, <id>, …] ) – defines the primary key of the table, basic or composite
Foreign key (<id>) references <table> (<key>) – defines field “id” as a foreign key in the current table that maps to primary key <key> in table <table>
Check (<field condition>) – ensures that data meets a criterion
- Eg. field condition could be: Age >= 20 to ensure that all ages are 20 or more in the table.

Can give these endings a name:
- Constraint <name> <ending from above>
530
SQL example

Create table sample3 (
id int not null,
name char(10),
primary key (id)
);

create table sample4 (
info int not null,
value char(10),
id int,
primary key (info),
foreign key (id) references sample3 (id)
);
531
SQL context

Eg.

Create table employees (
employeeNumber int not null,
lastName char(50) not null,
firstName char(20) not null,
age int,
primary key (employeeNumber),
check (age >= 15)
);
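The three table endings (primary key, foreign key, check) can be seen rejecting bad rows. This sketch assumes Python's sqlite3 module with an in-memory SQLite database in place of MySQL. The sample3 and sample4 tables follow the earlier SQL example with an age column added for the check; the rows and the helper rejected() are made up. SQLite only enforces foreign keys after the pragma shown.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("pragma foreign_keys = on")   # SQLite-specific: FK checks are off by default
conn.executescript("""
    create table sample3 (
        id int not null,
        name char(10),
        age int,
        primary key (id),
        check (age >= 15)
    );
    create table sample4 (
        info int not null,
        id int,
        primary key (info),
        foreign key (id) references sample3 (id)
    );
    insert into sample3 values (1, 'Ann', 30);
    insert into sample4 values (100, 1);
""")

def rejected(sql):
    """Return True when the DBMS refuses the statement as a constraint violation."""
    try:
        conn.execute(sql)
        return False
    except sqlite3.IntegrityError:
        return True

duplicate_key = rejected("insert into sample3 values (1, 'Bob', 40)")   # primary key
too_young = rejected("insert into sample3 values (2, 'Cy', 12)")        # check
dangling_ref = rejected("insert into sample4 values (101, 99)")         # foreign key
valid_row = rejected("insert into sample3 values (3, 'Di', 15)")

print(duplicate_key, too_young, dangling_ref, valid_row)
```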

532
Deleting tables

Drop table <tablename>;

533
Converting to a database

Relation – a named two-dimensional table of data


Properties of relations
Each relation has a unique name.
An entry at the intersection of each row and column is
atomic. There are no multivalued attributes.
Each row is unique.
Each attribute / column within a table has a unique name
The sequence of columns is insignificant.
The sequence of rows is insignificant.

534
Conversion steps

1. Map regular entities
2. Map weak entities
3. Map binary relations
4. Map associative entities
5. Map unary relations
6. Map ternary or more complex relations

535
Map regular entities

Basic attributes become table columns


Composite attributes only have their subcomponents stored
Eg. “address” is composite, so we would store street address, city,
province, country, postal code individually but not “address” itself.
Multivalued attributes
Create a second table that lists the primary key of the first table and one
value of the multivalued attribute
- Eg. Employee with many skills:
Create the employee table and a second table called skills.
A row in the skills table contains an employee id and a skill.

536
Multivalued Attributes

Example: the employee table is to include a set of skills for each employee. The number of skills is varied and can grow.

Solution 1: Create just one table that can list the skills:

employee (employeeID)
employeeSkill (employeeID, skillName)

One employee can have many rows in the employeeSkill table.
No quick way to ensure that we’re typing the skill names consistently.

537
Multivalued Attributes

Example: the employee table is to include a set of skills for each employee. The number of skills is varied and can grow.

Solution 2: Create a table of skills then join the two

employee (employeeID)
employeeSkill (employeeID, skillID)
skill (skillID, skillName)

Everyone with the same skill references the same ID in the skill table.

538
Map weak entities

Recall that a weak entity is an entity that does not exist as an independent concept.
Eg. orderDetail in the lab database

Create a table for the weak entity and include the primary key of the primary entity as a foreign key

Often use surrogate keys for weak entities

539
Map binary relations

[Figure: a table with PK id1 and FK info1, related to a table with PK info1]
Include the primary key of one table as a foreign key in the other table.
Which table has the foreign key is context-dependent.

[Figure: a table with PK id1, related to a table with PK info1 and FK id1]
Include the primary key of the single element table as a foreign key in the multiple element table.

540
Map binary relations

[Figure: a table with PK id1, an intermediate table with FK id1 and FK info1, and a table with PK info1]
Create an intermediate table with the primary keys of both tables.
The key for this intermediate table is the set of both foreign keys.

541
Binary relations example

Business side: each order will have at most one payment and we don’t allow payments to cover more than one order

orders (orderID)
payments (paymentID, orderID)

Note: 2 different ways to declare the primary keys. Both are ok.

create table orders (orderID int not null auto_increment primary key);

create table payments (paymentID int not null auto_increment,
orderID int not null,
primary key (paymentID),
foreign key (orderID) references orders (orderID) );
542
Binary relations example – alternate solution

Business side: each order will have at most one payment and we don’t allow payments to cover more than one order

orders (orderID, paymentID)
payments (paymentID)

Alternatively, have the paymentID in the order, but allow it to be NULL for the “no payments” option (less desirable solution, but still works).

create table payments (paymentID int not null auto_increment primary key);

create table orders (orderID int not null auto_increment,
paymentID int,
primary key (orderID),
foreign key (paymentID) references payments (paymentID) );
543
Binary relations – example one-to-many

Business side: an employee is assigned to at most one home office

offices (officeID)
employees (employeeID, myOfficeID)

The foreign key side is on the “many” table.
Foreign key names don’t need to match.

create table offices (officeID int not null auto_increment primary key);

create table employees (employeeID int not null auto_increment,
myOfficeID int not null,
primary key (employeeID),
foreign key (myOfficeID) references offices (officeID) );
544
Binary relations – example many-to-many
Business rule: customers have one or more employee contacts and
employees look after multiple customers

employees (employeeID)   customers (customerID)

employees (employeeID)   contacts (employeeID, customerID)   customers (customerID)

545
Binary relations – example many-to-many
Business rule: customers have one or more employee contacts and
employees look after multiple customers
employees (employeeID)   contacts (employeeID, customerID)   customers (customerID)

Create table employees ( employeeID int not null auto_increment primary key);
Create table customers (customerID int not null auto_increment primary key);
Create table contacts ( employeeID int not null,
customerID int not null,
primary key (employeeID, customerID),
foreign key (employeeID) references employees (employeeID),
foreign key (customerID) references customers (customerID ) );
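The contacts table can be exercised to see both directions of the many-to-many relation. This is a sketch under assumptions: Python's sqlite3 module with an in-memory SQLite database stands in for MySQL, a hypothetical name column is added to employees and customers so the output is readable, and the rows are made up. SQLite spells the auto-numbered key as "integer primary key".

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table employees (employeeID integer primary key, name text);
    create table customers (customerID integer primary key, name text);
    create table contacts (
        employeeID int not null,
        customerID int not null,
        primary key (employeeID, customerID),
        foreign key (employeeID) references employees (employeeID),
        foreign key (customerID) references customers (customerID)
    );
    insert into employees values (1, 'Ann'), (2, 'Bob');
    insert into customers values (10, 'Acme'), (11, 'Birch');
    insert into contacts values (1, 10), (1, 11), (2, 10);
""")

# Walk from employees through contacts to customers.
pairs = conn.execute(
    "select e.name, c.name from employees as e"
    " join contacts using (employeeID)"
    " join customers as c using (customerID)"
    " order by e.name, c.name"
).fetchall()

# The composite primary key refuses a second copy of the same pairing.
try:
    conn.execute("insert into contacts values (1, 10)")
    duplicate_allowed = True
except sqlite3.IntegrityError:
    duplicate_allowed = False

print(pairs, duplicate_allowed)
```

Ann looks after two customers and Acme has two contacts, which neither original table could record on its own.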

546
Map associative entities

Similar to mapping regular entities

547
Map unary relations

Include the primary key of a table as a foreign key (with a different name) in the same table
Eg. “reportsTo” field in the employee table of the lab database.
Sometimes called a recursive foreign key

548
Map ternary relations

Create intermediate tables as in the binary many-to-many relations.

The intermediate table captures the primary keys of all the entities in the relation

549
