COMP20003 Algorithms and Data Structures

Second (Spring) Semester 2019

[Assignment 1]
Taxi & For-Hire Vehicle Trip Dataset
Information Retrieval using Binary Search Trees
Handed out: Friday, 23 August
Due: 8:00 AM, Monday, 9 September

Purpose
The purpose of this assignment is for you to:

• Increase your proficiency in C programming, your dexterity with dynamic memory allocation
and your understanding of linked data structures, through programming a dictionary.

• Increase your understanding of how computational complexity can affect the performance of an
algorithm by conducting orderly experiments with your program and comparing the results of
your experimentation with theory.

• Increase your proficiency in using UNIX utilities.

Background
A dictionary is an abstract data type that stores and supports lookup of (key, value) pairs. For example,
in a telephone directory, the (string) key is a person or company name, and the value is the phone
number. In a student record lookup, the key would be a student ID number and the value would be a
complex structure containing all the other information about the student.

A dictionary can be implemented in C using a number of underlying data structures. Any implementation
must support the operations: makedict a new dictionary; insert a new item (key, value
pair) into a dictionary; search for a key in the dictionary, and return the associated value. Most
dictionaries will also support the operation delete an item.

Your task
In this assignment, you will create a simple instance of a dictionary, and use it to look up information
about for-hire vehicle trips in New York City.

There are two stages in this project. In the first stage you will code a dictionary in the C programming
language, using a binary search tree as the underlying data structure. You will insert records into this
dictionary from a file, and look up records by key. In the second stage, you will code additional
functions to retrieve information from this dictionary. You will use a Makefile to direct the compilation
of two separate executable programs, one for Stage 1 and one for Stage 2.

In both stages of the assignment, you will report on the number of key comparisons used for search
and analyse what would have been expected theoretically. The report should cover each file used to
initialize the dictionary.

You are not required to implement the delete functionality.

Stage 1 (7 marks)
In Stage 1 of this assignment, your Makefile will direct the compilation to produce an executable
program called dict1. The program dict1 takes two command line arguments: the first argument is
the name of the data file used to build the dictionary; the second argument is the name of the output
file, containing the data located in the searches. The data file consists of an unspecified number of
records, one per line, with the following information:

VendorID - Code to indicate the vendor which produced the record

passenger count - Number of passengers
trip distance - Length of the trip in miles
RatecodeID - Code to represent the fare rate for the trip
store and fwd flag - Indicates whether trip records were stored locally
PULocationID - TLC Taxi Zone where passengers were picked up
DOLocationID - TLC Taxi Zone where passengers were dropped off
payment type - Code to indicate payment type (e.g., cash)
fare amount - Fare for the trip in USD
extra - Extra charges in USD
mta tax - MTA tax in USD
tip amount - Tip in USD
tolls amount - Tolls in USD
improvement surcharge - Improvement surcharge in USD
total amount - Total cost of the trip in USD
PUdatetime - Date/time passengers were picked up
DOdatetime - Date/time passengers were dropped off
trip duration - Duration of the trip in minutes

This data comes from a publicly-available dataset released by the New York City Taxi & Limousine
Commission. More information about the dataset can be found at:
https://www1.nyc.gov/site/tlc/about/tlc-trip-record-data.page

The field <PUdatetime> is a string representing the pickup date and time of the taxi trip in
the format YYYY-MM-DD HH:mm:ss (year-month-day hour:minute:second). The other columns can be
treated simply as the associated <data> field. Build a data structure of strings to save the associated
data collected about each taxi trip. No string is longer than 128 characters. Fields are separated by
commas: this is a standard CSV format in which the delimiter is a comma.

The <PUdatetime> field will serve as the dictionary key, so the records will be sorted in temporal
order. Because every datetime uses the same fixed-width, zero-padded format, ordered from the most
significant field (year) to the least significant (second), lexicographic order coincides with
chronological order. The keys can therefore be compared as strings (e.g., with strcmp()) to determine
which trip is earlier or later. The <data> is the information sought during lookup.
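A minimal sketch of the string-ordering property described above: sorting datetime strings with qsort() and a strcmp()-based comparator puts them in earliest-first order. The function names are hypothetical and not part of the specification.

```c
#include <stdlib.h>
#include <string.h>

/* qsort() comparator for an array of datetime strings in the fixed-width
 * "YYYY-MM-DD HH:mm:ss" format. Each element is a const char *, so qsort()
 * passes pointers to those pointers. */
static int compare_datetime_ptrs(const void *a, const void *b) {
    const char *const *left = a;
    const char *const *right = b;
    return strcmp(*left, *right);   /* string order == temporal order */
}

/* Sorts n datetime strings in place, earliest first. */
void sort_datetimes(const char **datetimes, size_t n) {
    qsort(datetimes, n, sizeof *datetimes, compare_datetime_ptrs);
}
```

The property depends on zero padding: a value such as 2018-12-9 would compare as later than 2018-12-10, because strcmp() compares '9' with '1'.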

In this assignment the search keys are not guaranteed to be unique: there are instances where multiple
taxis pick up passengers at exactly the same date and time. You should handle duplicates by
implementing a linked list for items with the same key.
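One possible layout, sketched under assumptions: one tree node per distinct key, with the records that share that key chained in a singly linked list (kept in insertion order through a tail pointer). A dict_t handle wraps the root so that several dictionaries can coexist, in the spirit of the Implementation Requirements. All type and function names (record_t, node_t, dict_t, make_dict, dict_insert, free_dict) are hypothetical.

```c
#include <stdlib.h>
#include <string.h>

/* One record; records that share a key are chained through next. */
typedef struct record {
    char *data;                /* the non-key fields of the CSV line */
    struct record *next;
} record_t;

/* One tree node per distinct key (PUdatetime). */
typedef struct node {
    char *key;
    record_t *head, *tail;     /* duplicates, kept in insertion order */
    struct node *left, *right;
} node_t;

/* A dictionary handle, so a program can hold several independent trees. */
typedef struct dict {
    node_t *root;
} dict_t;

static char *copy_string(const char *s) {
    char *copy = malloc(strlen(s) + 1);    /* +1 for the terminating '\0' */
    if (copy != NULL) {
        strcpy(copy, s);
    }
    return copy;
}

dict_t *make_dict(void) {
    dict_t *dict = malloc(sizeof *dict);
    if (dict != NULL) {
        dict->root = NULL;
    }
    return dict;
}

/* Inserts (key, data). Returns 0 on success, -1 if memory runs out.
 * Iterative, so sorted input (which turns the tree into a chain) cannot
 * overflow the stack during insertion. */
int dict_insert(dict_t *dict, const char *key, const char *data) {
    record_t *rec = malloc(sizeof *rec);
    if (rec == NULL) {
        return -1;
    }
    rec->next = NULL;
    rec->data = copy_string(data);
    if (rec->data == NULL) {
        free(rec);
        return -1;
    }

    node_t **link = &dict->root;
    while (*link != NULL) {
        int cmp = strcmp(key, (*link)->key);
        if (cmp == 0) {                    /* duplicate key: append to list */
            (*link)->tail->next = rec;
            (*link)->tail = rec;
            return 0;
        }
        link = (cmp < 0) ? &(*link)->left : &(*link)->right;
    }

    node_t *node = malloc(sizeof *node);
    char *key_copy = copy_string(key);
    if (node == NULL || key_copy == NULL) {
        free(node);
        free(key_copy);
        free(rec->data);
        free(rec);
        return -1;
    }
    node->key = key_copy;
    node->head = node->tail = rec;
    node->left = node->right = NULL;
    *link = node;                          /* attach the new leaf */
    return 0;
}

/* Frees a subtree and all of its records; recursive for brevity. */
static void free_tree(node_t *node) {
    if (node == NULL) {
        return;
    }
    free_tree(node->left);
    free_tree(node->right);
    record_t *rec = node->head;
    while (rec != NULL) {
        record_t *next = rec->next;
        free(rec->data);
        free(rec);
        rec = next;
    }
    free(node->key);
    free(node);
}

void free_dict(dict_t *dict) {
    if (dict != NULL) {
        free_tree(dict->root);
        free(dict);
    }
}
```

free_tree() recurses once per tree level; for a data file sorted by PUdatetime the tree is a chain as deep as the number of distinct keys, so an iterative version may be safer on large inputs.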

For the purposes of this assignment, you may assume that the input data is well-formatted, that the
input file is not empty, and that the maximum length of an input record (a single full line of the csv
file) is 256 characters. This number could help you to determine the buffer size to use when reading
the file.
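A sketch of reading and splitting one record. It assumes the columns appear in the order listed above (so PUdatetime is field 15, counting from 0) and that no field contains an embedded comma; both the column order and the names below are illustrative assumptions, and if a data file begins with a header row it would need to be skipped. The buffer allows two bytes beyond the 256-character record limit, for the newline and the terminating '\0'.

```c
#include <stdio.h>
#include <string.h>

#define MAX_LINE_LEN 256   /* longest record allowed by the specification */
#define NUM_FIELDS 18      /* columns listed above, in that order (assumed) */
#define KEY_FIELD 15       /* index of PUdatetime under that assumption */

/* Splits line in place at each comma and stores a pointer to the start of
 * each field in fields[]. Everything from the first '\r' or '\n' onwards is
 * removed first. Unlike strtok(), empty fields are kept. Returns the number
 * of fields stored (at most max_fields). Assumes no field contains a comma. */
int split_csv_line(char *line, char *fields[], int max_fields) {
    line[strcspn(line, "\r\n")] = '\0';
    int count = 0;
    char *start = line;
    while (count < max_fields) {
        fields[count++] = start;
        char *comma = strchr(start, ',');
        if (comma == NULL) {
            break;
        }
        *comma = '\0';             /* terminate this field */
        start = comma + 1;
    }
    return count;
}

/* Reads the next line of fp into line (size bytes, ideally at least
 * MAX_LINE_LEN + 2) and splits it. Returns the field count, or 0 at EOF. */
int read_csv_record(FILE *fp, char *line, int size, char *fields[],
                    int max_fields) {
    if (fgets(line, size, fp) == NULL) {
        return 0;
    }
    return split_csv_line(line, fields, max_fields);
}
```

The same newline stripping applies to keys read from stdin with fgets(). Because a key contains a space, scanf("%s", ...) would stop halfway through it, so reading whole lines is the safer choice.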

In this first stage of the assignment, you will:

• Construct a binary search tree to store the information contained in the file specified in the
command line argument. Each record should be stored in a separate node.

• Search the binary search tree for records, based on their keys. The keys are read in from stdin,
i.e. typed at the keyboard (or redirected from a file, as described below).

For testing, it is often convenient to create a file of keys to be searched, one per line, and redirect
the input from this file. Use the UNIX operator < to redirect input from a file.

• Examples of use:

– dict1 datafile outputfile then type in keys; or

– dict1 datafile outputfile < keyfile

• Your program will look up each key and output the information (the data found) to the output
file specified by the second command line parameter. If the key is not found in the tree, you
must output the word NOTFOUND.

The number of key comparisons performed during both successful and unsuccessful lookups
should be written to stdout.

• Remember that the entries in the file do not necessarily have unique keys. Your search must
locate and output all the data found for a matching key.

• Example output:

– output file (information):

2018-12-15 01:49:13 --> VendorID: 1 || passenger count: 1 || trip distance: 1.9
|| RatecodeID: 1.0 || store and fwd flag: 0 || PULocationID: 79 || DOLocationID: 234 ||
payment type: 1 || fare amount: 9.5 || extra: 0.5 || mta tax: 0.5 || tip amount: 2.15
|| tolls amount: 0.0 || improvement surcharge: 0.3 || total amount: 12.95 || DOdatetime:
2018-12-15 02:00:00 || trip duration: 10 ||
2018-12-15 01:49:13 --> VendorID: 1 || passenger count: 1 || trip distance: 0.6
|| RatecodeID: 1.0 || store and fwd flag: 0 || PULocationID: 79 || DOLocationID: 114 ||
payment type: 1 || fare amount: 5.0 || extra: 0.5 || mta tax: 0.5 || tip amount: 1.00
|| tolls amount: 0.0 || improvement surcharge: 0.3 || total amount: 7.35 || DOdatetime:
2018-12-15 01:53:38 || trip duration: 4 ||
1901-11-06 12:03:14 --> NOTFOUND

– stdout (comparisons):

2018-12-15 01:49:13 --> 423

1901-11-06 12:03:14 --> 401

Note that the key is output to both the file and to stdout, for identification purposes. Also note that
the number of comparisons is only output at the end of the search, so there is only one number for
key comparisons per key, even when multiple records have been located for that key.

The format need not be exactly as above. Variations in whitespace/tabs are permitted. The comparison
counts shown above are made up; do not treat them as correct results.
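A sketch of the Stage 1 lookup. It assumes each strcmp() call against a node's key counts as one key comparison; the specification does not define the counting rule, so whichever rule you adopt should be stated in your report. The node layout is a trimmed, hypothetical one in which records with equal keys hang off a single node, so one search prints all of them and returns a single comparison count per key.

```c
#include <stdio.h>
#include <string.h>

typedef struct record {
    char *data;                /* the non-key fields of one trip */
    struct record *next;       /* next record with the same key */
} record_t;

typedef struct node {
    char *key;                 /* PUdatetime */
    record_t *head;            /* all records sharing this key */
    struct node *left, *right;
} node_t;

/* Searches the BST for key, counting one comparison per node visited.
 * Writes every record with that key to out, or NOTFOUND if there is none.
 * Returns the comparison count so the caller can print it to stdout. */
int search_and_print(const node_t *root, const char *key, FILE *out) {
    int comparisons = 0;
    const node_t *cur = root;
    while (cur != NULL) {
        int cmp = strcmp(key, cur->key);
        comparisons++;
        if (cmp == 0) {
            break;
        }
        cur = (cmp < 0) ? cur->left : cur->right;
    }
    if (cur == NULL) {
        fprintf(out, "%s --> NOTFOUND\n", key);
    } else {
        for (const record_t *rec = cur->head; rec != NULL; rec = rec->next) {
            fprintf(out, "%s --> %s\n", key, rec->data);
        }
    }
    return comparisons;
}
```

The caller would then write the key and the returned count to stdout, for example with printf("%s --> %d\n", key, comparisons).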

Stage 2 (2 marks)
In Stage 2, you will code a function which takes a taxi zone ID number as input and writes to the
output file all of the <PUdatetime> keys from records which match the <PULocationID>, using
in-order tree traversal. The keys should be output in sorted temporal order (that is, earlier records
should be printed first). If no records with the requested <PULocationID> exist in the database,
this function should write the string NOTFOUND to the output file. As in Stage 1, the number of
comparisons made during the search should be written to stdout.

The <PULocationID> is an unsigned integer between 1 and 265 which indicates where the taxi
picked up passengers. You can find maps of the zones at the dataset website linked above, but you
do not need these maps for the assignment: you can treat the zone as simply an integer. You may
store the <PULocationID> as a separate field in your struct, or you can check for the matching
<PULocationID> inside the <data> field. As in Stage 1, you should handle duplicate keys by
implementing a linked list for items with the same key. Note that this means there may be more than one
matching <PULocationID> for a single key. If this is the case, the key should be output multiple
times to reflect the number of matches.
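Because the tree is ordered by PUdatetime rather than by location, a Stage 2 search has to examine every record; visiting the tree in order (left subtree, node, right subtree) makes the matching keys come out earliest first. This sketch assumes PULocationID has been parsed into an int field of each record (one of the two options allowed above) and counts one comparison per record examined. The counting rule and all names are assumptions.

```c
#include <stdio.h>

typedef struct record {
    int pu_location_id;        /* parsed from the PULocationID column */
    struct record *next;       /* next record with the same key */
} record_t;

typedef struct node {
    char *key;                 /* PUdatetime */
    record_t *head;            /* all records sharing this key */
    struct node *left, *right;
} node_t;

/* In-order traversal that writes the key once per matching record.
 * Adds one to *comparisons per record examined; returns the match count. */
static int print_matches(const node_t *node, int location_id, FILE *out,
                         long *comparisons) {
    if (node == NULL) {
        return 0;
    }
    int matches = print_matches(node->left, location_id, out, comparisons);
    for (const record_t *rec = node->head; rec != NULL; rec = rec->next) {
        (*comparisons)++;
        if (rec->pu_location_id == location_id) {
            fprintf(out, "%d --> %s\n", location_id, node->key);
            matches++;
        }
    }
    matches += print_matches(node->right, location_id, out, comparisons);
    return matches;
}

/* Writes all keys whose records match location_id, or NOTFOUND.
 * Returns the comparison count for printing to stdout. */
long search_location(const node_t *root, int location_id, FILE *out) {
    long comparisons = 0;
    if (print_matches(root, location_id, out, &comparisons) == 0) {
        fprintf(out, "%d --> NOTFOUND\n", location_id);
    }
    return comparisons;
}
```

Under this counting rule the result equals the total number of records, whatever the tree shape, which is a useful check against the Θ(N) expectation.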

You should compile your code using a Makefile to produce an executable program called dict2.
The program dict2 takes two command line arguments: the first argument is the name of the data
file used to build the dictionary; the second argument is the name of the output file, containing the
data located in the searches. You may reuse your record insertion code from Stage 1 to build the
dictionary from the datafile in Stage 2.

• Examples of use:

– dict2 datafile outputfile then type in location IDs; or

– dict2 datafile outputfile < idsfile

• Example output:

– output file (information):

79 --> 2018-12-08 19:36:57

79 --> 2018-12-08 21:22:08
79 --> 2018-12-15 01:49:13
79 --> 2018-12-15 01:49:13
79 --> 2018-12-23 17:26:42

– stdout (comparisons):

79 --> 1528

The comparison counts shown above are made up; do not treat them as correct results.

Experimentation (4 marks)
You will run various files through your program to test its accuracy and also to examine the number of
key comparisons used when searching different files. You will report on the key comparisons used by
your Stage 1 dictionary dict1 for various data inputs and the key comparisons used by your Stage 2
dictionary dict2 for various data inputs too. You will compare these results with what you expected
based on theory (big-O) for these algorithms and data structures.
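For the comparison with theory, the following standard results for binary search trees (Knuth, The Art of Computer Programming, Vol. 3, Section 6.2.2) may be a useful reference point. They assume n distinct keys inserted in uniformly random order, search keys chosen uniformly at random, and one three-way key comparison counted per node visited. With duplicates stored in per-key lists, n is the number of distinct keys, while N below is the total number of records. These are expectations to compare against, not predictions of your exact counts.

```latex
\begin{align*}
H_n &= \sum_{k=1}^{n} \frac{1}{k} \approx \ln n + 0.5772 \\
C_n &= 2\left(1 + \tfrac{1}{n}\right) H_n - 3 \approx 2\ln n \approx 1.39\log_2 n
  && \text{successful search, random order} \\
C'_n &= 2H_{n+1} - 2 && \text{unsuccessful search, random order} \\
C_n^{\text{sorted}} &= \tfrac{n+1}{2}, \qquad C'^{\,\text{sorted}}_{\max} = n
  && \text{keys inserted in sorted order (a chain)} \\
T_{\text{Stage 2}} &= N = \Theta(N)
  && \text{one check per record, any tree shape}
\end{align*}
```

A data file sorted by PUdatetime is therefore the worst case for dict1, a shuffled file should approach the logarithmic curve, and dict2 should grow linearly in either case.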

Your experimentation should be systematic, varying the size and characteristics of the dataset files you
use (e.g. sorted or random), and observing how the number of key comparisons varies. Repeating a
test case with different keys and taking the average can be useful.

Some useful UNIX commands for creating test files with different characteristics include sort, sort
-R (man sort for more information on the -R option), and shuf. You can randomize your input
data and pick the first x keys as the lookup keywords.

If you use only keyboard input for searches, it is unlikely that you will be able to generate enough
data to analyze your results. You should familiarize yourself with the powerful UNIX facilities for
redirecting standard input (stdin) and standard output (stdout). You might also find it useful to
familiarize yourself with UNIX pipes '|' and possibly also the UNIX program awk for processing
structured output. For example, echo "abc:def" | awk -F ':' '{print $1}' outputs only the first
column (abc); the -F option specifies the delimiter. Instead of echo, you can pipe a file into awk:
cat filename.csv | awk -F ',' '{print $1}' prints only the first column of filename.csv. You can
build up a file of key comparison counts using the shell append operator >>, e.g.
command >> file appends the output of command to file.

You will write up your findings and submit your results separately through the Turnitin system. You will
describe your results from each stage and also compare these to what you know about the theory of
binary search trees.

Tables and graphs are useful presentation methods. Select only informative data; more is not always
better.

You should present your findings clearly, in light of what you know about the data structures used in
your programs and in light of their known computational complexity. You may find that your results
are what you expected, based on theory. Alternatively, you may find your results do not agree with
theory. In either case, you should state what you expected from the theory, and if there is a discrepancy
you should suggest possible reasons. You might want to discuss space-time trade-offs, if this is
appropriate to your code and data.

You are not constrained to any particular structure in this report, but a useful way to present your
findings might be:

• Introduction: Summary of data structures and inputs.

• Stage 1:

– Data (number of key comparisons)

– Comparison with theory

• Stage 2:

– Data (number of key comparisons)

– Comparison with theory

• Discussion

Implementation Requirements
The following implementation requirements must be adhered to:

• You must code your dictionary in the C programming language.

• You must code your dictionary in a modular way, so that your dictionary implementation could be
used in another program without extensive rewriting or copying. This means that the dictionary
operations are kept together in a separate .c file, with its own header (.h) file, separate from the
main program.

• Your code should be easily extensible to allow for multiple dictionaries. This means that the functions
for insertion, search, and deletion take as arguments not only the item being inserted or a
key for searching and deleting, but also a pointer to a particular dictionary, e.g. insert(dict,
item).

• Your program should store strings in a space-efficient manner. If you are using malloc() to
create the space for a string, remember to allow space for the terminating null character '\0' (NUL).

• A Makefile is not provided for you. The Makefile should direct the compilation of two
separate programs: dict1 and dict2. To use the Makefile, make sure it is in the same directory
as your code, and type make dict1 to make the dictionary for Stage 1 and make dict2 to
make the dictionary for Stage 2. You must submit your Makefile with your assignment. Hint: If
you haven't used make before, try it on simple programs first. If it doesn't work, read the error
messages carefully. A common problem in compiling multifile executables is in the included
header files. Note also that the whitespace at the start of each command line in a rule must be a
tab, not multiple spaces. It is not a good idea to code your program as a single file and then try
to break it down into multiple files. Start by using multiple files, with minimal content, and make
sure they are communicating with each other before starting more serious coding.
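A minimal sketch of the space-efficient string requirement above: read each line into a fixed-size buffer, then store each string in a heap block of exactly strlen + 1 bytes, the extra byte holding the terminating '\0'. copy_string is a hypothetical helper name; POSIX strdup() does the same job but is not part of C99 (it was only added to ISO C in C23).

```c
#include <stdlib.h>
#include <string.h>

/* Returns a heap copy of s occupying exactly strlen(s) + 1 bytes.
 * Returns NULL if malloc() fails, so the caller can report the error. */
char *copy_string(const char *s) {
    size_t len = strlen(s);
    char *copy = malloc(len + 1);     /* +1 for the terminating '\0' */
    if (copy == NULL) {
        return NULL;
    }
    memcpy(copy, s, len + 1);         /* copies the '\0' as well */
    return copy;
}
```

Compared with storing every field in a fixed char[128] array, this uses only as much memory as each string needs, at the cost of one malloc() and one free() per string.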

Data
The data files are provided at /home/shared/assg1/datafiles/ on JupyterHub. The data format is
as specified above in Stage 1.

No attempt has been made to remove or prevent duplicate keys in the original files, so you should
expect duplicate keys. Our script only reformatted the data so that it complies with the standard CSV
specification.

Resources: Programming Style (2 Marks)

Two locally-written papers containing useful guidelines on coding style and structure can be found
on the LMS: Resources → Project Coding Guidelines, by Peter Schachte, and Resources → C Programming
Style, written for Engineering Computation COMP20005 by Aidan Nagorcka-Smith. An adapted version of
the latter is reproduced below. Be aware that 2 marks are awarded for programming style.

/** ***********************
 * C Programming Style for Engineering Computation
 * Created by Aidan Nagorcka-Smith (aidann@student.unimelb.edu.au) 13/03/2011
 * Definitions and includes
 * Definitions are in UPPER_CASE
 * Includes go before definitions
 * Space between includes, definitions and the main function.
 * Use definitions for any constants in your program, do not just write them
 * in.
 *
 * Tabs may be set to 4-spaces or 8-spaces, depending on your editor. The code
 * below is "gnu" style. If your editor has "bsd" it will follow the 8-space
 * style. Both are very standard.
 */

/**
 * GOOD:
 */

#include <stdio.h>
#include <stdlib.h>

#define MAX_STRING_SIZE 1000
#define DEBUG 0

int main(int argc, char **argv) {
    ...

/**
 * BAD:
 */

/* Definitions and includes are mixed up */
#include <stdlib.h>
#define MAX_STRING_SIZE 1000
/* Definitions are given names like variables */
#define debug 0
#include <stdio.h>
/* No spacing between includes, definitions and main function */
int main(int argc, char **argv) {
    ...

/** *****************************
 * Variables
 * Give them useful lower_case names or camelCase. Either is fine,
 * as long as you are consistent and always apply the same style.
 * Initialise them to something that makes sense.
 */

/**
 * GOOD: lower_case
 */

int main(int argc, char **argv) {

    int i = 0;
    int num_fifties = 0;
    int num_twenties = 0;
    int num_tens = 0;

    ...

/**
 * GOOD: camelCase
 */

int main(int argc, char **argv) {

    int i = 0;
    int numFifties = 0;
    int numTwenties = 0;
    int numTens = 0;

    ...

/**
 * BAD:
 */

int main(int argc, char **argv) {

    /* Variable not initialised - causes a bug because we didn't remember to
     * set it before the loop */
    int i;
    /* Variable in all caps - we'll get confused between this and constants
     */
    int NUM_FIFTIES = 0;
    /* Overly abbreviated variable names make things hard. */
    int nt = 0;

    while (i < 10) {
        ...
        i++;
    }

    ...

/** ********************
 * Spacing:
 * Space intelligently, vertically to group blocks of code that are doing a
 * specific operation, or to separate variable declarations from other code.
 * One tab of indentation within either a function or a loop.
 * Spaces after commas.
 * Space between ) and {.
 * No space between the ** and the argv in the definition of the main
 * function.
 * When declaring a pointer variable or argument, you may place the asterisk
 * adjacent to either the type or to the variable name.
 * Lines at most 80 characters long.
 * Closing brace goes on its own line
 */

/**
 * GOOD:
 */

int main(int argc, char **argv) {

    int i = 0;

    for (i = 100; i >= 0; i--) {
        if (i > 0) {
            printf("%d bottles of beer, take one down and pass it around,"
                   " %d bottles of beer.\n", i, i - 1);
        } else {
            printf("%d bottles of beer, take one down and pass it around."
                   " We're empty.\n", i);
        }
    }

    return 0;
}

/**
 * BAD:
 */

/* No space after commas
 * Space between the ** and argv in the main function definition
 * No space between the ) and { at the start of a function */
int main(int argc,char ** argv){
int i = 0;
/* No space between variable declarations and the rest of the function.
 * No spaces around the boolean operators */
for(i=100;i>=0;i--) {
/* No indentation */
if (i > 0) {
/* Line too long */
printf("%d bottles of beer, take one down and pass it around, %d bottles of beer.\n", i, i - 1);
} else {
/* Spacing for no good reason. */

printf("%d bottles of beer, take one down and pass it around."
" We're empty.\n", i);

}
}
/* Closing brace not on its own line */
return 0;}

/** ****************
 * Braces:
 * Opening braces go on the same line as the loop or function name
 * Closing braces go on their own line
 * Closing braces go at the same indentation level as the thing they are
 * closing
 */

/**
 * GOOD:
 */

int main(int argc, char **argv) {

    ...

    for (...) {
        ...
    }

    return 0;
}

/**
 * BAD:
 */

int main(int argc, char **argv) {

    ...

    /* Opening brace on a different line to the for loop open */
    for (...)
    {
        ...
        /* Closing brace at a different indentation to the thing it's
         * closing
         */
            }

    /* Closing brace not on its own line. */
    return 0;}

/** **************
 * Commenting:
 * Each program should have a comment explaining what it does and who created
 * it.
 * Also comment how to run the program, including optional command line
 * parameters.
 * Any interesting code should have a comment to explain itself.
 * We should not comment obvious things - write code that documents itself
 */

/**
 * GOOD:
 */

/* change.c
 *
 * Created by Aidan Nagorcka-Smith (aidann@student.unimelb.edu.au)
 * 13/03/2011
 *
 * Print the number of each coin that would be needed to make up some change
 * that is input by the user
 *
 * To run the program type:
 * ./coins --num_coins 5 --shape_coins trapezoid --output blabla.txt
 *
 * To see all the input parameters, type:
 * ./coins --help
 * Options::
 *   --help                 Show help message
 *   --num_coins arg        Input number of coins
 *   --shape_coins arg      Input coins shape
 *   --bound arg (=1)       Max bound on xxx, default value 1
 *   --output arg           Output solution file
 *
 */

int main(int argc, char **argv) {

    int input_change = 0;

    printf("Please input the value of the change (0-99 cents inclusive):\n");
    scanf("%d", &input_change);
    printf("\n");

    // Valid change values are 0-99 inclusive.
    if (input_change < 0 || input_change > 99) {
        printf("Input not in the range 0-99.\n");
    }

    ...

/**
 * BAD:
 */

/* No explanation of what the program is doing */
int main(int argc, char **argv) {

    /* Commenting obvious things */
    /* Create an int variable called input_change to store the input from the
     * user. */
    int input_change;

    ...

/** ****************
 * Code structure:
 * Fail fast - input checks should happen first, then do the computation.
 * Structure the code so that all error handling happens in an easy to read
 * location
 */

/**
 * GOOD:
 */
if (input_is_bad) {
    fprintf(stderr, "Error: Input was not valid. Exiting.\n");
    exit(EXIT_FAILURE);
}

/* Do computations here */
...

/**
 * BAD:
 */

if (input_is_good) {
    /* lots of computation here, pushing the else part off the screen.
     */
    ...
} else {
    fprintf(stderr, "Error: Input was not valid. Exiting.\n");
    exit(EXIT_FAILURE);
}

Additional Support
Your tutors will be available to help with your assignment during the scheduled workshop times. Questions
related to the assignment may be posted on the Piazza forum, using the folder tag assignment1
for new posts. You should feel free to answer other students’ questions if you are confident of your
skills.

A tutor will check the Discussion Forum regularly, and answer some questions, but be aware that for
some questions you will just need to use your judgment and document your thinking. For example, a
question like "How much data should I use for the experiments?" will not be answered; you must try
out different data and see what makes sense.

In this subject, we’ll be supporting the shared JupyterHub system, its terminal and file editor. Your
final program must compile and run on the shared JupyterHub instance.

Submission
You will need to make two submissions for this assignment:

• Your C code files (including your Makefile) will be submitted through the LMS page for this
subject: Assignments → Assignment 1 → Assignment 1: Code.

• Your experiments report file will be submitted through the LMS page for this subject: Assignments
→ Assignment 1 → Assignment 1: Experimentation. This file can be of any format, e.g. .pdf, text
or other.

Program files submitted (Code)

Submit the program files for your assignment and your Makefile.

If you wish to submit any scripts or code used to generate input data, you may, although this is not
required. Just be sure to submit all your files at the same time.

Your programs must compile and run correctly on the shared JupyterHub instance. You may have developed
your program in another environment, but it still must run on the JupyterHub at submission
time. For this reason, and because there are often small, but significant, differences between compilers,
it is suggested that if you are working in a different environment, you upload and test your code
on the shared JupyterHub instance at reasonably frequent intervals.

A common reason for programs not to compile is that a file has been inadvertently omitted from the
submission. Please check your submission, and resubmit all files if necessary.

Experiment file submitted using Turnitin

As noted above, your experimental work will be submitted through the LMS, via the Turnitin system.
Go to the LMS page for this subject: Assignments → Assignment 1 → Assignment 1: Experimentation
and follow the prompts.

Your file can be in any format. Plain text or .pdf are recommended, but other formats will be accepted.
It is expected that your experimental work will be in a single file, but multiple files can be
accepted. Add your username to the top of your experiments file.

Please do not submit large data files. There is no need to query every key in the dictionary.

Assessment
There are a total of 15 marks given for this assignment, 7 marks for Stage 1, 2 marks for Stage 2, and
4 marks for the separately submitted Experimentation Stage. 2 marks will be given based on your
C programming style.

Your C program will be marked on the basis of accuracy, readability, and good C programming structure,
safety and style, including documentation. Safety refers to checking, for example, whether fopen()
succeeds in opening a file and whether malloc() calls succeed. The documentation should explain all major
design decisions, and should be formatted so that it does not interfere with reading the code. As much
as possible, try to make your code self-documenting, by choosing descriptive variable names.
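One way to apply these safety checks is to wrap fopen() and malloc() so that a failure is reported on stderr and the program exits, instead of continuing with a NULL pointer. The wrapper names are hypothetical. Exiting is a design choice that suits a command-line tool such as dict1; a reusable dictionary module might instead return an error code to its caller.

```c
#include <stdio.h>
#include <stdlib.h>

/* Opens path with the given mode; on failure prints the reason to stderr
 * and exits, so callers never receive a NULL FILE pointer. */
FILE *open_file_or_exit(const char *path, const char *mode) {
    FILE *fp = fopen(path, mode);
    if (fp == NULL) {
        perror(path);              /* e.g. "data.csv: No such file..." */
        exit(EXIT_FAILURE);
    }
    return fp;
}

/* Allocates size bytes; on failure prints an error and exits. */
void *malloc_or_exit(size_t size) {
    void *ptr = malloc(size);
    if (ptr == NULL) {
        fprintf(stderr, "Error: out of memory (requested %zu bytes)\n", size);
        exit(EXIT_FAILURE);
    }
    return ptr;
}
```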

Your experimentation will be marked on the basis of orderliness and thoroughness of experimentation,
comparison of your results with theory, and thoughtful discussion.

Plagiarism
This is an individual assignment. The work must be your own.

While you may discuss your program development, coding problems and experimentation with your
classmates, you must not share files, as this is considered plagiarism.

If you refer to published work in the discussion of your experiments, be sure to include a citation to
the publication or the web link.

Borrowing someone else's code without acknowledgment is plagiarism. Plagiarism is considered
a serious offense at the University of Melbourne. You should read the University code on Academic
honesty and details on plagiarism. Make sure you are not plagiarizing, intentionally or unintentionally.

You are also advised that there will be a C programming component (on paper, not on a computer) in
the final examination. Students who do not program their own assignments will be at a disadvantage
for this part of the examination.

Administrative issues
When is late? What do I do if I am late? The due date and time are printed on the front of this
document. The lateness policy is on the handout provided at the first lecture and also available on the
subject LMS page. If you decide to make a late submission, you should send an email directly to the
lecturer as soon as possible and he will provide instructions for making a late submission.

What are the marks and the marking criteria? Recall that this project is worth 15% of your final
score. There is also a hurdle requirement: you must earn at least 15 marks out of a subtotal of 30 for
the projects to pass this subject.

Finally Despite all these stern words, we are here to help! There is information about getting help in
this subject on the LMS pages. Frequently asked questions about the project will be answered in the
LMS discussion group.

HCI Computing 2018 Promo Solution Guide
No ratings yet
HCI Computing 2018 Promo Solution Guide
7 pages
12 Comp
No ratings yet
12 Comp
7 pages
Set 1 Comp. Sci Ans. Key
No ratings yet
Set 1 Comp. Sci Ans. Key
6 pages
SAMPLE QUESTION PAPER 2 (Solved)
No ratings yet
SAMPLE QUESTION PAPER 2 (Solved)
8 pages
Max Marks:50: Computer Science
No ratings yet
Max Marks:50: Computer Science
8 pages
DATA FILE HANDLING REVISION CLASS 12 CHENNAI PUBLIC SCHOOLS
No ratings yet
DATA FILE HANDLING REVISION CLASS 12 CHENNAI PUBLIC SCHOOLS
4 pages
12 CS 2023-24 QP PB ISM Set123
No ratings yet
12 CS 2023-24 QP PB ISM Set123
30 pages
MS-Computer Science-12-Common Exam
No ratings yet
MS-Computer Science-12-Common Exam
9 pages
Grade 12 Cs- Pre Board 3 Ans
No ratings yet
Grade 12 Cs- Pre Board 3 Ans
29 pages
KV CS Pre Board 1 2022-23 Set 2 MS
No ratings yet
KV CS Pre Board 1 2022-23 Set 2 MS
5 pages
CSV Flie Question Bank Solutions
No ratings yet
CSV Flie Question Bank Solutions
20 pages
MS Comp. Science Class 12
No ratings yet
MS Comp. Science Class 12
5 pages
SET I MS
No ratings yet
SET I MS
10 pages
R-raj
No ratings yet
R-raj
9 pages
SAMPLE PAPER-IX Class XII (Computer Science) SEE PDF
No ratings yet
SAMPLE PAPER-IX Class XII (Computer Science) SEE PDF
5 pages
UT-2_SET-B-AK
No ratings yet
UT-2_SET-B-AK
7 pages
XII CS MODEL AK
No ratings yet
XII CS MODEL AK
10 pages
SQP 14 - QP
No ratings yet
SQP 14 - QP
14 pages
ITNPBD2 Assignment 2015
No ratings yet
ITNPBD2 Assignment 2015
5 pages
MS - 12CS - PB-I - 23-24 Set 2
No ratings yet
MS - 12CS - PB-I - 23-24 Set 2
6 pages
Class12 Mock Test-1 2024 Solution
No ratings yet
Class12 Mock Test-1 2024 Solution
8 pages
Archivo
No ratings yet
Archivo
6 pages
HYDERABAD_MS 2
No ratings yet
HYDERABAD_MS 2
6 pages
Solved QBank_CSV Files
No ratings yet
Solved QBank_CSV Files
10 pages
CS29206 Systems Programming Laboratory, Spring 2022-2023
No ratings yet
CS29206 Systems Programming Laboratory, Spring 2022-2023
4 pages
CD Lab Manual PDF
No ratings yet
CD Lab Manual PDF
71 pages
UP Assignment
No ratings yet
UP Assignment
13 pages
CSV FILES WORKSHEET
No ratings yet
CSV FILES WORKSHEET
4 pages
Comp Sample
No ratings yet
Comp Sample
13 pages
7.Computer-Science-PRE-BOARD-MS
No ratings yet
7.Computer-Science-PRE-BOARD-MS
9 pages
CISA Exam-Testing Concept-PERT/CPM/Gantt Chart/FPA/EVA/Timebox (Chapter-3)
From Everand
CISA Exam-Testing Concept-PERT/CPM/Gantt Chart/FPA/EVA/Timebox (Chapter-3)
Hemang Doshi
1.5/5 (3)
class12_boardPracQP[1]
No ratings yet
class12_boardPracQP[1]
12 pages
12 CS Record Obs Material 2024 1714638166
No ratings yet
12 CS Record Obs Material 2024 1714638166
38 pages
Ms Pb-1 Mumbai Xii Cs 2024-25
No ratings yet
Ms Pb-1 Mumbai Xii Cs 2024-25
10 pages
CLASS_12_COMPUTER_SCI_HOLIDAY_HOMEWORK_22052024_110241000000009
No ratings yet
CLASS_12_COMPUTER_SCI_HOLIDAY_HOMEWORK_22052024_110241000000009
5 pages
Create A File With Hole in It
No ratings yet
Create A File With Hole in It
63 pages
L12 FileInputOutput
No ratings yet
L12 FileInputOutput
18 pages
1.ab Initio - Unix - DB - Concepts & Questions - !
No ratings yet
1.ab Initio - Unix - DB - Concepts & Questions - !
35 pages
HTTP WWW Computerhope Com Unix Overview HTM
No ratings yet
HTTP WWW Computerhope Com Unix Overview HTM
24 pages
Awk Cheat Sheet
No ratings yet
Awk Cheat Sheet
4 pages
Unix Basics and TCL Scripting
100% (1)
Unix Basics and TCL Scripting
49 pages
19 Syllabus MCASyllabus
No ratings yet
19 Syllabus MCASyllabus
86 pages
OS Lab Manual
No ratings yet
OS Lab Manual
21 pages
Awk, Sed
No ratings yet
Awk, Sed
15 pages
Unix Beyond Basics
No ratings yet
Unix Beyond Basics
20 pages
Lab Manual OS
No ratings yet
Lab Manual OS
32 pages
Unit Part II
No ratings yet
Unit Part II
49 pages
Mastering Unix Shell Scripting Bash Bourne and Korn Shell Scripting for Programmers System Administrators and UNIX Gurus Second Edition Randal K. Michael all chapter instant download
100% (6)
Mastering Unix Shell Scripting Bash Bourne and Korn Shell Scripting for Programmers System Administrators and UNIX Gurus Second Edition Randal K. Michael all chapter instant download
51 pages
Javamcq
No ratings yet
Javamcq
54 pages
Awk Introduction Tutorial - 7 Awk Print Examples
No ratings yet
Awk Introduction Tutorial - 7 Awk Print Examples
10 pages
0-13-933821-7 Unix System V Rel4 Migration Guide 1990
No ratings yet
0-13-933821-7 Unix System V Rel4 Migration Guide 1990
96 pages
Awk Script Test 101 PDF
No ratings yet
Awk Script Test 101 PDF
3 pages
Command Line UEH - Exception Analysis
No ratings yet
Command Line UEH - Exception Analysis
16 pages
Linux Admin
No ratings yet
Linux Admin
29 pages
Awk Code
No ratings yet
Awk Code
55 pages
Unix and Linux Programming File
No ratings yet
Unix and Linux Programming File
20 pages
Assignment 8 AWK
No ratings yet
Assignment 8 AWK
4 pages
1 Semester QB 2K19
No ratings yet
1 Semester QB 2K19
12 pages
Overview of UNIX: References
0% (1)
Overview of UNIX: References
172 pages
Bash Cheatsheet
100% (2)
Bash Cheatsheet
4 pages
AWK Command in Unix
No ratings yet
AWK Command in Unix
6 pages
Structural Regular Expressions: Rob Pike
No ratings yet
Structural Regular Expressions: Rob Pike
7 pages
Unix Commands
No ratings yet
Unix Commands
15 pages
CMD 1
No ratings yet
CMD 1
7 pages
file_exec
No ratings yet
file_exec
3 pages
The AWK Programming Language Addison Wesley Professional Computing Series 2nd Edition Alfred Aho pdf download
100% (1)
The AWK Programming Language Addison Wesley Professional Computing Series 2nd Edition Alfred Aho pdf download
41 pages

Assignment 1

Uploaded by

Assignment 1

Uploaded by

COMP20003 Algorithms and Data Structures

Second (Spring) Semester 2019

• Increase your proficiency in using UNIX utilities.

You are not required to implement the delete functionality.

VendorID - Code to indicate the vendor which produced the record

In this first stage of the assignment, you will:

– dict1 datafile outputfile then type in keys; or

– output file (information):

2018-12-15 01:49:13 --> VendorID: 1 || passenger count: 1 || trip distance: 1.9

2018-12-15 01:49:13 --> 423

– dict2 datafile outputfile then type in location IDs; or

– output file (information):

79 --> 2018-12-08 19:36:57

• Introduction: Summary of data structures and inputs.

– Data (number of key comparisons)

– Data (number of key comparisons)

• You must code your dictionary in the C programming language.

Resources: Programming Style (2 Marks)

Program files submitted (Code)

Experiment file submitted using Turnitin

Borrowing of someone elses code without acknowledgment is plagiarism. Plagiarism is considered

You might also like