Endterm
Final Exam
1. (4 points) What assumption(s) does MCL make about clusters in a graph, in addition to the
property that nodes within a cluster have a large number of paths between them and low
connectivity to nodes in other clusters?
2. (5 points) Which of the following clustering algorithms can be used to cluster graphs in a graph
database with a metric distance function for graphs? [1 point per correct option, -1 per incorrect
option. 5 points (i.e., a 1-point bonus) if you select all correct options and no incorrect options]
a. K-means
b. DBSCAN
c. K-medoid
d. Single-linkage
3. (5 points) It is clear that parameters are easier to set in OPTICS than in DBSCAN. But if parameter
selection is not a problem (let’s say some oracle tells us the best parameters), would you still say
OPTICS is better? Explain.
4. (8 points) What is the time complexity of the fastest possible algorithm for single-linkage
hierarchical clustering? Write the algorithm and the complexity analysis.
5. (6 points) Suppose you have three distance functions, d1, d2, d3, to rank webpages for a given query
keyword. To identify which is the best distance function, you conducted a survey across 1000
people, where each person searched for a web query and was shown the top-ranked page from
each of the three distance functions. The users were asked to choose the result that they liked the
most. You found that 500 people voted for d1, while d2 received 300 votes and d3 received
200 votes. How can you infer whether this distribution of votes is purely due to chance or whether
there is a definite preference towards d1? Explain precisely and formally.
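For illustration only, and not necessarily the intended answer, here is a minimal Python sketch of one possible formal test: a chi-square goodness-of-fit test of the observed votes against the "purely due to chance" null of equal preference (SciPy is assumed to be available).

# Hedged illustration: chi-square goodness-of-fit test of the vote counts
# against the uniform "pure chance" null hypothesis.
from scipy.stats import chisquare

observed = [500, 300, 200]        # votes received by d1, d2, d3
expected = [1000 / 3] * 3         # equal preference under the chance-only null

stat, p_value = chisquare(f_obs=observed, f_exp=expected)
print(stat, p_value)              # a small p-value argues against "purely due to chance"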
6. (5 points) Let G be an edge-weighted (only positive edge weights) undirected graph. Let the
distance d(u,v) between two nodes in the graph be the length of the shortest path from u to v.
The length of a path is the sum of its constituent edge weights. Prove that d(u,v) satisfies the
triangle inequality.
7. (5+2=7 points) Derive the query time complexity of a range query in a d-dimensional KD-tree.
Write down the recursion you will have for the maximum number of intersections with the query
region, in terms of both n and d, and the final complexity. You must provide the detailed
derivation in addition to writing down the expressions; no points will be awarded for the
expressions alone.
8. (10=6+2+2 points) Suppose you have a database of 10 × 10⁶ text documents, where each
document is a d-dimensional bit vector. The similarity between two documents is the Jaccard
similarity between them. The Jaccard distance can analogously be defined as (1 - Jaccard similarity).
Given a query document, you want to use LSH to identify its 1-NN. You are not allowed to convert
the dataset into Hamming space or perform any other space transformation. It is given to you that
the 1-NN always resides within a Jaccard similarity of 0.8. In other words, the 1-NN has a similarity
of 0.8 or more with any query. You are allowed to absorb an approximation error of 𝜖 = 1 in the
LSH. Answer the following questions with respect to this problem.
a. Propose a locality sensitive hash function with parameters (r1, r2, p1, p2) as defined in the
slides. Specifically, i) mention your hash code generation policy, and ii) the values of r1, r2,
p1, p2. These guarantees must hold on the original Jaccard Similarity (or distance) itself
and not on Hamming distance or some other converted space. Note that r1 and r2 are
distance radii. So, convert Jaccard similarity to distance accordingly.
b. What should be the value of H, i.e., the number of hash codes per table?
c. What should be the value of L, i.e., the number of hash tables?
[Note: You can leave the answers to parts a, b, and c above at an expression level; you don't need to
solve them. An illustrative MinHash sketch follows this question.]
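For concreteness, here is a minimal MinHash-style Python sketch of one possible hash-code generation policy for part (a); the permutation-based implementation, the helper name minhash_codes, and the toy document are illustrative assumptions rather than the prescribed solution. The relevant property is that two documents collide on a single such code with probability equal to their Jaccard similarity, which is what ties p1 and p2 to the radii r1 and r2.

import random

def minhash_codes(doc_bits, d, H, L, seed=0):
    """Return L signatures, each a tuple of H MinHash codes, for one document."""
    ones = [i for i, bit in enumerate(doc_bits) if bit == 1]
    rng = random.Random(seed)
    signatures = []
    for _ in range(L):                       # one signature (bucket key) per hash table
        codes = []
        for _ in range(H):                   # H concatenated MinHash codes per table
            perm = list(range(d))
            rng.shuffle(perm)                # a random permutation of the d dimensions
            codes.append(min(perm[i] for i in ones))   # MinHash value of this document
        signatures.append(tuple(codes))
    return signatures

# Toy usage: an 8-dimensional bit-vector document; H and L stay symbolic in the answers
print(minhash_codes([1, 0, 1, 1, 0, 0, 1, 0], d=8, H=2, L=3))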
Extra Credit Questions. The marks you obtain in this section will be added to your Homework
component. [20 points]
9. (10 points) True/False questions [2 points per correct answer, -2 per incorrect answer]
a. The event of finding 20 heads and 30 tails out of 50 coin tosses has a p-value below 0.05.
b. With an increase in the inflation parameter, MCL would identify a smaller number of clusters.
c. MBRs in an R-tree may overlap in space but not in actual data points.
d. The Space-Saving algorithm is likely to work better for a uniform frequency distribution than
for a power-law distribution.
e. Complete linkage clustering tries to minimize the diameter (farthest distance between any
pair of points) of clusters.
10. (10 points) Is the dynamic time warping (DTW) distance function a metric? Prove or disprove. DTW
between two time series sequences T1 and T2 is defined as follows:

DTW(T1, T2) =
    0,                                       if T1 and T2 are both empty
    ∞,                                       if exactly one of T1 and T2 is empty
    dist(T1.s1, T2.s1) + min{ DTW(Rest(T1), Rest(T2)),
                              DTW(Rest(T1), T2),
                              DTW(T1, Rest(T2)) },   otherwise

A time series sequence T = [s1, …, sn] is a sequence of points. You are free to choose any
distance function as dist() as long as it satisfies the metric properties. Rest(T) is the sub-sequence
containing all points of T except T.s1.
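To make the recursion above concrete, here is a small memoised Python sketch, assuming scalar points and dist(a, b) = |a - b| as the pointwise metric (any metric dist() would do).

from functools import lru_cache

def dtw(T1, T2):
    T1, T2 = tuple(T1), tuple(T2)

    @lru_cache(maxsize=None)
    def go(i, j):                           # DTW between T1[i:] and T2[j:]
        if i == len(T1) and j == len(T2):
            return 0.0                      # both sequences exhausted
        if i == len(T1) or j == len(T2):
            return float("inf")             # exactly one sequence exhausted
        step = abs(T1[i] - T2[j])           # dist(T1.s1, T2.s1)
        return step + min(go(i + 1, j + 1),     # DTW(Rest(T1), Rest(T2))
                          go(i + 1, j),         # DTW(Rest(T1), T2)
                          go(i, j + 1))         # DTW(T1, Rest(T2))

    return go(0, 0)

# Toy usage: the second sequence repeats a point, but warping absorbs it
print(dtw([1, 2, 3], [1, 2, 2, 3]))         # 0.0
print(dtw([1, 2, 3], [2, 2, 4]))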
11. (10 points) In Bloom filters, we have an array of n bits, where n is the maximum number of bits
that can be maintained in memory, and k hash functions that hash into these n bits.
a. The number of hash functions, k, allows us to improve the false positive rate. Are there any
disadvantages of setting a very high value of k? Explain. [4 points]
b. Consider an alternative hashing scheme where we have k different bit vectors, all of equal
size. We choose the size of the bit vectors such that all k of them can be maintained in
memory. We also have k hash functions, but the i-th hash function can hash only into the i-th bit
vector. We have m “good” objects that we hash in pre-processing. A new object is classified
as positive only if it hashes into 1-bits (i.e., bits with value 1) for all k hash functions. Would
the false positive rate be worse or better in this modified scheme if the memory budget (total
number of bits) is the same for both schemes? Prove or disprove. [6 points]
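As a concrete point of reference, here is a minimal Python sketch contrasting the two schemes under a shared budget of n total bits; the salted-SHA-256 hash functions and the class names are assumptions made only for illustration.

import hashlib

def h(obj, salt, modulus):
    # Illustrative stand-in for a family of hash functions (an assumption, not prescribed)
    digest = hashlib.sha256(f"{salt}:{obj}".encode()).hexdigest()
    return int(digest, 16) % modulus

class StandardBloom:                   # one array of n bits; all k hashes map into it
    def __init__(self, n, k):
        self.bits, self.n, self.k = [0] * n, n, k
    def add(self, obj):
        for i in range(self.k):
            self.bits[h(obj, i, self.n)] = 1
    def query(self, obj):
        return all(self.bits[h(obj, i, self.n)] for i in range(self.k))

class PartitionedBloom:                # k bit vectors of n // k bits; hash i maps only into vector i
    def __init__(self, n, k):
        self.size, self.k = n // k, k
        self.vectors = [[0] * self.size for _ in range(k)]
    def add(self, obj):
        for i in range(self.k):
            self.vectors[i][h(obj, i, self.size)] = 1
    def query(self, obj):
        return all(self.vectors[i][h(obj, i, self.size)] for i in range(self.k))

# Toy usage: same total memory budget n for both schemes
std, part = StandardBloom(n=1024, k=4), PartitionedBloom(n=1024, k=4)
for good in ["a", "b", "c"]:
    std.add(good)
    part.add(good)
print(std.query("a"), part.query("a"), std.query("z"), part.query("z"))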