Fall 2024
Assignment 4
Deadline: November 14 - 11:59 PM PST
2. Requirements
2.1 Programming Requirements
a. For Task 1, you can use the Spark DataFrame and GraphFrames library (see the sketch below). For Task 2, you can ONLY use
Spark RDD and standard Python or Scala libraries. There will be a 10% bonus for each task if you also
submit a Scala implementation and both your Python and Scala implementations are correct.
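As a reference, here is a minimal sketch (not a required structure) of wiring a vertex/edge list into GraphFrames for Task 1. The names nodes and edges are hypothetical placeholders for the graph built as described in Section 4.1, and the graphframes jar from $ASNLIB/public/ must be on the classpath:

    from pyspark.sql import SparkSession
    from graphframes import GraphFrame

    spark = SparkSession.builder.appName("task1").getOrCreate()

    # GraphFrames expects an "id" column for vertices and "src"/"dst" for edges.
    # `nodes` (a set of user_ids) and `edges` (a list of (user, user) tuples)
    # are hypothetical, built per Section 4.1.
    vertices = spark.createDataFrame([(u,) for u in nodes], ["id"])
    edge_df = spark.createDataFrame(edges, ["src", "dst"])
    g = GraphFrame(vertices, edge_df)

Note that GraphFrames edges are directed; depending on the algorithm you run, you may need to add each edge in both directions to model the undirected graph.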
3. Datasets
We have generated a sub-dataset, ub_sample_data.csv, from the Yelp review dataset containing user_id
and business_id. You can find the data on Vocareum under resource/asnlib/publicdata/.
4. Tasks
4.1 Graph Construction
To construct the social network graph, assume that each node is uniquely labeled and that links are
undirected and unweighted.
Each node represents a user. There should be an edge between two nodes if the number of common
businesses reviewed by the two users is greater than or equal to the filter threshold. For example,
suppose user1 reviewed {business1, business2, business3} and user2 reviewed {business2,
business3, business4, business5}. If the threshold is 2, there will be an edge between user1 and user2,
since they share two businesses.
If a user node has no edges, do not include that node in the graph.
The filter threshold will be given as an input parameter when running your code (a sketch of this construction follows below).
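A minimal sketch of this edge rule with Spark RDDs (the variable names and the brute-force pairwise comparison are illustrative, not a performance recommendation):

    import sys
    from itertools import combinations
    from pyspark import SparkContext

    sc = SparkContext(appName="graph_construction")
    threshold = int(sys.argv[1])   # filter threshold from the command line
    input_path = sys.argv[2]       # e.g. ub_sample_data.csv

    lines = sc.textFile(input_path)
    header = lines.first()
    user_biz = (lines.filter(lambda l: l != header)
                     .map(lambda l: l.split(","))
                     .map(lambda t: (t[0], t[1]))   # (user_id, business_id)
                     .groupByKey()
                     .mapValues(set)
                     .collectAsMap())

    # Keep a pair only if the users share at least `threshold` businesses.
    edges = [(u, v) for u, v in combinations(sorted(user_biz), 2)
             if len(user_biz[u] & user_biz[v]) >= threshold]

    # Users with no edges are dropped from the graph.
    nodes = {u for edge in edges for u in edge}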
For output, you should use the Python built-in round() function to round the betweenness value to five
digits after the decimal point. (Rounding is for output only; do not use the rounded numbers for
further calculation.)
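For example (the dictionary contents and line layout here are hypothetical; follow the exact output format specified for the task):

    # Keep full precision in memory; round only the copy that is written out.
    betweenness = {("user_a", "user_b"): 0.123456789}

    with open("betweenness_output.txt", "w") as f:
        for (u1, u2), value in sorted(betweenness.items()):
            f.write(f"('{u1}', '{u2}'),{round(value, 5)}\n")
    # `betweenness` itself still holds 0.123456789 for any further calculation.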
IMPORTANT: Please strictly follow the output format since your code will be graded automatically. We
will not regrade because of formatting issues.
If a community has only one user node, we still regard it as a valid community.
You need to save your result in a txt file. The format is the same as the output file from task 1.
Hints:
1. For task 2.2, take floating-point precision into account. For example, stop the modularity
calculation only on a significant reduction in the new modularity, not on a tiny fluctuation
caused by rounding error.
2. A_ij = 1 only when BOTH i is in j's adjacency list AND j is in i's, not just one or the other.
3. For task 2.2, the stopping criterion plays an important role. Again, avoid the temptation to stop
your search at the first decrease in modularity. Instead, continue exploring all potential
partitions to find the global maximum. This comprehensive approach ensures that you don't
miss the optimal solution.
4. If you want to check your answer thoroughly, you can always calculate the modularity for
all possible communities (continuing until no edges remain).
5. In the modularity calculation, take A_ij from the current graph and k_i*k_j from the original graph (see the sketch after this list).
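To make hints 2-5 concrete, here is a pure-Python sketch under the usual Girvan-Newman definition Q = (1/(2m)) * sum over same-community pairs (i, j) of [A_ij - k_i*k_j/(2m)]. The callables cut_step and components are hypothetical placeholders for your own edge-removal and community-detection code, and treating m as the original edge count is our reading of hint 5:

    def modularity(communities, current_adj, orig_degree, m):
        # current_adj: adjacency sets of the CURRENT graph (hint 5)
        # orig_degree: node degrees in the ORIGINAL graph (hint 5)
        # m:           edge count of the ORIGINAL graph (assumption)
        q = 0.0
        for comm in communities:
            for i in comm:
                for j in comm:
                    # Hint 2: A_ij = 1 only when BOTH directions are present.
                    a_ij = 1 if (j in current_adj[i] and i in current_adj[j]) else 0
                    q += a_ij - orig_degree[i] * orig_degree[j] / (2 * m)
        return q / (2 * m)

    def best_partition(current_adj, orig_degree, m, cut_step, components):
        # Hints 3-4: keep cutting until no edges remain and track the
        # GLOBAL maximum Q; never stop at the first decrease.
        best_q, best = float("-inf"), None
        while any(current_adj.values()):
            cut_step(current_adj)                # remove highest-betweenness edge(s)
            comms = components(current_adj)      # current connected components
            q = modularity(comms, current_adj, orig_degree, m)
            if q > best_q:
                best_q, best = q, comms
        return best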
IMPORTANT: Please strictly follow the hints, as your code will be graded on a
different dataset. Passing the submission dataset does not guarantee passing
the grading dataset unless you strictly follow all the hints above; otherwise
you may lose points. We will not regrade for any points lost due to this.
PLEASE DO FOLLOW ALL THE HINTS ABOVE.
If your runtime exceeds the above limit, you will receive no points for this task.
5. About Vocareum
a. Dataset is under the directory $ASNLIB/publicdata/, jar package is under $ASNLIB/public/
b. You should upload the required files under your workspace: work/, and click submit
c. You should test your scripts on both your local machine and the Vocareum terminal before
submission.
d. During the submission period, Vocareum will automatically test task1 and task2.
e. During the grading period, Vocareum will use another dataset with the same format for
testing.
f. We do not test the Scala implementation during the submission period.
g. Vocareum will automatically run both Python and Scala implementations during the grading period.
h. Please start your assignment early! You can resubmit any script on Vocareum; we will only grade
your last submission.
6. Grading Criteria
(A % penalty means a percentage of the points you would otherwise earn.)
1. You can use your five free late days separately or together.
a. Late Day Form
b. This form records the number of late days you use for each assignment. We will not
count late days if no request is submitted. Remember to submit the request BEFORE
the deadline.
2. There will be a 10% bonus if you use both Scala and Python.
3. We will compare your code against all the code we can find on the web (e.g., GitHub) as well as other students' code
from this and other (previous) sections for plagiarism detection.
4. All submissions will be graded on Vocareum. Please strictly follow the format provided; otherwise
you will not get points even if the answer is correct.
5. If the outputs of your program are unsorted or partially sorted, there will be a 50% penalty.
6. We can regrade your assignments within seven days once the scores are released. No regrade
requests will be accepted after one week.
7. There will be a 20% penalty for late submissions within a week and no points after a week.
8. The Scala bonus is awarded only if your Python results are correct. There is no
partial credit for Scala.
"export JAVA_HOME=/usr/lib/jvm/java-1.8.0-openjdk-amd64"
2. Check the input command line formats.
3. Check the output formats, for example the headers, tags, and typos.
4. Check the requirements for sorting the results.
5. Your program scripts should be named task1.py, task2.py, etc.
6. Check whether your local environment fits the assignment description, i.e. version, configuration.
7. If you implement the core part in plain Python instead of Spark, or implement it with high time
complexity (e.g., searching for an element in a list instead of a set; see the example after this
list), your program may be killed on Vocareum because it runs too slowly.
8. You are required to use only Spark RDD in order to understand Spark operations more deeply. You
will not get any points if you use Spark DataFrame or DataSet. Do not import pyspark.sql.
9. Do not use Vocareum for debugging purposes, please debug on your local machine. Vocareum can
be very slow if you use it for debugging.
10. Vocareum is reliable in helping you to check the input and output formats, but its function on
checking the code correctness is limited. It can not guarantee the correctness of the code even with
a full score in the submission report.
11. Some students encounter an error like: "the output rate … has exceeded the allowed
value … bytes/s; attempting to kill the process."
To resolve this, remove all print statements and set the Spark logging level so that it
limits the logs generated; this can be done with sc.setLogLevel(). Preferably, set the log level to
either WARN or ERROR when submitting your code, as in the snippets below.
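On item 7, the list-versus-set difference alone can change the runtime class of an inner loop: membership tests are O(n) on a list but O(1) on average for a set.

    ids = [f"u{i}" for i in range(1_000_000)]
    id_set = set(ids)

    "u999999" in ids      # linear scan: slow inside a hot loop
    "u999999" in id_set   # hash lookup: effectively constant time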
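On item 11, a minimal example of raising the log level right after creating the SparkContext (the app name is illustrative):

    from pyspark import SparkContext

    sc = SparkContext(appName="task2")
    sc.setLogLevel("WARN")   # or "ERROR"; suppresses the INFO logs that count
                             # toward Vocareum's output-rate limit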