Its665 Isp565 Group Project Mac2024


ISP565/ITS665/2024

GROUP PROJECT TASK


ISP565/ITS665 DATA MINING

Contents
DESCRIPTION OF TASK
A1. Data Acquisition and Data Understanding
A2. Data preparation – selection and cleaning
A3. Data preparation – transformation
A4. Data preparation – reduction
B. Model Development and Evaluation
GROUP PROJECT PRESENTATION AND SUBMISSION
Presentation
Submission
PROJECT TIMELINE
REFERENCE for RUBRIC
GROUP INFORMATION

Prepared by : Sofi M/SAR

DESCRIPTION OF TASK
A1. Data Acquisition and Data Understanding
1. Search for and select a dataset that matches your interests. It should contain enough
instances (at least 3000) and between 20 and 30 attributes, including the class label,
with a good mix of numeric and nominal attributes. If possible, the dataset should have
some missing values. (If there are no missing values, then you need to perform other
relevant processes.)
2. Describe your project problem, the data and the source of the dataset.
3. Each group is required to develop one method only: classification. (If your group is
interested in doing clustering, please consult your lecturer.)

Tasks A2, A3 and A4 cover data understanding, preparation and reduction.
Task B covers model development and evaluation.
Use the WEKA tool to perform each task. You might also need other tools such as Excel.
The number of relevant attributes should be between 10 and 15.

A2. Data preparation – selection and cleaning


Data selection refers to choosing the most relevant attributes or samples. Data
cleaning can be accomplished by applying filters to the data in the Preprocess tab.
1. Start with the Preprocess tab. Study the numeric attributes. Report the mean, minimum,
maximum and standard deviation of each.
2. Study the nominal attributes and report the values of each attribute and the count of
each value.
3. Identify the attributes with missing or noisy values. Remove the missing or noisy values
using the method of your choice in WEKA, explaining which filter you used and why you
made this choice.
4. Identify the attributes with outliers. Investigate methods for detecting outliers. Are
there any outliers in this dataset? If yes, describe how you dealt with them.
5. Save the cleaned dataset as file-cleaned.arff. Show several samples from the dataset, at
least the first 20 rows, with the columns.
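
The arithmetic behind steps 1, 3 and 4 can be sketched outside WEKA as a sanity check on what the Preprocess tab reports. The following is a minimal Python sketch, not WEKA's implementation: the attribute `age`, its values, the use of mean imputation and the 1.5 x IQR outlier rule are all assumptions chosen for illustration, and your group may justify different choices.

```python
import statistics

def summarize(values):
    """Mean, min, max and sample standard deviation, ignoring missing (None) entries."""
    present = [v for v in values if v is not None]
    return {
        "mean": statistics.mean(present),
        "min": min(present),
        "max": max(present),
        "stdev": statistics.stdev(present),
    }

def impute_mean(values):
    """Replace each missing (None) entry with the mean of the observed values."""
    mean = statistics.mean(v for v in values if v is not None)
    return [mean if v is None else v for v in values]

def iqr_outliers(values, factor=1.5):
    """Return the values outside [Q1 - factor*IQR, Q3 + factor*IQR]."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - factor * iqr, q3 + factor * iqr
    return [v for v in values if v < low or v > high]

# Hypothetical numeric attribute with one missing value and one suspicious reading.
age = [23, 25, None, 31, 29, 27, 24, 95]

stats = summarize(age)          # step 1: descriptive statistics
filled = impute_mean(age)       # step 3: one possible way to handle missing values
outliers = iqr_outliers(filled) # step 4: one possible outlier rule

print(stats)
print(filled)
print(outliers)
```

Note that a single extreme value (95 here) pulls the mean upward, so the order in which you impute and remove outliers affects the result; this is worth discussing when you justify your choice.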

A3. Data preparation – transformation


Among the different data transformation techniques, explore those available through the
WEKA filters in the Preprocess tab. Study the following data transformations and report which
ones you have applied:
1. Perform normalization when necessary. Explain which method you applied (min-max
normalization, z-score normalization or decimal scaling) and provide detailed
information about it: state which one you chose and why.
*You may not need to normalize all attributes. Explain why you normalized the
attributes you selected.
2. Perform discretization when necessary. Which attributes did you discretize, and how
many bins did you use? Explain.
3. Perform attribute construction when necessary, for example adding an attribute
representing the sum of two other ones. Which WEKA filter allows you to do this? Show the
attribute if you applied it.
4. Perform other specific processes when necessary and explain them.
5. Save the prepared dataset as file_processed.arff. Show the samples, at least the first
20 rows of this dataset, with all columns.
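
To make the three normalization methods, equal-width discretization and attribute construction concrete, here is a minimal Python sketch of the underlying formulas. It is an illustration only and does not reproduce any WEKA filter: the attributes `income` and `bonus`, their values, the choice of population standard deviation for the z-score, and the bin count of 3 are all assumptions.

```python
import statistics

def min_max(values, new_min=0.0, new_max=1.0):
    """Min-max normalization: rescale linearly into [new_min, new_max]."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) * (new_max - new_min) + new_min for v in values]

def z_score(values):
    """Z-score normalization: subtract the mean, divide by the standard deviation.
    Assumption: population standard deviation is used here."""
    mean = statistics.mean(values)
    sd = statistics.pstdev(values)
    return [(v - mean) / sd for v in values]

def decimal_scaling(values):
    """Decimal scaling: divide by 10**j, with j the smallest integer
    such that every scaled absolute value is below 1."""
    largest = max(abs(v) for v in values)
    j = 0
    while largest / 10 ** j >= 1:
        j += 1
    return [v / 10 ** j for v in values]

def equal_width_bins(values, bins):
    """Equal-width discretization: return a bin index (0 .. bins-1) per value."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / bins
    return [min(int((v - lo) / width), bins - 1) for v in values]

# Hypothetical numeric attributes.
income = [1200, 2500, 3100, 4800, 9900]
bonus = [100, 0, 250, 300, 50]

scaled = min_max(income)
standardized = z_score(income)
decimal = decimal_scaling(income)
binned = equal_width_bins(income, bins=3)

# Attribute construction: a new attribute as the sum of two existing ones.
total_pay = [a + b for a, b in zip(income, bonus)]

print(scaled, standardized, decimal, binned, total_pay, sep="\n")
```

With skewed data such as this, equal-width bins put most values in the first bin, which is one reason to report and justify the number of bins you choose.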


A4. Data preparation- reduction


This task should be done after you have run the model in part B using the relevant
attributes from the previous steps A1-A2-A3.
Data mining datasets are often too large to process directly. Reduction can be done on the
attributes (Select Attribute) and also on the samples (Sampling). In this project, you have to
apply Select Attribute.
Reduce the dataset through Select Attribute, using a suitable method.
1. Explain your reduced dataset.
2. Compare the results across several sets of features, preferably more than one
different set.
3. Save the reduced dataset as file_reduced.arff, and paste a screenshot showing at least the
first 20 rows of this dataset, with all columns.
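
One common way to rank attributes before keeping a subset is information gain with respect to the class label. The sketch below is an assumption-laden illustration in plain Python, not the method this project prescribes and not WEKA's code: the attribute names, rows and labels are invented toy data, and the attributes are assumed to be nominal (numeric ones would need discretizing first).

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total) for n in Counter(labels).values())

def information_gain(column, labels):
    """Entropy of the class minus the weighted entropy after splitting on the attribute."""
    total = len(labels)
    remainder = 0.0
    for value in set(column):
        subset = [lab for v, lab in zip(column, labels) if v == value]
        remainder += len(subset) / total * entropy(subset)
    return entropy(labels) - remainder

def rank_attributes(rows, attribute_names, labels):
    """Return (attribute, gain) pairs sorted from most to least informative."""
    scores = {}
    for idx, name in enumerate(attribute_names):
        column = [row[idx] for row in rows]
        scores[name] = information_gain(column, labels)
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)

# Hypothetical toy dataset: two nominal attributes and a class label.
attribute_names = ["outlook", "windy"]
rows = [("sunny", "no"), ("sunny", "yes"), ("rainy", "no"), ("rainy", "yes")]
labels = ["play", "play", "stay", "stay"]

ranking = rank_attributes(rows, attribute_names, labels)
print(ranking)
```

Keeping the top-ranked attributes gives one candidate feature set; repeating with a different cut-off or a different ranking criterion gives the additional sets asked for in step 2.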

B. Model Development and Evaluation

By default, each group is required to develop a classification model. Apply an algorithm
for the selected method to your dataset. Present the outcome of the project. Each member has to
elaborate on his/her role and contribution to the group work.
(If your group is interested in doing clustering, please consult your lecturer.)
METHOD: CLASSIFICATION
ACTIVITIES AND EVALUATION:
1. Perform all tasks in steps A1-A3.
2. Test your results over the training data with the following options:
i. Cross-validation with different folds, k = 10 and k = 20.
ii. Percentage split (70:30), where 70 is the percentage of the training
dataset, and (80:20), where 80 is the percentage of the training dataset.
Discuss every result.
3. Generate the tree visualizer for the best result from cross-validation and
percentage split.
4. Apply the reduction steps in A4. Report the reduction method that you have
applied.
5. Repeat steps 2-3 on the reduced datasets.
6. Compare the evaluation results of the processed dataset and the reduced
dataset using a graph (Excel). Explain your results with the help
of the graph.
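
The two test options in step 2 differ only in how instances are divided between training and testing. The Python sketch below shows that division, plus the accuracy, precision and recall figures used to compare results. It is a simplified illustration under stated assumptions: the dataset size of 3000, the random seed, and the `actual`/`predicted` labels are hypothetical, and WEKA's own fold assignment may differ (for example, it may stratify folds by class).

```python
import random

def k_fold_indices(n, k, seed=1):
    """Yield (train, test) index lists for k-fold cross-validation."""
    indices = list(range(n))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]  # k roughly equal folds
    for i in range(k):
        test = folds[i]
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test

def percentage_split(n, train_percent, seed=1):
    """Shuffle once, then cut into a training part and a test part."""
    indices = list(range(n))
    random.Random(seed).shuffle(indices)
    cut = round(n * train_percent / 100)
    return indices[:cut], indices[cut:]

def evaluate(actual, predicted, positive):
    """Accuracy, plus precision and recall for one class treated as positive."""
    pairs = list(zip(actual, predicted))
    tp = sum(a == positive and p == positive for a, p in pairs)
    fp = sum(a != positive and p == positive for a, p in pairs)
    fn = sum(a == positive and p != positive for a, p in pairs)
    correct = sum(a == p for a, p in pairs)
    return {
        "accuracy": correct / len(pairs),
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "recall": tp / (tp + fn) if tp + fn else 0.0,
    }

# Hypothetical dataset of 3000 instances.
folds = list(k_fold_indices(n=3000, k=10))
train_70, test_30 = percentage_split(n=3000, train_percent=70)
train_80, test_20 = percentage_split(n=3000, train_percent=80)

# Hypothetical predictions from some classifier on five test instances.
scores = evaluate(
    actual=["yes", "yes", "no", "no", "yes"],
    predicted=["yes", "no", "no", "yes", "yes"],
    positive="yes",
)
print(len(folds), len(train_70), len(test_30), len(train_80), len(test_20))
print(scores)
```

In cross-validation every instance is tested exactly once, whereas a percentage split tests only the held-out part, so the two options can give different estimates for the same model.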
METHOD: CLUSTERING (OPTIONAL)
ACTIVITIES AND EVALUATION:
1. Perform all tasks in steps A1-A3.
2. Solve the problem using the clustering algorithm in WEKA. Evaluate
three different numbers of clusters by investigating the errors (say, k =
{3, 4, 5, 6, ...}) with the following options:
i. Euclidean distance
ii. Manhattan distance
Can you find the best number of clusters? Discuss.
3. Visualize the clusters using appropriate scatter plots and graphs. Explain
with the help of graphs / Excel.
4. Apply the reduction steps in A4. Report the reduction method that you have
applied.
5. Repeat steps 1-3 on the reduced datasets.
6. Compare the results between the processed and reduced datasets. Explain the
differences between the generated clusters, with the help of graphs.
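
To show how the error changes with the number of clusters and the distance function, here is a small k-means-style sketch in Python. It is a simplification for illustration and will not match WEKA's output: the points are invented toy data, the initial centroids are picked at evenly spaced positions rather than at random, centroids are always updated with the mean (with Manhattan distance a median is the more usual choice), and the "error" is defined here as the total distance from each point to its centroid.

```python
def euclidean(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

def manhattan(a, b):
    return sum(abs(x - y) for x, y in zip(a, b))

def k_means(points, k, distance, iterations=100):
    """Return (assignment, centroids, error) for a simple k-means-style loop."""
    # Assumption: deterministic initial centroids at evenly spaced positions.
    centroids = [points[i * len(points) // k] for i in range(k)]
    assignment = None
    for _ in range(iterations):
        # Assign every point to its nearest centroid.
        new_assignment = [
            min(range(k), key=lambda c: distance(p, centroids[c])) for p in points
        ]
        if new_assignment == assignment:
            break  # converged: no point changed cluster
        assignment = new_assignment
        # Move each centroid to the mean of its members (skip empty clusters).
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:
                centroids[c] = tuple(sum(dim) / len(members) for dim in zip(*members))
    # Error: total distance from each point to its assigned centroid.
    error = sum(distance(p, centroids[a]) for p, a in zip(points, assignment))
    return assignment, centroids, error

# Hypothetical two-dimensional data with three visible groups.
points = [(1, 1), (1, 2), (2, 1), (8, 8), (9, 8), (8, 9), (15, 1), (16, 2), (15, 2)]

errors_euclidean = {k: k_means(points, k, euclidean)[2] for k in (2, 3, 4)}
errors_manhattan = {k: k_means(points, k, manhattan)[2] for k in (2, 3, 4)}
assignment_k3, _, _ = k_means(points, 3, euclidean)

print(errors_euclidean)
print(errors_manhattan)
print(assignment_k3)
```

Plotting the error against k for each distance function (for example in Excel) and looking for the point where the improvement levels off is one way to argue for a best number of clusters.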


FLOW OF THE TASKS IN THE PROJECT

[Flowchart: Dataset -> Data Preparation/Pre-processing, which leads to two parallel paths.
Path 1: Relevant features, A1-A2-A3 (file_processed) -> Model development using DM algorithm -> Model Evaluation.
Path 2: Reduction, selected attributes, A1-A2-A3-A4 (file_reduced) -> Model development using DM algorithm -> Model Evaluation.
Final step: Tree Description.]


About the group project task

1. This is a group task for 5 members: perform a data mining project, present the outcome
and submit the slides and datasets/models.
2. Finding the right data is likely to be the most tedious task, so please allow time for it.
Confirm your dataset with your lecturer. No two groups can use the same dataset, and no
group can use the existing data in the WEKA folder; datasets are approved on a first come,
first served (FCFS) basis. Delays in finding a dataset and getting approval will delay your work.

Useful links to data repositories containing multiple datasets to choose from:

● http://www.kaggle.com/
● UCI ML Data Repository http://archive.ics.uci.edu/ml/datasets.html (use recent datasets, from 2015
onwards)
● Any related portals

GROUP PROJECT PRESENTATION AND SUBMISSION
Presentation
1. List all the members, with a picture and the name of each member, in the slides.
2. The presentation is suggested for weeks 13-14.
3. Options:
a. recorded video
b. live presentation
4. Each group is given a slot of at most 30 minutes:
a. The presentation/voice-over should take at most 15 minutes, ideally 10
minutes. Avoid putting too much text on the slides and avoid reading every word.
Describe briefly.
b. 10 to 15 minutes for Q&A. Each group member has to be present (or provide a voice-over
in the slides) during the slot with CAMERA ON and answer the questions properly.

5. During the session, make sure you have your slides, WEKA and the datasets/models ready for
desktop sharing.

Submission
1. Presentation slides: contain all the results as required, and list the group members with
pictures on the first slide. Reduce repetition or share common processes.
2. The complete datasets: the original dataset in CSV/ARFF format, plus the preprocessed,
cleaned, normalized, processed, reduced, train and test datasets, etc.
3. Models of the experiments (in WEKA format).
4. Articles for the project.
• Upload to MS Teams according to group, with the naming format:
• groupID_datafilename_leadername
• e.g. CS2434A_soccerdata_ali


PROJECT TIMELINE

2024

ACTIVITIES WEEK 7 WEEK 8 WEEK 9 WEEK 10 WEEK 11 WEEK 12 WEEK 13 WEEK 14

Understanding and Information Gathering

Data Preparation

Model Development

Analysis and Results

Delivery of Output (Presentation)


REFERENCE for RUBRIC


*Subject to change

DATA PREPARATION (A1-A2-A3) (CLO3-C5 / Digital Skill / PLO12-eres)

Criterion: Identifying techniques for data preparation in doing Data Mining project
Subattribute: Problem Identification
Level: Initial phase of study
1 - 2 (Very Weak): Not able to identify the techniques for data preparation
3 - 4 (Weak): Able to partially identify the basic techniques for data preparation (e.g. data collection, data selection)
5 - 6 (Fair): Able to partially identify the techniques for data preparation with basic explanation
7 - 8 (Good): Able to identify the techniques for data preparation with clear explanation
9-10 (Very Good): Able to identify and provide explanation of data preparation techniques very clearly and accurately

Criterion: Analysing the data preparation results with DM technology/tools
Subattribute: Analysis
Level: Initial phase of study
1 - 2 (Very Weak): Not able to analyse the data preparation results
3 - 4 (Weak): Finds difficulty in analysing the data preparation results
5 - 6 (Fair): Able to organise and analyse the data preparation results but does not clearly analyse the factors that contribute to the problem
7 - 8 (Good): Able to organise and analyse the data preparation results and clearly analyse the factors that contribute to the problem
9-10 (Very Good): Able to organise and analyse the data preparation results and very clearly analyse the factors that contribute to the problem


MODEL DEVELOPMENT (After A1-A2-A3-A4 & B) (CLO3-C5 / PLO12)

Criterion: Applying DM algorithms for model building in the selected dataset
Subattribute: Application
Level: Middle phase of the project
1 - 2 (Very Weak): Not able to develop the data mining model
3 - 4 (Weak): Finds difficulty in developing the data mining models
5 - 6 (Fair): Able to develop the data mining models but does not clearly explain the factors/parameters involved
7 - 8 (Good): Able to develop the data mining models and clearly explain the factors/parameters involved
9-10 (Very Good): Able to develop the data mining models and very clearly explain the factors/parameters involved

Criterion: Applying DM algorithms for model building in the reduced dataset
Subattribute: Application
Level: Middle phase of the project
1 - 2 (Very Weak): Not able to develop the data mining model on the reduced dataset
3 - 4 (Weak): Finds difficulty in developing the data mining models with the reduced dataset
5 - 6 (Fair): Able to develop the data mining models on the reduced dataset but does not clearly explain the factors/parameters involved
7 - 8 (Good): Able to develop the data mining models on the reduced dataset and clearly explain the factors/parameters involved
9-10 (Very Good): Able to develop the data mining models on the reduced dataset and very clearly explain the factors/parameters involved

MODEL EVALUATION (B) (CLO3-C5 / PLO12)

Criterion: Evaluating the results of DM models
Subattribute: Evaluation
Level: Final phase of the project
1 - 2 (Very Weak): Not able to explain the results of model evaluation (accuracy, precision, recall etc.)
3 - 4 (Weak): Able to explain the results of model evaluation (accuracy, precision, recall etc.)
5 - 6 (Fair): Able to evaluate the data mining models but does not clearly explain the comparison of the models
7 - 8 (Good): Able to evaluate and visualize (e.g. Decision Tree model) the data mining models and clearly explain the comparison of the models
9-10 (Very Good): Able to evaluate and visualize (e.g. Decision Tree model) the data mining models and clearly explain the comparison of the models


GROUP INFORMATION
Find the form in MS Teams.

Group number | Group Members | Student ID | Project Title | Dataset Link

CS2594A-1

CS2594A-2

CS2594A-3

CS2594B-1

CS2594B-2

CS2594C-1

CS2594C-1
