Cia 3
SUBMITTED BY
Abalone is a type of shellfish that is widely harvested. Its flesh is prized as a delicacy, and its
shells are frequently used in jewellery. This paper addresses the problem of estimating the age of
abalone from its physical properties. The topic is of interest because the standard technique for
determining age is time-consuming. Depending on the species, abalone can live up to 50 years.
Environmental factors such as water flow and wave activity strongly influence how quickly they
grow: abalone from sheltered waters often grow more slowly than those from exposed reef areas
because of differences in food availability. Estimating the age of abalone is therefore challenging,
since size is determined not only by age but also by food supply. Furthermore, abalone can form
so-called 'stunted' populations whose growth characteristics differ substantially from those of
other populations. Most research on the dataset has framed abalone age prediction as a
classification problem, which entails assigning a label to each case in the dataset; here the label
is the abalone's ring count, an integer quantity. Because there are many possible ring counts, a
classifier must distinguish between many classes and tends to perform poorly. The age of an
abalone is positively correlated with its price, but determining that age is a time-consuming
operation. As an abalone matures, rings form in its inner shell, typically at a rate of one ring per
year. Cutting the shell gives access to the rings: a lab technician polishes and stains a shell
sample, then counts the rings under a microscope.
PROBLEM STATEMENT
Abalones are endangered marine snails found in cold coastal waters worldwide, distributed
mainly off the coasts of New Zealand, South Africa, Australia, Western North America, and
Japan. Abalones are sea snails, or molluscs, commonly called ear shells or sea ears. Because of
the economic importance of abalone age and the cumbersome process involved in determining
it, much research has been devoted to predicting abalone age from the physical measurements
available in the dataset.
TARGET FEATURE
The abalone rings are the target feature. The ring count is an integer related to the age of the
abalone: adding 1.5 to the number of rings gives the approximate age in years.
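As a quick illustration of the conversion (the ring counts below are made up, not taken from the dataset):

```r
# Convert abalone ring counts to approximate age in years (UCI rule: rings + 1.5).
rings <- c(15, 7, 9)          # hypothetical ring counts
age   <- rings + 1.5
print(age)                    # 16.5  8.5 10.5
```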
OVERVIEW
CLASSIFICATION
We use four classifiers to classify the data: random forest, decision tree, KNN and SVM. We
also determine which parameters work best for each classifier. Rather than relying on cross
validation to select these parameters, we use a simple grid-search strategy for each classifier.
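A minimal sketch of the grid-search idea, shown here for KNN's k parameter on the built-in iris data (the data set, the split, and the candidate values are assumptions for illustration, not the abalone setup):

```r
library(class)  # knn() ships with R as a recommended package

# Evaluate each candidate k on a held-out split and keep the best one.
set.seed(123)
idx     <- sample(nrow(iris), 100)                     # training indices
train_x <- iris[idx, 1:4];  train_y <- iris$Species[idx]
test_x  <- iris[-idx, 1:4]; test_y  <- iris$Species[-idx]

k_grid <- c(1, 3, 5, 7)
accs   <- sapply(k_grid, function(k) {
  mean(knn(train_x, test_x, cl = train_y, k = k) == test_y)
})
best_k <- k_grid[which.max(accs)]
```

The same loop structure works for any classifier: enumerate candidate parameter values, score each on held-out data, and keep the best.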
RANDOM FOREST
Random Forest is an ensemble learning technique that builds a large number of decision
trees during training. For classification problems it predicts the mode of the classes chosen by
the individual trees; for regression tasks it predicts their mean. During tree construction it
employs bagging and the random-subspace method, and it provides a built-in feature-importance
indicator.
> str(datarf)
$ Diameter : num [1:4177] 0.365 0.265 0.42 0.365 0.255 0.3 0.415 0.425 0.37 0.44 ...
$ Height : num [1:4177] 0.095 0.09 0.135 0.125 0.08 0.095 0.15 0.125 0.125 0.15 ...
$ Whole weight : num [1:4177] 0.514 0.226 0.677 0.516 0.205 ...
$ Shucked weight: num [1:4177] 0.2245 0.0995 0.2565 0.2155 0.0895 ...
$ Viscera weight: num [1:4177] 0.101 0.0485 0.1415 0.114 0.0395 ...
$ Shell weight : num [1:4177] 0.15 0.07 0.21 0.155 0.055 0.12 0.33 0.26 0.165 0.32 ...
- attr(*, "spec")=
.. cols(
.. S = col_double(),
.. Sex = col_character(),
.. Length = col_double(),
.. D = col_double(),
.. Diameter = col_double(),
.. H = col_double(),
.. Height = col_double(),
.. R = col_double(),
.. Rings = col_double(),
.. W = col_double()
.. )
- attr(*, "problems")=<externalptr>
>
> table(datarf$D)
0 1
1116 3061
> set.seed(123)
>
> install.packages("randomForest")
> library(randomForest)
> set.seed(222)
> rf <- randomForest(D ~ W, data = train, ntree = 300, mtry = 8, importance = TRUE, proximity = TRUE)
> print(rf)
Call:
randomForest(formula = D ~ W, data = train, ntree = 300, mtry = 8, importance =
TRUE, proximity = TRUE)
Confusion matrix:
0 1 class.error
0 548 232 0.2974358974
1 1 2137 0.0004677268
> attributes(rf)
$names
$class
> rf$confusion
0 1 class.error
0 548 232 0.2974358974
1 1 2137 0.0004677268
> library(caret)
> head(p1)
1 2 3 4 5 6
1 1 1 1 1 1
Levels: 0 1
> head(train$D)
[1] 1 1 0 1 1 1
Levels: 0 1
Reference
Prediction 0 1
0 548 1
1 232 2137
Accuracy : 0.9202
Kappa : 0.775
Sensitivity : 0.7026
Specificity : 0.9995
Prevalence : 0.2673
'Positive' Class : 0
> head(p2)
1 2 3 4 5 6
0 1 0 1 1 1
Levels: 0 1
> head(test$D)
[1] 0 1 0 1 1 1
Levels: 0 1
Reference
Prediction 0 1
0 223 0
1 113 923
Accuracy : 0.9102
Kappa : 0.7432
Specificity : 1.0000
Prevalence : 0.2669
'Positive' Class : 0
> plot(rf)
> hist(treesize(rf), main = "No. of nodes for the Trees", col = "green")
> varImpPlot(rf)
> varImpPlot(rf, sort = TRUE, n.var = 10, main = "Top 10 - Variable Importance")
> importance(rf)
0 1 MeanDecreaseAccuracy MeanDecreaseGini
> varUsed(rf)
[1] 300
> partialPlot(rf, train, Height, "1")
> getTree(rf, 1, labelVar = TRUE)
left daughter right daughter split var split point status prediction
1 2 3 W 0.5 1 <NA>
2 0 0 <NA> 0.0 -1 0
3 0 0 <NA> 0.0 -1 1
KNN
KNN is a supervised learning algorithm that predicts the output for new data points using a
labelled input data set. It is one of the most basic machine learning algorithms and can be
applied to a wide range of problems. It is based primarily on feature similarity: KNN compares
a data point to its neighbours and assigns it to the most similar class. KNN is a non-parametric
model, meaning that, unlike most algorithms, it makes no assumptions about the data
distribution; this makes it effective on realistic data. KNN is also a lazy algorithm: instead of
learning a discriminative function from the training data, it memorises the training data itself.
Both classification and regression problems can be solved with KNN.
> set.seed(1234)
> set.seed(222)
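This section records only the seeds, so here is a minimal end-to-end KNN sketch in the same spirit as the other transcripts; the built-in iris data and k = 5 stand in for the abalone data and tuned k, which are assumptions:

```r
library(class)

set.seed(222)
idx  <- sample(nrow(iris), 105)                       # ~70/30 train/test split
pred <- knn(train = iris[idx, 1:4], test = iris[-idx, 1:4],
            cl = iris$Species[idx], k = 5)

# Confusion table and misclassification rate, as elsewhere in this report.
tab <- table(Predicted = pred, Actual = iris$Species[-idx])
1 - sum(diag(tab)) / sum(tab)
```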
SVM
Support vector machines (SVMs) are supervised learning models with associated learning
algorithms for classification and regression analysis. They are primarily used for classification
problems. In this algorithm, each data item is plotted as a point in n-dimensional space (where n
is the number of features), with the value of each feature being the value of a particular
coordinate. The data are then classified using the hyperplane that best separates the two classes.
In addition to linear classification, SVMs can perform non-linear classification by implicitly
mapping their inputs into high-dimensional feature spaces.
> View(abalone)
> data(abalone)
> data(Abalone)
> str(abalone)
$ Length : num [1:4177] 0.455 0.35 0.53 0.44 0.33 0.425 0.53 0.545 0.475 0.55 ...
$ Diameter : num [1:4177] 0.365 0.265 0.42 0.365 0.255 0.3 0.415 0.425 0.37 0.44 ...
$ Height : num [1:4177] 0.095 0.09 0.135 0.125 0.08 0.095 0.15 0.125 0.125 0.15 ...
$ Whole weight : num [1:4177] 0.514 0.226 0.677 0.516 0.205 ...
$ Shucked weight: num [1:4177] 0.2245 0.0995 0.2565 0.2155 0.0895 ...
$ Viscera weight: num [1:4177] 0.101 0.0485 0.1415 0.114 0.0395 ...
$ Shell weight : num [1:4177] 0.15 0.07 0.21 0.155 0.055 0.12 0.33 0.26 0.165 0.32 ...
> library(ggplot2)
> library(e1071)
> mymodel <- svm(D~., data=abalone)
> summary(mymodel)
Call:
Parameters:
SVM-Type: eps-regression
SVM-Kernel: radial
cost: 1
gamma: 0.06666667
epsilon: 0.1
> library(predtoolsTS)
Warning message:
> tab
(Large cross-tabulation of predicted against actual values omitted: because eps-regression
returns continuous predictions, a prediction almost never equals an actual value exactly, so
nearly every count lies off the diagonal.)
> 1-sum(diag(tab))/sum(tab)
[1] 0.9997606
> mymodel <- svm(D ~ ., data = abalone, kernel = "linear")
> summary(mymodel)
Call:
Parameters:
SVM-Type: eps-regression
SVM-Kernel: linear
cost: 1
gamma: 1
epsilon: 0.1
> tab
(Large cross-tabulation of predicted against actual values omitted: as with the radial kernel,
the continuous predictions almost never match an actual value exactly, so nearly every count
lies off the diagonal.)
> 1-sum(diag(tab))/sum(tab)
[1] 0.9997606
> mymodel <- svm(D ~ ., data = abalone, kernel = "polynomial")
> summary(mymodel)
Call:
Parameters:
SVM-Type: eps-regression
SVM-Kernel: polynomial
cost: 1
degree: 3
gamma: 1
coef.0: 0
epsilon: 0.1
> library(predtoolsTS)
> tab
(Large cross-tabulation of predicted against actual values omitted: the polynomial kernel's
continuous predictions almost never match an actual value exactly, so nearly every count lies
off the diagonal.)
> 1-sum(diag(tab))/sum(tab)
[1] 0.9997606
> mymodel <- svm(D ~ ., data = abalone, kernel = "sigmoid")
> summary(mymodel)
Call:
Parameters:
SVM-Type: eps-regression
SVM-Kernel: sigmoid
cost: 1
gamma: 1
coef.0: 0
epsilon: 0.1
>
> tab
(Large cross-tabulation of predicted against actual values omitted: the sigmoid kernel's
continuous predictions rarely match an actual value exactly, so nearly every count lies off the
diagonal.)
> 1-sum(diag(tab))/sum(tab)
[1] 0.985875
> set.seed(123)
> tmodel <- tune(svm, Diameter~Length, data=abalone, ranges = list(epsilon =
seq(0,1,0.1), cost=2^(2:9)))
> summary(tmodel)
> tab
> 1 - sum(diag(tab)) / sum(tab)
DECISION TREE
In machine learning, a decision tree is a supervised method. It assigns a target value to each
data sample using a binary tree graph (each node has two children); the leaves of the tree hold
the target values. Starting at the root node, a sample is propagated through internal nodes until it
reaches a leaf. At each node a choice is made about which child node the sample should travel
to, based on one feature of the sample (one feature is used per node to make the decision). The
process of discovering the best splitting rule at each internal node according to a chosen metric
is known as decision tree learning.
> library(party)
> mytree <- ctree(D ~ ., data = mydata)
> plot(mytree, type = "simple")
> tab <- table(predict(mytree), mydata$D)
> print(tab)
> 1 - sum(diag(tab)) / sum(tab)
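Since the party package may not be installed everywhere, the same computation can be sketched with rpart, which is bundled with R; here iris and its Species column stand in for mydata and the binary target D, which is an assumption for illustration:

```r
library(rpart)

# Fit a classification tree and compute the training misclassification rate,
# mirroring the 1 - sum(diag(tab))/sum(tab) idiom used above.
fit  <- rpart(Species ~ ., data = iris, method = "class")
pred <- predict(fit, iris, type = "class")
tab  <- table(Predicted = pred, Actual = iris$Species)
1 - sum(diag(tab)) / sum(tab)
```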
CONCLUSION
We cross-validated each of the models on the test data after optimising them. Because cross
validation is a random process, we use paired t-tests to check whether there is a statistically
significant difference between the performance of any two tuned classifiers. First, we run each
of the best models through a 10-fold stratified cross-validation procedure (without repetitions).
Second, we use a paired t-test to compare the accuracy of the RF model against each of the
other models, since the RF model is the most accurate. RF outperforms the other models in
terms of F1-score and weighted average recall, followed by KNN; KNN, however, has a higher
precision score. The remaining per-class results are similar, but because of the large number of
target levels we do not print them all. The picture in the confusion matrix is the same.
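The paired t-test step described above can be sketched as follows; the per-fold accuracies are invented for illustration and are not results from this study:

```r
# Per-fold accuracies of two classifiers over the same 10 CV folds (invented).
acc_rf  <- c(0.92, 0.91, 0.93, 0.90, 0.92, 0.94, 0.91, 0.92, 0.93, 0.92)
acc_knn <- c(0.89, 0.90, 0.91, 0.88, 0.90, 0.91, 0.89, 0.90, 0.90, 0.89)

# Pairing by fold removes fold-to-fold variance from the comparison.
t.test(acc_rf, acc_knn, paired = TRUE)
```

A small p-value here would indicate that the accuracy difference between the two classifiers is unlikely to be due to the randomness of the folds alone.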