
CIA- 3

MACHINE LEARNING ALGORITHMS

SUBMITTED BY

SHIVANGI GUPTA (20221026)

UNDER THE GUIDANCE OF

DR. DURGANSH SHARMA

INSTITUTE OF BUSINESS AND MANAGEMENT

CHRIST (DEEMED TO BE UNIVERSITY), DELHI NCR


INTRODUCTION

Abalone is a common type of shellfish. Its flesh is prized as a delicacy, and its shells are frequently used in jewellery. This paper addresses the problem of estimating the age of abalone from its physical properties; the topic is of interest because the standard technique for determining age is time-consuming. Depending on the species, abalone can live up to 50 years. Environmental factors such as water flow and wave activity play a significant role in how quickly they grow: those from protected waters often develop more slowly than those from exposed reef areas because of differences in food availability. Estimating the age of abalone is challenging because their size is determined not only by their age but also by the availability of food. Furthermore, abalone can develop so-called 'stunted' populations, whose growth characteristics differ substantially from those of other abalone populations. Most research on the dataset has framed abalone age prediction as a classification problem, which entails assigning a label to each case in the dataset. Here the label is the abalone's ring count, an integer quantity, so the classifier must distinguish between a large number of classes and tends to perform poorly. The age of an abalone is positively correlated with its price, but identifying an abalone's age is a time-consuming operation. As the abalone matures, rings form in its inner shell, generally at a pace of one ring per year. Cutting the shell gives access to the rings: a lab technician polishes and stains a shell sample, then examines it under a microscope and counts the rings.

PROBLEM STATEMENT

Abalones are endangered marine snails found in cold coastal waters worldwide, distributed mainly off the coasts of New Zealand, South Africa, Australia, Western North America, and Japan. Abalones are sea snails, or molluscs, commonly called ear shells or sea ears. Because of the economic importance of an abalone's age, and the cumbersome process involved in determining it, much research has been done on predicting abalone age from the physical measurements available in the dataset.
TARGET FEATURE

The target feature is the abalone's ring count, an integer that encodes the age of the abalone: adding 1.5 to the number of rings gives the age in years.
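
As a quick illustration, assuming the data have already been read into a data frame named abalone with a Rings column (as in the sections below), the conversion is a one-liner:

# Sketch: derive age in years from the ring count (Rings + 1.5)
abalone$Age <- abalone$Rings + 1.5
head(abalone$Age)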

OVERVIEW

This notebook's primary objective is to investigate several distinct supervised learning algorithms, among them decision trees and random forest. I'll go over missing value handling, pipeline creation, partial dependence plots, and one-hot encoding. The goal is to figure out how old an abalone is. Abalones are sea snails that are endangered in South Africa but also exist in Australia, the United Kingdom, and New Zealand.

CLASSIFICATION

We'll use four classifiers to classify the data: random forest, decision tree, KNN, and SVM. We'll also figure out which parameters work best for each classifier. Rather than exhaustive cross-validation over every parameter combination, we use a simple grid-search strategy to find a good parameter setting for each classifier.
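
As a minimal sketch of what such a grid search might look like for the random forest (reusing the train split and randomForest package introduced in the next section; the grid itself is an assumption, not the code used here):

# Sketch: grid search over mtry, scored by out-of-bag error
library(randomForest)
oob_err <- sapply(1:4, function(m) {
  set.seed(222)
  fit <- randomForest(D ~ ., data = train, ntree = 300, mtry = m)
  fit$err.rate[300, "OOB"]   # OOB error after all 300 trees
})
best_mtry <- which.min(oob_err)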

RANDOM FOREST

Random forest is an ensemble learning technique that builds a large number of decision trees during training. For classification problems it predicts the mode of the classes chosen by the individual trees, and for regression tasks it predicts the mean of the trees' predictions. During tree construction it employs the random subspace method and bagging, and it comes with a built-in feature importance indicator.

> datarf <- abalone1

> str(datarf)

spec_tbl_df [4,177 x 14] (S3: spec_tbl_df/tbl_df/tbl/data.frame)

$S : num [1:4177] 0 0 1 0 1 1 1 1 0 1 ...

$ Sex : chr [1:4177] "M" "M" "F" "M" ...


$ Length : num [1:4177] 0.455 0.35 0.53 0.44 0.33 0.425 0.53 0.545 0.475 0.55 ...

$D : num [1:4177] 1 0 1 1 0 0 1 1 1 1 ...

$ Diameter : num [1:4177] 0.365 0.265 0.42 0.365 0.255 0.3 0.415 0.425 0.37 0.44 ...

$H : num [1:4177] 0 0 1 1 0 0 1 1 1 1 ...

$ Height : num [1:4177] 0.095 0.09 0.135 0.125 0.08 0.095 0.15 0.125 0.125 0.15 ...

$ Whole weight : num [1:4177] 0.514 0.226 0.677 0.516 0.205 ...

$ Shucked weight: num [1:4177] 0.2245 0.0995 0.2565 0.2155 0.0895 ...

$ Viscera weight: num [1:4177] 0.101 0.0485 0.1415 0.114 0.0395 ...

$ Shell weight : num [1:4177] 0.15 0.07 0.21 0.155 0.055 0.12 0.33 0.26 0.165 0.32 ...

$R : num [1:4177] 1 0 0 0 0 0 1 1 0 1 ...

$ Rings : num [1:4177] 15 7 9 10 7 8 20 16 9 19 ...

$W : num [1:4177] 1 0 1 1 0 1 1 1 1 1 ...

- attr(*, "spec")=

.. cols(

.. S = col_double(),

.. Sex = col_character(),

.. Length = col_double(),

.. D = col_double(),

.. Diameter = col_double(),

.. H = col_double(),

.. Height = col_double(),

.. `Whole weight` = col_double(),

.. `Shucked weight` = col_double(),

.. `Viscera weight` = col_double(),


.. `Shell weight` = col_double(),

.. R = col_double(),

.. Rings = col_double(),

.. W = col_double()

.. )

- attr(*, "problems")=<externalptr>

>

> datarf$D <- as.factor(datarf$D)

> table(datarf$D)

0 1

1116 3061

> set.seed(123)

>

> ind <- sample(2, nrow(datarf), replace=TRUE, prob=c(0.7,0.3))

> train <- datarf[ind==1,]

> test <- datarf[ind==2,]

> install.packages("randomForest")

> library(randomForest)

> set.seed(222)

> rf <- randomForest(D~W, data=train, ntree = 300, mtry = 8, importance = TRUE, proximity = TRUE)

> print(rf)

Call:
randomForest(formula = D ~ W, data = train, ntree = 300, mtry = 8, importance =
TRUE, proximity = TRUE)

Type of random forest: classification

Number of trees: 300

No. of variables tried at each split: 1

OOB estimate of error rate: 7.98%

Confusion matrix:

0 1 class.error

0 548 232 0.2974358974

1 1 2137 0.0004677268

> attributes(rf)

$names

[1] "call" "type" "predicted" "err.rate"

[5] "confusion" "votes" "oob.times" "classes"

[9] "importance" "importanceSD" "localImportance" "proximity"

[13] "ntree" "mtry" "forest" "y"

[17] "test" "inbag" "terms"

$class

[1] "randomForest.formula" "randomForest"

> rf$confusion

0 1 class.error
0 548 232 0.2974358974

1 1 2137 0.0004677268

> library(caret)

> p1 <- predict(rf, train)

> head(p1)

1 2 3 4 5 6

1 1 1 1 1 1

Levels: 0 1

> head(train$D)

[1] 1 1 0 1 1 1

Levels: 0 1

> confusionMatrix(p1, train$D)

Confusion Matrix and Statistics

Reference

Prediction 0 1

0 548 1

1 232 2137
Accuracy : 0.9202

95% CI : (0.9097, 0.9297)

No Information Rate : 0.7327

P-Value [Acc > NIR] : < 2.2e-16

Kappa : 0.775

Mcnemar's Test P-Value : < 2.2e-16

Sensitivity : 0.7026

Specificity : 0.9995

Pos Pred Value : 0.9982

Neg Pred Value : 0.9021

Prevalence : 0.2673

Detection Rate : 0.1878

Detection Prevalence : 0.1881

Balanced Accuracy : 0.8510

'Positive' Class : 0

> p2 <- predict(rf, test)

> head(p2)
1 2 3 4 5 6

0 1 0 1 1 1

Levels: 0 1

> head(test$D)

[1] 0 1 0 1 1 1

Levels: 0 1

> confusionMatrix(p2, test$D)

Confusion Matrix and Statistics

Reference

Prediction 0 1

0 223 0

1 113 923

Accuracy : 0.9102

95% CI : (0.8931, 0.9255)

No Information Rate : 0.7331

P-Value [Acc > NIR] : < 2.2e-16

Kappa : 0.7432

Mcnemar's Test P-Value : < 2.2e-16


Sensitivity : 0.6637

Specificity : 1.0000

Pos Pred Value : 1.0000

Neg Pred Value : 0.8909

Prevalence : 0.2669

Detection Rate : 0.1771

Detection Prevalence : 0.1771

Balanced Accuracy : 0.8318

'Positive' Class : 0

> plot(rf)
> hist(treesize(rf), main = "No. of nodes for the Trees", col = "green")

> varImpPlot(rf)
> varImpPlot(rf, sort=T, n.var=10, main="Top 10 - Variable Importance")

> importance(rf)

0 1 MeanDecreaseAccuracy MeanDecreaseGini

W 321.381 257.9927 297.0707 722.7635

> varUsed(rf)

[1] 300

> partialPlot(rf, train, Height, "1")
> getTree(rf, 1, labelVar = TRUE)

left daughter right daughter split var split point status prediction

1 2 3 W 0.5 1 <NA>

2 0 0 <NA> 0.0 -1 0

3 0 0 <NA> 0.0 -1 1

> MDSplot(rf, train$D)


K-NN

KNN is a supervised learning algorithm that predicts the output for data points using a labelled input data set. It is one of the most basic machine learning algorithms and can be applied to a wide range of problems. It is primarily based on feature similarity: KNN compares a data point to its neighbours and assigns it to the class of the most similar ones. KNN is a non-parametric model, meaning that, unlike most algorithms, it makes no assumptions about the data set; this makes it effective on realistic data. KNN is also a lazy algorithm: instead of learning a discriminative function from the training data, it memorises the training data. Both classification and regression problems can be solved with KNN.

> data <- read.csv(file.choose(), header = T)

> data$D[data$D == 0] <- 'No'

> data$D[data$D == 1] <- 'Yes'

> data$D <- factor(data$D)

> set.seed(1234)

> ind <- sample(2, nrow(data), replace = T, prob = c(0.7, 0.3))

> training <- data[ind == 1,]

> test <- data[ind == 2,]

> trControl <- trainControl(method = "repeatedcv", number = 10, repeats = 3, classProbs = TRUE, summaryFunction = twoClassSummary)

> set.seed(222)
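
The transcript stops before the model itself is fitted. A minimal sketch of how the fit would typically proceed with caret, using the trainControl object defined above (the tuning choices here are assumptions, not the original code):

# Sketch: fit and evaluate a KNN classifier with caret
library(caret)
fit_knn <- train(D ~ ., data = training,
                 method = "knn",
                 tuneLength = 20,                    # candidate values of k
                 metric = "ROC",                     # matches twoClassSummary
                 trControl = trControl,
                 preProcess = c("center", "scale"))  # KNN is scale-sensitive
fit_knn
p_knn <- predict(fit_knn, newdata = test)
confusionMatrix(p_knn, test$D)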

SVM

Support vector machines (SVMs) are supervised learning models with associated learning algorithms for classification and regression analysis. They are primarily used to solve classification problems. In this algorithm, each data item is plotted as a point in n-dimensional space (where n is the number of features), with the value of each feature being the value of a particular coordinate. The hyperplane that best separates the two classes is then used to classify the data. Besides linear classification, SVMs can also perform non-linear classification by implicitly mapping their inputs into high-dimensional feature spaces.

> View(abalone)

> data(abalone)

> data(Abalone)

> str(abalone)

tibble [4,177 x 14] (S3: tbl_df/tbl/data.frame)

$S : num [1:4177] 0 0 1 0 1 1 1 1 0 1 ...

$ Sex : chr [1:4177] "M" "M" "F" "M" ...

$ Length : num [1:4177] 0.455 0.35 0.53 0.44 0.33 0.425 0.53 0.545 0.475 0.55 ...

$D : num [1:4177] 1 0 1 1 0 0 1 1 1 1 ...

$ Diameter : num [1:4177] 0.365 0.265 0.42 0.365 0.255 0.3 0.415 0.425 0.37 0.44 ...

$H : num [1:4177] 0 0 1 1 0 0 1 1 1 1 ...

$ Height : num [1:4177] 0.095 0.09 0.135 0.125 0.08 0.095 0.15 0.125 0.125 0.15 ...

$ Whole weight : num [1:4177] 0.514 0.226 0.677 0.516 0.205 ...

$ Shucked weight: num [1:4177] 0.2245 0.0995 0.2565 0.2155 0.0895 ...

$ Viscera weight: num [1:4177] 0.101 0.0485 0.1415 0.114 0.0395 ...

$ Shell weight : num [1:4177] 0.15 0.07 0.21 0.155 0.055 0.12 0.33 0.26 0.165 0.32 ...

$R : num [1:4177] 1 0 0 0 0 0 1 1 0 1 ...

$ Rings : num [1:4177] 15 7 9 10 7 8 20 16 9 19 ...

$W : num [1:4177] 1 0 1 1 0 1 1 1 1 1 ...

> library(ggplot2)

> library(e1071)
> mymodel <- svm(D~., data=abalone)

> summary(mymodel)

Call:

svm(formula = D ~ ., data = abalone)

Parameters:

SVM-Type: eps-regression

SVM-Kernel: radial

cost: 1

gamma: 0.06666667

epsilon: 0.1

Number of Support Vectors: 1100

> plot(mymodel, data=abalone, abalone.Height~abalone.Length, slice=list(abalone.Height=3, abalone.Length=4))

> library(predtoolsTS)

Warning message:

package ‘predtoolsTS’ was built under R version 4.0.5

> pred <- predict(mymodel, abalone)

> tab <- table(Predicted=pred, Actual=abalone$Height)

> tab

[ Cross-tabulation of the model's continuous predictions (rows) against Actual Height (columns): a large, almost entirely zero table; output truncated by getOption("max.print") with 4,158 rows omitted. Full dump elided here for brevity. ]

> 1-sum(diag(tab))/sum(tab)

[1] 0.9997606

Note that D is numeric in the tibble here, so svm() has fitted eps-regression (see the SVM-Type line above). Tabulating its continuous predictions against Height therefore produces a table with almost no diagonal hits, and the 'misclassification rate' of 0.9997 is not meaningful as a classification result.
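
For comparison, a minimal sketch of an actual classification fit, which requires D to be a factor first (an illustration on a copy of the data, not the call used above):

# Sketch: C-classification SVM on the binary target D
aba <- abalone
aba$D <- as.factor(aba$D)                # factor response => C-classification
mymodel_cls <- svm(D ~ ., data = aba, kernel = "radial")
pred_cls <- predict(mymodel_cls, aba)
tab_cls <- table(Predicted = pred_cls, Actual = aba$D)
1 - sum(diag(tab_cls)) / sum(tab_cls)    # misclassification rate (training data)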

> mymodel <- svm(Diameter~Length, data=abalone, kernel="linear")

> summary(mymodel)

Call:

svm(formula = Diameter ~ Length, data = abalone, kernel = "linear")

Parameters:

SVM-Type: eps-regression

SVM-Kernel: linear

cost: 1

gamma: 1

epsilon: 0.1

Number of Support Vectors: 1878

> plot(mymodel, data=abalone, abalone.Height~abalone.Length, slice=list(abalone.Height=3, abalone.Length=4))

> pred <- predict(mymodel, abalone)

> tab <- table(Predicted=pred, Actual=abalone$Height)

> tab

[ Cross-tabulation of the linear-kernel model's continuous Diameter predictions (rows) against Actual Height (columns): again almost entirely zeros; output truncated by getOption("max.print") with 115 rows omitted. Full dump elided. ]

> 1-sum(diag(tab))/sum(tab)

[1] 0.9997606

Here, too, Diameter predictions are tabulated against Height values, so the near-1 'error rate' mainly reflects that mismatch rather than model quality.

> data=abalone,kernel="polynomial")

Error: unexpected ',' in

"data=abalone,"

> mymodel <- svm(Diameter~Length, data=abalone,kernel="polynomial")

> summary(mymodel)

Call:

svm(formula = Diameter ~ Length, data = abalone, kernel = "polynomial")

Parameters:

SVM-Type: eps-regression

SVM-Kernel: polynomial

cost: 1

degree: 3

gamma: 1

coef.0: 0

epsilon: 0.1

Number of Support Vectors: 3745


> plot(mymodel, data=abalone, abalone.Height~abalone.Length, slice=list(abalone.Height=3, abalone.Length=4))

> library(predtoolsTS)

> pred <- predict(mymodel, abalone)

> tab <- table(Predicted=pred, Actual=abalone$Height)

> tab

[ Cross-tabulation of the polynomial-kernel model's continuous Diameter predictions (rows) against Actual Height (columns): almost entirely zeros; output truncated by getOption("max.print") with 115 rows omitted. Full dump elided. ]

> 1-sum(diag(tab))/sum(tab)

[1] 0.9997606

> mymodel <- svm(Diameter~Length, data=abalone, kernel="sigmoid")

> summary(mymodel)

Call:

svm(formula = Diameter ~ Length, data = abalone, kernel = "sigmoid")

Parameters:

SVM-Type: eps-regression
SVM-Kernel: sigmoid

cost: 1

gamma: 1

coef.0: 0

epsilon: 0.1

Number of Support Vectors: 4149

> plot(mymodel, data=abalone, abalone.Height~abalone.Length, slice=list(abalone.Height=3, abalone.Length=4))

> pred <- predict(mymodel, abalone)

>

> tab <- table(Predicted=pred, Actual=abalone$Height)

> tab

[ Cross-tabulation of the sigmoid-kernel model's continuous predictions (rows) against Actual Height (columns): a sparse table with a scattering of nonzero counts; output truncated by getOption("max.print") with 115 rows omitted. Full dump elided. ]

> 1-sum(diag(tab))/sum(tab)

[1] 0.985875

> set.seed(123)
> tmodel <- tune(svm, Diameter~Length, data=abalone, ranges = list(epsilon = seq(0,1,0.1), cost=2^(2:9)))

> summary(tmodel)

> tmodel <- tune(svm, Diameter~Length, data=abalone, ranges = list(epsilon = seq(0,1,0.1), cost=2^(2:7)))

> plot(mymodel, data=abalone, abalone.Height~abalone.Length, slice=list(abalone.Height=3, abalone.Length=4))

> pred <- predict(mymodel, abalone)

> tab <- table(Predicted=pred, Actual=abalone$Height)

> tab

> 1-sum(diag(tab))/sum(tab)
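
Once tuning finishes, the tune() object exposes the refitted best model directly; a short sketch of how it would be retrieved and reused (variable names on the left are illustrative):

# Sketch: inspect the tuning surface and reuse the best model
plot(tmodel)                    # error across the epsilon/cost grid
best_svm <- tmodel$best.model   # model refit at the best parameters
summary(best_svm)
pred_best <- predict(best_svm, abalone)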

DECISION TREE

In machine learning, a decision tree is a supervised method. It assigns a target value to each data sample using a binary tree graph (each node has two children), with the tree's leaves representing the target values. Starting at the root node, a sample is propagated through the nodes until it reaches a leaf; in each node a choice is made about which child node the sample should travel to, based on a single feature of the sample (one feature is used per node to make the decision). Decision tree learning is the process of discovering the best rule at each internal tree node according to a chosen metric.

> mydata <- read.csv("abalone.csv")

> mydata$D <- as.factor(mydata$D)

> library(party)

> mytree <- ctree(D~H+W+R, data=mydata, controls=ctree_control(mincriterion=0.9, minsplit=50))

> print(mytree)

> plot(mytree, type="simple")

> tab <- table(predict(mytree), mydata$D)

> print(tab)

> 1-sum(diag(tab))/sum(tab)
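
The tree above is fitted and evaluated on the full dataset, so the error rate is an in-sample figure. A sketch of a held-out evaluation, reusing the 70/30 sampling convention from the random forest section (an illustration, not the original code):

# Sketch: evaluate the ctree on a held-out 30% split
set.seed(123)
ind <- sample(2, nrow(mydata), replace = TRUE, prob = c(0.7, 0.3))
dt_train <- mydata[ind == 1, ]
dt_test  <- mydata[ind == 2, ]
mytree2 <- ctree(D ~ H + W + R, data = dt_train,
                 controls = ctree_control(mincriterion = 0.9, minsplit = 50))
tab2 <- table(Predicted = predict(mytree2, newdata = dt_test),
              Actual = dt_test$D)
1 - sum(diag(tab2)) / sum(tab2)   # held-out misclassification rate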

CONCLUSION

We cross-validated each of the models on the test data before optimising them. Because cross-validation is a random process, we use pairwise t-tests to check whether there is a statistically significant difference between the performance of any two tuned classifiers. First, we run each of the best models through a 10-fold stratified cross-validation procedure (without repetitions). Second, we use a paired t-test to compare the accuracy of the RF model against the other models, since the RF model is the most accurate. RF outperforms the other models in terms of F1-score and weighted average recall, followed by KNN, while KNN achieves a higher precision score. The remaining per-class results are similar, but because of the large number of target levels we do not print them all; the confusion matrices show the same pattern.
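
As a sketch of the paired comparison described here, assuming acc_rf and acc_knn are hypothetical vectors holding the per-fold accuracies of the two models on the same 10 folds:

# Sketch: paired t-test on per-fold cross-validation accuracies
t.test(acc_rf, acc_knn, paired = TRUE)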
