Bayes Examples
junjun
February 10, 2016
Rmarkdown script and datasets: http://pan.baidu.com/s/1hr0gTrI
Example 1: Classifying iris flowers with naive Bayes
# 1. Load the data
data("iris")
# 2. Create the training and test sets
library(caret)
## Loading required package: lattice
## Loading required package: ggplot2
## Warning: package 'ggplot2' was built under R version 3.2.3
set.seed(2005)
index <- createDataPartition(iris$Species, p = 0.7, list = FALSE)
train_iris <- iris[index, ]
test_iris <- iris[-index, ]
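The split above is stratified by species, so each class keeps the same share in both sets. As a rough illustration of that idea, here is a base-R sketch that samples 70% of the rows within each species. It is not caret's implementation, and the names `rows_by_class`, `train_idx`, `train_sketch` and `test_sketch` are made up for this example.

```r
# Stratified 70/30 split in base R (illustrative sketch, not caret's code)
data("iris")
set.seed(2005)

# Row numbers grouped by species
rows_by_class <- split(seq_len(nrow(iris)), iris$Species)

# Draw 70% of the rows inside every species
train_idx <- unlist(lapply(rows_by_class, function(rows) {
  sample(rows, size = ceiling(0.7 * length(rows)))
}))

train_sketch <- iris[train_idx, ]
test_sketch  <- iris[-train_idx, ]

# Class counts in each part
table(train_sketch$Species)
table(test_sketch$Species)
```

Because the sampling happens per class, both parts stay balanced even though the rows themselves are chosen at random.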
# 3. Build the model
library(e1071)
model_iris <- naiveBayes(Species~., data=train_iris)
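For numeric predictors such as the iris measurements, naive Bayes is usually fitted by assuming a Gaussian distribution per class and feature, which is also what the e1071 documentation describes. The sketch below reproduces that idea in base R on the full iris data. It is only an illustration under that assumption, and the names `priors`, `means`, `sds` and `nb_predict_one` are hypothetical helpers, not part of e1071.

```r
# Minimal Gaussian naive Bayes in base R (illustrative sketch only)
data("iris")
features <- names(iris)[1:4]
classes  <- levels(iris$Species)

# Class priors, plus per-class mean and standard deviation of each feature
priors <- sapply(classes, function(k) mean(iris$Species == k))
means  <- sapply(features, function(f) tapply(iris[[f]], iris$Species, mean))
sds    <- sapply(features, function(f) tapply(iris[[f]], iris$Species, sd))

# Score each class by log prior + sum of log Gaussian densities,
# then return the class with the highest score
nb_predict_one <- function(x) {
  scores <- sapply(classes, function(k) {
    log(priors[[k]]) +
      sum(dnorm(as.numeric(x[features]),
                mean = means[k, features],
                sd   = sds[k, features],
                log  = TRUE))
  })
  classes[which.max(scores)]
}

manual_pred <- apply(iris[, features], 1, nb_predict_one)
mean(manual_pred == iris$Species)   # accuracy on the data used for fitting
```

Working in log space avoids multiplying many small densities together, which is the usual design choice for this kind of classifier.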
# 4. Evaluate the model
summary(model_iris)
##         Length Class  Mode
## apriori 3      table  numeric
## tables  4      -none- list
## levels  3      -none- character
## call    4      -none- call
pred <- predict(model_iris, train_iris, type="class")
mean(pred==train_iris[, 5])
## [1] 0.952381
# 5. Predict on the test set
pred_iris <- predict(model_iris, test_iris, type="class")
mean(pred_iris==test_iris[, 5])
## [1] 1
table(pred_iris, test_iris[, 5])
##
## pred_iris    setosa versicolor virginica
##   setosa         15          0         0
##   versicolor      0         15         0
##   virginica       0          0        15
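The confusion matrix carries more than the overall accuracy. As a small sketch, the counts from the table above can be retyped into a matrix and turned into per-class precision and recall. The names `conf`, `precision` and `recall` are chosen here for illustration only.

```r
# Confusion matrix retyped from the table above (rows: predicted, columns: actual)
species <- c("setosa", "versicolor", "virginica")
conf <- matrix(c(15,  0,  0,
                  0, 15,  0,
                  0,  0, 15),
               nrow = 3, byrow = TRUE,
               dimnames = list(predicted = species, actual = species))

accuracy  <- sum(diag(conf)) / sum(conf)   # share of correct predictions
precision <- diag(conf) / rowSums(conf)    # correct share of each predicted class
recall    <- diag(conf) / colSums(conf)    # correct share of each actual class

accuracy
round(cbind(precision, recall), 3)
```

With only 45 test rows, a perfect score like this one is best read as a small-sample estimate rather than a guarantee.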
Example 2: Classifying and predicting with the play-tennis data
# 1. Load the data
data<-read.csv("F:/R/Rworkspace/NB/playingtennis.csv")
str(data)
## 'data.frame': 14 obs. of 6 variables:
## $ Day : Factor w/ 14 levels "D1","D10","D11",..: 1 7 8 9 10 11 12 13 14 2 ...
## $ Outlook : Factor w/ 3 levels "Overcast","Rain",..: 3 3 1 2 2 2 1 3 3 2 ...
## $ Temperature: Factor w/ 3 levels "Cool","Hot","Mild": 2 2 2 3 1 1 1 3 1 3 ...
## $ Humidity : Factor w/ 2 levels "High","Normal": 1 1 1 1 2 2 2 1 2 2 ...
## $ Wind : Factor w/ 2 levels "Strong","Weak": 2 1 2 2 2 1 1 2 2 2 ...
## $ PlayTennis : Factor w/ 2 levels "No","Yes": 1 1 2 2 2 1 2 1 2 2 ...
summary(data)
##       Day        Outlook  Temperature   Humidity     Wind   PlayTennis
##  D1     :1   Overcast:4   Cool:4      High  :7   Strong:6   No :5
##  D10    :1   Rain    :5   Hot :4      Normal:7   Weak  :8   Yes:9
##  D11    :1   Sunny   :5   Mild:6
##  D12    :1
##  D13    :1
##  D14    :1
##  (Other):8
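# As the output above shows, Day is just a row identifier; it is useless for classification and prediction, so it can be dropped
# 2. Clean the data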
dataset <- data[, 2:6]
# 3. Build the model
library(e1071)
model <- naiveBayes(dataset[, 1:4], dataset[, 5])
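For categorical predictors, the model boils down to class priors and per-class frequency tables, so a prediction can be checked by hand. The sketch below does this in base R. The CSV itself is not shown in this post, so the `tennis` data frame is an assumption: it is retyped from the widely used 14-day PlayTennis table, whose first ten rows and column totals agree with the `str()` and `summary()` output above, while rows 11 to 14 cannot be verified from the post. `nb_score` and `query` are hypothetical names used for illustration.

```r
# Assumed copy of the 14-day PlayTennis table (rows 11-14 are not visible in the post)
tennis <- data.frame(
  Outlook     = c("Sunny", "Sunny", "Overcast", "Rain", "Rain", "Rain", "Overcast",
                  "Sunny", "Sunny", "Rain", "Sunny", "Overcast", "Overcast", "Rain"),
  Temperature = c("Hot", "Hot", "Hot", "Mild", "Cool", "Cool", "Cool",
                  "Mild", "Cool", "Mild", "Mild", "Mild", "Hot", "Mild"),
  Humidity    = c("High", "High", "High", "High", "Normal", "Normal", "Normal",
                  "High", "Normal", "Normal", "Normal", "High", "Normal", "High"),
  Wind        = c("Weak", "Strong", "Weak", "Weak", "Weak", "Strong", "Strong",
                  "Weak", "Weak", "Weak", "Strong", "Strong", "Weak", "Strong"),
  PlayTennis  = c("No", "No", "Yes", "Yes", "Yes", "No", "Yes",
                  "No", "Yes", "Yes", "Yes", "Yes", "Yes", "No"),
  stringsAsFactors = TRUE
)

# Unnormalised posterior per class: P(class) * product of P(feature value | class)
nb_score <- function(query, data, target = "PlayTennis") {
  sapply(levels(data[[target]]), function(k) {
    rows       <- data[data[[target]] == k, ]
    prior      <- nrow(rows) / nrow(data)
    likelihood <- sapply(names(query), function(f) mean(rows[[f]] == query[[f]]))
    prior * prod(likelihood)
  })
}

query  <- list(Outlook = "Rain", Temperature = "Hot", Humidity = "High", Wind = "Strong")
scores <- nb_score(query, tennis)

scores / sum(scores)        # normalised posterior probabilities
names(which.max(scores))    # class with the highest posterior
```

This version applies no Laplace smoothing, so a feature value never seen with a class would force that class's score to zero.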
# 4. Predict
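# predict() matches predictors by column name and factor level, so new_data must reuse the training columns' names and levels
new_data <- data.frame(Outlook = factor("Rain", levels = levels(dataset$Outlook)),
                       Temperature = factor("Hot", levels = levels(dataset$Temperature)),
                       Humidity = factor("High", levels = levels(dataset$Humidity)),
                       Wind = factor("Strong", levels = levels(dataset$Wind)))
predict(model, new_data)
## [1] No
## Levels: No Yes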
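# Same column names and factor levels as the training data
new_data <- data.frame(Outlook = factor("Sunny", levels = levels(dataset$Outlook)),
                       Temperature = factor("Mild", levels = levels(dataset$Temperature)),
                       Humidity = factor("Normal", levels = levels(dataset$Humidity)),
                       Wind = factor("Weak", levels = levels(dataset$Wind)))
predict(model, new_data)
## [1] Yes
## Levels: No Yes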