
CloudSVM: Training an SVM Classifier in Cloud Computing Systems

F. Ozgur CATAK
National Research Institute of Electronics and Cryptology (UEKAE), TUBITAK

M. Erdal BALABAN
Industrial Engineering, Isik University

Abstract
In conventional methods, distributed support vector machine (SVM) algorithms are trained over pre-configured intranet/internet environments to find an optimal classifier. These methods are very complicated and costly for large datasets. Hence, we propose a method referred to as the Cloud SVM training mechanism (CloudSVM), set in a cloud computing environment with the MapReduce technique for distributed machine learning applications. Accordingly, (i) the SVM algorithm is trained on distributed cloud storage servers that work concurrently; (ii) the support vectors from every trained cloud node are merged; and (iii) these two steps are iterated until the SVM converges to the optimal classifier function. Large-scale data sets cannot be trained with the SVM algorithm on a single computer. The results of this study are therefore important for training large-scale data sets in machine learning applications. We prove that iterative training of the split data set in a cloud computing environment using SVM converges to a globally optimal classifier in a finite number of iterations.

1 Introduction
Machine learning applications generally require large amounts of computation time and storage space. Learning algorithms have to be scaled up to handle extremely large data sets. When the training set is large, not all of the examples can be loaded into memory in one step during the training phase of the machine learning algorithm. It is therefore necessary to distribute the computation and memory requirements among several connected computers.

In the machine learning field, support vector machines (SVMs) offer one of the most robust and accurate classification methods due to their good generalization properties. With their solid theoretical foundation and proven effectiveness, SVMs have contributed to researchers' success in many fields. However, SVMs suffer from a widely recognized scalability problem in both memory requirements and computational time [1]. The SVM algorithm's computation and memory requirements increase rapidly with the number of instances in the data set, and many data sets are therefore not suitable for classification [14]. The SVM algorithm is formulated as a quadratic optimization problem. The quadratic optimization problem has O(m^3) time and O(m^2) space complexity, where m is the training set size [2]. The computation time of SVM training is thus at least quadratic in the number of training instances.

The first approach to overcoming large-scale data set training is to reduce the feature vector size. Feature selection and feature transformation methods are the basic approaches for reducing vector size [3]. Feature selection algorithms choose a subset of the features from the original feature set, and feature transformation algorithms map the original feature space to a new space with reduced dimensionality. Several methods exist in the literature: Singular Value Decomposition (SVD) [4], Principal Component Analysis (PCA) [5], Independent Component Analysis (ICA) [6], Correlation-based Feature Selection (CFS) [7], and sampling-based data set selection. All of these methods can harm the generalization of the final machine learning model.

The second approach to large-scale data set training is chunking [13]. Collobert et al. [12] proposed a parallel SVM training algorithm in which each subset of the whole dataset is trained with an SVM and the resulting classifiers are combined into a final single classifier. Lu et al. [8] proposed a distributed support vector machine (DSVM) algorithm that finds support vectors (SVs) on strongly connected networks. Each site within a strongly connected network classifies subsets of training data locally via SVM, passes the calculated SVs to its descendant sites, receives SVs from its ancestor sites, recalculates the SVs, and passes them on, and so on. Rüping [9] proposed incremental learning with support vector machines, in which an error on the old support vectors (which represent the old learning set) is made more costly than an error on a new example. Syed et al. [10] proposed a distributed support vector machine (DSVM) algorithm that finds SVs locally and processes them altogether in a central processing center. Caragea et al. [11] improved this algorithm in 2005 by allowing the data processing center to send support vectors back to the distributed data sources and iteratively achieve the global optimum. Graf et al. [14] proposed an algorithm that arranges distributed processors in a cascaded top-down network topology, namely the Cascade SVM, in which the bottom node of the network is the central processing center. The distributed SVM methods in these works converge and increase test accuracy, but they share similar problems: they require a pre-defined network topology and a fixed number of computers, and the training performance depends on this specific network configuration. The main idea of current distributed SVM methods is data chunking first, followed by a parallel implementation of SVM training. Global synchronization overheads are not considered in these approaches.

In this paper, we propose a cloud computing based SVM method that uses the MapReduce [18] technique for the distributed training phase of the algorithm. By splitting the training set over a cloud computing system's data nodes, each subset is optimized iteratively to find a single global classifier. The basic idea behind this approach is to collect the SVs from every optimized subset of the training set at each cloud node and then merge them as the global support vectors. Computers in the cloud computing system exchange only a minimal number of training set samples. Our CloudSVM algorithm is analysed with various public datasets. CloudSVM is built on LibSVM and implemented using the Hadoop implementation of MapReduce.

This paper is organized as follows. Section 2 provides an overview of the SVM formulation. Section 3 presents the MapReduce pattern in detail. Section 4 explains the system model with our implementation of the MapReduce pattern for SVM training. Section 5 explains the convergence of CloudSVM. Section 6 shows simulation results with various UCI datasets. Section 7 gives concluding remarks.

2 Support Vector Machine


The support vector machine is a supervised learning method, used in statistics and computer science to analyse data and recognize patterns, for classification and regression analysis. The standard SVM takes a set of input data and predicts, for each given input, to which of two possible classes the input belongs, making the SVM a non-probabilistic binary linear classifier. If the training data are linearly separable, as shown in Figure 1, we can select the two hyperplanes of the margin so that there are no points between them and then try to maximize their distance. By simple geometry, the distance between these two hyperplanes is 2/||w||. Given some training data D, a set of n points of the form

$$D = \{(x_i, y_i) \mid x_i \in \mathbb{R}^m,\ y_i \in \{-1, 1\}\}_{i=1}^{n} \quad (1)$$

where x_i is an m-dimensional real vector and y_i is either -1 or 1, denoting the class to which the point x_i belongs. SVMs aim to find a hyperplane in the Reproducing Kernel Hilbert Space (RKHS) that maximizes the margin between the two classes of data in D with the smallest training error [13]. This problem can be formulated as the following quadratic optimization problem:
$$\begin{aligned}
\text{minimize: } & P(w, b, \xi) = \frac{1}{2}\|w\|^2 + C \sum_{i=1}^{m} \xi_i \\
\text{subject to: } & y_i(\langle w, \phi(x_i)\rangle + b) \ge 1 - \xi_i, \quad \xi_i \ge 0
\end{aligned} \quad (2)$$

for i = 1, ..., m, where ξ_i are slack variables and C is a constant denoting the cost of each slack. C is a trade-off parameter that controls the balance between maximizing the margin and minimizing the training error.

Figure 1: Binary classification with an SVM: the maximum-margin hyperplane trained with samples from two classes. Samples on the margin are called the support vectors.

The decision function of the SVM is f(x) = w^T φ(x) + b, where w and b are obtained by solving the optimization problem P in (2). By using Lagrange multipliers, the optimization problem P in (2) can be expressed as
$$\begin{aligned}
\min_{\alpha}: \ & F(\alpha) = \frac{1}{2}\alpha^T Q \alpha - \alpha^T \mathbf{1} \\
\text{subject to: } & 0 \le \alpha \le C, \quad y^T \alpha = 0
\end{aligned} \quad (3)$$

where [Q]_{ij} = y_i y_j φ^T(x_i) φ(x_j) and α is the vector of Lagrange multipliers. It is not necessary to know φ explicitly; it suffices to know how to compute the modified inner product, which is called the kernel function and is represented as K(x_i, x_j) = φ^T(x_i) φ(x_j). Thus, [Q]_{ij} = y_i y_j K(x_i, x_j). If a positive definite kernel K is chosen, then by Mercer's theorem the optimization problem P is a convex quadratic programming (QP) problem with linear constraints and can be solved in polynomial time.
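As a concrete illustration of the dual problem in (3), the short sketch below builds the label-weighted kernel matrix [Q]_{ij} = y_i y_j K(x_i, x_j) and trains a binary classifier with scikit-learn's LIBSVM-based solver. The use of scikit-learn and the toy data are our assumptions for the sketch, not part of the paper's setup.

```python
import numpy as np
from sklearn.svm import SVC  # thin wrapper around LIBSVM

def q_matrix(X, y, kernel=lambda a, b: a @ b.T):
    """[Q]_{ij} = y_i y_j K(x_i, x_j); a linear kernel is the default."""
    return np.outer(y, y) * kernel(X, X)

# Toy data: two linearly separable classes.
X = np.array([[0.0, 0.0], [0.0, 1.0], [2.0, 2.0], [2.0, 3.0]])
y = np.array([-1, -1, 1, 1])

Q = q_matrix(X, y)                      # the matrix appearing in problem (3)
model = SVC(kernel="linear", C=1.0).fit(X, y)
print(Q)
print(model.support_vectors_)           # support vectors found by the QP solver
print(model.dual_coef_)                 # y_i * alpha_i for the support vectors
```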

3 MapReduce
MapReduce is a programming model derived from the map and reduce function combination in functional programming. The MapReduce model is widely used to run parallel applications for large-scale data set processing. Users specify a map function that processes a key/value pair to generate a set of intermediate key/value pairs, and a reduce function that merges all intermediate values associated with the same intermediate key [18]. MapReduce is divided into two major phases called map and reduce, separated by an internal shuffle phase of the intermediate results. The framework automatically executes those functions in parallel over any number of processors [19]. Simply put, a MapReduce job executes three basic operations on a data set distributed across many shared-nothing cluster nodes. The first task is the Map function, which each node executes in parallel without transferring any data to other nodes. In the next operation, the data produced by the Map function is repartitioned across all nodes of the cluster. Lastly, the Reduce task is executed in parallel by each node on its partition of the data.

Figure 2: Overview of MapReduce System

A file in the distributed file system (DFS) is split into multiple chunks, and each chunk is stored on a different data node. A map function takes a key/value pair as input from the input chunks and produces a list of key/value pairs as output. The types of the output key and value can differ from those of the input:

map(key1, value1) ⇒ list(key2, value2)

A reduce function takes a key and an associated value list as input and generates a list of new values as output:

reduce(key2, list(value2)) ⇒ list(value3)

Each Reduce call typically produces either one value v3 or an empty return, though one call is allowed to return more than one value. The returns of all calls are collected as the desired result list. The main advantage of the MapReduce system is that it allows distributed processing of the submitted job on subsets of the whole dataset across the network.
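As a minimal, hedged illustration of these signatures, the toy mrjob job below (mrjob is the streaming library used in Section 4) counts class labels in a comma-separated training file; the file layout and the class name are assumptions for the sketch, not part of the CloudSVM implementation.

```python
from mrjob.job import MRJob

class MRLabelCount(MRJob):
    """Toy MapReduce job: count how many samples carry each class label."""

    def mapper(self, _, line):
        # map(key1, value1) => list(key2, value2)
        label = line.strip().split(",")[-1]   # assume the last column is the class label
        yield label, 1

    def reducer(self, label, counts):
        # reduce(key2, list(value2)) => list(value3)
        yield label, sum(counts)

if __name__ == "__main__":
    MRLabelCount.run()
```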

4 System Model

Figure 3: Schematic of the CloudSVM architecture.

CloudSVM is a MapReduce-based SVM training algorithm that runs in parallel on multiple commodity computers with Hadoop. As shown in Figure 3, the training set is split into subsets and each subset is evaluated individually to obtain the α values (i.e., the support vectors). In the Map stage of the MapReduce job, each subset of the training set is combined with the global support vectors. In the Reduce step, the merged subset of training data is evaluated, and the resulting new support vectors are combined with the global support vectors. The CloudSVM algorithm with MapReduce can be summarized as follows. First, each computer in the cloud computing system reads the global support vectors, merges them with its subset of the local training data, and trains an SVM. Then all the support vectors computed on the cloud computers are merged, and the algorithm saves them as the new global SVs. The algorithm consists of the following steps; a driver-loop sketch follows the list.
1. Initialization: set t = 0 and the global support vector set SV^0_Global = ∅.
2. t = t + 1.
3. Each computer l, l = 1, ..., L, reads the global SVs and merges them with its subset of the training data.
4. Train the SVM algorithm with the merged data set.
5. Find the support vectors.
6. After all computers in the cloud system complete their training phase, merge all calculated SVs and save the result as the global SVs.
7. If h^t = h^(t-1), stop; otherwise go to step 2.
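The driver-loop sketch below mirrors these seven steps in plain Python. It is a local simulation, not the authors' code: run_iteration is a stand-in for one MapReduce pass (Algorithms 1 and 2 below), and the floating-point tolerance on the stopping test is our own convention.

```python
def cloud_svm_driver(subsets, run_iteration):
    """Steps 1-7 of CloudSVM as a control-flow sketch.
    `subsets` is a list of per-node training chunks (lists of samples).
    `run_iteration(merged)` returns (new_global_svs, empirical_risk)."""
    sv_global = []                       # step 1: SV_Global = empty set, t = 0
    prev_risk = None
    while True:                          # step 2: next iteration
        merged = [chunk + sv_global for chunk in subsets]   # step 3: merge global SVs
        sv_global, risk = run_iteration(merged)             # steps 4-6: train, collect SVs
        if prev_risk is not None and abs(risk - prev_risk) < 1e-9:
            break                        # step 7: h^t == h^(t-1), stop
        prev_risk = risk
    return sv_global
```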
Pseudocode for the CloudSVM algorithm's Map and Reduce functions is given in Algorithm 1 and Algorithm 2.

Algorithm 1 Map function of the CloudSVM algorithm

SV_Global ← ∅   // empty global support vector set
while h^t ≠ h^(t-1) do
    for l ∈ L do   // for each subset
        D_l^t ← D_l^t ∪ SV_Global^t
    end for
end while

Algorithm 2 Reduce function of the CloudSVM algorithm

while h^t ≠ h^(t-1) do
    for l ∈ L do
        (SV_l, h^t) ← svm(D_l)   // train on the merged data set to obtain support vectors and hypothesis
    end for
    for l ∈ L do
        SV_Global ← SV_Global ∪ SV_l
    end for
end while

For training the SVM classifier functions, we used LibSVM with various kernels. Appropriate values of the parameters C and γ were found by cross-validation. The whole system is implemented with Hadoop streaming and the Python mrjob library.
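A minimal sketch of one CloudSVM iteration as an mrjob job is shown below. It assumes comma-separated feature values with the class label in the last column, uses scikit-learn's LIBSVM-backed SVC in place of raw LibSVM, and fixes the number of logical subsets at four; none of these choices are taken from the paper. The driver loop that re-injects the emitted global SVs and checks the stopping criterion of Section 5 would wrap this job.

```python
import zlib
import numpy as np
from mrjob.job import MRJob
from sklearn.svm import SVC   # LIBSVM-based solver standing in for LibSVM

class CloudSVMIteration(MRJob):
    """One iteration: map assigns samples to logical subsets, reduce trains
    an SVM per subset and emits the local support vectors."""

    def mapper(self, _, line):
        # Assumed format per line: "x1,x2,...,xm,label".
        values = [float(v) for v in line.strip().split(",")]
        subset_id = zlib.crc32(line.encode()) % 4    # assumed L = 4 subsets
        yield subset_id, values

    def reducer(self, subset_id, rows):
        data = np.array(list(rows))
        X, y = data[:, :-1], data[:, -1]
        # In the full algorithm, X is already merged with the current SV_Global.
        model = SVC(kernel="linear", C=1.0).fit(X, y)
        for idx in model.support_:                   # indices of the support vectors
            yield "SV", [float(v) for v in data[idx]]

if __name__ == "__main__":
    CloudSVMIteration.run()
```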

5 Convergence of CloudSVM
Let S denote a subset of the training set D, let F(S) be the optimal objective function over the data set S, and let h* be the globally optimal hypothesis, i.e., the one with minimal empirical risk R_emp(h). Our algorithm starts with SV^0_Global = ∅ and generates a sequence of support vector sets SV^t_Global, where SV^t_Global is the set of support vectors at the t-th iteration. We used the hinge loss for testing models trained with the CloudSVM algorithm. The hinge loss works well for the SVM as a classifier, since the more the margin is violated, the higher the penalty is [20]. The hinge loss function is the following:

$$\ell(f(x), y) = \max\{0,\ 1 - y \cdot f(x)\}$$

The empirical risk can be computed with the approximation:

$$R_{emp}(h) = \frac{1}{n}\sum_{i=1}^{n} \ell(h(x_i), y_i)$$

According to the empirical risk minimization principle, the learning algorithm should choose a hypothesis ĥ which minimizes the empirical risk:

$$\hat{h} = \arg\min_{h \in \mathcal{H}} R_{emp}(h).$$
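To make the stopping quantity concrete, the small sketch below computes the hinge loss and the empirical risk estimate for a trained model; the decision_function call assumes a scikit-learn-style classifier and is our convention, not the paper's.

```python
import numpy as np

def hinge_loss(scores, y):
    """l(f(x), y) = max(0, 1 - y * f(x)), computed element-wise."""
    return np.maximum(0.0, 1.0 - y * scores)

def empirical_risk(model, X, y):
    """R_emp(h) = (1/n) * sum_i l(h(x_i), y_i); CloudSVM stops when this
    value no longer changes between iterations (equation (5))."""
    scores = model.decision_function(X)   # f(x) = w^T phi(x) + b
    return float(hinge_loss(scores, y).mean())
```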

A hypothesis is found at every cloud node. Let X be the subset of training data at cloud node l, where X ∈ R^(m×n), let SV^t_Global be the set of support vectors at the t-th iteration, and let h^(t,l) be the hypothesis at node l at iteration t. Then the optimization problem in equation (3) becomes
$$\max_{\alpha}\ h^{t,l} = -\frac{1}{2}
\begin{pmatrix} \alpha_1 \\ \alpha_2 \end{pmatrix}^{T}
\begin{pmatrix} Q_{11} & Q_{12} \\ Q_{21} & Q_{22} \end{pmatrix}
\begin{pmatrix} \alpha_1 \\ \alpha_2 \end{pmatrix}
+
\begin{pmatrix} \mathbf{1} \\ \mathbf{1} \end{pmatrix}^{T}
\begin{pmatrix} \alpha_1 \\ \alpha_2 \end{pmatrix} \quad (4)$$
$$\text{subject to: } 0 \le \alpha_i \le C\ \ \forall i, \qquad \sum_i \alpha_i y_i = 0$$

where Q_{12} and Q_{21} are kernel matrices with entries

$$Q_{12} = \left\{ K_{i,j}\left(x_i, SV^{t}_{Global,j}\right) \ \middle|\ i = 1, \ldots, m,\ j = 1, \ldots, n \right\}.$$

Here α_1 and α_2 are the solutions estimated by node l from the data set X and SV_Global. By Mercer's theorem, the kernel matrix Q is symmetric positive definite, so the sub-matrices satisfy Q_{21} = Q_{12}^T.
We can define the matrices Q_{11} and Q_{22} at iteration t as

$$Q_{11} = \left\{ K_{i,j}(x_i, x_j) \mid x_i, x_j \in X \right\}, \qquad Q_{22} = \left\{ K_{i,j}\left(SV^{t}_{Global,i}, SV^{t}_{Global,j}\right) \right\}.$$
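The block structure of Q in (4) can be made explicit with a short numpy sketch; the linear kernel matches the experiments in Section 6, and the helper names are ours.

```python
import numpy as np

def weighted_kernel(Xa, ya, Xb, yb):
    """[Q]_{ij} = y_i y_j K(x_i, x_j) with a linear kernel K(a, b) = a . b."""
    return np.outer(ya, yb) * (Xa @ Xb.T)

def block_q(X, y, SV, y_sv):
    """Assemble the block matrix of equation (4):
       Q11 = local data vs. itself, Q12 = local data vs. global SVs,
       Q21 = Q12^T (Q is symmetric), Q22 = global SVs vs. themselves."""
    Q11 = weighted_kernel(X, y, X, y)
    Q12 = weighted_kernel(X, y, SV, y_sv)
    Q22 = weighted_kernel(SV, y_sv, SV, y_sv)
    return np.block([[Q11, Q12], [Q12.T, Q22]])
```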
Algorithm’s stop point is reached when the hypothesis’ empirical risk is same
with previous iteration. That is:

Remp (ht ) = Remp (ht−1 ) (5)

Lemma: The accuracy of the decision function of the CloudSVM classifier at iteration t is always greater than or equal to the maximum accuracy of the decision function of the SVM classifier at iteration t - 1. That is,

$$R_{emp}(h^t) \le \min_{h \in \mathcal{H}^{t-1}} R_{emp}(h) \quad (6)$$

Proof: Without loss of generality, the iterated CloudSVM monotonically converges to the optimal classifier. At iteration t,

$$SV^{t}_{Global} = SV^{t-1}_{Global} \cup \left\{ SV^{t-1}_{i} \mid i = 1, \ldots, n \right\}$$

where n is the number of data set splits (i.e., the number of cloud nodes). Then the training set for the SVM algorithm at node i is

$$d = X \cup SV^{t}_{Global}.$$

Adding more samples cannot decrease the optimal value, so the accuracy of the sub-problem at each node monotonically increases at each step.

6 Simulation Results

Table 1: The data sets used in the experiments

Dataset Name    Training Size    Dimension
German          1000             24
Heart           270              13
Ionosphere      351              34
Satellite       4435             36

We selected several data sets from the UCI Machine Learning Repository, namely German, Heart, Ionosphere, Hand Digit, and Satellite. The data set sizes and input dimensions are shown in Table 1. We test our algorithm on real-world data sets to demonstrate convergence. Linear kernels were used with optimal parameters (γ, C); the parameters were estimated by cross-validation.
We used 10-fold cross-validation, dividing the set of samples at random into 10 approximately equal-sized parts. The 10 parts were roughly balanced, ensuring that the classes were distributed uniformly across the parts. Ten-fold cross-validation works as follows: we fit the model on 90% of the samples and then predict the class labels of the remaining 10% (the test samples). This procedure is repeated 10 times, with each part playing the role of the test samples, and the errors on all 10 parts are added together to compute the overall error.

Table 2: Performance results of the CloudSVM algorithm with various UCI data sets

Dataset Name    γ      C    No. of Iterations    No. of SVs    Accuracy    Kernel Type
German          100    1    5                    606           0.7728      Linear
Heart           100    1    3                    137           0.8259      Linear
Ionosphere      108    1    3                    160           0.8423      Linear
Satellite       100    1    2                    1384          0.9064      Linear
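A sketch of the 10-fold procedure described above, using scikit-learn's stratified splitter and its LIBSVM-backed SVC; the library choice and parameter values are illustrative assumptions rather than the authors' exact setup.

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold
from sklearn.svm import SVC

def cross_validated_accuracy(X, y, C=1.0, n_splits=10):
    # Stratified folds keep the class proportions roughly equal in each part.
    skf = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=0)
    errors = 0
    for train_idx, test_idx in skf.split(X, y):
        model = SVC(kernel="linear", C=C).fit(X[train_idx], y[train_idx])
        errors += np.sum(model.predict(X[test_idx]) != y[test_idx])
    return 1.0 - errors / len(y)   # overall accuracy from the pooled errors
```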
To analyse CloudSVM, we randomly distributed all the training data to a cloud computing system of 10 computers running pseudo-distributed Hadoop. The prediction accuracy over the iterations and the total number of SVs are shown in Table 3.

Table 3: Data set prediction accuracy with iterations

When the number of iterations reaches 3 to 5, the test accuracy of all data sets reaches its highest value. As the number of iterations is increased further, the test accuracy settles into a steady state; it does not change for a sufficiently large number of iterations.

As the number of iterations increases, the number of global support vectors also reaches a steady state. As a result, the CloudSVM algorithm is suitable for training on large data sets.

7 Conclusion and Further Research


We have proposed a distributed support vector machine implementation for cloud computing systems with the MapReduce technique that improves the scalability and parallelism of split data set training. The performance and generalization properties of our algorithm are evaluated on Hadoop. Our algorithm is able to work on cloud computing systems without knowing how many computers are connected to run in parallel. The algorithm is designed to deal with large-scale data set training problems. It is empirically shown that the generalization performance and risk minimization of our algorithm are better than previous results.

References
[1] Chang, E.Y., Zhu, K., Wang, H., Bai, H., Li, J., Qiu, Z., Cui, H.: PSVM: Parallelizing Support Vector Machines on Distributed Computers. Advances in Neural Information Processing Systems 20 (2007)


[2] Tsang, I.W., Kwok, J.T., Cheung, P.M.: Core Vector Machines: Fast SVM Training on Very Large Data Sets. J. Mach. Learn. Res. 6, 363-392 (2005)

[3] Weston, J., Mukherjee, S., Chapelle, O., Pontil, M., Poggio, T., Vapnik, V.: Feature selection for SVMs. Advances in Neural Information Processing Systems 13, 668-674 (2000)
[4] Golub, G., Reinsch, C.: Singular value decomposition and least squares solutions. Numerische Mathematik 14, 403-420 (1970)
[5] Jolliffe, I.T.: Principal Component Analysis. Springer Series in Statistics, 2nd ed., Springer, New York (2002)
[6] Comon, P.: Independent Component Analysis, a new concept? Signal Processing 36, 287-314 (1994)

[7] Hall M.A.: Correlation-based Feature Selection for Discrete and Nu-
meric Class Machine Learning. In: Proceedings of the Seventeenth
International Conference on Machine Learning, pp. 359-366. Morgan
Kaufmann Publishers Inc., San Francisco, CA (2000)
[8] Lu, Y., Roychowdhury, V., Vandenberghe, L.: Distributed parallel sup-
port vector machines in strongly connected networks. IEEE Trans. Neu-
ral Networks, 19, 1167-1178 (2008)
[9] Rüping, S.: Incremental Learning with Support Vector Machines. In: Proceedings of the IEEE International Conference on Data Mining, p. 641. IEEE Computer Society, Los Alamitos, CA (2001)
[10] Syed, N.A., Liu, H., Sung, K.: Incremental learning with support vector machines. In: Proceedings of the Fifth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD), San Diego, California (1999)
[11] Caragea, C., Caragea, D., Honavar, V.: Learning support vector machine classifiers from distributed data sources. In: Proceedings of the Twentieth National Conference on Artificial Intelligence (AAAI), Student Abstract and Poster Program, pp. 1602-1603. AAAI Press, Pittsburgh, Pennsylvania (2005)
[12] Collobert, R., Bengio, S., Bengio, Y.: A parallel mixture of SVMs for
very large scale problems. Neural Computation, 14, 1105-1114 (2002)
[13] Vapnik, V.N.: The nature of statistical learning theory. Springer, NY
(1995)
[14] Graf, H.P., Cosatto, E., Bottou, L., Durdanovic, I., Vapnik, V.: Parallel support vector machines: The Cascade SVM. In: Proceedings of the Eighteenth Annual Conference on Neural Information Processing Systems (NIPS), pp. 521-528. MIT Press, Vancouver (2004)
[15] Chang, C.C., Lin, C.J.: LIBSVM: a library for support vector machines. ACM Transactions on Intelligent Systems and Technology 2, 27:1-27:27 (2011)
[16] LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proceedings of the IEEE 86, 2278-2324 (1998)
[17] Bertsekas, D.P.: Nonlinear Programming (Second ed.). Athena Scientific, Cambridge (1999)
[18] Dean, J., Ghemawat, S.: MapReduce: Simplified data processing on large clusters. In: Proceedings of the 6th Conference on Symposium on Operating Systems Design & Implementation (OSDI), pp. 10-10. USENIX Association, Berkeley (2004)

[19] Schatz, M.C.: CloudBurst: highly sensitive read mapping with MapRe-
duce. Bioinformatics (Oxford, England), 25, 1363-1369 (2009)
[20] Rosasco, L., De Vito, E., Caponnetto, A., Piana, M., Verri, A.: Are loss functions all the same? Neural Computation 16, 1063-1076 (2004)

