Feature Selection
Ms. Gayaksha Kandolkar
Assistant Professor (on Contract)
Department of Computer Engineering,
Padre Conceicao College of Engineering
● Feature selection can also be used to speed up classification while ensuring
that the classification accuracy remains optimal. Feature selection ensures the
following:
1. Reduction in cost of pattern classification and design of the classifier
Dimensionality reduction, i.e., using a limited feature set simplifies both the
representation of patterns and the complexity of the classifiers. Consequently,
the resulting classifier will be faster and use less memory.
2. Improvement of classification accuracy
Exhaustive Search
● The most straightforward approach to the problem of feature selection is to
search through all the feature sub-sets and find the best sub-set.
● In this method, all combinations of features are tried out and the criterion function
J calculated.
● The combination of features which gives the highest value of J is the set of features
selected.
● If the patterns consist of d features, and a sub-set of size m features with the
smallest classification error is to be found, this entails searching all C(d, m)
possible sub-sets of size m and selecting the sub-set with the highest value of
the criterion function J(·), where J = (1 − Pe) and Pe is the probability of
classification error.
This method is suitable when only a few features need to be removed.
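The exhaustive procedure above can be sketched as follows. This is a minimal sketch: the toy criterion at the end is hypothetical and merely stands in for a real J = (1 − Pe) computed by training and evaluating a classifier on each sub-set.

```python
from itertools import combinations

def exhaustive_select(features, m, J):
    """Try every size-m sub-set of `features` and keep the one maximising J.
    `J` is the criterion function, e.g. J = 1 - Pe from a classifier."""
    best_subset, best_score = None, float("-inf")
    for subset in combinations(features, m):   # all C(d, m) sub-sets
        score = J(subset)
        if score > best_score:
            best_subset, best_score = subset, score
    return best_subset, best_score

# Hypothetical toy criterion: prefer sub-sets with the smallest index sum,
# standing in for a real evaluation of classification accuracy.
toy_J = lambda s: -sum(s)
print(exhaustive_select(range(5), 2, toy_J))   # -> ((0, 1), -1)
```

Because every one of the C(d, m) sub-sets is evaluated, the cost grows combinatorially with d, which is why the method suits cases where only a few features are removed.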
Artificial Neural Networks
● A multilayer feed-forward network with a back-propagation learning algorithm is
used in this method. The approach considered here is to take a larger than
necessary network and then remove unnecessary nodes.
● Pruning is carried out by eliminating the least salient nodes. It is based on the idea
of iteratively eliminating units and adjusting the remaining weights in such a way
that the network performance does not become worse over the entire training set.
● The pruning of nodes corresponds to removing the corresponding features
from the feature set. The saliency of a node is defined as the sum of the
increase in error over all the training patterns caused by the removal of
that node.
● The node pruning based feature selection first trains a network and then
removes the least salient node.
● The reduced network is trained again, followed by the removal of the least
salient node. This procedure is repeated to get the least classification
error.
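The iterative pruning loop described above can be sketched as follows. This is only a sketch: the `error` callable is a placeholder that is assumed to retrain the reduced network and return its error over the entire training set, and the toy error function in the test is hypothetical.

```python
def saliency(features, feat, train, error):
    """Increase in training-set error caused by removing node `feat`.
    `error(feature_subset, train)` is an assumed retrain-and-evaluate hook."""
    base = error(features, train)
    reduced = [f for f in features if f != feat]
    return error(reduced, train) - base

def prune_features(features, train, error, target_size):
    """Iteratively remove the least salient feature node, retraining
    (inside `error`) after each removal, until target_size features remain."""
    feats = list(features)
    while len(feats) > target_size:
        least = min(feats, key=lambda f: saliency(feats, f, train, error))
        feats.remove(least)
    return feats
```

Each pass removes exactly one node, mirroring the train, prune, retrain cycle described above.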
Evaluation of Classifiers
The various parameters of the classifier that need to be taken into account are:
1. Accuracy of the classifier: The main aim of using a classifier is to correctly classify
unknown patterns.
2. Design time and classification time: Design time is the time taken to build the classifier
from the training data while classification time is the time taken to classify a pattern using
the designed classifier.
3. Space required: If an abstraction of the training set is carried out, the space required will
be less. If no abstraction is carried out and the entire training data is required for
classification, the space requirement is high.
4. Explanation ability: If the reason for the classifier in choosing the class of a
pattern is clear to the user, then its explanation ability is good. For instance, in
the decision tree classifier, following the path from the root of the tree to the
leaf node for the values of the features in the pattern will give the class of the
pattern.
Similarly, the user understands why a rule based system chooses a particular
class for a pattern. On the other hand, the neural network classifier has a
trained neural network and it is not clear to the user what the network is
doing.
5. Noise tolerance: This refers to the ability of the classifier to handle
outliers and mislabeled training patterns.
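The first criterion, classification accuracy, is simply the fraction of patterns assigned to their correct class. A minimal sketch (the toy classifier is hypothetical):

```python
def accuracy(classifier, patterns, labels):
    """Fraction of patterns the classifier assigns to the correct class."""
    correct = sum(1 for x, y in zip(patterns, labels) if classifier(x) == y)
    return correct / len(labels)

# Hypothetical toy classifier: predicts class 1 when the first feature > 0.
clf = lambda x: 1 if x[0] > 0 else 0
score = accuracy(clf, [(1,), (-2,), (3,)], [1, 0, 0])   # 2 of 3 correct
```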
To estimate how good a classifier is, an estimate can be made using the
training set itself. This is known as the resubstitution estimate.
It assumes that the training data is a good representative of the data.
Sometimes, a part of the training data is used as a measure of the performance
of the classifier. Usually the training set is divided into smaller subsets. One of
the subsets is used for training while the other is used for validation. The
different methods of validation are as follows:
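The basic idea common to these methods, holding out part of the training set for validation, can be sketched as follows. The 80/20 split fraction and the fixed seed are illustrative assumptions, not prescribed values.

```python
import random

def holdout_split(data, train_frac=0.8, seed=0):
    """Partition the training data into a training part and a validation
    part; the classifier is built on the first and evaluated on the second."""
    rng = random.Random(seed)            # fixed seed for reproducibility
    indices = list(range(len(data)))
    rng.shuffle(indices)
    cut = int(train_frac * len(data))
    train = [data[i] for i in indices[:cut]]
    valid = [data[i] for i in indices[cut:]]
    return train, valid
```

Unlike the resubstitution estimate, the validation patterns here are never seen during training, giving a less optimistic measure of performance.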