Non-Metric Methods For Pattern Classification

Non-numeric data encompasses any data that cannot be expressed with numbers, including nominal data, which consists of distinct categories without inherent order. Decision trees are a non-metric method for classifying such data by asking a series of categorical questions, effectively handling nominal data. The process involves starting with a root node, asking the best question to split the data, and continuing until reaching leaf nodes that represent final classifications.

Non-numeric data is a broader term encompassing any data that can't be expressed with
numbers. This includes nominal data, but also other types like text, images, and dates.

Nominal data, on the other hand, is a specific type of non-numeric data where the information is
categorized into distinct labels or names. These categories don't have any inherent order or
ranking.

Here's how to distinguish them with examples:

Non-numeric data (can include nominal data):

 Example 1: Favorite color: This could be answered with words like "red," "blue," or
"green." It's non-numeric because the values are labels rather than numbers; since the
labels also have no order (red isn't "more" than blue), this is nominal data as well.
 Example 2: Customer satisfaction rating: This might be a survey question with options
like "satisfied," "neutral," or "dissatisfied." It's non-numeric, but because the options
have a natural order, it is ordinal rather than nominal data.
 Example 3: Image of a cat: A picture isn't a single numeric measurement, yet it still
conveys information.

Nominal data (a type of non-numeric data):

 Example 1: Eye color (brown, blue, green): These are distinct categories with no
inherent order. Brown isn't "more" than blue; it's just a different color.
 Example 2: Country of residence: This categorizes people by their country, but there's
no order (e.g., France isn't "better" than Canada).
 Example 3: Blood type (A, B, AB, O): These are blood type classifications, not
numerical values.
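Because nominal categories have no order, mapping them to integers (A=1, B=2, ...) would invent a ranking that isn't there. A common workaround is one-hot encoding, where each category gets its own 0/1 slot. A minimal sketch in Python (the `one_hot` helper is illustrative, not a standard library function):

```python
def one_hot(value, categories):
    """Encode a nominal value as a 0/1 vector with one slot per category."""
    return [1 if value == c else 0 for c in categories]

blood_types = ["A", "B", "AB", "O"]  # nominal: no inherent order
print(one_hot("AB", blood_types))    # [0, 0, 1, 0]
```

Each vector has exactly one 1, so no category is numerically "larger" than another.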

Non-Metric Methods for Pattern Classification


Pattern classification typically relies on metric methods that use distances or similarities between
data points. However, there are scenarios where data isn't numerical and lacks inherent metrics.
This is where non-metric methods come into play for classifying patterns in such data.

One prominent non-metric method is decision trees. These classify data by asking a series of
questions about the features, splitting the data at each step based on the answer. The process
continues until the data reaches a "leaf" node, representing a specific class. Decision trees handle
nominal data (discrete categories) effectively, where distances between categories might not be
meaningful.

Here's a breakdown of decision trees in the context of non-metric methods:


 Non-metric nature: Decision trees don't rely on calculating distances or similarities
between data points. Instead, they use a sequence of questions based on the features'
categories.
 Handling nominal data: They excel at working with nominal data, where features have
discrete categories without a natural ordering or distance metric, for example, classifying
emails as spam or not spam based on which keywords appear (nominal) rather than on word
lengths (numerical).

Key to Non-Metric Approach:

 Decision trees don't measure distances between data points. Instead, they ask a series of
"yes/no" or categorical questions about the features in your data.
 Each question splits the data into subsets based on the answer. This process continues
until the data reaches a final classification (leaf node).
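The question-asking process above can be written down directly as a fixed tree of yes/no checks on nominal features. A toy sketch (the keywords and rules are invented for illustration, not a trained model):

```python
def classify_email(words):
    """Walk a hand-built decision tree of yes/no questions about keywords."""
    if "lottery" in words:            # root question: does "lottery" appear?
        return "spam"
    if "invoice" in words:            # second split on another keyword
        return "not spam" if "meeting" in words else "spam"
    return "not spam"                 # default leaf

print(classify_email({"win", "lottery", "now"}))     # spam
print(classify_email({"invoice", "meeting", "q3"}))  # not spam
```

No distance between emails is ever computed; the classification emerges purely from the sequence of categorical answers.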

Advantages for Non-Metric Data:

 Nominal Data Champion: Decision trees shine when dealing with nominal data, which
has discrete categories without a natural order. Think of classifying emails as spam based
on keywords (categories) rather than word lengths (numerical).
 Flexibility: They can handle mixed data types, including nominal, ordinal (ordered
categories), and even continuous data (converted to categories).
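The "continuous data converted to categories" point can be illustrated with a simple threshold split, which is how a tree question can discretize a numeric feature. The thresholds and labels below are arbitrary placeholders:

```python
def price_bucket(price, thresholds=(15, 40)):
    """Bin a continuous price into ordered categories via two thresholds."""
    if price < thresholds[0]:
        return "Budget-friendly"
    if price < thresholds[1]:
        return "Moderate"
    return "Upscale"

print(price_bucket(12))  # Budget-friendly
print(price_bucket(55))  # Upscale
```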

Building the Tree:

1. Start with the Root Node: This represents your entire dataset.
2. Ask the Best Question: The algorithm chooses the most informative feature (question)
to split the data, using criteria such as Gini impurity or information gain, which
measure how well the question separates the data into distinct classes.
3. Branch Out: Based on the answer to the question, the data gets divided into branches
leading to child nodes.
4. Repeat and Refine: The process repeats at each child node, asking new questions to
further refine the classification. This continues until a stopping criterion is met, such as
reaching pure classes (all data points in a node belong to the same class) or reaching a
maximum depth.
5. Leaf Nodes - The Destination: These terminal nodes represent the final classifications
for the data that reaches them.

Building a Decision Tree: Restaurant Choice Example


Imagine you're building a decision tree to help someone decide what type of restaurant to go to
based on their preferences. Here's how the steps above would play out in this scenario:

1. Root Node:

This represents all the possible restaurants.


2. Ask the Best Question:

The algorithm analyzes your preferences (features) and chooses the most informative question to
split the data. Let's say your preferences include cuisine type, price range, and formality. The
algorithm might determine "What type of cuisine are you looking for?" is the best first
question because it effectively separates preferences.

3. Branch Out:

Based on your answer (e.g., Italian, Mexican, etc.), the data branches into child nodes
representing restaurants of that cuisine type.

4. Repeat and Refine:

At each child node, new questions are asked to further refine the options. For example, under
"Italian," the question might be "Price range (Budget-friendly, Moderate, Upscale)?" This
splits the data into price-based categories.

5. Leaf Nodes:

The process continues until you reach leaf nodes representing specific restaurant
recommendations. For instance, a leaf node under "Italian - Moderate" might be "Trattoria with
good reviews".
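The walk-through above can be encoded directly as a nested lookup structure; the cuisines, price ranges, and recommendations are invented placeholders:

```python
# Each internal node is (question, {answer: subtree}); each leaf is a string.
tree = ("cuisine", {
    "Italian": ("price", {
        "Budget-friendly": "Pizza counter",
        "Moderate": "Trattoria with good reviews",
        "Upscale": "Fine-dining ristorante",
    }),
    "Mexican": "Taqueria",
})

def recommend(node, answers):
    """Follow the user's answers down the tree until a leaf is reached."""
    while isinstance(node, tuple):
        question, branches = node
        node = branches[answers[question]]
    return node

print(recommend(tree, {"cuisine": "Italian", "price": "Moderate"}))
# Trattoria with good reviews
```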

Additional Details:

 Choosing the "best question" involves information gain metrics that measure how well a
question separates data into distinct classes (cuisine types in this case).
 The process stops when a stopping criterion is met. This could be reaching pure classes
(all restaurants in a node are similar) or reaching a maximum depth (limiting the tree's
complexity).
