data mining steps

The document discusses data selection and preprocessing in data mining, highlighting the importance of data sets, test data, and training data. It explains that test data is used to evaluate machine learning models after training, while training data is essential for building effective predictive models. Additionally, it outlines data cleaning techniques, including handling missing values and removing noise to improve data quality.

Uploaded by

tushikasahu5

Exp 3

Data selection: definitions


Data set :- specifies the data attributes that are considered as a data system
A data set is a collection of data that can be used for data mining
purposes. A data set can be composed of different types of data,
such as numerical, categorical, textual, spatial, temporal, and so on. A
data set can also have different characteristics, such as size,
dimensionality, quality, and structure. Depending on the data mining
task and the data mining method, you may need to preprocess,
transform, or manipulate your data set before applying data mining
techniques.
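As a rough illustration of these ideas, the sketch below builds a tiny data set with numerical, categorical, textual and temporal attributes and reports its size and dimensionality. The records, attribute names and the `describe` helper are made up for this example and are not part of the original notes.

```python
from datetime import date

# Hypothetical data set: each record (row) holds one value per attribute.
data_set = [
    {"age": 34, "city": "Pune", "review": "quick delivery", "joined": date(2023, 5, 1)},
    {"age": 27, "city": "Delhi", "review": "item arrived late", "joined": date(2024, 1, 15)},
    {"age": 45, "city": "Pune", "review": "good value", "joined": date(2022, 11, 30)},
]

def describe(records):
    """Return the size, dimensionality and attribute types of a data set."""
    attributes = sorted({name for row in records for name in row})
    # Assumes the first record contains every attribute (true for this example).
    types = {name: type(records[0][name]).__name__ for name in attributes}
    return {"size": len(records), "dimensionality": len(attributes), "types": types}

summary = describe(data_set)
print(summary["size"], summary["dimensionality"])
print(summary["types"])
```

Here "size" is the number of records and "dimensionality" is the number of attributes; real data sets usually also need quality checks before mining.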
Test Data :- data that is applied to check how a model performs a specific task
You need unseen data to test your machine learning
model after it has been created (using your training data). This data is
known as testing data, and it may be used to assess the progress and
efficiency of your algorithm's training, as well as to adjust or optimize
the algorithm for better results. Test data should:
 • Represent the original set of data.
 • Be large enough to produce reliable predictions.
This data set needs to be "unseen" and recent, because the
training data has already been "learned" by your model. By observing
how the model performs on fresh test data, you can decide whether it is
working successfully or whether it needs more training data to
meet your standards.
Test data provides a final, real-world check of whether the machine
learning algorithm was trained correctly, using data it has not seen before.
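One common way to obtain unseen test data is to hold back part of the data set before training. The sketch below is a minimal version of such a split; the function name `train_test_split`, the 20% test fraction and the fixed seed are assumptions made for this example. A random split does not by itself guarantee that the test data is recent.

```python
import random

def train_test_split(records, test_fraction=0.2, seed=0):
    """Shuffle the records and hold out a fraction as unseen test data."""
    if not 0 < test_fraction < 1:
        raise ValueError("test_fraction must be between 0 and 1")
    shuffled = list(records)                  # copy so the caller's list is untouched
    random.Random(seed).shuffle(shuffled)     # fixed seed keeps the split repeatable
    n_test = max(1, round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]   # (training data, test data)

records = list(range(10))                     # stand-in for ten real records
train, test = train_test_split(records, test_fraction=0.2)
print(len(train), len(test))
```

No record appears in both parts, so the model is never evaluated on data it has already learned.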

Training Data :- the data from which the model learns


Testing data is used to measure the performance of the trained
model, whereas training data is used to train the machine learning
model. Training data is the fuel that powers a machine learning model,
and it is usually larger than the testing data, because more data
generally helps to build more effective predictive models. When a machine learning
algorithm receives data from our records, it recognizes patterns and
creates a decision-making model.
Algorithms allow a company's past experience to be used to make
decisions. They analyze previous cases and their results and, using
this data, create models to score and predict the outcome of current
cases. In general, the more representative data ML models have access to,
the more reliable their predictions become.
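The idea of learning from past cases and then checking the model on test data can be sketched with a very simple 1-nearest-neighbour classifier. This is only an illustration: the choice of algorithm, the function names and the "hours studied" data are assumptions made for this example, not something the notes prescribe.

```python
def train(training_data):
    """'Training' a 1-nearest-neighbour model just stores the past cases."""
    return list(training_data)

def predict(model, x):
    """Predict the outcome of the stored case whose feature is closest to x."""
    nearest = min(model, key=lambda case: abs(case[0] - x))
    return nearest[1]

def accuracy(model, test_data):
    """Fraction of test cases whose outcome the model predicts correctly."""
    correct = sum(1 for x, label in test_data if predict(model, x) == label)
    return correct / len(test_data)

# Hypothetical past cases: (hours studied, outcome)
training_data = [(1, "fail"), (2, "fail"), (3, "fail"),
                 (6, "pass"), (7, "pass"), (8, "pass")]
# Unseen cases kept aside for testing
test_data = [(2.5, "fail"), (6.5, "pass"), (9, "pass")]

model = train(training_data)
print(accuracy(model, test_data))
```

The training data builds the model, and the separate test data gives the score used to judge it.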

Exp 4
Preprocessing step
Data cleaning :- handle missing values, remove unwanted parts, remove noise
1. Missing values:-
 • Fill in the value manually
 • Remove the attribute
 • Use a global constant (e.g. nil, null)
2. Noise :- unwanted data, handled with smoothing techniques (binning)
 • Sort the data
 • Divide the sorted data into bins
 • Apply a smoothing technique
   • Smoothing by bin means
   • Smoothing by bin medians
   • Smoothing by bin boundaries
3. Regression
4. Clustering
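The cleaning steps listed above can be sketched in a few lines. The example fills missing values with a global constant, then sorts the data, divides it into equal-size bins and smooths each bin by its mean, median or boundaries. The function names, the "unknown" constant, the bin size of 3 and the sample prices are all assumptions chosen for illustration.

```python
from statistics import mean, median

def fill_missing(values, constant="unknown"):
    """Replace missing entries (None) with a global constant."""
    return [constant if v is None else v for v in values]

def make_bins(values, bin_size):
    """Sort the data, then divide it into bins of (at most) bin_size values."""
    ordered = sorted(values)
    return [ordered[i:i + bin_size] for i in range(0, len(ordered), bin_size)]

def smooth_by_mean(bins):
    """Replace every value in a bin with the bin's mean."""
    return [[mean(b)] * len(b) for b in bins]

def smooth_by_median(bins):
    """Replace every value in a bin with the bin's median."""
    return [[median(b)] * len(b) for b in bins]

def smooth_by_boundaries(bins):
    """Replace every value with the closer of the bin's minimum and maximum.
    Ties go to the lower boundary (a choice made for this sketch)."""
    return [[b[0] if v - b[0] <= b[-1] - v else b[-1] for v in b] for b in bins]

cities = fill_missing(["Pune", None, "Delhi"])

prices = [21, 4, 28, 15, 8, 34, 21, 25, 24]   # noisy sample values
bins = make_bins(prices, bin_size=3)
print(bins)
print(smooth_by_mean(bins))
print(smooth_by_median(bins))
print(smooth_by_boundaries(bins))
```

Smoothing by means or medians flattens each bin to a single value, while smoothing by boundaries keeps the two extreme values of each bin and pulls the others toward them.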
