
Binarize Data Using Python Scikit-Learn
Binarization is a preprocessing technique used to convert data into binary values (0 and 1). Scikit-learn provides the sklearn.preprocessing.binarize() function for this purpose.
The binarize() function has a threshold parameter: feature values less than or equal to the threshold are replaced by 0, and values above it are replaced by 1.
In this tutorial, we will learn to binarize data and sparse matrices using Scikit-learn (Sklearn) in Python.
Example
Let's see an example in which we convert a small two-dimensional array into binary numbers using the Binarizer class from the same preprocessing module:
# Importing the necessary packages
import numpy as np
from sklearn import preprocessing

X = np.array([[0.4, -1.8, 2.9],
              [2.5, 0.9, 0.3],
              [0., 1., -1.5],
              [0.1, 2.9, 5.9]])

# Values above 0.5 become 1, all others become 0
Binarized_Data = preprocessing.Binarizer(threshold=0.5).transform(X)
print("\nThe Binarized data is:\n", Binarized_Data)
Output
It will produce the following output:
The Binarized data is:
 [[0. 0. 1.]
 [1. 1. 0.]
 [0. 1. 0.]
 [0. 1. 1.]]
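The same result can also be obtained with the binarize() function itself, without creating a Binarizer object. The following is a minimal sketch, assuming the same data as above; the variable names from_function, from_class and edge_case are illustrative. It also passes a value exactly equal to the threshold to show that such values map to 0.

```python
import numpy as np
from sklearn.preprocessing import Binarizer, binarize

X = np.array([[0.4, -1.8, 2.9],
              [2.5, 0.9, 0.3],
              [0.0, 1.0, -1.5],
              [0.1, 2.9, 5.9]])

# Function form: no transformer object is needed
from_function = binarize(X, threshold=0.5)

# Class form: useful when the step is part of a scikit-learn pipeline
from_class = Binarizer(threshold=0.5).transform(X)

# Both forms give the same array
print(np.array_equal(from_function, from_class))

# A value equal to the threshold (0.5) becomes 0; only larger values become 1
edge_case = binarize(np.array([[0.5, 0.6]]), threshold=0.5)
print(edge_case)
```

By default binarize() works on a copy, so X itself is left unchanged.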
How to Binarize Sparse Matrices?
A sparse matrix consists mostly of zero values, in contrast to a so-called dense matrix, which consists mostly of non-zero values. Sparse matrices are special because, to save memory, the zeros are not stored.
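To make the storage difference concrete, here is a small sketch that builds a SciPy COO matrix from a mostly-zero array and compares the number of stored entries with the total number of entries. The array values are made up purely for illustration.

```python
import numpy as np
from scipy.sparse import coo_matrix

# A dense array that is mostly zeros (illustrative values)
dense = np.array([[0, 0, 3, 0, 0],
                  [0, 0, 0, 0, 0],
                  [7, 0, 0, 0, 1]])

sparse = coo_matrix(dense)

print(dense.size)   # total number of entries in the dense array
print(sparse.nnz)   # number of entries actually stored in the sparse matrix

# Converting back recovers the original array, zeros included
print(np.array_equal(sparse.toarray(), dense))
```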
We can use Scikit-learn's preprocessing.binarize() function to binarize sparse matrices, on the condition that the threshold value is not less than zero.
Example 1
Let's see an example to understand it:
# Import necessary libraries
import numpy as np
from scipy.sparse import coo_matrix

# Create sparse matrix
sparse_matrix = coo_matrix(np.random.binomial(1, .25, 50))

# Import sklearn preprocessing module
from sklearn import preprocessing

# A negative threshold is not allowed for sparse input
sparse_binarized = preprocessing.binarize(sparse_matrix, threshold=-1)
Output
It will raise an error stating that the threshold cannot be less than 0:
ValueError: Cannot binarize a sparse matrix with threshold < 0
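A plausible reason for this restriction is that, with a negative threshold, every unstored zero would be greater than the threshold and would have to become 1, so the result could no longer be sparse. If a negative threshold is really needed, one option is to convert the matrix to a dense array first. The sketch below assumes the matrix is small enough to fit in memory as a dense array; the fixed seed is used only to make the run reproducible.

```python
import numpy as np
from scipy.sparse import coo_matrix
from sklearn.preprocessing import binarize

rng = np.random.default_rng(0)  # fixed seed, chosen only for reproducibility
sparse_matrix = coo_matrix(rng.binomial(1, 0.25, 50))

# Sparse input with a negative threshold raises ValueError
try:
    binarize(sparse_matrix, threshold=-1)
except ValueError as err:
    print("Sparse input failed:", err)

# Assumed workaround: convert to a dense array before binarizing
dense_binarized = binarize(sparse_matrix.toarray(), threshold=-1)

# Every entry (0 or 1) is greater than -1, so all entries become 1
print(dense_binarized)
```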
Example 2
Let's see the same example with a threshold value greater than zero:
# Import necessary libraries
import numpy as np
from scipy.sparse import coo_matrix

# Create sparse matrix
sparse_matrix = coo_matrix(np.random.binomial(1, .25, 50))

# Import sklearn preprocessing module
from sklearn import preprocessing

# Stored values above 0.8 become 1
sparse_binarized = preprocessing.binarize(sparse_matrix, threshold=0.8)
print(sparse_binarized)
Output
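Because the matrix is generated randomly, the positions of the ones differ from run to run. The output will look similar to the following:
  (0, 5)	1
  (0, 6)	1
  (0, 9)	1
  (0, 15)	1
  (0, 25)	1
  (0, 27)	1
  (0, 29)	1
  (0, 30)	1
  (0, 31)	1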
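As a quick sanity check, the sparse result can be compared with the result of binarizing the equivalent dense array. This is a minimal sketch with a fixed seed (an assumption added here for reproducibility, not part of the example above).

```python
import numpy as np
from scipy.sparse import coo_matrix, issparse
from sklearn.preprocessing import binarize

rng = np.random.default_rng(42)  # fixed seed so the run is repeatable
values = rng.binomial(1, 0.25, 50)
sparse_matrix = coo_matrix(values)

sparse_binarized = binarize(sparse_matrix, threshold=0.8)

# The result is still a sparse matrix
print(issparse(sparse_binarized))

# Binarizing the same data as a dense 1 x 50 array gives identical values
dense_binarized = binarize(values.reshape(1, -1), threshold=0.8)
print(np.array_equal(sparse_binarized.toarray(), dense_binarized))
```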