Python Application Development Using Imbalanced-Learn

The document discusses the Python package imbalanced-learn, which provides resampling techniques for addressing class imbalance in datasets. It describes how class imbalance can negatively impact machine learning algorithms and how resampling the data can help create a more robust model. The document outlines the different types of resampling techniques provided by imbalanced-learn, including oversampling, undersampling, and ensemble methods. It also provides installation instructions and an example of using the ClusterCentroids resampling algorithm.

Uploaded by

enghoss77

Boostlog

JUNE 25, 2018

Python Application Development Using Imbalanced-learn
python development imbalanced learn

Bily809
Introduction

Imbalanced-learn is a Python package offering a number of re-sampling
techniques commonly used on datasets showing strong between-class
imbalance. It is compatible with scikit-learn and is part of the
scikit-learn-contrib projects. Some of its applications are in:

Bioinformatics
Medical imaging: diseases versus healthy
Social sciences: prediction of academic dropout
Web services: Service Level Agreement violation prediction
Security services: fraud detection

Most classification algorithms only perform optimally when the number of
samples in each class is roughly the same. Highly skewed datasets, where the
minority class is heavily outnumbered by one or more other classes, have proven
to be a challenge while at the same time becoming more and more common. One way
of addressing this issue is to re-sample the dataset so as to offset the
imbalance, in the hope of arriving at a more robust and fair decision boundary
than you would otherwise.
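
To make the problem concrete, here is a minimal sketch in plain Python. The label counts are hypothetical (they are not taken from this article), and the "classifier" is a deliberately trivial one that always predicts the majority class. It shows how a high overall accuracy can hide the fact that the minority classes are never predicted.

```python
from collections import Counter

# Hypothetical label counts for a skewed three-class problem (illustration only).
labels = [0] * 60 + [1] * 260 + [2] * 4680

counts = Counter(labels)
majority_class, majority_count = counts.most_common(1)[0]

# A "classifier" that ignores its input and always predicts the majority class.
predictions = [majority_class] * len(labels)

# Overall accuracy: fraction of all samples predicted correctly.
accuracy = sum(p == t for p, t in zip(predictions, labels)) / len(labels)

# Per-class recall: fraction of each class's samples predicted correctly.
recall = {
    cls: sum(p == t for p, t in zip(predictions, labels) if t == cls) / n
    for cls, n in counts.items()
}

print(f"accuracy: {accuracy:.3f}")   # accuracy: 0.936
print(f"recall per class: {recall}")  # recall per class: {0: 0.0, 1: 0.0, 2: 1.0}
```

An accuracy of 0.936 looks good, yet the two minority classes have a recall of zero, which is the kind of skew that re-sampling tries to counter.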

Re-sampling techniques are divided into four categories:

1. Under-sampling the majority class(es).
2. Over-sampling the minority class.
3. Combining over- and under-sampling.
4. Creating balanced ensemble sets.
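
The first two categories can be sketched in a few lines of plain Python. This is an illustration of the idea only, not the imbalanced-learn implementation: the helper names `group_by_class`, `random_under_sample` and `random_over_sample` are made up for this example, the toy data is invented, and the under-sampler here draws without replacement.

```python
import random
from collections import Counter, defaultdict


def group_by_class(X, y):
    """Collect the samples of each class label into its own list."""
    groups = defaultdict(list)
    for sample, label in zip(X, y):
        groups[label].append(sample)
    return groups


def random_under_sample(X, y, seed=0):
    """Shrink every class to the size of the smallest one (no replacement)."""
    rng = random.Random(seed)
    groups = group_by_class(X, y)
    target = min(len(samples) for samples in groups.values())
    X_out, y_out = [], []
    for label, samples in groups.items():
        for sample in rng.sample(samples, target):
            X_out.append(sample)
            y_out.append(label)
    return X_out, y_out


def random_over_sample(X, y, seed=0):
    """Grow every class to the size of the largest one by duplicating samples."""
    rng = random.Random(seed)
    groups = group_by_class(X, y)
    target = max(len(samples) for samples in groups.values())
    X_out, y_out = [], []
    for label, samples in groups.items():
        extra = [rng.choice(samples) for _ in range(target - len(samples))]
        for sample in samples + extra:
            X_out.append(sample)
            y_out.append(label)
    return X_out, y_out


# Toy data (made up): 90 samples of class "a" and 10 of class "b".
X = [[float(i)] for i in range(100)]
y = ["a"] * 90 + ["b"] * 10

_, y_under = random_under_sample(X, y)
_, y_over = random_over_sample(X, y)
print(sorted(Counter(y_under).items()))  # [('a', 10), ('b', 10)]
print(sorted(Counter(y_over).items()))   # [('a', 90), ('b', 90)]
```

Under-sampling throws information away, while over-sampling by duplication adds no new information; the more elaborate methods in the toolbox try to soften one or the other drawback.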

imbalanced-learn is an open-source Python toolbox that aims to provide a wide
range of methods to cope with the problem of imbalanced datasets frequently
encountered in machine learning and pattern recognition. The implemented
state-of-the-art methods can be categorized into 4 groups:

(i) under-sampling,
(ii) over-sampling,
(iii) combination of over- and under-sampling, and
(iv) ensemble learning methods.

Under-sampling

i. Random majority under-sampling with replacement
ii. Extraction of majority-minority Tomek links
iii. Under-sampling with Cluster Centroids
iv. NearMiss-(1 & 2 & 3)
v. Condensed Nearest Neighbour
vi. One-Sided Selection
vii. Neighbourhood Cleaning Rule
viii. Edited Nearest Neighbours
ix. Instance Hardness Threshold
x. Repeated Edited Nearest Neighbours
xi. AllKNN
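
As one illustration from this list, a Tomek link is a pair of samples from different classes that are each other's nearest neighbour; removing the majority-class member of each link cleans up the class border. The sketch below is a brute-force, plain-Python rendering of that definition, not the library's code. The function names and the toy data are assumptions made for this example, and `math.dist` needs Python 3.8 or later.

```python
import math


def nearest_neighbour(index, X):
    """Index of the sample closest to X[index] (Euclidean), excluding itself."""
    return min(
        (j for j in range(len(X)) if j != index),
        key=lambda j: math.dist(X[index], X[j]),
    )


def tomek_links(X, y):
    """Pairs (i, j), i < j, that are mutual nearest neighbours with different labels."""
    nn = [nearest_neighbour(i, X) for i in range(len(X))]
    return [
        (i, j)
        for i, j in enumerate(nn)
        if i < j and nn[j] == i and y[i] != y[j]
    ]


def remove_majority_in_links(X, y, majority_label):
    """Drop the majority-class member of every Tomek link."""
    drop = {k for pair in tomek_links(X, y) for k in pair if y[k] == majority_label}
    keep = [k for k in range(len(X)) if k not in drop]
    return [X[k] for k in keep], [y[k] for k in keep]


# Toy 1-D data (made up): the majority point at 2.0 sits next to the minority point at 2.1.
X = [[0.0], [0.5], [1.0], [2.0], [2.1], [4.0]]
y = [0, 0, 0, 0, 1, 1]

print(tomek_links(X, y))  # [(3, 4)]
X_clean, y_clean = remove_majority_in_links(X, y, majority_label=0)
print(y_clean)            # [0, 0, 0, 1, 1]
```

The pairwise search is quadratic in the number of samples, which is fine for a toy example but is one reason to rely on the packaged implementation for real data.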

Over-sampling

xii. Random minority over-sampling with replacement
xiii. SMOTE - Synthetic Minority Over-sampling Technique
xiv. bSMOTE(1 & 2) - Borderline SMOTE of types 1 and 2
xv. SVM SMOTE - Support Vectors SMOTE
xvi. ADASYN - Adaptive synthetic sampling approach for imbalanced learning
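
The core idea behind SMOTE is to create synthetic minority samples by interpolating between a minority sample and one of its nearest minority neighbours. The following is a simplified sketch of that interpolation step only, written in plain Python under assumed parameter names (`n_new`, `k`, `seed`); it is not the imbalanced-learn implementation and omits the variants listed above.

```python
import math
import random


def smote_sketch(minority, n_new, k=3, seed=0):
    """Create n_new synthetic points by interpolating between minority neighbours."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # The k nearest minority neighbours of the base point (excluding itself).
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: math.dist(base, p),
        )[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> neighbour, in [0, 1)
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


# Toy minority class (made up): the four corners of the unit square.
minority = [[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0]]
new_points = smote_sketch(minority, n_new=5)
print(len(new_points))  # 5
```

Because every synthetic point lies on a segment between two existing minority samples, the new points stay inside the region the minority class already occupies instead of being exact duplicates.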

Over-sampling followed by under-sampling

xvii. SMOTE + Tomek links
xviii. SMOTE + ENN

Ensemble sampling

xix. EasyEnsemble
xx. BalanceCascade
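
Ensemble sampling in the EasyEnsemble style can be pictured as drawing several independent balanced subsets, each of which would then be used to train one member of an ensemble. The sketch below only produces the subsets and leaves the training out; the function name `balanced_subsets`, its parameters and the toy data are assumptions for illustration, not the library's API.

```python
import random
from collections import Counter


def balanced_subsets(X, y, n_subsets=3, seed=0):
    """Yield n_subsets balanced (X, y) pairs, each class cut to the smallest class size."""
    rng = random.Random(seed)
    by_class = {}
    for sample, label in zip(X, y):
        by_class.setdefault(label, []).append(sample)
    smallest = min(len(samples) for samples in by_class.values())
    for _ in range(n_subsets):
        X_sub, y_sub = [], []
        for label, samples in by_class.items():
            # A fresh random draw per subset, so each subset sees different majority samples.
            for sample in rng.sample(samples, smallest):
                X_sub.append(sample)
                y_sub.append(label)
        yield X_sub, y_sub


# Toy data (made up): 95 samples of class 0 and 5 of class 1.
X = [[float(i)] for i in range(100)]
y = [0] * 95 + [1] * 5

subsets = list(balanced_subsets(X, y))
for _, y_sub in subsets:
    print(sorted(Counter(y_sub).items()))  # [(0, 5), (1, 5)] for each subset
```

Taken together, the subsets cover far more of the majority class than a single under-sampled set would, which is the motivation for combining them in an ensemble.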

The different algorithms are presented in the sphinx-gallery.

The toolbox only depends on numpy, scipy, and scikit-learn and is distributed
under the MIT license. Furthermore, it is fully compatible with scikit-learn and
is part of the scikit-learn-contrib supported projects.

Installation

imbalanced-learn is tested to work under Python 2.7, Python 3.5 and 3.6. The
dependency requirements are based on the last scikit-learn release:

scipy (>=0.13.3)
numpy (>=1.8.2)
scikit-learn (>=0.19.0)

imbalanced-learn is currently available on the PyPI repository and you can
install it via pip:

pip install -U imbalanced-learn
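
After installing, a quick way to confirm that the package is visible to your interpreter is to query its installed version. This is a small sketch using the standard library's `importlib.metadata` (available from Python 3.8, so it does not apply to the older interpreters listed above); the `installed` variable is only there for illustration. Note that the distribution is named imbalanced-learn on PyPI, while the module you import is `imblearn`.

```python
from importlib import metadata

try:
    # Looks up the installed distribution by its PyPI name without importing it.
    version = metadata.version("imbalanced-learn")
    installed = True
    print("imbalanced-learn version:", version)
except metadata.PackageNotFoundError:
    installed = False
    print("imbalanced-learn is not installed; run: pip install -U imbalanced-learn")
```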

Example

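The example here builds an imbalanced three-class dataset and under-samples it
with ClusterCentroids, so that every class ends up with as many samples as the
smallest one.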

>>> from collections import Counter
>>> from sklearn.datasets import make_classification
>>> # Build a 3-class dataset in which class 2 holds about 94% of the samples.
>>> X, y = make_classification(n_samples=5000, n_features=2, n_informative=2,
...                            n_redundant=0, n_repeated=0, n_classes=3,
...                            n_clusters_per_class=1,
...                            weights=[0.01, 0.05, 0.94],
...                            class_sep=0.8, random_state=0)
>>> print(sorted(Counter(y).items()))
[(0, 64), (1, 262), (2, 4674)]
>>> from imblearn.under_sampling import ClusterCentroids
>>> # Replace the samples of each larger class by K-means cluster centroids.
>>> cc = ClusterCentroids(random_state=0)
>>> # Note: fit_sample was renamed fit_resample in imbalanced-learn 0.4.
>>> X_resampled, y_resampled = cc.fit_sample(X, y)
>>> print(sorted(Counter(y_resampled).items()))
[(0, 64), (1, 64), (2, 64)]