
Data Augmentation Techniques in Python

The document outlines various data augmentation techniques across different domains including images, audio, natural language processing, and time series data. It highlights common methods such as mirroring, random cropping, and advanced transformations like neural-based augmentations. Additionally, it mentions several Python libraries designed for efficient data augmentation, such as albumentations, imgaug, and nlpaug.
Copyright © All Rights Reserved

Transfer Learning

Data Augmentation
Common augmentation methods
• Mirroring
• Random cropping
• Color shifting (e.g., adding -20, +20, +20 to the R, G, B channels)
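The three common methods above can be sketched in plain Python. This is a minimal illustration rather than a library implementation. It assumes an RGB image stored as a list of rows of (r, g, b) tuples with 0..255 values, and the function names are hypothetical. Real projects would typically use NumPy arrays or one of the libraries described later.

```python
import random

# Assumption: an RGB image is a list of rows, each row a list of (r, g, b)
# tuples with integer channel values in 0..255.
Pixel = tuple[int, int, int]
Image = list[list[Pixel]]


def mirror(img: Image) -> Image:
    """Horizontal mirroring: reverse the pixel order within every row."""
    return [row[::-1] for row in img]


def random_crop(img: Image, height: int, width: int, rng: random.Random) -> Image:
    """Cut a height x width window out of img at a uniformly random position."""
    top = rng.randint(0, len(img) - height)
    left = rng.randint(0, len(img[0]) - width)
    return [row[left:left + width] for row in img[top:top + height]]


def _clip(v: int) -> int:
    """Keep a channel value inside the valid 0..255 range."""
    return min(255, max(0, v))


def color_shift(img: Image, shifts: tuple[int, int, int]) -> Image:
    """Add a per-channel offset such as (-20, +20, +20) to every pixel."""
    dr, dg, db = shifts
    return [[(_clip(r + dr), _clip(g + dg), _clip(b + db)) for (r, g, b) in row]
            for row in img]
```

Clipping after the shift is one design choice. Wrapping values around or renormalising would change the colours in less predictable ways.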
Image augmentation
• Affine transformations
o Rotation
o Scaling
o Random cropping
o Reflection
• Pixel-level (photometric) transformations
o Contrast shift
o Brightness shift
o Blurring
o Channel shuffle
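One affine transformation and two of the pixel-level changes listed above can be sketched as follows. These are illustrative helpers under the following assumptions:

- Grayscale images are lists of rows of 0..255 integers.
- RGB images are lists of rows of (r, g, b) tuples.
- The brightness/contrast formula `alpha * p + beta` is one simple choice among several.

```python
import random

Gray = list[list[int]]
RGB = list[list[tuple[int, int, int]]]


def rotate90(img: list[list]) -> list[list]:
    """Affine example: rotate a grid of pixels 90 degrees clockwise."""
    return [list(row) for row in zip(*img[::-1])]


def brightness_contrast(img: Gray, alpha: float, beta: float) -> Gray:
    """Contrast gain alpha and brightness offset beta, clipped to 0..255."""
    return [[min(255, max(0, round(alpha * p + beta))) for p in row] for row in img]


def channel_shuffle(img: RGB, rng: random.Random) -> RGB:
    """Apply one random permutation of the R, G, B channels to every pixel."""
    order = [0, 1, 2]
    rng.shuffle(order)
    i, j, k = order
    return [[(px[i], px[j], px[k]) for px in row] for row in img]
```

The same channel permutation is used for the whole image, so the result is a consistent recolouring rather than per-pixel noise.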
Image augmentation
• Advanced transformations
– Random erasing
– Adding weather effects such as rain and sun flare
– Image blending
• Neural-based transformations
– Adversarial noise
– Neural Style Transfer
– Generative Adversarial Networks
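Random erasing and image blending are simple enough to sketch directly. This is a simplified variant under stated assumptions:

- Single-channel images are lists of float rows.
- Erased pixels get a constant fill value.
- The rectangle's sides are drawn independently, up to a fraction of the image size. The published Random Erasing method samples area and aspect ratio instead.

`blend` shows only the pixel side of mixup-style blending. Mixup also mixes the labels with the same weight.

```python
import random

Grid = list[list[float]]


def random_erase(img: Grid, rng: random.Random, max_frac: float = 0.3,
                 fill: float = 0.0) -> Grid:
    """Overwrite one randomly placed rectangle with a constant fill value."""
    h, w = len(img), len(img[0])
    eh = rng.randint(1, max(1, int(h * max_frac)))  # erased height
    ew = rng.randint(1, max(1, int(w * max_frac)))  # erased width
    top = rng.randint(0, h - eh)
    left = rng.randint(0, w - ew)
    out = [row[:] for row in img]  # copy so the input image is not modified
    for r in range(top, top + eh):
        for c in range(left, left + ew):
            out[r][c] = fill
    return out


def blend(a: Grid, b: Grid, lam: float) -> Grid:
    """Pixel-wise convex combination lam * a + (1 - lam) * b of two images."""
    return [[lam * x + (1 - lam) * y for x, y in zip(ra, rb)]
            for ra, rb in zip(a, b)]
```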
Audio augmentation
• Noise injection
• Time shift
• Time stretching
• Random cropping
• Pitch scaling
• Dynamic range compression
• Simple gain
• Equalization
• Voice conversion (Speech)
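Three of the listed audio methods (noise injection, time shift, and simple gain) reduce to a few lines each. This is a sketch assuming mono audio as a list of float samples. The time shift here is circular, so samples that fall off one end wrap around to the other. Zero-padding is an equally valid alternative.

```python
import math
import random

Signal = list[float]  # assumption: mono samples, nominally in [-1.0, 1.0]


def add_noise(x: Signal, noise_std: float, rng: random.Random) -> Signal:
    """Noise injection: add zero-mean Gaussian noise to every sample."""
    return [s + rng.gauss(0.0, noise_std) for s in x]


def time_shift(x: Signal, shift: int) -> Signal:
    """Circular time shift by `shift` samples (positive moves audio later)."""
    shift %= len(x)
    return x[-shift:] + x[:-shift] if shift else list(x)


def gain_db(x: Signal, db: float) -> Signal:
    """Simple gain: scale amplitude by `db` decibels (factor 10 ** (db / 20))."""
    factor = 10 ** (db / 20)
    return [s * factor for s in x]


# Usage: one second of a 440 Hz tone at an assumed 16 kHz sample rate.
sr = 16_000
tone = [0.5 * math.sin(2 * math.pi * 440 * n / sr) for n in range(sr)]
augmented = gain_db(time_shift(add_noise(tone, 0.01, random.Random(0)), sr // 10), -6.0)
```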
Natural Language Processing augmentation

• Thesaurus
• Text Generation
• Back Translation
• Word Embeddings
• Contextualized Word Embeddings
• Paraphrasing
• Text perturbation
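Thesaurus-based replacement and a simple text perturbation (random word swaps) can be sketched as follows. Both functions assume whitespace tokenisation. `THESAURUS` is a hypothetical toy dictionary; a real system would draw synonyms from a lexical resource such as WordNet.

```python
import random

# Hypothetical toy thesaurus; a real pipeline would use a lexical resource
# such as WordNet instead.
THESAURUS: dict[str, list[str]] = {
    "quick": ["fast", "speedy"],
    "happy": ["glad", "cheerful"],
    "big": ["large", "huge"],
}


def synonym_replace(text: str, n: int, rng: random.Random) -> str:
    """Thesaurus augmentation: replace up to n known words with a synonym."""
    words = text.split()
    candidates = [i for i, w in enumerate(words) if w.lower() in THESAURUS]
    for i in rng.sample(candidates, min(n, len(candidates))):
        words[i] = rng.choice(THESAURUS[words[i].lower()])
    return " ".join(words)


def random_swap(text: str, n: int, rng: random.Random) -> str:
    """Text perturbation: swap the positions of two random words, n times."""
    words = text.split()
    for _ in range(n):
        if len(words) < 2:
            break
        i, j = rng.sample(range(len(words)), 2)
        words[i], words[j] = words[j], words[i]
    return " ".join(words)
```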
Time Series Data Augmentation
• Basic approaches
– Warping
– Jittering
– Perturbing
• Advanced approaches
– Embedding space
– GAN/Adversarial
– RL/Meta-Learning
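Two of the basic approaches can be sketched without any library. Jittering adds small noise to each time step. Window warping is one concrete form of warping: it stretches or squeezes a random window and then resamples the series back to its original length. The helper names are hypothetical, and linear interpolation is an assumed resampling choice.

```python
import random

Series = list[float]


def jitter(x: Series, sigma: float, rng: random.Random) -> Series:
    """Jittering: add independent Gaussian noise to each time step."""
    return [v + rng.gauss(0.0, sigma) for v in x]


def resample(x: Series, n: int) -> Series:
    """Linearly interpolate x onto n evenly spaced points (endpoints kept)."""
    if n == 1:
        return [x[0]]
    m = len(x)
    out = []
    for k in range(n):
        t = k * (m - 1) / (n - 1)  # fractional position in x
        i = int(t)
        if i >= m - 1:
            out.append(x[-1])
        else:
            frac = t - i
            out.append(x[i] * (1 - frac) + x[i + 1] * frac)
    return out


def window_warp(x: Series, window: int, scale: float, rng: random.Random) -> Series:
    """Stretch (scale > 1) or squeeze (scale < 1) one random window,
    then resample the whole series back to its original length."""
    start = rng.randint(0, len(x) - window)
    warped = resample(x[start:start + window], max(2, round(window * scale)))
    return resample(x[:start] + warped + x[start + window:], len(x))
```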
Image Augmentation
albumentations is a Python library with a large and diverse set of data augmentation methods. It offers over 30 different types of augmentations that are ready to use. Moreover, according to the authors' benchmarks, the library is faster than other libraries on most transformations.
Image Augmentation
imgaug is another widely used Python library. In its authors' words, it "helps you with augmenting images for your machine learning projects. It converts a set of input images into a new, much larger set of slightly altered images." It offers many augmentation techniques, such as affine transformations, perspective transformations, contrast changes, Gaussian noise, dropout of regions, hue/saturation changes, cropping/padding, and blurring.
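The idea of turning a set of inputs into a larger set of slightly altered ones can be illustrated without the library. `expand_dataset` is a hypothetical helper, not imgaug's API. `flip_or_keep` stands in for any real augmenter.

```python
import random
from typing import Callable, TypeVar

T = TypeVar("T")


def expand_dataset(samples: list[T], augment: Callable[[T, random.Random], T],
                   copies: int, rng: random.Random) -> list[T]:
    """Keep every original and append `copies` augmented variants of each one."""
    out = list(samples)
    for s in samples:
        out.extend(augment(s, rng) for _ in range(copies))
    return out


def flip_or_keep(img: list[list[int]], rng: random.Random) -> list[list[int]]:
    """Illustrative augmenter: horizontal flip with probability 0.5."""
    if rng.random() < 0.5:
        return [row[::-1] for row in img]
    return [row[:] for row in img]
```

With `copies=4`, a dataset of N images grows to 5N. The originals stay first, and each image's variants follow in order.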
Image Augmentation
Kornia is a differentiable computer vision library for PyTorch. It consists of a set of routines and differentiable modules for solving generic computer vision problems. At its core, the package uses PyTorch as its main backend, both for efficiency and to take advantage of reverse-mode automatic differentiation to define and compute the gradient of complex functions.
At a granular level, Kornia is a library that consists of the following components:
Natural Language Processing
• nlpaug: this Python library helps you augment NLP data for your machine learning projects. In nlpaug, an Augmenter is the basic element of augmentation, while a Flow is a pipeline that orchestrates multiple augmenters together.
• Features:
o Generates synthetic data to improve model performance without manual effort
o Simple, easy-to-use, lightweight library: augment data in 3 lines of code
o Plugs into any neural network framework (e.g. PyTorch, TensorFlow)
o Supports textual and audio input
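The Augmenter/Flow split described above can be illustrated with a small, self-contained sketch. The class names are hypothetical and are not nlpaug's API. They only show the idea: each augmenter transforms text, and a flow chains several augmenters.

```python
import random
from typing import Protocol


class Augmenter(Protocol):
    def augment(self, text: str) -> str: ...


class RandomWordDeletion:
    """Augmenter: drop each word with probability p, keeping at least one."""

    def __init__(self, p: float, rng: random.Random) -> None:
        self.p, self.rng = p, rng

    def augment(self, text: str) -> str:
        words = text.split()
        if not words:
            return text
        kept = [w for w in words if self.rng.random() >= self.p]
        return " ".join(kept or [self.rng.choice(words)])


class AdjacentCharSwap:
    """Augmenter: swap two neighbouring characters in one random word."""

    def __init__(self, rng: random.Random) -> None:
        self.rng = rng

    def augment(self, text: str) -> str:
        words = text.split()
        idx = [i for i, w in enumerate(words) if len(w) > 1]
        if not idx:
            return text
        i = self.rng.choice(idx)
        w = words[i]
        j = self.rng.randrange(len(w) - 1)
        words[i] = w[:j] + w[j + 1] + w[j] + w[j + 2:]
        return " ".join(words)


class SequentialFlow:
    """Flow: run several augmenters one after another on the same text."""

    def __init__(self, augmenters: list[Augmenter]) -> None:
        self.augmenters = augmenters

    def augment(self, text: str) -> str:
        for aug in self.augmenters:
            text = aug.augment(text)
        return text
```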
Audio
• SpecAugment with PyTorch: SpecAugment is a state-of-the-art data augmentation approach for speech recognition. This implementation supports time warping, time masking, frequency masking, or all of them combined.
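The two masking operations of SpecAugment can be sketched on a spectrogram stored as nested lists. Each mask works the same way: draw a width up to a maximum, pick a random start, and overwrite that band. Time warping is omitted. The fill value of 0.0 and the function names are assumptions, not the PyTorch implementation's API.

```python
import random

Spectrogram = list[list[float]]  # assumption: rows = frequency bins, cols = frames


def frequency_mask(spec: Spectrogram, max_width: int, rng: random.Random,
                   fill: float = 0.0) -> Spectrogram:
    """Mask f consecutive frequency bins, f drawn from 0..max_width
    (max_width must not exceed the number of bins)."""
    f = rng.randint(0, max_width)
    f0 = rng.randint(0, len(spec) - f)
    return [[fill] * len(row) if f0 <= r < f0 + f else row[:]
            for r, row in enumerate(spec)]


def time_mask(spec: Spectrogram, max_width: int, rng: random.Random,
              fill: float = 0.0) -> Spectrogram:
    """Mask t consecutive time frames, t drawn from 0..max_width
    (max_width must not exceed the number of frames)."""
    t = rng.randint(0, max_width)
    t0 = rng.randint(0, len(spec[0]) - t)
    return [[fill if t0 <= c < t0 + t else v for c, v in enumerate(row)]
            for row in spec]
```

Applying `time_mask(frequency_mask(spec, F, rng), T, rng)` gives the combined variant. Both functions return copies and leave the input untouched.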
Audio
• Audiomentations: a Python library for audio data augmentation, inspired by albumentations and intended for machine learning. It provides transforms such as AddGaussianNoise, TimeStretch, PitchShift, and Shift, which can be chained together with Compose.

• MUDA: a library for Musical Data Augmentation. The muda package implements annotation-aware musical data augmentation, as described in the muda paper.
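Annotation-aware augmentation means that whenever the audio is deformed, its annotations are deformed consistently. The sketch below covers only the annotation side and leaves out the audio processing. It assumes annotations are note events with onset, duration, and MIDI pitch. It is not muda's API; `Note` and both functions are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Note:
    onset: float     # seconds
    duration: float  # seconds
    pitch: int       # MIDI note number


def pitch_shift_notes(notes: list[Note], semitones: int) -> list[Note]:
    """Pitch-shifting the audio by n semitones transposes every note by n."""
    return [Note(n.onset, n.duration, n.pitch + semitones) for n in notes]


def time_stretch_notes(notes: list[Note], rate: float) -> list[Note]:
    """Assumed convention: rate > 1 speeds playback up, so times shrink by 1/rate."""
    return [Note(n.onset / rate, n.duration / rate, n.pitch) for n in notes]
```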
Time series
• tsaug: a Python package for time series augmentation. It offers a set of augmentation methods for time series, as well as a simple API to connect multiple augmenters into a pipeline.

• Example augmenters:
o random time warping 5 times in parallel,
o random crop subsequences with length 300,
o random quantize to 10-, 20-, or 30- level sets,
o with 80% probability, randomly drift the signal by up to 10% to 50%,
o with 50% probability, reverse the sequence.
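Three of the example augmenters above can be chained in plain Python to show the pipeline idea: cropping to length 300, quantizing to 10, 20, or 30 levels, and reversing with 50% probability. This is a hypothetical sketch, not tsaug's API. Drift and time warping are left out, and tsaug's own definition of quantization may differ from the one used here.

```python
import random

Series = list[float]


def crop(x: Series, size: int, rng: random.Random) -> Series:
    """Random crop: take a contiguous subsequence of the given length."""
    start = rng.randint(0, len(x) - size)
    return x[start:start + size]


def quantize(x: Series, n_levels: int) -> Series:
    """Snap each value to the nearest of n_levels evenly spaced levels
    between min(x) and max(x) (one plausible definition)."""
    lo, hi = min(x), max(x)
    if hi == lo:
        return list(x)
    step = (hi - lo) / (n_levels - 1)
    return [lo + round((v - lo) / step) * step for v in x]


def reverse(x: Series, p: float, rng: random.Random) -> Series:
    """Reverse the sequence with probability p."""
    return x[::-1] if rng.random() < p else list(x)


def pipeline(x: Series, rng: random.Random) -> Series:
    """Crop to 300 points, quantize to 10/20/30 levels, maybe reverse."""
    x = crop(x, 300, rng)
    x = quantize(x, rng.choice([10, 20, 30]))
    return reverse(x, 0.5, rng)


rng = random.Random(0)
signal = [rng.gauss(0.0, 1.0) for _ in range(1000)]
batch = [pipeline(signal, rng) for _ in range(5)]  # five augmented copies
```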
