Published July 1, 2019 | Version 2019
Dataset Open

Monk Cuper Set (MCS) for benchmarking historical document image binarization

  • 1. University of Groningen

Description

****************************
Monk Cuper Set (MCS): used for document binarization and enhancement

Images were collected from the book Cuper in the Monk system.
Monkweb.nl

The ground truth was labelled by Zhenwei Shi.

If you use this data set, please cite the paper:

DeepOtsu: Document Enhancement and Binarization using Iterative Deep Learning.
Pattern Recognition. https://www.sciencedirect.com/science/article/abs/pii/S0031320319300330

File names:

Cuper-*.png     : the original input image
GT-Cuper-*.png  : the ground truth labelled by Zhenwei Shi
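
The naming convention above can be used to match each input image to its ground truth. The following is a minimal sketch, not part of the dataset: the function name `pair_images` is hypothetical, and it assumes that each ground-truth file sits in the same directory as its input and shares the same suffix after the `GT-` prefix.

```python
from pathlib import Path


def pair_images(root):
    """Return (input, ground_truth) path pairs found under `root`.

    Assumption: the ground truth for `Cuper-X.png` is `GT-Cuper-X.png`
    in the same directory. Inputs without a ground truth are skipped.
    """
    root = Path(root)
    pairs = []
    # "Cuper-*.png" only matches names that start with "Cuper-",
    # so the "GT-Cuper-*.png" files are not picked up as inputs.
    for image in sorted(root.rglob("Cuper-*.png")):
        ground_truth = image.with_name("GT-" + image.name)
        if ground_truth.exists():
            pairs.append((image, ground_truth))
    return pairs
```

Calling `pair_images("MCSset")` on the extracted archive (the directory name here is an assumption) returns a sorted list of path pairs that can be fed to an evaluation loop.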
________________________________________________________________________

The document images were collected by Jetze Touber of the University of Gent in his study:

Jetze Touber (2016).
De actualiteit van de klassieken bij Gisbert Cuper.
Weegblad: Nieuwsblad van de Vereniging De Waag, 8(1), p. 6-7. http://hdl.handle.net/1854/LU-8511610

This 17th-century Cuper-Braun collection in the Monk system comprises a series of European scholarly letters by different writers.
They write in a multitude of languages, switching from Latin to French and interspersing the text with phrases in Greek and Hebrew.

Jetze Touber photographed the scholarly letters from Johannes Braunius and Gisbert Cuper in the archives using his 2014/2015 Apple iPhone. The images contain chromatic aberration and focus variation, on top of the traditional problems with historical manuscripts.

Contains: 31 .png images and their corresponding ground truth ('GT') binarized versions.
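
As an illustration of how the ground-truth images might be used for benchmarking, the sketch below computes a pixel-level F-measure between a binarization result and its ground truth. It is not taken from the dataset or the cited paper: the function name `f_measure` is hypothetical, images are represented as plain nested lists of pixel values, and foreground (ink) is assumed to be the value 0. Check the actual pixel values in the GT files before relying on that assumption.

```python
def f_measure(pred, gt, foreground=0):
    """Pixel-level F-measure of `pred` against `gt`.

    Both arguments are 2-D sequences of pixel values with the same shape.
    Assumption: pixels equal to `foreground` are ink, all others background.
    """
    tp = fp = fn = 0
    for pred_row, gt_row in zip(pred, gt):
        for p, g in zip(pred_row, gt_row):
            p_fg = p == foreground
            g_fg = g == foreground
            if p_fg and g_fg:
                tp += 1  # ink predicted and present in the ground truth
            elif p_fg:
                fp += 1  # ink predicted on background
            elif g_fg:
                fn += 1  # ink missed
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)
```

A result identical to the ground truth scores 1.0, and a result sharing no foreground pixels with it scores 0.0.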

Files

MCSset.zip (78.1 MB)
md5:60c0b29b596d95ead3ba502ba44c3a20
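
The md5 value listed for MCSset.zip can be used to check a download for corruption. The following is a minimal sketch using Python's standard `hashlib` module. The helper name `md5_of_file` and the local path in the usage note are assumptions.

```python
import hashlib

# Checksum as listed on this page for MCSset.zip.
EXPECTED_MD5 = "60c0b29b596d95ead3ba502ba44c3a20"


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hexadecimal MD5 digest of the file at `path`."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # Read in chunks so the whole archive is never held in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

For example, `md5_of_file("MCSset.zip") == EXPECTED_MD5` should evaluate to `True` for an intact download saved under that name in the current directory.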