Monk data set - Miscellaneous script style line-strip images without transcription V0
Description
Miscellaneous line-strip images without transcription, intended e.g. for pretraining models such as autoencoders, GANs or shape-based transformers.
Number: 274164 line-strip images
Monk manuscripts:
GilmanJournal-16pages 19th century US diary, connected cursive
GilmanJournal-177_174
GilmanJournal-vol15
GilmanJournal-vol18
GilmanJournal-vol20
GilmanLetters-Fam-A Idem, with some typewriting
NL-AsdUvA_UBAinv373 Connected cursive, Dutch
NNM001001033 Natuurkundige Commissie/Naturalis - biologists' jungle field notes, Java [Indonesia]
NNM001001034
NNM001001035
NNM001001036
NNM001001037
NNM001001117
NNM001001118
PP_Part01_01 - Prize Papers - with Brill
navis-H2_7823_0001-1094 - Kabinet of the King (KdK, Dutch National Archive) early 20th century, administrative, connected cursive
navis-NL-HaNA_2.02.04_3960
navis-NL-HaNA_2.02.14_7826
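Because line strips vary in width, pretraining an autoencoder or similar model on them usually needs some size normalization first. The following is a minimal sketch of one possible approach, not a pipeline prescribed by this data set: it rescales a grayscale strip to a fixed height and cuts it into fixed-width patches. The function names, the 64-pixel sizes and the white (255) padding value are all assumptions made for illustration, and loading the images from disk is left out because the archive layout is not described here.

```python
import numpy as np


def resize_to_height(strip, target_h):
    """Nearest-neighbour resize of a 2-D grayscale strip to target_h rows,
    keeping the aspect ratio (hypothetical helper, not part of the data set)."""
    h, w = strip.shape
    target_w = max(1, round(w * target_h / h))
    # Map every output row/column back to its nearest source index.
    rows = np.arange(target_h) * h // target_h
    cols = np.arange(target_w) * w // target_w
    return strip[rows][:, cols]


def strip_to_patches(strip, target_h=64, patch_w=64, pad_value=255):
    """Cut a line strip into patches of shape (target_h, patch_w).

    The last patch is padded on the right with pad_value; 255 assumes
    dark ink on a light background, which may not hold for every strip.
    """
    resized = resize_to_height(strip, target_h)
    w = resized.shape[1]
    n_patches = -(-w // patch_w)  # ceiling division
    padded = np.full((target_h, n_patches * patch_w), pad_value,
                     dtype=resized.dtype)
    padded[:, :w] = resized
    # (H, n * patch_w) -> (n, H, patch_w)
    return padded.reshape(target_h, n_patches, patch_w).transpose(1, 0, 2)


# Synthetic stand-in for one line-strip image (100 px high, 730 px wide).
demo_strip = np.zeros((100, 730), dtype=np.uint8)
demo_patches = strip_to_patches(demo_strip)
print(demo_patches.shape)
```

Fixed-size patches are only one option; models that accept variable-width input could use the height-normalized strips directly.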
References
- van der Zant, T., Schomaker, L., Zinger, S., & van Schie, H. (2009). Where are the Search Engines for Handwritten Documents? Interdisciplinary Science Reviews, 34(2-3), 224-235. https://doi.org/10.1179/174327909X441126
- Schomaker, L. (2020). Lifelong Learning for Text Retrieval and Recognition in Historical Handwritten Document Collections. In Series in Machine Perception and Artificial Intelligence (pp. 221-248). World Scientific. https://doi.org/10.1142/9789811203244_0012
- Schomaker, L. (2016). Design considerations for a large-scale image-based text search engine in historical manuscript collections. it - Information Technology, 58(2), 80-88. https://doi.org/10.1515/itit-2015-0049
- Ameryan, M., & Schomaker, L. (2021). A limited-size ensemble of homogeneous CNN/LSTMs for high-performance word classification. Neural Computing and Applications, 33, 8615-8634. https://doi.org/10.1007/s00521-020-05612-0