Abstract
Nemotron-Parse-1.1 is a lightweight OCR and document parsing model with improved capabilities in general OCR, markdown formatting, structured table parsing, and text extraction from images, using an encoder-decoder architecture.
We introduce Nemotron-Parse-1.1, a lightweight document parsing and OCR model that advances the capabilities of its predecessor, Nemoretriever-Parse-1.0. Nemotron-Parse-1.1 delivers improved capabilities across general OCR, markdown formatting, structured table parsing, and text extraction from pictures, charts, and diagrams. It also supports a longer output sequence length for visually dense documents. As with its predecessor, it extracts bounding boxes of text segments, as well as their corresponding semantic classes. Nemotron-Parse-1.1 follows an encoder-decoder architecture with 885M parameters, including a compact 256M-parameter language decoder. It achieves competitive accuracy on public benchmarks, making it a strong lightweight OCR solution. We release the model weights publicly on Hugging Face, as well as an optimized NIM container, along with a subset of the training data as part of the broader Nemotron-VLM-v2 dataset. Additionally, we release Nemotron-Parse-1.1-TC, which operates on a reduced vision token length, offering a 20% speed improvement with minimal quality degradation.
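To put the abstract's figures in proportion, the following is a minimal arithmetic sketch in Python. The helper names are hypothetical and used only for illustration. The abstract does not break down the parameters outside the decoder, so the remainder is treated as a single lump (vision encoder plus any other components). Reading "20% speed improvement" as a 1.2x throughput gain is also an assumption; the abstract does not say how speed was measured.

```python
# Figures quoted in the abstract, in millions of parameters.
TOTAL_PARAMS_M = 885
DECODER_PARAMS_M = 256


def non_decoder_params_m(total_m: int, decoder_m: int) -> int:
    """Parameters left for everything outside the language decoder.

    The abstract does not split this remainder further, so it is
    reported here as one number.
    """
    return total_m - decoder_m


def decoder_share(total_m: int, decoder_m: int) -> float:
    """Fraction of all parameters that sit in the language decoder."""
    return decoder_m / total_m


def relative_latency(speedup_fraction: float) -> float:
    """Per-document time relative to the baseline.

    Assumption: an "X% speed improvement" means throughput is
    multiplied by (1 + X). Other readings are possible.
    """
    return 1.0 / (1.0 + speedup_fraction)


if __name__ == "__main__":
    rest = non_decoder_params_m(TOTAL_PARAMS_M, DECODER_PARAMS_M)
    share = decoder_share(TOTAL_PARAMS_M, DECODER_PARAMS_M)
    print(f"Non-decoder parameters: {rest}M")
    print(f"Decoder share of total: {share:.1%}")
    print(f"TC relative latency (assumed reading): {relative_latency(0.20):.3f}")
```

Under these assumptions, roughly 629M parameters lie outside the decoder, and the TC variant would need about five sixths of the baseline time per document.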
Community
Lightweight 885M-parameter encoder-decoder OCR model for improved document parsing (OCR, markdown, tables) with bounding-box outputs and a faster 1.1-TC variant, released publicly.
This is an automated message from the Librarian Bot. I found the following papers similar to this paper.
The following papers were recommended by the Semantic Scholar API:
- NVIDIA Nemotron Nano V2 VL (2025)
- DeepSeek-OCR: Contexts Optical Compression (2025)
- DocSLM: A Small Vision-Language Model for Long Multimodal Document Understanding (2025)
- HunyuanOCR Technical Report (2025)
- olmOCR 2: Unit Test Rewards for Document OCR (2025)
- MonkeyOCR v1.5 Technical Report: Unlocking Robust Document Parsing for Complex Patterns (2025)
- Hybrid OCR-LLM Framework for Enterprise-Scale Document Information Extraction Under Copy-heavy Task (2025)