NuMarkdown-8B-Thinking is the first reasoning OCR vision-language model (VLM) designed to convert documents into clean Markdown optimized for retrieval-augmented generation (RAG). Built on Qwen 2.5-VL-7B and fine-tuned with synthetic Doc → Reasoning → Markdown examples, it generates thinking tokens before producing the final Markdown to better handle complex layouts and tables. It uses a two-phase training process: supervised fine-tuning (SFT) followed by reinforcement learning (GRPO) with a layout-centric reward for accuracy on challenging documents. The model excels at non-standard layouts and complex table structures, outperforming non-reasoning OCR systems like GPT-4o and OCRFlux, and competing with large closed-source reasoning models like Gemini 2.5. Thinking token usage can range from 20% to 500% of the final answer, depending on task difficulty. NuMarkdown-8B-Thinking is released under the MIT license and supports vLLM and Transformers for deployment.

Features

  • 8.29B parameter reasoning-enabled OCR VLM
  • Converts documents to clean, structured Markdown
  • Generates thinking tokens to plan before output
  • Excels at complex layouts and merged table cells
  • Trained with SFT + RL (GRPO) and layout rewards
  • Outperforms GPT-4o and OCRFlux in markdown tasks
  • MIT license for unrestricted use
  • Supports vLLM and Transformers for inference Preguntar a ChatGPT

Project Samples

Project Activity

See All Activity >

Categories

AI Models

Follow NuMarkdown-8B-Thinking

NuMarkdown-8B-Thinking Web Site

Other Useful Business Software
Figure Markets: OnChain Assets Icon
Figure Markets: OnChain Assets

Figure Markets bridges crypto and real-world assets into one seamless platform.

Instantly unlock liquidity with Crypto-Backed Loans—borrow against BTC or ETH at rates starting from 12.5% APR, with no credit checks, no prepayment penalties, and 3-month terms. Keep your crypto, get your cash.
Learn More
Rate This Project
Login To Rate This Project

User Reviews

Be the first to post a review of NuMarkdown-8B-Thinking!

Additional Project Details

Registered

2025-08-11