Starred repositories
CommonForms — open models to auto-detect PDF form fields
UniTable: Towards a Unified Table Foundation Model
This repository showcases various advanced techniques for Retrieval-Augmented Generation (RAG) systems. RAG systems combine information retrieval with generative models to provide accurate and cont…
📑 PageIndex: Document Index for Vectorless, Reasoning-based RAG
Tensorlake is a Document Ingestion API and a serverless platform for building data processing and orchestration APIs
Building blocks for rapid development of GenAI applications
A community-driven collection of RAG (Retrieval-Augmented Generation) frameworks, projects, and resources. Contribute and explore the evolving RAG ecosystem.
Fair-code workflow automation platform with native AI capabilities. Combine visual building with custom code, self-host or cloud, 400+ integrations.
A realtime serving engine for Data-Intensive Generative AI Applications
Official repository of 'Visual-RFT: Visual Reinforcement Fine-Tuning' & 'Visual-ARFT: Visual Agentic Reinforcement Fine-Tuning'
Running the Mississippi model on CPU
This repo lists relevant papers summarized in our survey paper: A Systematic Survey of Prompt Engineering on Vision-Language Foundation Models.
This repository contains the Hugging Face Agents Course.
Mainly documents knowledge and interview questions relevant to large language model (LLM) algorithm (application) engineers
Ready-to-use OCR with 80+ supported languages and all popular writing scripts, including Latin, Chinese, Arabic, Devanagari, and Cyrillic.
mPLUG-DocOwl: Modularized Multimodal Large Language Model for Document Understanding
[NeurIPS 2024] Needle In A Multimodal Haystack (MM-NIAH): A comprehensive benchmark designed to systematically evaluate the capability of existing MLLMs to comprehend long multimodal documents.
The code used to train and run inference with the ColVision models, e.g. ColPali, ColQwen2, and ColSmol.
Collection of training data management explorations for large language models
LAVIS - A One-stop Library for Language-Vision Intelligence
[ICLR 2025 Spotlight] OmniCorpus: A Unified Multimodal Corpus of 10 Billion-Level Images Interleaved with Text
Monkey (LMM): Image Resolution and Text Label Are Important Things for Large Multi-modal Models (CVPR 2024 Highlight)
pix2tex: Using a ViT to convert images of equations into LaTeX code.
A quick guide to trending instruction fine-tuning datasets
The codebase for our EMNLP24 paper: Multimodal Self-Instruct: Synthetic Abstract Image and Visual Reasoning Instruction Using Language Model