AI Models Directory

Browse and discover AI models from leading companies in the industry.

GPT 5.2 Codex Gen 2

By OpenAI

GPT-5.2-Codex is OpenAI’s frontier agentic coding model based on GPT-5.2, optimized for long-horizon software work such as large refactors and migrations, with better Windows support and state-of-the-art cybersecurity performance in Codex.

New · Coding

Released 1d ago

GPT 5.3 Codex Gen 2

By OpenAI

GPT-5.3-Codex is OpenAI’s most capable agentic coding model, combining GPT-5.2-Codex’s frontier coding performance with GPT-5.2’s reasoning in a single model that runs about 25% faster, tops SWE-Bench Pro and Terminal-Bench, and can carry out long, tool-using workflows on a computer.

New · Coding

Released 1d ago

Claude Opus 4.6 Gen 3

By Anthropic

Claude Opus 4.6 is Anthropic’s newest frontier model, built for complex reasoning, large-scale coding and multi-agent orchestration, with major gains in cybersecurity bug-finding and long-horizon enterprise workflows over Opus 4.5.

New · Multimodal

Released 1d ago

Sarvam Translate Gen 7

By Sarvam AI

Open-weights translation model built on Gemma 3 4B that translates across all 22 official Indian languages, handling long-form and structured documents with cultural nuance.

Text

Released 7mo ago

Music Flamingo Gen 4

By NVIDIA

Music Flamingo is NVIDIA’s large audio-language model for deep music understanding, fine-tuned on the MF-Skills and MF-Think datasets to analyze full songs with theory-aware chain-of-thought reasoning, achieving state-of-the-art benchmark scores.

New · Audio

Released 2mo ago

MiniCPM-o 4.5 Gen 3

By OpenBMB

MiniCPM-o 4.5 is an on-device multimodal LLM (~9B params) that matches Gemini 2.5 Flash on vision and speech, supporting full-duplex live streaming so it can see, listen and speak in real time.

Multimodal

Released 5mo ago

PaperBanana Gen 3

By dwzhu-pku

PaperBanana is an agentic framework that turns raw scientific content into publication-ready methodology diagrams and plots by orchestrating multiple AI agents. It ships with a dedicated benchmark, PaperBananaBench.

New · Multimodal

Released 1d ago

Intern S1 Gen 3

By InternLM

Intern-S1 is a scientific multimodal foundation model built on a 235B-parameter Qwen3 MoE LLM plus a 6B vision encoder, trained on 5T multimodal tokens with over half from scientific domains.

Multimodal

Released 5mo ago

Universal 3 Pro Gen 4

By AssemblyAI

Universal-3 Pro is a promptable speech language model that turns raw audio into highly accurate transcripts, letting you control disfluencies, tags, speaker roles and code-switching through natural-language prompts.

New · Audio

Released 3d ago

PaddleOCR-VL 1.5 Gen 3

By Baidu

Production-ready OCR and document AI toolkit that turns images and PDFs into structured data, with multilingual OCR, layout analysis and VLM-based document parsing.

New · Text

Released 8d ago

MOVA Gen 3

By OpenMOSS

Open-source foundation model that jointly generates video and audio in a single pass, producing tightly synchronized lip movements and environment-aware sound effects.

New · Video

Released 5d ago

LongCat Flash Chat Gen 7

By Meituan

Chat-oriented variant of LongCat-Flash, a 560B-parameter MoE language model with about 27B parameters active per token, tuned as a fast, non-thinking foundation for general and agentic tasks.

Text

Released 5mo ago

LongCat Image Gen 4

By Meituan

6B-parameter bilingual (Chinese-English) text-to-image foundation model focused on photorealism, strong Chinese text rendering and high-quality image editing.

New · Image

Released 10mo ago

LongCat Flash Thinking Gen 7

By Meituan

Large reasoning model built on a 560B-parameter MoE backbone that activates between roughly 18.6B and 31.3B parameters per token, aimed at advanced chain-of-thought, formal and agentic reasoning tasks.

Text

Released 4mo ago

LongCat Flash Omni Gen 3

By Meituan

560B-parameter omni-modal MoE model (about 27B active) for real-time audio-visual interaction, built on LongCat-Flash with added multimodal perception and speech modules.

New · Multimodal

Released 1d ago

LuxTTS Gen 4

By ysharma3501

ZipVoice-based voice-cloning TTS model that generates 48 kHz speech at up to 150x real time and fits in about 1 GB of VRAM for local, high-quality synthesis.

New · Audio

Released 4d ago

Kokoro 82M Gen 4

By Hexgrad

Open-weight 82M-parameter TTS model that delivers high-quality speech at low cost, designed for fast, production-ready deployment across several languages.

Audio

Released 1y ago

SkyReels V3 Gen 3

By Skywork AI

Long-form video extension engine that analyzes scene semantics and motion to extend clips with coherent shots, maintaining strong temporal consistency and cinematic storytelling.

New · Multimodal

Released 8d ago

FASHN VTON v1.5 Gen 4

By FASHN AI

Virtual try-on model that composites a person image and a garment image into a photorealistic result directly in pixel space, without segmentation masks, and supports both model shots and flat-lay product photos.

New · Image

Released 1mo ago

Qwen Code Qoder Gen 2

By Alibaba

Qwen-Coder-Qoder is a reinforced code model based on Qwen-Coder, custom-trained for the Qoder agentic coding platform to improve end-to-end programming performance inside its IDE and CLI workflows.

New · Coding

Released 2d ago

ACE Step v1.5 Gen 4

By StepFun

ACE-Step v1.5 is an open-source, very fast music foundation model that uses a hybrid pipeline of a language model plus a diffusion transformer to turn short prompts into multi-minute songs, running on consumer GPUs with under 4 GB of VRAM.

New · Audio

Released 5d ago
