AI Models Directory
Browse and discover AI models from leading companies in the industry.
- By OpenAI: GPT-5.2-Codex is OpenAI’s frontier agentic coding model based on GPT-5.2, optimized for long-horizon software work, large refactors and migrations, better Windows support and state-of-the-art cybersecurity performance in Codex. (New; Coding; Released 1d ago)
- By OpenAI: GPT-5.3-Codex is OpenAI’s most capable agentic coding model, combining GPT-5.2-Codex’s frontier coding with GPT-5.2’s reasoning in one model that is ~25% faster, tops SWE-Bench Pro and Terminal-Bench, and runs long, tool-using workflows on a computer. (New; Coding; Released 1d ago)
- By Anthropic: Claude Opus 4.6 is Anthropic’s newest frontier model, built for complex reasoning, large-scale coding and multi-agent orchestration, with major gains in cybersecurity bug-finding and long-horizon enterprise workflows over Opus 4.5. (New; Multimodal; Released 1d ago)
- By Sarvam AI: Open-weights translation model built on Gemma 3 4B that translates across all 22 official Indian languages, handling long-form and structured documents with cultural nuance. (Text; Released 7mo ago)
- By NVIDIA: Music Flamingo is NVIDIA’s large audio-language model for deep music understanding, fine-tuned on the MF-Skills and MF-Think datasets to analyze full songs with theory-aware chain-of-thought and state-of-the-art benchmark scores. (New; Audio; Released 2mo ago)
- By OpenBMB: MiniCPM-o 4.5 is an on-device multimodal LLM (~9B params) that matches Gemini 2.5 Flash on vision and speech, supporting full-duplex live streaming so it can see, listen and speak in real time. (Multimodal; Released 5mo ago)
- By dwzhu-pku: PaperBanana is an agentic framework that turns raw scientific content into publication-ready methodology diagrams and plots, orchestrating multiple AI agents plus a dedicated benchmark, PaperBananaBench. (New; Multimodal; Released 1d ago)
- By InternLM: Intern-S1 is a scientific multimodal foundation model built on a 235B-parameter Qwen3 MoE LLM plus a 6B vision encoder, trained on 5T multimodal tokens with over half from scientific domains. (Multimodal; Released 5mo ago)
- By AssemblyAI: Universal-3 Pro is a promptable speech language model that turns raw audio into highly accurate transcripts, letting you control disfluencies, tags, speaker roles and code-switching through natural-language prompts. (New; Audio; Released 3d ago)
- By Baidu: Production-ready OCR and document AI toolkit that turns images and PDFs into structured data, with multilingual OCR, layout analysis and VLM-based document parsing. (New; Text; Released 8d ago)
- By OpenMOSS: Open-source foundation model that jointly generates video and audio in one pass, achieving tightly synchronized lip movements and environment-aware sound effects. (New; Video; Released 5d ago)
- By Meituan: Chat-oriented LongCat-Flash variant, a 560B MoE language model with around 27B parameters active per token, tuned as a fast, non-thinking foundation for general and agentic tasks. (Text; Released 5mo ago)
- By Meituan: 6B-parameter bilingual (Chinese-English) text-to-image foundation model focused on photorealism, strong Chinese text rendering and high-quality image editing. (New; Image; Released 10mo ago)
- By Meituan: Large reasoning model built on a 560B MoE backbone, activating about 18.6B to 31.3B parameters for advanced chain-of-thought, formal and agentic reasoning tasks. (Text; Released 4mo ago)
- By Meituan: 560B-parameter omni-modal MoE model (about 27B active) for real-time audio-visual interaction, built on LongCat-Flash with multimodal perception and speech modules. (New; Multimodal; Released 1d ago)
- By ysharma3501: ZipVoice-based voice-cloning TTS that generates 48 kHz speech at up to 150x real time, fitting in about 1 GB VRAM for local, high-quality synthesis. (New; Audio; Released 4d ago)
- By Hexgrad: Open-weight 82M-parameter TTS model that delivers high-quality speech at low cost, designed for fast, production-ready deployment across several languages. (Audio; Released 1y ago)
- By Skywork AI: Long-form video extension engine that analyzes scene semantics and motion to extend clips with coherent shots, maintaining strong temporal consistency and cinematic storytelling. (New; Multimodal; Released 8d ago)
- By FASHN AI: Virtual try-on model that composes a person and garment image into a photorealistic result in pixel space, without segmentation masks, supporting model shots and flat-lay product photos. (New; Image; Released 1mo ago)
- By Alibaba: Qwen-Coder-Qoder is a reinforced code model based on Qwen-Coder, custom-trained for the Qoder agentic coding platform to improve end-to-end programming performance inside its IDE and CLI workflows. (New; Coding; Released 2d ago)
- By StepFun: ACE-STEP v1.5 is an open-source, very fast music foundation model that uses a hybrid language model plus diffusion transformer pipeline to turn short prompts into multi-minute songs, running on consumer GPUs with under 4 GB VRAM. (New; Audio; Released 5d ago)
...
