Generate short videos with one click using an AI LLM
Generate blog articles from video or audio
GPT-4V-level open-source multimodal model based on Llama3-8B
Text- and image-to-video generation: CogVideoX (2024) and CogVideo
Qwen3-Omni is a natively end-to-end, omni-modal LLM capable of understanding text, audio, vision, and video
Lightweight Python library for adding real-time multi-object tracking
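This line appears to describe the norfair tracking library; below is a minimal, hypothetical sketch of feeding detections from an external detector into its Tracker, assuming the norfair Python package and synthetic (x, y) centroids.

```python
# Hypothetical norfair usage sketch: detections come from any external detector.
import numpy as np
from norfair import Detection, Tracker

# Distance between a detection's points and a tracked object's current estimate.
def euclidean_distance(detection, tracked_object):
    return np.linalg.norm(detection.points - tracked_object.estimate)

tracker = Tracker(distance_function=euclidean_distance, distance_threshold=30)

# One frame's worth of detector output as (x, y) centroids (placeholder values).
frame_detections = [
    Detection(points=np.array([[100.0, 200.0]])),
    Detection(points=np.array([[320.0, 240.0]])),
]

# Call once per frame; tracked objects appear after a few consecutive hits.
tracked_objects = tracker.update(detections=frame_detections)
for obj in tracked_objects:
    print(obj.id, obj.estimate)
```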
Workflow and speech recognition app
Secure open source cloud runtime for AI apps & AI agents
Qwen3-VL, the multimodal large language model series by Alibaba Cloud
Adversarial Robustness Toolbox (ART) - Python library for ML security
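A minimal sketch of crafting an evasion attack with ART, assuming a small placeholder PyTorch classifier for 28x28 grayscale inputs; the model, data, and attack budget (eps) are illustrative.

```python
# Hypothetical ART evasion-attack sketch with a toy PyTorch classifier.
import numpy as np
import torch
from art.estimators.classification import PyTorchClassifier
from art.attacks.evasion import FastGradientMethod

# Placeholder model standing in for a trained classifier.
model = torch.nn.Sequential(torch.nn.Flatten(), torch.nn.Linear(28 * 28, 10))
criterion = torch.nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

classifier = PyTorchClassifier(
    model=model,
    loss=criterion,
    optimizer=optimizer,
    input_shape=(1, 28, 28),
    nb_classes=10,
    clip_values=(0.0, 1.0),
)

x_test = np.random.rand(8, 1, 28, 28).astype(np.float32)  # placeholder batch
attack = FastGradientMethod(estimator=classifier, eps=0.1)
x_adv = attack.generate(x=x_test)      # adversarial examples
preds = classifier.predict(x_adv)      # model outputs on the perturbed inputs
print(preds.shape)
```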
GLM-4.5V and GLM-4.1V-Thinking: Towards Versatile Multimodal Reasoning
Official repo for "Sa2VA: Marrying SAM2 with LLaVA"
Multi-Modal Neural Networks for Semantic Search, based on Mid-Fusion
Deep Learning API and Server in C++14 with support for Caffe, PyTorch
C++ library for high performance inference on NVIDIA GPUs
The Triton Inference Server provides an optimized cloud and edge inferencing solution
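A hedged client-side sketch of querying a running Triton server with the tritonclient Python package; the model name, tensor names, and shapes are placeholders that must match the deployed model's configuration.

```python
# Hypothetical Triton HTTP client sketch; assumes a server at localhost:8000.
import numpy as np
import tritonclient.http as httpclient

client = httpclient.InferenceServerClient(url="localhost:8000")

# Placeholder input tensor; names/shapes depend on the deployed model config.
data = np.random.rand(1, 3, 224, 224).astype(np.float32)
inp = httpclient.InferInput("INPUT__0", list(data.shape), "FP32")
inp.set_data_from_numpy(data)

result = client.infer(model_name="my_model", inputs=[inp])
output = result.as_numpy("OUTPUT__0")
print(output.shape)
```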
Vision utilities for web interaction agents
A GPU-accelerated library containing highly optimized building blocks
Qwen2.5-VL is the multimodal large language model series developed by the Qwen team at Alibaba Cloud
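A minimal sketch of querying Qwen2.5-VL through Hugging Face transformers, assuming the Qwen/Qwen2.5-VL-7B-Instruct checkpoint and the companion qwen-vl-utils package; the image URL is a placeholder.

```python
# Hypothetical Qwen2.5-VL inference sketch via transformers + qwen-vl-utils.
from transformers import Qwen2_5_VLForConditionalGeneration, AutoProcessor
from qwen_vl_utils import process_vision_info

model = Qwen2_5_VLForConditionalGeneration.from_pretrained(
    "Qwen/Qwen2.5-VL-7B-Instruct", torch_dtype="auto", device_map="auto"
)
processor = AutoProcessor.from_pretrained("Qwen/Qwen2.5-VL-7B-Instruct")

messages = [{"role": "user", "content": [
    {"type": "image", "image": "https://example.com/demo.jpg"},  # placeholder
    {"type": "text", "text": "Describe this image."},
]}]

# Build the chat prompt and extract vision inputs, then run generation.
text = processor.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
image_inputs, video_inputs = process_vision_info(messages)
inputs = processor(text=[text], images=image_inputs, videos=video_inputs,
                   padding=True, return_tensors="pt").to(model.device)

output_ids = model.generate(**inputs, max_new_tokens=128)
answer = processor.batch_decode(output_ids[:, inputs.input_ids.shape[1]:],
                                skip_special_tokens=True)[0]
print(answer)
```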
Data Lake for Deep Learning. Build, manage, and query datasets
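A hypothetical sketch of the dataset workflow this line refers to, assuming the Deep Lake (deeplake) v3 Python API: create a dataset, append samples, and stream it to PyTorch. Tensor names and shapes are illustrative.

```python
# Hypothetical Deep Lake sketch (assumes the deeplake v3 API).
import numpy as np
import deeplake

ds = deeplake.empty("./my_dataset", overwrite=True)
ds.create_tensor("images", htype="image", sample_compression="png")
ds.create_tensor("labels", htype="class_label")

# Append a few synthetic samples inside a write context.
with ds:
    for i in range(10):
        ds.append({"images": np.random.randint(0, 255, (64, 64, 3), dtype=np.uint8),
                   "labels": i % 2})

# Stream the dataset directly into a PyTorch-style dataloader.
loader = ds.pytorch(batch_size=4, shuffle=True)
for batch in loader:
    print(batch["images"].shape, batch["labels"].shape)
    break
```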
A lightweight vision library for performing large-scale object detection
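This appears to describe SAHI's sliced-inference workflow; below is a hedged sketch assuming the sahi package and an Ultralytics YOLOv8 checkpoint, with the image path, slice sizes, and threshold as placeholders.

```python
# Hypothetical SAHI sliced-inference sketch with a YOLOv8 checkpoint.
from sahi import AutoDetectionModel
from sahi.predict import get_sliced_prediction

detection_model = AutoDetectionModel.from_pretrained(
    model_type="yolov8",
    model_path="yolov8n.pt",        # placeholder checkpoint path
    confidence_threshold=0.3,
    device="cpu",
)

# Run detection over overlapping slices of a large image and merge the results.
result = get_sliced_prediction(
    "large_image.jpg",              # placeholder image path
    detection_model,
    slice_height=512,
    slice_width=512,
    overlap_height_ratio=0.2,
    overlap_width_ratio=0.2,
)

for pred in result.object_prediction_list:
    print(pred.category.name, pred.score.value)
```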
MNN is a blazing fast, lightweight deep learning framework
OCR expert VLM powered by Hunyuan's native multimodal architecture
NVIDIA Isaac GR00T N1.5 is the world's first open foundation model for generalized humanoid robot reasoning and skills