Stars
A high-throughput and memory-efficient inference and serving engine for LLMs
Qlib is an AI-oriented Quant investment platform that aims to use AI tech to empower Quant Research, from exploring ideas to deploying them in production. Qlib supports diverse ML modeling paradigms, i…
Fully open reproduction of DeepSeek-R1
🤗 smolagents: a barebones library for agents that think in code.
SGLang is a high-performance serving framework for large language models and multimodal models.
Best Practices on Recommendation Systems
SWE-agent takes a GitHub issue and tries to automatically fix it, using your LM of choice. It can also be employed for offensive cybersecurity or competitive coding challenges. [NeurIPS 2024]
tiktoken is a fast BPE tokeniser for use with OpenAI's models.
Qwen3-Coder is the code version of Qwen3, the large language model series developed by Qwen team, Alibaba Cloud.
The TinyLlama project is an open endeavor to pretrain a 1.1B Llama model on 3 trillion tokens.
A unified, comprehensive and efficient recommendation library
[EMNLP 2021] SimCSE: Simple Contrastive Learning of Sentence Embeddings https://arxiv.org/abs/2104.08821
DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models
Distilabel is a framework for synthetic data and AI feedback for engineers who need fast, reliable and scalable pipelines based on verified research papers.
Minimalistic large language model 3D-parallelism training
Lighteval is your all-in-one toolkit for evaluating LLMs across multiple backends
A standard framework for modelling Deep Learning Models for tabular data
A configurable, tunable, and reproducible library for CTR prediction https://fuxictr.github.io
An open-source solution for full parameter fine-tuning of DeepSeek-V3/R1 671B, including complete code and scripts from training to inference, as well as some practical experiences and conclusions.…
Official implementation of "Continuous Autoregressive Language Models"
HLLM: Enhancing Sequential Recommendations via Hierarchical Large Language Models for Item and User Modeling
Analyzing Hacker News discussions from a decade ago in hindsight with LLMs
Agentar-Scale-SQL is a novel framework that leverages scalable computation to significantly improve Text-to-SQL performance.
Best practices for training DeepSeek, Mixtral, Qwen and other MoE models using Megatron Core.