<h1 align="center">
<br>
rStar2-Agent
</h1>
<p align="center">
ð <a href="https://huggingface.co/papers/2508.20722" target="_blank">[Paper]</a>
</p>
Repo for "[rStar2-Agent: Agentic Reasoning Technical Report](https://huggingface.co/papers/2508.20722)".
Authors: Ning Shang\*, Yifei Liu\*, Yi Zhu\*, Li Lyna Zhang\*â , Weijiang Xu, Xinyu Guan, Buze Zhang, Bingcheng Dong, Xudong Zhou, Bowen Zhang, Ying Xin, Ziming Miao, Scarlett Li, Fan Yang, Mao Yangâ
<p align="center">
<img src="images/figure-1.png" width="1000">
<br>
<em>Figure 1: rStar2-Agent-14B reaches frontier-level math reasoning in just 510 RL training step</em>
</p>
## News
- **[07/15/2025]** Our rStar-Coder [paper](https://arxiv.org/abs/2505.21297) and [dataset](https://huggingface.co/datasets/microsoft/rStar-Coder) are released. We introduce a large-scale, verified dataset of 418K competition-level code problems with **test cases** of varying difficulty, enabling small LLMs (1.5B-14B) to achieve frontier-level code reasoning performance.
- **[02/10/2025]** We are hiring interns! If you are interested in improving LLM reasoning, please send your CV to [email protected].
- **[01/21/2025]** rStar-Math code has been open-sourced.
- **[01/09/2025]** rStar-Math paper is released: https://huggingface.co/papers/2501.04519.
Note: Our prior work [Mutual Reasoning Makes Smaller LLMs Stronger Problem-Solvers](https://huggingface.co/papers/2408.06195) is open-sourced on the [rStar-mutualreasoning b](https://github.com/microsoft/rStar/tree/rStar-mutualreasoning) branch.
Note: Our prior work [rStar-Math: Small LLMs Can Master Math Reasoning with Self-Evolved Deep Thinking](https://huggingface.co/papers/2501.04519) is open-sourced on the [rStar-math](https://github.com/microsoft/rStar/tree/rStar-math) branch.
## Contents
- [Introduction](#Introduction)
- [Try rStar2-Agent with Tool Calling](#Try-rStar2-Agent-with-Tool-Calling)
- [Evaluation](#Evaluation)
- [rStar2-Agent RL Training](#rStar2-Agent-RL-Training)
- [Citation](#Citation)
## Introduction
We introduce rStar2-Agent, a 14B math reasoning model that thinks smarter rather than merely longer, achieving performance comparable to 671B DeepSeek-R1 through pure agentic reinforcement learning. The model plans, reasons, and autonomously uses coding tools to efficiently explore, verify, and reflect for more complex problem-solving. This capability relies on three key innovations: (i) GRPO-RoC, an effective agentic reinforcement learning algorithm with a novel Resample-on-Correct rollout strategy that optimizes coding tool usage and enables shorter, smarter reasoning by selectively retaining higher-quality positive trajectories while preserving all failure cases; (ii) a scalable and efficient RL infrastructure that supports high-throughput tool call execution and mitigates the high costs of agentic RL rollout, enabling efficient training on limited GPU resources (64 MI300X GPUs); (iii) an agent training recipe that starts with non-reasoning SFT and proceeds through multi-stage RL with concise maximum response lengths per stage and increasing dataset difficulty. To this end, rStar2-Agent boosts a pre-trained 14B model to state-of-the-art levels in only 510 RL steps within one week, achieving 80.6% and 69.8% average pass@1 on AIME24 and AIME25, surpassing DeepSeek-R1 (671B) with shorter responses. Beyond mathematics, rStar2-Agent-14B also demonstrates strong generalization to alignment, scientific reasoning, and agentic tool-use tasks.
## Try rStar2-Agent with Tool Calling
### Installation
#### Option 1: Manual Installation
```bash
# Initialize and update submodules
git submodule init
git submodule update
# install verl
pip install "torch<2.8"
pip install -r verl/requirements_sglang.txt
pip install -e verl
# install code judge
pip install -r code-judge/requirements.txt
pip install -e code-judge
# install rstar2_agent
pip install -e .
```
#### Option 2: Automated Installation
```bash
bash install.sh
```
### Code Judge Server Setup
> â ï¸ **Security Warning**: Code Judge executes arbitrary code. Always deploy in an isolated environment (preferably Docker) and never expose to external networks.
The rStar2-Agent uses Code Judge as a tool call server to execute model-generated Python code.
#### 1. Start Redis Server
```bash
sudo apt-get update -y && sudo apt-get install redis -y
redis-server --daemonize yes --protected-mode no --bind 0.0.0.0
```
#### 2. Launch Code Judge Server
```bash
# Start the main server (master node only)
# Environment variables can be configured as per: https://github.com/0xWJ/code-judge/blob/main/app/config.py
# Replace $WORKSPACE and $MASTER_ADDR with your actual paths
tmux new-session -d -s server \
'cd $WORKSPACE/code-judge && \
MAX_EXECUTION_TIME=4 \
REDIS_URI="redis://$MASTER_ADDR:6379" \
RUN_WORKERS=0 \
uvicorn app.main:app --host 0.0.0.0 --port 8088 --workers 16 \
2>&1 | tee server.log'
```
#### 3. Start Code Judge Workers
```bash
# Launch workers (can be deployed on multiple nodes for increased parallelism)
# Adjust MAX_WORKERS based on your CPU count per node
tmux new-session -d -s worker \
'cd $WORKSPACE/code-judge && \
MAX_EXECUTION_TIME=4 \
REDIS_URI="redis://$MASTER_ADDR:6379" \
MAX_WORKERS=64 \
python run_workers.py \
2>&1 | tee worker.log'
```
### Launch the VLLM Server
First, start the VLLM server:
```bash
vllm serve /path/to/your/model \
--host 0.0.0.0 \
--port 8000 \
--enable-auto-tool-choice \
--tool-call-parser hermes
```
Replace `/path/to/your/model` with the actual path to your downloaded model.
### Verify Server Status
Check if the server is running properly:
```bash
curl http://localhost:8000/v1/models
```
### Run Interactive Chat with Tool Calling
Use the provided script to interact with your model:
```bash
python examples/chat_with_tool_call.py \
--model /path/to/your/model \
--prompt "Solve the system of equations: 2x + 3y = 7, x - y = 1" \
--max_tokens 8192
```
### Script Options
The `examples/chat_with_tool_call.py` script supports the following arguments:
- `--model`: Path to your model
- `--prompt`: Input prompt for the model
- `--max_tokens`: Maximum number of tokens to generate
## Evaluation
### Environment Setup
Please view [Installation](#Installation) and [Code Judge Server Setup](#Code-Judge-Server-Setup).
### Run Evaluation Script
We evaluate following mathematical reasoning benchmarks:
- **AIME 2024/2025 (American Invitational Mathematics Examination)**: High-school level competition mathematics
- **MATH500**: A subset of the MATH dataset containing 500 challenging problems
```bash
MODEL_PATH=/path/to/your/model bash examples/aime_eval.sh
MODEL_PATH=/path/to/your/model bash examples/math500_eval.sh
```
## rStar2-Agent RL Training
A comprehensive reinforcement learning training framework for the rStar2-Agent, built on [Verl](https://github.com/volcengine/verl) and [Code Judge](https://github.com/0xWJ/code-judge). This framework enables training models after instruction-following supervised fine-tuning (SFT).
### Environment Setup
Please view [Installation](#Installation) and [Code Judge Server Setup](#Code-Judge-Server-Setup).
### Data Preparation
This example uses:
- **Training Dataset**: DAPO-17k (English subset)
- **Test Dataset**: AIME24
```bash
# Process AIME 2024 dataset
python data_preprocess/aime2024_rstar2_agent_loop.py
# Process DAPO dataset
python data_preprocess/dapo_rstar2_agent_loop.py
```
### Model Setup
Download the base model (Qwen3-14B-Base):
```bash
huggingface-cli download Qwen/Qwen3-14B-Base --local-dir $HOME/models/Qwen3-14B-Base
```
> **Note**: The base model requires instruction-following SFT before RL training for optimal performance.
### Training
#### Basic Training
Run the training script (for 8x A100/H100 GPUs):
```bash
bash examples/run_qwen3-14b_rstar2_agent_weave.sh
```
> Adjust configuration parameters based on your
没有合适的资源?快使用搜索试试~ 我知道了~
rStar-main.zip

共44个文件
py:27个
sh:4个
yaml:4个

需积分: 0 0 下载量 194 浏览量
2025-09-11
00:26:23
上传
评论
收藏 142KB ZIP 举报
温馨提示
rStar-Math 的核心在于通过“深度思考”(Deep Thinking)增强小型语言模型的数学推理能力。相较于依赖大规模参数的模型,rStar-Math 利用蒙特卡洛树搜索(MCTS)结合自监督强化学习(RL)和符号推理,构建了一个高效的推理框架。
资源推荐
资源详情
资源评论


















格式:zip 资源大小:293.9MB










格式:zip 资源大小:991.0MB
收起资源包目录




























































共 44 条
- 1
资源评论


seegaler

- 粉丝: 147
上传资源 快速赚钱
我的内容管理 展开
我的资源 快来上传第一个资源
我的收益
登录查看自己的收益我的积分 登录查看自己的积分
我的C币 登录后查看C币余额
我的收藏
我的下载
下载帮助


最新资源
- 二维码扫描(14).zip
- 二维码长按保存与识别图中二维码.zip
- 二维码识别与生成.zip
- 安卓学习中学到的东西,数据库 二维码 发送短信 定位 地图 手电筒 播放器 网络状态监听 程序锁.zip
- 二维码扫描,二维码生成.zip
- android,二维码,条形码.zip
- 使用Camera2 API,Zxing,实现条形码和二维码扫描.zip
- 基于区块链的二维码门禁系统.zip
- 二维码追溯系统web端.zip
- 二维码(60).zip
- 扫描二维码_条形码,识别图片二维码Demo.zip
- 斗鱼二维码扫码登录,cookie续期.zip
- 二维码扫码_生成.zip
- 拍照,摄像,美化,录音,截屏,二维码识别,条形码识别.zip
- 二维码(30).zip
- 二维码扫描(26).zip
资源上传下载、课程学习等过程中有任何疑问或建议,欢迎提出宝贵意见哦~我们会及时处理!
点击此处反馈



安全验证
文档复制为VIP权益,开通VIP直接复制
