[Free] rStar-main.zip resource - CSDN download

44 files in total

py: 27

sh: 4

yaml: 4

Points required: 0 · 194 views · Uploaded 2025-09-11 00:26:23 · 142KB ZIP


rStar-main.zip (44 files)

rStar-main

fused_compute_score

__init__.py 582B

prime_math

__init__.py 12KB

math_normalize.py 6KB

grader.py 19KB

math_verify.py 1KB

SECURITY.md 3KB

code-judge

LICENSE 1KB

rstar2_agent

__init__.py 220B

tools

__init__.py 198B

tool_parser.py 2KB

code_judge_utils.py 14KB

request_processor.py 17KB

code_judge_tool.py 5KB

rollout

__init__.py 121B

rstar2_agent_loop.py 13KB

reward

__init__.py 117B

server.py 5KB

compute_score.py 702B

down_sample

utils.py 2KB

__init__.py 160B

roc.py 7KB

reject_sampling.py 2KB

rstar2_agent_ray_trainer.py 24KB

main_rstar2_agent.py 13KB

config

tool_config

jupyter_tool_config.yaml 676B

python_tool_config.yaml 1KB

rstar2_agent_loop.yaml 88B

rstar2_agent_trainer.yaml 4KB

examples

math500_eval.sh 3KB

run_qwen3-14b_rstar2_agent_weave.sh 3KB

chat_with_tool_call.py 6KB

aime_eval.sh 3KB

verl

CODE_OF_CONDUCT.md 444B

install.sh 362B

.gitmodules 165B

data_preprocess

aime2025_rstar2_agent_loop.py 2KB

math500_rstar2_agent_loop.py 2KB

aime2024_rstar2_agent_loop.py 2KB

dapo_rstar2_agent_loop.py 2KB

pyproject.toml 337B

SUPPORT.md 1KB

.gitignore 3KB

images

figure-1.png 76KB

README.md 11KB

<h1 align="center">
<br>
rStar2-Agent
</h1>

<p align="center">
<a href="https://huggingface.co/papers/2508.20722" target="_blank">[Paper]</a>
</p>

Repo for "[rStar2-Agent: Agentic Reasoning Technical Report](https://huggingface.co/papers/2508.20722)".

Authors: Ning Shang\*, Yifei Liu\*, Yi Zhu\*, Li Lyna Zhang\*†, Weijiang Xu, Xinyu Guan, Buze Zhang, Bingcheng Dong, Xudong Zhou, Bowen Zhang, Ying Xin, Ziming Miao, Scarlett Li, Fan Yang, Mao Yang†

<p align="center">
<img src="images/figure-1.png" width="1000">
<br>
<em>Figure 1: rStar2-Agent-14B reaches frontier-level math reasoning in just 510 RL training steps</em>
</p>

## News

- **[07/15/2025]** Our rStar-Coder [paper](https://arxiv.org/abs/2505.21297) and [dataset](https://huggingface.co/datasets/microsoft/rStar-Coder) are released. We introduce a large-scale, verified dataset of 418K competition-level code problems with **test cases** of varying difficulty, enabling small LLMs (1.5B-14B) to achieve frontier-level code reasoning performance.
- **[02/10/2025]** We are hiring interns! If you are interested in improving LLM reasoning, please send your CV to [email protected].
- **[01/21/2025]** rStar-Math code has been open-sourced.
- **[01/09/2025]** rStar-Math paper is released: https://huggingface.co/papers/2501.04519.

Note: Our prior work [Mutual Reasoning Makes Smaller LLMs Stronger Problem-Solvers](https://huggingface.co/papers/2408.06195) is open-sourced on the [rStar-mutualreasoning](https://github.com/microsoft/rStar/tree/rStar-mutualreasoning) branch.

Note: Our prior work [rStar-Math: Small LLMs Can Master Math Reasoning with Self-Evolved Deep Thinking](https://huggingface.co/papers/2501.04519) is open-sourced on the [rStar-math](https://github.com/microsoft/rStar/tree/rStar-math) branch.

## Contents

- [Introduction](#Introduction)
- [Try rStar2-Agent with Tool Calling](#Try-rStar2-Agent-with-Tool-Calling)
- [Evaluation](#Evaluation)
- [rStar2-Agent RL Training](#rStar2-Agent-RL-Training)
- [Citation](#Citation)

## Introduction

We introduce rStar2-Agent, a 14B math reasoning model that thinks smarter rather than merely longer, achieving performance comparable to 671B DeepSeek-R1 through pure agentic reinforcement learning. The model plans, reasons, and autonomously uses coding tools to efficiently explore, verify, and reflect for more complex problem-solving. This capability relies on three key innovations: (i) GRPO-RoC, an effective agentic reinforcement learning algorithm with a novel Resample-on-Correct rollout strategy that optimizes coding tool usage and enables shorter, smarter reasoning by selectively retaining higher-quality positive trajectories while preserving all failure cases; (ii) a scalable and efficient RL infrastructure that supports high-throughput tool call execution and mitigates the high costs of agentic RL rollout, enabling efficient training on limited GPU resources (64 MI300X GPUs); (iii) an agent training recipe that starts with non-reasoning SFT and proceeds through multi-stage RL with concise maximum response lengths per stage and increasing dataset difficulty. To this end, rStar2-Agent boosts a pre-trained 14B model to state-of-the-art levels in only 510 RL steps within one week, achieving 80.6% and 69.8% average pass@1 on AIME24 and AIME25, surpassing DeepSeek-R1 (671B) with shorter responses.
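
To make the Resample-on-Correct idea in (i) concrete, here is a minimal sketch of the filtering step as this paragraph describes it: every failed rollout is kept, and only the higher-quality correct rollouts are retained. It is an illustration under stated assumptions, not the repo's `down_sample/roc.py`. The `Rollout` fields, the `tool_error_ratio` quality score, the tie-break on response length, and the `max_positives` parameter are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rollout:
    """One sampled trajectory. All fields are hypothetical, for illustration."""
    text: str
    correct: bool      # did the final answer match the reference?
    tool_calls: int    # number of code-tool invocations in the trajectory
    tool_errors: int   # how many of those invocations raised an error


def tool_error_ratio(r: Rollout) -> float:
    """Assumed quality signal: fraction of tool calls that failed (0.0 if none)."""
    return r.tool_errors / r.tool_calls if r.tool_calls else 0.0


def resample_on_correct(rollouts: list[Rollout], max_positives: int) -> list[Rollout]:
    """Keep every failure; keep only the cleanest `max_positives` correct rollouts."""
    failures = [r for r in rollouts if not r.correct]
    positives = [r for r in rollouts if r.correct]
    # Assumed ranking: fewer tool errors first, then shorter responses.
    positives.sort(key=lambda r: (tool_error_ratio(r), len(r.text)))
    return failures + positives[:max_positives]


group = [
    Rollout("wrong answer", correct=False, tool_calls=2, tool_errors=1),
    Rollout("long correct answer with a failed tool call", True, 2, 1),
    Rollout("short correct", True, 1, 0),
    Rollout("another wrong", False, 0, 0),
]
kept = resample_on_correct(group, max_positives=1)
```

In this toy group, both failures survive and only the error-free correct rollout is kept. The actual algorithm may sample rather than rank, and may use other penalties.
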

Beyond mathematics, rStar2-Agent-14B also demonstrates strong generalization to alignment, scientific reasoning, and agentic tool-use tasks.

## Try rStar2-Agent with Tool Calling

### Installation

#### Option 1: Manual Installation

```bash
# Initialize and update submodules
git submodule init
git submodule update

# install verl
pip install "torch<2.8"
pip install -r verl/requirements_sglang.txt
pip install -e verl

# install code judge
pip install -r code-judge/requirements.txt
pip install -e code-judge

# install rstar2_agent
pip install -e .
```

#### Option 2: Automated Installation

```bash
bash install.sh
```

### Code Judge Server Setup

> ⚠️ **Security Warning**: Code Judge executes arbitrary code. Always deploy it in an isolated environment (preferably Docker) and never expose it to external networks.

rStar2-Agent uses Code Judge as a tool call server to execute model-generated Python code.

#### 1. Start Redis Server

```bash
sudo apt-get update -y && sudo apt-get install redis -y
redis-server --daemonize yes --protected-mode no --bind 0.0.0.0
```

#### 2. Launch Code Judge Server

```bash
# Start the main server (master node only)
# Environment variables can be configured as per: https://github.com/0xWJ/code-judge/blob/main/app/config.py
# Replace $WORKSPACE and $MASTER_ADDR with your actual paths

tmux new-session -d -s server \
  'cd $WORKSPACE/code-judge && \
   MAX_EXECUTION_TIME=4 \
   REDIS_URI="redis://$MASTER_ADDR:6379" \
   RUN_WORKERS=0 \
   uvicorn app.main:app --host 0.0.0.0 --port 8088 --workers 16 \
   2>&1 | tee server.log'
```

#### 3. Start Code Judge Workers

```bash
# Launch workers (can be deployed on multiple nodes for increased parallelism)
# Adjust MAX_WORKERS based on your CPU count per node

tmux new-session -d -s worker \
  'cd $WORKSPACE/code-judge && \
   MAX_EXECUTION_TIME=4 \
   REDIS_URI="redis://$MASTER_ADDR:6379" \
   MAX_WORKERS=64 \
   python run_workers.py \
   2>&1 | tee worker.log'
```

### Launch the vLLM Server

First, start the vLLM server:

```bash
vllm serve /path/to/your/model \
    --host 0.0.0.0 \
    --port 8000 \
    --enable-auto-tool-choice \
    --tool-call-parser hermes
```

Replace `/path/to/your/model` with the actual path to your downloaded model.

### Verify Server Status

Check that the server is running properly:

```bash
curl http://localhost:8000/v1/models
```

### Run Interactive Chat with Tool Calling

Use the provided script to interact with your model:

```bash
python examples/chat_with_tool_call.py \
    --model /path/to/your/model \
    --prompt "Solve the system of equations: 2x + 3y = 7, x - y = 1" \
    --max_tokens 8192
```

### Script Options

The `examples/chat_with_tool_call.py` script supports the following arguments:

- `--model`: Path to your model
- `--prompt`: Input prompt for the model
- `--max_tokens`: Maximum number of tokens to generate

## Evaluation

### Environment Setup

Please see [Installation](#Installation) and [Code Judge Server Setup](#Code-Judge-Server-Setup).
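
Before launching an evaluation, it can help to confirm that the Redis and Code Judge ports from the setup steps are accepting connections. The sketch below assumes both services run on `localhost` with the ports used in this README (6379 for Redis, 8088 for Code Judge). `port_open` and `check_services` are hypothetical helpers, not part of the repo. The check only tests TCP reachability and does not prove the services are healthy.

```python
import socket

# Ports taken from the setup commands in this README; the host is an assumption.
SERVICES = {"redis": 6379, "code-judge": 8088}


def port_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within `timeout`."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeouts, and unreachable hosts.
        return False


def check_services(host: str = "localhost", services: dict[str, int] = SERVICES) -> dict[str, bool]:
    """Map each service name to whether its port is reachable on `host`."""
    return {name: port_open(host, port) for name, port in services.items()}


if __name__ == "__main__":
    for name, ok in check_services().items():
        print(f"{name}: {'reachable' if ok else 'NOT reachable'}")
```

If the workers run on other nodes, pass the master node's address as `host` instead of `localhost`.
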

### Run Evaluation Script

We evaluate on the following mathematical reasoning benchmarks:

- **AIME 2024/2025 (American Invitational Mathematics Examination)**: High-school level competition mathematics
- **MATH500**: A subset of the MATH dataset containing 500 challenging problems

```bash
MODEL_PATH=/path/to/your/model bash examples/aime_eval.sh
MODEL_PATH=/path/to/your/model bash examples/math500_eval.sh
```

## rStar2-Agent RL Training

A comprehensive reinforcement learning training framework for rStar2-Agent, built on [Verl](https://github.com/volcengine/verl) and [Code Judge](https://github.com/0xWJ/code-judge). This framework enables training models after instruction-following supervised fine-tuning (SFT).

### Environment Setup

Please see [Installation](#Installation) and [Code Judge Server Setup](#Code-Judge-Server-Setup).

### Data Preparation

This example uses:

- **Training Dataset**: DAPO-17k (English subset)
- **Test Dataset**: AIME24

```bash
# Process AIME 2024 dataset
python data_preprocess/aime2024_rstar2_agent_loop.py

# Process DAPO dataset
python data_preprocess/dapo_rstar2_agent_loop.py
```

### Model Setup

Download the base model (Qwen3-14B-Base):

```bash
huggingface-cli download Qwen/Qwen3-14B-Base --local-dir $HOME/models/Qwen3-14B-Base
```

> **Note**: The base model requires instruction-following SFT before RL training for optimal performance.

### Training

#### Basic Training

Run the training script (for 8x A100/H100 GPUs):

```bash
bash examples/run_qwen3-14b_rstar2_agent_weave.sh
```

> Adjust configuration parameters based on your
