- Clone our repository with Git.
- Create the conda environment:
conda create -n LLMenv python=3.10
conda activate LLMenv
conda install pytorch torchvision torchaudio pytorch-cuda=12.1 -c pytorch -c nvidia
pip install sympy==1.13.1
pip install --upgrade transformers
conda install psutil
pip install peft
pip install sentencepiece
pip install deepspeed
git clone https://github.com/huggingface/alignment-handbook.git
cd ./alignment-handbook/
python -m pip install .
You can download the dataset from https://huggingface.co/datasets/ZhenlongDai/ACPR.
Download it and place it in the directory ./merged_test_cases.
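As an alternative to a manual download, the dataset can be fetched from Python. This is a minimal sketch, assuming `huggingface_hub` is available (it is installed as a dependency of `transformers`) and that the repository contents can be placed in ./merged_test_cases as they are; the function name `download_acpr` and the `downloader` parameter are hypothetical and not part of this repository.

```python
def download_acpr(local_dir="./merged_test_cases", downloader=None):
    """Download the ACPR dataset repository into local_dir.

    `downloader` is a hypothetical hook for swapping in another function;
    by default huggingface_hub.snapshot_download is used.
    """
    if downloader is None:
        # Imported lazily so the function can be defined without the package.
        from huggingface_hub import snapshot_download
        downloader = snapshot_download
    # repo_type="dataset" is needed because the URL is under /datasets/.
    return downloader(
        repo_id="ZhenlongDai/ACPR",
        repo_type="dataset",
        local_dir=local_dir,
    )
```

Calling `download_acpr()` with no arguments writes the files to ./merged_test_cases. If the scripts expect a different layout inside that directory, move the files accordingly.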
The problem file is "repairDataset/Program_Question_Data/English_Program_Question_StringVersion.json".
The repair data files are in the directory "repairDataset/RepairData-PythonLevel/CRFLPDataset/".
See "codeTool/use.md" for usage instructions.
Please download the pre-trained CodeLlama-7b-Instruct-hf weights from Hugging Face and put them in the directory "./CodeLlama-7b-Instruct-hf".
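Before training, it can help to confirm that the packages and paths mentioned above are in place. The following is a minimal sketch, assuming it is run from the repository root; the function names `missing_packages` and `missing_paths` are hypothetical helpers and not part of this repository.

```python
import importlib.util
from pathlib import Path

# Import names of the packages installed in the environment steps.
REQUIRED_PACKAGES = [
    "torch", "sympy", "transformers", "psutil",
    "peft", "sentencepiece", "deepspeed",
]

# Paths named in this README, relative to the repository root.
REQUIRED_PATHS = [
    "merged_test_cases",
    "repairDataset/Program_Question_Data/English_Program_Question_StringVersion.json",
    "repairDataset/RepairData-PythonLevel/CRFLPDataset",
    "CodeLlama-7b-Instruct-hf",
]


def missing_packages(names=REQUIRED_PACKAGES):
    """Return the package names that cannot be found on the import path."""
    return [n for n in names if importlib.util.find_spec(n) is None]


def missing_paths(root=".", paths=REQUIRED_PATHS):
    """Return the expected paths that do not exist under root."""
    root = Path(root)
    return [p for p in paths if not (root / p).exists()]


if __name__ == "__main__":
    print("missing packages:", missing_packages())
    print("missing paths:", missing_paths())
```

Two empty lists mean that everything listed was found. The check only tests for presence and does not verify package versions or file contents.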
Step 1. First, train a base Program Modifier (Location-Aware Repair Learning), which is used to evaluate the bug locator.
DATA_FILE="CRFLPDataset"
bash script/pipeline/process_fixbycrflp.sh
Specify the parameter values based on the result of step 1 before running the script.
DATA_FILE="CRFLPDataset"
bash script/pipeline/process_trace_CRFLP.sh
Specify the required parameter values before running the script.
DATA_FILE="mixCRFLPDataset"
bash script/pipeline/process_SecondFix.sh
bash script/merge_sft.sh
Specify the required parameter values before running the script.
bash script/Prefer/generate_prefer_data.sh
Then construct the training data from the generation results; you can refer to "./utils/ConstructPerferDataset.py". We keep only the data pairs in which the difference between the highest and the lowest consistency is greater than 0.1. You can also use the preference data in "dataNew/FixPerferdataset" directly.
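The pair selection rule described above can be sketched as follows. This is an illustration only, not the code in "./utils/ConstructPerferDataset.py": the function name `build_preference_pair`, the input keys `code` and `consistency`, and the output keys `prompt`, `chosen`, and `rejected` are all assumptions.

```python
MIN_GAP = 0.1  # threshold stated in the text


def build_preference_pair(prompt, candidates, min_gap=MIN_GAP):
    """Build one preference pair from the generated candidates of a problem.

    candidates: list of dicts with the hypothetical keys
    'code' (generated program) and 'consistency' (float score).
    Returns None when no pair passes the gap threshold.
    """
    if len(candidates) < 2:
        return None
    best = max(candidates, key=lambda c: c["consistency"])
    worst = min(candidates, key=lambda c: c["consistency"])
    # Keep the pair only if highest minus lowest consistency exceeds the gap.
    if best["consistency"] - worst["consistency"] <= min_gap:
        return None
    return {
        "prompt": prompt,
        "chosen": best["code"],
        "rejected": worst["code"],
    }
```

For example, candidates scored 0.9 and 0.2 yield a pair, while candidates scored 0.50 and 0.45 are dropped because their gap does not exceed 0.1.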
PerferDATA_FILE="dataNew/FixPerferdataset"
bash script/pipeline/Prefer/process_Prefer_FixModel.sh
Specify the required parameter values before running the script.
bash script/Execution/Execution_Eval.sh
Specify the required parameter values before running the script.
bash script/Postprocessing.sh
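The order of the scripts above can be recorded in a small driver. This is a sketch only, assuming it is run from the repository root and that the parameter values inside each script have already been set by hand; the name `run_stages` is hypothetical, and the evaluation script path is assumed to start with "script/" like the others.

```python
import subprocess

# Scripts in the order they appear in this README.
STAGES = [
    "script/pipeline/process_fixbycrflp.sh",
    "script/pipeline/process_trace_CRFLP.sh",
    "script/pipeline/process_SecondFix.sh",
    "script/merge_sft.sh",
    "script/Prefer/generate_prefer_data.sh",
    # Preference data must be constructed manually before the next stage.
    "script/pipeline/Prefer/process_Prefer_FixModel.sh",
    "script/Execution/Execution_Eval.sh",  # assumed path, see lead-in
    "script/Postprocessing.sh",
]


def run_stages(stages=STAGES, dry_run=True):
    """Print (dry_run=True) or execute each stage in order.

    Execution stops at the first failing script because check=True
    raises subprocess.CalledProcessError on a non-zero exit code.
    """
    commands = [["bash", stage] for stage in stages]
    for cmd in commands:
        if dry_run:
            print(" ".join(cmd))
        else:
            subprocess.run(cmd, check=True)
    return commands
```

Because several stages depend on parameters taken from earlier results, it is safer to pass a short slice such as `run_stages(STAGES[:1], dry_run=False)` than to run the whole list at once.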
You can download the trained weights from https://huggingface.co/ZhenlongDai/AdaPatcher.
- Base Program Modifier
  - weight file: "output_dir/loraWeight/fixbycrflp/checkpoint-8000"
  - prediction result file: "predict_dir/loraWeight/fixbycrflp/test-checkpoint-8000.json"
- Bug locator
  - weight file: "output_dir/loraWeight/trace_CRFLP/checkpoint-14000"
  - prediction result file: "predict_dir/loraWeight/trace_CRFLP/test-checkpoint-14000.json"
- Program Modifier with Hybrid Training for Selective Reference
  - weight file: "output_dir/loraWeight/fixbycrflp2/checkpoint-12000"
  - prediction result file: "predict_dir/loraWeight/fixbycrflp2/test-checkpoint-12000.json"
- merge_sft
  - weight file: "output_dir/fix_codeLlama"
- Program Modifier with Adaptive Preference Learning
  - weight file: "output_dir/DpoWeight/DPOP_Fix_ND3V1/checkpoint-1300"
  - prediction result file: "predict_dir/DpoWeight/DPOP_Fix_ND3V1-GEN/test-checkpoint-1300.json"
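A checkpoint from the list above could be loaded on top of the base model roughly as follows. This is a sketch under the assumption that the directories under "loraWeight" are PEFT LoRA adapters for CodeLlama-7b-Instruct-hf, which the directory name suggests but this README does not state; the helper names `checkpoint_step` and `load_lora_model` are hypothetical.

```python
import re
from pathlib import Path


def checkpoint_step(path):
    """Extract the training step from a checkpoint or prediction file name."""
    match = re.search(r"checkpoint-(\d+)", Path(path).name)
    return int(match.group(1)) if match else None


def load_lora_model(
    base_dir="./CodeLlama-7b-Instruct-hf",
    adapter_dir="output_dir/loraWeight/fixbycrflp/checkpoint-8000",
):
    """Load the base model and attach the adapter (assumed to be LoRA)."""
    # Imported lazily so checkpoint_step works without the heavy packages.
    from transformers import AutoModelForCausalLM, AutoTokenizer
    from peft import PeftModel

    tokenizer = AutoTokenizer.from_pretrained(base_dir)
    base_model = AutoModelForCausalLM.from_pretrained(base_dir)
    model = PeftModel.from_pretrained(base_model, adapter_dir)
    return tokenizer, model
```

The merged model in "output_dir/fix_codeLlama" has no checkpoint number, so `checkpoint_step` returns `None` for it.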
BibTeX:
@article{dai2025less,
title={Less is More: Adaptive Program Repair with Bug Localization and Preference Learning},
author={Dai, Zhenlong and Chen, Bingrui and Zhao, Zhuoluo and Tang, Xiu and Wu, Sai and Yao, Chang and Gao, Zhipeng and Chen, Jingyuan},
journal={arXiv preprint arXiv:2503.06510},
year={2025}
}
