This repository contains the full replication package for our paper on Hierarchical Knowledge Injection for Improving LLM-based Program Repair. It includes all code, datasets, prompts, and analysis scripts used in our experiments.
| File | Description |
|---|---|
| `main.ipynb` | Extracts and constructs contextual information for each bug across all three knowledge layers: Bug, Repository, and Project. |
| `generate_prompts.ipynb` | Uses the extracted data to generate prompts for each bug under each knowledge layer. |
| `run_patches.ipynb` | Executes the test suites for each generated patch to evaluate whether the bug is successfully fixed. |
| `analysis.ipynb` | Performs all quantitative and qualitative analysis reported in the paper, including fix rates, Pass@k metrics, and error breakdowns. |
All notebooks include detailed markdown explanations to facilitate reproducibility and understanding.
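The Pass@k numbers reported by `analysis.ipynb` can be reproduced with the standard unbiased estimator. A minimal sketch (the function name is ours, not taken from the notebooks):

```python
from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased Pass@k estimator.

    n: total patches generated for a bug
    c: patches that pass the test suite
    k: number of samples considered

    Returns the probability that at least one of k samples drawn
    (without replacement) from the n generations is a passing patch.
    """
    if n - c < k:
        return 1.0  # every size-k sample must contain a passing patch
    return 1.0 - comb(n - c, k) / comb(n, k)
```

For example, with 10 generations of which 1 passes, `pass_at_k(10, 1, 1)` gives 0.1.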
Contains the full dataset of bugs organized by project and bug ID. Each bug has layer-specific data and outputs.
Example structure:
```
bug-data/
├── pandas/
│   ├── 1/
│   │   ├── prompts/
│   │   ├── patches/
│   │   └── extracted_data.json
│   ├── 2/
│   │   ├── ...
│   ...
```
Each bug folder includes:
- `prompts/`: The generated prompts for the different knowledge layers (`.md` files)
- `patches/`: Model-generated patches
- `extracted_data.json`: Extracted contextual information used for prompt construction
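For scripted access outside the notebooks, the layout above can be walked with a few lines of Python. A sketch, assuming only the directory names shown above (the helper name is ours):

```python
import json
from pathlib import Path

def load_bugs(root: str) -> dict:
    """Map (project, bug_id) -> extracted context for every bug folder
    under root that contains an extracted_data.json file."""
    bugs = {}
    for meta in Path(root).glob("*/*/extracted_data.json"):
        project = meta.parent.parent.name  # e.g. "pandas"
        bug_id = meta.parent.name          # e.g. "1"
        bugs[(project, bug_id)] = json.loads(meta.read_text())
    return bugs
```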
Contains processed CSV files used for analysis and annotation.
| File | Description |
|---|---|
| `annotated_dataset.csv` | Manually annotated bug types (e.g., GUI, Network, Program Anomaly) |
| `annotated_dataset_with_bug_level_results.csv` | Fix outcomes for GPT-4o-mini using bug-level context |
| `annotated_dataset_with_bug_level_results_llama.csv` | Fix outcomes for LLaMA 3.3 using bug-level context |
| `annotated_dataset_with_repository_level_results_gpt.csv` | Fix outcomes for GPT-4o-mini using repository-level context |
| `annotated_dataset_with_repository_level_results_llama.csv` | Fix outcomes for LLaMA 3.3 using repository-level context |
| `annotated_dataset_with_project_level_results_gpt.csv` | Fix outcomes for GPT-4o-mini using project-level context |
| `annotated_dataset_with_project_level_results_llama.csv` | Fix outcomes for LLaMA 3.3 using project-level context |
| `annotated_dataset_with_all_levels_results_gpt.csv` | Fix outcomes for GPT-4o-mini using all three context levels combined |
| `annotated_dataset_with_all_levels_results_llama.csv` | Fix outcomes for LLaMA 3.3 using all three context levels combined |
| `calculated_complexity_metrics.csv` | Cyclomatic complexity, LOC, Halstead metrics, and maintainability index for all buggy functions |
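As a quick sanity check, fix rates per bug type can be aggregated from any of the result CSVs with pandas. A sketch, assuming hypothetical column names `bug_type` and `fixed` (the actual headers may differ; check the CSVs first):

```python
import pandas as pd

def fix_rate_by_type(csv_path: str) -> pd.Series:
    """Return the fraction of fixed bugs per annotated bug type,
    highest first. Assumes a binary 'fixed' column (0/1) and a
    'bug_type' column -- both names are illustrative."""
    df = pd.read_csv(csv_path)
    return df.groupby("bug_type")["fixed"].mean().sort_values(ascending=False)
```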
Install the required packages:

```
langchain
jedi
sentence_transformers
scikit-learn
transformers
torch
numpy
pandas
```
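Before launching the notebooks, the environment can be verified with a small import check (the helper name is ours; note that `scikit-learn` installs as the `sklearn` module):

```python
import importlib.util

# Importable module names for the required packages listed above.
REQUIRED = ["langchain", "jedi", "sentence_transformers", "sklearn",
            "transformers", "torch", "numpy", "pandas"]

def missing_packages(names):
    """Return the subset of module names that cannot be imported."""
    return [n for n in names if importlib.util.find_spec(n) is None]

# Example: warn before running the notebooks
# missing = missing_packages(REQUIRED)
# if missing:
#     print("Missing packages:", ", ".join(missing))
```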
1. Extract context with `main.ipynb` for each bug. (You can skip this step; the information is already extracted for each bug.)
2. Generate prompts with `generate_prompts.ipynb`. (You can skip this step; prompts are already generated for each bug.)
3. Run the models and evaluate patches with `run_patches.ipynb`.
4. Analyze the results and reproduce the figures and tables in the paper with `analysis.ipynb`.
Each step is modular and well-commented to support custom extensions or ablations.
The following table summarizes the characteristics of the 16 open-source Python projects included in our dataset. These projects vary in size and popularity, ranging from 16k to 135k stars on GitHub, and cover diverse bug types such as Program Anomaly, GUI, and Network.
We use a structured chain-of-thought prompt to guide LLMs step-by-step through analyzing and repairing each buggy function. This format encourages the model to first understand the bug and its context before generating a fix.
As an example, the general structure is illustrated below for the Bug Knowledge Layer:


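To make the structure concrete, a bug-layer prompt of this shape could be assembled in code as follows. This is an illustrative sketch only: the section wording and parameter names are ours, and the actual templates live in `generate_prompts.ipynb`:

```python
def build_bug_layer_prompt(issue: str, buggy_function: str) -> str:
    """Assemble a chain-of-thought repair prompt from bug-layer context.

    The step headings below mirror the understand-then-fix structure
    described above; they are not the paper's exact template text.
    """
    return "\n".join([
        "You are an expert Python developer repairing a buggy function.",
        "## Step 1: Understand the bug",
        issue,
        "## Step 2: Analyze the buggy function",
        buggy_function,
        "## Step 3: Reason step by step, then output the fixed function.",
    ])
```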