🔧 Automated Code Correction Using LLM-Powered Agents

Implementing Research Paper Methodology Focus in AI Agentic Automated Code Correction .

🔧 Automated Code Correction Using LLM-Powered Agents

A QuixBugs Benchmark Study

Author: Manish Singh Project: Automated Code Correction Agent Development

📌 Executive Summary

This project presents the development and evaluation of an LLM-powered agent for automated detection and correction of single-line defects in Python programs using the QuixBugs benchmark.

Achieved Accuracy: ✅ 86% (43/50 programs successfully corrected)
Benchmark: QuixBugs (University of Washington)
Defect Classes: 14 categories analyzed & targeted repair strategies implemented
Performance: Competitive with existing Automated Program Repair (APR) techniques
Future Goal: Integration with MMAPR framework for achieving 93–96% accuracy
Errors are fixed iteratively with up to 5 attempts per program.
Example:

📂 Project Structure

├── AIML_CODEDEBU_FINAL_MANISH_SINGH_23_CS_244.ipynb   # Final implementation notebook
├── AIML_CODEDEBU.ipynb                                # Experimental prototype notebook
├── tester.py                                          # Automated test runner
├── requirements.txt                                   # Dependencies
├── README.md                                          # Documentation
└── /images                                            # Results & workflow images

📜 Problem Motivation

Economic Impact: Software bugs cost ~$2.84 trillion annually
Time Drain: Debugging consumes 50–75% of dev time
Limitations of Tools: No existing solution addresses both syntactic & semantic bugs simultaneously
Case Example: Windows Blue Screen of Death caused by a single-line pointer error

⚡ Challenges Addressed

Context preservation while fixing defects
Avoiding false positives and overcorrections
Handling semantic vs syntactic bugs
Ensuring test coverage for all edge cases
Overcoming API rate limits and model simplicity trade-offs

🧠 Approach

🔹 Defect Analysis & Categorization

We identified 14 defect classes in QuixBugs:

Defect Class	Frequency	Example	Repair Strategy
Off-by-one error	28%	`range(len(arr)-1)` → `range(len(arr))`	Boundary adjustment
Incorrect operators	22%	`< → <=`, `== → is`	Operator replacement
Missing null checks	15%	`if x is not None:`	Defensive programming
Logical conditionals	12%	`and → or`	Logic operator correction
Variable initialization	8%	Wrong defaults	Scope-aware init
Other	15%	Mixed patterns	Case-specific fixes

🔹 Agent Architecture

agent_executor = create_react_agent(model, tools)

tools = [
    run_python_code,           # Code execution & validation
    run_python_code_from_file, # File-based testing
    iterative_fix_and_test     # Multi-attempt repair workflow
]

🛠 Workflow

Detect defect (pattern classification)
Generate repair suggestion
Run automated tests with pytest
Iterate with error feedback (max 5 attempts)
Save fixed program

📍

✅ Results

📊 Performance Summary

Total Programs: 50
Corrected: 43 (86%)
Failed Repairs: 7
Avg Attempts per Fix: 1.8
Fix Time: ⏱️ 5–10s (vs human 15–30 min)

🧪 Success Rate by Defect Class

Off-by-one errors → 92%
Comparison operators → 95%
Null checks → 78%
Logical conditionals → 83%
Variable initialization → 100%

🔎 Failure Analysis (7 Programs)

Complex multi-line logic dependencies
Ambiguous defect classification
Rare edge cases not in training data
Example: shortest_path_lengths.py failed due to state initialization missing

📍 Insert Example Error vs Fixed Code Image Here

📈 Comparative Benchmark

Method	QuixBugs Accuracy	Key Limitation
Our Agent	86%	Complex dependencies
GenProg	65%	Weak semantic understanding
Prophet	58%	Pattern overfitting
CodeT5	72%	Single-attempt limitation
Human Expert	100%	Time-intensive

🔮 Future Work

1️⃣ MMAPR Framework Integration

Multi-modal input (ASTs, error traces, natural language)
Few-shot learning with peer programs
Ensemble repair with multiple LLM backends (GPT-4, Gemini, CodeT5)

📍 Inspired and implemented based on

2️⃣ Technical Enhancements

Multi-language support (Python → Java, C++, JS)
Static & dynamic analysis integration (SonarQube, CodeQL)
CI/CD pipeline & IDE plugin integration

📚 References

MMAPR Framework Research Paper
QuixBugs Benchmark – University of Washington
ACM Computing Surveys, 2021 – Automated Program Repair
OpenAI Codex Technical Report, 2021
LangChain Documentation
Agentic Paper

🚀 Getting Started

🔧 Installation

git clone https://github.com/<your-repo>/LLM-CodeCorrection-Agent.git
cd LLM-CodeCorrection-Agent
pip install -r requirements.txt

▶️ Run Agent

python tester.py --file buggy_code.py

🧪 Run Tests

pytest tests/

📸 Results Showcase

📍 *

Agent fixing buggy code
Test results from pytest
Comparison graphs

🎯 Key Contributions

✔️ Achieved 86% success rate on QuixBugs ✔️ Developed 14-class defect taxonomy ✔️ Implemented iterative agentic repair workflow ✔️ Established roadmap for MMAPR integration (93–96% accuracy)

✨ This work demonstrates that LLM-powered agents can bridge the gap between traditional APR tools and human-level expertise in automated debugging.

Name		Name	Last commit message	Last commit date
Latest commit History 4 Commits
DebugMind		DebugMind
Testingdataset-Code-Refactoring-QuixBugs-master		Testingdataset-Code-Refactoring-QuixBugs-master
pics		pics
AIML_Automated Code Correction Using LLM-Powered Agents.pdf		AIML_Automated Code Correction Using LLM-Powered Agents.pdf
ML_model_agent.png		ML_model_agent.png
MMAPR Paper Summary_ Teaching Guide with Definition pdf.pdf		MMAPR Paper Summary_ Teaching Guide with Definition pdf.pdf
image-1.png		image-1.png
image.png		image.png
readme.md		readme.md

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Implementing Research Paper Methodology Focus in AI Agentic Automated Code Correction .

🔧 Automated Code Correction Using LLM-Powered Agents

📌 Executive Summary

📂 Project Structure

📜 Problem Motivation

⚡ Challenges Addressed

🧠 Approach

🔹 Defect Analysis & Categorization

🔹 Agent Architecture

✅ Results

📊 Performance Summary

🧪 Success Rate by Defect Class

🔎 Failure Analysis (7 Programs)

📈 Comparative Benchmark

🔮 Future Work

1️⃣ MMAPR Framework Integration

2️⃣ Technical Enhancements

📚 References

🚀 Getting Started

🔧 Installation

▶️ Run Agent

🧪 Run Tests

📸 Results Showcase

🎯 Key Contributions

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

Implementing Research Paper Methodology Focus in AI Agentic Automated Code Correction .

🔧 Automated Code Correction Using LLM-Powered Agents

📌 Executive Summary

📂 Project Structure

📜 Problem Motivation

⚡ Challenges Addressed

🧠 Approach

🔹 Defect Analysis & Categorization

🔹 Agent Architecture

✅ Results

📊 Performance Summary

🧪 Success Rate by Defect Class

🔎 Failure Analysis (7 Programs)

📈 Comparative Benchmark

🔮 Future Work

1️⃣ MMAPR Framework Integration

2️⃣ Technical Enhancements

📚 References

🚀 Getting Started

🔧 Installation

▶️ Run Agent

🧪 Run Tests

📸 Results Showcase

🎯 Key Contributions

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages