We present an empirical study of ChatGPT's potential as a fully automated programming assistant, focusing on three tasks: code generation, program repair, and code summarization.
Zenodo Link: https://zenodo.org/record/8239784
- Problems: code_generation_dataset/problems
- Responses: code_generation_dataset/results
- Submissions: data/.../code/(in)correct/
- Responses:
  - data/.../code/fixed(_codex)/: program repair responses produced by ChatGPT and Codex.
  - data_des/.../code/fixed(_codex)/: program repair responses produced by ChatGPT_D and Codex_D.
  - data/.../code/explanation/: code explanation responses produced by ChatGPT.
- Python 3.9 (Anaconda recommended)
- pip install -r requirements.txt
- Update the absolute paths of the datasets in config.py.
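The path update in the last step can be pictured with a minimal sketch. Only the attribute names `path` and `generation_path` come from this README (they appear as Config().path and Config().generation_path); the dataclass layout, the placeholder directories, and the `relative_fields` helper are assumptions made for illustration and may not match the real config.py.

```python
from dataclasses import dataclass
from pathlib import Path


@dataclass
class Config:
    # Hypothetical layout: the real config.py may define these differently.
    # Replace the placeholders with absolute paths on your machine.
    path: Path = Path("/absolute/path/to/data")
    generation_path: Path = Path("/absolute/path/to/code_generation_dataset/results")

    def relative_fields(self) -> list[str]:
        """Return the names of fields that are not absolute paths (illustrative helper)."""
        return [name for name, value in vars(self).items()
                if not Path(value).is_absolute()]
```

Calling `Config().relative_fields()` before running main.py is one way to catch a path that was left relative by mistake.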
To obtain the experimental results of our paper, execute main.py with the following parameters:
Request ChatGPT to generate code and save the responses in the folder specified by Config().generation_path.
python main.py RQ1 generate ChatGPT
Print the LaTeX tables of submission results for ChatGPT, Codex, and CodeGen.
python main.py RQ1 table
Draw a boxplot of the prompt lengths for correctly and incorrectly solved problems.
python main.py RQ1 length
Request ChatGPT to repair incorrect code and save the responses in the fixed folder.
python main.py RQ2 repair ChatGPT
Validate the patched code produced by ChatGPT.
python main.py RQ2 validate ChatGPT
To run the Codex experiments, replace ChatGPT with Codex in the commands above. To evaluate ChatGPT_D and Codex_D, use 'data_des' instead of 'data' in Config().path.
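The substitution described above can be scripted. The sketch below only assembles the two RQ2 command lines shown in this README for a given model name; the helper names `build_rq2_commands` and `run_all` are hypothetical and are not part of the repository.

```python
import subprocess


def build_rq2_commands(model="ChatGPT"):
    """Build the RQ2 repair and validate command lines for one model name."""
    return [
        ["python", "main.py", "RQ2", "repair", model],
        ["python", "main.py", "RQ2", "validate", model],
    ]


def run_all(model="ChatGPT"):
    """Run the commands in order, stopping at the first failure."""
    for command in build_rq2_commands(model):
        subprocess.run(command, check=True)


# Example (run from the repository root): run_all("Codex")
```

Switching between 'data' and 'data_des' is not handled here, since that setting lives in Config().path rather than on the command line.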
Calculate similarity distributions in experiment 1.
python main.py RQ3 exp1
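This README does not say which similarity metric RQ3 uses. As a rough illustration of what a similarity distribution over a set of texts looks like, the sketch below scores every pair with `difflib.SequenceMatcher` from the Python standard library. The metric is a stand-in chosen for this example, and the function names are hypothetical; none of it is taken from main.py.

```python
from difflib import SequenceMatcher
from itertools import combinations


def similarity(a: str, b: str) -> float:
    """Character-level similarity in [0, 1] (stand-in metric, not the paper's)."""
    return SequenceMatcher(None, a, b).ratio()


def pairwise_similarities(texts):
    """Return the similarity score of every unordered pair of texts."""
    return [similarity(a, b) for a, b in combinations(texts, 2)]
```

The resulting list of scores can then be summarized or plotted as a distribution.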