Set up environment for framework

# The following steps create an isolated environment codex-env and install required dependencies.

conda create --name codex-env python=3.9 # creates the codex-env environment
conda activate codex-env                 # activates the codex-env environment
conda install openai                     # installs dependency - openai

pip install backoff
pip install edit_distance
# difflib is part of the Python standard library, so it needs no separate install
conda install matplotlib
conda install plotly
conda install scipy
conda install scikit-learn

pip install tenacity
pip install suffix-trees
pip install lizard
conda install gensim
pip install rank_bm25
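
After running the commands above, one quick way to confirm the environment is complete is to check that every dependency can be imported. The sketch below uses only the standard library. The helper `missing_modules` is hypothetical (not part of the framework), and the import names in the list (for example `sklearn` for scikit-learn and `suffix_trees` for suffix-trees) are our assumption about each package's module name.

```python
import importlib.util
from typing import List

# Assumed import names for the packages installed above (hypothetical mapping;
# a module name can differ from its pip/conda package name).
REQUIRED_MODULES = [
    "openai", "backoff", "edit_distance", "difflib", "matplotlib",
    "plotly", "scipy", "sklearn", "tenacity", "suffix_trees",
    "lizard", "gensim", "rank_bm25",
]


def missing_modules(names: List[str]) -> List[str]:
    """Return the top-level module names that cannot be found on sys.path."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules(REQUIRED_MODULES)
    print("Missing modules:", ", ".join(missing) if missing else "none")
```

Run it inside the activated codex-env environment; an empty result means every listed module was found.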

Set up environment for embedding search

We use sentence-transformers to generate embeddings for the code snippets.

conda create -n semantic-embedding --file embedding-prereq.txt
conda activate semantic-embedding   # activates the semantic-embedding environment
conda install -c pytorch faiss-cpu  # installs faiss-cpu

To install sentence-transformers, please follow the installation instructions in the sentence-transformers documentation. On Linux, as those instructions state, sentence-transformers installs with the following command:

pip install -U sentence-transformers

However, on a Mac with an Apple M1 chip, we had to run the following commands to get it installed:

conda list openmp
conda uninstall intel-openmp
conda install -c conda-forge sentence-transformers

Install vector database

We use vdblite (Vector Database Lite) for vector search. Details of the vdblite library can be found on its project page.

pip install vdblite
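
Conceptually, a vector database stores each embedding together with its metadata and answers a query by ranking the stored vectors by similarity to a query vector. The sketch below illustrates that idea with a hypothetical in-memory `TinyVectorStore` class and cosine similarity. It is not the vdblite API, and the toy 3-dimensional vectors stand in for real sentence-transformers embeddings.

```python
import math
from typing import Any, Dict, List, Tuple


def cosine_similarity(a: List[float], b: List[float]) -> float:
    """Cosine of the angle between two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


class TinyVectorStore:
    """Hypothetical in-memory vector store (illustration only, not vdblite)."""

    def __init__(self) -> None:
        self._items: List[Tuple[List[float], Dict[str, Any]]] = []

    def add(self, vector: List[float], metadata: Dict[str, Any]) -> None:
        # Keep the embedding alongside whatever identifies the code snippet.
        self._items.append((vector, metadata))

    def search(self, query: List[float], k: int = 5) -> List[Tuple[float, Dict[str, Any]]]:
        # Brute-force scan: score every stored vector, then return the k best.
        scored = [(cosine_similarity(query, vec), meta) for vec, meta in self._items]
        scored.sort(key=lambda pair: pair[0], reverse=True)
        return scored[:k]


store = TinyVectorStore()
store.add([1.0, 0.0, 0.0], {"id": "snippet-a"})
store.add([0.0, 1.0, 0.0], {"id": "snippet-b"})
store.add([0.9, 0.1, 0.0], {"id": "snippet-c"})
top = store.search([1.0, 0.0, 0.0], k=2)
print([meta["id"] for _, meta in top])  # ['snippet-a', 'snippet-c']
```

The linear scan is only meant to show what the ranking computes; a vector database or a faiss index exists precisely to avoid scoring every stored vector on large collections.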

Generate embeddings for the code snippets

Run the following command to generate embeddings for ATLAS.

python atlas_generate_embedding.py

Run the following command to generate embeddings for TFix.

python tfix_generate_embedding.py
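
The contents of `atlas_generate_embedding.py` and `tfix_generate_embedding.py` are not shown in this README. As a rough, hypothetical sketch of the kind of work such a script does, the code below embeds each snippet and serialises one JSON record per line. The function `toy_embed` is a deterministic stand-in (hashed token counts) so that the example runs without a model; in the actual setup the embeddings come from sentence-transformers, for example via a `SentenceTransformer` model's `encode` method. The record fields `id` and `vector` are assumptions, not the scripts' real output format.

```python
import hashlib
import json
import math
import re
from typing import Dict, Iterable, List

DIM = 16  # toy dimensionality; real sentence-transformers models output far larger vectors


def toy_embed(text: str, dim: int = DIM) -> List[float]:
    """Hypothetical stand-in embedder: hashed bag of tokens, L2-normalised."""
    vec = [0.0] * dim
    for token in re.findall(r"\w+", text):
        # md5 keeps bucket assignment stable across runs (unlike built-in hash()).
        bucket = int(hashlib.md5(token.encode("utf-8")).hexdigest(), 16) % dim
        vec[bucket] += 1.0
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm else vec


def to_jsonl(snippets: Iterable[Dict[str, str]]) -> str:
    """Embed each snippet's code and serialise one JSON record per line."""
    lines = [
        json.dumps({"id": s["id"], "vector": toy_embed(s["code"])})
        for s in snippets
    ]
    return "".join(line + "\n" for line in lines)


if __name__ == "__main__":
    examples = [
        {"id": "s1", "code": "assertEquals(expected, actual);"},
        {"id": "s2", "code": "assertTrue(list.isEmpty());"},
    ]
    print(to_jsonl(examples), end="")
```

Swapping `toy_embed` for a real model changes only the vectors; storing them line by line keeps each snippet's identifier next to its embedding for later indexing.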

Running evaluation

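Results from experiments are saved in the folder ./codex/framework/result-analysis/final-results/. For example, analyzing an ATLAS results file prints the exact-match and match counts along with their percentages: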

python evaluation/result_analysis_atlas.py ./results.csv
exact_match_count: 9021 match_count: 10368
exact match_count (%): 47.946, match_count(%): 55.105
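
Each percentage above is the corresponding count divided by the number of evaluated samples, times 100. The sketch below assumes that an exact match means the predicted string equals the reference string, and uses a hypothetical whitespace-insensitive comparison for `match_count`; the actual criterion in `result_analysis_atlas.py` may differ. The total of 18,815 samples is back-solved from the printed counts and percentages, not stated in this README.

```python
from typing import List, Tuple


def normalise(code: str) -> str:
    """Hypothetical lenient comparison: drop all whitespace."""
    return "".join(code.split())


def match_counts(predictions: List[str], references: List[str]) -> Tuple[int, int]:
    """Return (exact_match_count, match_count) over aligned prediction/reference pairs."""
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have the same length")
    exact = sum(p == r for p, r in zip(predictions, references))
    lenient = sum(normalise(p) == normalise(r) for p, r in zip(predictions, references))
    return exact, lenient


def percentage(count: int, total: int) -> float:
    """Count as a percentage of total, rounded to three decimals like the report."""
    return round(100.0 * count / total, 3)


preds = ["assertEquals(a, b);", "assertTrue( x );", "assertNull(y);"]
refs = ["assertEquals(a, b);", "assertTrue(x);", "assertNotNull(y);"]
print(match_counts(preds, refs))  # (1, 2)

# Reproduce the reported percentages from the printed counts, assuming 18,815 samples.
print(percentage(9021, 18815), percentage(10368, 18815))  # 47.946 55.105
```

Because every exact match is also a lenient match, `match_count` can never be smaller than `exact_match_count`, which is consistent with the figures printed above.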

Dataset acknowledgements

We use the ATLAS and TFix datasets for our experiments. For simplicity, we have included both datasets in this repository, and we thank their authors for making them publicly available.