SoTaNa: The Open-Source Software Development Assistant

Environment

conda create -n sotana python=3.9 -y
conda activate sotana 
pip install datasets==2.11.0 loralib==0.1.1 sentencepiece==0.1.97 
pip install bitsandbytes==0.37.2 torch==2.0.0 gradio==3.20.1 nltk==3.8.1
pip install prettytable==3.7.0 wandb==0.14.2 fire==0.5.0
pip install openai==0.27.9
pip install git+https://github.com/huggingface/peft.git@e536616888d51b453ed354a6f1e243fecb02ea08
pip install git+https://github.com/huggingface/transformers.git@fe1f5a639d93c9272856c670cff3b0e1a10d5b2b

Data Generation

cd data-generation
bash generation_data.sh

The generated data is saved in data-generation/output/100000. Because of the upload size limit, the data is split into data_0.json and data_1.json. Run python merge_data.py to merge them.
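For readers who want to see what merging the two shards involves, here is a minimal sketch. It is not the repository's merge_data.py, whose contents are not shown here. It assumes each shard is a JSON array of records; the function name merge_shards and the output filename are hypothetical.

```python
import json


def merge_shards(shard_paths, output_path):
    """Concatenate JSON-array shards into a single JSON array file.

    Assumption: every shard file holds one top-level JSON list.
    Returns the number of records written.
    """
    merged = []
    for path in shard_paths:
        with open(path, encoding="utf-8") as f:
            records = json.load(f)
        if not isinstance(records, list):
            raise ValueError(f"{path} does not contain a JSON array")
        merged.extend(records)

    with open(output_path, "w", encoding="utf-8") as f:
        json.dump(merged, f, ensure_ascii=False, indent=2)
    return len(merged)


if __name__ == "__main__":
    import os

    # Hypothetical output name; the shard names come from the text above.
    shards = ["data_0.json", "data_1.json"]
    if all(os.path.exists(p) for p in shards):
        total = merge_shards(shards, "data_merged.json")
        print(f"Merged {total} records")
```

If the shards turn out to use a different layout (for example, one JSON object per line), the loading step would need to change accordingly.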

Parameter-Efficient Fine-tuning

cd fine-tuning
wandb login
model_size=7
epoch=5
bash fine-tuning.sh ${model_size} ${epoch}

The detailed training information is shown below.

Model	# LLaMA Param.	# LoRA Param.	Training Time
SoTaNa-7B	7B	8.4M	25h35m
SoTaNa-13B	13B	13.1M	39h10m
SoTaNa-30B	30B	25.6M	48h02m
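To put the table in perspective, the short calculation below derives the share of parameters that are trained in each configuration. It uses only the figures from the table; the helper name lora_fraction is introduced here for illustration, and the base sizes are taken as the nominal 7B, 13B, and 30B.

```python
# Figures copied from the table: (nominal LLaMA parameters, LoRA parameters).
MODELS = {
    "SoTaNa-7B": (7e9, 8.4e6),
    "SoTaNa-13B": (13e9, 13.1e6),
    "SoTaNa-30B": (30e9, 25.6e6),
}


def lora_fraction(base_params, lora_params):
    """Return trainable LoRA parameters as a fraction of the base model size."""
    return lora_params / base_params


if __name__ == "__main__":
    for name, (base, lora) in MODELS.items():
        print(f"{name}: {lora_fraction(base, lora):.3%} of the base parameter count")
```

In every case the LoRA parameters come to roughly 0.1% of the base model's parameter count.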

Inference

Stack Overflow Question-Answering

Obtain the Answers

cd inference/stackoverflow-question-answering
model_size=7
bash inference.sh ${model_size}

Evaluation

python evaluation.py --refs_filename xxx --preds_filename xxx

Code Generation

Obtain the Results

cd inference/code-generation
model_size=7
bash inference.sh ${model_size}

Evaluation

cd inference/code-generation
python evaluation.py --preds_filename xxx

Code Summarization

Obtain the Results

cd inference/code-summarization
model_size=7
bash inference.sh ${model_size}

Evaluation

python evaluation.py --refs_filename xxx --preds_filename xxx

KNU-Utility-Software/SoTaNa-Software-Development-Assistant

Folders and files

Latest commit

History

Repository files navigation

SoTaNa: The Open-Source Software Development Assistant

Environment

Data Generation

Parameter-Efficient Fine-tuning

Inference

Stack Overflow Question-Answering

Obtain the Answering

Evaluation

Code Genereation

Obtain the Results

Evaluation

Code Summarization

Obtain the Results

Evaluation

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages