Commit 3b02823

Author: yeqing.yq

Add multi-eval benchmarks such as codeTrans, codeCompletion, and codeDataScience.

1 parent: 949a34c

7,598 files changed: 9,779 additions, 397,828 deletions

‎.gitignore

+80 −15 lines changed (80 additions & 15 deletions)

@@ -1,10 +1,59 @@
+# OS generated files
+.DS_Store
+
+# IntelliJ specific files/directories
+out
+.idea
+**/.idea/*
+*.ipr
+*.iws
+*.iml
+.factorypath
+atlassian-ide-plugin.xml
+
+# Eclipse specific files/directories
+.classpath
+.project
+.settings
+.metadata
+
+# NetBeans specific files/directories
+.nbattrs
+
+# VSCode specific files/directories
+.vscode/
+
+# Logs
+logs/
+*.log
+*.log.*
+
 # Byte-compiled / optimized / DLL files
 **/__pycache__
 *.py[cod]
 *$py.class
 
-# C extensions
-*.so
+# Distribution / packaging
+.Python
+build/
+develop-eggs/
+dist/
+downloads/
+eggs/
+.eggs/
+lib/
+lib64/
+parts/
+sdist/
+var/
+wheels/
+pip-wheel-metadata/
+share/python-wheels/
+*.egg-info/
+.installed.cfg
+*.egg
+MANIFEST
+*.iml
 
 # PyInstaller
 # Usually these files are written by a python script from a template
@@ -44,15 +93,42 @@ target/
 
 # Jupyter Notebook
 .ipynb_checkpoints
+notebooks/
 
 # IPython
 profile_default/
 ipython_config.py
 
+# pyenv
+.python-version
+
+# pipenv
+# According to pypa/pipenv#598, it is recommended to include Pipfile.lock in version control.
+# However, in case of collaboration, if having platform-specific dependencies or dependencies
+# having no cross-platform support, pipenv may install dependencies that don't work, or not
+# install all needed dependencies.
+#Pipfile.lock
+
+# PEP 582; used by e.g. github.com/David-OConnor/pyflow
+__pypackages__/
+
+# Celery stuff
+celerybeat-schedule
+celerybeat.pid
+
+# SageMath parsed files
+*.sage.py
 
 # Environments
 .env
+.envrc
 .venv
+.venvs
+env/
+venv/
+ENV/
+env.bak/
+venv.bak/
 
 # Spyder project settings
 .spyderproject
@@ -72,16 +148,5 @@ dmypy.json
 # Pyre type checker
 .pyre/
 
-# pytype static type analyzer
-.pytype/
-
-# Cython debug symbols
-cython_debug/
-
-# PyCharm
-# JetBrains specific template is maintained in a separate JetBrains.gitignore that can
-# be found at https://github.com/github/gitignore/blob/main/Global/JetBrains.gitignore
-# and can be added to the global gitignore or merged into this file. For a more nuclear
-# option (not recommended) you can uncomment the following to ignore the entire idea folder.
-.idea/
-.DS_Store
+# asdf tool versions
+.tool-versions
‎README.md

+113 −20 lines changed (113 additions & 20 deletions)

@@ -1,31 +1,66 @@
 # CodeFuseEval: Multi-tasking Evaluation Benchmark for Code Large Language Model
 
-<div align="center">
+![img](./figures/logo.png)
 
-<p>
-    <a href="https://github.com/codefuse-ai/codefuse-evaluation">
-        <img alt="stars" src="https://img.shields.io/github/stars/codefuse-ai/codefuse-evaluation?style=social" />
-    </a>
-    <a href="https://github.com/codefuse-ai/codefuse-evaluation">
-        <img alt="forks" src="https://img.shields.io/github/forks/codefuse-ai/codefuse-evaluation?style=social" />
-    </a>
-    <a href="https://github.com/codefuse-ai/codefuse-evaluation/issues">
-        <img alt="Open Issues" src="https://img.shields.io/github/issues-raw/codefuse-ai/codefuse-evaluation" />
-    </a>
-</p>
+CodeFuseEval is a code-generation benchmark that combines the multi-tasking scenarios of the CodeFuse model with the HumanEval-x and MBPP benchmarks. It is designed to evaluate model performance on a variety of tasks, including code completion, code generation from natural language, test-case generation, cross-language code translation, and code generation from Chinese commands, among others. Continuously open; stay tuned!
 
-[中文](README_CN.md) **** **English**
 
-</div>
-
-CodeFuseEval is a Code Generation benchmark that combines the multi-tasking scenarios of CodeFuse Model with the benchmarks of HumanEval-x and MBPP. This benchmark is designed to evaluate the performance of models in various multi-tasking tasks, including code completion, code generation from natural language, test case generation, cross-language code translation, and code generation from Chinese commands, among others.
+🌐 <a href="README_CN.md" target="_blank">中文</a>
 
+![img](./figures/EnglishIntroduction.png)
 
 ## Generation environment:
 CodeFuse-13B: Python 3.8 or above, PyTorch 1.12 or above (2.0 or above recommended), Transformers 4.24.0 or above, CUDA 11.4 or above (relevant for GPU and flash-attention users).
 
 CodeFuse-CodeLlama-34B: python>=3.8, pytorch>=2.0.0, transformers==4.32.0, Sentencepiece, CUDA 11.
 
+### Generation Processor:
+We designed an infrastructure called Processor to handle the differences between models. A Processor needs to implement three abstract functions:
+* ``load_model_tokenizer``: models differ in their loading parameters and tokenizer terminators, so each model must be loaded and adapted with its own parameters. This function helps users load and adapt different models.
+* ``process_before``: prompts must be adapted to different styles depending on the evaluation task type and the model the user selects, so this function is extracted to help users preprocess prompts.
+* ``process_after``: model outputs vary widely, so to fit the evaluation framework the generated results must be spliced into suitable test cases for automated execution. This function post-processes the generated results, based on the task type and dataset, so they match the evaluation data and expected results.
+
+
+We also modified the relevant ckpt_config configuration used to drive the evaluation. For example:
+```commandline
+{
+  "CodeFuse-13B": {
+    "path": "/mnt/user/294761/bigcode/CodeFuse13B-evol-instruction-4K/", // model path
+    "processor_class": "codefuseEval.process.codefuse13b.Codefuse13BProcessor", // processor path (please create the file in "codefuseEval.process")
+    "tokenizer": {
+      "truncation": true,
+      "padding": true,
+      "max_length": 600
+    }, // params for the tokenizer to encode input prompts
+    "generation_config": { // use the "decode_mode" param to choose your own decoding setup; define each decode mode as a JSON object. Non-object entries are read directly into the generation config
+      "greedy": {
+        "do_sample": false,
+        "num_beams": 1,
+        "max_new_tokens": 512
+      },
+      "beams": {
+        "do_sample": false,
+        "num_beams": 5,
+        "max_new_tokens": 600,
+        "num_return_sequences": 1
+      },
+      "dosample": {
+        "do_sample": true
+      },
+      "temperature": 0.2,
+      "max_new_tokens": 600,
+      "num_return_sequences": 1,
+      "top_p": 0.9,
+      "num_beams": 1,
+      "do_sample": true
+    },
+    "task_mode": "code_completion", // currently supports four kinds: [code_completion, nl2code, code_trans, codescience]; if your eval dataset covers several tasks, set task_mode to get suitable processing
+    "batch_size": 1,
+    "sample_num": 1,
+    "decode_mode": "beams" // the configuration of the chosen decoding mode is merged into the generation config
+  }
+```
+
 ## Generation Command:
 
 ```
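A note on the Processor hunk above: the three abstract functions amount to a small per-model adapter. The sketch below is illustrative only; the class name, method signatures, and the Hugging Face ``transformers`` loading shown here are assumptions rather than the repository's actual interface, so consult the base class in ``codefuseEval.process`` for the real contract.

```python
# Illustrative sketch of a per-model Processor; class name, signatures, and the
# transformers-based loading are assumptions, not the repository's actual API.
from transformers import AutoModelForCausalLM, AutoTokenizer


class MyModelProcessor:  # hypothetical class referenced via "processor_class" in ckpt_config
    def load_model_tokenizer(self, path):
        """Load and adapt a specific model and its tokenizer."""
        tokenizer = AutoTokenizer.from_pretrained(path, trust_remote_code=True)
        model = AutoModelForCausalLM.from_pretrained(path, trust_remote_code=True)
        model.eval()
        return model, tokenizer

    def process_before(self, prompt, task_mode):
        """Rewrite the raw prompt into the style this model/task expects."""
        if task_mode == "code_trans":
            return f"Translate the following code:\n{prompt}"
        return prompt

    def process_after(self, generation, task_mode):
        """Post-process the raw generation so it can be spliced into a runnable test case."""
        # e.g. drop a trailing end-of-text marker left by the model
        return generation.replace("<|endoftext|>", "").rstrip()
```

Per the config comments in the hunk, the ``decode_mode`` value (for example ``"beams"``) selects which JSON object inside ``generation_config`` is merged into the final generation parameters, while non-object entries such as ``temperature`` and ``top_p`` apply directly.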
@@ -35,7 +70,15 @@ eg:
 bash codefuseEval/script/generation.sh CodeFuse-13B humaneval_python result/test.jsonl python
 ```
 
-## How to use codefuseEval
+If you want to test code translation, the language argument is the source language. For example,
+to translate C++ code into Python:
+
+```bash
+bash codefuseEval/script/generation.sh CodeFuse-CodeLlama-34B codeTrans_cpp_to_python result/test.jsonl cpp
+```
+
+
+## How to use CodeFuseEval
 
 ### Evaluation Data
 Data are stored in ``codefuseEval/data``, using JSON list format. We first integrated the humaneval-X dataset.
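Since the hunk above describes the evaluation data as JSON-list files under ``codefuseEval/data``, here is a minimal sketch of reading such a file; the file name used below is a hypothetical example, and no specific record fields are assumed.

```python
# Minimal sketch: read a JSON-list dataset file (one JSON object per line).
# The path below is an illustrative assumption, not a documented file name.
import json


def load_jsonl(path):
    """Yield one record (a dict) per non-empty line."""
    with open(path, encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if line:
                yield json.loads(line)


records = list(load_jsonl("codefuseEval/data/humaneval_python.jsonl"))  # hypothetical file name
print(len(records), sorted(records[0].keys()))  # inspect the available fields
```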
@@ -56,16 +99,17 @@ Data are stored in ``codefuseEval/data``, using JSON list format. We first integ
 The evaluation of the generated codes involves compiling and running in multiple programming languages. The versions of the programming language environments and packages we use are as follows:
 
 | Dependency | Version |
-| ---------- | -------- |
-| Python | 3.8.12 |
+| ---------- |----------|
+| Python | 3.10.9 |
 | JDK | 18.0.2.1 |
 | Node.js | 16.14.0 |
 | js-md5 | 0.7.3 |
 | C++ | 11 |
 | g++ | 7.5.0 |
-| Boost | 1.71.0 |
+| Boost | 1.75.0 |
 | OpenSSL | 3.0.0 |
 | go | 1.18.4 |
+| cargo | 1.71.1 |
 
 In order to save everyone the trouble of setting up the environments for these languages, we create a Docker image with the required environments and codefuseEval.
 ```bash
@@ -131,6 +175,7 @@ bash codefuseEval/script/check_reference.sh codefuseEval/result/CodeFuse-13B/hum
 ```
 
 # Check dataset Command:
+CodeCompletion
 ```bash
 bash codefuseEval/script/check_dataset.sh humaneval_python
 
@@ -144,5 +189,53 @@ bash codefuseEval/script/check_dataset.sh humaneval_go
 
 bash codefuseEval/script/check_dataset.sh humaneval_cpp
 ```
+NL2Code
+```bash
+bash codefuseEval/script/check_dataset.sh mbpp
+```
+CodeTrans
+```
+bash codefuseEval/script/check_dataset.sh codeTrans_python_to_java
+
+bash codefuseEval/script/check_dataset.sh codeTrans_python_to_cpp
+
+bash codefuseEval/script/check_dataset.sh codeTrans_cpp_to_java
+
+bash codefuseEval/script/check_dataset.sh codeTrans_cpp_to_python
+
+bash codefuseEval/script/check_dataset.sh codeTrans_java_to_python
+
+bash codefuseEval/script/check_dataset.sh codeTrans_java_to_cpp
+```
+CodeScience
+```
+bash codefuseEval/script/check_dataset.sh codeCompletion_matplotlib
+
+bash codefuseEval/script/check_dataset.sh codeCompletion_numpy
+
+bash codefuseEval/script/check_dataset.sh codeCompletion_pandas
+
+bash codefuseEval/script/check_dataset.sh codeCompletion_pytorch
+
+bash codefuseEval/script/check_dataset.sh codeCompletion_scipy
+
+bash codefuseEval/script/check_dataset.sh codeCompletion_sklearn
+
+bash codefuseEval/script/check_dataset.sh codeCompletion_tensorflow
+
+bash codefuseEval/script/check_dataset.sh codeInsertion_matplotlib
+
+bash codefuseEval/script/check_dataset.sh codeInsertion_numpy
+
+bash codefuseEval/script/check_dataset.sh codeInsertion_pandas
+
+bash codefuseEval/script/check_dataset.sh codeInsertion_pytorch
+
+bash codefuseEval/script/check_dataset.sh codeInsertion_scipy
+
+bash codefuseEval/script/check_dataset.sh codeInsertion_sklearn
+
+bash codefuseEval/script/check_dataset.sh codeInsertion_tensorflow
+```
 
 