Commit f90cd62

Author: jimmy.xj
Commit message: Update README.md
1 parent c8b1ee1 commit f90cd62

2 files changed: +97 -17 lines changed

‎README.md

49 additions & 9 deletions
@@ -26,6 +26,8 @@ DevOps-Eval is a comprehensive evaluation suite specifically designed for founda
 ## 📜 Table of Contents
 
 - [🏆 Leaderboard](#-leaderboard)
+- [👀 DevOps](#-devops)
+- [🔥 AIOps](#-aiops)
 - [⏬ Data](#-data)
 - [👀 Notes](#-notes)
 - [🔥 AIOps Sample Example](#-aiops-sample-example)
@@ -36,19 +38,19 @@ DevOps-Eval is a comprehensive evaluation suite specifically designed for founda
 
 ## 🏆 Leaderboard
 Below are zero-shot and five-shot accuracies from the models that we evaluate in the initial release. We note that five-shot performance is better than zero-shot for many instruction-tuned models.
-
+### DevOps
 #### Zero Shot
 
 | **ModelName** | plan | code | build | test | release | deploy | operate | monitor | **AVG** |
 |:------------------------:|:-----:|:-----:|:-----:|:------:|:--------:|:------:|:-------:|:--------:|:-----------:|
-| **DevOps-Model-14B-Chat** | 60.61 | 78.35 | 84.86 | 84.65 | 87.26 | 82.75 | 81.34 | 79.17 | **80.34** |
-| **DevOps-Model-14B-Base** | 54.55 | 77.82 | 83.49 | 85.96 | 86.32 | 81.96 | 85.82 | 82.41 | **80.26** |
+| **DevOpsPal-14B-Chat** | 60.61 | 78.35 | 84.86 | 84.65 | 87.26 | 82.75 | 81.34 | 79.17 | **80.34** |
+| **DevOpsPal-14B-Base** | 54.55 | 77.82 | 83.49 | 85.96 | 86.32 | 81.96 | 85.82 | 82.41 | **80.26** |
 | Qwen-14B-Chat | 60.61 | 75.4 | 85.32 | 84.21 | 89.62 | 82.75 | 83.58 | 80.56 | 79.28 |
 | Qwen-14B-Base | 57.58 | 73.81 | 84.4 | 85.53 | 86.32 | 81.18 | 82.09 | 80.09 | 77.92 |
 | Baichuan2-13B-Base | 60.61 | 69.42 | 79.82 | 79.82 | 82.55 | 81.18 | 85.07 | 83.8 | 75.10 |
 | Baichuan2-13B-Chat | 60.61 | 68.43 | 77.98 | 80.7 | 81.6 | 83.53 | 82.09 | 84.72 | 74.60 |
-| **DevOps-Model-7B-Chat** | 54.55 | 69.11 | 83.94 | 82.02 | 76.89 | 80 | 79.85 | 77.78 | **74.00** |
-| **DevOps-Model-7B-Base** | 54.55 | 68.96 | 82.11 | 78.95 | 80.66 | 76.47 | 79.85 | 78.7 | **73.55** |
+| **DevOpsPal-7B-Chat** | 54.55 | 69.11 | 83.94 | 82.02 | 76.89 | 80 | 79.85 | 77.78 | **74.00** |
+| **DevOpsPal-7B-Base** | 54.55 | 68.96 | 82.11 | 78.95 | 80.66 | 76.47 | 79.85 | 78.7 | **73.55** |
 | Qwen-7B-Base | 53.03 | 68.13 | 78.9 | 75.44 | 80.19 | 80 | 83.58 | 80.09 | 73.13 |
 | Qwen-7B-Chat | 57.58 | 66.01 | 80.28 | 79.82 | 76.89 | 77.65 | 80.6 | 79.17 | 71.96 |
 | Baichuan2-7B-Chat | 54.55 | 63.66 | 77.98 | 76.32 | 71.7 | 73.33 | 75.37 | 79.63 | 68.17 |
@@ -61,21 +63,59 @@ Below are zero-shot and five-shot accuracies from the models that we evaluate in
 
 | **ModelName** | plan | code | build | test | release | deploy | operate | monitor | **AVG** |
 |:------------------------:|:-----:|:-----:|:-----:|:------:|:--------:|:------:|:-------:|:--------:|:---------:|
-| **DevOps-Model-14B-Chat** |63.64 | 79.49 | 81.65 | 85.96 | 86.79 | 86.67 | 89.55 | 81.48 | **81.77** |
-| **DevOps-Model-14B-Base** | 62.12 | 80.55 | 82.57 | 85.53 | 85.85 | 84.71 | 85.07 | 80.09 | **81.70** |
+| **DevOpsPal-14B-Chat** |63.64 | 79.49 | 81.65 | 85.96 | 86.79 | 86.67 | 89.55 | 81.48 | **81.77** |
+| **DevOpsPal-14B-Base** | 62.12 | 80.55 | 82.57 | 85.53 | 85.85 | 84.71 | 85.07 | 80.09 | **81.70** |
 | Qwen-14B-Chat | 65.15 | 76 | 82.57 | 85.53 | 84.91 | 84.31 | 85.82 | 81.48 | 79.55 |
 | Qwen-14B-Base | 66.67 | 76.15 | 84.4 | 85.53 | 86.32 | 80.39 | 86.57 | 80.56 | 79.51 |
 | Baichuan2-13B-Base | 63.64 | 71.39 | 80.73 | 82.46 | 81.13 | 84.31 | 91.79 | 85.19 | 77.09 |
 | Qwen-7B-Base | 75.76 | 72.52 | 78.9 | 81.14 | 83.96 | 81.18 | 85.07 | 81.94 | 77.02 |
 | Baichuan2-13B-Chat | 62.12 | 69.95 | 76.61 | 84.21 | 83.49 | 79.61 | 88.06 | 80.56 | 75.32 |
-| **DevOps-Model-7B-Chat** | 66.67 | 69.95 | 83.94 | 81.14 | 80.19 | 82.75 | 82.84 | 76.85 | **75.25** |
-| **DevOps-Model-7B-Base** | 69.7 | 69.49 | 82.11 | 81.14 | 82.55 | 82.35 | 80.6 | 79.17 | **75.17** |
+| **DevOpsPal-7B-Chat** | 66.67 | 69.95 | 83.94 | 81.14 | 80.19 | 82.75 | 82.84 | 76.85 | **75.25** |
+| **DevOpsPal-7B-Base** | 69.7 | 69.49 | 82.11 | 81.14 | 82.55 | 82.35 | 80.6 | 79.17 | **75.17** |
 | Qwen-7B-Chat | 65.15 | 66.54 | 82.57 | 81.58 | 81.6 | 81.18 | 80.6 | 81.02 | 73.62 |
 | Baichuan2-7B-Base | 60.61 | 67.22 | 76.61 | 75 | 77.83 | 78.43 | 80.6 | 79.63 | 72.11 |
 | Internlm-7B-Chat | 60.61 | 63.06 | 79.82 | 80.26 | 67.92 | 75.69 | 73.88 | 77.31 | 71.09 |
 | Baichuan2-7B-Chat | 60.61 | 64.95 | 81.19 | 75.88 | 71.23 | 75.69 | 78.36 | 79.17 | 70.49 |
 | Internlm-7B-Base | 62.12 | 65.25 | 77.52 | 80.7 | 74.06 | 78.82 | 79.85 | 75.46 | 69.17 |
 
+### AIOps
+#### Zero Shot
+| **ModelName** | LogParsing | RootCauseAnalysis | TimeSeriesAnomalyDetection | TimeSeriesClassification | **AVG** |
+|:-------------------:|:------------:|:------------------:|:---------------------------:|:-------------------------:|:-------:|
+| Qwen-14B-Base | 66.29 | 58.8 | 25.33 | 43.5 | 49.27 |
+| DevOpsPal-14B-Base | 63.14 | 53.6 | 23.33 | 43.5 | 46.55 |
+| DevOpsPal-14B-Chat | 60 | 56 | 24 | 43 | 46.18 |
+| Qwen-14B-Chat | 64.57 | 51.6 | 22.67 | 36 | 45 |
+| Qwen-7B-Base | 50 | 39.2 | 22.67 | 54 | 40.82 |
+| Qwen-7B-Chat | 57.43 | 38.8 | 22.33 | 39.5 | 40.36 |
+| DevOpsPal-7B-Chat | 56.57 | 30.4 | 25.33 | 45 | 40 |
+| Baichuan2-13B-Chat | 64 | 18 | 21.33 | 37.5 | 37.09 |
+| Baichuan2-7B-Chat | 60.86 | 10 | 28 | 34.5 | 35.55 |
+| Baichuan2-7B-Base | 53.43 | 12.8 | 27.67 | 36.5 | 34.09 |
+| Internlm-7B-Base | 48.57 | 18.8 | 23.33 | 37.5 | 32.91 |
+| Baichuan2-13B-Base | 54 | 12.4 | 23 | 34.5 | 32.55 |
+| DevOpsPal-7B-Base | 46.57 | 20.8 | 25 | 34 | 32.55 |
+| Internlm-7B-Chat | 58.86 | 8.8 | 22.33 | 28.5 | 32 |
+
+#### One Shot
+| **ModelName** | LogParsing | RootCauseAnalysis | TimeSeriesAnomalyDetection | TimeSeriesClassification | **AVG** |
+|:-------------------:|:------------:|:------------------:|:---------------------------:|:-------------------------:|:-------:|
+| DevOpsPal-14B-Chat | 66.29 | 80.8 | 23.33 | 44.5 | 53.91 |
+| Qwen-14B-Base | 64.29 | 74.4 | 28 | 48.5 | 53.82 |
+| DevOpsPal-14B-Base | 60 | 74 | 25.33 | 43.5 | 50.73 |
+| Qwen-14B-Chat | 49.71 | 65.6 | 28.67 | 48 | 47.27 |
+| Qwen-7B-Base | 56 | 60.8 | 27.67 | 44 | 47.18 |
+| DevOpsPal-7B-Base | 52.86 | 44.4 | 28 | 44.5 | 42.64 |
+| Qwen-7B-Chat | 54.57 | 52 | 29.67 | 26.5 | 42.09 |
+| Baichuan2-13B-Base | 56 | 43.2 | 24.33 | 41 | 41.73 |
+| Baichuan2-13B-Chat | 57.43 | 44.4 | 25 | 25.5 | 39.82 |
+| Baichuan2-7B-Base | 48.29 | 40.4 | 27 | 42 | 39.55 |
+| Baichuan2-7B-Chat | 58.57 | 31.6 | 27 | 31.5 | 38.91 |
+| DevOpsPal-7B-Chat | 56.57 | 27.2 | 25.33 | 41.5 | 38.64 |
+| Internlm-7B-Base | 48 | 33.2 | 29 | 35 | 37.09 |
+| Internlm-7B-Chat | 62.57 | 12.8 | 22.33 | 21 | 32.73 |
+
+
 ## ⏬ Data
 #### Download
 * Method 1: Download the zip file (you can also simply open the following link with the browser):

‎README_zh.md

48 additions & 8 deletions
@@ -26,6 +26,8 @@ DevOps-Eval is a comprehensive evaluation dataset designed specifically for large models in the DevOps domain
 ## 📜 Table of Contents
 
 - [🏆 Leaderboard](#-排行榜)
+- [👀 DevOps](#-devops)
+- [🔥 AIOps](#-aiops)
 - [⏬ Data](#-数据)
 - [👀 Notes](#-说明)
 - [🔥 AIOps Sample Example](#-AIOps样本示例)
@@ -41,14 +43,14 @@ DevOps-Eval is a comprehensive evaluation dataset designed specifically for large models in the DevOps domain
 
 | **Model** | plan | code | build | test | release | deploy | operate | monitor | **Average** |
 |:------------------------:|:-----:|:-----:|:-----:|:------:|:--------:|:------:|:-------:|:--------:|:---------:|
-| **DevOps-Model-14B-Chat** | 60.61 | 78.35 | 84.86 | 84.65 | 87.26 | 82.75 | 81.34 | 79.17 | **80.34** |
-| **DevOps-Model-14B-Base** | 54.55 | 77.82 | 83.49 | 85.96 | 86.32 | 81.96 | 85.82 | 82.41 | **80.26** |
+| **DevOpsPal-14B-Chat** | 60.61 | 78.35 | 84.86 | 84.65 | 87.26 | 82.75 | 81.34 | 79.17 | **80.34** |
+| **DevOpsPal-14B-Base** | 54.55 | 77.82 | 83.49 | 85.96 | 86.32 | 81.96 | 85.82 | 82.41 | **80.26** |
 | Qwen-14B-Chat | 60.61 | 75.4 | 85.32 | 84.21 | 89.62 | 82.75 | 83.58 | 80.56 | 79.28 |
 | Qwen-14B-Base | 57.58 | 73.81 | 84.4 | 85.53 | 86.32 | 81.18 | 82.09 | 80.09 | 77.92 |
 | Baichuan2-13B-Base | 60.61 | 69.42 | 79.82 | 79.82 | 82.55 | 81.18 | 85.07 | 83.8 | 75.10 |
 | Baichuan2-13B-Chat | 60.61 | 68.43 | 77.98 | 80.7 | 81.6 | 83.53 | 82.09 | 84.72 | 74.60 |
-| **DevOps-Model-7B-Chat** | 54.55 | 69.11 | 83.94 | 82.02 | 76.89 | 80 | 79.85 | 77.78 | **74.00** |
-| **DevOps-Model-7B-Base** | 54.55 | 68.96 | 82.11 | 78.95 | 80.66 | 76.47 | 79.85 | 78.7 | **73.55** |
+| **DevOpsPal-7B-Chat** | 54.55 | 69.11 | 83.94 | 82.02 | 76.89 | 80 | 79.85 | 77.78 | **74.00** |
+| **DevOpsPal-7B-Base** | 54.55 | 68.96 | 82.11 | 78.95 | 80.66 | 76.47 | 79.85 | 78.7 | **73.55** |
 | Qwen-7B-Base | 53.03 | 68.13 | 78.9 | 75.44 | 80.19 | 80 | 83.58 | 80.09 | 73.13 |
 | Qwen-7B-Chat | 57.58 | 66.01 | 80.28 | 79.82 | 76.89 | 77.65 | 80.6 | 79.17 | 71.96 |
 | Baichuan2-7B-Chat | 54.55 | 63.66 | 77.98 | 76.32 | 71.7 | 73.33 | 75.37 | 79.63 | 68.17 |
@@ -61,21 +63,59 @@ DevOps-Eval is a comprehensive evaluation dataset designed specifically for large models in the DevOps domain
 
 | **Model** | plan | code | build | test | release | deploy | operate | monitor | **Average** |
 |:------------------------:|:-----:|:-----:|:-----:|:------:|:--------:|:------:|:-------:|:--------:|:---------:|
-| **DevOps-Model-14B-Chat** |63.64 | 79.49 | 81.65 | 85.96 | 86.79 | 86.67 | 89.55 | 81.48 | **81.77** |
-| **DevOps-Model-14B-Base** | 62.12 | 80.55 | 82.57 | 85.53 | 85.85 | 84.71 | 85.07 | 80.09 | **81.70** |
+| **DevOpsPal-14B-Chat** |63.64 | 79.49 | 81.65 | 85.96 | 86.79 | 86.67 | 89.55 | 81.48 | **81.77** |
+| **DevOpsPal-14B-Base** | 62.12 | 80.55 | 82.57 | 85.53 | 85.85 | 84.71 | 85.07 | 80.09 | **81.70** |
 | Qwen-14B-Chat | 65.15 | 76 | 82.57 | 85.53 | 84.91 | 84.31 | 85.82 | 81.48 | 79.55 |
 | Qwen-14B-Base | 66.67 | 76.15 | 84.4 | 85.53 | 86.32 | 80.39 | 86.57 | 80.56 | 79.51 |
 | Baichuan2-13B-Base | 63.64 | 71.39 | 80.73 | 82.46 | 81.13 | 84.31 | 91.79 | 85.19 | 77.09 |
 | Qwen-7B-Base | 75.76 | 72.52 | 78.9 | 81.14 | 83.96 | 81.18 | 85.07 | 81.94 | 77.02 |
 | Baichuan2-13B-Chat | 62.12 | 69.95 | 76.61 | 84.21 | 83.49 | 79.61 | 88.06 | 80.56 | 75.32 |
-| **DevOps-Model-7B-Chat** | 66.67 | 69.95 | 83.94 | 81.14 | 80.19 | 82.75 | 82.84 | 76.85 | **75.25** |
-| **DevOps-Model-7B-Base** | 69.7 | 69.49 | 82.11 | 81.14 | 82.55 | 82.35 | 80.6 | 79.17 | **75.17** |
+| **DevOpsPal-7B-Chat** | 66.67 | 69.95 | 83.94 | 81.14 | 80.19 | 82.75 | 82.84 | 76.85 | **75.25** |
+| **DevOpsPal-7B-Base** | 69.7 | 69.49 | 82.11 | 81.14 | 82.55 | 82.35 | 80.6 | 79.17 | **75.17** |
 | Qwen-7B-Chat | 65.15 | 66.54 | 82.57 | 81.58 | 81.6 | 81.18 | 80.6 | 81.02 | 73.62 |
 | Baichuan2-7B-Base | 60.61 | 67.22 | 76.61 | 75 | 77.83 | 78.43 | 80.6 | 79.63 | 72.11 |
 | Internlm-7B-Chat | 60.61 | 63.06 | 79.82 | 80.26 | 67.92 | 75.69 | 73.88 | 77.31 | 71.09 |
 | Baichuan2-7B-Chat | 60.61 | 64.95 | 81.19 | 75.88 | 71.23 | 75.69 | 78.36 | 79.17 | 70.49 |
 | Internlm-7B-Base | 62.12 | 65.25 | 77.52 | 80.7 | 74.06 | 78.82 | 79.85 | 75.46 | 69.17 |
 
+
+### AIOps
+#### Zero Shot
+| **Model** | Log Parsing | Root Cause Analysis | Time Series Anomaly Detection | Time Series Classification | **Average** |
+|:-------------------:|:-----:|:----:|:------:|:----:|:-------:|
+| Qwen-14B-Base | 66.29 | 58.8 | 25.33 | 43.5 | 49.27 |
+| DevOpsPal-14B-Base | 63.14 | 53.6 | 23.33 | 43.5 | 46.55 |
+| DevOpsPal-14B-Chat | 60 | 56 | 24 | 43 | 46.18 |
+| Qwen-14B-Chat | 64.57 | 51.6 | 22.67 | 36 | 45 |
+| Qwen-7B-Base | 50 | 39.2 | 22.67 | 54 | 40.82 |
+| Qwen-7B-Chat | 57.43 | 38.8 | 22.33 | 39.5 | 40.36 |
+| DevOpsPal-7B-Chat | 56.57 | 30.4 | 25.33 | 45 | 40 |
+| Baichuan2-13B-Chat | 64 | 18 | 21.33 | 37.5 | 37.09 |
+| Baichuan2-7B-Chat | 60.86 | 10 | 28 | 34.5 | 35.55 |
+| Baichuan2-7B-Base | 53.43 | 12.8 | 27.67 | 36.5 | 34.09 |
+| Internlm-7B-Base | 48.57 | 18.8 | 23.33 | 37.5 | 32.91 |
+| Baichuan2-13B-Base | 54 | 12.4 | 23 | 34.5 | 32.55 |
+| DevOpsPal-7B-Base | 46.57 | 20.8 | 25 | 34 | 32.55 |
+| Internlm-7B-Chat | 58.86 | 8.8 | 22.33 | 28.5 | 32 |
+
+#### One Shot
+| **Model** | Log Parsing | Root Cause Analysis | Time Series Anomaly Detection | Time Series Classification | **Average** |
+|:-------------------:|:------------:|:------------------:|:---------------------------:|:-------------------------:|:-------:|
+| DevOpsPal-14B-Chat | 66.29 | 80.8 | 23.33 | 44.5 | 53.91 |
+| Qwen-14B-Base | 64.29 | 74.4 | 28 | 48.5 | 53.82 |
+| DevOpsPal-14B-Base | 60 | 74 | 25.33 | 43.5 | 50.73 |
+| Qwen-14B-Chat | 49.71 | 65.6 | 28.67 | 48 | 47.27 |
+| Qwen-7B-Base | 56 | 60.8 | 27.67 | 44 | 47.18 |
+| DevOpsPal-7B-Base | 52.86 | 44.4 | 28 | 44.5 | 42.64 |
+| Qwen-7B-Chat | 54.57 | 52 | 29.67 | 26.5 | 42.09 |
+| Baichuan2-13B-Base | 56 | 43.2 | 24.33 | 41 | 41.73 |
+| Baichuan2-13B-Chat | 57.43 | 44.4 | 25 | 25.5 | 39.82 |
+| Baichuan2-7B-Base | 48.29 | 40.4 | 27 | 42 | 39.55 |
+| Baichuan2-7B-Chat | 58.57 | 31.6 | 27 | 31.5 | 38.91 |
+| DevOpsPal-7B-Chat | 56.57 | 27.2 | 25.33 | 41.5 | 38.64 |
+| Internlm-7B-Base | 48 | 33.2 | 29 | 35 | 37.09 |
+| Internlm-7B-Chat | 62.57 | 12.8 | 22.33 | 21 | 32.73 |
+
 ## ⏬ Data
 #### Download
 * Method 1: Download the zip file (you can also open the link below directly in a browser):
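
Review note: the leaderboard text in README.md says five-shot accuracy is better than zero-shot for many models. The sketch below is one way to check that against the DevOps tables, assuming only the AVG values from the rows visible in this diff (the zero-shot table is cut off at the hunk boundary, so models missing from either table are skipped). The dictionary and function names are illustrative and are not part of the repository.

```python
# AVG column values copied from the DevOps leaderboard rows visible in this diff.
ZERO_SHOT_AVG = {
    "DevOpsPal-14B-Chat": 80.34,
    "DevOpsPal-14B-Base": 80.26,
    "Qwen-14B-Chat": 79.28,
    "Qwen-14B-Base": 77.92,
    "Baichuan2-13B-Base": 75.10,
    "Baichuan2-13B-Chat": 74.60,
    "DevOpsPal-7B-Chat": 74.00,
    "DevOpsPal-7B-Base": 73.55,
    "Qwen-7B-Base": 73.13,
    "Qwen-7B-Chat": 71.96,
    "Baichuan2-7B-Chat": 68.17,
}

FIVE_SHOT_AVG = {
    "DevOpsPal-14B-Chat": 81.77,
    "DevOpsPal-14B-Base": 81.70,
    "Qwen-14B-Chat": 79.55,
    "Qwen-14B-Base": 79.51,
    "Baichuan2-13B-Base": 77.09,
    "Qwen-7B-Base": 77.02,
    "Baichuan2-13B-Chat": 75.32,
    "DevOpsPal-7B-Chat": 75.25,
    "DevOpsPal-7B-Base": 75.17,
    "Qwen-7B-Chat": 73.62,
    "Baichuan2-7B-Base": 72.11,
    "Internlm-7B-Chat": 71.09,
    "Baichuan2-7B-Chat": 70.49,
    "Internlm-7B-Base": 69.17,
}


def shot_deltas(zero_shot, five_shot):
    """Return five-shot minus zero-shot AVG for models present in both tables."""
    return {
        model: round(five_shot[model] - zero_shot[model], 2)
        for model in zero_shot
        if model in five_shot
    }


deltas = shot_deltas(ZERO_SHOT_AVG, FIVE_SHOT_AVG)

# Print the largest improvements first.
for model, delta in sorted(deltas.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{model:<20} {delta:+.2f}")
```

On these rows, all 11 models that appear in both tables score higher in the five-shot setting, base and chat variants alike. The gains range from +0.27 (Qwen-14B-Chat) to +3.89 (Qwen-7B-Base).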
