This repository was archived by the owner on Apr 11, 2023. It is now read-only.

Commit a30bf0c

Authored by Hamel Husain

Merge pull request #223 from github/challenge-conclusion

Wrap up challenge and publish the human relevance judgements.

2 parents: c1ada63 + bb121a5

3 files changed (+4030, -2 lines)

BENCHMARK.md

6 additions & 0 deletions
@@ -1,3 +1,9 @@
+> ## The Challenge has been concluded
+> No new submissions to the benchmark will be accepted. However, we would like
+> to encourage practitioners and researchers to continue using
+> the dataset and the human relevance annotations. Please see the
+> [main README](/README.md) for more information.
+
## Submitting runs to the benchmark

The [Weights & Biases (W&B)](https://www.wandb.com) [benchmark](https://app.wandb.ai/github/CodeSearchNet/benchmark) tracks and compares models trained on the CodeSearchNet dataset by the global machine learning research community. Anyone is welcome to submit their results for review.

README.md

15 additions & 2 deletions
@@ -4,6 +4,12 @@

[paper]: https://arxiv.org/abs/1909.09436

+> # The CodeSearchNet challenge has been concluded
+> We would like to thank all participants for their submissions
+> and we hope that this challenge provided insights to practitioners and researchers about the challenges in semantic code search and motivated new research. We would like to encourage everyone to continue using the dataset and the human evaluations, which we now provide publicly. Please, see below for details.
+>
+> No new submissions to the challenge will be accepted.
+
**Table of Contents**

<!-- TOC depthFrom:1 depthTo:6 withLinks:1 updateOnSave:1 orderedList:0 -->
@@ -83,11 +89,11 @@ More context regarding the motivation for this problem is in this [technical rep

## Evaluation

-The metric we use for evaluation is [Normalized Discounted Cumulative Gain](https://en.wikipedia.org/wiki/Discounted_cumulative_gain#Normalized_DCG). Please reference [this paper][paper] for further details regarding model evaluation.
+The metric we use for evaluation is [Normalized Discounted Cumulative Gain](https://en.wikipedia.org/wiki/Discounted_cumulative_gain#Normalized_DCG). Please reference [this paper][paper] for further details regarding model evaluation. The evaluation script can be found [here](/src/relevanceeval.py).

### Annotations

-We manually annotated retrieval results for the six languages from 99 general [queries](resources/queries.csv). This dataset is used as groundtruth data for evaluation _only_. Please refer to [this paper][paper] for further details on the annotation process.
+We manually annotated retrieval results for the six languages from 99 general [queries](resources/queries.csv). This dataset is used as groundtruth data for evaluation _only_. Please refer to [this paper][paper] for further details on the annotation process. These annotations were used to compute the scores in the leaderboard. Now that the competition has been concluded, you can find the annotations, along with the annotator comments [here](/resources/annotationStore.csv).


## Setup
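
The hunk above references NDCG and the new `/src/relevanceeval.py` evaluation script. As a rough illustration only (not part of this commit, and not the script's exact implementation), a minimal NDCG sketch assuming linear gain over the 0-3 relevance grades:

```python
import math
from typing import Sequence


def dcg(relevances: Sequence[float]) -> float:
    """Discounted cumulative gain of relevance grades in ranked order."""
    return sum(rel / math.log2(rank + 2) for rank, rel in enumerate(relevances))


def ndcg(ranked_relevances: Sequence[float]) -> float:
    """DCG of the predicted ranking divided by the DCG of the ideal ranking."""
    ideal = dcg(sorted(ranked_relevances, reverse=True))
    return dcg(ranked_relevances) / ideal if ideal > 0 else 0.0


# Toy example: 0-3 human relevance grades in the order a model returned them.
print(round(ndcg([3, 2, 3, 0, 1, 2]), 2))  # 0.96
```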
@@ -242,6 +248,13 @@ For example, the link for the `java` is:

The size of the dataset is approximately 20 GB. The various files and the directory structure are explained [here](resources/README.md).

+## Human Relevance Judgements
+To train neural models with a large dataset we use the documentation comments (e.g. docstrings) as a proxy. For evaluation (and the leaderboard), we collected human relevance judgements of pairs of realistic-looking natural language queries and code snippets. Now that the challenge has been concluded, we provide the data [here](/resources/annotationStore.csv) as a `.csv`, with the following fields:
+* Language: The programming language of the snippet.
+* Query: The natural language query
+* GitHubUrl: The URL of the target snippet. This matches the `URL` key in the data (see [here](#schema--format)).
+* Relevance: the 0-3 human relevance judgement, where "3" is the highest score (very relevant) and "0" is the lowest (irrelevant).
+* Notes: a free-text field with notes that annotators optionally provided.

# Running Our Baseline Model
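
The hunk above documents the fields of `resources/annotationStore.csv`. As a rough illustration only (not part of this commit), a minimal sketch of reading those annotations with Python's standard `csv` module, assuming the CSV header matches the field names listed above and a local checkout of the repository:

```python
import csv
from collections import Counter

# Path inside a local checkout of the repository; adjust as needed.
with open("resources/annotationStore.csv", newline="", encoding="utf-8") as f:
    rows = list(csv.DictReader(f))

# Distribution of judgements per language and per relevance grade (0-3).
print(Counter(row["Language"] for row in rows))
print(Counter(row["Relevance"] for row in rows))

# URLs of Python snippets judged highly relevant (grade 3); values are read as strings.
highly_relevant = [
    row["GitHubUrl"]
    for row in rows
    if row["Language"].lower() == "python" and row["Relevance"] == "3"
]
print(len(highly_relevant))
```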

0 commit comments
