LE

Task description

In the preliminary round of ASC20-21, a cloze-style dataset is provided. The dataset was collected from the internet and contains multi-level English language exams used in China for high school and college entrance exams: CET4 (College English Test 4), CET6 (College English Test 6), and NETEM (National Entrance Test of English for MA/MS Candidates). Part of the data comes from the public CLOTH dataset. There are 4603 passages and 83395 questions in the training set, 400 passages and 7798 questions in the development set, and 400 passages and 7829 questions in the test set. Participants should design and train their neural network to achieve the best performance on the test set.

We complete the cloze (fill-in-the-blank) task using the BERT model, with two additions:

- Introduce the WordNet knowledge base as a secondary weight in prediction.
- Use model ensembling over BERT, ALBERT, and T5.

In the end, we reached 92.53% accuracy on the dev set.

Here is the directory structure:

LE: root directory
- test: a single JSON file containing the answers for the test set
- script: PyTorch source code
- model: trained PyTorch models

Because we use an ensemble learning strategy, there are many generated files in the ./LE/model and ./LE/script directories.

model directory

This directory stores all the PyTorch models we have trained.

Model integration

model_name-all: the model trained on the full training set.

model_name-n: the n-th base learner, trained on a re-sampled dataset.
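The re-sampling behind the model_name-n base learners can be sketched roughly as follows. This is a minimal illustration, not the code in makeSubData.py: it assumes bootstrap sampling (drawing passages with replacement), and the function name make_bootstrap_subsets and its parameters are hypothetical.

```python
import random


def make_bootstrap_subsets(passage_ids, n_learners, seed=0):
    """Return one re-sampled list of passage ids per base learner.

    Assumption: each subset has the same size as the original set and is
    drawn with replacement (classic bagging). The real script may differ.
    """
    rng = random.Random(seed)
    size = len(passage_ids)
    return [
        [rng.choice(passage_ids) for _ in range(size)]
        for _ in range(n_learners)
    ]


# Toy example: 10 passage ids, 3 base learners.
subsets = make_bootstrap_subsets(list(range(10)), n_learners=3)
```

Each subset would then be used to train one model_name-n checkpoint, while model_name-all is trained on the untouched training set.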

Introducing WordNet

bert-base-uncased-none: the model trained without WordNet

bert-base-uncased-wn: the model trained with WordNet
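One way to read "WordNet as a secondary weight" is as a small correction blended into the model's per-option probabilities. The sketch below shows only that blending step under stated assumptions: the knowledge-base scores are taken as given inputs (how they are derived from WordNet is not described here), and blend_scores, kb_scores, and alpha are hypothetical names, not identifiers from this repository.

```python
def blend_scores(model_probs, kb_scores, alpha=0.1):
    """Blend model probabilities with a secondary knowledge-base score.

    model_probs: probabilities the model assigns to each option of one blank.
    kb_scores:   scores in [0, 1] for the same options (assumed to come
                 from a WordNet lookup; not computed here).
    alpha:       weight of the secondary signal (assumed value).
    """
    if len(model_probs) != len(kb_scores):
        raise ValueError("model_probs and kb_scores must have the same length")
    blended = [(1 - alpha) * p + alpha * k for p, k in zip(model_probs, kb_scores)]
    total = sum(blended)
    if total == 0:
        # Degenerate case: fall back to a uniform distribution.
        return [1.0 / len(blended)] * len(blended)
    return [b / total for b in blended]


# Toy example: the model slightly prefers option 0, the knowledge base favors option 1.
blended = blend_scores([0.40, 0.35, 0.15, 0.10], [0.0, 1.0, 0.0, 0.0], alpha=0.1)
best_option = max(range(len(blended)), key=blended.__getitem__)
```

With a small alpha the model's ranking dominates, and the secondary weight only changes the prediction when the top options are close.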

script directory:

We have submitted scripts for three different models.

Model-albert:

Code for the ALBERT model.

Model-bert:

Code for the BERT model.

Model-t5:

Code for the T5 model.

Result:

These files record each base learner's score for every option.

combination:

These files record the results after ranking and combining the files in the ./model/result directories.

model_ensembles:

These files record the option scores after normalization.
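The normalize-then-combine idea behind these files can be illustrated with a short sketch. It is not the repository's implementation: softmax normalization, weighted averaging, and the names softmax, ensemble_predict, and accuracy are all assumptions chosen for illustration.

```python
import math


def softmax(scores):
    """Normalize raw option scores into probabilities (assumed normalization)."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def ensemble_predict(per_model_scores, weights=None):
    """Pick the option with the highest weighted average probability.

    per_model_scores: one list of raw option scores per model, all for the
                      same blank.
    weights:          optional per-model weights; uniform if omitted.
    """
    n_models = len(per_model_scores)
    weights = weights or [1.0 / n_models] * n_models
    n_options = len(per_model_scores[0])
    combined = [0.0] * n_options
    for w, scores in zip(weights, per_model_scores):
        for i, p in enumerate(softmax(scores)):
            combined[i] += w * p
    return max(range(n_options), key=combined.__getitem__)


def accuracy(predictions, answers):
    """Fraction of blanks where the predicted option index matches the answer."""
    correct = sum(p == a for p, a in zip(predictions, answers))
    return correct / len(answers)


# Toy example: three models score the four options of one blank.
scores = [
    [2.0, 1.0, 0.1, 0.0],  # e.g. BERT
    [0.5, 2.5, 0.1, 0.0],  # e.g. ALBERT
    [1.8, 1.7, 0.0, 0.0],  # e.g. T5
]
prediction = ensemble_predict(scores)
```

Normalizing first keeps a model with a larger raw score range from dominating the average.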

test directory:

The result JSON file for the test set.

How to reproduce the results:

dev dataset:

Run ./final_dev.sh in each ./LE/script/Model-n directory to get the normalized option score files.
Run ./ensemble_dev.sh in the ./LE/script directory to get the accuracy of the ensembled model on the dev set.

test dataset:

Run ./final_test.sh in each ./LE/script/Model-n directory to get the normalized option score files.
Run ./ensemble_test.sh in the ./LE/script directory to get the result JSON file, which is saved in ./LE/test.
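The dev and test steps above can be chained with a small driver. This is a convenience sketch, not a script shipped with the repository: it assumes that Model-n stands for the three directories Model-albert, Model-bert, and Model-t5, that bash is available, and that the repository root is ./LE. The names build_commands and run_pipeline are hypothetical.

```python
import subprocess
from pathlib import Path

# Assumption: "Model-n" refers to these three script directories.
MODEL_DIRS = ["Model-albert", "Model-bert", "Model-t5"]


def build_commands(split, root="./LE"):
    """Return (working_directory, script) pairs in the order they should run."""
    if split not in ("dev", "test"):
        raise ValueError("split must be 'dev' or 'test'")
    script_dir = Path(root) / "script"
    # First produce normalized option scores for every model ...
    commands = [(script_dir / name, f"./final_{split}.sh") for name in MODEL_DIRS]
    # ... then combine them with the ensemble script.
    commands.append((script_dir, f"./ensemble_{split}.sh"))
    return commands


def run_pipeline(split, root="./LE"):
    """Run each script in its own directory and stop at the first failure."""
    for cwd, script in build_commands(split, root):
        subprocess.run(["bash", script], cwd=cwd, check=True)
```

Calling run_pipeline("dev") would run the three final_dev.sh scripts followed by ensemble_dev.sh; run_pipeline("test") does the same for the test set.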

If you want to retrain the models:

Run makeSubData.py in the ./LE/script/Model-albert directory to generate the re-sampled sub-datasets.
Run data_utils.py in each ./LE/script/Model-n directory to package the dataset into a .pt file.
Run ./run.sh in each ./LE/script/Model-n directory.
Uncomment the first line of ./final_dev.sh and run it to get the score file from the new model.
Then follow the reproduction steps described above.

Crispig/LE
