
Offensive Language Identification in Dravidian Languages at EACL2021 Workshop

Our source code for the EACL2021 workshop: Offensive Language Identification in Dravidian Languages. We ranked 4th, 4th, and 3rd in the Tamil, Malayalam, and Kannada tracks of this task!πŸ₯³

Updated: Source code is released!🀩

Repository structure

β”œβ”€β”€ README.md
β”œβ”€β”€ ckpt                        # store model weights during training
β”‚Β Β  └── README.md
β”œβ”€β”€ data                        # store the data
β”‚Β Β  └── README.md
β”œβ”€β”€ gen_data.py                 # generate Dataset
β”œβ”€β”€ install_cli.sh              # install required packages
β”œβ”€β”€ loss.py                     # loss function
β”œβ”€β”€ main_xlm_bert.py            # train multilingual-BERT
β”œβ”€β”€ main_xlm_roberta.py         # train XLM-RoBERTa
β”œβ”€β”€ model.py                    # model implementation
β”œβ”€β”€ pred_data
β”‚Β Β  └── README.md
β”œβ”€β”€ preprocessing.py            # preprocess the data
β”œβ”€β”€ pretrained_weights          # store the pretrained weights
β”‚Β Β  └── README.md
└── train.py                    # define training and validation loop

Installation

Use the following command to install all of the required packages:

sh install_cli.sh

Preprocessing

The first step is to preprocess the data. Just use the following command:

python3 -u preprocessing.py
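
For orientation, preprocessing code-mixed social-media comments usually amounts to light text cleaning before tokenization. The following is a minimal, hypothetical sketch of such cleaning; the file names, column names, and exact cleaning steps are assumptions for illustration, not the actual contents of preprocessing.py.

import re
import pandas as pd

def clean_text(text: str) -> str:
    text = re.sub(r"https?://\S+", " ", text)   # drop URLs
    text = re.sub(r"@\w+", " ", text)           # drop user mentions
    return re.sub(r"\s+", " ", text).strip()    # collapse whitespace

# hypothetical input/output paths, for illustration only
df = pd.read_csv("data/tamil_train.tsv", sep="\t", names=["text", "label"])
df["text"] = df["text"].astype(str).map(clean_text)
df.to_csv("data/tamil_train_clean.tsv", sep="\t", index=False)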

Training

The second step is to train our model. In our solution, we trained two models, using multilingual-BERT and XLM-RoBERTa as the encoder, respectively.
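
Both models follow the same encoder-plus-classifier pattern. The sketch below illustrates that pattern with the Hugging Face transformers API; the class name, pooling choice, and number of labels are assumptions for illustration, not the exact contents of model.py.

import torch.nn as nn
from transformers import AutoModel

class OffensiveClassifier(nn.Module):
    """Multilingual encoder with a linear classification head (illustrative)."""

    def __init__(self, encoder_name: str, num_labels: int):
        super().__init__()
        # encoder_name is e.g. "bert-base-multilingual-cased" or "xlm-roberta-base"
        self.encoder = AutoModel.from_pretrained(encoder_name)
        self.classifier = nn.Linear(self.encoder.config.hidden_size, num_labels)

    def forward(self, input_ids, attention_mask):
        out = self.encoder(input_ids=input_ids, attention_mask=attention_mask)
        cls = out.last_hidden_state[:, 0]  # first-token ([CLS]/<s>) representation
        return self.classifier(cls)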

If you want to train the model that uses multilingual-BERT as the encoder, use the following command:

nohup python3 -u main_xlm_bert.py \
        --base_path <your_base_path> \
        --batch_size 8 \
        --epochs 50 \
        > train_xlm_bert_log.log 2>&1 &

If you want to train the model that uses XLM-RoBERTa as the encoder, use the following command:

nohup python3 -u main_xlm_roberta.py \
        --base_path <your_base_path> \
        --batch_size 8 \
        --epochs 50 \
        > train_xlm_roberta_log.log 2>&1 &
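
Both entry scripts rely on train.py, which defines the training and validation loop. Below is a minimal sketch of one training epoch, assuming PyTorch DataLoader batches keyed by input_ids, attention_mask, and labels (the key names are assumptions):

def train_one_epoch(model, loader, loss_fn, optimizer, device):
    model.train()
    total_loss = 0.0
    for batch in loader:
        optimizer.zero_grad()
        logits = model(batch["input_ids"].to(device),
                       batch["attention_mask"].to(device))
        loss = loss_fn(logits, batch["labels"].to(device))
        loss.backward()        # backpropagate
        optimizer.step()       # update weights
        total_loss += loss.item()
    return total_loss / len(loader)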

Inference

The final step, after training, is inference. Use the following command:

nohup python3 -u inference.py > inference.log 2>&1 &
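
The inference script loads the trained weights from ckpt and presumably writes predictions under pred_data. A minimal sketch of such a prediction loop, assuming the same batch format as in training:

import torch

@torch.no_grad()
def predict(model, loader, device):
    model.eval()
    preds = []
    for batch in loader:
        logits = model(batch["input_ids"].to(device),
                       batch["attention_mask"].to(device))
        preds.extend(logits.argmax(dim=-1).tolist())
    return preds  # label indices, to be mapped back to class names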

Congratulations! You now have the final results!🀩

If you use our code, please cite the source.
