
Offensive Language Identification in Dravidian Languages at EACL2021 Workshop

Our source code for the EACL2021 workshop: Offensive Language Identification in Dravidian Languages. We ranked 4th, 4th, and 3rd in the Tamil, Malayalam, and Kannada tracks of this task!πŸ₯³

Updated: Source code is released!🀩

Repository structure

β”œβ”€β”€ README.md
β”œβ”€β”€ ckpt                        # store model weights during training
β”‚Β Β  └── README.md
β”œβ”€β”€ data                        # store the data
β”‚Β Β  └── README.md
β”œβ”€β”€ gen_data.py                 # generate Dataset
β”œβ”€β”€ install_cli.sh              # install required packages
β”œβ”€β”€ loss.py                     # loss function
β”œβ”€β”€ main_xlm_bert.py            # train multilingual-BERT
β”œβ”€β”€ main_xlm_roberta.py         # train XLM-RoBERTa
β”œβ”€β”€ model.py                    # model implementation
β”œβ”€β”€ pred_data
β”‚Β Β  └── README.md
β”œβ”€β”€ preprocessing.py            # preprocess the data
β”œβ”€β”€ pretrained_weights          # store the pretrained weights
β”‚Β Β  └── README.md
└── train.py                    # define training and validation loop

Installation

Use the following command to install all of the required packages:

sh install_cli.sh

Preprocessing

The first step is to preprocess the data. Just use the following command:

python3 -u preprocessing.py
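
For orientation, preprocessing code-mixed social-media comments usually amounts to light text cleaning before tokenization. The following is a minimal, hypothetical sketch of such cleaning; the file names, column names, and exact cleaning steps are assumptions for illustration, not the actual contents of preprocessing.py.

import re
import pandas as pd

def clean_text(text: str) -> str:
    text = re.sub(r"https?://\S+", " ", text)   # drop URLs
    text = re.sub(r"@\w+", " ", text)           # drop user mentions
    return re.sub(r"\s+", " ", text).strip()    # collapse whitespace

# hypothetical input/output paths, for illustration only
df = pd.read_csv("data/tamil_train.tsv", sep="\t", names=["text", "label"])
df["text"] = df["text"].astype(str).map(clean_text)
df.to_csv("data/tamil_train_clean.tsv", sep="\t", index=False)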

Training

The second step is to train our model. In our solution, we trained two models, using multilingual-BERT and XLM-RoBERTa as the encoder, respectively.
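
Both models follow the same encoder-plus-classifier pattern. The sketch below illustrates that pattern with the Hugging Face transformers API; the class name, pooling choice, and number of labels are assumptions for illustration, not the exact contents of model.py.

import torch.nn as nn
from transformers import AutoModel

class OffensiveClassifier(nn.Module):
    """Multilingual encoder with a linear classification head (illustrative)."""

    def __init__(self, encoder_name: str, num_labels: int):
        super().__init__()
        # encoder_name is e.g. "bert-base-multilingual-cased" or "xlm-roberta-base"
        self.encoder = AutoModel.from_pretrained(encoder_name)
        self.classifier = nn.Linear(self.encoder.config.hidden_size, num_labels)

    def forward(self, input_ids, attention_mask):
        out = self.encoder(input_ids=input_ids, attention_mask=attention_mask)
        cls = out.last_hidden_state[:, 0]  # first-token ([CLS]/<s>) representation
        return self.classifier(cls)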

If you want to train the model that uses multilingual-BERT as the encoder, use the following command:

nohup python3 -u main_xlm_bert.py \
        --base_path <your_base_path> \
        --batch_size 8 \
        --epochs 50 \
        > train_xlm_bert_log.log 2>&1 &

If you want to train the model that uses XLM-RoBERTa as the encoder, use the following command:

nohup python3 -u main_xlm_roberta.py \
        --base_path <your_base_path> \
        --batch_size 8 \
        --epochs 50 \
        > train_xlm_roberta_log.log 2>&1 &
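
Both entry scripts rely on train.py, which defines the training and validation loop. Below is a minimal sketch of one training epoch, assuming PyTorch DataLoader batches keyed by input_ids, attention_mask, and labels (the key names are assumptions):

def train_one_epoch(model, loader, loss_fn, optimizer, device):
    model.train()
    total_loss = 0.0
    for batch in loader:
        optimizer.zero_grad()
        logits = model(batch["input_ids"].to(device),
                       batch["attention_mask"].to(device))
        loss = loss_fn(logits, batch["labels"].to(device))
        loss.backward()        # backpropagate
        optimizer.step()       # update weights
        total_loss += loss.item()
    return total_loss / len(loader)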

Inference

The final step, after training, is inference. Use the following command:

nohup python3 -u inference.py > inference.log 2>&1 &
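
The inference script loads the trained weights from ckpt and presumably writes predictions under pred_data. A minimal sketch of such a prediction loop, assuming the same batch format as in training:

import torch

@torch.no_grad()
def predict(model, loader, device):
    model.eval()
    preds = []
    for batch in loader:
        logits = model(batch["input_ids"].to(device),
                       batch["attention_mask"].to(device))
        preds.extend(logits.argmax(dim=-1).tolist())
    return preds  # label indices, to be mapped back to class names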

Congratulations! You now have the final results!🀩

If you use our code, please cite the source.
