Synonyms Dataset

The dataset consists of 500 synsets extracted from the Arabic WordNet. Each synset is enriched with a list of candidate synonyms, 3K candidates in total. Four linguists, working in parallel, annotated each candidate synonym with a fuzzy value between 0 and 1. The dataset is important for understanding how much linguists agree or disagree on synonymy: we measured their disagreement at an RMSE of 32% and an MAE of 27%. We also used the dataset as a baseline to evaluate our algorithm. For the scoring guidelines, figures, and further details, see Section 3 of the paper.
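To make the agreement figures concrete, here is a minimal sketch of how RMSE and MAE between annotators could be computed. It assumes that disagreement is averaged over all pairs of linguists, which may differ from the procedure in the paper. The scores and the names `scores`, `rmse`, `mae`, and `pairwise_disagreement` are hypothetical and are not taken from the dataset or the authors' code.

```python
import math
from itertools import combinations

# Hypothetical fuzzy scores (0.0 to 1.0) given by four annotators to the same
# five candidate synonyms. These numbers are invented for illustration only.
scores = {
    "linguist_1": [1.0, 0.8, 0.5, 0.2, 0.0],
    "linguist_2": [0.9, 0.6, 0.5, 0.4, 0.1],
    "linguist_3": [1.0, 0.7, 0.3, 0.2, 0.0],
    "linguist_4": [0.8, 0.8, 0.6, 0.1, 0.2],
}


def rmse(a, b):
    """Root mean squared error between two equally long score lists."""
    if len(a) != len(b):
        raise ValueError("score lists must have the same length")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)) / len(a))


def mae(a, b):
    """Mean absolute error between two equally long score lists."""
    if len(a) != len(b):
        raise ValueError("score lists must have the same length")
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


def pairwise_disagreement(all_scores):
    """Average RMSE and MAE over every pair of annotators (an assumption)."""
    pairs = list(combinations(all_scores, 2))
    mean_rmse = sum(rmse(all_scores[p], all_scores[q]) for p, q in pairs) / len(pairs)
    mean_mae = sum(mae(all_scores[p], all_scores[q]) for p, q in pairs) / len(pairs)
    return mean_rmse, mean_mae


mean_rmse, mean_mae = pairwise_disagreement(scores)
print(f"RMSE: {mean_rmse:.1%}, MAE: {mean_mae:.1%}")
```

Because RMSE squares each difference before averaging, it is never smaller than MAE for the same pair of annotators, so a large gap between the two points to a few candidates with strong disagreement.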

License: CC-BY-4.0

Online Demo

You can try our algorithm in the online demo:
https://sina.birzeit.edu/synonyms/

Credits

This research is funded by the Research Committee at Birzeit University (No. 2021/49).

Citation

Sana Ghanem, Mustafa Jarrar, Radi Jarrar, Ibrahim Bounhas: A Benchmark and Scoring Algorithm for Enriching Arabic Synonyms. In Proceedings of the 12th International Global Wordnet Conference (GWC2023), Donostia, January 2023.

@inproceedings{GJJB23,
    title = {A Benchmark and Scoring Algorithm for Enriching Arabic Synonyms},
    author = {Sana Ghanem and Mustafa Jarrar and Radi Jarrar and Ibrahim Bounhas},
    isbn = {978-9-464027-31-0},
    booktitle = {Proceedings of the 12th International Global Wordnet Conference (GWC2023)},
    pages = {},
    location = {San Sebastian},
    month = {Jan},
    year = {2023},
    publisher = {Global Wordnet Association},
    keywords = {Synonyms, Synset, WordNet, Dictionary, Arabic, Multilingual lexicons, Online dictionary, Language resources, Lexical semantics, NLP},
    abstract = {Synonymy relationships are used in many NLP tasks and knowledge organization systems. However, automatic synonym extraction is a challenging task, especially for low-resourced and highly ambiguous languages such as Arabic. This paper addresses the task of extending a given synset with additional synonyms taking into account synonymy strength as a fuzzy value. In other words, given a mono/multilingual synset and a threshold (a fuzzy value $[0-1]$), our goal is to extract new synonyms above this threshold from existing lexicons. We present twofold contributions: an algorithm and a benchmark dataset. The dataset consists of 3K candidate synonyms for 500 synsets. Each candidate synonym is annotated with a fuzzy value by four linguists. The dataset is important for (i) understanding how much linguists (dis/)agree on synonymy, in addition to (ii) using the dataset as a baseline to evaluate our algorithm. Our proposed algorithm extracts synonyms from existing lexicons and computes a fuzzy value for each candidate. Our evaluations (using the one-way ANOVA test) show that the algorithm behaves like a linguist and its fuzzy values are close to those proposed by linguists (using RMSE and MAE).},
    url = {http://www.jarrar.info/publications/GJJB23.pdf}
}

Contacts

Mustafa Jarrar: LinkedIn | Twitter | GitHub | Mail
