Sum4Simp

Code and dataset for the paper "Exploiting Summarization Data to Help Text Simplification" (EACL 2023): https://aclanthology.org/2023.eacl-main.3.pdf

The S4S dataset is a standard sentence simplification dataset, as described in the paper. You can also mix all of the released data together for data augmentation.
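If you want to mix datasets for augmentation, the sketch below shows one way to do it. It assumes each dataset is stored as two line-aligned plain-text files (complex sentences and simple sentences); that layout, and the helper names read_parallel and mix_datasets, are hypothetical and not taken from this repository. Adjust the reader to the actual files you download.

```python
import random
from typing import List, Tuple

# One training example: (complex sentence, simple sentence).
Pair = Tuple[str, str]


def read_parallel(src_path: str, tgt_path: str) -> List[Pair]:
    """Read a parallel corpus stored as two line-aligned text files (assumed layout)."""
    with open(src_path, encoding="utf-8") as src, open(tgt_path, encoding="utf-8") as tgt:
        src_lines = [line.rstrip("\n") for line in src]
        tgt_lines = [line.rstrip("\n") for line in tgt]
    if len(src_lines) != len(tgt_lines):
        raise ValueError("source and target files have different line counts")
    return list(zip(src_lines, tgt_lines))


def mix_datasets(*datasets: List[Pair], seed: int = 0) -> List[Pair]:
    """Concatenate several lists of pairs and shuffle them reproducibly."""
    mixed = [pair for dataset in datasets for pair in dataset]
    random.Random(seed).shuffle(mixed)  # fixed seed -> same order on every run
    return mixed


if __name__ == "__main__":
    s4s_sample = [("The committee postponed the vote.", "The vote was delayed.")]
    other_sample = [("He was appointed chairman in 1990.", "He became chairman in 1990.")]
    print(len(mix_datasets(s4s_sample, other_sample, seed=42)))
```

Using a seeded random.Random instance keeps the shuffle reproducible across runs without touching Python's global random state.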

To obtain the aligned sentence pairs yourself, first download the CNN and DailyMail (DM) datasets, then run 'python align.py'.
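To illustrate what aligning summary sentences with document sentences involves, here is a minimal sketch that matches each summary sentence to its most similar document sentence. It uses a simple ROUGE-1-style unigram F1 score as a stand-in similarity measure; this is an assumption for illustration and not necessarily the alignment method used in align.py or the paper. The min_score cut-off is hypothetical.

```python
import re
from collections import Counter
from typing import List, Tuple


def tokenize(sentence: str) -> List[str]:
    """Lowercase and split on non-alphanumeric characters (deliberately simple)."""
    return re.findall(r"[a-z0-9]+", sentence.lower())


def unigram_f1(a: str, b: str) -> float:
    """ROUGE-1-style F1 between two sentences using clipped unigram overlap."""
    ta, tb = Counter(tokenize(a)), Counter(tokenize(b))
    overlap = sum((ta & tb).values())  # '&' keeps the minimum count per token
    if overlap == 0:
        return 0.0
    precision = overlap / sum(tb.values())
    recall = overlap / sum(ta.values())
    return 2 * precision * recall / (precision + recall)


def align(document: List[str], summary: List[str],
          min_score: float = 0.5) -> List[Tuple[str, str, float]]:
    """Pair each summary sentence with its most similar document sentence.

    Returns (document_sentence, summary_sentence, score) triples whose score
    is at least min_score (a hypothetical cut-off).
    """
    if not document:
        return []
    pairs = []
    for simple in summary:
        best = max(document, key=lambda complex_: unigram_f1(complex_, simple))
        score = unigram_f1(best, simple)
        if score >= min_score:
            pairs.append((best, simple, score))
    return pairs
```

The document sentence plays the role of the complex side and the summary sentence the simple side of each candidate pair; a stronger similarity measure (for example, embedding-based) could replace unigram_f1 without changing the rest of the loop.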

To filter suitable sentence pairs from the aligned pairs, first calculate the attribute values. We have uploaded some example files (for WikiLarge); you can run 'python filter.py' and check the total scores. Then set a threshold to keep only the pairs you need.
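The thresholding step can be sketched as follows, assuming you already have one total score per aligned pair. The one-float-per-line score file layout and the names read_scores and filter_pairs are hypothetical; check what filter.py actually prints or writes before adapting this.

```python
from typing import List, Sequence, Tuple

# One aligned pair: (complex sentence, simple sentence).
Pair = Tuple[str, str]


def read_scores(path: str) -> List[float]:
    """Read one float per non-empty line (assumed file layout)."""
    with open(path, encoding="utf-8") as f:
        return [float(line) for line in f if line.strip()]


def filter_pairs(pairs: Sequence[Pair], scores: Sequence[float],
                 threshold: float) -> List[Pair]:
    """Keep the pairs whose total score is at least `threshold`."""
    if len(pairs) != len(scores):
        raise ValueError(f"{len(pairs)} pairs but {len(scores)} scores")
    return [pair for pair, score in zip(pairs, scores) if score >= threshold]
```

For example, filter_pairs(pairs, scores, threshold=0.5) keeps every pair scoring 0.5 or higher; raising the threshold trades dataset size for pair quality.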

If you have any questions, please contact us: sunrenliangpku@gmail.com

Repository contents:
Aligned sentence pairs
S4S dataset
LICENSE
README.md
align.py
dict.txt
filter.py
lexicon.tsv
sari_value.txt

GitHub repository: RLSNLP/Sum4Simp