text-preprocessing

It would be great to have friendlier and funnier text in the doctests (instead of "Aha", "Text", ...). It is also nicer for users if the docstring examples all look similar.

One idea is to use famous lines spoken by movie superheroes. Here are a few examples:

I have the power!
Flame on!
HULK SMASH!
Holy ____ Batman!
I am the vengeance, I am the night, I am BATMAN!
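
As a rough illustration of the proposal, the sketch below shows two docstrings that share the same superhero-themed sample text. The functions `lowercase` and `shout` are hypothetical stand-ins written for this example; they are not taken from the texthero API.

```python
import doctest


def lowercase(text: str) -> str:
    """Lowercase a string (hypothetical helper, for illustration only).

    >>> lowercase("HULK SMASH!")
    'hulk smash!'
    >>> lowercase("I am the vengeance, I am the night, I am BATMAN!")
    'i am the vengeance, i am the night, i am batman!'
    """
    return text.lower()


def shout(text: str) -> str:
    """Uppercase a string (hypothetical helper, for illustration only).

    >>> shout("Flame on!")
    'FLAME ON!'
    >>> shout("I have the power!")
    'I HAVE THE POWER!'
    """
    return text.upper()


if __name__ == "__main__":
    # Run every doctest in this module; nothing is printed when all pass.
    doctest.testmod()
```

Drawing every example from the same small pool of sentences keeps the docstrings visually consistent, which is the point raised above.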

I have mostly tested trafilatura on a set of English, German and French web pages that I came across while browsing or during web crawls. There are certainly other web pages, including pages in other languages, for which extraction does not work yet.

Corresponding bug reports can either be filed as a list in an issue like this one or in the code as XPath expressions in [xpaths.py](https://github.com



Here are 116 public repositories matching this topic...

jbesomi / texthero

Matching Content in our Doctests

Add version attribute

My wordcloud looks ugly. Which argument to change to make it look cleaner?

jfilter / clean-text

adbar / trafilatura

List of smaller extraction bugs (text & metadata)

lyeoni / prenlp

Lipairui / textgo

ezgisubasi / turkish-tweets-sentiment-analysis

jeongukjae / python-mecab

ksnugroho / basic-text-preprocessing

fmpr / texttk

berknology / text-preprocessing

csebuetnlp / normalizer

Abhishekmamidi123 / 100DaysOfMLCode

jangedoo / jange

alaradirik / TR-NLP-workshop

Ankur3107 / nlp_preprocessing

VipinJain1 / VIP-Machine-Learning-Exercises-and-Practices

praneetmehta / reSEARCH

byam / mnlp

khuyentran1401 / Extract-text-from-article

bademiya21 / Topic-Modeling-with-Automated-Determination-of-the-Number-of-Topics

carrliitos / NLPInformationExtraction

AbeerAbuZayed / Hate-Speech-Detection_OSACT4-Workshop

paul-pias / Text-Preprocessing-in-Bangla-and-English

krisograbek / text-preprocessing

anshul1004 / InformationRetrieval

Nourshosharah / introduction-to-natural-language-processing-in-python

acoustician / Tripadvisory-review-Rating-prediction-

Giuseppe-Della-Corte / It-Chapterize

ashwin4glory / Quora-Question-Pair-Similarity

vaitybharati / Assignment-11-Text-Mining-01-Elon-Musk
