text-preprocessing
Here are 116 public repositories matching this topic...
I have mostly tested trafilatura on English, German and French web pages that I came across while browsing or during web crawls. There are certainly other web pages, and cases in other languages, for which extraction does not work yet.
Corresponding bug reports can either be filed as a list in an issue like this one or in the code as XPath expressions in [xpaths.py](https://github.com


It would be great to have friendlier and funnier text in the doctests (instead of "Aha", "Text", ...). It's also nicer for users if the docstring examples all follow a similar style.
One idea, for instance, is to use famous lines spoken by movie superheroes. Here are a few examples:
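As an illustration only (not taken from the original issue), a docstring written in this style might look like the sketch below. The function `remove_punctuation` is a hypothetical helper invented for this example and does not come from any particular library. The quotes are just well-known movie lines standing in for placeholder strings.

```python
import doctest
import string


def remove_punctuation(text: str) -> str:
    """Strip ASCII punctuation from ``text`` (hypothetical helper, for illustration).

    >>> remove_punctuation("With great power comes great responsibility!")
    'With great power comes great responsibility'
    >>> remove_punctuation("I am Iron Man.")
    'I am Iron Man'
    """
    # Build a translation table that deletes every ASCII punctuation character.
    return text.translate(str.maketrans("", "", string.punctuation))


if __name__ == "__main__":
    # Run the docstring examples; prints nothing when they all pass.
    doctest.testmod()
```

Keeping one theme across all docstrings means a reader who has seen one example can predict what the others are showing.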