The Wayback Machine - https://web.archive.org/web/20220513131719/https://github.com/topics/text-preprocessing

text-preprocessing

Here are 116 public repositories matching this topic...

texthero
henrifroese commented Sep 23, 2020

It would be great to have friendlier and more fun text in the doctests (instead of "Aha", "Text", ...). It's also nicer for users if the docstring examples are consistent with one another.

One idea is to use famous lines spoken by movie superheroes. Here are a few examples:

  • I have the power!
  • Flame on!
  • HULK SMASH!
  • Holy ____ Batman!
  • I am the vengeance, I am the night, I am BATMAN!
Labels: good first issue, testing
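To illustrate the proposal, here is a minimal Python sketch of a docstring whose doctest examples use two of the superhero lines above instead of placeholder text. The `strip_punctuation` function is hypothetical and is not part of texthero's API; only the standard-library `doctest` and `string` modules are used.

```python
import doctest
import string


def strip_punctuation(text: str) -> str:
    """Remove ASCII punctuation characters from ``text``.

    Hypothetical helper for illustration only (not texthero's API).

    >>> strip_punctuation("HULK SMASH!")
    'HULK SMASH'
    >>> strip_punctuation("I am the vengeance, I am the night, I am BATMAN!")
    'I am the vengeance I am the night I am BATMAN'
    """
    # A translation table built with a third argument deletes every listed character.
    return text.translate(str.maketrans("", "", string.punctuation))


if __name__ == "__main__":
    # Runs the docstring examples above and reports any mismatch.
    doctest.testmod()
```

Running the file directly executes both examples; `python -m doctest -v yourfile.py` does the same with verbose output.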
adbar commented Jan 9, 2020

I have mostly tested trafilatura on a set of English, German and French web pages that I came across while surfing or during web crawls. There are certainly other web pages, and cases in other languages, for which extraction does not work yet.

Corresponding bug reports can either be filed as a list in an issue like this one or in the code as XPath expressions in [xpaths.py](https://github.com

Labels: good first issue, up for grabs
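The issue mentions recording extraction failures as XPath expressions. As a rough illustration of that idea, and not trafilatura's actual code, the Python sketch below tries an ordered list of XPath-style expressions and returns the text of the first element that matches. The expressions and the `extract_main_text` function are hypothetical. The standard-library `xml.etree.ElementTree` module supports only a limited XPath subset and requires well-formed markup; real-world HTML generally needs a tolerant parser such as `lxml.html`.

```python
import xml.etree.ElementTree as ET
from typing import Optional, Sequence

# Hypothetical candidate expressions, ordered from most to least specific.
# ElementTree implements only a subset of XPath (paths plus [@attr='value'] predicates).
CANDIDATE_PATHS = (
    ".//article",
    ".//div[@class='post-content']",
    ".//main",
)


def extract_main_text(xhtml: str, paths: Sequence[str] = CANDIDATE_PATHS) -> Optional[str]:
    """Return the whitespace-normalized text of the first matching element, if any."""
    root = ET.fromstring(xhtml)  # raises ParseError on markup that is not well-formed
    for path in paths:
        node = root.find(path)
        if node is None:
            continue
        # itertext() yields the text of the node and all of its descendants.
        text = " ".join("".join(node.itertext()).split())
        if text:
            return text
    return None  # no expression matched: the kind of page worth reporting


page = (
    "<html><body>"
    "<div class='nav'>Home</div>"
    "<div class='post-content'><p>Hello <b>world</b>.</p></div>"
    "</body></html>"
)
print(extract_main_text(page))  # Hello world.
```

A page that matches none of the expressions returns `None`, which is the situation the issue asks contributors to report.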

My version of topic modelling with Latent Dirichlet Allocation (LDA), which finds the best number of topics for a set of documents using the ldatuning package and the different metrics it provides.

  • Updated Nov 15, 2018
  • R
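ldatuning scores each candidate number of topics k with several metrics. As far as we understand it, one of them (CaoJuan2009) is the mean pairwise cosine similarity between a model's topic-word distributions, where lower values mean more distinct topics and the best k minimizes the score. The sketch below computes that kind of score in Python rather than the repository's R, assuming you already have a topic-word probability matrix from a fitted LDA model for each candidate k; fitting LDA itself is out of scope, and the function names and toy matrices are hypothetical.

```python
import math
from itertools import combinations
from typing import Dict, Sequence

TopicWord = Sequence[Sequence[float]]  # rows = topics, columns = vocabulary terms


def cosine(x: Sequence[float], y: Sequence[float]) -> float:
    dot = sum(a * b for a, b in zip(x, y))
    return dot / math.sqrt(sum(a * a for a in x) * sum(b * b for b in y))


def mean_pairwise_similarity(topic_word: TopicWord) -> float:
    """Mean cosine similarity over all topic pairs; lower means more distinct topics.

    Needs at least two topics, otherwise there are no pairs to average.
    """
    pairs = list(combinations(topic_word, 2))
    return sum(cosine(x, y) for x, y in pairs) / len(pairs)


def best_k(models: Dict[int, TopicWord]) -> int:
    """Return the candidate number of topics whose model scores lowest."""
    return min(models, key=lambda k: mean_pairwise_similarity(models[k]))


# Toy topic-word matrices for two candidate values of k (made up for illustration).
models = {
    2: [[0.5, 0.5, 0.0, 0.0], [0.0, 0.0, 0.5, 0.5]],
    3: [[0.5, 0.5, 0.0, 0.0], [0.4, 0.6, 0.0, 0.0], [0.0, 0.0, 0.5, 0.5]],
}
print(best_k(models))  # 2
```

In the k = 3 toy model, two topics share nearly the same words, which raises the average similarity, so k = 2 wins. ldatuning combines several such metrics, so this single score is only one view of the choice.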

Hotels play a crucial role in travel, and with increased access to information, new ways of choosing the best ones have emerged. With this model, you can explore what makes a great hotel and perhaps even use it in your trip planning.

  • Updated Sep 6, 2021
  • Jupyter Notebook

Assignment-11-Text-Mining-01-Elon-Musk: perform sentiment analysis on Elon Musk's tweets (Exlon-musk.csv). The steps are:

  • Text preprocessing: strip leading and trailing characters; remove empty strings (Python treats them as False); join the list into one string; remove Twitter username handles (@usernames); join the list into one string again; remove punctuation; remove https links and other URLs
  • Tokenization into text tokens
  • Stopword removal
  • Normalization: stemming (optional) and lemmatization
  • Feature extraction: bag of words with CountVectorizer, CountVectorizer with n-grams (bigrams and trigrams), and TF-IDF Vectorizer
  • Word cloud generation
  • Named entity recognition (NER)
  • Emotion mining: sentiment analysis

  • Updated Sep 3, 2021
  • Jupyter Notebook
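The preprocessing and bag-of-n-grams steps listed above can be sketched with the Python standard library alone. This is a minimal illustration, not the assignment's code: the stopword list is a small hypothetical stand-in for the larger lists that NLTK or spaCy provide, lowercasing stands in for the normalization step, and stemming, lemmatization, TF-IDF, word clouds and NER are left to libraries such as NLTK, spaCy and scikit-learn. One deliberate change in order: URLs are removed before punctuation, because stripping punctuation first would fuse a link like `https://t.co/abc` into a single token that the URL pattern no longer matches.

```python
import re
import string
from collections import Counter
from typing import Iterable, List

# Small illustrative stopword list (assumption, not a standard list).
STOPWORDS = {"a", "an", "and", "are", "the", "is", "to", "of", "in", "on", "for", "it", "this"}

HANDLE_RE = re.compile(r"@\w+")                 # Twitter-style @username handles
URL_RE = re.compile(r"https?://\S+|www\.\S+")  # http(s) links and bare www. links


def clean_tweets(tweets: Iterable[str]) -> str:
    # Strip surrounding whitespace, drop strings that are empty (falsy) after
    # stripping, and join what remains into one text.
    text = " ".join(t.strip() for t in tweets if t.strip())
    text = HANDLE_RE.sub("", text)
    # Remove URLs before punctuation so the URL pattern still matches.
    text = URL_RE.sub("", text)
    text = text.translate(str.maketrans("", "", string.punctuation))
    return text.lower()  # simple normalization by lowercasing


def tokenize(text: str) -> List[str]:
    # Whitespace tokenization followed by stopword removal.
    return [tok for tok in text.split() if tok not in STOPWORDS]


def ngram_counts(tokens: List[str], n: int) -> Counter:
    # Bag of n-grams: count every run of n consecutive tokens.
    return Counter(" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


tweets = ["  @elonmusk Rockets are cool! https://t.co/abc ", "", "Rockets to Mars."]
tokens = tokenize(clean_tweets(tweets))
print(tokens)  # ['rockets', 'cool', 'rockets', 'mars']
print(ngram_counts(tokens, 2))
```

With n = 2 and n = 3 the same counter gives the bigram and trigram features that the assignment builds with CountVectorizer.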
