NLP_tutorials

Natural language processing tutorials

The provided code is a machine learning pipeline that classifies disaster-related tweets. It begins by importing the necessary libraries and loading the training and test datasets. The tweets are preprocessed by removing articles ("a", "an", "the") with a regular expression. A CountVectorizer then converts the text into a matrix of token counts. A RidgeClassifier model is trained on this matrix, and its performance is evaluated using 3-fold cross-validation with F1 as the scoring metric. Finally, the model makes predictions on the test dataset, and an example tweet is printed to show the effect of the preprocessing.
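The steps described above can be sketched roughly as follows. This is a minimal illustration, not the notebook's actual code: the toy tweets, the labels, the helper name `remove_articles`, and the exact regular expression are all assumptions made so the example runs without the real training and test files.

```python
import re

from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import RidgeClassifier
from sklearn.model_selection import cross_val_score

# Hypothetical toy data standing in for the real training and test datasets.
train_texts = [
    "Forest fire near the town, residents asked to evacuate",
    "A flood has destroyed the bridge in our village",
    "An earthquake shook the city this morning",
    "Storm damage reported across the coast",
    "Wildfire smoke is spreading over the valley",
    "Emergency crews rescue people after a landslide",
    "I love the new song, it is a banger",
    "What a lovely day for a picnic in the park",
    "The movie last night was an absolute blast",
    "Just had the best coffee of my life",
    "My cat is sleeping on the keyboard again",
    "Looking forward to the weekend with friends",
]
train_targets = [1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0]  # 1 = disaster, 0 = not
test_texts = [
    "The fire is spreading near an old bridge",
    "A picnic with the cat",
]

# Assumed pattern: match "a", "an", "the" as whole words, ignoring case.
ARTICLES = re.compile(r"\b(a|an|the)\b", flags=re.IGNORECASE)


def remove_articles(text):
    """Drop the articles and collapse the whitespace they leave behind."""
    return " ".join(ARTICLES.sub(" ", text).split())


train_clean = [remove_articles(t) for t in train_texts]
test_clean = [remove_articles(t) for t in test_texts]

# Fit the vocabulary on the training text only, then reuse it for the test text.
vectorizer = CountVectorizer()
X_train = vectorizer.fit_transform(train_clean)
X_test = vectorizer.transform(test_clean)

# Evaluate with 3-fold cross-validation, scoring each fold by F1.
clf = RidgeClassifier()
scores = cross_val_score(clf, X_train, train_targets, cv=3, scoring="f1")
print("F1 per fold:", scores)

# Train on all training data and predict labels for the test tweets.
clf.fit(X_train, train_targets)
predictions = clf.predict(X_test)
print("Predictions:", predictions)

# Print one preprocessed tweet to show the effect of article removal.
print(train_clean[0])
```

With a dataset this small the F1 scores are not meaningful; the sketch is only meant to show how the pieces connect. The vectorizer is fitted on the training text alone so that the test matrix uses the same vocabulary and column order.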

Folders and files:

counterVectorizer
Countervectorizor.ipynb
Dense_sentimentW2vec.ipynb
NLP_GutenbergW2vec.ipynb
NLP_tut.ipynb
README.md
youtueb_spam_classifier.ipynb
