Questions tagged [natural-language-processing]
The field of natural language processing covers attempts to make sense of text in a human language using computers.
118 questions
5 votes, 2 answers, 99 views
Simple Word-Based Text Truncator
I created a Python 3.11 utility that truncates an input string to a fixed word count (splitting on any whitespace, collapsing runs, and dropping trailing stop-words) so you get clean, concise snippets ...
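A minimal sketch of the behavior described, assuming a small hand-picked stop-word set and that trailing stop-words are trimmed after the cut. `STOP_WORDS` and `truncate_words` are hypothetical names; the post's own list and code are not shown here.

```python
# Hypothetical stop-word set; the original post's list is not shown.
STOP_WORDS = frozenset({"a", "an", "and", "the", "of", "to", "in", "or"})


def truncate_words(text: str, max_words: int) -> str:
    """Keep at most max_words words, then drop trailing stop-words."""
    # str.split() with no argument splits on any whitespace and collapses runs.
    words = text.split()[:max_words]
    # Trim stop-words from the end so the snippet does not end on "the", "of", ...
    while words and words[-1].lower() in STOP_WORDS:
        words.pop()
    return " ".join(words)


print(truncate_words("The  quick\tbrown fox jumps over the lazy dog", 7))
# The quick brown fox jumps over
```

Punctuation glued to a word (for example "the,") will not match the set; stripping punctuation first is one possible refinement.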
2 votes, 3 answers, 116 views
Splitting input text into fixed-size overlapping word chunks
I’ve implemented a small utility function in Python 3.11 that takes an input string, splits it into word-based chunks of a given size, and allows a specified overlap between consecutive chunks. This ...
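A minimal sketch of overlapping word chunks, assuming chunks are returned as space-joined strings and that `overlap` must be smaller than `size` (otherwise the window would never advance). The name `chunk_words` is illustrative, not the poster's function.

```python
def chunk_words(text: str, size: int, overlap: int) -> list[str]:
    """Split text into chunks of `size` words; consecutive chunks share `overlap` words."""
    if size <= 0 or not 0 <= overlap < size:
        raise ValueError("need size > 0 and 0 <= overlap < size")
    words = text.split()
    step = size - overlap  # how far each chunk's start advances
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + size]))
        # Stop once a chunk reaches the end, so no trailing chunk is pure overlap.
        if start + size >= len(words):
            break
    return chunks


print(chunk_words("a b c d e f g", 3, 1))
# ['a b c', 'c d e', 'e f g']
```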
5 votes, 1 answer, 120 views
Local Search Engine in Rust
I made a simple search engine using the xkcd API in Rust, which turned out better than I'd hoped!
I decided to use tf-idf to rank results, which I feel has some room for improvement. ...
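For readers new to tf-idf ranking, here is a small sketch in Python (not the poster's Rust code) that scores each document by summing tf × idf over the query terms. It assumes lower-case whitespace tokenization, relative term frequency, and idf = ln(N / df).

```python
import math
from collections import Counter


def rank(query: str, docs: list[str]) -> list[tuple[int, float]]:
    """Return (doc_index, score) pairs sorted by descending tf-idf score."""
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: in how many documents each term appears at least once.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    scores = []
    for i, tokens in enumerate(tokenized):
        counts = Counter(tokens)
        score = 0.0
        for term in query.lower().split():
            if term in counts:
                tf = counts[term] / len(tokens)     # relative term frequency
                idf = math.log(n_docs / df[term])   # rarer terms weigh more
                score += tf * idf
        scores.append((i, score))
    # sorted() is stable, so ties keep their original document order.
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


docs = ["the cat sat", "the dog barked", "a cat and a dog"]
print(rank("cat", docs))
```

With this plain idf, a term that occurs in every document scores zero; smoothed variants avoid that if it matters for the use case.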
6 votes, 2 answers, 128 views
Creating csvs using Pandas on large dataset for document retrieval
I am trying to build a usable NLP corpus but am getting bottlenecked by how long the program takes (200 hours). With so much data, I know that optimizing my code even a little bit will net me huge time ...
3 votes, 1 answer, 144 views
NLP pre-processing function optimization, as it is extremely slow on a 92 MB data set
I have a data set of approximately 300,000 rows and two columns; each row contains a string, and some strings are longer than others. All in all, the data set in a ...
7 votes, 4 answers, 452 views
Separating a String of Text into Separate Words in Python
Occasionally, we want to do some rudimentary parsing of English text by separating it into individual words.
...
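One common way to do this rudimentary split is a regular expression. This sketch assumes a "word" is a run of ASCII letters, optionally with one internal apostrophe (so "don't" stays whole), and discards punctuation and digits; `WORD_RE` and `split_words` are illustrative names.

```python
import re

# A word: ASCII letters, optionally followed by an apostrophe and more letters.
WORD_RE = re.compile(r"[A-Za-z]+(?:'[A-Za-z]+)?")


def split_words(text: str) -> list[str]:
    """Return the words in text, ignoring punctuation and numbers."""
    return WORD_RE.findall(text)


print(split_words("Don't panic: it's only 42 words, isn't it?"))
# ["Don't", 'panic', "it's", 'only', 'words', "isn't", 'it']
```

Because the pattern is ASCII-only, accented words such as "café" would be cut short; a Unicode-aware pattern may suit non-English text better.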
1 vote, 1 answer, 265 views
Short Text Pre-processing
For educational purposes, I am preprocessing multiple short texts describing the symptoms of car faults.
The text is written by humans and is rich in misspellings, capital letters and ...
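One stdlib-only way to handle capital letters and misspellings is to lower-case each token and snap it to the closest word in a known vocabulary with `difflib.get_close_matches`. The `VOCABULARY` list and the `clean` function below are hypothetical; a real vocabulary would be built from the car-fault domain itself.

```python
import difflib
import re

# Hypothetical domain vocabulary; replace with terms from your own data.
VOCABULARY = ["engine", "brake", "noise", "battery", "starts", "squeaks", "when"]


def clean(text: str, cutoff: float = 0.8) -> list[str]:
    """Lower-case, keep letters only, and correct tokens close to a known word."""
    tokens = re.findall(r"[a-z]+", text.lower())
    cleaned = []
    for token in tokens:
        # Replace the token with its closest vocabulary word if one is similar enough.
        match = difflib.get_close_matches(token, VOCABULARY, n=1, cutoff=cutoff)
        cleaned.append(match[0] if match else token)
    return cleaned


print(clean("Brake SQUEEKS when"))
# ['brake', 'squeaks', 'when']
```

The `cutoff` trades recall for precision: lower values correct more typos but also risk rewriting legitimate words that are absent from the vocabulary.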
2 votes, 1 answer, 126 views
Python voice assistant that acts on trigger phrases
I made a Python voice assistant. It takes the user's voice input, and a chain of if-else statements checks it against several conditions; when a condition is satisfied, the corresponding function is executed. ...
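A common alternative to a long if-else chain is a dispatch table that maps trigger phrases to handler functions. This sketch assumes the speech has already been transcribed to a string; the trigger phrases and the `tell_time`, `greet`, and `handle` functions are hypothetical.

```python
from datetime import datetime
from typing import Callable


def tell_time() -> str:
    return datetime.now().strftime("It is %H:%M")


def greet() -> str:
    return "Hello!"


# Hypothetical trigger phrases mapped to the function each one should run.
COMMANDS: dict[str, Callable[[], str]] = {
    "what time": tell_time,
    "hello": greet,
}


def handle(transcript: str) -> str:
    text = transcript.lower()
    # The first trigger phrase found wins (dicts keep insertion order).
    for phrase, action in COMMANDS.items():
        if phrase in text:
            return action()
    return "Sorry, I didn't catch that."


print(handle("Hello there"))
# Hello!
```

Adding a command then means adding one dictionary entry rather than another branch. Plain substring matching is crude ("hello" also matches inside "othello"), so word-boundary checks may be worth adding.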
2 votes, 1 answer, 132 views
Remove duplicates from a Pandas dataframe taking into account lowercase letters and accents
I have the following DataFrame in pandas:
code | town | district | suburb
02 | Benalmádena | Málaga | Arroyo de la Miel
03 | Alicante | Jacarilla | Jacarilla, Correntias Bajas (Jacarilla)
04 | Cabrera d'Anoia | Barcelona | ...
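A minimal pandas sketch, assuming two rows count as duplicates when the chosen columns match after accent stripping (Unicode NFKD decomposition, then dropping combining marks) and case folding. The helper names `normalize_key` and `drop_near_duplicates` and the sample data are illustrative.

```python
import unicodedata

import pandas as pd


def normalize_key(text: str) -> str:
    """Strip accents and case so 'Málaga' and 'MALAGA' compare equal."""
    # NFKD splits 'á' into 'a' plus a combining acute accent, which is then dropped.
    decomposed = unicodedata.normalize("NFKD", text)
    without_marks = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return without_marks.casefold()


def drop_near_duplicates(df: pd.DataFrame, columns: list[str]) -> pd.DataFrame:
    """Keep the first row of each group whose `columns` match after normalization."""
    keys = df[columns].apply(lambda col: col.astype(str).map(normalize_key))
    return df[~keys.duplicated(keep="first")]


towns = pd.DataFrame({
    "town": ["Málaga", "malaga", "Alicante"],
    "district": ["Málaga", "MALAGA", "Jacarilla"],
})
print(drop_near_duplicates(towns, ["town", "district"]))
```

The kept rows retain their original spelling; only the comparison key is normalized. Note that this also folds "ñ" to "n", which may or may not be wanted for Spanish place names.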
3 votes, 0 answers, 761 views
Rust code implementing cosine similarity
I've been trying to write code that loops through each element of a list of questions, preprocesses it, and then calculates the cosine similarity with the rest of the elements (...
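As a language-neutral illustration (written in Python rather than the question's Rust), cosine similarity over bag-of-words counts is the dot product of two term-count vectors divided by the product of their lengths. This sketch assumes simple lower-case whitespace tokenization in place of the question's preprocessing step.

```python
import math
from collections import Counter
from itertools import combinations


def cosine_similarity(a: str, b: str) -> float:
    va, vb = Counter(a.lower().split()), Counter(b.lower().split())
    # Only terms present in both texts contribute to the dot product.
    dot = sum(va[t] * vb[t] for t in va.keys() & vb.keys())
    norm = math.sqrt(sum(c * c for c in va.values())) * math.sqrt(sum(c * c for c in vb.values()))
    return dot / norm if norm else 0.0  # an empty text has similarity 0


def pairwise(questions: list[str]) -> dict[tuple[int, int], float]:
    # Each unordered pair is computed once instead of comparing both directions.
    return {(i, j): cosine_similarity(questions[i], questions[j])
            for i, j in combinations(range(len(questions)), 2)}


print(cosine_similarity("the cat sat", "the dog sat"))
```

Since cosine similarity is symmetric, iterating over unordered pairs halves the work compared with comparing every element against every other element in both directions.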
2 votes, 2 answers, 359 views
Markov text generator program in Python
This is my first non-trivial program in Python. I am coming from a Java background, and I might have messed up or ignored some conventions. I would like to hear feedback on my code.
...
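For comparison, a compact first-order Markov chain (one word of context) can be built as a dict from each word to the list of words that follow it; generation then repeatedly picks a random successor. This is a sketch, not a review of the poster's code; `build_chain` and `generate` are illustrative names, and passing a seeded `random.Random` makes the output reproducible.

```python
import random
from collections import defaultdict
from typing import Optional


def build_chain(text: str) -> dict[str, list[str]]:
    """Map each word to the list of words observed right after it."""
    words = text.split()
    chain = defaultdict(list)
    for current, following in zip(words, words[1:]):
        chain[current].append(following)  # repeats keep their frequency weight
    return dict(chain)


def generate(chain: dict[str, list[str]], start: str, max_words: int,
             rng: Optional[random.Random] = None) -> str:
    rng = rng or random.Random()
    word, output = start, [start]
    # Stop at max_words or when the current word has no recorded successor.
    while len(output) < max_words and word in chain:
        word = rng.choice(chain[word])
        output.append(word)
    return " ".join(output)


chain = build_chain("the cat sat on the mat and the cat ran")
print(generate(chain, "the", 8, random.Random(42)))
```

Storing successors in a list (rather than a set) means a follower seen twice is twice as likely to be picked, which is what makes the generated text mimic the source frequencies.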
1 vote, 1 answer, 212 views
Finding a path from one Wikipedia page to another using semantic similarity of links (spaCy)
I've just picked coding back up for the first time in a long time, so I understand if your eyes bleed looking at this code. It all works, but I'd be grateful for any tips (how to improve the Python ...
1 vote, 1 answer, 106 views
IDF function with a list of lists
I wanted to build an inverse document frequency (IDF) function because, in my opinion, it was not easy to do with scikit-learn, and I also wanted to show how it works for educational reasons.
Also reading this question ...
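For reference, a sketch of IDF over a list of tokenized documents (a list of lists), using the textbook definition idf(t) = ln(N / df(t)), where N is the number of documents and df(t) counts the documents containing t. The optional `smooth` flag follows the smoothing scikit-learn's `TfidfVectorizer` applies by default, ln((1 + N) / (1 + df(t))) + 1. The function name `idf` is illustrative.

```python
import math
from collections import Counter


def idf(documents: list[list[str]], smooth: bool = False) -> dict[str, float]:
    """Inverse document frequency for every term in a list of tokenized documents."""
    n_docs = len(documents)
    # Count each term once per document: set() removes repeats within a document.
    df = Counter(term for doc in documents for term in set(doc))
    if smooth:
        return {t: math.log((1 + n_docs) / (1 + d)) + 1 for t, d in df.items()}
    return {t: math.log(n_docs / d) for t, d in df.items()}


docs = [["a", "b", "a"], ["a"], ["c"]]
print(idf(docs))
```

Here "a" appears in two of three documents, so its idf is ln(3/2), while "b" and "c" each appear in one document and get ln(3).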
2 votes, 0 answers, 62 views
Looping over files to create a dataframe
As part of my NLP project at work, I want to loop over all files in the same directory that are either PDF or docx. The end goal is to create a dataframe with the text content of the files in one ...
2 votes, 2 answers, 249 views
Text Normalizer
I am working on a text normalizer. It works fine with small text files but takes a very long time with large ones (5 MB or more).
Is there anything to change in the code to make it ...