hamzamemon/TextRetrievalSystem


TextRetrievalSystem

This Text Retrieval system indexes the files in src/main/resources/data/ and lets you search them with a single query. It applies the Porter2 stemming algorithm to every word in the files and in the query, so that related words (such as generous and generosity) are grouped together.
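
As a rough illustration of how stemming groups words at indexing time, the sketch below builds a small in-memory inverted index. It is not the repository's Invert.java: the class name InvertedIndexSketch, the build method, and the in-memory map layout are assumptions made for illustration, and the stemmer passed in the example is a toy suffix stripper standing in for Porter2.

```java
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;
import java.util.function.UnaryOperator;

// Hypothetical sketch; names and data layout are assumptions, not the project's API.
public class InvertedIndexSketch {

    // Maps each stemmed term to the names of the documents that contain it.
    static Map<String, Set<String>> build(Map<String, String> docs,
                                          UnaryOperator<String> stemmer) {
        Map<String, Set<String>> index = new TreeMap<>();
        for (Map.Entry<String, String> doc : docs.entrySet()) {
            // Lowercase, then split on anything that is not a letter or digit.
            for (String token : doc.getValue().toLowerCase().split("[^a-z0-9]+")) {
                if (token.isEmpty()) {
                    continue; // a leading delimiter produces one empty token
                }
                index.computeIfAbsent(stemmer.apply(token), k -> new TreeSet<>())
                     .add(doc.getKey());
            }
        }
        return index;
    }

    public static void main(String[] args) {
        // Toy stand-in for a real stemmer: strips one trailing "s". NOT Porter2.
        UnaryOperator<String> toyStemmer =
            t -> t.length() > 3 && t.endsWith("s") ? t.substring(0, t.length() - 1) : t;

        Map<String, String> docs = Map.of("a.txt", "Cats purr", "b.txt", "A cat");
        System.out.println(build(docs, toyStemmer).get("cat")); // [a.txt, b.txt]
    }
}
```

Because the same stemmer is applied to the query, a search for either form of a word would look up the same index entry.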

Set-Up

  1. Put .txt files into src/main/resources/data/ that you would like to search through
  2. Add words to or remove words from src/main/java/process/stoplist.txt to control which words are ignored. Stop words do not contribute to the cosine normalization.
  3. Run src/main/java/index/Invert.java to index the files
  4. Run one of these files
    • Run src/main/java/search/Driver.java to run a normal query
    • Run src/main/java/search/VSMTester.java to perform cosine normalization and return the top 1000 documents
  5. Your query's results should be saved in the top-level directory
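
To make the note about stop words and cosine normalization concrete, here is a minimal sketch of the idea. It assumes raw term-frequency weights and whitespace tokenization, and it skips stemming. The class CosineSketch and its methods are hypothetical and are not taken from VSMTester.java, whose actual weighting scheme is not described here.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch; the weighting scheme is an assumption, not the project's.
public class CosineSketch {

    // Builds a unit-length term-frequency vector, skipping stop words entirely
    // so that they add nothing to the vector's length.
    static Map<String, Double> normalizedVector(String text, Set<String> stopWords) {
        Map<String, Double> weights = new HashMap<>();
        for (String token : text.toLowerCase().split("\\s+")) {
            if (token.isEmpty() || stopWords.contains(token)) {
                continue;
            }
            weights.merge(token, 1.0, Double::sum);
        }
        double sumSquares = 0.0;
        for (double w : weights.values()) {
            sumSquares += w * w;
        }
        double length = Math.sqrt(sumSquares);
        if (length > 0) {
            weights.replaceAll((term, w) -> w / length); // cosine (length) normalization
        }
        return weights;
    }

    // For unit-length vectors, cosine similarity is just the dot product.
    static double cosine(Map<String, Double> a, Map<String, Double> b) {
        double dot = 0.0;
        for (Map.Entry<String, Double> e : a.entrySet()) {
            dot += e.getValue() * b.getOrDefault(e.getKey(), 0.0);
        }
        return dot;
    }

    public static void main(String[] args) {
        Set<String> stop = Set.of("the");
        Map<String, Double> doc = normalizedVector("the cat the cat", stop);
        Map<String, Double> query = normalizedVector("cat", stop);
        System.out.println(cosine(doc, query)); // 1.0, since "the" is ignored
    }
}
```

A ranked search would compute this score for every document against the query, sort in descending order, and keep the highest-scoring results.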

Queries

  1. NOT: NOT x returns all documents that do not contain x
  2. AND: x AND y returns all documents that contain x and y
  3. OR: x OR y returns all documents that contain x, y, or both
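
The three operators above correspond to set operations on the lists of documents that contain each term. The sketch below shows one way this could look, assuming an in-memory index that maps each term to a set of integer document IDs. The class BooleanQuerySketch and its methods are hypothetical and do not reflect how Driver.java parses or evaluates queries.

```java
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical sketch; the index layout and method names are assumptions.
public class BooleanQuerySketch {
    private final Map<String, Set<Integer>> index; // term -> IDs of documents containing it
    private final Set<Integer> allDocs;            // every document ID; needed for NOT

    public BooleanQuerySketch(Map<String, Set<Integer>> index, Set<Integer> allDocs) {
        this.index = index;
        this.allDocs = allDocs;
    }

    private Set<Integer> postings(String term) {
        return index.getOrDefault(term, Set.of());
    }

    // NOT x: complement of x's postings with respect to the whole collection.
    public Set<Integer> not(String x) {
        Set<Integer> result = new TreeSet<>(allDocs);
        result.removeAll(postings(x));
        return result;
    }

    // x AND y: intersection of the two postings sets.
    public Set<Integer> and(String x, String y) {
        Set<Integer> result = new TreeSet<>(postings(x));
        result.retainAll(postings(y));
        return result;
    }

    // x OR y: union of the two postings sets.
    public Set<Integer> or(String x, String y) {
        Set<Integer> result = new TreeSet<>(postings(x));
        result.addAll(postings(y));
        return result;
    }

    public static void main(String[] args) {
        Map<String, Set<Integer>> index = Map.of("cat", Set.of(1, 2), "dog", Set.of(2, 3));
        BooleanQuerySketch q = new BooleanQuerySketch(index, Set.of(1, 2, 3, 4));
        System.out.println(q.not("cat"));        // [3, 4]
        System.out.println(q.and("cat", "dog")); // [2]
        System.out.println(q.or("cat", "dog"));  // [1, 2, 3]
    }
}
```

Note that NOT needs the full set of document IDs, which is why the sketch keeps allDocs alongside the index.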
