Skip to content

Navigation Menu

Sign in
Appearance settings

Search code, repositories, users, issues, pull requests...

Provide feedback

We read every piece of feedback, and take your input very seriously.

Saved searches

Use saved searches to filter your results more quickly

Appearance settings

rkpm22/MediQueryAI

Open more actions menu

Repository files navigation

🏥 MediQuery AI

A Retrieval-Augmented Generation (RAG) powered medical FAQ chatbot that provides accurate, context-aware answers to medical questions using semantic search and AI generation.

🌟 Features

  • 🔍 Semantic Search: Uses FAISS vector database for fast, accurate document retrieval
  • 🤖 AI-Powered Responses: Leverages GPT-3.5-turbo for natural language generation
  • 💬 Interactive Chat Interface: Beautiful Streamlit web app with dark theme
  • 📚 Medical Knowledge Base: Curated medical FAQ dataset
  • ⚡ Fast Retrieval: Optimized vector search for instant responses
  • 🎯 Context-Aware: Provides relevant answers based on retrieved medical information

🏗️ Architecture

User Query → Embedding → FAISS Search → Context Retrieval → LLM Generation → Response

Core Components

  1. Vector Database: FAISS index for efficient similarity search
  2. Embedding Model: SentenceTransformers (all-MiniLM-L6-v2)
  3. LLM: OpenAI GPT-3.5-turbo via OpenRouter
  4. Web Interface: Streamlit application
  5. Knowledge Base: Medical FAQ dataset

📁 Project Structure

MediQueryAI/
├── app.py                 # Streamlit web application
├── rag_chatbot.py         # RAG pipeline implementation
├── build_faiss_index.py   # FAISS index builder
├── embeddings.py          # Basic embedding demo
├── sample_data.py         # Sample data generator
├── train.csv             # Medical FAQ dataset
├── faiss_index.index     # FAISS vector index
├── documents.pkl         # Serialized documents
├── requirements.txt      # Python dependencies
└── README.md            # This file

🚀 Quick Start

Prerequisites

  • Python 3.8+
  • OpenAI API key (via OpenRouter)

Installation

  1. Clone the repository

    git clone <repository-url>
    cd MediQueryAI
  2. Install dependencies

    pip install -r requirements.txt
  3. Set up environment variables

    # Create .env file
    echo "OPENROUTER_API_KEY=your_api_key_here" > .env
  4. Build the vector database

    python build_faiss_index.py
  5. Launch the web application

    streamlit run app.py
  6. Open your browser and navigate to http://localhost:8501

🔧 Usage

Web Interface

  1. Open the Streamlit app
  2. Type your medical question in the chat input
  3. Receive AI-generated answers based on the medical knowledge base
  4. Use the sidebar to clear chat history

🛠️ Development

Building the Vector Index

The FAISS index is built from the medical FAQ dataset:

python build_faiss_index.py

This script:

  • Loads medical FAQs from train.csv
  • Creates embeddings using SentenceTransformers
  • Builds FAISS L2 index for fast similarity search
  • Saves index and documents for persistence

Testing Individual Components

Test basic embeddings:

python embeddings.py

Test RAG pipeline:

python rag_chatbot.py

Generate sample data:

python sample_data.py

📊 Technical Details

Embedding Model

  • Model: all-MiniLM-L6-v2
  • Dimensions: 384
  • Type: Semantic embeddings optimized for similarity search

Vector Search

  • Library: FAISS (Facebook AI Similarity Search)
  • Index Type: IndexFlatL2 (exact L2 distance)
  • Search Time: O(log n) for approximate, O(n) for exact

LLM Configuration

  • Model: openai/gpt-3.5-turbo
  • Provider: OpenRouter
  • Temperature: 0.7 (balanced creativity/consistency)
  • Max Tokens: 300

Performance

  • Index Size: ~80MB for 60K+ documents
  • Query Latency: <2 seconds end-to-end
  • Memory Usage: ~500MB for full system

🔒 Security & Privacy

  • No Data Storage: Queries are not stored permanently
  • API Security: Uses environment variables for API keys
  • Local Processing: Embeddings and search performed locally
  • Disclaimer: For educational purposes only, not medical advice

🧪 Testing

The system includes comprehensive testing for:

  • Embedding Generation: Verify semantic similarity
  • FAISS Search: Test retrieval accuracy
  • RAG Pipeline: End-to-end query processing
  • Web Interface: User interaction flows

📈 Future Enhancements

  • Support for larger medical datasets
  • Multi-modal inputs (images, documents)
  • Advanced ranking algorithms
  • User feedback integration
  • Medical source citations
  • Multi-language support

📄 License

This project is licensed under the MIT License - see the LICENSE file for details.

⚠️ Disclaimer

This application is for educational purposes only and does not provide medical advice. Always consult with qualified healthcare professionals for medical concerns. The AI responses should not be used as a substitute for professional medical diagnosis or treatment.

🙏 Acknowledgments

  • FAISS: Facebook AI Research for vector search
  • SentenceTransformers: UKP Lab for embedding models
  • OpenAI: For GPT models via OpenRouter
  • Streamlit: For the web framework
  • Medical Community: For the knowledge base inspiration

About

A RAG powered medical FAQ chatbot that provides accurate, context-aware answers to medical questions using semantic search and AI generation.

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors

Languages

Morty Proxy This is a proxified and sanitized view of the page, visit original site.