A Retrieval-Augmented Generation (RAG) powered medical FAQ chatbot that provides accurate, context-aware answers to medical questions using semantic search and AI generation.
- 🔍 Semantic Search: Uses FAISS vector database for fast, accurate document retrieval
- 🤖 AI-Powered Responses: Leverages GPT-3.5-turbo for natural language generation
- 💬 Interactive Chat Interface: Beautiful Streamlit web app with dark theme
- 📚 Medical Knowledge Base: Curated medical FAQ dataset
- ⚡ Fast Retrieval: Optimized vector search for instant responses
- 🎯 Context-Aware: Provides relevant answers based on retrieved medical information
User Query → Embedding → FAISS Search → Context Retrieval → LLM Generation → Response
- Vector Database: FAISS index for efficient similarity search
- Embedding Model: SentenceTransformers (
all-MiniLM-L6-v2) - LLM: OpenAI GPT-3.5-turbo via OpenRouter
- Web Interface: Streamlit application
- Knowledge Base: Medical FAQ dataset
MediQueryAI/
├── app.py # Streamlit web application
├── rag_chatbot.py # RAG pipeline implementation
├── build_faiss_index.py # FAISS index builder
├── embeddings.py # Basic embedding demo
├── sample_data.py # Sample data generator
├── train.csv # Medical FAQ dataset
├── faiss_index.index # FAISS vector index
├── documents.pkl # Serialized documents
├── requirements.txt # Python dependencies
└── README.md # This file
- Python 3.8+
- OpenAI API key (via OpenRouter)
-
Clone the repository
git clone <repository-url> cd MediQueryAI
-
Install dependencies
pip install -r requirements.txt
-
Set up environment variables
# Create .env file echo "OPENROUTER_API_KEY=your_api_key_here" > .env
-
Build the vector database
python build_faiss_index.py
-
Launch the web application
streamlit run app.py
-
Open your browser and navigate to
http://localhost:8501
- Open the Streamlit app
- Type your medical question in the chat input
- Receive AI-generated answers based on the medical knowledge base
- Use the sidebar to clear chat history
The FAISS index is built from the medical FAQ dataset:
python build_faiss_index.pyThis script:
- Loads medical FAQs from
train.csv - Creates embeddings using SentenceTransformers
- Builds FAISS L2 index for fast similarity search
- Saves index and documents for persistence
Test basic embeddings:
python embeddings.pyTest RAG pipeline:
python rag_chatbot.pyGenerate sample data:
python sample_data.py- Model:
all-MiniLM-L6-v2 - Dimensions: 384
- Type: Semantic embeddings optimized for similarity search
- Library: FAISS (Facebook AI Similarity Search)
- Index Type:
IndexFlatL2(exact L2 distance) - Search Time: O(log n) for approximate, O(n) for exact
- Model:
openai/gpt-3.5-turbo - Provider: OpenRouter
- Temperature: 0.7 (balanced creativity/consistency)
- Max Tokens: 300
- Index Size: ~80MB for 60K+ documents
- Query Latency: <2 seconds end-to-end
- Memory Usage: ~500MB for full system
- No Data Storage: Queries are not stored permanently
- API Security: Uses environment variables for API keys
- Local Processing: Embeddings and search performed locally
- Disclaimer: For educational purposes only, not medical advice
The system includes comprehensive testing for:
- Embedding Generation: Verify semantic similarity
- FAISS Search: Test retrieval accuracy
- RAG Pipeline: End-to-end query processing
- Web Interface: User interaction flows
- Support for larger medical datasets
- Multi-modal inputs (images, documents)
- Advanced ranking algorithms
- User feedback integration
- Medical source citations
- Multi-language support
This project is licensed under the MIT License - see the LICENSE file for details.
This application is for educational purposes only and does not provide medical advice. Always consult with qualified healthcare professionals for medical concerns. The AI responses should not be used as a substitute for professional medical diagnosis or treatment.
- FAISS: Facebook AI Research for vector search
- SentenceTransformers: UKP Lab for embedding models
- OpenAI: For GPT models via OpenRouter
- Streamlit: For the web framework
- Medical Community: For the knowledge base inspiration