Automatically extracts, processes, and indexes code snippets from GitHub repositories into a searchable vector database.

cheolwanpark/snippets

SNIPPETS

Snippets is an intelligent code repository system that automatically extracts, processes, and indexes code snippets from GitHub repositories into a searchable vector database. It combines AI-powered code analysis with semantic search capabilities to help developers quickly find relevant code examples and patterns.

Key Features

  • 🚀 Use Your Claude Subscription: Runs on your existing Claude subscription plan at no extra cost
  • 🤖 Automated Repository Processing: Extract meaningful code snippets from any GitHub repository
  • 🔍 Semantic Search: Find code by meaning, not just keywords, using vector embeddings
  • 🔧 MCP Integration: Seamless integration with Claude Code through Model Context Protocol
  • ⚡ Background Processing: Efficient queue-based processing for large repositories
  • 🐳 Docker Ready: Complete containerized setup for easy deployment

How It Works

  1. Repository Ingestion: Add GitHub repositories through the web interface
  2. AI Processing: Claude Code agents analyze and extract meaningful code snippets
  3. Vector Embedding: Code snippets are converted to semantic vectors using a Google Gemini embedding model (gemini-embedding-001 by default)
  4. Smart Storage: Snippets are indexed in the Qdrant vector database for fast similarity search
  5. Easy Discovery: Search and explore code through the web UI or integrate with Claude Code via MCP

Quick Start

Prerequisites

  • Docker & Docker Compose: For containerized deployment
  • API Keys:
    • Claude Code OAuth Token (from your subscription; run claude setup-token)
    • Google Gemini API key (for embeddings; a free API key is available)

1. Clone and Setup

git clone https://github.com/cheolwanpark/snippets
cd snippets

2. Environment Configuration

Create your environment file:

cp docker/.env.example docker/.env

Edit docker/.env with your API keys:

# Required: Claude API for code analysis
CLAUDE_CODE_OAUTH_TOKEN=your_claude_token_here

# Required: Gemini API for embeddings
EMBEDDING_API_KEY=your_gemini_api_key_here

# Optional: GitHub PAT, required for PRIVATE repository access
GITHUB_TOKEN=your_github_pat_here

# Optional: Cohere API key, required for reranking
COHERE_API_KEY=your_cohere_api_key_here

# Optional: Customize ports
FRONT_PORT=3000
MCP_PORT=8080
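
Before launching, it can help to confirm that the required keys are actually filled in. The following is a minimal sketch and not part of the repository: the helper names parse_env and missing_required are hypothetical, the placeholder check assumes the your_... values shown above, and the parser only handles simple KEY=VALUE lines.

```python
from pathlib import Path

# Keys the configuration above marks as required; everything else is optional.
REQUIRED_KEYS = ("CLAUDE_CODE_OAUTH_TOKEN", "EMBEDDING_API_KEY")


def parse_env(text: str) -> dict[str, str]:
    """Parse simple KEY=VALUE lines, skipping blank lines and # comments."""
    values: dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip()
    return values


def missing_required(env: dict[str, str]) -> list[str]:
    """Return required keys that are absent, empty, or still placeholders."""
    return [
        key for key in REQUIRED_KEYS
        if not env.get(key) or env[key].startswith("your_")
    ]


if __name__ == "__main__":
    env_path = Path("docker/.env")  # path used in the step above
    if env_path.exists():
        problems = missing_required(parse_env(env_path.read_text()))
        print("Missing or placeholder keys:", problems or "none")
    else:
        print(f"{env_path} not found; copy docker/.env.example first")
```

Run it from the repository root; an empty result suggests both required keys have been replaced with real values.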

3. Launch with Docker

cd docker
docker-compose up -d

This starts all services in the background.

4. First Repository

  1. Open http://localhost:3000 in your browser
  2. Enter a GitHub repository URL (e.g., https://github.com/user/repo)
  3. Click 'Embed'
  4. Monitor progress in the dashboard
  5. Search your snippets in the 'Query' tab once processing completes

5. Connect the MCP Server to Claude Code

claude mcp add --transport http snippets http://localhost:8080/mcp

This enables Claude Code to search your processed snippets directly during development.

Configuration

Essential Environment Variables

The system requires minimal configuration for most use cases:

Required API Keys

# Claude API token for AI-powered code analysis
CLAUDE_CODE_OAUTH_TOKEN=your_claude_api_token

# Google Gemini API key for generating embeddings
EMBEDDING_API_KEY=your_gemini_api_key

Optional Customization

# Service Ports
FRONT_PORT=3000              # Frontend web interface
MCP_PORT=8080               # MCP server port

# Database Configuration (use defaults unless you have existing instances)
QDRANT_URL=http://qdrant:6333
REDIS_URL=redis://redis:6379

# Processing Settings
EMBEDDING_MODEL=gemini-embedding-001    # Embedding model to use
PIPELINE_MAX_CONCURRENCY=10             # Number of files processed in parallel
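
To illustrate what a cap such as PIPELINE_MAX_CONCURRENCY means in practice, here is a minimal sketch of bounded concurrency using an asyncio semaphore. It shows the general technique only and is not taken from the project's worker code; process_file and run_pipeline are hypothetical names, and the sleep stands in for real per-file work.

```python
import asyncio
import os


async def process_file(path: str, tracker: dict[str, int]) -> str:
    """Hypothetical stand-in for per-file work (extraction, embedding)."""
    tracker["active"] += 1
    tracker["peak"] = max(tracker["peak"], tracker["active"])
    await asyncio.sleep(0.01)  # simulate I/O-bound work
    tracker["active"] -= 1
    return path.upper()


async def run_pipeline(paths: list[str], max_concurrency: int) -> tuple[list[str], int]:
    """Process all paths with at most max_concurrency tasks running at once."""
    semaphore = asyncio.Semaphore(max_concurrency)
    tracker = {"active": 0, "peak": 0}

    async def guarded(path: str) -> str:
        # Tasks wait here until one of the max_concurrency slots is free.
        async with semaphore:
            return await process_file(path, tracker)

    results = await asyncio.gather(*(guarded(p) for p in paths))
    return list(results), tracker["peak"]


if __name__ == "__main__":
    limit = int(os.environ.get("PIPELINE_MAX_CONCURRENCY", "10"))
    files = [f"file_{i}.py" for i in range(25)]
    done, peak = asyncio.run(run_pipeline(files, limit))
    print(f"processed {len(done)} files, peak concurrency {peak}")
```

A higher limit shortens wall-clock time for large repositories but raises the number of simultaneous API calls, so it is worth keeping within your providers' rate limits.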

Getting API Keys

  • Claude API Token: Run claude setup-token in your terminal
  • Gemini API Key: Get one from Google AI Studio

Usage

Web Interface

The web interface at http://localhost:3000 provides a complete repository management experience:

Repository Management

  1. Add Repository:

    • Go to the 'Embed' tab
    • Enter a GitHub URL (private repositories are supported when GITHUB_TOKEN is set to your PAT)
    • Optional: Configure processing options (file filters, size limits)
    • Start processing
  2. Monitor Progress:

    • Real-time processing status
    • Progress indicators for extraction phases
    • Error reporting for failed repositories
  3. Repository Settings:

    • Edit processing configuration
    • Re-process with different settings
    • Remove repositories and their snippets

Snippet Search

  1. Semantic Search:

    • Go to the 'Query' tab
    • Enter natural language queries ("authentication middleware", "error handling")
    • Use technical terms ("async function", "React hooks")
    • Search by programming concepts
    • Optional: Configure search options (repository name, language)
  2. Explore Snippets:

    • View code with syntax highlighting
    • See file context and repository source
    • Copy snippets for use in your projects

MCP Integration with Claude Code

The Snippets MCP server enables seamless integration with Claude Code for enhanced development workflows.

Setup MCP Connection

# Claude Code: run in your project directory
claude mcp add --transport http snippets http://localhost:8080/mcp
# OR configure the MCP server for user scope
claude mcp add -s user --transport http snippets http://localhost:8080/mcp

# Codex: add to $CODEX_HOME/config.toml
[mcp_servers.snippets]
url = "http://localhost:8080/mcp"

Using the Search Tool

Once connected, Claude Code gains access to the search tool:

# Find error handling patterns
search error handling patterns in Python. use snippets.

# Find authentication patterns
search JWT authentication middleware. use snippets.

# Language-specific search
search async database queries. use snippets.

# Repository-specific search
search React component patterns. use snippets.

Search Parameters (set automatically by Claude Code):

  • query: Natural language description of what you're looking for
  • limit: Number of results (default: 10, max: 50)
  • repo_name: Filter to specific repository
  • language: Filter by programming language
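
The parameters above can be modelled as a small validated structure. This is a sketch under stated assumptions, not the server's actual schema: SearchParams and to_arguments are hypothetical names, the lower bound of 1 is assumed, and whether the real tool rejects or clamps an over-large limit is not documented here.

```python
from dataclasses import dataclass
from typing import Optional

DEFAULT_LIMIT = 10  # documented default
MAX_LIMIT = 50      # documented maximum


@dataclass(frozen=True)
class SearchParams:
    query: str
    limit: int = DEFAULT_LIMIT
    repo_name: Optional[str] = None
    language: Optional[str] = None

    def __post_init__(self) -> None:
        if not self.query.strip():
            raise ValueError("query must not be empty")
        # Assumption: out-of-range limits are rejected rather than clamped.
        if not 1 <= self.limit <= MAX_LIMIT:
            raise ValueError(f"limit must be between 1 and {MAX_LIMIT}")

    def to_arguments(self) -> dict:
        """Drop unset filters so only meaningful arguments are sent."""
        args: dict = {"query": self.query, "limit": self.limit}
        if self.repo_name:
            args["repo_name"] = self.repo_name
        if self.language:
            args["language"] = self.language
        return args


if __name__ == "__main__":
    params = SearchParams(query="JWT authentication middleware", language="python")
    print(params.to_arguments())
```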

Workflow Integration

Use the MCP integration for:

  • Code Discovery: Find examples before implementing new features
  • Pattern Research: Explore different approaches to common problems
  • Learning: Understand how concepts are implemented across repositories
  • Code Review: Find similar implementations for comparison

Architecture

System Overview

Snippets follows a microservices architecture with a clear separation of concerns:

┌─────────────────┐    ┌───────────────────┐    ┌─────────────────┐
│   Frontend      │    │  API/MCP Server   │    │   Worker Pool   │
│   (Next.js)     │◄──►│ (FastAPI/FastMCP) │◄──►│   (RQ/Redis)    │
└─────────────────┘    └───────────────────┘    └─────────────────┘
                                │                        │
                                ▼                        ▼
                       ┌──────────────────┐    ┌─────────────────┐
                       │   Vector DB      │    │   Message Queue │
                       │   (Qdrant)       │    │   (Redis)       │
                       └──────────────────┘    └─────────────────┘

Components

Frontend (Next.js)

  • Location: front/
  • Purpose: Web interface for repository management and snippet search
  • Technology: Next.js 14, React, TypeScript, Tailwind CSS
  • Features:
    • Repository CRUD operations
    • Real-time processing status
    • Semantic search interface
    • Responsive design with dark/light themes

API Server (FastAPI)

  • Location: src/api/
  • Purpose: REST API for all backend operations
  • Technology: FastAPI, Python 3.12, Pydantic
  • Endpoints:
    • Repository management (/repo)
    • Snippet search (/snippets)

Worker System (RQ/Redis)

  • Location: src/worker/
  • Purpose: Background processing of repositories
  • Technology: RQ (Redis Queue), Redis
  • Responsibilities:
    • Repository cloning and analysis
    • Code snippet extraction using AI agents
    • Vector embedding generation
    • Database storage operations

Vector Database (Qdrant)

  • Purpose: Store and search code snippet embeddings
  • Technology: Qdrant vector database
  • Features:
    • High-performance similarity search
    • Metadata filtering (language, repository)
    • Scalable vector storage
    • Built-in clustering and indexing

MCP Server (FastMCP)

  • Location: src/mcpserver/
  • Purpose: Model Context Protocol integration
  • Technology: FastMCP, mounted on main API
  • Tools Provided:
    • search: Semantic snippet search for Claude Code

Data Flow

Repository Processing Flow

  1. User Input: Repository URL submitted via web interface
  2. API Validation: URL validation and repository metadata extraction
  3. Queue Job: Processing job added to Redis queue
  4. Worker Processing:
    • Clone repository to temporary storage
    • Filter files by type and size
    • Extract meaningful code snippets using Claude Code Agents
    • Generate vector embeddings for each snippet
    • Store in Qdrant with metadata
  5. Status Updates: Real-time status updates via polling
  6. Completion: Repository marked as processed, snippets available for search
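
The "filter files by type and size" step in the flow above might look roughly like the following. This is an illustrative sketch only: the allowed suffixes, the size cap, and the function name iter_candidate_files are assumptions, since the project's real filter settings are not documented here.

```python
from pathlib import Path
from typing import Iterator

# Hypothetical defaults chosen for illustration.
ALLOWED_SUFFIXES = frozenset({".py", ".ts", ".tsx", ".js", ".go", ".rs", ".md"})
MAX_FILE_BYTES = 100_000


def iter_candidate_files(
    root: Path,
    allowed: frozenset = ALLOWED_SUFFIXES,
    max_bytes: int = MAX_FILE_BYTES,
) -> Iterator[Path]:
    """Yield files under root whose suffix is allowed and whose size fits."""
    for path in sorted(root.rglob("*")):
        if not path.is_file():
            continue
        if ".git" in path.relative_to(root).parts:
            continue  # skip version-control internals
        if path.suffix.lower() not in allowed:
            continue
        if path.stat().st_size > max_bytes:
            continue
        yield path


if __name__ == "__main__":
    for candidate in iter_candidate_files(Path(".")):
        print(candidate)
```

Filtering before any AI processing keeps binary assets and very large generated files out of the extraction and embedding steps.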

Search Flow

  1. Query Input: User enters search query (Web UI or MCP)
  2. Query Embedding: The query is converted to a vector using the same embedding model that was used for indexing
  3. Vector Search: Qdrant performs similarity search with filters
  4. Result Ranking: Results are reranked by the Cohere Rerank API (requires COHERE_API_KEY)
  5. Response: Formatted results returned with snippet metadata
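
The search flow above can be sketched in plain Python to show how embedding, metadata filtering, and similarity ranking fit together. Everything here is a toy stand-in: embed counts words from a tiny fixed vocabulary instead of calling Gemini, the in-memory list replaces Qdrant, the reranking step is omitted, and all names (Snippet, search, INDEX) are hypothetical.

```python
import math
from dataclasses import dataclass

VOCAB = ["auth", "jwt", "error", "retry", "async", "database"]  # toy vocabulary


def embed(text: str) -> list[float]:
    """Toy stand-in for a real embedding model: counts of vocabulary words."""
    words = text.lower().split()
    return [float(words.count(term)) for term in VOCAB]


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity, defined as 0.0 when either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


@dataclass
class Snippet:
    title: str
    language: str
    repo_name: str
    vector: list[float]


def make_snippet(title: str, language: str, repo_name: str) -> Snippet:
    # Index time: the snippet is embedded with the same function as queries.
    return Snippet(title, language, repo_name, embed(title))


def search(query, snippets, limit=10, language=None, repo_name=None):
    """Embed the query, apply metadata filters, rank by cosine similarity."""
    query_vec = embed(query)
    candidates = [
        s for s in snippets
        if (language is None or s.language == language)
        and (repo_name is None or s.repo_name == repo_name)
    ]
    ranked = sorted(candidates, key=lambda s: cosine(query_vec, s.vector), reverse=True)
    return ranked[:limit]


INDEX = [
    make_snippet("jwt auth middleware", "python", "demo/api"),
    make_snippet("async database retry", "python", "demo/api"),
    make_snippet("jwt auth guard", "typescript", "demo/web"),
]

if __name__ == "__main__":
    for hit in search("jwt auth", INDEX, language="python"):
        print(hit.title, hit.repo_name)
```

The key point is that queries and snippets must pass through the same embedding function; otherwise their vectors live in different spaces and similarity scores are meaningless.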

Technology Stack

  • FastAPI: Web framework and API server
  • RQ + Redis: Task queue and caching
  • Qdrant: Vector database
  • Claude Agent Toolkit: Claude Code powered analysis
  • Google Gemini: Text embedding generation
  • Next.js: Frontend
  • Docker Compose: Deployment
