CodeGraph

Cross-language code intelligence for AI agents and developers.

CodeGraph builds a semantic graph of your codebase (functions, classes, imports, call chains) and exposes it through 42 MCP tools, a VS Code extension, and a persistent memory layer. It parses 38 languages via tree-sitter, so AI agents get structured code understanding instead of grepping through files.

Quick Start

MCP Server (Claude Code, Cursor, any MCP client)

Add to ~/.claude.json (or your MCP client config):

{
  "mcpServers": {
    "codegraph": {
      "command": "/path/to/codegraph-server",
      "args": ["--mcp"]
    }
  }
}

The server indexes the current working directory automatically.
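
If the client config already lists other servers, the entry can be merged in programmatically instead of pasted by hand. The following is a minimal sketch, assuming the config is plain JSON with a top-level mcpServers object as shown above. The helper name add_codegraph_server is hypothetical and the binary path is a placeholder.

```python
import json


def add_codegraph_server(config: dict, binary_path: str) -> dict:
    """Return a copy of an MCP client config with a "codegraph" entry added.

    Hypothetical helper: only the "command"/"args" shape comes from the README.
    """
    merged = dict(config)
    servers = dict(merged.get("mcpServers", {}))
    servers["codegraph"] = {"command": binary_path, "args": ["--mcp"]}
    merged["mcpServers"] = servers
    return merged


# Example: an existing config that already declares another (made-up) server.
existing = {"mcpServers": {"other": {"command": "other-server", "args": []}}}
updated = add_codegraph_server(existing, "/path/to/codegraph-server")
print(json.dumps(updated, indent=2))
```

The input dictionary is left untouched, so the result can be inspected before it is written back to disk.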

VS Code Extension

Install the VSIX:

code --install-extension codegraph-0.14.0.vsix

The extension starts the server automatically and registers all tools as Language Model Tools for Copilot.

Rules for AI agents

Pre-configured rule files teach AI coding agents (Claude, Cursor, Windsurf, Codex, Cline) to use CodeGraph MCP tools before falling back to grep or multi-file reads. Each file maps natural-language intent to the right codegraph_* tool.

Repository: codegraph-ai/codegraph-rules-for-agents

Setup is cp <agent>/codegraph.md ~/<agent>/ (one line per agent — see the rules repo's README).

GitHub Action — PR review in CI

Drop a workflow into your repo to get an automatic code-graph analysis comment on every PR — blast radius, test gaps, stale docs, suggested reviewers. Runs graph-only (no embeddings, no ONNX model), so it's fast and needs no API keys — just the built-in GITHUB_TOKEN.

Copy .github/workflows/codegraph-pr.yml into your repo. The core invocation is a single command:

codegraph-server --graph-only \
  --run-tool codegraph_pr_context \
  --tool-args '{"baseBranch":"main","format":"markdown"}'

This prints a ready-to-post markdown comment. The --graph-only flag skips embedding generation (10-50× faster indexing); --run-tool runs one tool and exits without the MCP stdio handshake — ideal for scripting.
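
For scripting outside a shell, the same invocation can be assembled as an argument list so the JSON payload needs no manual quoting. This is a sketch only: it assumes codegraph-server is on PATH, and the helper name pr_context_command is hypothetical. The flags and tool arguments are the ones shown above.

```python
import json
import shlex
from typing import List


def pr_context_command(base_branch: str = "main", fmt: str = "markdown") -> List[str]:
    """Build the one-shot PR-context command as an argv list (hypothetical helper)."""
    tool_args = json.dumps(
        {"baseBranch": base_branch, "format": fmt}, separators=(",", ":")
    )
    return [
        "codegraph-server",
        "--graph-only",
        "--run-tool", "codegraph_pr_context",
        "--tool-args", tool_args,
    ]


cmd = pr_context_command()
# shlex.join shows the shell-quoted form of the same command.
print(shlex.join(cmd))
# To actually run it (requires the binary to be installed):
#   subprocess.run(cmd, check=True, capture_output=True, text=True).stdout
```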


Configuration

MCP Server flags

| Flag | Default | Description |
| --- | --- | --- |
| --workspace <path> | current dir | Directories to index (repeatable for multi-project) |
| --exclude <dir> | | Directories to skip (repeatable) |
| --embedding-model <model> | bge-small | bge-small (384d, fast), jina-code-v2 (768d, 6× slower), or granite-97m (384d, 32K ctx, ~3× slower) |
| --full-body-embedding | true | Embed full function body (~50 lines) for better semantic search and duplicate detection |
| --max-files <n> | 5000 | Maximum files to index |
| --profile <name> | all | Filter the exposed MCP tool surface to a named subset (see below) |
| --graph-only | off | Skip embedding generation: build the graph and serve structural tools only. No ONNX model load, 10-50× faster indexing. Semantic search unavailable. For CI / one-shot graph queries. |
| --run-tool <name> | | One-shot mode: index, run a single tool, print its result, exit. No MCP handshake. Pair with --tool-args '<json>'. |
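
Because --workspace and --exclude are repeatable, a multi-project setup passes each flag once per value. The sketch below builds such an argument list. The helper name server_args is hypothetical, and passing each value as a separate token (rather than --flag=value) is an assumption based on the table's notation.

```python
from typing import List, Optional, Sequence


def server_args(
    workspaces: Sequence[str],
    excludes: Sequence[str] = (),
    max_files: Optional[int] = None,
) -> List[str]:
    """Build MCP-mode arguments, repeating --workspace/--exclude per value."""
    args = ["--mcp"]
    for path in workspaces:
        args += ["--workspace", path]
    for directory in excludes:
        args += ["--exclude", directory]
    if max_files is not None:
        args += ["--max-files", str(max_files)]
    return args


# Placeholder paths; the result would go into the "args" array of the MCP config.
print(server_args(["/path/to/project-a", "/path/to/project-b"], ["generated"], 2000))
```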

--profile — narrow the MCP tool surface

The full tool surface is convenient but inflates the agent's prompt-context cost. A profile exposes only the slice you need (also settable via the CODEGRAPH_TOOL_PROFILE environment variable):

| Profile | Tools | Use when |
| --- | --- | --- |
| all (default) | every tool (community + pro) | normal sessions |
| core | 8: search + symbol info + AI context | chatty agent sessions where you only need lookups |
| graph | 16: callers/callees/deps/impact/traverse | refactoring + structural analysis |
| memory | 7: codegraph_memory_* only | note-taking / knowledge-base workflows |
| security | pro security tools only (empty on community) | pro security audits |

VS Code settings

{
  "codegraph.indexOnStartup": true,
  "codegraph.indexPaths": ["/path/to/project-a", "/path/to/project-b"],
  "codegraph.excludePatterns": ["**/cmake-build-debug/**", "**/generated/**"],
  "codegraph.embeddingModel": "bge-small",
  "codegraph.maxFileSizeKB": 1024,
  "codegraph.debug": false
}

Full-body embeddings are enabled by default. Function body text is captured at parse time with zero I/O overhead.

Built-in exclusions (always skipped) cover ~47 directories across three categories:

  • Build / cache: node_modules, target, dist, build, out, .git, __pycache__, vendor, .venv, venv, .tox, .pytest_cache, .mypy_cache, .ruff_cache, .next, .nuxt, .svelte-kit, .parcel-cache, .npm, .yarn, .pnpm-store, .cache, .cargo, .bundle, .gradle, DerivedData, Pods, xcuserdata, cmake-build-*
  • IDE / IaC state: .idea, .vscode-test, .fleet, .terraform, .terragrunt-cache, .serverless
  • Sensitive credential dirs: .aws, .ssh, .gnupg, .kube, .docker

Plus glob patterns for binary archives, native libraries, OS metadata, and secret file extensions (*.pem, *.key, *.p12, *.pfx, *.crt, *.gpg, *.kdbx, SSH key conventions like id_rsa, etc.) — defense in depth against accidentally embedding credentials.
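
To make the two layers concrete (directory names first, then file globs), here is an illustrative check in Python. It is a sketch of the idea, not CodeGraph's implementation. The names below are a small subset copied from the lists above, and the function name is_excluded is hypothetical.

```python
from fnmatch import fnmatchcase
from pathlib import PurePosixPath

# Small subset of the built-in lists, for illustration only.
EXCLUDED_DIRS = {"node_modules", "target", ".git", ".ssh", ".aws"}
EXCLUDED_DIR_GLOBS = ["cmake-build-*"]
EXCLUDED_FILE_GLOBS = ["*.pem", "*.key", "id_rsa"]


def is_excluded(path: str) -> bool:
    """Return True if any directory component or the file name is excluded."""
    parts = PurePosixPath(path).parts
    if not parts:
        return False
    *dirs, filename = parts
    for directory in dirs:
        if directory in EXCLUDED_DIRS:
            return True
        if any(fnmatchcase(directory, glob) for glob in EXCLUDED_DIR_GLOBS):
            return True
    return any(fnmatchcase(filename, glob) for glob in EXCLUDED_FILE_GLOBS)


for candidate in ["src/main.rs", "node_modules/x/index.js",
                  "cmake-build-debug/a.o", "deploy/server.pem"]:
    print(candidate, is_excluded(candidate))
```

Checking directory components before file patterns means a whole excluded subtree is rejected without looking at individual file names.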


Tools (42 community + 27 pro, including 17 security)

Code Analysis (11)

Tool What it does
get_ai_context Primary context tool. Intent-aware (explain/modify/debug/test) with token budgeting. Returns source, related symbols, imports, siblings, debug hints.
get_edit_context Everything needed before editing: source + callers + tests + memories + git history
get_curated_context Cross-codebase context for a natural language query ("how does auth work?")
analyze_impact Blast radius prediction — what breaks if you modify, delete, or rename
analyze_complexity Cyclomatic complexity with breakdown (branches, loops, nesting, exceptions, early returns)
find_circular_deps Detect circular import/dependency chains across files
find_hot_paths Most-called functions ranked by transitive caller count
find_dead_imports Find unused imports — modules imported but never referenced
get_module_summary High-level summary of a directory: file count, functions, language breakdown, top complex functions
search_by_pattern Regex search across function bodies, signatures, names, and docstrings
search_by_error Find functions that throw, catch, or handle specific error types
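
As an illustration of what a circular-dependency check such as find_circular_deps looks for, the sketch below runs a depth-first search over a toy import map and returns the first cycle it meets. It is a generic technique under stated assumptions, not the tool's actual algorithm. The function name find_cycle and the file names are hypothetical.

```python
from typing import Dict, List, Optional

WHITE, GRAY, BLACK = 0, 1, 2  # unvisited, on the current DFS path, finished


def find_cycle(imports: Dict[str, List[str]]) -> Optional[List[str]]:
    """Return one import cycle as a path (first node repeated at the end), or None."""
    color: Dict[str, int] = {}
    stack: List[str] = []

    def visit(node: str) -> Optional[List[str]]:
        color[node] = GRAY
        stack.append(node)
        for dep in imports.get(node, []):
            state = color.get(dep, WHITE)
            if state == GRAY:  # dep is on the current path: cycle found
                return stack[stack.index(dep):] + [dep]
            if state == WHITE:
                found = visit(dep)
                if found:
                    return found
        stack.pop()
        color[node] = BLACK
        return None

    for node in imports:
        if color.get(node, WHITE) == WHITE:
            cycle = visit(node)
            if cycle:
                return cycle
    return None


graph = {"a.py": ["b.py"], "b.py": ["c.py"], "c.py": ["a.py"], "d.py": ["a.py"]}
print(find_cycle(graph))  # ['a.py', 'b.py', 'c.py', 'a.py']
```

The recursive form keeps the sketch short. A very deep import chain would need an explicit stack instead.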

Code Navigation (13)

Tool What it does
symbol_search Find symbols by name or natural language (hybrid BM25 + semantic search)
get_callers / get_callees Who calls this? What does it call? (with transitive depth)
get_detailed_symbol Full symbol info: source, callers, callees, complexity
get_symbol_info Quick metadata: signature, visibility, kind
get_dependency_graph File/module import relationships with depth control
get_call_graph Function call chains (callers and callees)
find_by_imports Find files importing a module
find_by_signature Search by param count, return type, modifiers
find_entry_points Main functions, HTTP handlers, CLI commands, event handlers
find_implementors Find all functions registered as ops struct callbacks
find_related_tests Tests that exercise a given function
traverse_graph Custom graph traversal with edge/node type filters
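
The "transitive depth" option on caller lookups can be pictured as a breadth-first walk over reversed call edges that stops at a depth limit. The sketch below shows that idea on a toy call graph. The function name callers_within and the graph contents are hypothetical, and the real tools may order or shape their results differently.

```python
from collections import deque
from typing import Dict, List, Set


def callers_within(calls: Dict[str, List[str]], target: str, max_depth: int) -> Dict[str, int]:
    """Map each direct or transitive caller of target to its distance, up to max_depth.

    calls maps a caller name to the functions it calls.
    """
    # Invert the edges so we can walk from callee to caller.
    reverse: Dict[str, Set[str]] = {}
    for caller, callees in calls.items():
        for callee in callees:
            reverse.setdefault(callee, set()).add(caller)

    depths: Dict[str, int] = {}
    queue = deque([(target, 0)])
    while queue:
        node, depth = queue.popleft()
        if depth == max_depth:
            continue  # do not expand beyond the requested depth
        for caller in reverse.get(node, ()):
            if caller != target and caller not in depths:
                depths[caller] = depth + 1
                queue.append((caller, depth + 1))
    return depths


calls = {
    "main": ["handle_request"],
    "handle_request": ["authenticate", "render"],
    "authenticate": ["verify_token"],
    "test_auth": ["authenticate"],
}
print(callers_within(calls, "verify_token", 1))  # {'authenticate': 1}
```

Raising the depth to 2 in this toy graph also reaches handle_request and test_auth, which is the kind of widening an impact or blast-radius query relies on.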

Indexing (3)

Tool What it does
reindex_workspace Full or incremental workspace reindex
index_files Add/update specific files without full reindex
index_directory Add directory to graph alongside existing data

Memory (7)

Persistent AI context across sessions — debugging insights, architectural decisions, known issues.

Tool What it does
memory_store / memory_get / memory_search Store, retrieve, search memories (BM25 + semantic)
memory_context Get memories relevant to a file/function
memory_list / memory_invalidate / memory_stats Browse, retire, monitor

Pairs well with Tempera — an episodic memory system that captures transferable debugging strategies and solutions across projects. CodeGraph's memory tools store project-scoped notes; Tempera captures cross-project BKMs (best-known methods) that improve over time.

PR / Change Analysis (1)

Tool What it does
pr_context One-call PR review. Runs git diff against base branch, finds changed functions in the graph, reports: blast radius (callers), test coverage + gaps, affected modules, diff-aware change classification (signature vs body), stale-doc warnings, complexity, commit-message hint, suggested reviewers from git blame.

Documentation (7)

Persistent project documentation — index design docs, search them semantically, verify code matches the design, generate architecture docs from the code graph.

Tool What it does
index_markdown Index a local .md file (ARCHITECTURE.md, API_DESIGN.md, etc.) into the persistent docs store. Heading-tree chunking with leaf-node embeddings.
search_docs Semantic search over indexed docs — returns matching sections with heading-path breadcrumbs
list_doc_sources List all indexed source files
remove_doc_source Remove all indexed chunks from a source file
verify_design Cross-reference doc claims vs code graph. direction=forward (doc→code), reverse (code→doc), or both
design_gaps Find identifiers described in docs that don't exist in code yet — build TODO lists from specs
generate_architecture_doc Auto-generate a structured ARCHITECTURE.md from the live code graph (modules, hot paths, complexity, circular deps)
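
Heading-tree chunking with breadcrumbs can be sketched as follows: walk the file line by line, keep a stack of the current heading path, and emit one chunk per section. This is a simplified illustration, not the indexer itself. It handles only #-style headings, ignores code fences, and emits every section that has body text. The name chunk_by_heading is hypothetical.

```python
import re
from typing import List, Tuple

HEADING = re.compile(r"^(#{1,6})\s+(.*\S)\s*$")


def chunk_by_heading(markdown: str, source: str) -> List[Tuple[str, str]]:
    """Split markdown into (breadcrumb, text) chunks, one per heading section."""
    chunks: List[Tuple[str, str]] = []
    path: List[Tuple[int, str]] = []  # (heading level, title) from root to current
    body: List[str] = []

    def flush() -> None:
        text = "\n".join(body).strip()
        if text:
            crumb = " > ".join([source] + [title for _, title in path])
            chunks.append((crumb, text))
        body.clear()

    for line in markdown.splitlines():
        match = HEADING.match(line)
        if match:
            flush()
            level = len(match.group(1))
            # Drop headings at the same or deeper level before descending.
            while path and path[-1][0] >= level:
                path.pop()
            path.append((level, match.group(2)))
        else:
            body.append(line)
    flush()
    return chunks


doc = (
    "# Architecture\nOverview text.\n"
    "## Authentication\nJWT refresh is handled here.\n"
    "## Storage\n### Cache\nLRU cache.\n"
)
for crumb, text in chunk_by_heading(doc, "ARCHITECTURE.md"):
    print(crumb, "->", text)
```

Each chunk carries its full heading path, which is what lets a search hit be reported as a breadcrumb such as "ARCHITECTURE.md > Architecture > Authentication".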

All tool names are prefixed with codegraph_ (e.g. codegraph_get_ai_context). Tools that target a specific symbol accept uri + line or nodeId from symbol_search results.


Usage examples

Index a design doc and search it:

codegraph_index_markdown(path: "/projects/myapp/docs/ARCHITECTURE.md")
codegraph_search_docs(query: "how does the auth module handle JWT refresh?")

Check if the code matches the design:

codegraph_verify_design(source: "/projects/myapp/docs/ARCHITECTURE.md", direction: "forward")
// → "132/132 identifiers verified, 0 gaps"

Find what's described in docs but not yet implemented:

codegraph_design_gaps(source: "/projects/myapp/docs/API_DESIGN.md")
// → "4 of 12 identifiers not found in code: PaymentService, RefundHandler, ..."

Generate architecture docs from the code graph:

codegraph_generate_architecture_doc(scope: "src/", topN: 5)
// → Markdown with modules, complexity hotspots, hot paths, circular deps

Save a debugging insight for future sessions:

codegraph_memory_store(kind: "debug_context", title: "Nginx body size limit",
  content: "The /upload endpoint fails on payloads > 1MB...",
  problem: "API returns 500 on large uploads",
  solution: "Increase nginx client_max_body_size to 10M",
  agentSource: "claude")

Get AI context with graph compression stats + design doc augmentation:

codegraph_get_ai_context(uri: "file:///projects/myapp/src/auth.rs", line: 42, intent: "modify")
// → Code context + graphStats: {entitiesInGraph: 13555, entitiesTraversed: 47, entitiesKept: 8}
// → design_context section from indexed docs mentioning "auth"

Review a PR — blast radius, test gaps, stale docs, reviewers in one call:

codegraph_pr_context(baseBranch: "main")
// → "PR changes 4 files (+263/-77, 12 functions). 37 direct callers, 8 tests, 3 untested. Risk: medium."
// → test_gaps: [refresh_token, revoke_session] — functions with 0 test callers
// → stale_docs: ["auth.rs described in ARCHITECTURE.md > Authentication — doc may need updating"]
// → suggested_reviewers: [{author: "anvanster", lines_owned: 3200}]
// → commit_hint: "feat(mcp): <describe the change>"

Narrow the tool surface for chatty sessions:

codegraph-server --mcp --profile=core  # Only 8 tools: search + symbol info + AI context

CodeGraph Pro

Additional tools available in CodeGraph Pro:

Tool What it does
scan_security Security vulnerability scan: 40+ dangerous function patterns, source-to-sink taint tracing, auth coverage for HTTP endpoints (7 languages/frameworks), architectural layer violations, weak crypto, hardcoded secrets
analyze_coupling Module coupling metrics and instability scores
find_unused_code Dead code detection with confidence scoring
find_duplicates Detect duplicate/near-duplicate functions
find_similar / cluster_symbols / compare_symbols Embedding-based code similarity
cross_project_search Search across all indexed projects
mine_git_history / mine_git_history_for_file / search_git_history Git history mining and semantic search
security_control_flow Map every execution path through a function — "can this return without hitting the auth check?"
security_trace_data_flow Follow a variable from birth to death — "does user input reach this SQL query?"
security_generate_sbom CycloneDX SBOM from 8 lockfile formats
security_audit_deps OSV vulnerability check on dependencies
security_check_unchecked_returns / _resource_leaks / _misconfig / _input_validation / _error_exposure 5 heuristic analyzers covering ~80% of CWE Top 25
security_scan_iac Docker / Kubernetes / Terraform misconfiguration scan
security_check_licenses Lockfile license policy enforcement (copyleft detection)
security_check_secrets_entropy Shannon-entropy hardcoded-secret detection
security_detect_injection Focused SQL/XSS/cmd/path/deser/template injection detection (20 patterns)
security_check_search_path Untrusted search-path / DLL-hijacking detection (CWE-426/CWE-427)
security_check_crypto Cryptographic misuse: weak ciphers/hashes/PRNG/keys, static IVs, timing-leak comparisons (CWE-208/326-330/338/916, 35 patterns)
security_export_sarif Aggregate findings as SARIF 2.1.0 (GitHub Code Scanning, GitLab SAST)
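
For readers unfamiliar with entropy-based secret detection, the sketch below computes the Shannon entropy of a string in bits per character and flags long, high-entropy tokens. The length and entropy thresholds are illustrative assumptions, not the values security_check_secrets_entropy uses, and the function names are hypothetical.

```python
import math
from collections import Counter


def shannon_entropy(text: str) -> float:
    """Shannon entropy of text in bits per character (0.0 for an empty string)."""
    if not text:
        return 0.0
    total = len(text)
    return sum(
        (count / total) * math.log2(total / count)
        for count in Counter(text).values()
    )


def looks_like_secret(token: str, min_length: int = 20, threshold: float = 4.0) -> bool:
    """Heuristic: long tokens with high per-character entropy (assumed thresholds)."""
    return len(token) >= min_length and shannon_entropy(token) >= threshold


print(shannon_entropy("aaaa"))  # 0.0
print(shannon_entropy("abcd"))  # 2.0
print(looks_like_secret("a" * 24))
print(looks_like_secret("abcdefghijklmnopqrstuvwxyz012345"))
```

Repetitive strings score near zero while random-looking keys score high, which is why entropy is typically combined with a minimum length to limit false positives on short identifiers.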

Cross-cutting features (all security_check_* tools):

  • include_tests / treat_as_production — first-class skip for tests/samples/vendored
  • check_compile_gates — C/C++ findings inside #ifdef X are marked DEFENSIVE_GATED_OFF when X isn't defined by CMake/Cargo/Makefile
  • 25-marker suppression honoring (# nosec, // NOLINT, // codeql[ignore], # rubocop:disable, etc.) at line and function level
  • Telemetry blocks per scan: path_filter (examined/matched/skipped) + compile_gate (gated_off count)
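
Line-level suppression can be pictured as a filter over findings: a finding is dropped when its source line carries a recognized marker. The sketch below uses only the four markers named above and models line-level suppression only. Function-level suppression and the full marker list are not covered, and all function names are hypothetical.

```python
from typing import List, Tuple

# The four markers named in the text; the real list is longer.
SUPPRESSION_MARKERS = ("# nosec", "// NOLINT", "// codeql[ignore]", "# rubocop:disable")


def is_suppressed(line: str) -> bool:
    """True if the source line contains any known suppression marker."""
    return any(marker in line for marker in SUPPRESSION_MARKERS)


def filter_findings(source: str, findings: List[Tuple[int, str]]) -> List[Tuple[int, str]]:
    """Keep findings whose 1-based line number points at an unsuppressed line."""
    lines = source.splitlines()
    return [(n, msg) for n, msg in findings if not is_suppressed(lines[n - 1])]


source = (
    "import subprocess\n"
    "subprocess.call(cmd, shell=True)  # nosec\n"
    "os.system(cmd)\n"
)
findings = [(2, "shell=True"), (3, "os.system")]
print(filter_findings(source, findings))  # [(3, 'os.system')]
```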

Languages

38 languages parsed via tree-sitter — functions, classes, imports, call graph, complexity metrics, dependency graphs, symbol search, and impact analysis:

| Category | Languages |
| --- | --- |
| Systems | C, C++, Rust, Zig, Objective-C |
| JVM | Java, Kotlin, Scala, Groovy, Clojure |
| Web/Scripting | TypeScript/JS, Python, Ruby, PHP, Perl, Lua, Elixir, Elm |
| Web/Style | CSS |
| Mobile | Swift, Dart |
| Functional | Haskell, OCaml, Julia, Erlang, Elm, Clojure |
| Enterprise | C#, COBOL, Fortran, Go |
| Blockchain | Solidity |
| Shell/Config | Bash, HCL/Terraform, TOML, YAML |
| Hardware | Verilog/SystemVerilog, Tcl |
| Data Science | R, Julia |

HTTP handler detection: Python (FastAPI/Flask/Django), TypeScript (NestJS), Java (Spring/JAX-RS), Go (stdlib/Gin/Echo/Fiber), C# (ASP.NET), Ruby (Rails), PHP (Laravel/Symfony).


Architecture

MCP Client (Claude, Cursor, ...)        VS Code Extension
        |                                       |
    MCP (stdio)                            LSP Protocol
        |                                       |
        └───────────┐               ┌───────────┘
                    ▼               ▼
            ┌─────────────────────────────┐
            │       codegraph-server      │
            ├─────────────────────────────┤
            │  38 tree-sitter parsers     │
            │  Semantic graph engine      │
            │  AI query engine (BM25)     │
            │  Memory layer (RocksDB)     │
            │  Docs store (RocksDB+HNSW)  │
            │  Full-body embeddings (BGE) │
            │  HNSW vector index          │
            └─────────────────────────────┘

A single Rust binary serves both MCP and LSP protocols.

  • Indexing: ~60 files/sec. Incremental re-indexing on file changes via FNV-1a content hashing.
  • Persistence: Graph and embeddings persist to ~/.codegraph/graph.db (RocksDB). Instant startup on restart — no re-parsing, no re-embedding.
  • Queries: Sub-100ms. Cross-file import and call resolution at index time.
  • Embeddings: Full-body (function bodies captured at parse time, zero disk I/O). Vectors stored in RocksDB alongside the graph. Auto-downloads model on first run.
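
The incremental re-indexing bullet relies on FNV-1a content hashing: a file is re-parsed only when the hash of its bytes differs from the stored one. The sketch below implements the standard 64-bit FNV-1a and a change check on top of it. The 64-bit width and the changed_files helper are assumptions for illustration. The source does not state which variant or storage layout CodeGraph uses.

```python
from typing import Dict, List

FNV64_OFFSET_BASIS = 0xCBF29CE484222325
FNV64_PRIME = 0x100000001B3
MASK64 = 0xFFFFFFFFFFFFFFFF


def fnv1a_64(data: bytes) -> int:
    """Standard 64-bit FNV-1a: XOR each byte in, then multiply by the prime."""
    h = FNV64_OFFSET_BASIS
    for byte in data:
        h ^= byte
        h = (h * FNV64_PRIME) & MASK64
    return h


def changed_files(stored: Dict[str, int], on_disk: Dict[str, bytes]) -> List[str]:
    """Paths whose content hash is new or differs from the stored hash."""
    return sorted(
        path for path, content in on_disk.items()
        if stored.get(path) != fnv1a_64(content)
    )


stored = {"a.rs": fnv1a_64(b"fn a() {}"), "b.rs": fnv1a_64(b"fn b() {}")}
on_disk = {
    "a.rs": b"fn a() {}",            # unchanged
    "b.rs": b"fn b() { todo!() }",   # edited
    "c.rs": b"fn c() {}",            # new file
}
print(changed_files(stored, on_disk))  # ['b.rs', 'c.rs']
```

FNV-1a is not cryptographic. It suits change detection because it is cheap to compute in a single pass over the file contents.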

Building from Source

git clone https://github.com/codegraph-ai/codegraph
cd codegraph
cargo build --release -p codegraph-server    # Rust server
cd vscode && npm install && npm run esbuild  # VS Code extension
npx @vscode/vsce package                     # VSIX

Requires Rust stable, Node.js 18+, VS Code 1.90+.


Support the project

CodeGraph is free, open-source, and maintained by a solo developer. If it saves you time, consider sponsoring on GitHub — it helps keep the project alive and growing.


License

Apache-2.0
