CodeGraph

Cross-language code intelligence for AI agents and developers.

CodeGraph builds a semantic graph of your codebase (functions, classes, imports, call chains) and exposes it through its MCP tools (42 in the community edition, listed below), a VS Code extension, and a persistent memory layer. It parses 38 languages via tree-sitter, so AI agents get structured code understanding instead of grepping through files.

Quick Start

MCP Server (Claude Code, Cursor, any MCP client)

Add to ~/.claude.json (or your MCP client config):

{
  "mcpServers": {
    "codegraph": {
      "command": "/path/to/codegraph-server",
      "args": ["--mcp"]
    }
  }
}

The server indexes the current working directory automatically.
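If the client config already lists other servers, the entry can be merged in rather than pasted by hand. The following is a minimal Python sketch, assuming the config is a JSON object with a top-level mcpServers key as in the snippet above; the helper name add_codegraph_server and the binary path are placeholders, not part of CodeGraph.

```python
import json

def add_codegraph_server(config: dict, binary_path: str) -> dict:
    """Return a copy of an MCP client config with a codegraph entry added."""
    merged = dict(config)
    servers = dict(merged.get("mcpServers", {}))
    # Same shape as the snippet above: a command plus the --mcp flag.
    servers["codegraph"] = {"command": binary_path, "args": ["--mcp"]}
    merged["mcpServers"] = servers
    return merged

# Hypothetical existing config with one unrelated server.
existing = {"mcpServers": {"other": {"command": "other-server", "args": []}}}
updated = add_codegraph_server(existing, "/path/to/codegraph-server")
print(json.dumps(updated, indent=2))
```

The function copies the input instead of mutating it, so the original config can be kept as a backup before writing the result to disk.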

VS Code Extension

Install the VSIX:

code --install-extension codegraph-0.14.0.vsix

The extension starts the server automatically and registers all tools as Language Model Tools for Copilot.

Rules for AI agents

Pre-configured rule files teach AI coding agents (Claude, Cursor, Windsurf, Codex, Cline) to use CodeGraph MCP tools before falling back to grep or multi-file reads. They map natural-language intent to the right codegraph_* tool.

→ codegraph-ai/codegraph-rules-for-agents

Setup is cp <agent>/codegraph.md ~/<agent>/ (one line per agent — see the rules repo's README).

GitHub Action — PR review in CI

Drop a workflow into your repo to get an automatic code-graph analysis comment on every PR — blast radius, test gaps, stale docs, suggested reviewers. Runs graph-only (no embeddings, no ONNX model), so it's fast and needs no API keys — just the built-in GITHUB_TOKEN.

Copy .github/workflows/codegraph-pr.yml into your repo. The core invocation is a single command:

codegraph-server --graph-only \
  --run-tool codegraph_pr_context \
  --tool-args '{"baseBranch":"main","format":"markdown"}'

This prints a ready-to-post markdown comment. The --graph-only flag skips embedding generation (10-50× faster indexing); --run-tool runs one tool and exits without the MCP stdio handshake — ideal for scripting.
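For scripting outside a shell, the same invocation can be assembled as an argument list. This is a sketch only: it uses the flags shown above, assumes codegraph-server is on PATH and that the markdown comment is written to standard output, and the helper names are made up for illustration.

```python
import json
import subprocess

def pr_context_command(base_branch: str = "main", binary: str = "codegraph-server") -> list:
    """Build the one-shot argv for codegraph_pr_context."""
    tool_args = {"baseBranch": base_branch, "format": "markdown"}
    return [
        binary,
        "--graph-only",                      # skip embeddings
        "--run-tool", "codegraph_pr_context",
        "--tool-args", json.dumps(tool_args),
    ]

def run_pr_context(base_branch: str = "main") -> str:
    """Run the tool and return what it printed (assumed: markdown on stdout)."""
    result = subprocess.run(
        pr_context_command(base_branch),
        capture_output=True, text=True, check=True,
    )
    return result.stdout

command = pr_context_command("develop")
print(command)
```

Passing the JSON through json.dumps avoids hand-written shell quoting when the base branch comes from a CI variable.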

Configuration

MCP Server flags

Flag	Default	Description
`--workspace <path>`	current dir	Directories to index (repeatable for multi-project)
`--exclude <dir>`	—	Directories to skip (repeatable)
`--embedding-model <model>`	`bge-small`	`bge-small` (384d, fast), `jina-code-v2` (768d, 6× slower), or `granite-97m` (384d, 32K ctx, ~3× slower)
`--full-body-embedding`	`true`	Embed full function body (~50 lines) for better semantic search and duplicate detection
`--max-files <n>`	5000	Maximum files to index
`--profile <name>`	`all`	Filter the exposed MCP tool surface to a named subset (see below)
`--graph-only`	off	Skip embedding generation — build the graph and serve structural tools only. No ONNX model load, 10-50× faster indexing. Semantic search unavailable. For CI / one-shot graph queries.
`--run-tool <name>`	—	One-shot mode: index, run a single tool, print its result, exit. No MCP handshake. Pair with `--tool-args '<json>'`.

`--profile` — narrow the MCP tool surface

The full tool surface is convenient but inflates the agent's prompt-context cost. A profile exposes only the slice you need (also settable via the CODEGRAPH_TOOL_PROFILE env var):

Profile	Tools	Use when
`all` (default)	every tool (community + pro)	normal sessions
`core`	8 — search + symbol info + AI context	chatty agent sessions where you only need lookups
`graph`	16 — callers/callees/deps/impact/traverse	refactoring + structural analysis
`memory`	7 — `codegraph_memory_*` only	note-taking / knowledge-base workflows
`security`	pro security tools only (empty on community)	pro security audits
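When the server is launched from a script, the profile can be chosen through the environment instead of a flag. A small sketch, assuming only the CODEGRAPH_TOOL_PROFILE variable and the profile names from the table above; the server_env helper is hypothetical.

```python
import os
from typing import Optional

# Profile names as listed in the table above.
KNOWN_PROFILES = ("all", "core", "graph", "memory", "security")

def server_env(profile: str, base_env: Optional[dict] = None) -> dict:
    """Return an environment mapping that selects a CodeGraph tool profile."""
    if profile not in KNOWN_PROFILES:
        raise ValueError(f"unknown profile {profile!r}; expected one of {KNOWN_PROFILES}")
    env = dict(os.environ if base_env is None else base_env)
    env["CODEGRAPH_TOOL_PROFILE"] = profile
    return env

env = server_env("core", base_env={"PATH": "/usr/bin"})
print(env["CODEGRAPH_TOOL_PROFILE"])
```

The returned mapping can be passed as the env argument of subprocess.Popen when starting codegraph-server with --mcp.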

VS Code settings

{
  "codegraph.indexOnStartup": true,
  "codegraph.indexPaths": ["/path/to/project-a", "/path/to/project-b"],
  "codegraph.excludePatterns": ["**/cmake-build-debug/**", "**/generated/**"],
  "codegraph.embeddingModel": "bge-small",
  "codegraph.maxFileSizeKB": 1024,
  "codegraph.debug": false
}

Full-body embeddings are enabled by default. Function body text is captured at parse time with zero I/O overhead.

Built-in exclusions (always skipped) cover ~47 directories across three categories:

Build / cache: node_modules, target, dist, build, out, .git, __pycache__, vendor, .venv, venv, .tox, .pytest_cache, .mypy_cache, .ruff_cache, .next, .nuxt, .svelte-kit, .parcel-cache, .npm, .yarn, .pnpm-store, .cache, .cargo, .bundle, .gradle, DerivedData, Pods, xcuserdata, cmake-build-*
IDE / IaC state: .idea, .vscode-test, .fleet, .terraform, .terragrunt-cache, .serverless
Sensitive credential dirs: .aws, .ssh, .gnupg, .kube, .docker

Plus glob patterns for binary archives, native libraries, OS metadata, and secret file extensions (*.pem, *.key, *.p12, *.pfx, *.crt, *.gpg, *.kdbx, SSH key conventions like id_rsa, etc.) — defense in depth against accidentally embedding credentials.
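To make the two layers concrete, here is an illustrative Python sketch of directory-name and file-glob exclusion. It is not CodeGraph's implementation: the lists are a small subset of the names above, and the is_excluded helper is hypothetical.

```python
from fnmatch import fnmatchcase
from pathlib import PurePosixPath

# Small subset of the built-in exclusions listed above.
EXCLUDED_DIRS = ["node_modules", "target", ".git", "__pycache__", ".aws", ".ssh", "cmake-build-*"]
SECRET_FILE_GLOBS = ["*.pem", "*.key", "*.p12", "*.pfx", "id_rsa"]

def is_excluded(path: str) -> bool:
    """True if any directory component or the file name matches an exclusion."""
    parts = PurePosixPath(path).parts
    if not parts:
        return False
    *dirs, filename = parts
    # Layer 1: skip anything underneath an excluded directory.
    if any(fnmatchcase(d, pattern) for d in dirs for pattern in EXCLUDED_DIRS):
        return True
    # Layer 2: skip files that look like credentials, wherever they live.
    return any(fnmatchcase(filename, pattern) for pattern in SECRET_FILE_GLOBS)

for candidate in ["app/node_modules/lib/index.js", "deploy/server.pem", "src/main.rs"]:
    print(candidate, is_excluded(candidate))
```

Checking the file name separately from the directory components is what gives the defense-in-depth effect: a key file is skipped even when it sits in an otherwise indexed source directory.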

Tools (42 community + 27 pro, 17 security)

Code Analysis (11)

Tool	What it does
`get_ai_context`	Primary context tool. Intent-aware (explain/modify/debug/test) with token budgeting. Returns source, related symbols, imports, siblings, debug hints.
`get_edit_context`	Everything needed before editing: source + callers + tests + memories + git history
`get_curated_context`	Cross-codebase context for a natural language query ("how does auth work?")
`analyze_impact`	Blast radius prediction — what breaks if you modify, delete, or rename
`analyze_complexity`	Cyclomatic complexity with breakdown (branches, loops, nesting, exceptions, early returns)
`find_circular_deps`	Detect circular import/dependency chains across files
`find_hot_paths`	Most-called functions ranked by transitive caller count
`find_dead_imports`	Find unused imports — modules imported but never referenced
`get_module_summary`	High-level summary of a directory: file count, functions, language breakdown, top complex functions
`search_by_pattern`	Regex search across function bodies, signatures, names, and docstrings
`search_by_error`	Find functions that throw, catch, or handle specific error types
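As an illustration of what "ranked by transitive caller count" means for find_hot_paths, the sketch below counts, for each function in a toy call graph, how many distinct functions can reach it through any chain of calls. The graph and function names are invented, and the real tool may compute the ranking differently.

```python
from collections import defaultdict

def transitive_caller_counts(calls: dict) -> dict:
    """calls maps caller -> list of callees; returns function -> number of transitive callers."""
    callers = defaultdict(set)
    functions = set(calls)
    for caller, callees in calls.items():
        for callee in callees:
            callers[callee].add(caller)
            functions.add(callee)

    counts = {}
    for fn in functions:
        seen = set()
        stack = list(callers[fn])
        while stack:                      # walk the reversed call edges
            current = stack.pop()
            if current in seen:
                continue
            seen.add(current)
            stack.extend(callers[current])
        seen.discard(fn)                  # a recursive function is not its own caller here
        counts[fn] = len(seen)
    return counts

# Hypothetical call graph.
calls = {
    "main": ["handle_request"],
    "handle_request": ["authenticate", "parse_body"],
    "refresh_token": ["authenticate"],
    "authenticate": ["hash_password"],
}
counts = transitive_caller_counts(calls)
ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
print(ranked[:3])  # [('hash_password', 4), ('authenticate', 3), ('parse_body', 2)]
```

The leaf helper ranks highest because every path through the request handlers ends up calling it, which is the kind of function where a change has the widest reach.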

Code Navigation (13)

Tool	What it does
`symbol_search`	Find symbols by name or natural language (hybrid BM25 + semantic search)
`get_callers` / `get_callees`	Who calls this? What does it call? (with transitive depth)
`get_detailed_symbol`	Full symbol info: source, callers, callees, complexity
`get_symbol_info`	Quick metadata: signature, visibility, kind
`get_dependency_graph`	File/module import relationships with depth control
`get_call_graph`	Function call chains (callers and callees)
`find_by_imports`	Find files importing a module
`find_by_signature`	Search by param count, return type, modifiers
`find_entry_points`	Main functions, HTTP handlers, CLI commands, event handlers
`find_implementors`	Find all functions registered as ops struct callbacks
`find_related_tests`	Tests that exercise a given function
`traverse_graph`	Custom graph traversal with edge/node type filters
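The "transitive depth" option on get_callers / get_callees can be pictured as a depth-limited breadth-first walk. The following sketch runs such a walk over a toy reverse call graph; the data and the callers_within helper are assumptions for illustration, not the tool's actual API.

```python
from collections import deque

def callers_within(reverse_edges: dict, target: str, max_depth: int) -> dict:
    """Return {caller: hops} for every caller reachable within max_depth hops."""
    found = {}
    queue = deque([(target, 0)])
    while queue:
        node, depth = queue.popleft()
        if depth == max_depth:
            continue                      # do not expand past the requested depth
        for caller in reverse_edges.get(node, []):
            if caller != target and caller not in found:
                found[caller] = depth + 1
                queue.append((caller, depth + 1))
    return found

# Hypothetical graph: callee -> direct callers.
reverse_edges = {
    "hash_password": ["authenticate"],
    "authenticate": ["handle_request", "refresh_token"],
    "handle_request": ["main"],
}
print(callers_within(reverse_edges, "hash_password", 1))  # {'authenticate': 1}
print(callers_within(reverse_edges, "hash_password", 2))
```

Depth 1 answers "who calls this directly?", while larger depths approximate the blast radius of a change.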

Indexing (3)

Tool	What it does
`reindex_workspace`	Full or incremental workspace reindex
`index_files`	Add/update specific files without full reindex
`index_directory`	Add directory to graph alongside existing data

Memory (7)

Persistent AI context across sessions — debugging insights, architectural decisions, known issues.

Tool	What it does
`memory_store` / `memory_get` / `memory_search`	Store, retrieve, search memories (BM25 + semantic)
`memory_context`	Get memories relevant to a file/function
`memory_list` / `memory_invalidate` / `memory_stats`	Browse, retire, monitor

Pairs well with Tempera — an episodic memory system that captures transferable debugging strategies and solutions across projects. CodeGraph's memory tools store project-scoped notes; Tempera captures cross-project BKMs (best-known methods) that improve over time.

PR / Change Analysis (1)

Tool	What it does
`pr_context`	One-call PR review. Runs git diff against base branch, finds changed functions in the graph, reports: blast radius (callers), test coverage + gaps, affected modules, diff-aware change classification (signature vs body), stale-doc warnings, complexity, commit-message hint, suggested reviewers from git blame.

Documentation (7)

Persistent project documentation — index design docs, search them semantically, verify code matches the design, generate architecture docs from the code graph.

Tool	What it does
`index_markdown`	Index a local `.md` file (ARCHITECTURE.md, API_DESIGN.md, etc.) into the persistent docs store. Heading-tree chunking with leaf-node embeddings.
`search_docs`	Semantic search over indexed docs — returns matching sections with heading-path breadcrumbs
`list_doc_sources`	List all indexed source files
`remove_doc_source`	Remove all indexed chunks from a source file
`verify_design`	Cross-reference doc claims vs code graph. `direction=forward` (doc→code), `reverse` (code→doc), or `both`
`design_gaps`	Find identifiers described in docs that don't exist in code yet — build TODO lists from specs
`generate_architecture_doc`	Auto-generate a structured ARCHITECTURE.md from the live code graph (modules, hot paths, complexity, circular deps)
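The heading-tree chunking and heading-path breadcrumbs mentioned for index_markdown and search_docs can be sketched roughly as follows. This is a simplified illustration rather than the actual indexer: it emits every section that has body text (not only leaf nodes), ignores headings inside code fences, and skips the embedding step entirely.

```python
import re

HEADING = re.compile(r"^(#{1,6})\s+(.*)$")

def chunk_by_headings(markdown: str) -> list:
    """Split markdown into chunks, each tagged with its heading-path breadcrumb."""
    chunks = []
    path = []    # stack of (level, title) for the current position in the heading tree
    body = []    # lines collected since the last heading

    def flush():
        text = "\n".join(body).strip()
        if text:
            breadcrumb = " > ".join(title for _, title in path)
            chunks.append({"breadcrumb": breadcrumb, "text": text})
        body.clear()

    for line in markdown.splitlines():
        match = HEADING.match(line)
        if match:
            flush()
            level = len(match.group(1))
            while path and path[-1][0] >= level:   # climb back up to the parent heading
                path.pop()
            path.append((level, match.group(2).strip()))
        else:
            body.append(line)
    flush()
    return chunks

# Hypothetical design doc.
doc = """# Architecture
Overview text.
## Authentication
JWT refresh is handled by the auth module.
### Tokens
Token details.
## Storage
RocksDB holds the graph.
"""
chunks = chunk_by_headings(doc)
for chunk in chunks:
    print(chunk["breadcrumb"], "|", chunk["text"])
```

Keeping the breadcrumb with each chunk is what lets a search result point back to a specific section such as "Architecture > Authentication" instead of just a file.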

All tool names are prefixed with codegraph_ (e.g. codegraph_get_ai_context). Tools that target a specific symbol accept uri + line or nodeId from symbol_search results.
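At the protocol level, an MCP client invokes a tool with a JSON-RPC tools/call request. The sketch below builds such a request for a prefixed tool name; the argument names (uri, line, intent) are taken from the usage examples in this README, while the helper itself is hypothetical and a real client must complete the MCP initialize handshake first.

```python
import json

def tool_call_request(request_id: int, tool: str, arguments: dict) -> dict:
    """Build an MCP tools/call request, adding the codegraph_ prefix if missing."""
    name = tool if tool.startswith("codegraph_") else f"codegraph_{tool}"
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments},
    }

request = tool_call_request(
    1,
    "get_ai_context",
    {"uri": "file:///projects/myapp/src/auth.rs", "line": 42, "intent": "modify"},
)
# On the stdio transport each message is sent as a single line of JSON.
print(json.dumps(request))
```

Most MCP client libraries build this envelope for you; the sketch is mainly useful for debugging what goes over the wire.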

Usage examples

Index a design doc and search it:

codegraph_index_markdown(path: "/projects/myapp/docs/ARCHITECTURE.md")
codegraph_search_docs(query: "how does the auth module handle JWT refresh?")

Check if the code matches the design:

codegraph_verify_design(source: "/projects/myapp/docs/ARCHITECTURE.md", direction: "forward")
// → "132/132 identifiers verified, 0 gaps"

Find what's described in docs but not yet implemented:

codegraph_design_gaps(source: "/projects/myapp/docs/API_DESIGN.md")
// → "4 of 12 identifiers not found in code: PaymentService, RefundHandler, ..."

Generate architecture docs from the code graph:

codegraph_generate_architecture_doc(scope: "src/", topN: 5)
// → Markdown with modules, complexity hotspots, hot paths, circular deps

Save a debugging insight for future sessions:

codegraph_memory_store(kind: "debug_context", title: "Nginx body size limit",
  content: "The /upload endpoint fails on payloads > 1MB...",
  problem: "API returns 500 on large uploads",
  solution: "Increase nginx client_max_body_size to 10M",
  agentSource: "claude")

Get AI context with graph compression stats + design doc augmentation:

codegraph_get_ai_context(uri: "file:///projects/myapp/src/auth.rs", line: 42, intent: "modify")
// → Code context + graphStats: {entitiesInGraph: 13555, entitiesTraversed: 47, entitiesKept: 8}
// → design_context section from indexed docs mentioning "auth"

Review a PR — blast radius, test gaps, stale docs, reviewers in one call:

codegraph_pr_context(baseBranch: "main")
// → "PR changes 4 files (+263/-77, 12 functions). 37 direct callers, 8 tests, 3 untested. Risk: medium."
// → test_gaps: [refresh_token, revoke_session] — functions with 0 test callers
// → stale_docs: ["auth.rs described in ARCHITECTURE.md > Authentication — doc may need updating"]
// → suggested_reviewers: [{author: "anvanster", lines_owned: 3200}]
// → commit_hint: "feat(mcp): <describe the change>"

Narrow the tool surface for chatty sessions:

codegraph-server --mcp --profile=core  # Only 8 tools: search + symbol info + AI context

CodeGraph Pro

Additional tools available in CodeGraph Pro:

Tool	What it does
`scan_security`	Security vulnerability scan: 40+ dangerous function patterns, source-to-sink taint tracing, auth coverage for HTTP endpoints (7 languages/frameworks), architectural layer violations, weak crypto, hardcoded secrets
`analyze_coupling`	Module coupling metrics and instability scores
`find_unused_code`	Dead code detection with confidence scoring
`find_duplicates`	Detect duplicate/near-duplicate functions
`find_similar` / `cluster_symbols` / `compare_symbols`	Embedding-based code similarity
`cross_project_search`	Search across all indexed projects
`mine_git_history` / `mine_git_history_for_file` / `search_git_history`	Git history mining and semantic search
`security_control_flow`	Map every execution path through a function — "can this return without hitting the auth check?"
`security_trace_data_flow`	Follow a variable from birth to death — "does user input reach this SQL query?"
`security_generate_sbom`	CycloneDX SBOM from 8 lockfile formats
`security_audit_deps`	OSV vulnerability check on dependencies
`security_check_unchecked_returns` / `_resource_leaks` / `_misconfig` / `_input_validation` / `_error_exposure`	5 heuristic analyzers covering ~80% of CWE Top 25
`security_scan_iac`	Docker / Kubernetes / Terraform misconfiguration scan
`security_check_licenses`	Lockfile license policy enforcement (copyleft detection)
`security_check_secrets_entropy`	Shannon-entropy hardcoded-secret detection
`security_detect_injection`	Focused SQL/XSS/cmd/path/deser/template injection detection (20 patterns)
`security_check_search_path`	Untrusted search-path / DLL-hijacking detection (CWE-426/CWE-427)
`security_check_crypto`	Cryptographic misuse: weak ciphers/hashes/PRNG/keys, static IVs, timing-leak comparisons (CWE-208/326-330/338/916, 35 patterns)
`security_export_sarif`	Aggregate findings as SARIF 2.1.0 (GitHub Code Scanning, GitLab SAST)

Cross-cutting features (all security_check_* tools):

include_tests / treat_as_production: first-class skipping of test, sample, and vendored code
check_compile_gates: C/C++ findings inside #ifdef X are marked DEFENSIVE_GATED_OFF when X isn't defined by CMake/Cargo/Makefile
Honors 25 suppression markers (# nosec, // NOLINT, // codeql[ignore], # rubocop:disable, etc.) at line and function level
Telemetry blocks per scan: path_filter (examined/matched/skipped) + compile_gate (gated_off count)
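Line-level suppression can be pictured with a short sketch. It only uses the four markers named above (the full set of 25 is not listed in this README), does not model function-level suppression, and the unsuppressed helper is hypothetical rather than part of the Pro tools.

```python
# A few of the markers named above; the full set is not reproduced here.
SUPPRESSION_MARKERS = ("# nosec", "// NOLINT", "// codeql[ignore]", "# rubocop:disable")

def unsuppressed(findings: list, source: str) -> list:
    """Keep only findings whose 1-based line carries no suppression marker."""
    lines = source.splitlines()
    return [
        number for number in findings
        if not any(marker in lines[number - 1] for marker in SUPPRESSION_MARKERS)
    ]

# Hypothetical scanned file with findings reported on lines 2 and 3.
source = "\n".join([
    "import subprocess",
    "subprocess.call(cmd, shell=True)  # nosec",
    "subprocess.call(other, shell=True)",
])
print(unsuppressed([2, 3], source))  # [3]
```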

Languages

38 languages parsed via tree-sitter — functions, classes, imports, call graph, complexity metrics, dependency graphs, symbol search, and impact analysis:

Category	Languages
Systems	C, C++, Rust, Zig, Objective-C
JVM	Java, Kotlin, Scala, Groovy, Clojure
Web/Scripting	TypeScript/JS, Python, Ruby, PHP, Perl, Lua, Elixir, Elm
Web/Style	CSS
Mobile	Swift, Dart
Functional	Haskell, OCaml, Julia, Erlang, Elm, Clojure
Enterprise	C#, COBOL, Fortran, Go
Blockchain	Solidity
Shell/Config	Bash, HCL/Terraform, TOML, YAML
Hardware	Verilog/SystemVerilog, Tcl
Data Science	R, Julia

HTTP handler detection: Python (FastAPI/Flask/Django), TypeScript (NestJS), Java (Spring/JAX-RS), Go (stdlib/Gin/Echo/Fiber), C# (ASP.NET), Ruby (Rails), PHP (Laravel/Symfony).

Architecture

MCP Client (Claude, Cursor, ...)        VS Code Extension
        |                                       |
    MCP (stdio)                            LSP Protocol
        |                                       |
        └───────────┐               ┌───────────┘
                    ▼               ▼
            ┌─────────────────────────────┐
            │       codegraph-server      │
            ├─────────────────────────────┤
            │  38 tree-sitter parsers     │
            │  Semantic graph engine      │
            │  AI query engine (BM25)     │
            │  Memory layer (RocksDB)     │
            │  Docs store (RocksDB+HNSW)  │
            │  Full-body embeddings (BGE) │
            │  HNSW vector index          │
            └─────────────────────────────┘

A single Rust binary serves both MCP and LSP protocols.

Indexing: ~60 files/sec. Incremental re-indexing on file changes via FNV-1a content hashing.
Persistence: Graph and embeddings persist to ~/.codegraph/graph.db (RocksDB). Instant startup on restart — no re-parsing, no re-embedding.
Queries: Sub-100ms. Cross-file import and call resolution at index time.
Embeddings: Full-body (function bodies captured at parse time, zero disk I/O). Vectors stored in RocksDB alongside the graph. Auto-downloads model on first run.
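The incremental re-indexing idea can be sketched with a plain FNV-1a hash: store one hash per file and re-parse only files whose hash changed. The 64-bit variant and the changed_files helper below are assumptions for illustration; the README does not say which width or bookkeeping the server uses.

```python
FNV64_OFFSET_BASIS = 0xCBF29CE484222325
FNV64_PRIME = 0x100000001B3

def fnv1a_64(data: bytes) -> int:
    """64-bit FNV-1a: XOR each byte into the hash, then multiply by the prime."""
    value = FNV64_OFFSET_BASIS
    for byte in data:
        value ^= byte
        value = (value * FNV64_PRIME) & 0xFFFFFFFFFFFFFFFF
    return value

def changed_files(previous_hashes: dict, current_contents: dict) -> list:
    """Return paths whose content hash differs from the stored one (or is new)."""
    return sorted(
        path for path, content in current_contents.items()
        if previous_hashes.get(path) != fnv1a_64(content)
    )

# Hypothetical workspace state: one file unchanged, one edited, one new.
previous = {"a.rs": fnv1a_64(b"fn a() {}"), "b.rs": fnv1a_64(b"fn b() {}")}
current = {"a.rs": b"fn a() {}", "b.rs": b"fn b() { todo!() }", "c.rs": b"fn c() {}"}
print(changed_files(previous, current))  # ['b.rs', 'c.rs']
```

FNV-1a is fast and non-cryptographic, which suits change detection; it is not meant to resist deliberately crafted collisions.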

Building from Source

git clone https://github.com/codegraph-ai/codegraph
cd codegraph
cargo build --release -p codegraph-server    # Rust server
cd vscode && npm install && npm run esbuild  # VS Code extension
npx @vscode/vsce package                     # VSIX

Requires Rust stable, Node.js 18+, VS Code 1.90+.

Support the project

CodeGraph is free, open-source, and maintained by a solo developer. If it saves you time, consider sponsoring on GitHub — it helps keep the project alive and growing.

License

Apache-2.0
