TreeSitter Integration #953

TheApeMachine
Sep 3, 2025

TreeSitter ideas for Crush

The thinking is that TreeSitter can help Crush build safer, smarter AI-assisted workflows.

Why TreeSitter here?

Structured code view: Instead of raw text, we operate on functions, types, imports, and statements.
Smaller, better context: Feed the LLM only the AST parts that matter, not whole files.
Safer edits: AST-aware searches/renames reduce accidental breakage.

What I am trying out already

Language loader: detect language by filename and parse with TreeSitter.
Query helper: run queries and return captures in document order.
Symbol extraction:
- Go: top-level function extraction
- JS/TS: basic function/arrow-function export extraction
Tools for the agent:
- symbols: list top‑level symbols for a file.
- impact: first‑pass change‑impact analysis. Confirms definition presence and finds references via ripgrep; returns buckets (definitions/imports/call_sites/test_files) with a rough “blast‑radius” score.
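
A rough sketch of how the `impact` buckets and score could fit together. The post does not say how the score is computed, so the weights, the Go-flavoured heuristics, and the function names `bucket_hits` and `blast_radius` are all assumptions made for illustration. The hits stand in for (path, line) pairs that a text search such as ripgrep might return.

```python
# Hypothetical weights per bucket; the original post does not specify any.
WEIGHTS = {"definitions": 0, "imports": 2, "call_sites": 3, "test_files": 1}


def bucket_hits(symbol, hits):
    """Sort (path, line_text) search hits into the four buckets.

    The checks are simple text heuristics for Go code, not AST queries.
    Hits that match none of the checks are dropped.
    """
    buckets = {name: [] for name in WEIGHTS}
    for path, text in hits:
        stripped = text.strip()
        if path.endswith("_test.go"):
            buckets["test_files"].append((path, text))
        elif stripped.startswith("func " + symbol + "("):
            buckets["definitions"].append((path, text))
        elif stripped.startswith("import "):
            buckets["imports"].append((path, text))
        elif symbol + "(" in stripped:
            buckets["call_sites"].append((path, text))
    return buckets


def blast_radius(buckets):
    """Weighted count of hits: more call sites means a riskier edit."""
    return sum(WEIGHTS[name] * len(items) for name, items in buckets.items())
```

Replacing the text heuristics with per-language queries would keep the same bucket shape while cutting false positives.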

Near‑term ideas (incremental, pragmatic)

Context packer tool
- Input: (path, symbol[, radius])
- Output: compact bundle with the symbol’s AST node, nearby helpers, import lines, and N nearest callers.
- Goal: reduce prompt size and model tunnel vision by giving the LLM the right slice of code.
Find‑refs / Go‑to‑def tools (AST‑aware)
- Per language queries for definitions and call sites.
- Reduce false positives vs regex and inform safer edits.
Preflight edit guardrails (agent middleware)
- Before any write/multiedit: call impact (+ context packer) and score blast‑radius.
- Gate large edits: split into stages or ask for approval if risk exceeds threshold.
Postflight verification
- Re‑run impact to ensure no new unresolved refs appeared.
- Build/lint and run tests for impacted packages/files.
- Auto‑revise or roll back last edit on regressions.
Better classification of references
- Distinguish imports, interface impls, method calls, and tests per language.
- Prioritize what the LLM should read/update first.
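
The preflight gate could be as small as a threshold check on the blast-radius score. This is a sketch only: the `Decision` names, the `preflight` function, and both threshold values are hypothetical and would need tuning against real edits.

```python
from enum import Enum


class Decision(Enum):
    APPLY = "apply"                # small edit, write it directly
    STAGE = "stage"                # split the edit into smaller steps
    ASK_APPROVAL = "ask_approval"  # stop and ask the user first


def preflight(score, stage_threshold=10, approval_threshold=25):
    """Map a blast-radius score to a gate decision.

    Both thresholds are illustrative assumptions, not values from the post.
    """
    if score >= approval_threshold:
        return Decision.ASK_APPROVAL
    if score >= stage_threshold:
        return Decision.STAGE
    return Decision.APPLY
```

Middleware would call this before any write or multiedit and only proceed on `Decision.APPLY`.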

Medium‑term improvements

AST‑aware refactors
- Rename symbols, change function signatures, or move functions across files using TreeSitter transforms, paired with existing edit tooling.
Semantic chunking for RAG
- Index code by AST units (functions/types/modules), not lines, to improve retrieval quality.
Structural diff summaries
- Summarize changes by AST (added function, changed params, removed branch) for PR descriptions and agent memory.
Test scaffolding
- Generate table‑driven test skeletons from exported units and public APIs.
Style/security queries
- Detect risky patterns (string‑concat SQL, unchecked errors, unsafe exec) and propose targeted fixes.
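
To make the semantic chunking idea concrete, here is a minimal sketch that splits a file into one chunk per top-level function or class. It uses Python's standard `ast` module as a stand-in for TreeSitter so that it runs without extra dependencies; a TreeSitter version would walk the parsed tree in the same way. The chunk fields and the name `chunk_by_top_level_units` are assumptions.

```python
import ast


def chunk_by_top_level_units(source):
    """Return one chunk per top-level function or class in Python source.

    Each chunk carries the name, node kind, line span, and exact source text,
    which is what an index would embed instead of fixed-size line windows.
    """
    chunks = []
    for node in ast.parse(source).body:
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
            chunks.append({
                "name": node.name,
                "kind": type(node).__name__,
                "start_line": node.lineno,
                "end_line": node.end_lineno,
                "text": ast.get_source_segment(source, node),
            })
    return chunks
```

Imports and other top-level statements are skipped here; a real indexer would probably attach them to each chunk as shared context.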

Longer‑term experiments

Cross‑language dependency mapping
- Relate backend endpoints to frontend callers and tests; navigate across boundaries during edits.
Prompt budget optimizer
- Convert code to minimal AST summaries (names, signatures, key literals) for quick overviews, expand only when needed.
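
A minimal AST summary might keep only names and signatures. The sketch below again uses Python's standard `ast` module in place of TreeSitter, and the output format and the name `summarize` are assumptions chosen for illustration.

```python
import ast


def summarize(source):
    """Return one short line per top-level function or class.

    Functions keep their parameter list; classes keep only method names.
    Bodies are dropped entirely, which is where the prompt savings come from.
    """
    func_types = (ast.FunctionDef, ast.AsyncFunctionDef)
    lines = []
    for node in ast.parse(source).body:
        if isinstance(node, func_types):
            keyword = "async def" if isinstance(node, ast.AsyncFunctionDef) else "def"
            lines.append(f"{keyword} {node.name}({ast.unparse(node.args)})")
        elif isinstance(node, ast.ClassDef):
            methods = [n.name for n in node.body if isinstance(n, func_types)]
            lines.append(f"class {node.name}: " + ", ".join(methods))
    return lines
```

An agent could show this overview first and request the full body of a single symbol only when it needs it.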

This would also work well in combination with GraphRAG. A while back I experimented with using TreeSitter to analyze our platform at work, which consists of multiple codebases in various languages, and built up a connected graph in Neo4j (though any graph database would work).

The benefit there is that relationships become first-class citizens in the context, and graph-based algorithms can be used for all kinds of things, such as finding the shortest path between two nodes, finding the most similar nodes to a given node, or even community detection.
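
As a small illustration of the shortest-path case, the sketch below runs a breadth-first search over an in-memory adjacency map. A graph database would provide this as a built-in query; the plain dictionary, the function name `shortest_path`, and the node names in the usage note are all hypothetical.

```python
from collections import deque


def shortest_path(graph, start, goal):
    """Breadth-first search over {node: [neighbors]}.

    Returns the list of nodes from start to goal with the fewest edges,
    or None when goal cannot be reached from start.
    """
    if start == goal:
        return [start]
    previous = {start: None}  # also serves as the visited set
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for neighbor in graph.get(node, ()):
            if neighbor in previous:
                continue
            previous[neighbor] = node
            if neighbor == goal:
                # Walk the predecessor links back to start.
                path = [goal]
                while previous[path[-1]] is not None:
                    path.append(previous[path[-1]])
                return path[::-1]
            queue.append(neighbor)
    return None
```

With edges such as `"web/CheckoutPage.tsx" -> "POST /orders" -> "orders.CreateOrder"`, the returned path shows how a frontend file reaches a backend function.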

Just some ideas; I'm not even too sure how viable or useful they are. I'll probably give some of these a try anyway.

meowgorithm
Sep 3, 2025
Maintainer

This is a great idea, and totally on our radar.

1 reply

TheApeMachine Sep 8, 2025
Author

If it helps, I have done some experimenting here: https://github.com/TheApeMachine/crush/tree/feature/treesitter-integration, which I will likely continue for a little while. I fear that I may not be doing everything correctly, or in line with your team's vision, so I am treating this as experimentation rather than a realistic implementation.

nickchomey
Oct 9, 2025

This would be fantastic. I don't know if this helps you at all, but RooCode and KiloCode (same thing) use tree-sitter for various things, including a codebase indexer that stores embeddings (generated via whatever embedding provider you've selected) in Qdrant. It works quite well.
