Add query provenance tracking to DSL v2 and legacy DSL #988

Merged
SkBlaz merged 3 commits into master (SkBlaz/py3plex:master) from copilot/add-provenance-to-queryresult (SkBlaz/py3plex:copilot/add-provenance-to-queryresult)
Jan 5, 2026

Copilot AI commented Jan 5, 2026

Query results now include execution provenance for reproducibility, debugging, and performance analysis. Every QueryResult (DSL v2) and result dict (legacy) contains meta["provenance"] with AST hash, timing breakdown, network fingerprint, and execution metadata.

Changes

Core Infrastructure (py3plex/dsl/provenance.py)

  • ProvenanceRecord dataclass with stable schema: engine, version, timestamp, network fingerprint, query AST hash, performance timings, backend info, warnings
  • Stable AST fingerprinting via canonical serialization (excludes object IDs, timestamps)
  • Network fingerprinting: node/edge/layer counts, layer names
  • ProvenanceBuilder for incremental construction during execution
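
The canonical-serialization idea behind the AST hash can be sketched as follows. This is a minimal illustration, not the code in py3plex/dsl/provenance.py: the `Select` and `Compute` node classes, the `canonicalize` helper, and the choice of SHA-256 truncated to 16 hex characters are all assumptions made for the example.

```python
import hashlib
import json
from dataclasses import dataclass, field, fields, is_dataclass
from typing import Any


@dataclass
class Compute:  # hypothetical AST node, not the py3plex class
    measures: list


@dataclass
class Select:  # hypothetical AST node, not the py3plex class
    target: str
    computes: list = field(default_factory=list)
    limit: Any = None


def canonicalize(node: Any) -> Any:
    """Reduce an AST to plain JSON-compatible data with a stable layout."""
    if is_dataclass(node) and not isinstance(node, type):
        out = {"__type__": type(node).__name__}
        for f in fields(node):
            out[f.name] = canonicalize(getattr(node, f.name))
        return out
    if isinstance(node, dict):
        return {str(k): canonicalize(v) for k, v in node.items()}
    if isinstance(node, (list, tuple)):
        return [canonicalize(v) for v in node]
    if isinstance(node, (str, int, float, bool)) or node is None:
        return node
    # Fall back to the type name only: repr() may embed object ids.
    return type(node).__name__


def ast_fingerprint(ast: Any, length: int = 16) -> str:
    """Hash the canonical form; sorted keys keep the payload order-independent."""
    payload = json.dumps(canonicalize(ast), sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()[:length]


a = Select("nodes", [Compute(["degree", "betweenness"])])
b = Select("nodes", [Compute(["degree", "betweenness"])])
c = Select("edges", [Compute(["degree"])])
print(ast_fingerprint(a) == ast_fingerprint(b))  # structurally equal queries match
print(ast_fingerprint(a) == ast_fingerprint(c))  # different queries should differ
```

Because only node types and plain values enter the payload, two separately constructed but structurally identical queries hash to the same value, which is the property the stability tests rely on.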

DSL v2 Instrumentation (py3plex/dsl/executor.py)

  • Stage-level timing in execute_ast() and _execute_select(): bind_parameters, get_items, filter_layers, filter_where, compute, group_aggregate, limit, materialize
  • Populates QueryResult.meta["provenance"] with complete metadata
  • ProvenanceBuilder passed through execution pipeline
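
Stage-level timing of this kind can be approximated with a small context manager. The sketch below is an assumption about the general shape, not the actual `ProvenanceBuilder` API: the class name `ProvenanceBuilderSketch` and its `stage()` and `build()` methods are hypothetical, and only the stage names and the `performance` / `total_ms` keys come from the description above.

```python
import time
from contextlib import contextmanager
from datetime import datetime, timezone


class ProvenanceBuilderSketch:
    """Hypothetical stand-in that accumulates per-stage timings in milliseconds."""

    def __init__(self, engine: str):
        self.engine = engine
        self._start = time.monotonic()
        self._timings = {}

    @contextmanager
    def stage(self, name: str):
        t0 = time.monotonic()
        try:
            yield
        finally:
            # Record the stage even if it raised, and sum repeated stages.
            elapsed_ms = (time.monotonic() - t0) * 1000.0
            self._timings[name] = self._timings.get(name, 0.0) + elapsed_ms

    def build(self) -> dict:
        performance = dict(self._timings)
        performance["total_ms"] = (time.monotonic() - self._start) * 1000.0
        return {
            "engine": self.engine,
            "timestamp_utc": datetime.now(timezone.utc).isoformat(),
            "performance": performance,
        }


builder = ProvenanceBuilderSketch("dsl_v2_executor")
with builder.stage("get_items"):
    items = list(range(1000))
with builder.stage("filter_where"):
    items = [i for i in items if i % 2 == 0]
with builder.stage("compute"):
    values = {i: i * i for i in items}
provenance = builder.build()
print(sorted(provenance["performance"]))
```

Using `time.monotonic()` keeps the overhead to two clock reads per stage and avoids negative durations if the wall clock is adjusted mid-query.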

Legacy DSL Instrumentation (py3plex/dsl_legacy.py)

  • Provenance tracking in execute_query() with same structure as v2
  • Tracks parse time and execution time separately
  • Stores raw query string and computed hash

Documentation (AGENTS.md)

  • Updated Result Dictionary Structure section with provenance schema
  • Example payloads and access patterns
  • Backward compatibility guarantees

Tests (tests/test_dsl_provenance.py)

  • 23 tests covering: provenance presence/structure, AST hash stability, timing validity, backward compatibility
  • Validates both DSL v2 and legacy implementations
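
The structural checks listed above could look roughly like the following. This is an illustrative sketch, not an excerpt from tests/test_dsl_provenance.py: `check_provenance` and `EXPECTED_TYPES` are hypothetical names, and the sample record mirrors the payload shown in this description, with placeholder layer names.

```python
# Top-level keys taken from the provenance payload in this PR description.
EXPECTED_TYPES = {
    "engine": str,
    "py3plex_version": str,
    "timestamp_utc": str,
    "network_fingerprint": dict,
    "query": dict,
    "backend": dict,
    "performance": dict,
}


def check_provenance(prov: dict) -> list:
    """Return a list of problems; an empty list means the record looks valid."""
    problems = []
    for key, expected_type in EXPECTED_TYPES.items():
        if key not in prov:
            problems.append(f"missing key: {key}")
        elif not isinstance(prov[key], expected_type):
            problems.append(f"wrong type for key: {key}")
    performance = prov.get("performance")
    if isinstance(performance, dict):
        for stage, ms in performance.items():
            # Timings must be real numbers and non-negative.
            if isinstance(ms, bool) or not isinstance(ms, (int, float)) or ms < 0:
                problems.append(f"invalid timing for stage: {stage}")
    return problems


sample = {
    "engine": "dsl_v2_executor",
    "py3plex_version": "1.1.0",
    "timestamp_utc": "2026-01-05T04:20:34.736044+00:00",
    "network_fingerprint": {
        "node_count": 100,
        "edge_count": 250,
        "layer_count": 3,
        "layers": ["layer_a", "layer_b", "layer_c"],  # placeholder names
    },
    "query": {"target": "nodes", "ast_hash": "5b2b7d9b3312d929", "params": {}},
    "backend": {"graph_backend": "networkx"},
    "performance": {"get_items": 2.5, "compute": 45.6, "total_ms": 53.5},
}
print(check_provenance(sample))
```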

Example Usage

from py3plex.dsl import Q

# Execute query
result = Q.nodes().compute("degree", "betweenness").execute(network)

# Access provenance
prov = result.meta["provenance"]
print(f"Query took {prov['performance']['total_ms']:.2f}ms")
print(f"AST hash: {prov['query']['ast_hash']}")
print(f"Network: {prov['network_fingerprint']['node_count']} nodes")

# Provenance structure
{
  "engine": "dsl_v2_executor",
  "py3plex_version": "1.1.0",
  "timestamp_utc": "2026-01-05T04:20:34.736044+00:00",
  "network_fingerprint": {"node_count": 100, "edge_count": 250, "layer_count": 3, "layers": [...]},
  "query": {"target": "nodes", "ast_hash": "5b2b7d9b3312d929", "ast_summary": "SELECT nodes COMPUTE ...", "params": {}},
  "backend": {"graph_backend": "networkx"},
  "performance": {"get_items": 2.5, "compute": 45.6, "total_ms": 53.5}
}

Backward Compatibility

All existing code works unchanged. QueryResult iteration, exports (to_pandas(), to_networkx()), and attribute access remain unaffected. Provenance is additive metadata.
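
Since provenance is additive, calling code that must also run against older results can read it defensively. The helper below is a sketch under assumptions: `provenance_of` is a hypothetical name, and `SimpleNamespace` objects stand in for `QueryResult` so the example runs without py3plex installed.

```python
import json
from types import SimpleNamespace


def provenance_of(result) -> dict:
    """Return the provenance dict, or {} when the result carries none."""
    meta = getattr(result, "meta", None)          # QueryResult-style object
    if meta is None and isinstance(result, dict):
        meta = result.get("meta")                 # legacy result dict
    return (meta or {}).get("provenance", {})


# Stand-ins for real results (assumed shapes, for illustration only).
new_result = SimpleNamespace(meta={"provenance": {"engine": "dsl_v2_executor"}})
old_result = SimpleNamespace(meta={})
legacy_result = {"meta": {"provenance": {"engine": "dsl_legacy"}}}

for r in (new_result, old_result, legacy_result):
    print(provenance_of(r).get("engine", "<no provenance>"))

# A provenance dict of plain values can be exported as JSON for logging.
exported = json.dumps(provenance_of(new_result), indent=2, sort_keys=True)
print(exported)
```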

Original prompt

This section details the original issue you should resolve

<issue_title>query provenance</issue_title>
<issue_description>

Add provenance + performance tracing to QueryResult in py3plex so every result can report how it was computed (AST, params, seeds, backend path, timings, cache hits), without breaking existing APIs.

Hard constraints

Preserve backward compatibility: existing code using QueryResult must keep working.

No new required parameters in user-facing APIs.

Store provenance in QueryResult.meta (or a compatible nested structure) with stable keys.

❌ Do NOT create new markdown files.

✅ Update AGENTS.md in place to reflect the new QueryResult.meta["provenance"] structure.


  1. Define a provenance schema (internal, stable)

Add an internal dataclass/TypedDict (e.g., ProvenanceRecord) with fields:

engine: "dsl_v2_executor" | "dsl_legacy" | "graph_ops" | "pipeline_step" | ...

py3plex_version

timestamp_utc (ISO8601)

network_fingerprint: lightweight (node_count, edge_count, layer_count, optional hash if available)

network_version: mutation counter if available; else None

query:

target: nodes/edges/communities/paths

ast_hash: stable hash of AST (or compiled plan)

ast_summary: short human-readable summary (no huge dumps)

params: normalized params (gamma, ci, n_samples, etc.)

randomness:

seed: base seed if any

child_seeds: optional list or seed-derivation descriptor (avoid huge arrays)

backend:

graph_backend: "networkx" (future-proof)

algo_backend: module/function identifiers used

fast_path: bool + reason

performance:

timings_ms: dict per stage (parse/compile/filter/compute/group/aggregate/format)

cache: hits/misses + keys used (redact large)

warnings: list of strings (e.g., “approximation used”, “fallback to serial”)

  2. Instrument DSL v2 execution pipeline

In DSL v2 executor:

start a timer at execute() entry

time stages: compile→filter→compute→group/agg→materialize

attach provenance into QueryResult.meta["provenance"]

Ensure zero/low overhead when tracing is minimal (simple time.monotonic calls only).

  3. Instrument legacy DSL execution

Add provenance fields:

engine="dsl_legacy"

query.raw_string

parse_time_ms, exec_time_ms

Same meta["provenance"] shape as v2 where possible.

  4. Instrument graph_ops + pipelines (best-effort)

graph_ops:

when materializing (to_pandas, to_subgraph), attach provenance if a QueryResult exists or create a minimal provenance meta object.

pipelines:

when steps produce a QueryResult-like artifact, propagate/merge provenance (append step info).

  5. Stable AST hash + summary

Implement ast_fingerprint(ast):

stable across process runs (avoid object ids)

include node types + normalized params only

Implement ast_summary(ast):

short: target, layer selection, predicates, computes, grouping/agg, limits

  6. Network fingerprint + optional mutation counter

If network already has versioning, use it.

Otherwise implement lightweight fingerprint:

counts + layer names (optional)

optionally a cheap hash of sorted layers and a sample of edges (avoid O(E log E) on huge graphs)
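
One way to read the "cheap hash" option is sketched below. It is an assumption, not the merged implementation (the PR summary lists only counts and layer names): `network_fingerprint` here is a hypothetical function over plain lists, and it uses `heapq.nsmallest` to pick a bounded, order-independent edge sample in O(E log k) rather than sorting every edge.

```python
import hashlib
import heapq
import json


def network_fingerprint(nodes, edges, layers, edge_sample_size=64) -> dict:
    """Counts plus a short hash of sorted layer names and a bounded edge sample."""
    layer_names = sorted(str(name) for name in layers)
    # Normalize each edge to a tuple of strings so edges are comparable.
    normalized = (tuple(str(part) for part in edge) for edge in edges)
    # The k smallest edges are the same regardless of iteration order.
    sample = heapq.nsmallest(edge_sample_size, normalized)
    payload = json.dumps({"layers": layer_names, "edges": sample},
                         separators=(",", ":"))
    return {
        "node_count": len(nodes),
        "edge_count": len(edges),
        "layer_count": len(layer_names),
        "layers": layer_names,
        "hash": hashlib.sha256(payload.encode("utf-8")).hexdigest()[:16],
    }


nodes = ["a", "b", "c"]
edges = [("a", "b", "social"), ("b", "c", "work")]
layers = ["work", "social"]

fp1 = network_fingerprint(nodes, edges, layers)
fp2 = network_fingerprint(nodes, list(reversed(edges)), list(reversed(layers)))
fp3 = network_fingerprint(nodes, [("a", "c", "social"), ("b", "c", "work")], layers)
print(fp1["hash"] == fp2["hash"])  # same network, different iteration order
print(fp1["hash"] == fp3["hash"])  # one edge changed
```

The sample only covers the lexicographically smallest edges, so a change outside that sample alters the hash only if it also changes a count; this trades sensitivity for bounded cost on large graphs.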

  7. Cache reporting hooks (if caches exist)

When metric/result caches are used:

record cache.hit, cache.miss, cache_keys (truncate)

  8. Tests (required)

Ensure QueryResult still serializes/behaves as before.

Add tests that:

meta["provenance"] exists after DSL v2 execute

keys are present and types are correct

timings are non-negative

AST hash is stable across identical queries

legacy DSL also populates provenance

  9. Documentation (AGENTS.md only)

Update AGENTS.md in place:

document QueryResult.meta["provenance"] keys and example payload

clarify it’s additive, backward compatible

❌ No new markdown files

Definition of done

Every query execution returns a QueryResult with meta["provenance"] populated (v2 + legacy).

Provenance is stable, lightweight, and does not break existing code.

Tests cover presence, stability, and basic correctness.</issue_description>

Comments on the Issue (you are @copilot in this section)



Copilot AI self-assigned this Jan 5, 2026
Copilot AI and others added 2 commits January 5, 2026 04:19
- Implement provenance schema in py3plex/dsl/provenance.py with ProvenanceRecord dataclass
- Add timing instrumentation to DSL v2 executor (execute_ast and _execute_select)
- Add provenance tracking to legacy DSL execute_query function
- All provenance stored in QueryResult.meta["provenance"] with stable schema
- Add comprehensive tests in tests/test_dsl_provenance.py (23 passing)
- Update AGENTS.md documentation with provenance structure and examples
- Provenance includes: engine, version, timestamp, network fingerprint, query hash, performance timings, backend info
- Fully backward compatible - existing code continues to work without changes

Co-authored-by: SkBlaz <10035780+SkBlaz@users.noreply.github.com>

- Create example_provenance_demo.py demonstrating query provenance features
- Shows DSL v2 and legacy DSL provenance tracking
- Demonstrates JSON export of provenance metadata
- Includes performance timing breakdown and network fingerprinting

Co-authored-by: SkBlaz <10035780+SkBlaz@users.noreply.github.com>
Copilot AI changed the title from "[WIP] Add provenance and performance tracing to QueryResult" to "Add query provenance tracking to DSL v2 and legacy DSL" Jan 5, 2026
Copilot AI requested a review from SkBlaz January 5, 2026 04:23
SkBlaz marked this pull request as ready for review January 5, 2026 15:52
SkBlaz merged commit db3a462 into master Jan 5, 2026
31 checks passed

Successfully merging this pull request may close these issues.

query provenance
