fix(kernel): cursor-state tracking + metadata parity for use_kernel by vikrantpuppala · Pull Request #838 · databricks/databricks-sql-python

vikrantpuppala · Jun 6, 2026

What

Brings the use_kernel=True path closer to Thrift parity on the cursor-state-tracking (T7) and metadata clusters from the API+CUJ gap audit, and surfaces DML rowcount as an enhancement. This is batch 3 of the use_kernel parity work (follows #824 / #830).

Depends on kernel PR databricks/databricks-sql-kernel#139 (case-insensitive table_types, HTTP error-envelope parsing, num_modified_rows pyo3 getter). KERNEL_REV here points at that PR's branch HEAD; it will be re-pinned to the squash SHA before this PR merges.

Cursor-state tracking (T7)

Metadata no longer leaves active_command_id stale. Set it in the single _make_result_set chokepoint so every result-producing path — sync execute, async fetch, AND metadata — leaves the cursor pointing at the command that produced the current result set (matches Thrift's unconditional set in _handle_execute_response).
Failed sync execute publishes the statement id. Read the server-issued id from the canceller's inflight slot and set active_command_id before re-raising, so a failed query is still correlatable (telemetry / log lookup). Best-effort; never masks the original error; untouched on a pre-id transport failure.
get_async_execution_result() is re-callable again. Removed the eager close/drop of the async handle after the first await_result() — the kernel's await_result() is idempotent, so the handle stays tracked until close_command/close_session (Thrift parity). The prior eager close made a second call raise.
_closed_commands is bounded. Now an OrderedDict FIFO capped at 10,000 (_record_closed) instead of an unbounded set.

Metadata

Wildcard/blank catalog → all-catalogs. '%' / '*' / '' / None normalize to None across get_schemas/get_tables/get_columns (_catalog_or_none), making the three symmetric (fixes columns, which treated catalog as an exact identifier). JDBC exact-or-all semantics.
Empty-string filter → match-all. Whitespace-only pattern args normalize to None (_none_if_blank), mapping the kernel's InvalidArgument-on-"" to Thrift's effective match-all. %/* stay as real LIKE wildcards on patterns.
Dropped the table_types connector drain. The kernel now filters table_types itself (case-insensitively, kernel Bump version to 2.6.0 #139), so get_tables forwards it and returns the stream unchanged — restores streaming and removes the duplicated case-sensitive client-side filter. Deleted the now-dead _drain_kernel_handle / _StaticArrowHandle helpers + unused imports.
Fixed the Cursor.columns() docstring — catalog_name=None is accepted on all backends (not rejected with ProgrammingError).

DML rowcount (enhancement)

Surface the kernel's num_modified_rows as cursor.rowcount for DML instead of the hardcoded -1. None (SELECT, or warehouses that don't report it) leaves it at -1, matching Thrift. NB this makes the kernel path exceed Thrift, which also hardcodes -1. getattr-guarded against an older wheel.

Testing

Unit: tests/unit/test_kernel_client.py — added/updated tests for A–D (active_command_id chokepoint + failure-path + bounded _closed_commands + re-callable get_execution_result), metadata normalization (wildcard catalog, blank pattern, kernel-side table_types forwarding), and num_modified_rows→rowcount. 115 pass.
E2E (tests/e2e/test_kernel_backend.py, live dogfood): empty-string filter match-all; case-insensitive table_types (self-contained create-table+view, lowercase view / uppercase TABLE); active_command_id published after a metadata call; DML rowcount wiring doesn't break DML. 4 pass in ~20s.
black (pinned 22.3.0) clean.

Brings the use_kernel=True path closer to Thrift parity on the cursor-state-tracking and metadata clusters from the API+CUJ gap audit, and surfaces DML rowcount. Cursor-state tracking (T7): - Set cursor.active_command_id in the _make_result_set chokepoint so every result-producing path (sync execute, async fetch, AND metadata) leaves the cursor pointing at the command that produced the current result set, matching Thrift's unconditional set in _handle_execute_response. Previously metadata calls left it stale. - On a failed sync execute, publish the server-issued statement id (read from the canceller's inflight slot) onto active_command_id before re-raising, so the cursor can correlate the FAILED query. Best-effort; never masks the original failure; left untouched on a pre-id transport failure. - get_execution_result no longer eagerly closes/drops the async handle after the first await_result. The kernel's await_result is idempotent and re-callable, so the handle stays tracked until close_command / close_session (Thrift parity: re-fetchable until explicit close). The prior eager close made a second call raise. - _closed_commands is now a bounded FIFO (OrderedDict capped at 10_000 via _record_closed) so it can't grow unbounded on a long-lived session. Metadata: - Normalize a wildcard/blank catalog ('%' / '*' / '' / None) to None (all-catalogs) across get_schemas/get_tables/get_columns via _catalog_or_none, making the three symmetric (fixes columns, which previously treated catalog as an exact identifier). - Normalize empty/whitespace-only pattern args to None (match-all) via _none_if_blank, mapping the kernel's InvalidArgument-on-"" to Thrift's effective match-all. % / * stay as real LIKE wildcards on patterns. - Drop the connector-side table_types drain + client-side refilter in get_tables; the kernel now filters table_types itself (case-insensitively as of the batch-3 kernel change), so we forward it and let the kernel do the work — restores streaming, removes the duplicated case-sensitive filter. Removes the now-dead _drain_kernel_handle / _StaticArrowHandle helpers and unused imports. - Fix the Cursor.columns() docstring: catalog_name=None is accepted on all backends (kernel issues SHOW COLUMNS across all catalogs), not rejected with ProgrammingError. DML rowcount: - Surface the kernel's num_modified_rows as cursor.rowcount for DML (INSERT/UPDATE/DELETE/MERGE) instead of the hardcoded -1. None (SELECT, or warehouses that don't report it) leaves rowcount at -1, matching the Thrift backend; getattr-guarded against an older kernel wheel. Bumps KERNEL_REV to the batch-3 kernel commit (case-insensitive table_types, HTTP error-envelope parsing, num_modified_rows pyo3 getter). Co-authored-by: Isaac Signed-off-by: Vikrant Puppala <vikrant.puppala@databricks.com>

vikrantpuppala temporarily deployed to azure-prod June 6, 2026 13:27 — with GitHub Actions Inactive

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix(kernel): cursor-state tracking + metadata parity for use_kernel#838

fix(kernel): cursor-state tracking + metadata parity for use_kernel#838
vikrantpuppala wants to merge 1 commit into
maindatabricks/databricks-sql-python:mainfrom
feat/use-kernel-cuj-batch3-fixesdatabricks/databricks-sql-python:feat/use-kernel-cuj-batch3-fixesCopy head branch name to clipboard

vikrantpuppala commented Jun 6, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Search code, repositories, users, issues, pull requests...

Conversation

vikrantpuppala commented Jun 6, 2026

What

Cursor-state tracking (T7)

Metadata

DML rowcount (enhancement)

Testing

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant