Skip to content

Navigation Menu

Sign in
Appearance settings

Search code, repositories, users, issues, pull requests...

Provide feedback

We read every piece of feedback, and take your input very seriously.

Saved searches

Use saved searches to filter your results more quickly

Appearance settings

fix(kernel): cursor-state tracking + metadata parity for use_kernel#838

Open
vikrantpuppala wants to merge 1 commit into
maindatabricks/databricks-sql-python:mainfrom
feat/use-kernel-cuj-batch3-fixesdatabricks/databricks-sql-python:feat/use-kernel-cuj-batch3-fixesCopy head branch name to clipboard
Open

fix(kernel): cursor-state tracking + metadata parity for use_kernel#838
vikrantpuppala wants to merge 1 commit into
maindatabricks/databricks-sql-python:mainfrom
feat/use-kernel-cuj-batch3-fixesdatabricks/databricks-sql-python:feat/use-kernel-cuj-batch3-fixesCopy head branch name to clipboard

Conversation

@vikrantpuppala
Copy link
Copy Markdown
Contributor

What

Brings the use_kernel=True path closer to Thrift parity on the cursor-state-tracking (T7) and metadata clusters from the API+CUJ gap audit, and surfaces DML rowcount as an enhancement. This is batch 3 of the use_kernel parity work (follows #824 / #830).

Depends on kernel PR databricks/databricks-sql-kernel#139 (case-insensitive table_types, HTTP error-envelope parsing, num_modified_rows pyo3 getter). KERNEL_REV here points at that PR's branch HEAD; it will be re-pinned to the squash SHA before this PR merges.

Cursor-state tracking (T7)

  • Metadata no longer leaves active_command_id stale. Set it in the single _make_result_set chokepoint so every result-producing path — sync execute, async fetch, AND metadata — leaves the cursor pointing at the command that produced the current result set (matches Thrift's unconditional set in _handle_execute_response).
  • Failed sync execute publishes the statement id. Read the server-issued id from the canceller's inflight slot and set active_command_id before re-raising, so a failed query is still correlatable (telemetry / log lookup). Best-effort; never masks the original error; untouched on a pre-id transport failure.
  • get_async_execution_result() is re-callable again. Removed the eager close/drop of the async handle after the first await_result() — the kernel's await_result() is idempotent, so the handle stays tracked until close_command/close_session (Thrift parity). The prior eager close made a second call raise.
  • _closed_commands is bounded. Now an OrderedDict FIFO capped at 10,000 (_record_closed) instead of an unbounded set.

Metadata

  • Wildcard/blank catalog → all-catalogs. '%' / '*' / '' / None normalize to None across get_schemas/get_tables/get_columns (_catalog_or_none), making the three symmetric (fixes columns, which treated catalog as an exact identifier). JDBC exact-or-all semantics.
  • Empty-string filter → match-all. Whitespace-only pattern args normalize to None (_none_if_blank), mapping the kernel's InvalidArgument-on-"" to Thrift's effective match-all. %/* stay as real LIKE wildcards on patterns.
  • Dropped the table_types connector drain. The kernel now filters table_types itself (case-insensitively, kernel Bump version to 2.6.0 #139), so get_tables forwards it and returns the stream unchanged — restores streaming and removes the duplicated case-sensitive client-side filter. Deleted the now-dead _drain_kernel_handle / _StaticArrowHandle helpers + unused imports.
  • Fixed the Cursor.columns() docstringcatalog_name=None is accepted on all backends (not rejected with ProgrammingError).

DML rowcount (enhancement)

  • Surface the kernel's num_modified_rows as cursor.rowcount for DML instead of the hardcoded -1. None (SELECT, or warehouses that don't report it) leaves it at -1, matching Thrift. NB this makes the kernel path exceed Thrift, which also hardcodes -1. getattr-guarded against an older wheel.

Testing

  • Unit: tests/unit/test_kernel_client.py — added/updated tests for A–D (active_command_id chokepoint + failure-path + bounded _closed_commands + re-callable get_execution_result), metadata normalization (wildcard catalog, blank pattern, kernel-side table_types forwarding), and num_modified_rowsrowcount. 115 pass.
  • E2E (tests/e2e/test_kernel_backend.py, live dogfood): empty-string filter match-all; case-insensitive table_types (self-contained create-table+view, lowercase view / uppercase TABLE); active_command_id published after a metadata call; DML rowcount wiring doesn't break DML. 4 pass in ~20s.
  • black (pinned 22.3.0) clean.

Brings the use_kernel=True path closer to Thrift parity on the
cursor-state-tracking and metadata clusters from the API+CUJ gap
audit, and surfaces DML rowcount.

Cursor-state tracking (T7):
- Set cursor.active_command_id in the _make_result_set chokepoint so
  every result-producing path (sync execute, async fetch, AND metadata)
  leaves the cursor pointing at the command that produced the current
  result set, matching Thrift's unconditional set in
  _handle_execute_response. Previously metadata calls left it stale.
- On a failed sync execute, publish the server-issued statement id
  (read from the canceller's inflight slot) onto active_command_id
  before re-raising, so the cursor can correlate the FAILED query.
  Best-effort; never masks the original failure; left untouched on a
  pre-id transport failure.
- get_execution_result no longer eagerly closes/drops the async handle
  after the first await_result. The kernel's await_result is idempotent
  and re-callable, so the handle stays tracked until close_command /
  close_session (Thrift parity: re-fetchable until explicit close). The
  prior eager close made a second call raise.
- _closed_commands is now a bounded FIFO (OrderedDict capped at 10_000
  via _record_closed) so it can't grow unbounded on a long-lived
  session.

Metadata:
- Normalize a wildcard/blank catalog ('%' / '*' / '' / None) to None
  (all-catalogs) across get_schemas/get_tables/get_columns via
  _catalog_or_none, making the three symmetric (fixes columns, which
  previously treated catalog as an exact identifier).
- Normalize empty/whitespace-only pattern args to None (match-all) via
  _none_if_blank, mapping the kernel's InvalidArgument-on-"" to Thrift's
  effective match-all. % / * stay as real LIKE wildcards on patterns.
- Drop the connector-side table_types drain + client-side refilter in
  get_tables; the kernel now filters table_types itself
  (case-insensitively as of the batch-3 kernel change), so we forward it
  and let the kernel do the work — restores streaming, removes the
  duplicated case-sensitive filter. Removes the now-dead
  _drain_kernel_handle / _StaticArrowHandle helpers and unused imports.
- Fix the Cursor.columns() docstring: catalog_name=None is accepted on
  all backends (kernel issues SHOW COLUMNS across all catalogs), not
  rejected with ProgrammingError.

DML rowcount:
- Surface the kernel's num_modified_rows as cursor.rowcount for DML
  (INSERT/UPDATE/DELETE/MERGE) instead of the hardcoded -1. None (SELECT,
  or warehouses that don't report it) leaves rowcount at -1, matching the
  Thrift backend; getattr-guarded against an older kernel wheel.

Bumps KERNEL_REV to the batch-3 kernel commit (case-insensitive
table_types, HTTP error-envelope parsing, num_modified_rows pyo3 getter).

Co-authored-by: Isaac
Signed-off-by: Vikrant Puppala <vikrant.puppala@databricks.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant

Morty Proxy This is a proxified and sanitized view of the page, visit original site.