fix(kernel): cursor-state tracking + metadata parity for use_kernel#838
Open
vikrantpuppala wants to merge 1 commit into
maindatabricks/databricks-sql-python:mainfrom
feat/use-kernel-cuj-batch3-fixesdatabricks/databricks-sql-python:feat/use-kernel-cuj-batch3-fixesCopy head branch name to clipboard
Open
fix(kernel): cursor-state tracking + metadata parity for use_kernel#838vikrantpuppala wants to merge 1 commit intomaindatabricks/databricks-sql-python:mainfrom feat/use-kernel-cuj-batch3-fixesdatabricks/databricks-sql-python:feat/use-kernel-cuj-batch3-fixesCopy head branch name to clipboard
vikrantpuppala wants to merge 1 commit into
maindatabricks/databricks-sql-python:mainfrom
feat/use-kernel-cuj-batch3-fixesdatabricks/databricks-sql-python:feat/use-kernel-cuj-batch3-fixesCopy head branch name to clipboard
Conversation
Brings the use_kernel=True path closer to Thrift parity on the
cursor-state-tracking and metadata clusters from the API+CUJ gap
audit, and surfaces DML rowcount.
Cursor-state tracking (T7):
- Set cursor.active_command_id in the _make_result_set chokepoint so
every result-producing path (sync execute, async fetch, AND metadata)
leaves the cursor pointing at the command that produced the current
result set, matching Thrift's unconditional set in
_handle_execute_response. Previously metadata calls left it stale.
- On a failed sync execute, publish the server-issued statement id
(read from the canceller's inflight slot) onto active_command_id
before re-raising, so the cursor can correlate the FAILED query.
Best-effort; never masks the original failure; left untouched on a
pre-id transport failure.
- get_execution_result no longer eagerly closes/drops the async handle
after the first await_result. The kernel's await_result is idempotent
and re-callable, so the handle stays tracked until close_command /
close_session (Thrift parity: re-fetchable until explicit close). The
prior eager close made a second call raise.
- _closed_commands is now a bounded FIFO (OrderedDict capped at 10_000
via _record_closed) so it can't grow unbounded on a long-lived
session.
Metadata:
- Normalize a wildcard/blank catalog ('%' / '*' / '' / None) to None
(all-catalogs) across get_schemas/get_tables/get_columns via
_catalog_or_none, making the three symmetric (fixes columns, which
previously treated catalog as an exact identifier).
- Normalize empty/whitespace-only pattern args to None (match-all) via
_none_if_blank, mapping the kernel's InvalidArgument-on-"" to Thrift's
effective match-all. % / * stay as real LIKE wildcards on patterns.
- Drop the connector-side table_types drain + client-side refilter in
get_tables; the kernel now filters table_types itself
(case-insensitively as of the batch-3 kernel change), so we forward it
and let the kernel do the work — restores streaming, removes the
duplicated case-sensitive filter. Removes the now-dead
_drain_kernel_handle / _StaticArrowHandle helpers and unused imports.
- Fix the Cursor.columns() docstring: catalog_name=None is accepted on
all backends (kernel issues SHOW COLUMNS across all catalogs), not
rejected with ProgrammingError.
DML rowcount:
- Surface the kernel's num_modified_rows as cursor.rowcount for DML
(INSERT/UPDATE/DELETE/MERGE) instead of the hardcoded -1. None (SELECT,
or warehouses that don't report it) leaves rowcount at -1, matching the
Thrift backend; getattr-guarded against an older kernel wheel.
Bumps KERNEL_REV to the batch-3 kernel commit (case-insensitive
table_types, HTTP error-envelope parsing, num_modified_rows pyo3 getter).
Co-authored-by: Isaac
Signed-off-by: Vikrant Puppala <vikrant.puppala@databricks.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
What
Brings the
use_kernel=Truepath closer to Thrift parity on the cursor-state-tracking (T7) and metadata clusters from the API+CUJ gap audit, and surfaces DML rowcount as an enhancement. This is batch 3 of the use_kernel parity work (follows #824 / #830).Cursor-state tracking (T7)
active_command_idstale. Set it in the single_make_result_setchokepoint so every result-producing path — sync execute, async fetch, AND metadata — leaves the cursor pointing at the command that produced the current result set (matches Thrift's unconditional set in_handle_execute_response).active_command_idbefore re-raising, so a failed query is still correlatable (telemetry / log lookup). Best-effort; never masks the original error; untouched on a pre-id transport failure.get_async_execution_result()is re-callable again. Removed the eager close/drop of the async handle after the firstawait_result()— the kernel'sawait_result()is idempotent, so the handle stays tracked untilclose_command/close_session(Thrift parity). The prior eager close made a second call raise._closed_commandsis bounded. Now anOrderedDictFIFO capped at 10,000 (_record_closed) instead of an unbounded set.Metadata
'%'/'*'/''/Nonenormalize toNoneacrossget_schemas/get_tables/get_columns(_catalog_or_none), making the three symmetric (fixescolumns, which treated catalog as an exact identifier). JDBC exact-or-all semantics.None(_none_if_blank), mapping the kernel'sInvalidArgument-on-""to Thrift's effective match-all.%/*stay as real LIKE wildcards on patterns.table_typesconnector drain. The kernel now filterstable_typesitself (case-insensitively, kernel Bump version to 2.6.0 #139), soget_tablesforwards it and returns the stream unchanged — restores streaming and removes the duplicated case-sensitive client-side filter. Deleted the now-dead_drain_kernel_handle/_StaticArrowHandlehelpers + unused imports.Cursor.columns()docstring —catalog_name=Noneis accepted on all backends (not rejected withProgrammingError).DML rowcount (enhancement)
num_modified_rowsascursor.rowcountfor DML instead of the hardcoded-1.None(SELECT, or warehouses that don't report it) leaves it at-1, matching Thrift. NB this makes the kernel path exceed Thrift, which also hardcodes-1.getattr-guarded against an older wheel.Testing
tests/unit/test_kernel_client.py— added/updated tests for A–D (active_command_id chokepoint + failure-path + bounded_closed_commands+ re-callableget_execution_result), metadata normalization (wildcard catalog, blank pattern, kernel-sidetable_typesforwarding), andnum_modified_rows→rowcount. 115 pass.tests/e2e/test_kernel_backend.py, live dogfood): empty-string filter match-all; case-insensitivetable_types(self-contained create-table+view, lowercaseview/ uppercaseTABLE);active_command_idpublished after a metadata call; DML rowcount wiring doesn't break DML. 4 pass in ~20s.black(pinned 22.3.0) clean.