Skip to content

Navigation Menu

Sign in
Appearance settings

Search code, repositories, users, issues, pull requests...

Provide feedback

We read every piece of feedback, and take your input very seriously.

Saved searches

Use saved searches to filter your results more quickly

Appearance settings

read_geotiff_gpu: warn on CPU fallback, add gpu='strict' (#1516)#1523

Merged
brendancol merged 3 commits intoxarray-contrib:mainxarray-contrib/xarray-spatial:mainfrom
brendancol:fix/1516-gpu-strict-fallbackbrendancol/xarray-spatial:fix/1516-gpu-strict-fallbackCopy head branch name to clipboard
May 8, 2026
Merged

read_geotiff_gpu: warn on CPU fallback, add gpu='strict' (#1516)#1523
brendancol merged 3 commits intoxarray-contrib:mainxarray-contrib/xarray-spatial:mainfrom
brendancol:fix/1516-gpu-strict-fallbackbrendancol/xarray-spatial:fix/1516-gpu-strict-fallbackCopy head branch name to clipboard

Conversation

@brendancol
Copy link
Copy Markdown
Contributor

Summary

read_geotiff_gpu wrapped the GPU decode call in try/except Exception: pass, so any failure on the GPU path was silently absorbed and execution fell through to the CPU path. The user-visible result stayed correct, which is why issue #1508 (an AttributeError from cupy.ndarray.byteswap going away) was invisible until tracked down by hand.

This PR keeps the fallback but makes failures observable.

Changes

  • New gpu keyword on read_geotiff_gpu:
    • gpu='auto' (default): existing fallback behaviour, plus a RuntimeWarning that names the original exception type and message. stacklevel=2 so the warning points at the caller.
    • gpu='strict': re-raise the original exception instead of falling back. Useful for tests and CI that exercise the GPU fast path.
    • Unknown values raise ValueError early.
  • The except clause still catches Exception rather than a narrower set. CUDA failures arrive as many distinct types (numba.cuda.cudadrv.driver.CudaAPIError, cupy.cuda.runtime.CUDARuntimeError, plain RuntimeError, ...). BaseException is intentionally not caught so KeyboardInterrupt and SystemExit propagate.
  • Docstring updated to describe both modes.

Tests

xrspatial/geotiff/tests/test_gpu_strict_fallback_1516.py monkeypatches gpu_decode_tiles_from_file to raise a synthetic RuntimeError and asserts:

  • gpu='auto' emits a RuntimeWarning and returns the CPU result.
  • gpu='strict' re-raises.
  • gpu='loose' (or any other value) raises ValueError.

Tests do not require a real GPU. A numpy-backed cupy stub is installed when the real package is unavailable, so they run on CPU-only CI.

Test plan

  • pytest xrspatial/geotiff/tests/test_gpu_strict_fallback_1516.py -x -q -> 3 passed
  • pytest xrspatial/geotiff/tests/test_gpu_byteswap_1508.py -q -> 7 passed (no regression on adjacent tests)
  • Full xrspatial/geotiff/tests/ run shows no new failures (3 pre-existing palette-plot failures are unrelated)

Closes #1516

…rib#1516)

The GPU decode path was wrapped in a too-broad ``try/except Exception:
pass`` that silently fell back to CPU on any error. Real GPU bugs (xarray-contrib#1508
was an AttributeError) lived undetected because the user-visible result
stayed correct.

Behaviour now:

- Default ``gpu='auto'`` keeps the CPU fallback but emits a
  ``RuntimeWarning`` that includes the original exception type and
  message. Stacklevel=2 so the warning points at the caller.
- New ``gpu='strict'`` re-raises the original exception instead of
  falling back. Useful for tests and CI that exercise the GPU fast
  path.
- Unknown ``gpu=`` values raise ``ValueError`` early.

Catch is still ``Exception`` rather than a narrower set: CUDA failures
arrive as many distinct types (``cuda.RuntimeError``,
``CudaAPIError``, ``CUDARuntimeError``, plain ``RuntimeError``, ...).
``BaseException`` is intentionally not caught so KeyboardInterrupt and
SystemExit propagate.

Test ``xrspatial/geotiff/tests/test_gpu_strict_fallback_1516.py``
monkeypatches ``gpu_decode_tiles_from_file`` to raise a synthetic
``RuntimeError`` and asserts the warn-and-fallback and strict-reraise
paths. Tests do not need a real GPU; a ``cupy`` shim is installed when
the real package is unavailable.

Closes xarray-contrib#1516
@github-actions github-actions Bot added the performance PR touches performance-sensitive code label May 8, 2026
@brendancol brendancol requested a review from Copilot May 8, 2026 16:17
Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR makes failures in read_geotiff_gpu’s GPU fast path observable instead of being silently swallowed, while also adding a “strict” mode to force GPU errors to surface (useful for CI/tests that want to exercise the GPU pipeline).

Changes:

  • Add a gpu={'auto','strict'} keyword to read_geotiff_gpu to control CPU fallback vs re-raising GPU exceptions.
  • Emit a RuntimeWarning (with caller-facing stacklevel=2) when the GPU decode fails in gpu='auto' mode.
  • Add regression tests that monkeypatch the GPU decode to raise and assert warning/raise/ValueError behavior.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 3 comments.

File Description
xrspatial/geotiff/__init__.py Adds gpu mode selection, warning on fallback, and validation of gpu kwarg in read_geotiff_gpu.
xrspatial/geotiff/tests/test_gpu_strict_fallback_1516.py Adds tests covering gpu='auto' warning+fallback, gpu='strict' re-raise, and invalid kwarg rejection (with a CuPy stub for CPU-only runs).

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines +1557 to +1566
except Exception as e:
if gpu == 'strict':
raise
warnings.warn(
f"read_geotiff_gpu: GPU decode failed "
f"({type(e).__name__}: {e}); falling back to CPU.",
RuntimeWarning,
stacklevel=2,
)
arr_gpu = None
Comment thread xrspatial/geotiff/__init__.py Outdated
Comment thread xrspatial/geotiff/tests/test_gpu_strict_fallback_1516.py Outdated
brendancol added 2 commits May 8, 2026 10:04
- Plumb gpu='strict'/'auto' through the second GPU decode stage too.
  Previously only gpu_decode_tiles_from_file failures honored the mode;
  the gpu_decode_tiles call after CPU-mmap tile extraction silently
  fell back regardless. Strict now raises and auto warns from either
  stage.
- Update the gpu kwarg docstring to describe the actual two-stage
  fallback (GDS GPU decode -> CPU-mmap + GPU decode -> full CPU + transfer).
- Strengthen _ensure_cupy_stub: also stub when cupy is installed but
  CUDA is not available, and restore the original module after each test.
- Add tests covering the second-stage strict re-raise and second-stage
  warning behaviour.
- Document that stripped layouts and sparse-tile files route directly
  to the CPU reader before any GPU decode stage runs, so the gpu kwarg
  does not affect them. cupy.asarray upload failures in those paths
  propagate unchanged.
- Tighten test_default_mode_warns_on_second_stage_failure to assert
  exactly two RuntimeWarning records (one per stage). The previous
  pytest.warns context only verified that >=1 matching warning fired,
  so a regression where one of the two handlers stopped warning would
  not have been caught.
@brendancol brendancol merged commit 6147d0a into xarray-contrib:main May 8, 2026
11 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

performance PR touches performance-sensitive code

Projects

None yet

Development

Successfully merging this pull request may close these issues.

read_geotiff_gpu: silent CPU fallback hides GPU bugs

2 participants

Morty Proxy This is a proxified and sanitized view of the page, visit original site.