read_geotiff_gpu: warn on CPU fallback, add gpu='strict' (#1516)#1523
Merged
brendancol merged 3 commits intoxarray-contrib:mainxarray-contrib/xarray-spatial:mainfrom May 8, 2026
brendancol:fix/1516-gpu-strict-fallbackbrendancol/xarray-spatial:fix/1516-gpu-strict-fallbackCopy head branch name to clipboard
Merged
read_geotiff_gpu: warn on CPU fallback, add gpu='strict' (#1516)#1523brendancol merged 3 commits intoxarray-contrib:mainxarray-contrib/xarray-spatial:mainfrom brendancol:fix/1516-gpu-strict-fallbackbrendancol/xarray-spatial:fix/1516-gpu-strict-fallbackCopy head branch name to clipboard
brendancol merged 3 commits intoxarray-contrib:mainxarray-contrib/xarray-spatial:mainfrom
brendancol:fix/1516-gpu-strict-fallbackbrendancol/xarray-spatial:fix/1516-gpu-strict-fallbackCopy head branch name to clipboard
Conversation
…rib#1516) The GPU decode path was wrapped in a too-broad ``try/except Exception: pass`` that silently fell back to CPU on any error. Real GPU bugs (xarray-contrib#1508 was an AttributeError) lived undetected because the user-visible result stayed correct. Behaviour now: - Default ``gpu='auto'`` keeps the CPU fallback but emits a ``RuntimeWarning`` that includes the original exception type and message. Stacklevel=2 so the warning points at the caller. - New ``gpu='strict'`` re-raises the original exception instead of falling back. Useful for tests and CI that exercise the GPU fast path. - Unknown ``gpu=`` values raise ``ValueError`` early. Catch is still ``Exception`` rather than a narrower set: CUDA failures arrive as many distinct types (``cuda.RuntimeError``, ``CudaAPIError``, ``CUDARuntimeError``, plain ``RuntimeError``, ...). ``BaseException`` is intentionally not caught so KeyboardInterrupt and SystemExit propagate. Test ``xrspatial/geotiff/tests/test_gpu_strict_fallback_1516.py`` monkeypatches ``gpu_decode_tiles_from_file`` to raise a synthetic ``RuntimeError`` and asserts the warn-and-fallback and strict-reraise paths. Tests do not need a real GPU; a ``cupy`` shim is installed when the real package is unavailable. Closes xarray-contrib#1516
There was a problem hiding this comment.
Pull request overview
This PR makes failures in read_geotiff_gpu’s GPU fast path observable instead of being silently swallowed, while also adding a “strict” mode to force GPU errors to surface (useful for CI/tests that want to exercise the GPU pipeline).
Changes:
- Add a
gpu={'auto','strict'}keyword toread_geotiff_gputo control CPU fallback vs re-raising GPU exceptions. - Emit a
RuntimeWarning(with caller-facingstacklevel=2) when the GPU decode fails ingpu='auto'mode. - Add regression tests that monkeypatch the GPU decode to raise and assert warning/raise/ValueError behavior.
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
xrspatial/geotiff/__init__.py |
Adds gpu mode selection, warning on fallback, and validation of gpu kwarg in read_geotiff_gpu. |
xrspatial/geotiff/tests/test_gpu_strict_fallback_1516.py |
Adds tests covering gpu='auto' warning+fallback, gpu='strict' re-raise, and invalid kwarg rejection (with a CuPy stub for CPU-only runs). |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
Comment on lines
+1557
to
+1566
| except Exception as e: | ||
| if gpu == 'strict': | ||
| raise | ||
| warnings.warn( | ||
| f"read_geotiff_gpu: GPU decode failed " | ||
| f"({type(e).__name__}: {e}); falling back to CPU.", | ||
| RuntimeWarning, | ||
| stacklevel=2, | ||
| ) | ||
| arr_gpu = None |
- Plumb gpu='strict'/'auto' through the second GPU decode stage too. Previously only gpu_decode_tiles_from_file failures honored the mode; the gpu_decode_tiles call after CPU-mmap tile extraction silently fell back regardless. Strict now raises and auto warns from either stage. - Update the gpu kwarg docstring to describe the actual two-stage fallback (GDS GPU decode -> CPU-mmap + GPU decode -> full CPU + transfer). - Strengthen _ensure_cupy_stub: also stub when cupy is installed but CUDA is not available, and restore the original module after each test. - Add tests covering the second-stage strict re-raise and second-stage warning behaviour.
- Document that stripped layouts and sparse-tile files route directly to the CPU reader before any GPU decode stage runs, so the gpu kwarg does not affect them. cupy.asarray upload failures in those paths propagate unchanged. - Tighten test_default_mode_warns_on_second_stage_failure to assert exactly two RuntimeWarning records (one per stage). The previous pytest.warns context only verified that >=1 matching warning fired, so a regression where one of the two handlers stopped warning would not have been caught.
3 tasks
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
read_geotiff_gpuwrapped the GPU decode call intry/except Exception: pass, so any failure on the GPU path was silently absorbed and execution fell through to the CPU path. The user-visible result stayed correct, which is why issue #1508 (anAttributeErrorfromcupy.ndarray.byteswapgoing away) was invisible until tracked down by hand.This PR keeps the fallback but makes failures observable.
Changes
gpukeyword onread_geotiff_gpu:gpu='auto'(default): existing fallback behaviour, plus aRuntimeWarningthat names the original exception type and message.stacklevel=2so the warning points at the caller.gpu='strict': re-raise the original exception instead of falling back. Useful for tests and CI that exercise the GPU fast path.ValueErrorearly.exceptclause still catchesExceptionrather than a narrower set. CUDA failures arrive as many distinct types (numba.cuda.cudadrv.driver.CudaAPIError,cupy.cuda.runtime.CUDARuntimeError, plainRuntimeError, ...).BaseExceptionis intentionally not caught soKeyboardInterruptandSystemExitpropagate.Tests
xrspatial/geotiff/tests/test_gpu_strict_fallback_1516.pymonkeypatchesgpu_decode_tiles_from_fileto raise a syntheticRuntimeErrorand asserts:gpu='auto'emits aRuntimeWarningand returns the CPU result.gpu='strict're-raises.gpu='loose'(or any other value) raisesValueError.Tests do not require a real GPU. A numpy-backed
cupystub is installed when the real package is unavailable, so they run on CPU-only CI.Test plan
pytest xrspatial/geotiff/tests/test_gpu_strict_fallback_1516.py -x -q-> 3 passedpytest xrspatial/geotiff/tests/test_gpu_byteswap_1508.py -q-> 7 passed (no regression on adjacent tests)xrspatial/geotiff/tests/run shows no new failures (3 pre-existing palette-plot failures are unrelated)Closes #1516