Skip to content

Navigation Menu

Sign in
Appearance settings

Search code, repositories, users, issues, pull requests...

Provide feedback

We read every piece of feedback, and take your input very seriously.

Saved searches

Use saved searches to filter your results more quickly

Appearance settings

feat(raster): view machinery for non-identity band views#813

Draft
james-willis wants to merge 6 commits intoapache:mainapache/sedona-db:mainfrom
james-willis:jw/nd-raster-viewsjames-willis/sedona-db:jw/nd-raster-viewsCopy head branch name to clipboard
Draft

feat(raster): view machinery for non-identity band views#813
james-willis wants to merge 6 commits intoapache:mainapache/sedona-db:mainfrom
james-willis:jw/nd-raster-viewsjames-willis/sedona-db:jw/nd-raster-viewsCopy head branch name to clipboard

Conversation

@james-willis
Copy link
Copy Markdown
Contributor

Summary

Reintroduces view machinery on top of #749 so callers can construct and read bands with non-identity view entries. #749 itself only writes the canonical identity (null view row) and rejects non-null views at read time; this PR re-enables the composition path that the slice/manipulation functions in #750 depend on.

Stack

What's in this PR

Single commit, three files:

  • rust/sedona-raster/src/traits.rs
    • validate_view() + 22 unit tests (length checks, axis range, negative step, broadcast / step=0, i64 overflow guards, etc.)
  • rust/sedona-raster/src/builder.rs
    • start_band_with_view() builder API
    • ~12 view-construction tests: slice, broadcast, transpose, negative-step, multi-band mixed views, Arrow IPC round-trip
  • rust/sedona-raster/src/array.rs
    • View → byte-stride composition in nd_buffer()
    • View-aware contiguous_data() Cow::Owned strided-copy slow path
    • Reader tests for explicit views (negative steps, OOB axis, length mismatch, malformed view rejection)

Behavior change vs #749

  • A band with a non-null view row now decodes via the composition path instead of being rejected as a read error. Identity-view (null view row) writer/reader behavior is unchanged.

Test plan

  • cargo build -p sedona-schema -p sedona-raster -p sedona-raster-gdal -p sedona-raster-functions -p sedona-testing
  • cargo test -p sedona-raster — 91 tests pass (includes the 22 validate_view tests, 12 builder view tests, 4 array view tests)
  • cargo clippy -p sedona-raster --all-targets -- -D warnings
  • cargo fmt --all --check

@github-actions github-actions Bot requested a review from paleolimbot May 5, 2026 00:21
@james-willis james-willis marked this pull request as draft May 5, 2026 16:58
@james-willis james-willis force-pushed the jw/nd-raster-views branch 2 times, most recently from 349c957 to 91966ed Compare May 5, 2026 18:14
Replaces apache#787's 2D-only band schema with the canonical N-D schema:
spatial_dims/spatial_shape at the raster level; bands carry dim_names,
source_shape, nullable view, outdb_uri, outdb_format, plus the
non-nullable data buffer. Removes nodata_value, storage_type,
outdb_url, and outdb_band_id - every one is encodable in the new
schema:

- storage_type ↔ outdb_uri.is_null() (null = InDb, set = OutDbRef).
- outdb_url ↔ outdb_uri (no rename, same string).
- outdb_band_id ↔ encoded inside outdb_uri (#band=N or GDAL native
  subdataset URI), parsed only inside the GDAL format driver.
- nodata_value ↔ typed nodata: Binary (a null row means "no nodata").

Top-level adds spatial_dims: List<Utf8View> and spatial_shape:
List<Int64>; nullable view is List<Struct<source_axis, start, step,
steps: Int64>> where a null row encodes the canonical identity view.

Note: intermediate commits in this PR are not expected to build; only
the PR tip is CI-green. The trait, reader/builder, RS_* migration,
and GDAL loader port land in subsequent commits.
RasterRef and BandRef accessors over the canonical N-D schema:
spatial_dims/spatial_shape, transform, crs, num_bands, band(i), and
band-level dim_names, source_shape, shape (visible, derived from view),
view, data_type, nodata, outdb_uri, outdb_format, nd_buffer,
contiguous_data returning Cow<[u8]>.

validate_view enforces all view rules including i64-overflow on
start + (steps-1)*step. NdBuffer exposes raw buffer + shape + byte
strides + offset for zero-copy access (numpy / Arrow C Data Interface
boundary); VIEW → byte strides happens inside nd_buffer().

Adds BandRef::is_2d() default method as the gate GDAL-backed paths
use to refuse N-D input cleanly: true iff dim_names == ["y","x"]
over the identity view.
… reader/builder + RS_* migration

View-aware Arrow reader (RasterStructArray, BandRefImpl) with corruption-
surgery (negative steps, bad source_axis, length mismatch) that
round-trips an ArrowError. Builder exposes start_raster / start_band
for full N-D plus start_raster_2d / start_band_2d for legacy 2D, with
identity-view default written as a null view row. finish_raster
validates each band's visible shape against the raster's spatial_shape
along the spatial dims.

All 33 RS_* functions migrated mechanically; outputs on 2D inputs are
byte-identical to apache#787. RS_BandPath keeps its existing inline
fragment-stripping (format-agnostic display, untouched by the GDAL
parser). Test helpers in sedona-testing rewritten on the N-D builder
API.
Reads outdb_uri + parse_outdb_source instead of apache#787's storage_type /
outdb_url / outdb_band_id triplet. Each GDAL-backed SQL function gates
on BandRef::is_2d() at entry and returns an Execution error on N-D
input. VSI normalization, the dataset cache, and RasterIO bodies are
byte-for-byte unchanged from apache#787 - only the schema-read sites move.

In-db reads use BandRef::contiguous_data() and require Cow::Borrowed
so MEM datasets can point at the StructArray's backing buffer without
copying; for is_2d identity views this always holds.

Tests rebuilt to use RasterBuilder directly. Adds an N-D rejection
test for raster_ref_to_gdal_mem and the VRT path, plus an end-to-end
`raster_ref_to_gdal_mem` previously returned a `Result<Dataset>` and
guarded against `BandRef::contiguous_data()` returning `Cow::Owned`
with a runtime tripwire ("Internal: contiguous_data must be borrowed
for is_2d bands; got owned"). The check was correct — handing GDAL a
pointer into a `Vec<u8>` that drops at the end of the iteration would
dangle — but it ties an internal invariant ("`is_2d` ⇒ Borrowed") to
incidental properties of today's reader. Any future copy path in the
reader (compression, BinaryView block-boundary stitching, alignment
fix-up, sliced/broadcast/transposed views from apache#813 / apache#750) would
detonate the tripwire on perfectly valid 2-D rasters.

Change: return `Result<(Dataset, Vec<Vec<u8>>)>`. On `Cow::Borrowed`
the GDAL band still points directly at the StructArray buffer
(zero-copy). On `Cow::Owned` we move the `Vec<u8>` out of the Cow
without copying — the reader's existing materialization is the only
allocation — and stash it in the returned vector. The caller (the
provider in `gdal_dataset_provider.rs`) parks it in a new
`RasterDataset::_owned_band_bytes` field that lives as long as the
MEM dataset that holds the pointers.

`raster_ref_to_gdal_empty` discards the always-empty vector.
Re-enables non-null `view` rows in the N-D raster reader. PR-B treats a
non-null view as a hard read error because its only writer (start_band)
emits the canonical identity (null view row); the view → byte-stride
composition path is needed by the slice/manipulation functions
(RS_Slice, RS_SliceRange, RS_DimToBand, RS_BandToDim) that land on
top of this PR.

Reintroduced:

- traits.rs: validate_view() + 22 unit tests.
- builder.rs: start_band_with_view() builder API and ~12
  view-construction tests (slice, broadcast, transpose, negative-step,
  IPC round-trip, etc.).
- array.rs: view → byte-stride composition in nd_buffer(), view-aware
  contiguous_data() Cow::Owned strided-copy slow path, and the array
  reader tests for explicit views (negative steps, OOB axis, length
  mismatch, malformed view rejection).

Identity-view writer/reader behavior is unchanged; PR-B's error path on
non-null view rows is replaced by the composition path validated here.
@james-willis james-willis force-pushed the jw/nd-raster-views branch from 91966ed to a7fd9cc Compare May 7, 2026 18:52
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant

Morty Proxy This is a proxified and sanitized view of the page, visit original site.