Commit 72c7a27
feat(storage): Enable full object checksum PR 1/3 : parse finalize_time and server crc32c in async object stream (#17261)
### 1. Overview of the Solution
This solution implements end-to-end full-object checksum validation in
`AsyncMultiRangeDownloader` for the asynchronous Google Cloud Storage
Python client library. While the downloader fetches non-contiguous
ranges concurrently, multiplexed over a single bidirectional gRPC
connection, this feature automatically computes a rolling checksum
incrementally as bytes arrive and, once the download completes,
validates it against the server's authoritative full-object checksum.
The technical approach consists of three coordinated layers:
* **`_AsyncReadObjectStream` (Stream Ingestion)**: Safely extracts the
authoritative server checksum (`full_obj_server_crc32c`) and
finalization status (`is_finalized`) from the object metadata carried
in the first response payload of the stream.
* **`_ReadResumptionStrategy` & `_DownloadState` (Verification Logic)**:
Maintains a separate, persistent rolling checksum in each range's
`_DownloadState` object so that calculations for concurrent multiplexed
ranges cannot interfere with one another. Crucially, the rolling
checksum is updated only *after* a buffer write succeeds, which keeps
it consistent with the bytes actually written when a retry reconnects
the stream. When the download completes, a mismatch against the server
checksum raises a `DataCorruption` exception.
* **`AsyncMultiRangeDownloader` (Orchestration & Cleanup)**: Detects
candidate full-object ranges (e.g., `(0, 0)` or `(0, persisted_size)`),
propagates checksum settings to the resumption strategy, and guarantees
robust cleanup (closing the stream immediately and unregistering IDs) if
data corruption or write errors occur.
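The verification and orchestration rules above can be sketched as follows, assuming ranges are `(offset, length)` pairs in which a length of 0 means "read to the end of the object". `RangeState`, `is_full_object_range`, and the local `DataCorruption` class are hypothetical stand-ins, not the library's `_DownloadState`, `_ReadResumptionStrategy`, or exception types. The point is the ordering: write the chunk first, then fold it into that range's own rolling checksum, and compare only at completion.

```python
# Sketch: per-range rolling checksum that is updated only after a
# successful write. All names are hypothetical stand-ins.
import io
from typing import Optional


class DataCorruption(Exception):
    """Stand-in for the library's DataCorruption exception."""


def crc32c_extend(crc: int, data: bytes) -> int:
    """Compact bitwise CRC32C (Castagnoli); pass crc=0 to start."""
    crc ^= 0xFFFFFFFF
    for b in data:
        crc ^= b
        for _ in range(8):
            # Shift right, XOR-ing in the polynomial when the low bit was 1.
            crc = (crc >> 1) ^ (0x82F63B78 & -(crc & 1))
    return crc ^ 0xFFFFFFFF


def is_full_object_range(offset: int, length: int, persisted_size: int) -> bool:
    # Assumption: length 0 means "read to the end of the object".
    return offset == 0 and length in (0, persisted_size)


class RangeState:
    """One instance per range, so concurrent ranges never share a hash."""

    def __init__(self, buffer, expected_crc32c: Optional[int] = None):
        self.buffer = buffer
        self.expected_crc32c = expected_crc32c  # None disables validation
        self.rolling_crc32c = 0

    def on_chunk(self, data: bytes) -> None:
        # Write first. If the write raises, the checksum is untouched, so a
        # chunk re-delivered after a retry is hashed exactly once.
        self.buffer.write(data)
        self.rolling_crc32c = crc32c_extend(self.rolling_crc32c, data)

    def on_complete(self) -> None:
        # Compare only once the whole range has been written.
        if (self.expected_crc32c is not None
                and self.rolling_crc32c != self.expected_crc32c):
            raise DataCorruption(
                f"crc32c mismatch: got {self.rolling_crc32c:#010x}, "
                f"expected {self.expected_crc32c:#010x}"
            )


payload = b"123456789"
assert is_full_object_range(0, 0, persisted_size=len(payload))

good = RangeState(io.BytesIO(), expected_crc32c=0xE3069283)
for chunk in (payload[:4], payload[4:]):
    good.on_chunk(chunk)
good.on_complete()  # checksums match, so nothing is raised

bad = RangeState(io.BytesIO(), expected_crc32c=0xE3069283)
bad.on_chunk(b"12345678X")  # one corrupted byte
try:
    bad.on_complete()
except DataCorruption as exc:
    print("detected:", exc)
```

Keeping the checksum on the per-range state object, rather than on the shared stream, is what stops concurrently multiplexed ranges from mixing their bytes into one hash.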
### 2. What This PR Specifically Does
This PR implements **Step 1: Stream Metadata Ingestion** of the
solution:
* Modifies `_AsyncReadObjectStream` to safely parse GCS object metadata
from the first response payload of the stream.
* Populates the `is_finalized`, `full_obj_server_crc32c`, and
`object_metadata` attributes in `_AsyncReadObjectStream.open()`.
* Adds an autouse pytest event loop fixture in `tests/unit/conftest.py`
to resolve compatibility issues with `pytest-asyncio` under Python
3.11+.
* Adds unit tests in `test_async_read_object_stream.py` to verify that
finalization status and server-authoritative checksums are extracted
correctly, and that the checksum is skipped for unfinalized objects.

Parent commit: d01a4ba
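To illustrate the Step 1 parsing listed above, the sketch extracts `is_finalized` and `full_obj_server_crc32c` from the metadata in a stream's first response and skips the checksum for unfinalized objects. It assumes, following the commit title, that finalization is signalled by a set `finalize_time` and that the checksum sits in a `checksums.crc32c` field. The dataclasses are hypothetical stand-ins for the generated protobuf messages, and `parse_first_response` is an illustration, not the library's code.

```python
# Sketch: extracting finalization status and the server CRC32C from the
# first response on a read stream. The dataclasses are hypothetical
# stand-ins for protobuf messages; parse_first_response is illustrative.
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Optional


@dataclass
class Checksums:
    crc32c: Optional[int] = None  # None models an unset optional field


@dataclass
class ObjectMetadata:
    size: int = 0
    finalize_time: Optional[datetime] = None  # unset while not finalized
    checksums: Checksums = field(default_factory=Checksums)


@dataclass
class FirstResponse:
    metadata: Optional[ObjectMetadata] = None


@dataclass
class StreamInfo:
    object_metadata: Optional[ObjectMetadata] = None
    is_finalized: bool = False
    full_obj_server_crc32c: Optional[int] = None


def parse_first_response(resp: FirstResponse) -> StreamInfo:
    info = StreamInfo()
    meta = resp.metadata
    if meta is None:  # tolerate a response that carries no metadata
        return info
    info.object_metadata = meta
    info.is_finalized = meta.finalize_time is not None
    # Mirrors the described behavior: skip the checksum for unfinalized
    # objects (an unfinalized object can still be appended to).
    if info.is_finalized and meta.checksums is not None:
        info.full_obj_server_crc32c = meta.checksums.crc32c
    return info


done = FirstResponse(ObjectMetadata(
    size=9,
    finalize_time=datetime(2024, 1, 1, tzinfo=timezone.utc),
    checksums=Checksums(crc32c=0xE3069283),
))
open_obj = FirstResponse(ObjectMetadata(size=4, checksums=Checksums(crc32c=123)))

info = parse_first_response(done)
print(info.is_finalized, hex(info.full_obj_server_crc32c))  # True 0xe3069283
print(parse_first_response(open_obj).full_obj_server_crc32c)  # None
```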
3 files changed (85 additions, 1 deletion):
* `packages/google-cloud-storage/google/cloud/storage/asyncio/async_read_object_stream.py`: 15 additions, 0 deletions
* `packages/google-cloud-storage/tests/unit/asyncio/test_async_read_object_stream.py`: 38 additions, 1 deletion
* `packages/google-cloud-storage/tests/unit/conftest.py`: 32 additions, 0 deletions