GH-3011: Deny further writes after InternalParquetRecordWriter is aborted #3450

Open

LuciferYang wants to merge 1 commit into apache:master from LuciferYang:GH-3011

Conversation

@LuciferYang (Contributor) commented Mar 17, 2026

Rationale for this change

After a write error (e.g. OOM during page flush), InternalParquetRecordWriter sets its aborted flag to true and re-throws the exception. However, subsequent calls to write() are silently accepted without checking this flag. Since close() skips flushing when aborted is true, all data written after the error is silently discarded, producing a corrupted Parquet file without a footer. Users only discover the corruption when they attempt to read the file later.

What changes are included in this PR?

Added an aborted state check at the beginning of write(). If the writer has been aborted due to a previous error, an IOException is thrown immediately with a clear error message, preventing further writes to a writer in an undefined state.
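The check described above can be sketched as follows. This is a minimal, self-contained illustration, not the actual parquet-java source: `SketchRecordWriter`, its method signatures, and the exception message are hypothetical stand-ins for the real `InternalParquetRecordWriter` behavior.

```java
import java.io.IOException;

// Hypothetical stand-in for InternalParquetRecordWriter: a writer that
// marks itself aborted when a write fails and denies all later writes.
class SketchRecordWriter {
    private boolean aborted = false;

    // Fail fast when a previous write left the writer in an undefined state.
    public void write(String record, boolean simulateFailure) throws IOException {
        if (aborted) {
            throw new IOException(
                "Writer was aborted by a previous error; further writes are denied");
        }
        try {
            if (simulateFailure) {
                // Stand-in for a real failure, e.g. an OOM during a page flush.
                throw new IOException("simulated page-flush failure");
            }
            // ... buffer the record, flush pages when thresholds are hit ...
        } catch (IOException e) {
            aborted = true; // remember the failure so later writes are rejected
            throw e;
        }
    }

    // Mirrors the described close() behavior: skips flushing when aborted,
    // and completes without throwing.
    public void close() {
        if (aborted) {
            return; // nothing is flushed; the file has no valid footer
        }
        // ... flush remaining data and write the footer ...
    }
}
```

The key point is that the flag is checked at the top of `write()`, so the second and all subsequent writes after a failure throw immediately instead of being silently dropped.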

Are these changes tested?

Yes. Added testWriteAfterAbortShouldThrow in TestParquetWriterError that verifies:

  1. Writing to an aborted writer throws IOException with the expected message
  2. close() on an aborted writer completes without throwing

All existing tests in parquet-hadoop pass without modification.

Are there any user-facing changes?

Yes. Users who previously caught write exceptions and continued writing to the same ParquetWriter will now receive an IOException on subsequent write attempts. This is an intentional change to prevent silent data loss — the correct behavior after a write failure is to discard the writer and create a new one.
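The recovery pattern recommended above can be sketched like this. `FlakyWriter` and `writeWithRecovery` are hypothetical illustrations, not parquet-java APIs; the stand-in writer fails its first write to simulate the scenario.

```java
import java.io.IOException;

// Hypothetical stand-in writer that fails its first write and then
// denies further writes, mimicking the post-abort behavior.
class FlakyWriter implements AutoCloseable {
    private boolean aborted = false;
    private final boolean failFirst;
    private int writes = 0;

    FlakyWriter(boolean failFirst) { this.failFirst = failFirst; }

    void write(String record) throws IOException {
        if (aborted) throw new IOException("write after abort is denied");
        if (failFirst && writes++ == 0) {
            aborted = true;
            throw new IOException("simulated write failure");
        }
    }

    @Override public void close() { /* skips flushing when aborted */ }
}

class RecoveryPattern {
    // On failure: close and discard the aborted writer, create a fresh one,
    // and re-write from the beginning. Do NOT keep writing to the old writer.
    static FlakyWriter writeWithRecovery(String record) throws IOException {
        FlakyWriter writer = new FlakyWriter(true);
        try {
            writer.write(record);
        } catch (IOException e) {
            writer.close();                  // discard the failed writer
            writer = new FlakyWriter(false); // start over with a new file
            writer.write(record);            // re-write the data
        }
        return writer;
    }
}
```

In real code the retry would target a new output file, since the aborted file has no footer and is unreadable.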

Closes #3011

