Conversation

@Fokko Fokko commented Aug 7, 2024

And update the docs

Fixes #1013

@Fokko force-pushed the fd-allow-setting-max-row-group-size branch from 5b91696 to 46afeaf on August 7, 2024 15:00
@Fokko Fokko added this to the PyIceberg 0.7.1 release milestone Aug 7, 2024
sungwy commented Aug 7, 2024

LGTM @Fokko - merging in the change from main to resolve the conflict on the docs.

Fokko commented Aug 8, 2024

Also threw in a test here 👍

@sungwy sungwy merged commit debda66 into apache:main Aug 8, 2024
| Key                                    | Options                           | Default | Description                                                                                   |
| -------------------------------------- | --------------------------------- | ------- | --------------------------------------------------------------------------------------------- |
| `write.parquet.compression-codec`      | `{uncompressed,zstd,gzip,snappy}` | zstd    | Sets the Parquet compression codec.                                                           |
| `write.parquet.compression-level`      | Integer                           | null    | Parquet compression level for the codec. If not set, it is left up to PyIceberg.              |
| `write.parquet.row-group-limit`        | Number of rows                    | 1048576 | The upper bound on the number of entries within a single row group.                           |
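To illustrate the property in the table above, here is a minimal sketch of how a writer could read `write.parquet.row-group-limit` from a table-properties mapping, falling back to the 1048576-row default. The helper name `row_group_limit` is hypothetical, not PyIceberg's actual API; table properties are stored as strings, hence the `int()` conversion.

```python
# Default matches the value in the docs table above (2**20 rows).
DEFAULT_ROW_GROUP_LIMIT = 1048576

def row_group_limit(properties: dict) -> int:
    """Hypothetical helper: resolve the row-group row limit from table properties."""
    value = properties.get("write.parquet.row-group-limit")
    return int(value) if value is not None else DEFAULT_ROW_GROUP_LIMIT
```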
@zhongyujiang (Contributor) commented:
@Fokko @sungwy Thanks, I believe this has resolved my issue #1012 as well.

However, I would like to remind you that this option already exists in the doc, right after `write.parquet.dict-size-bytes`. The UI doesn't allow me to leave a comment there, so please expand the collapsed area to see it.

Additionally, I'm curious why the default value used this time is significantly larger than the previous one?

@sungwy (Collaborator) commented:
Thank you for flagging this @zhongyujiang - I'll get the second one below with the older default value removed.

To my understanding, the new value is the correct default, matching the default in the PyArrow ParquetWriter: https://arrow.apache.org/docs/python/generated/pyarrow.parquet.ParquetWriter.html
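The 1048576 default is simply 2**20 rows. As a quick sanity check on what the limit means in practice, the sketch below computes how many row groups a write would produce for a given row count, assuming each row group is filled to the limit before a new one is started (a simplification; real writers may also split on size thresholds).

```python
import math

def expected_row_groups(num_rows: int, limit: int = 1048576) -> int:
    """Upper-bound row-group count if each group is packed to `limit` rows."""
    if num_rows == 0:
        return 0
    return math.ceil(num_rows / limit)
```

For example, writing exactly 2 * 1048576 rows yields two full row groups, while one extra row forces a third.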

sungwy added a commit that referenced this pull request Aug 9, 2024
* Allow setting `write.parquet.row-group-limit`

And update the docs

* Add test

* Make ruff happy

---------

Co-authored-by: Sung Yun <107272191+sungwy@users.noreply.github.com>
sungwy added a commit to sungwy/iceberg-python that referenced this pull request Dec 7, 2024
* Allow setting `write.parquet.row-group-limit`

And update the docs

* Add test

* Make ruff happy

---------

Co-authored-by: Sung Yun <107272191+sungwy@users.noreply.github.com>

Development

Successfully merging this pull request may close these issues.

NotImplementedError: Parquet writer option(s) ['write.parquet.row-group-size-bytes'] not implemented
