-
Notifications
You must be signed in to change notification settings - Fork 412
Set field-id when needed #1867
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Set field-id when needed #1867
Conversation
| fields.append(self._construct_field(field, array.type)) | ||
| elif field.optional: | ||
| arrow_type = schema_to_pyarrow(field.field_type, include_field_ids=False) | ||
| arrow_type = schema_to_pyarrow(field.field_type, include_field_ids=self._include_field_ids) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ah!! i think this is the right approach
i tried to resolve it by passing the schema's name mapping
https://github.com/apache/iceberg-python/compare/main...kevinjqliu:iceberg-python:kevinjqliu/use-name-mapping?expand=1#diff-8d5e63f2a87ead8cebe2fd8ac5dcf2198d229f01e16bb9e06e21f7277c328abdR1655
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, we missed this in a review 🤦 Field-IDs are superior over name-mapping, for example: dropping a field, and then adding a new field with the same name is not supported by name-mapping because it re-uses the name. In the case of field-IDs, a new ID is assigned and it will look like a new column 👍
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I found 3 other places where include_field_ids=False
- in
to_table, this is fine since we're just materializing the table from record batches - in
_to_requested_schema, the 2 places where_to_requested_schemais called setsinclude_field_ids=True(1, 2) - in
ArrowProjectionVisitor, but this is only called here get uses theinclude_field_idsfrom_to_requested_schema, which setsinclude_field_ids=True
kevinjqliu
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
Fixes #1798 <!-- Thanks for opening a pull request! --> <!-- In the case this PR will resolve an issue, please replace ${GITHUB_ISSUE_ID} below with the actual Github issue id. --> <!-- Closes #${GITHUB_ISSUE_ID} --> # Rationale for this change # Are these changes tested? # Are there any user-facing changes? <!-- In the case of user-facing changes, please add the changelog label. -->
Fixes apache#1798 <!-- Thanks for opening a pull request! --> <!-- In the case this PR will resolve an issue, please replace ${GITHUB_ISSUE_ID} below with the actual Github issue id. --> <!-- Closes #${GITHUB_ISSUE_ID} --> # Rationale for this change # Are these changes tested? # Are there any user-facing changes? <!-- In the case of user-facing changes, please add the changelog label. -->
Fixes #1798
Rationale for this change
Are these changes tested?
Are there any user-facing changes?