Skip to content

Navigation Menu

Sign in
Appearance settings

Search code, repositories, users, issues, pull requests...

Provide feedback

We read every piece of feedback, and take your input very seriously.

Saved searches

Use saved searches to filter your results more quickly

Appearance settings

Conversation

@yothinix
Copy link
Contributor

@yothinix yothinix commented May 24, 2024

Change proposed

  • Introduce support for HMS behind kerberize environment, to enable this feature we need to add catalog config hive.kerberos-authorization=true

Resolves #135

@yothinix yothinix force-pushed the feat/add-support-kerberize-hivemetastore branch from ea0ce8f to 259978e Compare May 27, 2024 03:35
@yothinix yothinix force-pushed the feat/add-support-kerberize-hivemetastore branch from 259978e to d17b7ce Compare June 10, 2024 03:35
Copy link
Contributor

@Fokko Fokko left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @yothinix for working on this, I left some comments, but looks good in general 👍

mkdocs/docs/configuration.md Outdated Show resolved Hide resolved
mkdocs/docs/configuration.md Outdated Show resolved Hide resolved
mkdocs/docs/configuration.md Show resolved Hide resolved
mkdocs/docs/configuration.md Outdated Show resolved Hide resolved
@yothinix
Copy link
Contributor Author

Hi @Fokko I added new change as commented, could you help review it again

Also, As I tested more on HMS behind Kerberize we found out that it's required the thrift client to be re-create every time we open a new connection, So that the implementation was change a bit to handle this case. The mentioned change is in this commit a7b8b1c

@awdavidson
Copy link
Contributor

LGTM+1

I had something very similar to this patched locally. It would be good to get merged asap

@kevinjqliu
Copy link
Contributor

Hi @yothinix do you mind rebasing with main? I think this is very close to being merged, I can work with you to get it through.

@yothinix yothinix force-pushed the feat/add-support-kerberize-hivemetastore branch from a7b8b1c to 322d143 Compare August 17, 2024 01:58
@yothinix
Copy link
Contributor Author

Hi @kevinjqliu Thank you for look into this PR, I just rebased the PR branch to latest main branch as requested.

Copy link
Contributor

@kevinjqliu kevinjqliu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

added a few minor comments

Comment on lines -276 to -283
When using Hive 2.x, make sure to set the compatibility flag:

```yaml
catalog:
default:
...
hive.hive2-compatible: true
```
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: leave this as is, since this is only required for hive2.

and maybe add similar block for hive.kerberos-authorization since that's also optional

HIVE2_COMPATIBLE = "hive.hive2-compatible"
HIVE2_COMPATIBLE_DEFAULT = False

HIVE_KERBEROS_AUTH = "hive.kerberos-authorization"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: authentication is different than authorization

Suggested change
HIVE_KERBEROS_AUTH = "hive.kerberos-authorization"
HIVE_KERBEROS_AUTH = "hive.kerberos-authentication"

Kerberos is authentication https://docs.cloudera.com/cdw-runtime/1.5.1/securing-hive/topics/hive_remote_data_access.html

if not self._kerberos_auth:
self._transport = TTransport.TBufferedTransport(socket)
else:
self._transport = TTransport.TSaslClientTransport(socket, host=url_parts.hostname, service="hive")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is "hive" a static value here or should it be configurable?

if not self._kerberos_auth:
self._transport.open()
else:
self._init_thrift_client()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can this be called multiple times? The init function already calls this

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we can remove it here, it is also called on line 149

@uniqueinput
Copy link

It would be really great if this could make it into 0.8. :)

@kevinjqliu kevinjqliu added this to the PyIceberg 0.9.0 release milestone Oct 30, 2024
@kevinjqliu
Copy link
Contributor

hi @yothinix thanks for the contribution. Would love like to address the comment above? Would be great to have this as part of the 0.9.0 release

@kevinjqliu kevinjqliu removed this from the PyIceberg 0.9.0 release milestone Feb 1, 2025
kevinjqliu pushed a commit that referenced this pull request Feb 12, 2025
Took #766 and addressed
the comments, to make sure that it gets into 0.9.0

---------

Co-authored-by: Yothin Muangsommuk <m@yothinix.com>
@kevinjqliu
Copy link
Contributor

THanks @yothinix for the contribution, this is merged at #1634

@kevinjqliu kevinjqliu closed this Feb 12, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

using pyiceberg with kerberized hive metastore

5 participants

Morty Proxy This is a proxified and sanitized view of the page, visit original site.