Skip to content

Navigation Menu

Sign in
Appearance settings

Search code, repositories, users, issues, pull requests...

Provide feedback

We read every piece of feedback, and take your input very seriously.

Saved searches

Use saved searches to filter your results more quickly

Appearance settings

Conversation

@isc-psulin
Copy link
Contributor

Remove unnecessary _ensure_tables_exist method as this is already the default behavior of Metadata.create_all()

Copy link
Contributor

@kevinjqliu kevinjqliu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the PR! Do you mind adding a test to show that this works for when

  1. tables already exist
  2. some tables exist, but not all
  3. none of the tables exist yet

@Fokko
Copy link
Contributor

Fokko commented Sep 10, 2024

@isc-patrick do you know how it will check if the table exists? I'm asking since a CREATE TABLE IF NOT EXISTS ... might require CREATE TABLE permissions.

@isc-psulin
Copy link
Contributor Author

I can certainly add tests, but that is really testing the Metadata.create_all() function in SQLAlchemy and not pyiceberg code. I think that CREATE TABLE IF NOT EXISTS requires same permissions as CREATE TABLE. How this is executed will be determined by the Dialect.

@kevinjqliu
Copy link
Contributor

please do! we want to ensure that this change does not break new and existing DB integrations.

@isc-psulin
Copy link
Contributor Author

It does not look like CREATE TABLE IF NOT EXISTS is what is used. It is specific to each dialect and how they implement the has_table method, but the ones I looked at use metadata on the schema to determine the existence of the table.

Copy link
Collaborator

@sungwy sungwy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @isc-patrick - thank you very much for putting up this PR, and for leading the thorough discussion on #1148.

I have added some nits, and in addition could we:

  • update the PR description to fit the updated scope of work for this PR
  • add a line to the SqlCatalog property so that users will be able to find this property?

| Key | Example | Default | Description |
| ------------- | ------------------------------------------------------------ | ------- | ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| uri | postgresql+psycopg2://username:password@localhost/mydatabase | | SQLAlchemy backend URL for the catalog database (see [documentation for URL format](https://docs.sqlalchemy.org/en/20/core/engines.html#backend-specific-urls)) |
| echo | true | false | SQLAlchemy engine [echo param](https://docs.sqlalchemy.org/en/20/core/engines.html#sqlalchemy.create_engine.params.echo) to log all statements to the default log handler |
| pool_pre_ping | true | false | SQLAlchemy engine [pool_pre_ping param](https://docs.sqlalchemy.org/en/20/core/engines.html#sqlalchemy.create_engine.params.pool_pre_ping) to test connections for liveness upon each checkout |

)


def confirm_no_tables_exist(alchemy_engine: Engine) -> list[str]:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
def confirm_no_tables_exist(alchemy_engine: Engine) -> list[str]:
def confirm_no_tables_exist(alchemy_engine: Engine) -> List[str]:

for python3.8 compatibility (which will be deprecated soon), but this is currently needed for our CI to pass

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not returning this anymore, so removed, CATALOG_TABLES is no a constant. PR title updated and flag added to docs

if inspector.has_table(c.__tablename__):
c.__table__.drop(alchemy_engine)

catalog_tables = [c.__tablename__ for c in SqlCatalogBaseTable.__subclasses__()]
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This function seems to serve two purposes:

  1. confirm that no tables exist
  2. return the list of SqlCatalogBaseTable.__subclasses__()

(2) is a static list of tables required for the SqlCatalog. Would it be cleaner to separate out (2) to a global variable in this module, instead of returning it and passing it as an input to confirm_all_tables_exist ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Made it a constant

return catalog_tables


def confirm_all_tables_exist(inspector: Inspector, catalog_tables: list[str], catalog: Catalog) -> None:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks like we are passing the catalog here just to assert its type - is this necessary for this check? Was the intention to check the tables through the catalog's engine instead like:

inspect(catalog.engine).get_table_names()

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not passing the inspector anymore. Also needed to cast the Catalog to use the engine. Made the uri a fixture to insure using same source between catalog and inspector that was creating tables.

@isc-psulin isc-psulin changed the title Remove unnecessary _ensure_tables_exist method Add flag to allow diabling creation of catalog tables Sep 17, 2024
@isc-psulin isc-psulin changed the title Add flag to allow diabling creation of catalog tables Add flag to allow disabling creation of catalog tables Sep 17, 2024
@sungwy
Copy link
Collaborator

sungwy commented Oct 18, 2024

Thank you again for working on this PR @isc-patrick !

@sungwy sungwy merged commit 1c50f53 into apache:main Oct 18, 2024
sungwy pushed a commit to sungwy/iceberg-python that referenced this pull request Dec 7, 2024
* Remove unnecessary _ensure_tables_exist method as this is already the default behavior of Metadata.create_all()

* Added tests for creating Catalog tables when no, some or all tables already exist

* add init_catalog_tables flag to SQLCatalog

* add _ensure_tables_exists back until postgres integration tests completed

* fixed tests, added flag to docs
sungwy pushed a commit to sungwy/iceberg-python that referenced this pull request Dec 7, 2024
* Remove unnecessary _ensure_tables_exist method as this is already the default behavior of Metadata.create_all()

* Added tests for creating Catalog tables when no, some or all tables already exist

* add init_catalog_tables flag to SQLCatalog

* add _ensure_tables_exists back until postgres integration tests completed

* fixed tests, added flag to docs
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants

Morty Proxy This is a proxified and sanitized view of the page, visit original site.