Inconsistency in catalog.list_tables Behavior Across Python and Java: Returns Non-Iceberg Tables in Python Only #314

@HonahX

Description

Feature Request / Improvement

I noticed that in Python, the Hive, Glue, and DynamoDB catalogs list all tables in the namespace, including non-Iceberg ones. The Hive and Glue implementations are quoted below:

def list_tables(self, namespace: Union[str, Identifier]) -> List[Identifier]:
    """List tables under the given namespace in the catalog (including non-Iceberg tables).

    When the database doesn't exist, it will just return an empty list.

    Args:
        namespace: Database to list.

    Returns:
        List[Identifier]: list of table identifiers.

    Raises:
        NoSuchNamespaceError: If a namespace with the given name does not exist, or the identifier is invalid.
    """
    database_name = self.identifier_to_database(namespace, NoSuchNamespaceError)
    with self._client as open_client:
        return [(database_name, table_name) for table_name in open_client.get_all_tables(db_name=database_name)]

def list_tables(self, namespace: Union[str, Identifier]) -> List[Identifier]:
    """List tables under the given namespace in the catalog (including non-Iceberg tables).

    Args:
        namespace (str | Identifier): Namespace identifier to search.

    Returns:
        List[Identifier]: list of table identifiers.

    Raises:
        NoSuchNamespaceError: If a namespace with the given name does not exist, or the identifier is invalid.
    """
    database_name = self.identifier_to_database(namespace, NoSuchNamespaceError)
    table_list: List[TableTypeDef] = []
    next_token: Optional[str] = None
    try:
        while True:
            table_list_response = (
                self.glue.get_tables(DatabaseName=database_name)
                if not next_token
                else self.glue.get_tables(DatabaseName=database_name, NextToken=next_token)
            )
            table_list.extend(table_list_response["TableList"])
            next_token = table_list_response.get("NextToken")
            if not next_token:
                break
    except self.glue.exceptions.EntityNotFoundException as e:
        raise NoSuchNamespaceError(f"Database does not exist: {database_name}") from e
    return [(database_name, table["Name"]) for table in table_list]

However, in Java, we apply a filter so that only Iceberg tables in the given namespace are returned:
GlueCatalog.listTables
HiveCatalog.listTables
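
To make the difference concrete, here is a rough sketch of what an equivalent filter could look like on the Python side for Glue-style responses. It assumes that Iceberg tables are marked by a `table_type` entry in the table's `Parameters` map with the value `ICEBERG`, compared case-insensitively; that marker and the helper names `is_iceberg_table` and `filter_iceberg_tables` are assumptions for illustration, not existing PyIceberg functions.

```python
from typing import Any, Dict, List, Tuple

Identifier = Tuple[str, ...]

# Assumption: Iceberg tables carry this parameter; the value is compared
# case-insensitively against "iceberg".
TABLE_TYPE_KEY = "table_type"
ICEBERG = "iceberg"


def is_iceberg_table(table: Dict[str, Any]) -> bool:
    """Return True if a Glue-style table dict carries the Iceberg marker."""
    parameters = table.get("Parameters") or {}
    return str(parameters.get(TABLE_TYPE_KEY, "")).lower() == ICEBERG


def filter_iceberg_tables(database_name: str, table_list: List[Dict[str, Any]]) -> List[Identifier]:
    """Build identifiers only for the tables that look like Iceberg tables."""
    return [(database_name, table["Name"]) for table in table_list if is_iceberg_table(table)]
```

With something like this, the final `return` in the Glue `list_tables` above could call `filter_iceberg_tables(database_name, table_list)` instead of returning every table. The Hive client call shown above returns only table names, so a similar filter there would first need to fetch the table objects to inspect their parameters.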

I don't remember whether we discussed this before: why do we include non-Iceberg tables in the result in Python?

cc @Fokko
