Skip to content

Navigation Menu

Sign in
Appearance settings

Search code, repositories, users, issues, pull requests...

Provide feedback

We read every piece of feedback, and take your input very seriously.

Saved searches

Use saved searches to filter your results more quickly

Appearance settings

Buggy metadata reflection without a default dataset #1088

Copy link
Copy link
Open
@JacobHayes

Description

@JacobHayes
Issue body actions

The get_table_names and get_view_names methods are supposed to return the bare names (no {schema}. prefix) of the resources for a single schema/dataset (where schema=None is the "default schema"). However, the BigQueryDialect implementation returns:

  • bare names for a specific schema: if the connection has a default dataset
  • {schema}. prefixed names for one schema: if the connection doesn't have a default dataset and the schema arg is a string
  • {schema}. prefixed names for all schemas: if the connection doesn't have a default dataset and the schema arg is None

The bolded behaviors, which arise when the connection doesn't have a default dataset, are incorrect and trigger some edge cases:

  • permission errors when access is limited (1) because all datasets are scanned
  • duplicate schema. prefixes / incorrect NoSuchTableErrors (1, 2) because SQLAlchemy adds the prefix when appropriate
  • alembic revision --autogenerate wipes and recreates all tables
    • include_schemas=False (default) should manage a single schema but the BigQueryDialect code would:
      • scan the current schema and incorrectly prefix the schema. resulting in ("schema", "schema.table") tuples that don't match the models represented as ("schema", "table") tuples. This causes alembic to try to delete and recreate all of our models.
      • scan all other schemas. Alembic doesn't see any models defined for them so tries to delete all tables in other schemas.
    • include_schemas=True manages all schemas, but:
      • alembic looks up all existing schemas and then does the reflection for schema=None and all discovered schemas. Since the BigQueryDialect would return prefixed names for all schemas, it would reflect table tuples like (None, "schema1.table"), (None, "schema2.table"), ("schema1", "schema1.table"), ("schema2", "schema2.table"). The None records will seem new and emit CREATE TABLEs while the double prefixed ones later error with NoSuchTableError when looking up their metadata.

Environment details

  • OS type and version: macOS Sonoma 14.5; M1
  • Python version: 3.12.3
  • pip version: 24.0
  • sqlalchemy-bigquery version: 1.11.0

Steps to reproduce

  1. Run the script below
  2. At the breakpoint, notice that Base.metadata.tables has tables from all datasets
  3. Notice that the script errors with NoSuchTableError even though the table exists

Code example

from os import getenv

from sqlalchemy import create_engine
from sqlalchemy.orm import DeclarativeBase, Mapped, mapped_column

dataset_1, dataset_2 = "tmp1", "tmp2"  # bq query "create schema tmp1; create schema tmp2;"
project = getenv("GOOGLE_CLOUD_PROJECT")

engine = create_engine(f"bigquery://{project}")


class Base(DeclarativeBase):
    pass


class User(Base):
    __tablename__ = "user"
    __table_args__ = {"schema": dataset_1}

    id: Mapped[int] = mapped_column(primary_key=True)
    name: Mapped[str]
    fullname: Mapped[str]
    nickname: Mapped[str | None]


class Org(Base):
    __tablename__ = "org"
    __table_args__ = {"schema": dataset_2}

    id: Mapped[int] = mapped_column(primary_key=True)
    name: Mapped[str]


Base.metadata.create_all(engine)
Base.metadata.clear()

Base.metadata.reflect(engine)
# NOTE: Base.metadata.tables *should* be empty (no schema nor a connection default), but is actually populated from all datasets
breakpoint()

# NOTE: This will confusingly raise `sqlalchemy.exc.NoSuchTableError: {schema}.{table}` - it's actually
# looking for `{schema}.{table}` _within_ `schema`.
Base.metadata.reflect(engine, schema=dataset_1)

Stack trace

$ python3 x.py
(Pdb) len(Base.metadata.tables)
15
(Pdb) c
Traceback (most recent call last):
  File "/.../x.py", line 51, in <module>
    Base.metadata.reflect(engine, schema=dataset_1)
  File "/.../.venv/lib/python3.12/site-packages/sqlalchemy/sql/schema.py", line 5843, in reflect
    Table(name, self, **reflect_opts)
  File "<string>", line 2, in __new__
  File "/.../.venv/lib/python3.12/site-packages/sqlalchemy/util/deprecations.py", line 281, in warned
    return fn(*args, **kwargs)  # type: ignore[no-any-return]
           ^^^^^^^^^^^^^^^^^^^
  File "/.../.venv/lib/python3.12/site-packages/sqlalchemy/sql/schema.py", line 432, in __new__
    return cls._new(*args, **kw)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/.../.venv/lib/python3.12/site-packages/sqlalchemy/sql/schema.py", line 486, in _new
    with util.safe_reraise():
  File "/.../.venv/lib/python3.12/site-packages/sqlalchemy/util/langhelpers.py", line 146, in __exit__
    raise exc_value.with_traceback(exc_tb)
  File "/.../.venv/lib/python3.12/site-packages/sqlalchemy/sql/schema.py", line 482, in _new
    table.__init__(name, metadata, *args, _no_init=False, **kw)
  File "/.../.venv/lib/python3.12/site-packages/sqlalchemy/sql/schema.py", line 862, in __init__
    self._autoload(
  File "/.../.venv/lib/python3.12/site-packages/sqlalchemy/sql/schema.py", line 894, in _autoload
    conn_insp.reflect_table(
  File "/[...]/.venv/lib/python3.12/site-packages/sqlalchemy/engine/reflection.py", line 1538, in reflect_table
    raise exc.NoSuchTableError(table_name)
sqlalchemy.exc.NoSuchTableError: tmp1.user

Metadata

Metadata

Assignees

Labels

api: bigqueryIssues related to the googleapis/python-bigquery-sqlalchemy API.Issues related to the googleapis/python-bigquery-sqlalchemy API.

Type

No type

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions

    Morty Proxy This is a proxified and sanitized view of the page, visit original site.