Skip to content

Navigation Menu

Sign in
Appearance settings

Search code, repositories, users, issues, pull requests...

Provide feedback

We read every piece of feedback, and take your input very seriously.

Saved searches

Use saved searches to filter your results more quickly

Appearance settings

Support wildcard tables (and filter on _TABLE_SUFFIX) in read_gbq / read_gbq_table #169

Copy link
Copy link
Closed
@tswast

Description

@tswast
Issue body actions

Is your feature request related to a problem? Please describe.

In https://cloud.google.com/bigquery/docs/create-machine-learning-model there is the following SQL:

SELECT
  IF(totals.transactions IS NULL, 0, 1) AS label,
  IFNULL(device.operatingSystem, "") AS os,
  device.isMobile AS is_mobile,
  IFNULL(geoNetwork.country, "") AS country,
  IFNULL(totals.pageviews, 0) AS pageviews
FROM
  `bigquery-public-data.google_analytics_sample.ga_sessions_*`
WHERE
  _TABLE_SUFFIX BETWEEN '20160801' AND '20170630'

I'd like to be able to represent all of this in Python without any SQL code. There are two problems right now:

  1. bigquery-public-data.google_analytics_sample.ga_sessions_* isn't supported as a table ID in read_gbq. This doesn't refer to any single table, so API requests based on the table ID will fail.

  2. Even if (1) were supported, it would try to copy all the data into a temp table. It would be best to be able to specify a filter on _TABLE_SUFFIX at data read time.

Describe the solution you'd like

bpd.read_gbq("bigquery-public-data.google_analytics_sample.ga_sessions_*") should work.

Also, somewhat inspired by the BigQuery Storage API, accept a row_restriction parameter to filter rows.

bpd.read_gbq(
    "bigquery-public-data.google_analytics_sample.ga_sessions_*",
    filters=[("_TABLE_SUFFIX", ">=", '20160801'), ("_TABLE_SUFFIX", "<=", '20170630')],
)

See: https://cloud.google.com/bigquery/docs/reference/storage/rpc/google.cloud.bigquery.storage.v1#google.cloud.bigquery.storage.v1.ReadSession.TableReadOptions.FIELDS.string.google.cloud.bigquery.storage.v1.ReadSession.TableReadOptions.row_restriction

See also: "filters" parameter in pandas.read_parquet

Describe alternatives you've considered

SQL as an input to read_gbq works as an alternative right now.

Additional context

Related feature request on pandas-gbq:

Metadata

Metadata

Assignees

Labels

api: bigqueryIssues related to the googleapis/python-bigquery-dataframes API.Issues related to the googleapis/python-bigquery-dataframes API.type: feature request‘Nice-to-have’ improvement, new feature or different behavior or design.‘Nice-to-have’ improvement, new feature or different behavior or design.

Type

No type

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions

    Morty Proxy This is a proxified and sanitized view of the page, visit original site.