feat(bigframes): add AI TVFs to the pandas bq accessor #17402

Merged
tswast merged 6 commits into googleapis/google-cloud-python:main from googleapis/google-cloud-python:sycai_ai_generate_tvf
Jun 23, 2026
Conversation

@sycai sycai commented Jun 8, 2026

Contributor

Also updated the tests to fully utilize the mocking framework.

internal issue: b/517233441

@gemini-code-assist gemini-code-assist Bot left a comment

Contributor

Code Review

This pull request adds generate_embedding, generate_text, and generate_table accessor methods to the BigQuery DataFrame accessor, along with comprehensive unit tests. The feedback suggests removing pd.Series from the type annotations of the model parameter in these new methods, as BigQuery ML models cannot be pandas Series.

Comment thread packages/bigframes/bigframes/extensions/core/dataframe_accessor.py Outdated
Comment thread packages/bigframes/bigframes/extensions/core/dataframe_accessor.py Outdated
Comment thread packages/bigframes/bigframes/extensions/core/dataframe_accessor.py Outdated
@sycai sycai requested review from TrevorBergeron and tswast June 9, 2026 00:47
@sycai sycai marked this pull request as ready for review June 9, 2026 00:47
@sycai sycai requested review from a team as code owners June 9, 2026 00:47
)
return self._to_series(result)

def generate_embedding(

Contributor

Given that this acts only on a single column (called "content"), I think the Series accessor would be a much better fit.

Contributor Author

Good call! Let me move the contents to the series accessor

)
return self._to_dataframe(result)

def generate_text(

Contributor

Same here. This acts on a single "prompt" column. I think Series accessor would be a better fit.

To avoid conflicts with generated code, please use a "mixin" pattern to avoid conflicts between the generated Series code and these methods: https://adamj.eu/tech/2025/05/01/python-type-hints-mixin-classes/

)
return self._to_dataframe(result)

def generate_table(

Contributor

Same here. This acts on a single "prompt" column and would be better suited as a series accessor method.

sycai commented Jun 23, 2026

Contributor Author

There have been some large updates to the series bq accessor setup:

  • The AbstractBigQuerySeriesAccessor has been separated out to another file.
  • I defined AITVFMixin to be a subclass of AbstractBigQuerySeriesAccessor.
  • I updated the template generation script so that it generates an empty namespace class even if no functions are provided in the yaml.
  • I updated the core accessor's template so that, if the accessor class is the AI one, it inherits from AITVFMixin instead of AbstractBigQuerySeriesAccessor (the base class).
  • The functions in the dataframe AI accessor are migrated to the AITVFMixin.
  • The tests are updated.
  • Updated some import statements in the template to meet the readability standards.
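
The resulting class layout can be pictured roughly as follows. This is only a minimal sketch, not the code in the PR: the class names come from the list above, but the method bodies, the abstract `_to_dataframe` hook, the model string, and the list used in place of a real Series are all assumptions made for illustration.

```python
from __future__ import annotations

import abc
from typing import Any, Generic, TypeVar

S = TypeVar("S")  # the wrapped series type (e.g. a pandas Series)


class AbstractBigQuerySeriesAccessor(abc.ABC, Generic[S]):
    """Stand-in for the shared base class; the real one lives in bigframes."""

    def __init__(self, obj: S) -> None:
        self._obj = obj

    @abc.abstractmethod
    def _to_dataframe(self, result: Any) -> Any:
        """Convert a backend result into the caller's DataFrame type."""


class AITVFMixin(AbstractBigQuerySeriesAccessor[S]):
    """Handwritten methods; subclassing the base lets type checkers see _obj."""

    def generate_text(self, model: str) -> Any:
        # Hypothetical body: the real method calls a BigQuery TVF.
        result = {"prompt": self._obj, "model": model}
        return self._to_dataframe(result)


class AiSeriesAccessor(AITVFMixin[S]):
    """Stand-in for the generated class; it only picks the mixin as its base."""

    def _to_dataframe(self, result: Any) -> Any:
        return result  # hypothetical: no conversion in this sketch


accessor = AiSeriesAccessor(["Tell me a joke"])
output = accessor.generate_text(model="my_project.my_dataset.my_model")
```

Because the handwritten methods live in the mixin, regenerating `AiSeriesAccessor` from the template leaves them untouched.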

FYI @tswast

@sycai sycai requested a review from tswast June 23, 2026 03:28

@tswast tswast left a comment

Contributor

So cool! Have you tried out something like:

import bigframes  # registers the accessor
import pandas as pd

df = pd.DataFrame(...)
output = df['prompt'].bigquery.ai.some_ai_function()

yet? I think such a workflow where output comes back as a pandas DataFrame could be really compelling.

return self._obj

def _to_series(self, bf_series: bigframes.series.Series) -> S:
def _to_dataframe(self, bf_df: dataframe.DataFrame) -> T:

Contributor

A bit strange seeing _to_dataframe at this level, but I suppose it's reasonable to expect that we'll have other TVFs at some point, such as the ML functions.

Comment on lines +493 to +495
if not data or not isinstance(data, dict) or "scalar_functions" not in data:
# If the file is empty or has no functions, just create the namespace.
return [{"namespace": namespace}]

Contributor

Might be good to leave a comment that this is for things like the AI accessor where everything is currently handwritten.
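
For context, the fallback under review could sit in a loader along these lines. This is a hedged sketch: only the `if` condition and its `return` are taken from the diff excerpt above, while the function name `load_namespace_entries`, the shape of the non-empty branch, and the explanatory comment are assumptions.

```python
from typing import Any


def load_namespace_entries(data: Any, namespace: str) -> list[dict[str, Any]]:
    """Return one template entry per function, or a bare namespace entry."""
    if not data or not isinstance(data, dict) or "scalar_functions" not in data:
        # Empty file or no functions: still emit the namespace so that
        # handwritten accessors (such as the AI one) get a class to extend.
        return [{"namespace": namespace}]
    # Hypothetical happy path: tag each declared function with its namespace.
    return [
        {"namespace": namespace, "function": fn}
        for fn in data["scalar_functions"]
    ]
```

Returning a single bare entry, rather than an empty list, is what lets the template still emit a class for a namespace whose methods are all handwritten.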


{% for ns in namespaces %}
class {{ ns.class_name }}(AbstractBigQuerySeriesAccessor[S]):
{% if ns.class_name == "AiSeriesAccessor" %}

Contributor

Just to put it in writing: we also spoke about making this more general by putting mixin options into the yaml format, but given that AI is the only case right now, I think this change is the simplest option.
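
To make the trade-off concrete, here is a rough Python sketch of the two options. It is not the generator's actual code: the helper names `base_class_for` and `class_header`, the optional `mixin` argument standing in for a value read from yaml, the `[S]` parameter on the mixin, and the `DtSeriesAccessor` / `MlSeriesAccessor` / `MLTVFMixin` names in the usage note are all hypothetical.

```python
from typing import Optional

DEFAULT_BASE = "AbstractBigQuerySeriesAccessor"


def base_class_for(class_name: str, mixin: Optional[str] = None) -> str:
    """Pick the base class name used in the generated class header."""
    if mixin is not None:
        # Hypothetical generalisation: a `mixin` value read from the yaml file.
        return mixin
    if class_name == "AiSeriesAccessor":
        # The special case discussed in the PR: AI methods are handwritten.
        return "AITVFMixin"
    return DEFAULT_BASE


def class_header(class_name: str, mixin: Optional[str] = None) -> str:
    """Render the first line of a generated accessor class."""
    return f"class {class_name}({base_class_for(class_name, mixin)}[S]):"
```

With only one special case, the hard-coded branch is the smaller change; a yaml-driven `mixin` key would start to pay off once a second handwritten namespace appears, e.g. `class_header("MlSeriesAccessor", mixin="MLTVFMixin")`.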

@tswast tswast merged commit ee74e31 into main Jun 23, 2026
32 checks passed
@tswast tswast deleted the sycai_ai_generate_tvf branch June 23, 2026 19:10
sycai commented Jun 23, 2026

Contributor Author

> So cool! Have you tried out something like:
>
> import bigframes  # registers the accessor
> import pandas as pd
>
> df = pd.DataFrame(...)
> output = df['prompt'].bigquery.ai.some_ai_function()
>
> yet? I think such a workflow where output comes back as a pandas DataFrame could be really compelling.

Yes I did! I tried both the scalar ones and the TVF ones. They all worked flawlessly.

chalmerlowe pushed a commit that referenced this pull request Jun 25, 2026
Also updated the tests to fully utilize the mocking framework.

internal issue: b/517233441
shuoweil pushed a commit that referenced this pull request Jun 25, 2026
🤖 I have created a release *beep* *boop*
---


## [2.44.0](bigframes-v2.43.0...bigframes-v2.44.0) (2026-06-25)


### Features

* add date functions to `bigframes.bigquery` module
([#17514](#17514))
([e5d2e35](e5d2e35))
* **bigframes:** add AI TVFs to the pandas bq accessor
([#17402](#17402))
([ee74e31](ee74e31))
* Experimental transpilation of unannotated python callables
([#17419](#17419))
([ea9aad9](ea9aad9))
* support gemini-3.x models in loader and update default model to
gemini-3.5-flash
([#17557](#17557))
([3619b29](3619b29))
* support interactive execution of deferred DataFrames in TableWidget
([#17486](#17486))
([421eebd](421eebd))


### Bug Fixes

* avoid invalid CAST(NULL AS NULL) in SQLGlot compiler
([#17487](#17487))
([3b79caa](3b79caa))
* **bigframes:** world-readable temp zip in create_cloud_function
([#17522](#17522))
([e726878](e726878))
* bump @angular/common, @angular/forms, @angular/platform-browser and
@angular/router in
/packages/bigframes/bigframes/display/table_widget_angular
([#17525](#17525))
([2f893b1](2f893b1))
* bump langsmith from 0.8.0 to 0.8.18 in /packages/bigframes
([#17518](#17518))
([f23063f](f23063f))
* bump msgpack from 1.1.1 to 1.2.1 in /packages/bigframes
([#17520](#17520))
([36b5b7e](36b5b7e))
* bump undici and @angular/build in
/packages/bigframes/bigframes/display/table_widget_angular
([#17519](#17519))
([6fc45e3](6fc45e3))
* handle empty endpoints during cloud function reuse
([#17501](#17501))
([4f5593a](4f5593a))


### Documentation

* ensure that PlotAccessor is included in the API reference
([#17513](#17513))
([6febabf](6febabf))

---
This PR was generated with [Release
Please](https://github.com/googleapis/release-please). See
[documentation](https://github.com/googleapis/release-please#release-please).

Co-authored-by: release-please[bot] <55107282+release-please[bot]@users.noreply.github.com>