feat(bigframes): add AI TVFs to the pandas bq accessor #17402
tswast merged 6 commits into googleapis/google-cloud-python:main from googleapis/google-cloud-python:sycai_ai_generate_tvf
Code Review
This pull request adds generate_embedding, generate_text, and generate_table accessor methods to the BigQuery DataFrame accessor, along with comprehensive unit tests. The feedback suggests removing pd.Series from the type annotations of the model parameter in these new methods, as BigQuery ML models cannot be pandas Series.
    )
    return self._to_series(result)

    def generate_embedding(
Given that this acts only on a single column (called "content"), I think the Series accessor would be a much better fit.
Good call! Let me move the contents to the series accessor
    )
    return self._to_dataframe(result)

    def generate_text(
Same here. This acts on a single "prompt" column. I think Series accessor would be a better fit.
Please use a "mixin" pattern to avoid conflicts between the generated Series code and these handwritten methods: https://adamj.eu/tech/2025/05/01/python-type-hints-mixin-classes/
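A minimal sketch of what such a mixin split could look like. All class names here (`AiTvfMixin`, `GeneratedAiSeriesAccessor`, `_HasObj`) are hypothetical stand-ins, not names from this PR, and a plain list takes the place of a Series so the example stays self-contained.

```python
from __future__ import annotations

from typing import Protocol


class _HasObj(Protocol):
    """Structural type describing what the mixin expects from its host class."""

    _obj: list[str]


class AiTvfMixin:
    """Handwritten methods, kept in a file the generator never overwrites."""

    def generate_text(self: _HasObj) -> list[str]:
        # Placeholder body: a real implementation would call BigQuery here.
        return [f"response to: {prompt}" for prompt in self._obj]


class GeneratedAiSeriesAccessor:
    """Stand-in for generated code; regenerating it leaves the mixin untouched."""

    def __init__(self, obj: list[str]) -> None:
        self._obj = obj


class AiSeriesAccessor(AiTvfMixin, GeneratedAiSeriesAccessor):
    """Combines the generated members with the handwritten ones."""


accessor = AiSeriesAccessor(["hello"])
print(accessor.generate_text())
```

Annotating `self` with a `Protocol` is one way to let a type checker verify that the mixin only relies on attributes its host class provides.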
    )
    return self._to_dataframe(result)

    def generate_table(
Same here. This acts on a single "prompt" column and would be better suited as a series accessor method.
There have been some large updates to the series bq accessor setup:
FYI @tswast
tswast left a comment
So cool! Have you tried out something like:

    import bigframes  # registers the accessor
    import pandas as pd
    df = pd.DataFrame(...)
    output = df['prompt'].bigquery.ai.some_ai_function()

yet? I think such a workflow, where output comes back as a pandas DataFrame, could be really compelling.
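For readers unfamiliar with how such a chained accessor is wired up, here is a rough sketch using pandas' public `register_series_accessor` extension point. The accessor name `demo_bq`, the classes, and the upper-casing placeholder are all made up for illustration; this is not how bigframes implements its accessor.

```python
import pandas as pd


class _AiNamespace:
    """Hypothetical nested namespace reached via `series.demo_bq.ai`."""

    def __init__(self, series: pd.Series) -> None:
        self._series = series

    def some_ai_function(self) -> pd.DataFrame:
        # Placeholder: upper-case the prompts instead of calling a model.
        return pd.DataFrame(
            {"prompt": self._series, "result": self._series.str.upper()}
        )


@pd.api.extensions.register_series_accessor("demo_bq")
class _DemoAccessor:
    """Registered once at import time; pandas instantiates it per Series."""

    def __init__(self, series: pd.Series) -> None:
        self._series = series

    @property
    def ai(self) -> _AiNamespace:
        return _AiNamespace(self._series)


df = pd.DataFrame({"prompt": ["hello", "world"]})
output = df["prompt"].demo_bq.ai.some_ai_function()
print(type(output).__name__)
```

The result is an ordinary pandas DataFrame, which is the property the comment above highlights.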
    return self._obj

    def _to_series(self, bf_series: bigframes.series.Series) -> S:
    def _to_dataframe(self, bf_df: dataframe.DataFrame) -> T:
A bit strange seeing _to_dataframe at this level, but I suppose it's reasonable to expect that we'll have other TVFs at some point, such as the ML functions.
    if not data or not isinstance(data, dict) or "scalar_functions" not in data:
        # If the file is empty or has no functions, just create the namespace.
        return [{"namespace": namespace}]
Might be good to leave a comment that this is for things like the AI accessor where everything is currently handwritten.
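To make the fallback concrete, the guard can be sketched as a standalone function. The function name `describe_namespace` and the shape of the non-empty result are assumptions for illustration; only the guard condition and the bare-namespace return value come from the diff.

```python
from __future__ import annotations

from typing import Any


def describe_namespace(namespace: str, data: Any) -> list[dict[str, Any]]:
    """Hypothetical helper mirroring the guard shown in the diff."""
    if not data or not isinstance(data, dict) or "scalar_functions" not in data:
        # Namespaces whose methods are all handwritten (such as the AI
        # accessor) list no functions, so only the namespace is emitted.
        return [{"namespace": namespace}]
    # Assumed shape: one entry per name listed under "scalar_functions".
    return [
        {"namespace": namespace, "function": name}
        for name in data["scalar_functions"]
    ]


print(describe_namespace("ai", None))
```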
    {% for ns in namespaces %}
    class {{ ns.class_name }}(AbstractBigQuerySeriesAccessor[S]):
    {% if ns.class_name == "AiSeriesAccessor" %}
Just to put it in writing: we also spoke about making this more general by putting mixin options into the YAML format, but given that AI is the only case right now, I think this change is the simplest option.
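For the record, the more general option might look roughly like the sketch below, where a namespace entry carries an optional list of mixins instead of the template special-casing `AiSeriesAccessor`. The `mixins` key, the `render_class_header` function, and the `AiTvfMixin` and `GeoSeriesAccessor` names are hypothetical; a plain dict stands in for the parsed YAML.

```python
from __future__ import annotations

from typing import Any


def render_class_header(ns: dict[str, Any]) -> str:
    """Build the class statement for one namespace entry.

    `ns` stands in for a parsed YAML entry; the "mixins" key is an assumed
    extension of that format, not something the PR defines.
    """
    # Mixins go first so their methods take precedence in the MRO.
    bases = list(ns.get("mixins", [])) + ["AbstractBigQuerySeriesAccessor[S]"]
    return f"class {ns['class_name']}({', '.join(bases)}):"


print(render_class_header({"class_name": "AiSeriesAccessor", "mixins": ["AiTvfMixin"]}))
```

With only one handwritten namespace, the hard-coded template check stays shorter than this indirection, which matches the reasoning in the comment.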
Yes I did! I tried both the scalar ones and the TVF ones. They all worked flawlessly.
Also updated the tests to fully utilize the mocking framework. internal issue: b/517233441
🤖 I have created a release *beep* *boop*

---

## [2.44.0](bigframes-v2.43.0...bigframes-v2.44.0) (2026-06-25)

### Features

* add date functions to `bigframes.bigquery` module ([#17514](#17514)) ([e5d2e35](e5d2e35))
* **bigframes:** add AI TVFs to the pandas bq accessor ([#17402](#17402)) ([ee74e31](ee74e31))
* Experimental transpilation of unannotated python callables ([#17419](#17419)) ([ea9aad9](ea9aad9))
* support gemini-3.x models in loader and update default model to gemini-3.5-flash ([#17557](#17557)) ([3619b29](3619b29))
* support interactive execution of deferred DataFrames in TableWidget ([#17486](#17486)) ([421eebd](421eebd))

### Bug Fixes

* avoid invalid CAST(NULL AS NULL) in SQLGlot compiler ([#17487](#17487)) ([3b79caa](3b79caa))
* **bigframes:** world-readable temp zip in create_cloud_function ([#17522](#17522)) ([e726878](e726878))
* bump @angular/common, @angular/forms, @angular/platform-browser and @angular/router in /packages/bigframes/bigframes/display/table_widget_angular ([#17525](#17525)) ([2f893b1](2f893b1))
* bump langsmith from 0.8.0 to 0.8.18 in /packages/bigframes ([#17518](#17518)) ([f23063f](f23063f))
* bump msgpack from 1.1.1 to 1.2.1 in /packages/bigframes ([#17520](#17520)) ([36b5b7e](36b5b7e))
* bump undici and @angular/build in /packages/bigframes/bigframes/display/table_widget_angular ([#17519](#17519)) ([6fc45e3](6fc45e3))
* handle empty endpoints during cloud function reuse ([#17501](#17501)) ([4f5593a](4f5593a))

### Documentation

* ensure that PlotAccessor is included in the API reference ([#17513](#17513)) ([6febabf](6febabf))

---

This PR was generated with [Release Please](https://github.com/googleapis/release-please). See [documentation](https://github.com/googleapis/release-please#release-please).

Co-authored-by: release-please[bot] <55107282+release-please[bot]@users.noreply.github.com>