feat: support array output in remote_function
#1057
This is a feature request to support use cases like creating custom feature vectors, embeddings, etc.
…tr array outputs
# if the output is an array, reconstruct it from the json serialized
# string form
if bigframes.dtypes.is_array_like(func.output_dtype):
Do we actually handle any array-like dtype?
Um, in this PR we are looking to support types like list[int] on the output side? Or I didn't get you?
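The code comment under review says an array output is reconstructed from its JSON-serialized string form. A minimal local sketch of that round trip, using only the standard library, might look like the following. The helper names `serialize_array` and `reconstruct_array` are hypothetical and are not part of bigframes; the sketch assumes only int and str element types, as the commit log mentions.

```python
import json
from typing import Callable, Optional


def serialize_array(values: list) -> str:
    """Encode a list result as a JSON string (hypothetical helper)."""
    return json.dumps(values)


def reconstruct_array(
    serialized: Optional[str], element_type: Callable
) -> Optional[list]:
    """Rebuild a typed list from its JSON string form; NULLs pass through."""
    if serialized is None:
        return None
    # Extract the elements as strings first, then cast each one to the
    # declared element type (int or str in this sketch).
    as_strings = [str(item) for item in json.loads(serialized)]
    return [element_type(item) for item in as_strings]


wire_value = serialize_array([1, 2, 3])
restored = reconstruct_array(wire_value, int)
```

The two-step "extract as strings, then cast" shape is an assumption chosen to mirror the idea of a string-array extraction followed by a type conversion; the real implementation may differ.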
@@ -1513,6 +1513,18 @@ def apply(
    ops.RemoteFunctionOp(func=func, apply_on_null=True)
)

# if the output is an array, reconstruct it from the json serialized
# string form
if bigframes.dtypes.is_array_like(func.output_dtype):
Seems the code within this block assumes not just array_like, but specifically that it is a pyarrow list_ type.
That's exactly what the array_like implementation checks?
python-bigquery-dataframes/bigframes/dtypes.py
Lines 301 to 304 in 5a2731b
def is_array_like(type_: ExpressionType) -> bool:
    return isinstance(type_, pd.ArrowDtype) and isinstance(
        type_.pyarrow_dtype, pa.ListType
    )
eh, probably fine then, I don't really see array_like definition expanding anytime soon
bigframes/functions/_utils.py
Outdated
    return None

try:
    python_output_type = eval(output_type)
eval always makes me a bit uncomfortable - can we do this in a more constrained way?
removed eval in the latest patch, PTAL
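One constrained alternative to `eval` is an explicit allowlist that maps each serialized type string to its Python type, so arbitrary expressions are never executed. The sketch below illustrates that idea only; it is not the code from the patch, and `_SUPPORTED_OUTPUT_TYPES` and `parse_output_type` are hypothetical names.

```python
from typing import Any

# Hypothetical allowlist: only these serialized forms are accepted.
_SUPPORTED_OUTPUT_TYPES: dict[str, Any] = {
    "list[int]": list[int],
    "list[str]": list[str],
}


def parse_output_type(serialized: str) -> Any:
    """Look up a serialized output type without evaluating it as code."""
    try:
        return _SUPPORTED_OUTPUT_TYPES[serialized.strip()]
    except KeyError:
        raise ValueError(f"Unsupported output type: {serialized!r}") from None
```

With this shape, a string such as `"__import__('os')"` is rejected with a `ValueError` instead of being executed.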
bigframes/functions/_utils.py
Outdated
if typing.get_origin(python_output_type) is list:
    python_output_type_ser = repr(python_output_type)
else:
    python_output_type_ser = python_output_type.__name__
Should we bother with non-list types right now?
Throwing an error for non-array and unsupported array types in the latest patch, PTAL.
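The behavior described in this reply (serialize only list types, and raise for anything else) could be sketched roughly as follows. This is an illustration under assumptions, not the patch itself: `serialize_output_type` and `_SUPPORTED_ELEMENT_TYPES` are hypothetical names, and the supported element types are assumed to be int and str.

```python
import typing

# Assumption: only int and str elements are supported in this sketch.
_SUPPORTED_ELEMENT_TYPES = (int, str)


def serialize_output_type(python_output_type: typing.Any) -> str:
    """Serialize a list[...] annotation; raise for anything unsupported."""
    if typing.get_origin(python_output_type) is not list:
        raise TypeError(f"Not an array type: {python_output_type!r}")
    args = typing.get_args(python_output_type)
    if len(args) != 1 or args[0] not in _SUPPORTED_ELEMENT_TYPES:
        raise TypeError(f"Unsupported array type: {python_output_type!r}")
    # Build the string explicitly instead of relying on repr().
    return f"list[{args[0].__name__}]"
```

Building the string from the element type's `__name__` keeps the serialized form predictable regardless of whether the caller wrote `list[int]` or `typing.List[int]`.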
* feat: support array output in `remote_function` This is feature request to support use cases like creating custom feature vectors, embeddings etc.
* add multiindex test
* move array type conversion to bigquery module, test multiindex
* add `bigframes.bigquery.json_extract_string_array`, support int and str array outputs
* increase cleanup rate
* update input and output types doc
* support array output in DataFrame.apply
* support read_gbq_function on a remote function created for array output
* fix the json_set after variable renaming
* add tests for output_type in read_gbq_function
* temporarily exclude system 3.9 tests and include 3.10 and 3.11
* Revert "temporarily exclude system 3.9 tests and include 3.10 and 3.11" This reverts commit 2485aa3.
* add more info in the unexpected exception
* more debug info
* use unique routine name across tests
* Revert "more debug info" This reverts commit 86fe316.
* Revert "add more info in the unexpected exception" This reverts commit fe010cb.
* support array output in binary remote function operations
* support array output in nary remote function operations
* preserve array output type in function description to avoid explit output_type in read_gbq_function
* fix one failing read_gbq_function test
* make test parameterization order deterministic
* fix sorting of types for mypy
* remove test parameterization with sorting inside
* include partial ordering mode testing for read_gbq_function
* add remote function array out test in partial ordering mode
* avoid repr-eval for output type serialization/deserialization
* remove unsupported scenarios system tests, use common exception for unsupported
remote_function: screen/5rMtCZVaUYKdqxP
Series.apply: screen/9HkKMuWxMvbbPgf
DataFrame.apply: screen/BoXH9A7d4hGpETu
Fixes internal issue 298876217 🦕