Skip to content

Navigation Menu

Sign in
Appearance settings

Search code, repositories, users, issues, pull requests...

Provide feedback

We read every piece of feedback, and take your input very seriously.

Saved searches

Use saved searches to filter your results more quickly

Appearance settings

chore: add snippet tests for type system doc #1783

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 22 commits into
base: main
Choose a base branch
Loading
from

Conversation

sycai
Copy link
Contributor

@sycai sycai commented May 30, 2025

unblock b/407575739

@product-auto-label product-auto-label bot added size: l Pull request size is large. api: bigquery Issues related to the googleapis/python-bigquery-dataframes API. samples Issues that are directly related to samples. labels May 30, 2025
@sycai sycai marked this pull request as ready for review May 30, 2025 20:46
@sycai sycai requested review from a team as code owners May 30, 2025 20:46
@sycai sycai requested review from parthea and jialuoo May 30, 2025 20:46
Copy link

snippet-bot bot commented May 30, 2025

Here is the summary of changes.

You are about to add 15 region tags.
You are about to delete 1 region tag.

This comment is generated by snippet-bot.
If you find problems with this result, please file an issue at:
https://github.com/googleapis/repo-automation-bots/issues.
To update this comment, add snippet-bot:force-run label or use the checkbox below:

  • Refresh this comment

@sycai sycai requested review from tswast, TrevorBergeron and chelsea-lin and removed request for parthea and jialuoo May 30, 2025 20:46
Comment on lines 23 to 26
print(s.dtype)
# datetime64[ns]
print(bpd.read_pandas(s).dtype)
# timestamp[us][pyarrow]
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Might be good to confirm that these comments are accurate with some assertions. Otherwise, we won't know when our docs are incorrect and we're possibly making breaking changes.

Suggested change
print(s.dtype)
# datetime64[ns]
print(bpd.read_pandas(s).dtype)
# timestamp[us][pyarrow]
pandas_dtype = s.dtype
assert str(pandas_dtype) == "datetime64[ns]"
bigframes_dtype = bpd.read_pandas(s).dtype
assert str(bigframes_dtype) == "timestamp[us][pyarrow]"

Same applies to the other samples where we want to demonstrate bigframes behavior.

In the cases where an assertion in the code sample itself doesn't make sense, we can put the assertions after the [END ...] comment and before the next sample starts.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure. Tests added.



def test_type_system_examples() -> None:
# [START bigquery_dataframes_type_sytem_local_type_conversion]
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Region tags need to be globally unique. Maybe it makes more sense to include "timestamp" somewhere in here? I imagine bigquery_dataframes_type_sytem_local_type_conversion could apply to lots of different types beyond the scope of this sample.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right. Changed it to "..._timestamp_local_type_conversion"

@sycai sycai requested a review from tswast June 2, 2025 21:04
@chelsea-lin chelsea-lin removed their request for review June 2, 2025 22:28
@sycai sycai added the kokoro:force-run Add this label to force Kokoro to re-run the tests. label Jun 3, 2025
@bigframes-bot bigframes-bot removed the kokoro:force-run Add this label to force Kokoro to re-run the tests. label Jun 3, 2025
@@ -39,3 +40,6 @@ def test_bigquery_dataframes_set_options() -> None:
# [END bigquery_dataframes_set_options]
assert bpd.options.bigquery.project == PROJECT_ID
assert bpd.options.bigquery.location == REGION

bpd.close_session()
bpd.options.bigquery.project = old_project
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

use bpd.options.reset()

@sycai sycai requested a review from GarrettWu June 3, 2025 22:35
Comment on lines 43 to 44
bpd.close_session()
bpd.options.reset()
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's put anything that changes global options in a try/finally block so that this cleanup can run no matter what.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

SG!

@sycai sycai requested a review from tswast June 4, 2025 17:02
# 3 ["a",{"b":1},null]
# 4 {"a":{"b":[1,2,3],"c":true}}
# 5 <NA>
# dtype: extension<dbjson<JSONArrowType>>[pyarrow]
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we show this, let's include a comment with a disclaimer that this will be switching to pyarrow json_ type in future.

# TODO: switch to pyarrow.json_(pyarrow.string()) when available.
JSON_ARROW_TYPE = db_dtypes.JSONArrowType()

CC @chelsea-lin

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe we suggest folks use bigframes.dtypes.JSON_DTYPE instead?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agree. Let's use bigframes.dtypes.JSON_DTYPE. Further more, we may want to create a more official APIs for these dtypes, similar to: https://github.com/pandas-dev/pandas/blob/v2.3.0/pandas/core/arrays/integer.py#L228-L232

bpd.Series(
pd.arrays.ArrowExtensionArray(pa_array),
dtype=pd.ArrowDtype(
pa.list_(pa.struct([("key", db_dtypes.JSONArrowType())])),
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same here. I think we'll always allow this for backwards compatibility but we might convert this to the arrow built-in json_ type in future.

CC @chelsea-lin

Maybe we suggest folks use bigframes.dtypes.JSON_ARROW_TYPE instead?

'{"fruits": [{"name": "guava"}, {"name": "grapes"}]}',
]

json_s = bpd.Series(fruits, dtype=pd.ArrowDtype(db_dtypes.JSONArrowType()))
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same here re: JSON_DTYPE maybe changing in future.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
api: bigquery Issues related to the googleapis/python-bigquery-dataframes API. samples Issues that are directly related to samples. size: l Pull request size is large.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants
Morty Proxy This is a proxified and sanitized view of the page, visit original site.