-
Notifications
You must be signed in to change notification settings - Fork 49
chore: add snippet tests for type system doc #1783
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
Here is the summary of changes. You are about to add 15 region tags.
You are about to delete 1 region tag.
This comment is generated by snippet-bot.
|
samples/snippets/type_system_test.py
Outdated
print(s.dtype) | ||
# datetime64[ns] | ||
print(bpd.read_pandas(s).dtype) | ||
# timestamp[us][pyarrow] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Might be good to confirm that these comments are accurate with some assertions. Otherwise, we won't know when our docs are incorrect and we're possibly making breaking changes.
print(s.dtype) | |
# datetime64[ns] | |
print(bpd.read_pandas(s).dtype) | |
# timestamp[us][pyarrow] | |
pandas_dtype = s.dtype | |
assert str(pandas_dtype) == "datetime64[ns]" | |
bigframes_dtype = bpd.read_pandas(s).dtype | |
assert str(bigframes_dtype) == "timestamp[us][pyarrow]" |
Same applies to the other samples where we want to demonstrate bigframes behavior.
In the cases where an assertion in the code sample itself doesn't make sense, we can put the assertions after the [END ...]
comment and before the next sample starts.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure. Tests added.
samples/snippets/type_system_test.py
Outdated
|
||
|
||
def test_type_system_examples() -> None: | ||
# [START bigquery_dataframes_type_sytem_local_type_conversion] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Region tags need to be globally unique. Maybe it makes more sense to include "timestamp" somewhere in here? I imagine bigquery_dataframes_type_sytem_local_type_conversion
could apply to lots of different types beyond the scope of this sample.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right. Changed it to "..._timestamp_local_type_conversion"
samples/snippets/set_options_test.py
Outdated
@@ -39,3 +40,6 @@ def test_bigquery_dataframes_set_options() -> None: | ||
# [END bigquery_dataframes_set_options] | ||
assert bpd.options.bigquery.project == PROJECT_ID | ||
assert bpd.options.bigquery.location == REGION | ||
|
||
bpd.close_session() | ||
bpd.options.bigquery.project = old_project |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
use bpd.options.reset()
samples/snippets/set_options_test.py
Outdated
bpd.close_session() | ||
bpd.options.reset() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's put anything that changes global options in a try/finally block so that this cleanup can run no matter what.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
SG!
# 3 ["a",{"b":1},null] | ||
# 4 {"a":{"b":[1,2,3],"c":true}} | ||
# 5 <NA> | ||
# dtype: extension<dbjson<JSONArrowType>>[pyarrow] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we show this, let's include a comment with a disclaimer that this will be switching to pyarrow json_ type in future.
python-bigquery-dataframes/bigframes/dtypes.py
Lines 65 to 66 in 1d45646
# TODO: switch to pyarrow.json_(pyarrow.string()) when available. | |
JSON_ARROW_TYPE = db_dtypes.JSONArrowType() |
CC @chelsea-lin
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe we suggest folks use bigframes.dtypes.JSON_DTYPE
instead?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agree. Let's use bigframes.dtypes.JSON_DTYPE
. Further more, we may want to create a more official APIs for these dtypes, similar to: https://github.com/pandas-dev/pandas/blob/v2.3.0/pandas/core/arrays/integer.py#L228-L232
bpd.Series( | ||
pd.arrays.ArrowExtensionArray(pa_array), | ||
dtype=pd.ArrowDtype( | ||
pa.list_(pa.struct([("key", db_dtypes.JSONArrowType())])), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same here. I think we'll always allow this for backwards compatibility but we might convert this to the arrow built-in json_ type in future.
CC @chelsea-lin
Maybe we suggest folks use bigframes.dtypes.JSON_ARROW_TYPE
instead?
'{"fruits": [{"name": "guava"}, {"name": "grapes"}]}', | ||
] | ||
|
||
json_s = bpd.Series(fruits, dtype=pd.ArrowDtype(db_dtypes.JSONArrowType())) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same here re: JSON_DTYPE maybe changing in future.
unblock b/407575739