feat: Support bigframes.pandas.to_datetime for scalars, iterables and series. #372
Merged
34 commits
6eefb40 feat: Support pd.to_datetime for scalars, iterables and series. (Genesis929)
033e338 update test and docstring (Genesis929)
e4feb09 update types (Genesis929)
35f14f5 format update (Genesis929)
22ede7d remove import. (Genesis929)
af274cb update docstring (Genesis929)
fe955db update arg conversion (Genesis929)
8c1f633 update examples (Genesis929)
637ca21 update format (Genesis929)
23fbf15 update code examples, and working logic. (Genesis929)
c6d254d docstring update. (Genesis929)
0692c79 type update. (Genesis929)
f436149 format update. (Genesis929)
87d1749 Update docstring format (Genesis929)
b180fe3 remove import (Genesis929)
3f0f7db remove empty line (Genesis929)
dc6cfcd Remove extra code (Genesis929)
68ec37e remove prints. (Genesis929)
8b8d61a Code logic updates. (Genesis929)
5e5842b Add constants. (Genesis929)
d4a71b0 Update comments (Genesis929)
e0d1f8c Move datetime helpers to the end of file. (Genesis929)
d0db699 Update helper (Genesis929)
958ca00 update format (Genesis929)
6ef47fb String process logic updated. (Genesis929)
a08ea2e update import (Genesis929)
6732fd9 remove print (Genesis929)
097ca77 Merge branch 'main' into huanc-to_datetime (Genesis929)
7c54aaa update docstring (Genesis929)
1b68883 update docstring (Genesis929)
7057758 update docstring (Genesis929)
22abed0 update note (Genesis929)
a4e981b update docstring (Genesis929)
24347a2 Update code examples (Genesis929)
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,19 @@ | ||
# Copyright 2024 Google LLC | ||
# | ||
# Licensed under the Apache License, Version 2.0 (the "License"); | ||
# you may not use this file except in compliance with the License. | ||
# You may obtain a copy of the License at | ||
# | ||
# http://www.apache.org/licenses/LICENSE-2.0 | ||
# | ||
# Unless required by applicable law or agreed to in writing, software | ||
# distributed under the License is distributed on an "AS IS" BASIS, | ||
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. | ||
# See the License for the specific language governing permissions and | ||
# limitations under the License. | ||
|
||
from bigframes.core.tools.datetimes import to_datetime | ||
|
||
__all__ = [ | ||
"to_datetime", | ||
] |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,82 @@ | ||
# Copyright 2024 Google LLC | ||
# | ||
# Licensed under the Apache License, Version 2.0 (the "License"); | ||
# you may not use this file except in compliance with the License. | ||
# You may obtain a copy of the License at | ||
# | ||
# http://www.apache.org/licenses/LICENSE-2.0 | ||
# | ||
# Unless required by applicable law or agreed to in writing, software | ||
# distributed under the License is distributed on an "AS IS" BASIS, | ||
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. | ||
# See the License for the specific language governing permissions and | ||
# limitations under the License. | ||
|
||
from collections.abc import Mapping | ||
from datetime import datetime | ||
from typing import Optional, Union | ||
|
||
import pandas as pd | ||
|
||
import bigframes.constants as constants | ||
import bigframes.core.global_session as global_session | ||
import bigframes.dataframe | ||
import bigframes.operations as ops | ||
import bigframes.series | ||
import third_party.bigframes_vendored.pandas.core.tools.datetimes as vendored_pandas_datetimes | ||
|
||
|
||
def to_datetime( | ||
arg: Union[ | ||
vendored_pandas_datetimes.local_scalars, | ||
vendored_pandas_datetimes.local_iterables, | ||
bigframes.series.Series, | ||
bigframes.dataframe.DataFrame, | ||
], | ||
*, | ||
utc: bool = False, | ||
format: Optional[str] = None, | ||
unit: Optional[str] = None, | ||
) -> Union[pd.Timestamp, datetime, bigframes.series.Series]: | ||
if isinstance(arg, (int, float, str, datetime)): | ||
return pd.to_datetime( | ||
arg, | ||
utc=utc, | ||
format=format, | ||
unit=unit, | ||
) | ||
|
||
if isinstance(arg, (Mapping, pd.DataFrame, bigframes.dataframe.DataFrame)): | ||
raise NotImplementedError( | ||
"Conversion of Mapping, pandas.DataFrame, or bigframes.dataframe.DataFrame " | ||
f"to datetime is not implemented. {constants.FEEDBACK_LINK}" | ||
) | ||
|
||
if not isinstance(arg, bigframes.series.Series): | ||
# This block ensures compatibility with local data formats, including | ||
# iterables and pandas.Series | ||
# TODO: Currently, data upload is performed using pandas DataFrames | ||
# combined with the `read_pandas` method due to the BigFrames DataFrame | ||
# constructor's limitations in handling various data types. Plan to update | ||
# the upload process to utilize the BigFrames DataFrame constructor directly | ||
# once it is enhanced for more related datatypes. | ||
arg = global_session.with_default_session( | ||
bigframes.session.Session.read_pandas, pd.DataFrame(arg) | ||
) | ||
if len(arg.columns) != 1: | ||
raise ValueError("Input must be 1-dimensional.") | ||
|
||
arg = arg[arg.columns[0]] | ||
|
||
if not utc and arg.dtype not in ("Int64", "Float64"): # type: ignore | ||
raise NotImplementedError( | ||
f"String and Timestamp requires utc=True. {constants.FEEDBACK_LINK}" | ||
) | ||
|
||
return arg._apply_unary_op( # type: ignore | ||
ops.ToDatetimeOp( | ||
utc=utc, | ||
format=format, | ||
unit=unit, | ||
) | ||
) |
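The order of the `isinstance` checks in the function above matters: a `str` is itself iterable, so scalars have to be tested before anything is treated as an iterable and uploaded. A minimal standard-library sketch of that ordering follows. The helper name `route_to_datetime_arg` and its return labels are hypothetical, and the pandas and bigframes `Series`/`DataFrame` checks are left out so the sketch runs without those packages.

```python
from collections.abc import Mapping
from datetime import datetime


def route_to_datetime_arg(arg):
    """Hypothetical helper: name the branch an input would take."""
    # Scalars come first: a str is iterable, so testing it later would
    # send it down the "upload" path by mistake.
    if isinstance(arg, (int, float, str, datetime)):
        return "local_scalar"
    # Mappings are rejected outright, as in the diff above.
    if isinstance(arg, Mapping):
        raise NotImplementedError("Mapping input is not supported.")
    # Everything else (list, tuple, ...) is treated as local iterable data.
    return "upload_then_convert"


print(route_to_datetime_arg("01-31-2021 14:30"))
print(route_to_datetime_arg(["01-31-2021 14:30", "02-28-2021 15:45"]))
```

In the real function a `bigframes.series.Series` skips the upload step entirely; the sketch does not model that case.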
Empty file.
77 changes: 77 additions & 0 deletions
third_party/bigframes_vendored/pandas/core/tools/datetimes.py
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,77 @@ | ||
# Contains code from https://github.com/pandas-dev/pandas/blob/main/pandas/core/tools/datetimes.py | ||
|
||
from datetime import datetime | ||
from typing import Iterable, Mapping, Union | ||
|
||
import pandas as pd | ||
|
||
from bigframes import constants, series | ||
|
||
local_scalars = Union[int, float, str, datetime] | ||
local_iterables = Union[Iterable, pd.Series, pd.DataFrame, Mapping] | ||
|
||
|
||
def to_datetime( | ||
arg, | ||
*, | ||
utc=False, | ||
format=None, | ||
unit=None, | ||
) -> Union[pd.Timestamp, datetime, series.Series]: | ||
""" | ||
This function converts a scalar, array-like or Series to a datetime object. | ||
|
||
.. note:: | ||
BigQuery only supports precision up to microseconds (us). Therefore, when working | ||
with timestamps that have a finer granularity than microseconds, be aware that | ||
the additional precision will not be represented in BigQuery. | ||
|
||
.. note:: | ||
The format strings for specifying datetime representations in BigQuery and pandas | ||
are not completely identical. Ensure that the format string provided is compatible | ||
with BigQuery. | ||
|
||
**Examples:** | ||
|
||
>>> import bigframes.pandas as bpd | ||
>>> bpd.options.display.progress_bar = None | ||
|
||
Converting a Scalar to datetime: | ||
|
||
>>> scalar = 123456.789 | ||
>>> bpd.to_datetime(scalar, unit = 's') | ||
Timestamp('1970-01-02 10:17:36.789000') | ||
|
||
Converting a List of Strings without Timezone Information: | ||
|
||
>>> list_str = ["01-31-2021 14:30", "02-28-2021 15:45"] | ||
>>> bpd.to_datetime(list_str, format="%m-%d-%Y %H:%M", utc=True) | ||
0 2021-01-31 14:30:00+00:00 | ||
1 2021-02-28 15:45:00+00:00 | ||
Name: 0, dtype: timestamp[us, tz=UTC][pyarrow] | ||
|
||
Converting a Series of Strings with Timezone Information: | ||
|
||
>>> series_str = bpd.Series(["01-31-2021 14:30+08:00", "02-28-2021 15:45+00:00"]) | ||
>>> bpd.to_datetime(series_str, format="%m-%d-%Y %H:%M%Z", utc=True) | ||
0 2021-01-31 06:30:00+00:00 | ||
1 2021-02-28 15:45:00+00:00 | ||
dtype: timestamp[us, tz=UTC][pyarrow] | ||
|
||
Args: | ||
arg (int, float, str, datetime, list, tuple, 1-d array, Series): | ||
The object to convert to a datetime. | ||
utc (bool, default False): | ||
Control timezone-related parsing, localization and conversion. If True, the | ||
function always returns a timezone-aware UTC-localized timestamp or series. | ||
If False (default), inputs will not be coerced to UTC. | ||
format (str, default None): | ||
The strftime to parse time, e.g. "%d/%m/%Y". | ||
unit (str, default 'ns'): | ||
The unit of the arg (D,s,ms,us,ns) denote the unit, which is an integer or | ||
float number. | ||
|
||
Returns: | ||
Timestamp, datetime.datetime or bigframes.series.Series: Return type depends on input. | ||
""" | ||
raise NotImplementedError(constants.ABSTRACT_METHOD_ERROR_MESSAGE) |
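The `unit` argument and the microsecond note in the docstring can be illustrated with the standard library alone. The sketch below is not the bigframes or BigQuery implementation: `epoch_to_datetime` is a hypothetical helper, and dropping the sub-microsecond digits by truncation is an assumption made for illustration.

```python
from datetime import datetime, timedelta

# Microseconds per unit for the units listed in the docstring.
# "ns" is handled separately because it is finer than a microsecond.
_US_PER_UNIT = {"D": 86_400_000_000, "s": 1_000_000, "ms": 1_000, "us": 1}
_EPOCH = datetime(1970, 1, 1)


def epoch_to_datetime(value, unit="s"):
    """Hypothetical helper: read `value` as an offset from the Unix epoch."""
    if unit == "ns":
        # Assumption: digits below one microsecond are simply dropped.
        micros = value // 1_000
    else:
        micros = value * _US_PER_UNIT[unit]
    return _EPOCH + timedelta(microseconds=micros)


print(epoch_to_datetime(123456.789, unit="s"))
print(epoch_to_datetime(1_234_567_891, unit="ns"))
```

The first call mirrors the scalar example in the docstring; the second shows a nanosecond input losing its last three digits once it is stored at microsecond precision.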
Fascinating. So utc=False will use DATETIME type in BigQuery?
Yes, for utc=False, it will be later cast to DATETIME type.
This is the example SQL:
SELECT
CAST(t0.0 AS DATETIME) AS Cast_0_ timestamp
FROM ...
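The DATETIME versus TIMESTAMP distinction discussed in this thread has a rough analogue in Python's naive and timezone-aware `datetime` objects: a BigQuery DATETIME carries no time zone, while a TIMESTAMP identifies an absolute point in time. The sketch below uses only the standard library and the strings from the docstring examples; it is an analogy, not a description of the SQL that bigframes generates.

```python
from datetime import datetime, timezone

FMT = "%m-%d-%Y %H:%M"

# No offset in the string: the result is naive (no tzinfo), comparable to
# a zone-less DATETIME value.
naive = datetime.strptime("01-31-2021 14:30", FMT)

# Offset present: parse it with %z, then normalize to UTC, comparable to
# what utc=True yields in the docstring example.
aware = datetime.strptime("01-31-2021 14:30+08:00", FMT + "%z")
as_utc = aware.astimezone(timezone.utc)

print(naive.tzinfo)
print(as_utc.isoformat())
```

Note that Python's `strptime` uses `%z` for a numeric offset, whereas the docstring example passes `%Z`; this is one instance of the format-string differences the docstring warns about.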