Commit 7362f3e

Merge branch 'main' into bug-hash-dt64
2 parents: 95069e0 + 7d545f0

12 files changed (+201, -109 lines)

doc/source/development/contributing.rst (0 additions, 23 deletions)

@@ -331,29 +331,6 @@ To automatically fix formatting errors on each commit you make, you can
 set up pre-commit yourself. First, create a Python :ref:`environment
 <contributing_environment>` and then set up :ref:`pre-commit <contributing.pre-commit>`.

-Delete your merged branch (optional)
--------------------------------------
-
-Once your feature branch is accepted into upstream, you'll probably want to get rid of
-the branch. First, merge upstream main into your branch so git knows it is safe to
-delete your branch::
-
-git fetch upstream
-git checkout main
-git merge upstream/main
-
-Then you can do::
-
-git branch -d shiny-new-feature
-
-Make sure you use a lower-case ``-d``, or else git won't warn you if your feature
-branch has not actually been merged.
-
-The branch will still exist on GitHub, so to delete it there do::
-
-git push origin --delete shiny-new-feature
-
-
 Tips for a successful pull request
 ==================================

doc/source/user_guide/io.rst (12 additions, 3 deletions)

@@ -1001,14 +1001,23 @@ way to parse dates is to explicitly set ``format=``.
 )
 df

-In the case that you have mixed datetime formats within the same column, you'll need to
-first read it in as an object dtype and then apply :func:`to_datetime` to each element.
+In the case that you have mixed datetime formats within the same column, you can
+pass ``format='mixed'``

 .. ipython:: python

 data = io.StringIO("date\n12 Jan 2000\n2000-01-13\n")
 df = pd.read_csv(data)
-df['date'] = df['date'].apply(pd.to_datetime)
+df['date'] = pd.to_datetime(df['date'], format='mixed')
+df
+
+or, if your datetime formats are all ISO8601 (possibly not identically-formatted):
+
+.. ipython:: python
+
+data = io.StringIO("date\n2020-01-01\n2020-01-01 03:00\n")
+df = pd.read_csv(data)
+df['date'] = pd.to_datetime(df['date'], format='ISO8601')
 df

 .. ipython:: python
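For readers without pandas at hand, here is a rough standard-library analogy of what ``format='ISO8601'`` asks for: every value must be ISO 8601, but the exact layout may differ from row to row. It is a sketch, not pandas' implementation (pandas routes these strings through `string_to_dts`, as the strptime.pyx changes in this commit show). The helper name `parse_iso8601_column` is hypothetical, and `datetime.fromisoformat` only accepts a subset of ISO 8601 before Python 3.11.

```python
import csv
import io
from datetime import datetime


def parse_iso8601_column(values):
    """Parse ISO 8601 strings whose layouts may differ between rows.

    Hypothetical helper: a stdlib stand-in for the idea behind
    pd.to_datetime(..., format='ISO8601'), not pandas' own code.
    """
    return [datetime.fromisoformat(value) for value in values]


# Same CSV payload as the io.rst example: a date-only row and a date-time row.
data = io.StringIO("date\n2020-01-01\n2020-01-01 03:00\n")
rows = list(csv.DictReader(data))
dates = parse_iso8601_column(row["date"] for row in rows)
print(dates)
```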

doc/source/whatsnew/v2.0.0.rst (10 additions, 3 deletions)

@@ -311,6 +311,8 @@ Other enhancements
 - Added :meth:`DatetimeIndex.as_unit` and :meth:`TimedeltaIndex.as_unit` to convert to different resolutions; supported resolutions are "s", "ms", "us", and "ns" (:issue:`50616`)
 - Added :meth:`Series.dt.unit` and :meth:`Series.dt.as_unit` to convert to different resolutions; supported resolutions are "s", "ms", "us", and "ns" (:issue:`51223`)
 - Added new argument ``dtype`` to :func:`read_sql` to be consistent with :func:`read_sql_query` (:issue:`50797`)
+- :func:`to_datetime` now accepts ``"ISO8601"`` as an argument to ``format``, which will match any ISO8601 string (but possibly not identically-formatted) (:issue:`50411`)
+- :func:`to_datetime` now accepts ``"mixed"`` as an argument to ``format``, which will infer the format for each element individually (:issue:`50972`)
 - Added new argument ``engine`` to :func:`read_json` to support parsing JSON with pyarrow by specifying ``engine="pyarrow"`` (:issue:`48893`)
 - Added support for SQLAlchemy 2.0 (:issue:`40686`)
 - :class:`Index` set operations :meth:`Index.union`, :meth:`Index.intersection`, :meth:`Index.difference`, and :meth:`Index.symmetric_difference` now support ``sort=True``, which will always return a sorted result, unlike the default ``sort=None`` which does not sort in some cases (:issue:`25151`)
@@ -738,11 +740,16 @@ In the past, :func:`to_datetime` guessed the format for each element independent

 Note that this affects :func:`read_csv` as well.

-If you still need to parse dates with inconsistent formats, you'll need to apply :func:`to_datetime`
-to each element individually, e.g. ::
+If you still need to parse dates with inconsistent formats, you can use
+``format='mixed`` (possibly alongside ``dayfirst``) ::

 ser = pd.Series(['13-01-2000', '12 January 2000'])
-ser.apply(pd.to_datetime)
+pd.to_datetime(ser, format='mixed', dayfirst=True)
+
+or, if your formats are all ISO8601 (but possibly not identically-formatted) ::
+
+ser = pd.Series(['2020-01-01', '2020-01-01 03:00'])
+pd.to_datetime(ser, format='ISO8601')

 .. _whatsnew_200.api_breaking.other:
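The whatsnew entry says ``format='mixed'`` infers a format for each element on its own, and the `to_datetime` docstring notes that ``dayfirst`` is a preference rather than a strict rule. A minimal stdlib sketch of that per-element idea follows. It simply tries a short, hypothetical list of `strptime` candidates (pandas' real inference is far more general and is not a fixed list), placing day-first patterns ahead of month-first ones when `dayfirst=True`; `parse_mixed` and the candidate tuples are illustrative names only.

```python
from datetime import datetime

# Hypothetical candidate formats; real per-element inference covers far more layouts.
DAY_FIRST = ("%d-%m-%Y", "%m-%d-%Y")
MONTH_FIRST = ("%m-%d-%Y", "%d-%m-%Y")
OTHER = ("%d %B %Y", "%d %b %Y", "%Y-%m-%d")


def parse_mixed(values, dayfirst=False):
    """Try candidate formats for each element independently (cf. format='mixed')."""
    candidates = (DAY_FIRST if dayfirst else MONTH_FIRST) + OTHER
    parsed = []
    for position, value in enumerate(values):
        for fmt in candidates:
            try:
                parsed.append(datetime.strptime(value, fmt))
                break  # first matching candidate wins
            except ValueError:
                continue
        else:
            raise ValueError(
                f"could not infer a format for {value!r}, at position {position}"
            )
    return parsed


print(parse_mixed(["13-01-2000", "12 January 2000"], dayfirst=True))
# dayfirst decides genuinely ambiguous values such as "01-02-2000"...
print(parse_mixed(["01-02-2000"], dayfirst=True))  # 1 February 2000
print(parse_mixed(["01-02-2000"]))  # 2 January 2000
# ...but is not strict: 13 cannot be a month, so this still parses day-first.
print(parse_mixed(["13-01-2000"]))
```

Because each element is matched independently, a column can silently mix day-first and month-first interpretations, which is why the docstring calls this mode risky.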

pandas/_libs/tslibs/strptime.pyx (44 additions, 26 deletions)

@@ -186,6 +186,7 @@ def array_strptime(
 bint iso_format = format_is_iso(fmt)
 NPY_DATETIMEUNIT out_bestunit
 int out_local = 0, out_tzoffset = 0
+bint string_to_dts_succeeded = 0

 assert is_raise or is_ignore or is_coerce

@@ -306,53 +307,62 @@ def array_strptime(
 else:
 val = str(val)

-if iso_format:
-string_to_dts_failed = string_to_dts(
+if fmt == "ISO8601":
+string_to_dts_succeeded = not string_to_dts(
+val, &dts, &out_bestunit, &out_local,
+&out_tzoffset, False, None, False
+)
+elif iso_format:
+string_to_dts_succeeded = not string_to_dts(
 val, &dts, &out_bestunit, &out_local,
 &out_tzoffset, False, fmt, exact
 )
-if not string_to_dts_failed:
-# No error reported by string_to_dts, pick back up
-# where we left off
-value = npy_datetimestruct_to_datetime(NPY_FR_ns, &dts)
-if out_local == 1:
-# Store the out_tzoffset in seconds
-# since we store the total_seconds of
-# dateutil.tz.tzoffset objects
-tz = timezone(timedelta(minutes=out_tzoffset))
-result_timezone[i] = tz
-out_local = 0
-out_tzoffset = 0
-iresult[i] = value
-check_dts_bounds(&dts)
-continue
+if string_to_dts_succeeded:
+# No error reported by string_to_dts, pick back up
+# where we left off
+value = npy_datetimestruct_to_datetime(NPY_FR_ns, &dts)
+if out_local == 1:
+# Store the out_tzoffset in seconds
+# since we store the total_seconds of
+# dateutil.tz.tzoffset objects
+tz = timezone(timedelta(minutes=out_tzoffset))
+result_timezone[i] = tz
+out_local = 0
+out_tzoffset = 0
+iresult[i] = value
+check_dts_bounds(&dts)
+continue

 if parse_today_now(val, &iresult[i], utc):
 continue

 # Some ISO formats can't be parsed by string_to_dts
-# For example, 6-digit YYYYMD. So, if there's an error,
-# try the string-matching code below.
+# For example, 6-digit YYYYMD. So, if there's an error, and a format
+# was specified, then try the string-matching code below. If the format
+# specified was 'ISO8601', then we need to error, because
+# only string_to_dts handles mixed ISO8601 formats.
+if not string_to_dts_succeeded and fmt == "ISO8601":
+raise ValueError(f"Time data {val} is not ISO8601 format")

 # exact matching
 if exact:
 found = format_regex.match(val)
 if not found:
-raise ValueError(f"time data \"{val}\" doesn't "
-f"match format \"{fmt}\"")
+raise ValueError(
+f"time data \"{val}\" doesn't match format \"{fmt}\""
+)
 if len(val) != found.end():
 raise ValueError(
-f"unconverted data remains: "
-f'"{val[found.end():]}"'
+"unconverted data remains when parsing with "
+f"format \"{fmt}\": \"{val[found.end():]}\""
 )

 # search
 else:
 found = format_regex.search(val)
 if not found:
 raise ValueError(
-f"time data \"{val}\" doesn't match "
-f"format \"{fmt}\""
+f"time data \"{val}\" doesn't match format \"{fmt}\""
 )

 iso_year = -1

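The hunk above sends an element to the fast ISO parser when `fmt` is `"ISO8601"` (or an ISO-like explicit format), and for `"ISO8601"` it removes the string-matching fallback: a failure raises immediately. A simplified Python sketch of that control flow is below. `datetime.fromisoformat` stands in for `string_to_dts` and `datetime.strptime` stands in for the regex-based matching; both substitutions are assumptions, and the sketch leaves out the ISO-like-`fmt` fast path and the `exact=False` search mode. `parse_one` is a hypothetical name.

```python
from datetime import datetime


def parse_one(val: str, fmt: str) -> datetime:
    """Hypothetical, simplified analogue of the per-element dispatch."""
    if fmt == "ISO8601":
        try:
            # Stand-in for string_to_dts(..., None, False): accept any ISO layout.
            return datetime.fromisoformat(val)
        except ValueError:
            # No fallback: only the ISO parser handles mixed ISO8601 layouts.
            raise ValueError(f"Time data {val} is not ISO8601 format") from None
    # Any other format: the whole string must match `fmt` (like exact=True).
    return datetime.strptime(val, fmt)


print(parse_one("2020-01-01", "ISO8601"))  # 2020-01-01 00:00:00
print(parse_one("2020-01-01 03:00", "ISO8601"))  # 2020-01-01 03:00:00
print(parse_one("31/05/2000", "%d/%m/%Y"))  # 2000-05-31 00:00:00
```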
@@ -504,7 +514,15 @@ def array_strptime(
 result_timezone[i] = tz

 except (ValueError, OutOfBoundsDatetime) as ex:
-ex.args = (f"{str(ex)}, at position {i}",)
+ex.args = (
+f"{str(ex)}, at position {i}. You might want to try:\n"
+" - passing `format` if your strings have a consistent format;\n"
+" - passing `format='ISO8601'` if your strings are "
+"all ISO8601 but not necessarily in exactly the same format;\n"
+" - passing `format='mixed'`, and the format will be "
+"inferred for each element individually. "
+"You might want to use `dayfirst` alongside this.",
+)
 if is_coerce:
 iresult[i] = NPY_NAT
 continue
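The last strptime.pyx hunk rewrites `ex.args` so that the parser's original message gains the failing position plus a list of suggestions. The same pattern in plain Python is sketched below. `parse_all` and `HINT` are hypothetical names, and `datetime.strptime` stands in for pandas' parser, so the leading part of the message is worded differently from pandas'.

```python
from datetime import datetime

# Suggestion text copied from the new pandas message.
HINT = (
    "You might want to try:\n"
    " - passing `format` if your strings have a consistent format;\n"
    " - passing `format='ISO8601'` if your strings are "
    "all ISO8601 but not necessarily in exactly the same format;\n"
    " - passing `format='mixed'`, and the format will be "
    "inferred for each element individually. "
    "You might want to use `dayfirst` alongside this."
)


def parse_all(values, fmt):
    parsed = []
    for i, val in enumerate(values):
        try:
            parsed.append(datetime.strptime(val, fmt))
        except ValueError as ex:
            # Rewrite the message in place, then re-raise the same exception object.
            ex.args = (f"{ex}, at position {i}. {HINT}",)
            raise
    return parsed


try:
    parse_all(["01/01/2000", "31/05/2000"], "%m/%d/%Y")
except ValueError as err:
    print(err)
```

Mutating `args` and re-raising, rather than raising a new exception, keeps the original exception type and traceback, so callers that catch a specific subclass keep working.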

pandas/core/arrays/interval.py (5 additions, 2 deletions)

@@ -675,7 +675,6 @@ def _shallow_copy(self: IntervalArrayT, left, right) -> IntervalArrayT:
 """
 dtype = IntervalDtype(left.dtype, closed=self.closed)
 left, right, dtype = self._ensure_simple_new_inputs(left, right, dtype=dtype)
-self._validate(left, right, dtype=dtype)

 return self._simple_new(left, right, dtype=dtype)

@@ -727,7 +726,11 @@ def __getitem__(
 if np.ndim(left) > 1:
 # GH#30588 multi-dimensional indexer disallowed
 raise ValueError("multi-dimensional indexing not allowed")
-return self._shallow_copy(left, right)
+# Argument 2 to "_simple_new" of "IntervalArray" has incompatible type
+# "Union[Period, Timestamp, Timedelta, NaTType, DatetimeArray, TimedeltaArray,
+# ndarray[Any, Any]]"; expected "Union[Union[DatetimeArray, TimedeltaArray],
+# ndarray[Any, Any]]"
+return self._simple_new(left, right, dtype=self.dtype) # type: ignore[arg-type]

 def __setitem__(self, key, value) -> None:
 value_left, value_right = self._validate_setitem_value(value)
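In interval.py, `_shallow_copy` no longer calls `_validate`, and `__getitem__` now goes straight to `_simple_new` with the existing dtype, presumably because a subset of an already-validated array cannot break its invariants. A toy illustration of that "validate once at construction, skip it on internal fast paths" pattern follows. The `Intervals` class is hypothetical and much simpler than `IntervalArray` (it handles slice keys only).

```python
class Intervals:
    """Hypothetical toy container of closed intervals [left, right]."""

    validations = 0  # counts how often the (potentially costly) check runs

    def __init__(self, left, right):
        left, right = list(left), list(right)
        self._validate(left, right)
        self.left, self.right = left, right

    @classmethod
    def _validate(cls, left, right):
        cls.validations += 1
        if len(left) != len(right):
            raise ValueError("left and right must have the same length")
        if any(lo > hi for lo, hi in zip(left, right)):
            raise ValueError("left side of interval must be <= right side")

    @classmethod
    def _simple_new(cls, left, right):
        # Fast constructor: trusts its inputs and skips _validate.
        obj = cls.__new__(cls)
        obj.left, obj.right = left, right
        return obj

    def __getitem__(self, key):
        # Slice keys only in this toy: a slice of valid data is still valid.
        return self._simple_new(self.left[key], self.right[key])


arr = Intervals([0, 1, 2], [1, 2, 3])
sub = arr[1:]
print(sub.left, sub.right, Intervals.validations)  # [1, 2] [2, 3] 1
```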

pandas/core/frame.py (7 additions, 9 deletions)

@@ -8252,19 +8252,19 @@ def groupby(

 Parameters
 ----------%s
+columns : str or object or a list of str
+Column to use to make new frame's columns.
+
+.. versionchanged:: 1.1.0
+Also accept list of columns names.
+
 index : str or object or a list of str, optional
 Column to use to make new frame's index. If None, uses
 existing index.

 .. versionchanged:: 1.1.0
 Also accept list of index names.

-columns : str or object or a list of str
-Column to use to make new frame's columns.
-
-.. versionchanged:: 1.1.0
-Also accept list of columns names.
-
 values : str, object or a list of the previous, optional
 Column(s) to use for populating new frame's values. If not
 specified, all remaining columns will be used and the result will
@@ -8387,9 +8387,7 @@ def groupby(

 @Substitution("")
 @Appender(_shared_docs["pivot"])
-def pivot(
-self, *, index=lib.NoDefault, columns=lib.NoDefault, values=lib.NoDefault
-) -> DataFrame:
+def pivot(self, *, columns, index=lib.NoDefault, values=lib.NoDefault) -> DataFrame:
 from pandas.core.reshape.pivot import pivot

 return pivot(self, index=index, columns=columns, values=values)

pandas/core/reshape/pivot.py (1 addition, 3 deletions)

@@ -500,12 +500,10 @@ def _convert_by(by):
 def pivot(
 data: DataFrame,
 *,
+columns: IndexLabel,
 index: IndexLabel | lib.NoDefault = lib.NoDefault,
-columns: IndexLabel | lib.NoDefault = lib.NoDefault,
 values: IndexLabel | lib.NoDefault = lib.NoDefault,
 ) -> DataFrame:
-if columns is lib.NoDefault:
-raise TypeError("pivot() missing 1 required argument: 'columns'")

 columns_listlike = com.convert_to_list_like(columns)
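Declaring `columns` as a keyword-only parameter with no default lets Python enforce it, which is why the hand-written `lib.NoDefault` check could be dropped and why the updated tests expect the interpreter's own wording ("missing 1 required keyword-only argument"). A standalone sketch follows; this `pivot` is a hypothetical stand-in that only mirrors the new signature shape and does no reshaping.

```python
def pivot(data, *, columns, index=None, values=None):
    """Hypothetical stand-in: `columns` is keyword-only and required."""
    return {"data": data, "columns": columns, "index": index, "values": values}


try:
    pivot({"a": [1]})
except TypeError as err:
    # Raised by Python itself; no explicit "missing argument" check is needed.
    print(err)  # pivot() missing 1 required keyword-only argument: 'columns'

print(pivot({"a": [1]}, columns="a")["columns"])  # a
```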

pandas/core/tools/datetimes.py (13 additions, 5 deletions)

@@ -445,7 +445,8 @@ def _convert_listlike_datetimes(
 if format is None:
 format = _guess_datetime_format_for_array(arg, dayfirst=dayfirst)

-if format is not None:
+# `format` could be inferred, or user didn't ask for mixed-format parsing.
+if format is not None and format != "mixed":
 return _array_strptime_with_fallback(arg, name, utc, format, exact, errors)

 result, tz_parsed = objects_to_datetime64ns(
@@ -687,7 +688,7 @@ def to_datetime(
 yearfirst: bool = False,
 utc: bool = False,
 format: str | None = None,
-exact: bool = True,
+exact: bool | lib.NoDefault = lib.no_default,
 unit: str | None = None,
 infer_datetime_format: lib.NoDefault | bool = lib.no_default,
 origin: str = "unix",
@@ -717,9 +718,7 @@ def to_datetime(
 .. warning::

 ``dayfirst=True`` is not strict, but will prefer to parse
-with day first. If a delimited date string cannot be parsed in
-accordance with the given `dayfirst` option, e.g.
-``to_datetime(['31-12-2021'])``, then a warning will be shown.
+with day first.

 yearfirst : bool, default False
 Specify a date parse order if `arg` is str or is list-like.
@@ -759,13 +758,20 @@ def to_datetime(
 <https://docs.python.org/3/library/datetime.html
 #strftime-and-strptime-behavior>`_ for more information on choices, though
 note that :const:`"%f"` will parse all the way up to nanoseconds.
+You can also pass:
+
+- "ISO8601", to parse any `ISO8601 <https://en.wikipedia.org/wiki/ISO_8601>`_
+time string (not necessarily in exactly the same format);
+- "mixed", to infer the format for each element individually. This is risky,
+and you should probably use it along with `dayfirst`.
 exact : bool, default True
 Control how `format` is used:

 - If :const:`True`, require an exact `format` match.
 - If :const:`False`, allow the `format` to match anywhere in the target
 string.

+Cannot be used alongside ``format='ISO8601'`` or ``format='mixed'``.

 unit : str, default 'ns'
 The unit of the arg (D,s,ms,us,ns) denote the unit, which is an
 integer or float number. This will be based off the origin.
@@ -997,6 +1003,8 @@ def to_datetime(
 DatetimeIndex(['2018-10-26 12:00:00+00:00', '2020-01-01 18:00:00+00:00'],
 dtype='datetime64[ns, UTC]', freq=None)
 """
+if exact is not lib.no_default and format in {"mixed", "ISO8601"}:
+raise ValueError("Cannot use 'exact' when 'format' is 'mixed' or 'ISO8601'")
 if infer_datetime_format is not lib.no_default:
 warnings.warn(
 "The argument 'infer_datetime_format' is deprecated and will "

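Changing the default of `exact` from `True` to `lib.no_default` lets `to_datetime` distinguish "the caller passed `exact`" from "the caller left it alone", which the new check needs before rejecting `exact` together with `format='mixed'` or `format='ISO8601'`. A self-contained sketch of that sentinel pattern is below; `_NoDefault`, `no_default` and `check_exact` are hypothetical names, not pandas' own objects.

```python
import enum


class _NoDefault(enum.Enum):
    """Hypothetical single-member enum used as a 'not passed' sentinel."""

    no_default = "NO_DEFAULT"


no_default = _NoDefault.no_default


def check_exact(format=None, exact=no_default):
    """Reject `exact` alongside format-agnostic modes, else resolve its default."""
    if exact is not no_default and format in {"mixed", "ISO8601"}:
        raise ValueError("Cannot use 'exact' when 'format' is 'mixed' or 'ISO8601'")
    # Fall back to the documented default (True) only when nothing was passed.
    return True if exact is no_default else exact


print(check_exact("%Y-%m-%d"))  # True
print(check_exact("%Y-%m-%d", exact=False))  # False
try:
    check_exact("ISO8601", exact=True)
except ValueError as err:
    print(err)  # Cannot use 'exact' when 'format' is 'mixed' or 'ISO8601'
```

Even an explicit `exact=True` is rejected here, something a plain `exact=True` default could not detect.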
pandas/tests/io/parser/test_parse_dates.py (2 additions, 1 deletion)

@@ -1721,7 +1721,8 @@ def test_parse_multiple_delimited_dates_with_swap_warnings():
 with pytest.raises(
 ValueError,
 match=(
-r'^time data "31/05/2000" doesn\'t match format "%m/%d/%Y", at position 1$'
+r'^time data "31/05/2000" doesn\'t match format "%m/%d/%Y", '
+r"at position 1. You might want to try:"
 ),
 ):
 pd.to_datetime(["01/01/2000", "31/05/2000", "31/05/2001", "01/02/2000"])

pandas/tests/reshape/test_pivot.py (2 additions, 2 deletions)

@@ -799,7 +799,7 @@ def test_pivot_with_list_like_values_nans(self, values, method):
 def test_pivot_columns_none_raise_error(self):
 # GH 30924
 df = DataFrame({"col1": ["a", "b", "c"], "col2": [1, 2, 3], "col3": [1, 2, 3]})
-msg = r"pivot\(\) missing 1 required argument: 'columns'"
+msg = r"pivot\(\) missing 1 required keyword-only argument: 'columns'"
 with pytest.raises(TypeError, match=msg):
 df.pivot(index="col1", values="col3")

@@ -2513,7 +2513,7 @@ def test_pivot_index_list_values_none_immutable_args(self):
 def test_pivot_columns_not_given(self):
 # GH#48293
 df = DataFrame({"a": [1], "b": 1})
-with pytest.raises(TypeError, match="missing 1 required argument"):
+with pytest.raises(TypeError, match="missing 1 required keyword-only argument"):
 df.pivot()

 def test_pivot_columns_is_none(self):
