Description
Pandas version checks
- I have checked that this issue has not already been reported.
- I have confirmed this bug exists on the latest version of pandas.
- I have confirmed this bug exists on the main branch of pandas.
Reproducible Example
import pandas as pd
import sys
print(f"Pandas version: {pd.__version__}")
print(f"Python version: {sys.version}")
df = pd.DataFrame({'day': ["31-May-2025","01-Jun-2025","02-Jun-2025"]})
pd.to_datetime(df['day'])
Issue Description
Running the example gives:
'Pandas version: 2.2.3'
'Python version: 3.11.11 (main, Dec 4 2024, 08:55:07) [GCC 11.4.0]'
ValueError: time data "01-Jun-2025" doesn't match format "%d-%B-%Y", at position 1. You might want to try:
    - passing `format` if your strings have a consistent format;
    - passing `format='ISO8601'` if your strings are all ISO8601 but not necessarily in exactly the same format;
    - passing `format='mixed'`, and the format will be inferred for each element individually. You might want to use `dayfirst` alongside this.
File , line 2
1 df = pd.DataFrame({'day': ["31-May-2025","01-Jun-2025","02-Jun-2025"]})
----> 2 pd.to_datetime(df['day'])
File /local_disk0/.ephemeral_nfs/cluster_libraries/python/lib/python3.11/site-packages/pandas/core/tools/datetimes.py:1067, in to_datetime(arg, errors, dayfirst, yearfirst, utc, format, exact, unit, infer_datetime_format, origin, cache)
1065 result = arg.map(cache_array)
1066 else:
-> 1067 values = convert_listlike(arg._values, format)
1068 result = arg._constructor(values, index=arg.index, name=arg.name)
1069 elif isinstance(arg, (ABCDataFrame, abc.MutableMapping)):
File /local_disk0/.ephemeral_nfs/cluster_libraries/python/lib/python3.11/site-packages/pandas/core/tools/datetimes.py:433, in _convert_listlike_datetimes(arg, format, name, utc, unit, errors, dayfirst, yearfirst, exact)
431 # `format` could be inferred, or user didn't ask for mixed-format parsing.
432 if format is not None and format != "mixed":
--> 433 return _array_strptime_with_fallback(arg, name, utc, format, exact, errors)
435 result, tz_parsed = objects_to_datetime64(
436 arg,
437 dayfirst=dayfirst,
(...)
441 allow_object=True,
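For anyone hitting the same error, the suggestions in the message can be applied as a workaround. The snippet below is a minimal sketch, assuming an English locale (so that `%b` matches "May" and "Jun") and pandas 2.x (for `format="mixed"`). The variable names are only illustrative.

```python
import pandas as pd

days = pd.Series(["31-May-2025", "01-Jun-2025", "02-Jun-2025"], name="day")

# Workaround 1: state the format explicitly, using %b (abbreviated month name)
# rather than relying on inference from the first element.
parsed_explicit = pd.to_datetime(days, format="%d-%b-%Y")

# Workaround 2: let pandas infer the format for each element individually.
parsed_mixed = pd.to_datetime(days, format="mixed")

print(parsed_explicit)
print(parsed_mixed)
```

Both calls avoid the single inferred format, so neither depends on what the first value in the column looks like.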
Expected Behavior
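The column parses correctly with no exception.
Interestingly, the failure only occurs when the data crosses from the end of May into the start of June. A column starting with 01-Jun-2025 works, and a column ending with 31-May-2025 works.
dateparser.parse handles the same strings without complaint.
I'm guessing the format is inferred from the first value as a full month name (%B, as shown in the error message) because "May" is spelled the same in full and abbreviated form, when in fact the column uses three-character abbreviations (%b).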
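The ambiguity behind that guess can be shown with the standard library alone. This is only an illustration using `datetime.strptime`, not the inference code pandas actually runs, and it assumes an English locale. The helper `matches` is a hypothetical name introduced for this sketch.

```python
from datetime import datetime

FULL_MONTH = "%d-%B-%Y"   # the format reported in the error message
ABBR_MONTH = "%d-%b-%Y"   # the format the data actually uses


def matches(value: str, fmt: str) -> bool:
    """Return True if `value` can be parsed with `fmt`."""
    try:
        datetime.strptime(value, fmt)
        return True
    except ValueError:
        return False


# Map each sample value to (matches full-month format, matches abbreviated format).
results = {
    value: (matches(value, FULL_MONTH), matches(value, ABBR_MONTH))
    for value in ["31-May-2025", "01-Jun-2025", "02-Jun-2025"]
}

for value, (full_ok, abbr_ok) in results.items():
    print(f"{value}: %B -> {full_ok}, %b -> {abbr_ok}")
```

"31-May-2025" satisfies both formats, so a format guessed from that value alone cannot tell them apart, while "01-Jun-2025" only satisfies the abbreviated one.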
Installed Versions
Running in a Databricks notebook. I also checked locally in a separate Python environment with pandas 2.2.1.
Notebook:
'Pandas version: 2.2.3'
'Python version: 3.11.11 (main, Dec 4 2024, 08:55:07) [GCC 11.4.0]'
pd.show_versions() doesn't return anything in the notebook.
Locally:
Pandas version: 2.2.1
Python version: 3.12.2 (main, Mar 25 2024, 11:48:28) [Clang 15.0.0 (clang-1500.3.9.4)]
and pd.show_versions() raises:
FileNotFoundError Traceback (most recent call last)
File /Users/J.Drummond/Documents/wip/python/truth_soc_1.py:2
1 # %%
----> 2 pd.show_versions()
File ~/Documents/wip/python/.venv/lib/python3.12/site-packages/pandas/util/_print_versions.py:141, in show_versions(as_json)
104 """
105 Provide useful information, important for bug reports.
106
(...)
138 ...
139 """
140 sys_info = _get_sys_info()
--> 141 deps = _get_dependency_info()
143 if as_json:
144 j = {"system": sys_info, "dependencies": deps}
File ~/Documents/wip/python/.venv/lib/python3.12/site-packages/pandas/util/_print_versions.py:98, in _get_dependency_info()
96 result: dict[str, JSONSerializable] = {}
97 for modname in deps:
---> 98 mod = import_optional_dependency(modname, errors="ignore")
99 result[modname] = get_version(mod) if mod else None
100 return result
...