BUG: datetime ExtensionDtype do not work with DataFrame #35767

Closed
@marco-neumann-by

Description

  • I have checked that this issue has not already been reported. (at least I couldn't find one)

  • I have confirmed this bug exists on the latest version of pandas. (1.1.0)

  • (optional) I have confirmed this bug exists on the master branch of pandas. (934e9f840ebd2e8b5a5181b19a23e033bd3985a5)


Code Sample, a copy-pastable example

This is a high-level example that led to the investigation. It relies on rle-array (commit dfa79295a580d533ee9d2ea901e8808496dbcdc9 was used) because the pandas-provided DatetimeArray uses either a NumPy dtype or DatetimeTZDtype, and both of those cases more or less work (see "Problem description").

import pandas as pd
from rle_array import RLEArray

array = RLEArray._from_sequence([], dtype="datetime64[ns]")
df = pd.DataFrame({"x": array})

Traceback (most recent call last):
  File "bug.py", line 5, in <module>
    pd.DataFrame({"x": array})
  File ".../lib/python3.8/site-packages/pandas/core/frame.py", line 467, in __init__
    mgr = init_dict(data, index, columns, dtype=dtype)
  File ".../lib/python3.8/site-packages/pandas/core/internals/construction.py", line 283, in init_dict
    return arrays_to_mgr(arrays, data_names, index, columns, dtype=dtype)
  File ".../lib/python3.8/site-packages/pandas/core/internals/construction.py", line 93, in arrays_to_mgr
    return create_block_manager_from_arrays(arrays, arr_names, axes)
  File ".../lib/python3.8/site-packages/pandas/core/internals/managers.py", line 1650, in create_block_manager_from_arrays
    blocks = form_blocks(arrays, names, axes)
  File ".../lib/python3.8/site-packages/pandas/core/internals/managers.py", line 1703, in form_blocks
    block_type = get_block_type(v)
  File ".../lib/python3.8/site-packages/pandas/core/internals/blocks.py", line 2672, in get_block_type
    assert not is_datetime64tz_dtype(values.dtype)
AssertionError

Problem description

See get_block_type in pandas/core/internals/blocks.py:

def get_block_type(values, dtype=None):
    """
    Find the appropriate Block subclass to use for the given values and dtype.

    Parameters
    ----------
    values : ndarray-like
    dtype : numpy or pandas dtype

    Returns
    -------
    cls : class, subclass of Block
    """
    dtype = dtype or values.dtype
    vtype = dtype.type

    if is_sparse(dtype):
        # Need this first(ish) so that Sparse[datetime] is sparse
        cls = ExtensionBlock
    elif is_categorical_dtype(values.dtype):
        cls = CategoricalBlock
    elif issubclass(vtype, np.datetime64):
        assert not is_datetime64tz_dtype(values.dtype)
        cls = DatetimeBlock
    elif is_datetime64tz_dtype(values.dtype):
        cls = DatetimeTZBlock
    elif is_interval_dtype(dtype) or is_period_dtype(dtype):
        cls = ObjectValuesExtensionBlock
    elif is_extension_array_dtype(values.dtype):
        cls = ExtensionBlock
    elif issubclass(vtype, np.floating):
        cls = FloatBlock
    elif issubclass(vtype, np.timedelta64):
        assert issubclass(vtype, np.integer)
        cls = TimeDeltaBlock
    elif issubclass(vtype, np.complexfloating):
        cls = ComplexBlock
    elif issubclass(vtype, np.integer):
        cls = IntBlock
    elif dtype == np.bool_:
        cls = BoolBlock
    else:
        cls = ObjectBlock
    return cls

Datetime (and also interval) types are checked BEFORE extension types, which means that extension datetime types never end up in an ExtensionBlock. An ExtensionBlock would be useful if:

  • the datetime objects are not compatible with NumPy
  • the data should not be converted to NumPy (e.g. due to compression, like in the rle-array case)
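
The effect of the check order can be illustrated without pandas. The following is only a minimal sketch: ToyDtype, block_type_current and block_type_extension_first are hypothetical names, the block classes are represented by plain strings, and only the datetime and extension branches are modelled. It is not a proposed patch; a real reordering would also have to keep the existing pandas dtypes (for example DatetimeTZDtype) routed to their current blocks.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ToyDtype:
    """Hypothetical stand-in for a dtype; not a pandas class."""

    name: str
    is_datetime_like: bool  # models issubclass(dtype.type, np.datetime64)
    is_extension: bool      # models is_extension_array_dtype(dtype)


def block_type_current(dtype: ToyDtype) -> str:
    # Same order as in get_block_type: the datetime check comes first.
    if dtype.is_datetime_like:
        return "DatetimeBlock"
    if dtype.is_extension:
        return "ExtensionBlock"
    return "ObjectBlock"


def block_type_extension_first(dtype: ToyDtype) -> str:
    # Assumed alternative order: the extension check comes first.
    if dtype.is_extension:
        return "ExtensionBlock"
    if dtype.is_datetime_like:
        return "DatetimeBlock"
    return "ObjectBlock"


# A third-party datetime extension dtype (illustrative name) and a plain NumPy one.
ext_datetime = ToyDtype("ext-datetime64[ns]", is_datetime_like=True, is_extension=True)
numpy_datetime = ToyDtype("datetime64[ns]", is_datetime_like=True, is_extension=False)

print(block_type_current(ext_datetime))
print(block_type_extension_first(ext_datetime))
print(block_type_extension_first(numpy_datetime))
```

With the current order the extension dtype is claimed by the datetime branch before the extension branch is ever reached; with the assumed alternative order it lands in the extension branch, while the plain NumPy dtype is unaffected.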

Furthermore, the invariant issubclass(vtype, np.datetime64) => not is_datetime64tz_dtype(values.dtype) does NOT hold for all extension dtypes, at least not under the current implementation of is_datetime64tz_dtype:

if isinstance(arr_or_dtype, ExtensionDtype):
    # GH#33400 fastpath for dtype object
    return arr_or_dtype.kind == "M"
if arr_or_dtype is None:
    return False
return DatetimeTZDtype.is_dtype(arr_or_dtype)
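
To make the broken invariant concrete, here is a small pandas-free sketch. All class and function names are hypothetical stand-ins, not pandas APIs. is_tz_fastpath models the fastpath quoted above, and is_tz_strict is just one possible stricter check, included for contrast and not meant as the fix pandas should adopt.

```python
class ToyExtensionDtype:
    """Hypothetical stand-in for an extension dtype base class (not pandas')."""

    kind = "O"


class ToyDatetimeTZDtype(ToyExtensionDtype):
    """Models a timezone-aware datetime dtype."""

    kind = "M"

    def __init__(self, tz: str) -> None:
        self.tz = tz


class ToyNaiveDatetimeDtype(ToyExtensionDtype):
    """Models a third-party, timezone-naive datetime extension dtype."""

    kind = "M"


def is_tz_fastpath(dtype: object) -> bool:
    # Models the fastpath quoted above: any extension dtype with kind "M"
    # is treated as timezone-aware.
    if isinstance(dtype, ToyExtensionDtype):
        return dtype.kind == "M"
    return False


def is_tz_strict(dtype: object) -> bool:
    # Assumed stricter alternative: only the tz dtype class itself counts.
    return isinstance(dtype, ToyDatetimeTZDtype)


def datetime_branch(dtype: object, is_tz) -> str:
    # Models the datetime branch of get_block_type, including its assertion.
    assert not is_tz(dtype)
    return "DatetimeBlock"


naive = ToyNaiveDatetimeDtype()

try:
    outcome_fastpath = datetime_branch(naive, is_tz_fastpath)
except AssertionError:
    outcome_fastpath = "AssertionError"

outcome_strict = datetime_branch(naive, is_tz_strict)

print(outcome_fastpath)
print(outcome_strict)
```

The timezone-naive toy dtype reports kind "M", so the fastpath classifies it as timezone-aware and the assertion in the datetime branch fails, which matches the traceback in the code sample.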

Expected Output

The code example works and df._data shows that the data ends up in an ExtensionBlock.

Output of pd.show_versions()

INSTALLED VERSIONS
------------------
commit           : d9fff2792bf16178d4e450fe7384244e50635733
python           : 3.8.5.final.0
python-bits      : 64
OS               : Darwin
OS-release       : 19.6.0
Version          : Darwin Kernel Version 19.6.0: Thu Jun 18 20:49:00 PDT 2020; root:xnu-6153.141.1~1/RELEASE_X86_64
machine          : x86_64
processor        : i386
byteorder        : little
LC_ALL           : en_US.UTF-8
LANG             : en_US.UTF-8
LOCALE           : en_US.UTF-8

pandas           : 1.1.0
numpy            : 1.19.1
pytz             : 2020.1
dateutil         : 2.8.1
pip              : 20.1.1
setuptools       : 47.1.0
Cython           : None
pytest           : 6.0.1
hypothesis       : None
sphinx           : 3.2.0
blosc            : None
feather          : None
xlsxwriter       : None
lxml.etree       : None
html5lib         : None
pymysql          : None
psycopg2         : None
jinja2           : 2.11.2
IPython          : 7.16.1
pandas_datareader: None
bs4              : None
bottleneck       : None
fsspec           : None
fastparquet      : None
gcsfs            : None
matplotlib       : None
numexpr          : None
odfpy            : None
openpyxl         : None
pandas_gbq       : None
pyarrow          : None
pytables         : None
pyxlsb           : None
s3fs             : None
scipy            : None
sqlalchemy       : None
tables           : None
tabulate         : None
xarray           : None
xlrd             : None
xlwt             : None
numba            : 0.50.1

Labels

  • Bug
  • Closing Candidate (May be closeable, needs more eyeballs)
  • ExtensionArray (Extending pandas with custom dtypes or arrays.)