Description
- I have checked that this issue has not already been reported. (at least I couldn't find one)
- I have confirmed this bug exists on the latest version of pandas. (1.1.0)
- (optional) I have confirmed this bug exists on the master branch of pandas. (934e9f840ebd2e8b5a5181b19a23e033bd3985a5)
Code Sample, a copy-pastable example
This is a high-level example that led to the investigation. It relies on rle-array (commit dfa79295a580d533ee9d2ea901e8808496dbcdc9 was used), because the pandas-provided DatetimeArray uses either a NumPy dtype or DatetimeTZDtype. Both of those cases somewhat work (see "Problem description").
import pandas as pd
from rle_array import RLEArray
array = RLEArray._from_sequence([], dtype="datetime64[ns]")
df = pd.DataFrame({"x": array})
Traceback (most recent call last):
  File "bug.py", line 5, in <module>
    pd.DataFrame({"x": array})
  File ".../lib/python3.8/site-packages/pandas/core/frame.py", line 467, in __init__
    mgr = init_dict(data, index, columns, dtype=dtype)
  File ".../lib/python3.8/site-packages/pandas/core/internals/construction.py", line 283, in init_dict
    return arrays_to_mgr(arrays, data_names, index, columns, dtype=dtype)
  File ".../lib/python3.8/site-packages/pandas/core/internals/construction.py", line 93, in arrays_to_mgr
    return create_block_manager_from_arrays(arrays, arr_names, axes)
  File ".../lib/python3.8/site-packages/pandas/core/internals/managers.py", line 1650, in create_block_manager_from_arrays
    blocks = form_blocks(arrays, names, axes)
  File ".../lib/python3.8/site-packages/pandas/core/internals/managers.py", line 1703, in form_blocks
    block_type = get_block_type(v)
  File ".../lib/python3.8/site-packages/pandas/core/internals/blocks.py", line 2672, in get_block_type
    assert not is_datetime64tz_dtype(values.dtype)
AssertionError
Problem description
See pandas/pandas/core/internals/blocks.py, lines 2647 to 2690 in 934e9f8.
Datetime (and also interval) types are checked BEFORE extension types, which means that extension datetime types never end up in ExtensionBlocks. An ExtensionBlock would be useful if:
- the datetime object is not compatible with NumPy
- the data should not be converted to NumPy (e.g. due to compression, as in the rle-array case)
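To illustrate why the order of the checks matters, here is a minimal, dependency-free sketch. It is not the pandas implementation: `ToyDtype`, `pick_block`, and the two check lists are hypothetical names made up for this illustration, and only the "first match wins" behaviour is modelled.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ToyDtype:
    """Hypothetical stand-in for a dtype; not a pandas class."""
    name: str
    is_datetime_like: bool
    is_extension: bool


def is_datetime(dtype):
    return dtype.is_datetime_like


def is_extension(dtype):
    return dtype.is_extension


# Order as described above: datetime is tested before extension.
DATETIME_FIRST = [(is_datetime, "DatetimeBlock"), (is_extension, "ExtensionBlock")]
# Alternative order: extension dtypes are tested first.
EXTENSION_FIRST = [(is_extension, "ExtensionBlock"), (is_datetime, "DatetimeBlock")]


def pick_block(dtype, checks):
    """Return the block name of the first check that matches."""
    for predicate, block_name in checks:
        if predicate(dtype):
            return block_name
    return "ObjectBlock"


rle_datetime = ToyDtype("RLE[datetime64[ns]]", is_datetime_like=True, is_extension=True)
numpy_datetime = ToyDtype("datetime64[ns]", is_datetime_like=True, is_extension=False)

print(pick_block(rle_datetime, DATETIME_FIRST))     # DatetimeBlock
print(pick_block(rle_datetime, EXTENSION_FIRST))    # ExtensionBlock
print(pick_block(numpy_datetime, EXTENSION_FIRST))  # DatetimeBlock
```

With the datetime check first, the extension datetime dtype can never reach the extension branch; swapping the order leaves the plain NumPy case unchanged in this toy model.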
Furthermore, the invariant issubclass(vtype, np.datetime64) => not is_datetime64tz_dtype(values.dtype) does NOT hold for all extension dtypes, at least not under the current implementation of is_datetime64tz_dtype:
pandas/pandas/core/dtypes/common.py, lines 415 to 421 in 934e9f8
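The broken invariant can be sketched without pandas. This assumes that the tz check decides for extension dtypes purely by looking at `kind == "M"`; that is my reading of the linked lines and is an assumption, not a verified copy of the code. `Datetime64Scalar`, `ToyExtensionDtype`, `is_tz_aware_sketch`, and `invariant_holds` are hypothetical names for this illustration only.

```python
class Datetime64Scalar:
    """Stand-in for np.datetime64 so the sketch has no dependencies."""


class ToyExtensionDtype:
    """Hypothetical tz-naive extension dtype wrapping datetime64 data."""
    kind = "M"               # NumPy kind code for datetimes
    type = Datetime64Scalar  # scalar type, as for a plain datetime64 dtype
    tz = None                # no timezone attached


def is_tz_aware_sketch(dtype):
    # Assumed fast path: any extension dtype of kind "M" counts as tz-aware.
    if isinstance(dtype, ToyExtensionDtype):
        return dtype.kind == "M"
    return getattr(dtype, "tz", None) is not None


def invariant_holds(dtype):
    """Evaluate: issubclass(vtype, datetime64) => not tz-aware."""
    premise = issubclass(dtype.type, Datetime64Scalar)
    return (not premise) or (not is_tz_aware_sketch(dtype))


dtype = ToyExtensionDtype()
print(dtype.tz is None)           # True: the dtype carries no timezone
print(is_tz_aware_sketch(dtype))  # True: yet the kind-based check says tz-aware
print(invariant_holds(dtype))     # False: the asserted invariant is violated
```

Under that assumption, a tz-naive extension dtype whose scalar type is a datetime64 subclass satisfies the premise and fails the conclusion, which matches the AssertionError in the traceback.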
Expected Output
The code example works and df._data shows that the data ends up in an ExtensionBlock.
Output of pd.show_versions()
INSTALLED VERSIONS
------------------
commit : d9fff2792bf16178d4e450fe7384244e50635733
python : 3.8.5.final.0
python-bits : 64
OS : Darwin
OS-release : 19.6.0
Version : Darwin Kernel Version 19.6.0: Thu Jun 18 20:49:00 PDT 2020; root:xnu-6153.141.1~1/RELEASE_X86_64
machine : x86_64
processor : i386
byteorder : little
LC_ALL : en_US.UTF-8
LANG : en_US.UTF-8
LOCALE : en_US.UTF-8
pandas : 1.1.0
numpy : 1.19.1
pytz : 2020.1
dateutil : 2.8.1
pip : 20.1.1
setuptools : 47.1.0
Cython : None
pytest : 6.0.1
hypothesis : None
sphinx : 3.2.0
blosc : None
feather : None
xlsxwriter : None
lxml.etree : None
html5lib : None
pymysql : None
psycopg2 : None
jinja2 : 2.11.2
IPython : 7.16.1
pandas_datareader: None
bs4 : None
bottleneck : None
fsspec : None
fastparquet : None
gcsfs : None
matplotlib : None
numexpr : None
odfpy : None
openpyxl : None
pandas_gbq : None
pyarrow : None
pytables : None
pyxlsb : None
s3fs : None
scipy : None
sqlalchemy : None
tables : None
tabulate : None
xarray : None
xlrd : None
xlwt : None
numba : 0.50.1