BUG: NaN level values in stack() and unstack() #9406

Closed
@seth-p

Description


As described in #9023 (comment), DataFrame.stack() and DataFrame.unstack() treat NaN index values in an odd and inconsistent way. Although test_unstack_nan_index() in test_frame.py passes, I observe the following (this is from 0.15.2, but I think it is unchanged in current master for 0.16.0):

In [140]: df = pd.DataFrame(np.arange(4).reshape(2, 2),
                            columns=pd.MultiIndex.from_tuples([('A','a'), ('B', 'b')],
                                                              names=['Upper', 'Lower']),
                            index=pd.Index([0, 1], name='Num'), dtype=np.float64)

In [141]: df_nan = pd.DataFrame(np.arange(4).reshape(2, 2),
                                columns=pd.MultiIndex.from_tuples([('A',np.nan), ('B', 'b')],
                                                                  names=['Upper', 'Lower']),
                                index=pd.Index([0, 1], name='Num'), dtype=np.float64)

In [148]: df
Out[148]:
Upper  A  B
Lower  a  b
Num
0      0  1
1      2  3

In [149]: df.stack()
Out[149]:
Upper       A   B
Num Lower
0   a       0 NaN
    b     NaN   1
1   a       2 NaN
    b     NaN   3

In [150]: df.T.unstack().T
Out[150]:
Upper       A   B
Num Lower
0   a       0 NaN
    b     NaN   1
1   a       2 NaN
    b     NaN   3

In [151]: df_nan
Out[151]:
Upper   A  B
Lower NaN  b
Num
0       0  1
1       2  3

In [152]: df_nan.stack()
Out[152]:
Upper      A  B
Num Lower
0   NaN    0  1
    b      0  1
1   NaN    2  3
    b      2  3

In [153]: df_nan.T.unstack().T
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-153-edbcaeb64f64> in <module>()
----> 1 df_nan.T.unstack().T

C:\Python34\lib\site-packages\pandas\core\frame.py in unstack(self, level)
   3486         """
   3487         from pandas.core.reshape import unstack
-> 3488         return unstack(self, level)
   3489
   3490     #----------------------------------------------------------------------

C:\Python34\lib\site-packages\pandas\core\reshape.py in unstack(obj, level)
    439     if isinstance(obj, DataFrame):
    440         if isinstance(obj.index, MultiIndex):
--> 441             return _unstack_frame(obj, level)
    442         else:
    443             return obj.T.stack(dropna=False)

C:\Python34\lib\site-packages\pandas\core\reshape.py in _unstack_frame(obj, level)
    479     else:
    480         unstacker = _Unstacker(obj.values, obj.index, level=level,
--> 481                                value_columns=obj.columns)
    482         return unstacker.get_result()
    483

C:\Python34\lib\site-packages\pandas\core\reshape.py in __init__(self, values, index, level, value_columns)
    101
    102         self._make_sorted_values_labels()
--> 103         self._make_selectors()
    104
    105     def _make_sorted_values_labels(self):

C:\Python34\lib\site-packages\pandas\core\reshape.py in _make_selectors(self)
    143
    144         if mask.sum() < len(self.index):
--> 145             raise ValueError('Index contains duplicate entries, '
    146                              'cannot reshape')
    147

ValueError: Index contains duplicate entries, cannot reshape
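
For anyone who wants to re-check this on another pandas version, here is a minimal standalone sketch of the session above. The helper names (`make_frame`, `attempt`, `compare_reshapes`) are made up for illustration and are not part of pandas. The script builds both frames, runs `stack()` and `T.unstack().T` on each, and records either the result or the exception raised. Results will likely differ from the 0.15.2 output shown here, and newer releases may also emit warnings from `stack()`, so no expected output is given.

```python
import numpy as np
import pandas as pd


def make_frame(lower_for_a):
    """Build the 2x2 frame from the report.

    `lower_for_a` is the 'Lower' label under column 'A' ('a' or np.nan).
    """
    columns = pd.MultiIndex.from_tuples(
        [("A", lower_for_a), ("B", "b")], names=["Upper", "Lower"]
    )
    index = pd.Index([0, 1], name="Num")
    values = np.arange(4, dtype=np.float64).reshape(2, 2)
    return pd.DataFrame(values, index=index, columns=columns)


def attempt(func):
    """Return func()'s result, or the exception it raised (hypothetical helper)."""
    try:
        return func()
    except Exception as exc:  # e.g. the ValueError shown in the traceback
        return exc


def compare_reshapes(frame):
    """Run both reshape paths on `frame` and collect the outcomes by name."""
    return {
        "stack()": attempt(lambda: frame.stack()),
        "T.unstack().T": attempt(lambda: frame.T.unstack().T),
    }


if __name__ == "__main__":
    for name, frame in [("df", make_frame("a")), ("df_nan", make_frame(np.nan))]:
        for path, outcome in compare_reshapes(frame).items():
            print(f"--- {name}.{path}")
            print(repr(outcome))
```

If the two paths were consistent, both entries for `df_nan` would hold equal frames, as they do for `df` in the session above.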
