DataFrame and DataMatrix column ordering #12

Closed
@hector13


First, thank you for the pandas package -- it's incredibly useful and well done.

I know that one of the fundamental concepts behind the data structures is that column ordering doesn't matter. As long as one only uses pandas' data access/manipulation functions (e.g., sum(), ewma(), etc.), this works fine. But often it's useful to access the underlying values as a numpy array for some more complicated data manipulation. The values attribute (or the values() method for a Series) does this, but it's not always obvious what order the values come back in.

For example:

In [1]: dm = DataMatrix(np.arange(2*3).reshape(2,3), index=[1,0], columns=['B', 'A', 'C' ])

In [2]: dm 
Out[2]: 
     B              A              C  
1    0              1              2  
0    3              4              5              

In [3]: df = DataFrame(np.arange(2*3).reshape(2,3), index=[1,0], columns=['B', 'A', 'C' ])

In [4]: df 
Out[4]: 
     A              B              C  
1    1              0              2  
0    4              3              5              

In [5]: df.values
Out[5]: 
array([[1, 0, 2],
       [4, 3, 5]])

In [6]: dm.values
Out[6]: 
array([[0, 1, 2],
       [3, 4, 5]])

DataMatrix seems to respect the passed-in ordering of columns, while DataFrame does not. I know this is documented, and it's not the biggest deal in the world, but it does seem to cause quite a bit of confusion for some. Is it possible to have both data types keep the ordering that's passed in? If a user passes in the same column name twice, could this just throw an exception? Something still needs to be done when an operation is performed on two DataFrames (e.g., combining them), but instead of reordering alphabetically, how about preserving the column ordering from left to right?
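To make the proposal concrete, here is a minimal pure-Python sketch of the two rules suggested above: reject duplicate column names, and combine two column lists by keeping left-to-right order rather than sorting. The helper names `check_unique` and `combine_columns` are hypothetical and are not part of pandas; this only illustrates the ordering rule, not how pandas would implement it.

```python
def check_unique(columns):
    """Return the columns as a list, raising if any name appears twice."""
    seen = set()
    for name in columns:
        if name in seen:
            raise ValueError(f"duplicate column name: {name!r}")
        seen.add(name)
    return list(columns)


def combine_columns(left, right):
    """Union of two column lists, preserving left-to-right order.

    Columns from `left` come first in their original order, followed by
    any columns from `right` that were not already present.
    """
    combined = check_unique(left)
    present = set(combined)
    for name in check_unique(right):
        if name not in present:
            combined.append(name)
            present.add(name)
    return combined


print(combine_columns(["B", "A", "C"], ["D", "A"]))
```

With this rule, combining ['B', 'A', 'C'] with ['D', 'A'] keeps 'B', 'A', 'C' in the order they were passed and appends 'D', instead of producing an alphabetical ordering.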

Anyway, my bigger concern is actually the following:

In [7]: dm.reindex(columns=['C','B','A']).values
Out[7]: 
array([[2, 0, 1],
       [5, 3, 4]])

In [8]: df.reindex(columns=['C','B','A']).values
Out[8]: 
array([[1, 0, 2],
       [4, 3, 5]])

Regardless of the ordering of the columns after creating a DataFrame/DataMatrix, a naive user (i.e., me) would expect that calling reindex and then values returns an ndarray with the columns in the requested order. But it looks like this only happens for DataMatrix objects (and I'm not even sure that's always guaranteed).
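The expectation can be written down as a small check. This is a sketch that assumes a recent pandas release, where DataMatrix no longer exists and `pd.DataFrame` keeps the column order it is given; it does not reproduce the behaviour of the version shown in the transcripts above. The `expected` array is built column by column, which is also a way to get a known ordering without relying on how `values` orders the columns.

```python
import numpy as np
import pandas as pd

# Same data as in the examples above.
df = pd.DataFrame(
    np.arange(2 * 3).reshape(2, 3),
    index=[1, 0],
    columns=["B", "A", "C"],
)

requested = ["C", "B", "A"]

# What the issue asks for: values come back in the requested column order.
result = df.reindex(columns=requested).values

# Order-independent construction: pick each column by name, then stack.
expected = np.column_stack([df[name].values for name in requested])

assert (result == expected).all()
```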
