Description
First, thank you for the pandas package -- it's incredibly useful and well done.
I know that one of the fundamental concepts behind the data structures is that column ordering doesn't matter. As long as one only uses pandas' data access/manipulation functions (e.g., sum(), ewma(), etc.), this works fine. But it's often useful to access the underlying values as a numpy array for some more complicated data manipulation. Using the values attribute (or the values() method for a Series) does this, but it's not always obvious what order the columns come back in.
For example:
    In [1]: dm = DataMatrix(np.arange(2*3).reshape(2,3), index=[1,0], columns=['B', 'A', 'C' ])

    In [2]: dm
    Out[2]:
         B    A    C
    1    0    1    2
    0    3    4    5

    In [3]: df = DataFrame(np.arange(2*3).reshape(2,3), index=[1,0], columns=['B', 'A', 'C' ])

    In [4]: df
    Out[4]:
         A    B    C
    1    1    0    2
    0    4    3    5

    In [5]: df.values
    Out[5]:
    array([[1, 0, 2],
           [4, 3, 5]])

    In [6]: dm.values
    Out[6]:
    array([[0, 1, 2],
           [3, 4, 5]])
DataMatrix seems to respect the passed-in ordering of columns, while DataFrame does not (it sorts them alphabetically). I know this is documented, and it's not the biggest deal in the world, but it does seem to cause quite a bit of confusion for some. Is it possible to have both data types keep the ordering that's passed in? If a user passes in the same column name twice, could this just throw an exception? Something still needs to be done when an operation is performed on two DataFrames (e.g., combining them), but instead of reordering in alphabetical order, how about preserving the column ordering from left to right?
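To make the suggestion concrete, here is a rough sketch of the two rules I have in mind, in plain Python. Both function names (`check_unique_columns` and `combine_column_order`) are hypothetical and made up for illustration; they are not part of pandas, and this is not how pandas handles columns internally.

```python
def check_unique_columns(columns):
    """Return the columns as a list, raising if any name appears twice.

    Hypothetical helper: illustrates the "throw an exception on a
    duplicate column name" suggestion.
    """
    seen = set()
    for name in columns:
        if name in seen:
            raise ValueError("duplicate column name: %r" % (name,))
        seen.add(name)
    return list(columns)


def combine_column_order(left, right):
    """Merge two column lists, keeping left-to-right order.

    Hypothetical helper: columns from `left` come first in their
    original order, followed by any columns that only appear in
    `right`, also in their original order. Nothing is sorted.
    """
    combined = list(left)
    present = set(left)
    for name in right:
        if name not in present:
            combined.append(name)
            present.add(name)
    return combined


columns = check_unique_columns(['B', 'A', 'C'])
merged = combine_column_order(columns, ['D', 'A'])
print(merged)
```

With these rules, combining columns `['B', 'A', 'C']` with `['D', 'A']` keeps B, A, C in place and appends D, rather than sorting everything alphabetically.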
Anyway, my bigger concern is actually the following:
    In [7]: dm.reindex(columns=['C','B','A']).values
    Out[7]:
    array([[2, 0, 1],
           [5, 3, 4]])

    In [8]: df.reindex(columns=['C','B','A']).values
    Out[8]:
    array([[1, 0, 2],
           [4, 3, 5]])
Regardless of how the columns are ordered after creating a DataFrame/DataMatrix, a naive user (i.e., me) would expect that calling reindex and then values returns an ndarray with the columns in the order that was requested. But it looks like this only happens for DataMatrix objects (and I'm not even sure that's always guaranteed).
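In the meantime, one possible workaround is to build the array one column at a time, so the result does not depend on how the container orders its columns internally. The sketch below assumes a pandas version that exposes `DataFrame` at the top level and supports `np.asarray` on a single column; `values_in_order` is a hypothetical helper name, not a pandas function.

```python
import numpy as np
import pandas as pd


def values_in_order(frame, columns):
    """Return a 2-D ndarray whose columns follow `columns` exactly.

    Hypothetical helper: each column is looked up by name and the
    resulting 1-D arrays are stacked side by side, so the output
    order is set by the caller, not by the frame.
    """
    return np.column_stack([np.asarray(frame[name]) for name in columns])


df = pd.DataFrame(np.arange(2 * 3).reshape(2, 3),
                  index=[1, 0], columns=['B', 'A', 'C'])

# Request the columns in C, B, A order, as in the reindex example above.
result = values_in_order(df, ['C', 'B', 'A'])
print(result)
```

For the frame above, this gives the same array as the DataMatrix result in Out[7], which is what I would expect reindex followed by values to return.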