Description
Feature Type
- Adding new functionality to pandas
- Changing existing functionality in pandas
- Removing existing functionality in pandas
Problem Description
The row and column indexing mechanism of pandas dataframes is inefficient, leading to errors and unnecessary time consumption for users. When two dataframes are merged, or concatenated horizontally or vertically, the result can contain duplicate index labels. If the index is then iterated in a for loop, an operation addressed to one label applies to every row that shares that label, so it is effectively repeated within a single iteration. This is a typical scenario that leads to calculation errors. For example,
import pandas as pd

# df1 and df2 are assumed to be existing dataframes that both have a 'title' column
df = pd.concat([df1, df2]).drop_duplicates('title')
# This call must be included every time; otherwise duplicate index labels cause errors in the loop below.
df.reset_index(drop=True, inplace=True)
df['name'] = None
for idx, row in df.iterrows():
    name_list = ['mike', 'jake', 'cook']
    df.at[idx, 'name'] = ",".join(name_list)
Without the df.reset_index(drop=True, inplace=True) call, this cell holds two copies of the joined name_list instead of the single one written in the code. In the debugger:
(Pdb) p df.at[idx, 'name'].index
Index([1, 1], dtype='int64')
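To make the problem easier to reproduce, here is a minimal self-contained sketch. The frames df1 and df2 and their contents are made-up sample data, not my real data, and I use df.loc rather than df.at for the lookup only because I am sure of how it behaves with duplicate labels.

```python
import pandas as pd

# Hypothetical sample frames, invented for this reproduction.
df1 = pd.DataFrame({"title": ["a", "b"]})
df2 = pd.DataFrame({"title": ["c", "d"]})

# concat keeps the labels of both inputs, so the result is labeled 0, 1, 0, 1.
df = pd.concat([df1, df2]).drop_duplicates("title")
dup_labels = df.index.tolist()

# A label-based lookup on a duplicated label returns every matching row,
# not a single cell.
matches = df.loc[1, "title"]
n_matches = len(matches)

# After reset_index the labels are unique again: 0, 1, 2, 3.
df_reset = df.reset_index(drop=True)
unique_labels = df_reset.index.tolist()

print(dup_labels, n_matches, unique_labels)
```

The lookup on label 1 matches two rows, which is the same duplication I see in the debugger output above.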
So I hope that when the rows or columns of a dataframe change, pandas can maintain the index automatically as an internal mechanism, just like C++ vectors or arrays: after elements are deleted, the index (or iterator) stays a contiguous sequence of numbers, and users do not have to manage it. I also raise this as a point of competitor analysis and benchmarking. I hope this can be improved. Thank you.
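For reference, the closest existing behavior I am aware of is opt-in: both pd.concat and DataFrame.drop_duplicates accept ignore_index=True. The sketch below uses made-up sample frames (df1 and df2 are hypothetical) to show the contiguous index I would like to get without asking for it each time.

```python
import pandas as pd

# Hypothetical sample frames with one overlapping title.
df1 = pd.DataFrame({"title": ["a", "b"]})
df2 = pd.DataFrame({"title": ["b", "c"]})

# ignore_index=True on both calls relabels the rows 0..n-1,
# so no separate reset_index step is needed.
df = pd.concat([df1, df2], ignore_index=True).drop_duplicates(
    "title", ignore_index=True
)

df["name"] = None
for idx, row in df.iterrows():
    name_list = ["mike", "jake", "cook"]
    # Each label is unique here, so exactly one cell is written per iteration.
    df.at[idx, "name"] = ",".join(name_list)

print(df)
```

My request is for this contiguous relabeling to happen automatically rather than through a keyword on every call.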
Feature Description
n/a
Alternative Solutions
n/a
Additional Context
No response