-
-
Notifications
You must be signed in to change notification settings - Fork 10.9k
BUG: Fix datetime hash to be a bit more sane and start on timedelta #24408
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
3c07991
to
d1cfe49
Compare
if ((Py_hash_t)hash == -1) { | ||
return -2; | ||
} | ||
return (Py_hash_t)hash; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is the object identity hash. Did pandas need NaTs to be identical, because from a Python point of view it is problematic.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
in the pandas version we return NPY_DATETIME_NAT here. im not sure it makes a big difference bc the main place it would matter is in dict lookups which would fall back to an equality check which would then show up as not-equal
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(FWIW pd.NaT is a singleton)
If you're inclined to adapt the tests from pandas-dev/pandas#50960, i think the most relevant one would be something like
|
NumPy unique doesn't use hashing even for objects, could use a set, but I already check for equality and hash equality in the test? |
OK never mind then |
This is a start, it should work for datetime64, I am not attempting to do anything about timedelta yet, because we don't really have the timedeltastruct so it needs more reorganization. I don't like this, but TBH, I don't see that it is better to add a bad hack into pandas than to add the same bad hack into NumPy...
d9d64de
to
0aca9dd
Compare
gh-14622 was picked up and we merged this now, so this is superseded (which is awesome). |
This is a start, it should work for datetime64, I am not attempting to do anything about timedelta yet, because we don't really have the timedeltastruct so it needs more reorganization.
I don't like this, but TBH, I don't see that it is better to add a bad hack into pandas than to add the same bad hack into NumPy...
This is moving the start from Pandas in pandas-dev/pandas#50960 to NumPy. I chose to move it into the datetime.c file, but I don't care about where it lives. We can probably cut down the hash calculation a little, but TBH, I doubt it matters in practice and we can change it later.
Timedelta is proving more annoying. We need to cut corners even more heavily probably because a huge number of casts/comparisons that are allowed probably shouldn't be allowed to begin with.
CC @jbrockmendel