Skip to content

Navigation Menu

Sign in
Appearance settings

Search code, repositories, users, issues, pull requests...

Provide feedback

We read every piece of feedback, and take your input very seriously.

Saved searches

Use saved searches to filter your results more quickly

Appearance settings

BUG: Fix datetime hash to be a bit more sane and start on timedelta #24408

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 3 commits into from

Conversation

seberg
Copy link
Member

@seberg seberg commented Aug 13, 2023

This is a start, it should work for datetime64, I am not attempting to do anything about timedelta yet, because we don't really have the timedeltastruct so it needs more reorganization.

I don't like this, but TBH, I don't see that it is better to add a bad hack into pandas than to add the same bad hack into NumPy...


This is moving the start from Pandas in pandas-dev/pandas#50960 to NumPy. I chose to move it into the datetime.c file, but I don't care about where it lives. We can probably cut down the hash calculation a little, but TBH, I doubt it matters in practice and we can change it later.

Timedelta is proving more annoying. We need to cut corners even more heavily probably because a huge number of casts/comparisons that are allowed probably shouldn't be allowed to begin with.

CC @jbrockmendel

@seberg seberg force-pushed the datetime-insane-hash branch from 3c07991 to d1cfe49 Compare August 13, 2023 12:22
if ((Py_hash_t)hash == -1) {
return -2;
}
return (Py_hash_t)hash;
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the object identity hash. Did pandas need NaTs to be identical, because from a Python point of view it is problematic.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

in the pandas version we return NPY_DATETIME_NAT here. im not sure it makes a big difference bc the main place it would matter is in dict lookups which would fall back to an equality check which would then show up as not-equal

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(FWIW pd.NaT is a singleton)

@jbrockmendel
Copy link
Contributor

If you're inclined to adapt the tests from pandas-dev/pandas#50960, i think the most relevant one would be something like

def test_unique_datetime64_mismatched_unit(self):
    arr = np.array(
        [np.datetime64(1, "ms"), np.datetime64(1000, "us")], dtype=object
    )
    res = np.unique(arr)
    np.testing.assert_array_equal(res, arr[1:])

@seberg
Copy link
Member Author

seberg commented Aug 15, 2023

NumPy unique doesn't use hashing even for objects, could use a set, but I already check for equality and hash equality in the test?

@jbrockmendel
Copy link
Contributor

NumPy unique doesn't use hashing even for objects, could use a set, but I already check for equality and hash equality in the test?

OK never mind then

seberg added 2 commits August 15, 2023 18:31
This is a start, it should work for datetime64, I am not attempting
to do anything about timedelta yet, because we don't really have the
timedeltastruct so it needs more reorganization.

I don't like this, but TBH, I don't see that it is better to add
a bad hack into pandas than to add the same bad hack into NumPy...
@seberg seberg force-pushed the datetime-insane-hash branch from d9d64de to 0aca9dd Compare August 15, 2023 16:31
@seberg
Copy link
Member Author

seberg commented Nov 22, 2024

gh-14622 was picked up and we merged this now, so this is superseded (which is awesome).

@seberg seberg closed this Nov 22, 2024
@seberg seberg deleted the datetime-insane-hash branch November 22, 2024 14:31
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants
Morty Proxy This is a proxified and sanitized view of the page, visit original site.