Skip to content

Navigation Menu

Sign in
Appearance settings

Search code, repositories, users, issues, pull requests...

Provide feedback

We read every piece of feedback, and take your input very seriously.

Saved searches

Use saved searches to filter your results more quickly

Appearance settings

BUG : add __hash__ to AffineBase #2909

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 2 commits into from

Conversation

tacaswell
Copy link
Member

python3 requires that if a class defines __eq__ then in must define
__hash__ (so that if a == b then hash(a) == hash(b)). Define 64bit
hash by pulling the first 8 characters out of a sha1 hash of the matrix

closes #2828

python3 requires that if a class defines __eq__ then in must define
__hash__ (so that if a == b then hash(a) == hash(b)).  Define 64bit
hash by pulling the first 8 characters out of a sha1 hash of the matrix

closes matplotlib#2828
@tacaswell tacaswell added this to the v1.4.0 milestone Mar 19, 2014
# hash by pulling the first 8 characters out of a sha1 hash of the matrix
# see issue #2828
def __hash__(self):
return int(hashlib.sha1(self.get_matrix()).hexdigest()[:8], 16)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems like a correct approach. But I wonder if it's faster to do:

hash(self.get_matrix().tostring())

(Don't know if it is, just though the code would spend more time in C that way...)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

indeed

In [51]: tt = np.random.rand(3, 3)

In [52]: %timeit int(hashlib.sha1(tt).hexdigest()[:8], 16)
1000000 loops, best of 3: 1.58 µs per loop

In [53]: %timeit hash(tt.tostring())
1000000 loops, best of 3: 270 ns per loop

@mdboom
Copy link
Member

mdboom commented Mar 19, 2014

I think this is the right approach, but it is subtlely different than the old Python 2.x behavior -- which is that these objects were hashed on their identity (their pointer in memory essentially), and not on their content. I think this is probably more correct, but we should verify there aren't unintended consequences (though beyond running the test suite, I'm not sure how to anticipate any of those consequences).

@mdboom
Copy link
Member

mdboom commented Mar 19, 2014

@pelson: The PS backend maintains a dictionary where the keys are (path, transform) and the values are a chunk of PS content to track all of the clipping paths that have been seen. If the same clipping path is used twice, it will reuse the same Postscript content, rather than rewriting out a potentially long path each time.

On Python 2, objects are always hashable, based on their memory pointer. On Python 3, objects are hashable unless they customize __eq__, in which case you must also define __hash__ (which makes sense, when you think about it, because the two should match). It was good enough (on Python 2) that the transform hash was based on identity, since they are by convention immutable (though in reality they are mutable by virtue of being based on Numpy arrays). This fix has the potential to more correct, I think, in the face of unintended mutation, as well as possibly merging two different transforms with the same values and thus reducing file sizes (though I doubt the latter will ever happen in practice).

@tacaswell
Copy link
Member Author

However, this opens up an issue that (existed before) that this could be badly abused else where in the code I don't know the transform stack well, but skimming the code Affine2D looks like it's purpose is to be mutable .

It might be better to fix the problem where it is and use (path, tranfrom_hash) as the key instead.

@mdboom
Copy link
Member

mdboom commented Mar 19, 2014

I think with this PR merged, the net effect will be that (path, transform) as a key will be equivalent to (path, transform_hash) as a key anyway.

@tacaswell
Copy link
Member Author

However it leaves with hash(trans) which is not stable over the course of an objects life as it violates the data-model

If a class does not define an eq() method it should not define a hash() operation either; if it defines eq() but not hash(), its instances will not be usable as items in hashable collections. If a class defines mutable objects and implements an eq() method, it should not implement hash(), since the implementation of hashable collections requires that a key’s hash value is immutable (if the object’s hash value changes, it will be in the wrong hash bucket).

With in this context it's ok (as the transforms should probably not be changing on the way out to disk) but is probably not good.

@pelson
Copy link
Member

pelson commented Mar 19, 2014

I'm on my mobile so can't see everything within this PR, but as the transform is mutable it should only implement hash based on the object id and not its actual content.

@tacaswell
Copy link
Member Author

@pelson That introduces an inconsistency a different way, in that a == b =/= hash(a) == hash(b) which it should.

@pelson
Copy link
Member

pelson commented Mar 25, 2014

@pelson That introduces an inconsistency a different way, in that a == b =/= hash(a) == hash(b) which it should.

Sorry, I'm not following your notation - if I'm reading this correctly then you are raising concern that an object which is equal has a different hash? If so, that is a perfectly valid definition - anything else would mean we are computing hashes which are mutable, which is asking for trouble. This is also the existing behaviour in python2.

@tacaswell
Copy link
Member Author

The inconsistency of having equal objects with different hashes is the reason that py3k requires objects that define __eq__ also define a __hash__, we should not work back against that.

@tacaswell
Copy link
Member Author

I am closing this in favor of #2927

@tacaswell tacaswell closed this Mar 25, 2014
@tacaswell tacaswell deleted the add_affine_hash branch March 25, 2014 13:47
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

PS backend fails to save polar plot
4 participants
Morty Proxy This is a proxified and sanitized view of the page, visit original site.