Skip to content

Navigation Menu

Sign in
Appearance settings

Search code, repositories, users, issues, pull requests...

Provide feedback

We read every piece of feedback, and take your input very seriously.

Saved searches

Use saved searches to filter your results more quickly

Appearance settings

Remove md5 usage to prevent issues on FIPS enabled systems (closes #29603) #29608

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 2 commits into from
Feb 12, 2025

Conversation

martinky24
Copy link
Contributor

@martinky24 martinky24 commented Feb 12, 2025

PR summary

If you use a FIPS (Federal Information Processing Standards) compliant Python install, then you cannot use pyplot.show to display a plot with plt.rcParams['text.usetex'] = True due to a hashlib.md5 call in texmanager.py.

Instead of just fixing the one instance I ran into, I changed the other usages of hashlib.md5 in the codebase to nip those in the bud.

Per https://docs.python.org/3/library/hashlib.html#hash-algorithms, note that sha256 is in hashlib.algorithms_guaranteed (while md5 might not be), so it seems like a reasonable drop-in-place replacement.

Constructors for hash algorithms that are always present in this module are sha1(), sha224(), sha256(), sha384(), sha512(), sha3_224(), sha3_256(), sha3_384(), sha3_512(), shake_128(), shake_256(), blake2b(), and blake2s(). md5() is normally available as well, though it may be missing or blocked if you are using a rare “FIPS compliant” build of Python. These correspond to algorithms_guaranteed.

I think that these changes should be entirely transparent to end-users (unless they're on FIPS-enabled systems) and are merely implementation details, and thus do not require documentation changes. Correct me if this understanding is incorrect.

PR checklist

Copy link

@github-actions github-actions bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for opening your first PR into Matplotlib!

If you have not heard from us in a week or so, please leave a new comment below and that should bring it to our attention. Most of our reviewers are volunteers and sometimes things fall through the cracks.

You can also join us on gitter for real-time discussion.

For details on testing, writing docs, and our review process, please see the developer guide

We strive to be a welcoming and open project. Please follow our Code of Conduct.

@tacaswell tacaswell added this to the v3.10.1 milestone Feb 12, 2025
@jklymak
Copy link
Member

jklymak commented Feb 12, 2025

I'm not at all an expert but would these have passed FIPS if usedforsecurity was set to False, regardless of the algorithm? https://docs.python.org/3/library/hashlib.html

@martinky24
Copy link
Contributor Author

martinky24 commented Feb 12, 2025

I'm not at all an expert but would these have passed FIPS if usedforsecurity was set to False, regardless of the algorithm? https://docs.python.org/3/library/hashlib.html

Great question. I primarily went this route following the precedent from #18193. If it worked there, I figure it should work here.

It was a few years ago, but I did run across a static compliance scanner that wasn't smart enough to respect (or perhaps intentionally didn't respect?) the usedforsecurity flag. I don't recall any specific details, it was a few years ago.

The other aspect is that I don't think that any of the instances I changed used md5 for any reason other than blind convenience. As such, unless there is a performance concern, I'd say that in these use cases sha256 is just as convenient. Correct me if I am wrong here, perhaps md5 may have been chosen intentionally over other algorithms in one of these use cases?

In any event I am happy to discuss or look into this further if there's a desire.

@jklymak
Copy link
Member

jklymak commented Feb 12, 2025

I'm sure it was indeed chosen for convenience. But the point of labelling as not for security is to prevent whack a mole with security checkers for something that is just being done to create a convenient label.

…ations are not used for security-based purposes
@martinky24
Copy link
Contributor Author

martinky24 commented Feb 12, 2025

That's reasonable. Regardless of the algorithm used it can be added, which is helpful in-code documentation. I have added a follow-up commit.

@martinky24
Copy link
Contributor Author

I am not an expert in Github / Azure CI pipelining, but I do not think any of the failures are legitimate issues with my changes?

Copy link
Member

@jklymak jklymak left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe @anntzer had a minute for a second review as I think he has some expertise on these parts of the code. But this seems fine to me.

Did you test if FIPS is happy with the flag even with md5 ?

@tacaswell
Copy link
Member

I'm happy to merge this as is. A quick search suggests that getting a GHA running in FIPS mode is not trivial.

The conventional wisdom is "md5 is insecure but fast" does not seem to hold up:

In [6]: %timeit hashlib.md5(data, usedforsecurity=False).hexdigest()
46.9 μs ± 69.9 ns per loop (mean ± std. dev. of 7 runs, 10,000 loops each)

In [7]: %timeit hashlib.sha256(data, usedforsecurity=False).hexdigest()
20.3 μs ± 19 ns per loop (mean ± std. dev. of 7 runs, 10,000 loops each)

where data is len 48407 (a small jpeg) and

In [12]: %timeit hashlib.sha256(b'some short text', usedforsecurity=False).hexdigest()
246 ns ± 1.09 ns per loop (mean ± std. dev. of 7 runs, 1,000,000 loops each)

In [13]: %timeit hashlib.md5(b'some short text', usedforsecurity=False).hexdigest()
259 ns ± 0.872 ns per loop (mean ± std. dev. of 7 runs, 1,000,000 loops each)

so I see no reason to not always prefer sha256

@tacaswell tacaswell merged commit 2d5e503 into matplotlib:main Feb 12, 2025
41 checks passed
meeseeksmachine pushed a commit to meeseeksmachine/matplotlib that referenced this pull request Feb 12, 2025
Copy link

lumberbot-app bot commented Feb 12, 2025

Something went wrong ... Please have a look at my logs.

@martinky24
Copy link
Contributor Author

For the sake of posterity:

Did you test if FIPS is happy with the flag even with md5 ?

In a quick test in my FIPS environment, FIPS is happy with just usedforsecurity=False

Although I agree with the decision to change the algorithm anyway.

Thanks all!

@tacaswell
Copy link
Member

Congratulations on your first merged Maptlotlib PR @martinky24 🎉 Thank you for reporting and fixing this issue and I hope we hear from you again!

@jklymak
Copy link
Member

jklymak commented Feb 12, 2025

In a quick test in my FIPS environment, FIPS is happy with just usedforsecurity=False

Thats great - so we have somewhat future proofed ourselves against future changes to what is considered "secure"

@ksunden ksunden mentioned this pull request Mar 3, 2025
5 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[Bug]: Setting text.usetex=True in pyplot.rcParams Raises FIPS Compliance Errors
3 participants
Morty Proxy This is a proxified and sanitized view of the page, visit original site.