Skip to content

Navigation Menu

Sign in
Appearance settings

Search code, repositories, users, issues, pull requests...

Provide feedback

We read every piece of feedback, and take your input very seriously.

Saved searches

Use saved searches to filter your results more quickly

Appearance settings

Fix underflow issues due to float precision #19472

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 13 commits into from
Feb 18, 2021
Merged

Fix underflow issues due to float precision #19472

merged 13 commits into from
Feb 18, 2021

Conversation

dkobak
Copy link
Contributor

@dkobak dkobak commented Feb 16, 2021

Fixes #19471 by replacing floats with doubles in the Cython code for _binary_search_perplexity().

Copy link
Member

@TomDLT TomDLT left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the report and fix, it looks reasonable to me, especially as P also uses a (temporary) double precision. I guess we can also use double precision in the constants at the top of the file, although it might not change much for Cython.

Can you also add a bugfix entry in doc/whats_new/v1.0.rst?

@dkobak
Copy link
Contributor Author

dkobak commented Feb 16, 2021

Thanks @TomDLT. I added the description to v1.0.rst. I am not familiar with the formatting conventions there, but tried to copy the existing formatting.

I was not aware that the next release will be 1.0! Wow! Huge milestone. When is this release planned?

doc/whats_new/v1.0.rst Outdated Show resolved Hide resolved
@TomDLT
Copy link
Member

TomDLT commented Feb 16, 2021

I was not aware that the next release will be 1.0! Wow! Huge milestone. When is this release planned?

Don't know :)

dkobak and others added 2 commits February 17, 2021 17:10
Co-authored-by: Tom Dupré la Tour <tom.dupre-la-tour@m4x.org>
Copy link
Member

@TomDLT TomDLT left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess we can also use double precision in the constants at the top of the file, although it might not change much for Cython.

Do you know if it makes a difference ?

doc/whats_new/v1.0.rst Outdated Show resolved Hide resolved
doc/whats_new/v1.0.rst Outdated Show resolved Hide resolved
@dkobak
Copy link
Contributor Author

dkobak commented Feb 17, 2021

I guess we can also use double precision in the constants at the top of the file, although it might not change much for Cython.

I don't think it will change anything. These constants are only used in expressions like this sum_Pi = EPSILON_DBL and since sum_Pi is now double, it does not matter that EPSILON_DBL is defined as float.

It won't hurt to change them to double either, so if you prefer I can do it, but I think it should have no effect.

@TomDLT
Copy link
Member

TomDLT commented Feb 17, 2021

That's what I thought, I have no preference.

@dkobak
Copy link
Contributor Author

dkobak commented Feb 17, 2021

Thanks a lot for your help! Do I need to do anything here anymore? Like request further approvals or anything? This is my first PR to sklearn so I am not very familiar with the process.

@TomDLT
Copy link
Member

TomDLT commented Feb 17, 2021

Congrats for your first PR ! Also thanks for taking the time to improve the scikit-learn's implementation of tSNE. It might not be the best implementation, but it is likely used by many, so any improvement is very welcome !

Before a PR can be merged, it needs to be approved by two core developers

(from here)

You just need to wait a moment for other core developers to approve this PR. This might take some time since most developers have limited time to do code reviews. If it takes too much time, you can try to ping specific developers.

@TomDLT
Copy link
Member

TomDLT commented Feb 17, 2021

Also, ideally your PR should include a unit test, maybe based on the example you gave in the original issue.

Copy link
Member

@jeremiedbb jeremiedbb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm. I agree with @TomDLT that adding a test would be nice (in sklearn/manifold/tests/test_t_sne.py).

doc/whats_new/v1.0.rst Outdated Show resolved Hide resolved
Co-authored-by: Jérémie du Boisberranger <34657725+jeremiedbb@users.noreply.github.com>
@dkobak
Copy link
Contributor Author

dkobak commented Feb 18, 2021

OK. Let me try to add a test.

@jeremiedbb
Copy link
Member

could you move it in a dedicated test, maybe named test_binary_search_underflow ? I think it's better to have tests dedicated to test one thing. Also please reference this PR number in the comment.

@dkobak
Copy link
Contributor Author

dkobak commented Feb 18, 2021

@TomDLT @jeremiedbb Test added and all checks passed!

Copy link
Member

@jeremiedbb jeremiedbb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm. I confirm that the test fails on main. Thanks @dkobak

@jeremiedbb jeremiedbb merged commit e9c6fca into scikit-learn:main Feb 18, 2021
@glemaitre glemaitre mentioned this pull request Apr 22, 2021
12 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Underflow issues in the Cython perplexity search code used in t-SNE
3 participants
Morty Proxy This is a proxified and sanitized view of the page, visit original site.