Skip to content

Navigation Menu

Sign in
Appearance settings

Search code, repositories, users, issues, pull requests...

Provide feedback

We read every piece of feedback, and take your input very seriously.

Saved searches

Use saved searches to filter your results more quickly

Appearance settings

MAINT Use memoryviews in _random.pyx #25780

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged

Conversation

2357juan
Copy link
Contributor

@2357juan 2357juan commented Mar 8, 2023

Reference Issues/PRs

Fixes sklearn/utils/_random.pyx from #25484

What does this implement/fix? Explain your changes.

  • Replaces cnp.ndarray with memory views in sklearn/utils/_random.pyx

Any other comments?

The output needed to be casted to an numpy array as tests (for example test_svm.py) call .astype on the return of the modified functions.

I also tried to reason as to why cnp.int_t and int are the types as values are intuitively indices. The indices can be between 0<= index < inf. Should we be using an unsigned long maybe? So, a cnp.uint64_t and np.uint64 or maybe a uint32.

Copy link
Member

@jjerphan jjerphan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM once the call to cnp.import_array has been removed.

@thomasjpfan
Copy link
Member

LGTM once the call to cnp.import_array has been removed.

@jjerphan I'm always concerned with removing cnp.import_array when cnp is used for anything: #17054

For _random.pyx, we are still using cnp for data types. Is it still safe to remove cnp.import_array?

@jjerphan
Copy link
Member

jjerphan commented Mar 8, 2023

It is true that the safeness of the code after the removal of this call is uncertain.

I think that it is safe if we only are relying on cnp for types, yet I would like it to be sure ; but I think this approach is prone to introduce errors.

@da-woods: do you know if removing the call to cnp.import_array is safe? Thank you for your guidance!

@thomasjpfan
Copy link
Member

Thinking it over, I prefer to always include cnp.import_array() when cnp is cimported. This way, we do not have to look through the whole file and decide "is it safe not to call cnp.import_array()".

Secondly, if we decide to use NumPy's C-API which requires cnp.import_array() in the future, we could forget to include it and run into issues described in #17054.

@2357juan
Copy link
Contributor Author

2357juan commented Mar 9, 2023

It is true that the safeness of the code after the removal of this call is uncertain.

I think that it is safe if we only are relying on cnp for types, yet I would like it to be sure ; but I think this approach is prone to introduce errors.

@da-woods: do you know if removing the call to cnp.import_array is safe? Thank you for your guidance!

dawoods has covered this.

His recommendation is to include it. I think we are all in agreement that it should include the import.

Copy link
Contributor

@OmarManzoor OmarManzoor left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

@OmarManzoor
Copy link
Contributor

@jjerphan I think auto merge did not work as expected here.

@jjerphan
Copy link
Member

We need another maintainer to approve this PR so that it gets merged.

@OmarManzoor
Copy link
Contributor

We need another maintainer to approve this PR so that it gets merged.

Oh I see! That is one of the requirements.

@jjerphan jjerphan added the Quick Review For PRs that are quick to review label Mar 10, 2023
Copy link
Member

@thomasjpfan thomasjpfan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for the PR @2357juan ! I have one nit about using [::1] because we are sure that the data is contiguous. Otherwise this looks good!

@@ -78,7 +78,7 @@ cpdef _sample_without_replacement_with_tracking_selection(

cdef cnp.int_t i
cdef cnp.int_t j
cdef cnp.ndarray[cnp.int_t, ndim=1] out = np.empty((n_samples, ), dtype=int)
cdef cnp.int_t[:] out = np.empty((n_samples, ), dtype=int)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since we are sure that the data is contiguous:

Suggested change
cdef cnp.int_t[:] out = np.empty((n_samples, ), dtype=int)
cdef cnp.int_t[::1] out = np.empty((n_samples, ), dtype=int)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@thomasjpfan This is good feedback as it tests my understanding of this concept. Do you have any reference docs that I can read to further understand this?(pieces,tanenbaum,etc.. your fav reference here) I def have to read more source code but this is fun.

From what I understand, we can swap values but not pointers. In these sampling algorithms we are swapping values. Are these somewhat correct statements? Linked lists come to mind as a non contiguous example right?

Thanks for the brain exercise!

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@thomasjpfan here is a PR updating your Nits in case you are still interested.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@2357juan contiguous means all elements of the array are next to each other in memory. This page https://en.wikipedia.org/wiki/Stride_of_an_array describes why it's not always the case.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @2357juan, you can find more information about memory layout encodings in this section of Cython's documentation.

@@ -133,9 +133,9 @@ cpdef _sample_without_replacement_with_pool(cnp.int_t n_population,

cdef cnp.int_t i
cdef cnp.int_t j
cdef cnp.ndarray[cnp.int_t, ndim=1] out = np.empty((n_samples, ), dtype=int)
cdef cnp.int_t[:] out = np.empty((n_samples, ), dtype=int)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same here:

Suggested change
cdef cnp.int_t[:] out = np.empty((n_samples, ), dtype=int)
cdef cnp.int_t[::1] out = np.empty((n_samples, ), dtype=int)

Comment on lines +138 to 139
cdef cnp.int_t[:] pool = np.empty((n_population, ),
dtype=int)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same here:

Suggested change
cdef cnp.int_t[:] pool = np.empty((n_population, ),
dtype=int)
cdef cnp.int_t[::1] pool = np.empty((n_population, ), dtype=int)

@@ -195,7 +195,7 @@ cpdef _sample_without_replacement_with_reservoir_sampling(

cdef cnp.int_t i
cdef cnp.int_t j
cdef cnp.ndarray[cnp.int_t, ndim=1] out = np.empty((n_samples, ), dtype=int)
cdef cnp.int_t[:] out = np.empty((n_samples, ), dtype=int)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
cdef cnp.int_t[:] out = np.empty((n_samples, ), dtype=int)
cdef cnp.int_t[::1] out = np.empty((n_samples, ), dtype=int)

@da-woods
Copy link

da-woods commented Mar 11, 2023

Sorry for slightly slow reply. Cython will issue a (low level) warning if you cimport numpy without doing import_array() so from that point of view it probably makes sense to leave it in.

Practically I don't think it really matters here though

@jjerphan jjerphan removed the Quick Review For PRs that are quick to review label Mar 13, 2023
@jjerphan jjerphan merged commit 5ffec32 into scikit-learn:main Mar 13, 2023
Veghit pushed a commit to Veghit/scikit-learn that referenced this pull request Apr 15, 2023
Co-authored-by: Juan Gomez <{789543}+{2357juan}@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants
Morty Proxy This is a proxified and sanitized view of the page, visit original site.