-
-
Notifications
You must be signed in to change notification settings - Fork 25.9k
MAINT Use memoryviews in _random.pyx #25780
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
MAINT Use memoryviews in _random.pyx #25780
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM once the call to cnp.import_array
has been removed.
It is true that the safeness of the code after the removal of this call is uncertain. I think that it is safe if we only are relying on @da-woods: do you know if removing the call to |
Thinking it over, I prefer to always include Secondly, if we decide to use NumPy's C-API which requires |
dawoods has covered this. His recommendation is to include it. I think we are all in agreement that it should include the import. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM!
@jjerphan I think auto merge did not work as expected here. |
We need another maintainer to approve this PR so that it gets merged. |
Oh I see! That is one of the requirements. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for the PR @2357juan ! I have one nit about using [::1]
because we are sure that the data is contiguous. Otherwise this looks good!
@@ -78,7 +78,7 @@ cpdef _sample_without_replacement_with_tracking_selection( | ||
|
||
cdef cnp.int_t i | ||
cdef cnp.int_t j | ||
cdef cnp.ndarray[cnp.int_t, ndim=1] out = np.empty((n_samples, ), dtype=int) | ||
cdef cnp.int_t[:] out = np.empty((n_samples, ), dtype=int) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since we are sure that the data is contiguous:
cdef cnp.int_t[:] out = np.empty((n_samples, ), dtype=int) | |
cdef cnp.int_t[::1] out = np.empty((n_samples, ), dtype=int) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@thomasjpfan This is good feedback as it tests my understanding of this concept. Do you have any reference docs that I can read to further understand this?(pieces,tanenbaum,etc.. your fav reference here) I def have to read more source code but this is fun.
From what I understand, we can swap values but not pointers. In these sampling algorithms we are swapping values. Are these somewhat correct statements? Linked lists come to mind as a non contiguous example right?
Thanks for the brain exercise!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@thomasjpfan here is a PR updating your Nits in case you are still interested.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@2357juan contiguous means all elements of the array are next to each other in memory. This page https://en.wikipedia.org/wiki/Stride_of_an_array describes why it's not always the case.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi @2357juan, you can find more information about memory layout encodings in this section of Cython's documentation.
@@ -133,9 +133,9 @@ cpdef _sample_without_replacement_with_pool(cnp.int_t n_population, | ||
|
||
cdef cnp.int_t i | ||
cdef cnp.int_t j | ||
cdef cnp.ndarray[cnp.int_t, ndim=1] out = np.empty((n_samples, ), dtype=int) | ||
cdef cnp.int_t[:] out = np.empty((n_samples, ), dtype=int) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same here:
cdef cnp.int_t[:] out = np.empty((n_samples, ), dtype=int) | |
cdef cnp.int_t[::1] out = np.empty((n_samples, ), dtype=int) |
cdef cnp.int_t[:] pool = np.empty((n_population, ), | ||
dtype=int) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same here:
cdef cnp.int_t[:] pool = np.empty((n_population, ), | |
dtype=int) | |
cdef cnp.int_t[::1] pool = np.empty((n_population, ), dtype=int) |
@@ -195,7 +195,7 @@ cpdef _sample_without_replacement_with_reservoir_sampling( | ||
|
||
cdef cnp.int_t i | ||
cdef cnp.int_t j | ||
cdef cnp.ndarray[cnp.int_t, ndim=1] out = np.empty((n_samples, ), dtype=int) | ||
cdef cnp.int_t[:] out = np.empty((n_samples, ), dtype=int) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
cdef cnp.int_t[:] out = np.empty((n_samples, ), dtype=int) | |
cdef cnp.int_t[::1] out = np.empty((n_samples, ), dtype=int) |
Sorry for slightly slow reply. Cython will issue a (low level) warning if you Practically I don't think it really matters here though |
Co-authored-by: Juan Gomez <{789543}+{2357juan}@users.noreply.github.com>
Reference Issues/PRs
Fixes sklearn/utils/_random.pyx from #25484
What does this implement/fix? Explain your changes.
Any other comments?
The output needed to be casted to an numpy array as tests (for example test_svm.py) call .astype on the return of the modified functions.
I also tried to reason as to why cnp.int_t and int are the types as values are intuitively indices. The indices can be between 0<= index < inf. Should we be using an unsigned long maybe? So, a cnp.uint64_t and np.uint64 or maybe a uint32.