Description
Highly related to #14481 and maybe a little bit to #13986.
My understanding of the copy=False parameter of estimators is "allow inplace modifications of X".
When avoiding a copy is not possible (X doesn't have the right dtype or memory layout for instance), a copy is still triggered. I believe that X being read-only is a valid reason for still triggering a copy.
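To make the failure mode concrete, here is a minimal sketch using only NumPy. The `center_inplace` function is hypothetical and merely stands in for an estimator that modifies X inplace when copy=False; the read-only flag is set by hand to simulate a read-only input such as a read-only memmap.

```python
import numpy as np

# Simulate a read-only input (set by hand here; assumed to behave like a
# read-only memmap for the purpose of inplace writes).
X = np.arange(6, dtype=np.float64).reshape(3, 2)
X.setflags(write=False)


def center_inplace(X):
    """Hypothetical inplace transform, standing in for an estimator with copy=False."""
    X -= X.mean(axis=0)  # inplace write; fails on a read-only array
    return X


try:
    center_inplace(X)
    failed = False
except ValueError:
    # NumPy refuses to write into a read-only array.
    failed = True

# Copying first yields a writeable array, so the same transform succeeds.
X_copy = center_inplace(X.copy())
```

In this sketch, the only way for the inplace code path to work on a read-only X is to copy it first, which is what the proposal would do automatically.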
My main argument is that the user isn't always in control of the permissions of an input array within the whole pipeline, especially when joblib parallelism is enabled, which may create read-only memmaps. We've had a bunch of issues because of that, the latest being #28781. It's also poorly tested because it requires big arrays, which we try to avoid in the tests (although joblib 1.13 makes it easy to trigger with small arrays).
I wouldn't make check_array(copy=False) always trigger a copy when X is read-only, because the semantics of the copy param of check_array are not the same as those of the copy param of estimators. We could introduce a new param in check_array, like copy_if_readonly?
- Estimator has no copy param (i.e. doesn't intend to do inplace modification): check_array(copy=False, copy_if_readonly=False)
- Estimator has copy param: check_array(copy=self.copy, copy_if_readonly=True)
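A rough sketch of the copy logic this first option implies. `check_array_sketch` is a hypothetical stand-in, not the real check_array: it ignores dtype, layout and all other validation, and the `copy_if_readonly` parameter is only the proposal discussed here.

```python
import numpy as np


def check_array_sketch(X, copy=False, copy_if_readonly=False):
    """Hypothetical stand-in for check_array covering only the copy decision."""
    X = np.asarray(X)
    # Copy when explicitly requested, or when the caller intends to write
    # inplace (copy_if_readonly=True) but X is not writeable.
    if copy or (copy_if_readonly and not X.flags.writeable):
        return X.copy()
    return X


X_ro = np.zeros((3, 2))
X_ro.setflags(write=False)
X_rw = np.zeros((3, 2))

# Estimator without a copy param: never writes inplace, so no copy is needed.
out_no_copy_param = check_array_sketch(X_ro, copy=False, copy_if_readonly=False)

# Estimator with copy=False: a read-only X is copied, a writeable X is not.
out_readonly = check_array_sketch(X_ro, copy=False, copy_if_readonly=True)
out_writeable = check_array_sketch(X_rw, copy=False, copy_if_readonly=True)
```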
It could also be a third option for copy in check_array: True, False, "if_readonly":
- Estimator has no copy param: check_array(copy=False)
- Estimator has copy param: check_array(copy=self.copy or "if_readonly")
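The three-valued variant could look roughly like this. Again, `check_array_tristate_sketch` is a hypothetical helper that only models the copy decision, and the "if_readonly" value is the proposal, not an existing option.

```python
import numpy as np


def check_array_tristate_sketch(X, copy=False):
    """Hypothetical stand-in for check_array with copy in {True, False, "if_readonly"}."""
    X = np.asarray(X)
    if copy is True or (copy == "if_readonly" and not X.flags.writeable):
        return X.copy()
    return X


# How an estimator's boolean copy param would map onto the three values:
# True stays True, False becomes "if_readonly".
copy_when_true = True or "if_readonly"
copy_when_false = False or "if_readonly"

X_ro = np.zeros((3, 2))
X_ro.setflags(write=False)

out_plain = check_array_tristate_sketch(X_ro, copy=False)
out_if_readonly = check_array_tristate_sketch(X_ro, copy=copy_when_false)
```

Compared with a separate copy_if_readonly param, this keeps the signature unchanged but relies on the `self.copy or "if_readonly"` idiom at each call site.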