BUG: quantile should error when weights are all zeros #28595

Open
wants to merge 19 commits into
base: main
Streamlined testing, improved error handling capabilities
Tontonio3 committed Mar 28, 2025
commit c76b5ad09cc7f33f02211e0a78a3f87d0d7e600d
numpy/lib/_function_base_impl.py (21 changes: 13 additions & 8 deletions)
@@ -4535,21 +4535,26 @@ def quantile(a,
if axis is not None:
axis = _nx.normalize_axis_tuple(axis, a.ndim, argname="axis")
weights = _weights_are_valid(weights=weights, a=a, axis=axis)
if np.any(weights < 0):
raise ValueError("Weights must be non-negative.")
elif np.all(weights == 0):
raise ValueError("At least one weight must be non-zero")
if weights.dtype != object:
Contributor

A general comment: I think these checks should happen inside _weights_are_valid, which will ensure they are used for percentile as well.
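
To illustrate the point about sharing the checks, here is a minimal sketch, assuming a toy 1-D weighted quantile. The names check_weights, weighted_quantile and weighted_percentile are hypothetical stand-ins, not NumPy's internal helpers; the only claim is that validating in one shared helper covers both entry points.

```python
import numpy as np


def check_weights(weights):
    """Hypothetical shared validator (a stand-in, not NumPy's helper)."""
    weights = np.asarray(weights, dtype=float)
    if np.any(weights < 0):
        raise ValueError("Weights must be non-negative.")
    if np.all(weights == 0):
        raise ValueError("At least one weight must be non-zero.")
    return weights


def weighted_quantile(a, q, weights):
    """Toy inverted-CDF weighted quantile for 1-D input and scalar q."""
    a = np.asarray(a, dtype=float)
    w = check_weights(weights)  # the only place validation happens
    order = np.argsort(a)
    cdf = np.cumsum(w[order]) / np.sum(w)
    # First sorted position whose cumulative weight reaches q.
    idx = min(int(np.searchsorted(cdf, q, side="left")), a.size - 1)
    return a[order][idx]


def weighted_percentile(a, q, weights):
    """Delegates to the quantile sketch, so it inherits the same checks."""
    return weighted_quantile(a, q / 100, weights)
```

Because the percentile wrapper goes through the same helper, an all-zero weight vector is rejected on both paths without duplicating any check.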

if np.any(np.isinf(weights)):
Contributor

Another general comment: as written, the common case has to go through a lot of checks. I think it would be better to optimize for the common case, and not worry too much about distinguishing failure cases. E.g., you can do just one evaluation with:

if not np.all(np.isfinite(weights)):
    raise ValueError("weights must be finite.")
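
As a quick check of the suggestion above, the following sketch wraps the single np.isfinite evaluation in a hypothetical helper called require_finite (the name is an assumption for illustration). It shows that one pass rejects +inf, -inf and nan alike, at the cost of no longer distinguishing them in the error message.

```python
import numpy as np


def require_finite(weights):
    """Hypothetical helper: one np.isfinite pass covers inf, -inf and nan."""
    weights = np.asarray(weights, dtype=float)
    if not np.all(np.isfinite(weights)):
        raise ValueError("weights must be finite.")
    return weights
```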

raise ValueError("Weights must be non-infinite")
raise ValueError("Weights must be non-infinite.")
elif np.any(np.isnan(weights)):
raise ValueError("At least one weight is nan")
raise ValueError("At least one weight is nan.")
# np.isinf and np.isnan do not work on object-dtype arrays, and
# object-dtype arrays containing np.nan break the <, > and == operators,
# so this specific handling is needed (can be improved).
elif weights.dtype == object:
Contributor

Note that this loop can still give unexpected errors, because it relies on each element of the object array behaving like a numeric scalar. E.g.,

np.isnan(np.array([1.,None,np.inf])[1])
# TypeError: ufunc 'isnan' not supported for the input types, and the inputs could not be safely coerced to any supported types according to the casting rule ''safe''

This will be an uninformative error!

I think we have two choices: either do not check object dtype at all, or convert to float before checking (and skip the check if that conversion fails).
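
The second choice could look roughly like the sketch below. It is only an illustration under stated assumptions: finite_check_object_safe is a hypothetical name, and catching TypeError and ValueError is an assumption about which exceptions a failed conversion raises.

```python
import numpy as np


def finite_check_object_safe(weights):
    """Hypothetical helper: convert object arrays to float before checking."""
    weights = np.asanyarray(weights)
    to_check = weights
    if weights.dtype == object:
        try:
            to_check = weights.astype(float)
        except (TypeError, ValueError):
            # Conversion failed: skip the check rather than raise an
            # uninformative error from inside the validation code.
            return weights
    if not np.all(np.isfinite(to_check)):
        raise ValueError("weights must be finite.")
    return weights
```

The original object array is returned unchanged in every non-error case, so later code still sees the dtype the caller passed in.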

for w in weights:
if np.isnan(w):
raise ValueError("At least one weight is nan")
raise ValueError("At least one weight is nan.")
if np.isinf(w):
raise ValueError("Weights must be non-infinite")
raise ValueError("Weights must be non-infinite.")

if np.any(weights < 0):
raise ValueError("Weights must be non-negative.")
elif np.all(weights == 0):
Contributor

Here again we could ensure the common case remains fast by doing:

if np.any(weights <= 0):
    raise ValueError("weights must be non-negative and cannot be all zero.")
    # or, more explicit error messages,
    if np.all(weights == 0):
        raise ValueError("At least one weight must be non-zero.")
    else:
        raise ValueError("Weights must be non-negative.")

Contributor

Trying to keep this inline:

The issue with this is that some of the weights might be 0 while none of them are negative, so it would raise an error even though it shouldn't.

You're right, I was too sloppy in writing this. The else should be elif np.any(weights < 0), so that the case where some weights are 0 falls through (slowly, but better than making all cases slow!).

p.s. Given this, I'd probably swap the order, i.e.,

if np.any(weights <= 0):
    # Do these checks guarded by the above `if` to avoid slowing down the common case.
    if np.any(weights < 0):
        raise ValueError("Weights must be non-negative.")
    elif np.all(weights == 0):
        raise ValueError("At least one weight must be non-zero.")
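
For reference, a runnable sketch of the guarded form, wrapped in a hypothetical function named check_sign (the wrapper and its name are assumptions for illustration). It shows that a mix of zero and positive weights falls through without an error, while strictly positive weights only pay for the single outer comparison.

```python
import numpy as np


def check_sign(weights):
    """Hypothetical wrapper around the guarded sign checks."""
    weights = np.asarray(weights, dtype=float)
    if np.any(weights <= 0):
        # Only reached when at least one weight is zero or negative,
        # so the common all-positive case does one comparison pass.
        if np.any(weights < 0):
            raise ValueError("Weights must be non-negative.")
        elif np.all(weights == 0):
            raise ValueError("At least one weight must be non-zero.")
    return weights
```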

Member

@Tontonio3 I don't see how you responded to this suggestion. Please make sure all reviewer feedback is addressed before requesting re-review.

Contributor Author

@ngoldbaum You're right, I forgot to implement this.

raise ValueError("At least one weight must be non-zero.")


return _quantile_unchecked(
a, q, axis, out, overwrite_input, method, keepdims, weights)
numpy/lib/tests/test_function_base.py (63 changes: 15 additions & 48 deletions)
@@ -4142,58 +4142,25 @@ def test_closest_observation(self):
assert_equal(4, np.quantile(arr[0:9], q, method=m))
assert_equal(5, np.quantile(arr, q, method=m))

def test_inf_err(self):

m = "inverted_cdf"
q = 0.5
arr = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
wgt = np.ones(10)

for i in range(len(arr)):
wgt[i] = np.inf
with pytest.raises(ValueError) as ex:
a = np.quantile(arr, q, weights=wgt, method=m)
assert "Weights must be non-infinite" in str(ex)
wgt[i] = 1

for i in range(len(arr)):
wgt[i] = np.inf
with pytest.raises(ValueError) as ex:
a = np.quantile(arr, q, weights=wgt, method=m)
assert "Weights must be non-infinite" in str(ex)

def test_nan_err(self):
@pytest.mark.parametrize(["err_msg", "weight"],
Contributor

I'd parametrize over np.quantile and np.percentile as well - they should have the same errors.

Contributor

Also, if you pass in a list rather than an array, you could parametrize over dtype=float and dtype=object, to make this a little more readable.

Contributor Author

Done

[("Weights must be non-infinite.", np.array([1,np.inf, 1, 1])),
("Weights must be non-infinite.", np.array([1,np.inf, 1, 1], dtype=object)),
("Weights must be non-infinite.", np.array([1,-np.inf, 1, 1])),
("Weights must be non-infinite.", np.array([1,-np.inf, 1, 1], dtype=object)),
("Weights must be non-infinite.", np.array([1,np.inf, 1, np.inf])),
("At least one weight is nan.", np.array([1,np.nan, 1, 1])),
("At least one weight is nan.", np.array([1,np.nan, 1, 1], dtype=object)),
("At least one weight is nan.", np.array([1,np.nan, np.nan, 1])),
("At least one weight must be non-zero.", np.zeros(4))])
def test_inf_nan_err(self, err_msg, weight):

m = "inverted_cdf"
q = 0.5
arr = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
wgt = np.ones(10)

for i in range(len(arr)):
wgt[i] = np.nan
with pytest.raises(ValueError) as ex:
a = np.quantile(arr, q, weights=wgt, method=m)
assert "At least one weight is nan" in str(ex)
wgt[i] = 1

for i in range(len(arr)):
wgt[i] = np.nan
with pytest.raises(ValueError) as ex:
a = np.quantile(arr, q, weights=wgt, method=m)
assert "At least one weight is nan" in str(ex)

def test_all_zeroes_err(self):

m = "inverted_cdf"
q = 0.5
arr = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
wgt = np.zeros(10)
with pytest.raises(ValueError) as ex:
a = np.quantile(arr, q, weights=wgt, method=m)

assert "At least one weight must be non-zero" in str(ex)


arr = [1, 2, 3, 4]
with pytest.raises(ValueError, match=err_msg):
a = np.quantile(arr, q, weights=weight, method=m)

class TestLerp:
@hypothesis.given(t0=st.floats(allow_nan=False, allow_infinity=False,
min_value=0, max_value=1),