Skip to content

Navigation Menu

Sign in
Appearance settings

Search code, repositories, users, issues, pull requests...

Provide feedback

We read every piece of feedback, and take your input very seriously.

Saved searches

Use saved searches to filter your results more quickly

Appearance settings

BUG: Make arr = np.fromstring("\n", sep=" ") return []. #18495

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 3 commits into
base: main
Choose a base branch
Loading
from

Conversation

tdakic
Copy link

@tdakic tdakic commented Feb 25, 2021

Closes #18435.

arr = np.fromstring("\n", sep=" ") now retuns []
@charris charris changed the title Bug fix for 18435 BUG: Make arr = np.fromstring("\n", sep=" ") return []. Feb 25, 2021
@@ -73,7 +73,7 @@ fromstr_next_element(char **s, void *dptr, PyArray_Descr *dtype,
int r = dtype->f->fromstr(*s, dptr, &e, dtype);
/*
* fromstr always returns 0 for basic dtypes; s points to the end of the
* parsed string. If s is not changed an error occurred or the end was
* parsed string. If s is not changed an error occu5trred or the end was
* reached.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Spelling.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could also use a comma after "changed".

Copy link
Author

@tdakic tdakic left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, a typo :-(

Copy link
Author

@tdakic tdakic left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed!

@tdakic
Copy link
Author

tdakic commented Mar 3, 2021

With this fix, print(np.fromstring(", 1,2,3", sep= ",",dtype =np.int64)) will print [1 2 3]. Should it?

Base automatically changed from master to main March 4, 2021 02:05
@charris
Copy link
Member

charris commented Mar 26, 2021

close/reopen

@charris charris closed this Mar 26, 2021
@charris charris reopened this Mar 26, 2021
@charris charris self-assigned this Mar 26, 2021
@charris
Copy link
Member

charris commented Mar 26, 2021

Looking at this, I'm not convinced the result should be []. Apparently white space (\n) is interpreted as -1. For instance

In [17]: fromstring(' , ',sep=',')                                              
Out[17]: array([-1., -1.])

That may be a problem, but it is a different problem. I'm not yet clear where that value comes from. @seberg Thoughts?

This raises a FutureWarning with other separators.

n [7]: fromstring(',',sep=',')                                                 
<ipython-input-7-5aea430bb5d9>:1: DeprecationWarning: string or file could not be read to its end due to unmatched data; this will raise a ValueError in the future.
  fromstring(',',sep=',')
Out[7]: array([], dtype=float64)

@seberg
Copy link
Member

seberg commented Mar 27, 2021

Uh, that is strange... It seems you are right and whitespace is parsed as a number and that number is -1.... That explains how the -1. can be so persistent.

It basically drops out of this:

/*
* NumPyOS_ascii_strtod_plain:
*
* PyOS_ascii_strtod work-alike, with no enhanced features,
* for forward compatibility with Python >= 2.7
*/
static double
NumPyOS_ascii_strtod_plain(const char *s, char** endptr)
{
double result;
NPY_ALLOW_C_API_DEF;
NPY_ALLOW_C_API;
result = PyOS_string_to_double(s, endptr, NULL);
if (PyErr_Occurred()) {
if (endptr) {
*endptr = (char*)s;
}
PyErr_Clear();
}
NPY_DISABLE_C_API;
return result;
}

Now, that should indicate that an error occurred, by resetting the end-pointer to the start position on error, which then should trigger the error return path here:

char *e = *s;
int r = dtype->f->fromstr(*s, dptr, &e, dtype);
/*
* fromstr always returns 0 for basic dtypes; s points to the end of the
* parsed string. If s is not changed an error occurred or the end was
* reached.
*/
if (*s == e || r < 0) {

Which then has a negative return value indicating an error in the read.

All of that seems a bit strange and every single level of that code looks like it could be smoothened a bit more. Maybe we should try and see if we can do a proper deprecation of that -1 if the -1 works for reliably for everything that cannot be read.

The bug seems to be that call to string_is_fully_read which probably does the right thing for the skip function, but not for the parse function. Since it checks start >= end and equality holds here.

The easier fix should probably to properly propagate a -1 return value when not successfully reading from dtype->f->fromstr (or even lower). The only question is whether we want to try a deprecation there, I am not sure how easy it is.

Long story short, the -1. is definitely an accident (I wonder if it has always been there, or an artifact of fixing another bug). But if we want to gently "deprecate" it, I guess it might be a bit tricky. If we are happy to not be gentle about it, I expect the string_is_fully_read might just be unnecessary after an unsuccessfull value parsing attempt. (worst case, we have to check for a leading \0 maybe, in case the string is fully empty.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
Status: Needs decision
Development

Successfully merging this pull request may close these issues.

np.fromstring("\n", sep=" ") returns non-empty array
3 participants
Morty Proxy This is a proxified and sanitized view of the page, visit original site.