Bug report
Bug description:
OSS-Fuzz has found a heap buffer overflow in _PyTokenizer_ensure_utf8. Link to OSS-Fuzz bug report.

The root cause is that valid_utf8() in Parser/tokenizer/helpers.c checks continuation bytes in reverse order, thus reading s[expected] before s[1], on these lines:

cpython/Parser/tokenizer/helpers.c
Lines 497 to 499 in 8b7b5a9

When a multi-byte UTF-8 sequence is truncated - such as a 3-byte lead \xEA followed immediately by a null terminator - the backward loop reads past the end of the valid data before encountering the null byte that would stop it.

This is not a security-critical issue.
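
To make the read order concrete, here is a minimal sketch of the two loop directions. It is not the code from helpers.c and not the merged fix: the names is_continuation, check_backward, check_forward and the max_read parameter are hypothetical, added only to record the highest index each loop touches.

```c
#include <stddef.h>

/* Hypothetical helper: nonzero if b is a UTF-8 continuation byte (10xxxxxx). */
static int is_continuation(unsigned char b)
{
    return (b & 0xC0) == 0x80;
}

/* Backward order, as described in the report: starts at s[expected] and
   walks down to s[1]. The highest index read is stored in *max_read
   (illustrative instrumentation only). Returns the sequence length, or 0
   if a byte is not a continuation byte. */
static int check_backward(const unsigned char *s, int expected, int *max_read)
{
    *max_read = 0;
    for (int i = expected; i >= 1; i--) {
        if (i > *max_read) {
            *max_read = i;
        }
        if (!is_continuation(s[i])) {
            return 0;
        }
    }
    return expected + 1;
}

/* Forward order: starts at s[1], so a null terminator right after the
   lead byte stops the loop before any later byte is read. */
static int check_forward(const unsigned char *s, int expected, int *max_read)
{
    *max_read = 0;
    for (int i = 1; i <= expected; i++) {
        *max_read = i;
        if (!is_continuation(s[i])) {
            return 0;
        }
    }
    return expected + 1;
}
```

With a lead byte \xEA (two continuation bytes expected) followed directly by a null terminator, check_forward stops at index 1, while check_backward first reads index 2, which lies beyond the terminator. To try this safely, pass a buffer that is physically larger than the logical string so the extra read stays in bounds.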
CPython versions tested on:
CPython main branch
Operating systems tested on:
No response
Linked PRs
_PyTokenizer_ensure_utf8 #144807
_PyTokenizer_ensure_utf8 (GH-144807) #145287
_PyTokenizer_ensure_utf8 (GH-144807) #145441