Heap buffer overflow in _PyTokenizer_ensure_utf8

Bug report

Bug description:

OSS-Fuzz has found a heap buffer overflow in _PyTokenizer_ensure_utf8. Link to OSS-Fuzz bug report.

The root cause is that valid_utf8() in Parser/tokenizer/helpers.c checks continuation bytes in reverse order, reading s[expected] before s[1], on these lines:

cpython/Parser/tokenizer/helpers.c

Lines 497 to 499 in 8b7b5a9

    
    for (; expected; expected--)
        if (s[expected] < 0x80 || s[expected] >= 0xC0)
            return 0;

When a multi-byte UTF-8 sequence is truncated, such as a 3-byte lead \xEA followed immediately by a null terminator, the backward loop starts at s[2], which lies past the terminator. It therefore reads beyond the end of the valid data before it reaches the null byte at s[1] that would have stopped it.
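To illustrate why the order of the checks matters, here is a minimal sketch of the same continuation-byte test walking forward from s[1]. The helper name continuation_bytes_ok and its signature are hypothetical and used only for illustration; this is not the code in helpers.c and not necessarily how the issue gets fixed upstream. It assumes s points at the lead byte of a NUL-terminated buffer and that expected is the number of continuation bytes implied by that lead byte.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper (not CPython's actual code): returns 1 if the bytes
   s[1] .. s[expected] are all UTF-8 continuation bytes (0x80..0xBF),
   and 0 otherwise. Assumes the buffer is NUL-terminated. */
static int
continuation_bytes_ok(const unsigned char *s, int expected)
{
    assert(s != NULL);
    for (int i = 1; i <= expected; i++) {
        /* A NUL terminator (0x00) is below 0x80, so it fails this test
           and the loop returns before reading any byte after it. */
        if (s[i] < 0x80 || s[i] >= 0xC0) {
            return 0;
        }
    }
    return 1;
}
```

With the truncated input from the report (the bytes 0xEA, 0x00 and expected equal to 2), this forward walk rejects the sequence at s[1] and never touches s[2], whereas the reverse loop reads s[2] first.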

This is not a security-critical issue.

CPython versions tested on:

CPython main branch

Operating systems tested on:

No response

Heap buffer overflow in `_PyTokenizer_ensure_utf8` #144872
