Heap-use-after-free issue in pyexpat related to .ExternalEntityParserCreate #139400

@hartwork

Description


Bug report

Bug description:

When CPython is built with --with-address-sanitizer, the test below crashes every CPython version tested (3.9 through 3.15):

# Copyright (c) 2025 Sebastian Pipping <sebastian@pipping.org>
# Licensed under Zero-Clause BSD ("0BSD")

import pyexpat as expat
import unittest


class ParentParserLifetimeTest(unittest.TestCase):
    """
    Subparsers make use of their parent XML_Parser inside of Expat.
    As a result, parent parsers need to outlive subparsers.
    Regression test for issue 139400
    """
    def test_parent_parser_outlives_its_subparsers(self):
        parser = expat.ParserCreate()
        subparser = parser.ExternalEntityParserCreate(None)

        # Now try to cause garbage collection of the parent parser
        # while it's still being referenced by a related subparser
        del parser


if __name__ == '__main__':
    unittest.main()

The finding was first documented at #139367 (comment).

For 3.13, the AddressSanitizer crash details are:
# ./python ..test_file_with_test_class_above_added_dot_py.. -v
test_use_after_free__crash (__main__.UseAfterFreeCrashDemoTest.test_use_after_free__crash) ... =================================================================
==16187==ERROR: AddressSanitizer: heap-use-after-free on address 0x7cda1ad7e038 at pc 0x7f4a1af8a94f bp 0x7ffc61f63720 sp 0x7ffc61f63710
READ of size 8 at 0x7cda1ad7e038 thread T0
    #0 0x7f4a1af8a94e in getRootParserOf Modules/expat/xmlparse.c:8660
    #1 0x7f4a1af8a94e in expat_free Modules/expat/xmlparse.c:913
    #2 0x7f4a1af8a94e in expat_free Modules/expat/xmlparse.c:906
    #3 0x7f4a1af8a94e in PyExpat_XML_ParserFree Modules/expat/xmlparse.c:1997
    #4 0x7f4a1af70286 in xmlparse_dealloc Modules/pyexpat.c:1266
    #5 0x558a55f60555 in Py_DECREF Include/object.h:949
    #6 0x558a55f60555 in Py_XDECREF Include/object.h:1042
    #7 0x558a55f60555 in _PyFrame_ClearLocals Python/frame.c:104
    #8 0x558a55f60555 in _PyFrame_ClearExceptCode Python/frame.c:129
    #9 0x558a55e9df91 in clear_thread_frame Python/ceval.c:1682
    #10 0x558a55e9df91 in _PyEval_FrameClearAndPop Python/ceval.c:1709
    #11 0x558a55ec6b44 in _PyEval_EvalFrameDefault Python/generated_cases.c.h:5222
    #12 0x558a55b523d3 in _PyObject_VectorcallTstate Include/internal/pycore_call.h:168
    #13 0x558a55b523d3 in method_vectorcall Objects/classobject.c:93
    #14 0x558a55b4d5f6 in _PyVectorcall_Call Objects/call.c:273
    #15 0x558a55b4d5f6 in _PyObject_Call Objects/call.c:348
    #16 0x558a55b4d5f6 in PyObject_Call Objects/call.c:373
    #17 0x558a55eb1e8a in _PyEval_EvalFrameDefault Python/generated_cases.c.h:1362
    #18 0x558a55b4e383 in _PyObject_VectorcallDictTstate Objects/call.c:135
    #19 0x558a55b4e383 in _PyObject_Call_Prepend Objects/call.c:504
    #20 0x558a55ce88e6 in slot_tp_call Objects/typeobject.c:9570
    #21 0x558a55b470d1 in _PyObject_MakeTpCall Objects/call.c:242
    #22 0x558a55ebf7de in _PyEval_EvalFrameDefault Python/generated_cases.c.h:1850
    #23 0x558a55b523d3 in _PyObject_VectorcallTstate Include/internal/pycore_call.h:168
    #24 0x558a55b523d3 in method_vectorcall Objects/classobject.c:93
    #25 0x558a55b4d5f6 in _PyVectorcall_Call Objects/call.c:273
    #26 0x558a55b4d5f6 in _PyObject_Call Objects/call.c:348
    #27 0x558a55b4d5f6 in PyObject_Call Objects/call.c:373
    #28 0x558a55eb1e8a in _PyEval_EvalFrameDefault Python/generated_cases.c.h:1362
    #29 0x558a55b4e383 in _PyObject_VectorcallDictTstate Objects/call.c:135
    #30 0x558a55b4e383 in _PyObject_Call_Prepend Objects/call.c:504
    #31 0x558a55ce88e6 in slot_tp_call Objects/typeobject.c:9570
    #32 0x558a55b470d1 in _PyObject_MakeTpCall Objects/call.c:242
    #33 0x558a55ebf7de in _PyEval_EvalFrameDefault Python/generated_cases.c.h:1850
    #34 0x558a55b523d3 in _PyObject_VectorcallTstate Include/internal/pycore_call.h:168
    #35 0x558a55b523d3 in method_vectorcall Objects/classobject.c:93
    #36 0x558a55b4d5f6 in _PyVectorcall_Call Objects/call.c:273
    #37 0x558a55b4d5f6 in _PyObject_Call Objects/call.c:348
    #38 0x558a55b4d5f6 in PyObject_Call Objects/call.c:373
    #39 0x558a55eb1e8a in _PyEval_EvalFrameDefault Python/generated_cases.c.h:1362
    #40 0x558a55b4e383 in _PyObject_VectorcallDictTstate Objects/call.c:135
    #41 0x558a55b4e383 in _PyObject_Call_Prepend Objects/call.c:504
    #42 0x558a55ce88e6 in slot_tp_call Objects/typeobject.c:9570
    #43 0x558a55b470d1 in _PyObject_MakeTpCall Objects/call.c:242
    #44 0x558a55eaa121 in _PyEval_EvalFrameDefault Python/generated_cases.c.h:813
    #45 0x558a55b4e383 in _PyObject_VectorcallDictTstate Objects/call.c:135
    #46 0x558a55b4e383 in _PyObject_Call_Prepend Objects/call.c:504
    #47 0x558a55cfb652 in slot_tp_init Objects/typeobject.c:9816
    #48 0x558a55cd8107 in type_call Objects/typeobject.c:1997
    #49 0x558a55b470d1 in _PyObject_MakeTpCall Objects/call.c:242
    #50 0x558a55eaa121 in _PyEval_EvalFrameDefault Python/generated_cases.c.h:813
    #51 0x558a55ed8b3e in _PyEval_EvalFrame Include/internal/pycore_ceval.h:119
    #52 0x558a55ed8b3e in _PyEval_Vector Python/ceval.c:1820
    #53 0x558a55ed8b3e in PyEval_EvalCode Python/ceval.c:604
    #54 0x558a5600eb0e in run_eval_code_obj Python/pythonrun.c:1381
    #55 0x558a5600eb0e in run_eval_code_obj Python/pythonrun.c:1348
    #56 0x558a5600f127 in run_mod Python/pythonrun.c:1489
    #57 0x558a560138d0 in pyrun_file Python/pythonrun.c:1295
    #58 0x558a560138d0 in _PyRun_SimpleFileObject Python/pythonrun.c:517
    #59 0x558a5601421c in _PyRun_AnyFileObject Python/pythonrun.c:77
    #60 0x558a560833ec in pymain_run_file_obj Modules/main.c:410
    #61 0x558a560833ec in pymain_run_file Modules/main.c:429
    #62 0x558a560833ec in pymain_run_python Modules/main.c:696
    #63 0x558a56085156 in Py_RunMain Modules/main.c:775
    #64 0x558a56085156 in pymain_main Modules/main.c:805
    #65 0x558a56085156 in Py_BytesMain Modules/main.c:829
    #66 0x7f4a1b9a733f  (/lib64/libc.so.6+0x2733f)
    #67 0x7f4a1b9a73f8 in __libc_start_main (/lib64/libc.so.6+0x273f8)
    #68 0x558a55a23754 in _start ([..]/cpython/python+0x19c754)

0x7cda1ad7e038 is located 952 bytes inside of 1096-byte region [0x7cda1ad7dc80,0x7cda1ad7e0c8)
freed by thread T0 here:
    #0 0x7f4a1bd6b9eb  (/usr/lib/gcc/x86_64-pc-linux-gnu/15/libasan.so.8+0x11f9eb)
    #1 0x7f4a1af87e12 in expat_free Modules/expat/xmlparse.c:934
    #2 0x7f4a1af87e12 in expat_free Modules/expat/xmlparse.c:906
    #3 0x7f4a1af87e12 in PyExpat_XML_ParserFree Modules/expat/xmlparse.c:2011
    #4 0x7f4a1af70286 in xmlparse_dealloc Modules/pyexpat.c:1266
    #5 0x558a55f60555 in Py_DECREF Include/object.h:949
    #6 0x558a55f60555 in Py_XDECREF Include/object.h:1042
    #7 0x558a55f60555 in _PyFrame_ClearLocals Python/frame.c:104
    #8 0x558a55f60555 in _PyFrame_ClearExceptCode Python/frame.c:129
    #9 0x558a55e9df91 in clear_thread_frame Python/ceval.c:1682
    #10 0x558a55e9df91 in _PyEval_FrameClearAndPop Python/ceval.c:1709
    #11 0x558a55ec6b44 in _PyEval_EvalFrameDefault Python/generated_cases.c.h:5222
    #12 0x558a55b523d3 in _PyObject_VectorcallTstate Include/internal/pycore_call.h:168
    #13 0x558a55b523d3 in method_vectorcall Objects/classobject.c:93
    #14 0x558a55b4d5f6 in _PyVectorcall_Call Objects/call.c:273
    #15 0x558a55b4d5f6 in _PyObject_Call Objects/call.c:348
    #16 0x558a55b4d5f6 in PyObject_Call Objects/call.c:373
    #17 0x558a55eb1e8a in _PyEval_EvalFrameDefault Python/generated_cases.c.h:1362
    #18 0x558a55b4e383 in _PyObject_VectorcallDictTstate Objects/call.c:135
    #19 0x558a55b4e383 in _PyObject_Call_Prepend Objects/call.c:504
    #20 0x558a55ce88e6 in slot_tp_call Objects/typeobject.c:9570
    #21 0x558a55b470d1 in _PyObject_MakeTpCall Objects/call.c:242
    #22 0x558a55ebf7de in _PyEval_EvalFrameDefault Python/generated_cases.c.h:1850
    #23 0x558a55b523d3 in _PyObject_VectorcallTstate Include/internal/pycore_call.h:168
    #24 0x558a55b523d3 in method_vectorcall Objects/classobject.c:93
    #25 0x558a55b4d5f6 in _PyVectorcall_Call Objects/call.c:273
    #26 0x558a55b4d5f6 in _PyObject_Call Objects/call.c:348
    #27 0x558a55b4d5f6 in PyObject_Call Objects/call.c:373
    #28 0x558a55eb1e8a in _PyEval_EvalFrameDefault Python/generated_cases.c.h:1362
    #29 0x558a55b4e383 in _PyObject_VectorcallDictTstate Objects/call.c:135
    #30 0x558a55b4e383 in _PyObject_Call_Prepend Objects/call.c:504
    #31 0x558a55ce88e6 in slot_tp_call Objects/typeobject.c:9570
    #32 0x558a55b470d1 in _PyObject_MakeTpCall Objects/call.c:242
    #33 0x558a55ebf7de in _PyEval_EvalFrameDefault Python/generated_cases.c.h:1850
    #34 0x558a55b523d3 in _PyObject_VectorcallTstate Include/internal/pycore_call.h:168
    #35 0x558a55b523d3 in method_vectorcall Objects/classobject.c:93
    #36 0x558a55b4d5f6 in _PyVectorcall_Call Objects/call.c:273
    #37 0x558a55b4d5f6 in _PyObject_Call Objects/call.c:348
    #38 0x558a55b4d5f6 in PyObject_Call Objects/call.c:373
    #39 0x558a55eb1e8a in _PyEval_EvalFrameDefault Python/generated_cases.c.h:1362
    #40 0x558a55b4e383 in _PyObject_VectorcallDictTstate Objects/call.c:135
    #41 0x558a55b4e383 in _PyObject_Call_Prepend Objects/call.c:504
    #42 0x558a55ce88e6 in slot_tp_call Objects/typeobject.c:9570
    #43 0x558a55b470d1 in _PyObject_MakeTpCall Objects/call.c:242
    #44 0x558a55eaa121 in _PyEval_EvalFrameDefault Python/generated_cases.c.h:813
    #45 0x558a55b4e383 in _PyObject_VectorcallDictTstate Objects/call.c:135
    #46 0x558a55b4e383 in _PyObject_Call_Prepend Objects/call.c:504
    #47 0x558a55cfb652 in slot_tp_init Objects/typeobject.c:9816
    #48 0x558a55cd8107 in type_call Objects/typeobject.c:1997

previously allocated by thread T0 here:
    #0 0x7f4a1bd6ceab in malloc (/usr/lib/gcc/x86_64-pc-linux-gnu/15/libasan.so.8+0x120eab)
    #1 0x7f4a1af8b227 in parserCreate Modules/expat/xmlparse.c:1364
    #2 0x7f4a1af6f0d1 in newxmlparseobject Modules/pyexpat.c:1211
    #3 0x7f4a1af6f0d1 in pyexpat_ParserCreate_impl Modules/pyexpat.c:1609
    #4 0x7f4a1af6f0d1 in pyexpat_ParserCreate Modules/clinic/pyexpat.c.h:511
    #5 0x558a55c4ed39 in cfunction_vectorcall_FASTCALL_KEYWORDS Objects/methodobject.c:440
    #6 0x558a55b48813 in _PyObject_VectorcallTstate Include/internal/pycore_call.h:168
    #7 0x558a55b48813 in PyObject_Vectorcall Objects/call.c:327
    #8 0x558a55eaa121 in _PyEval_EvalFrameDefault Python/generated_cases.c.h:813
    #9 0x558a55b523d3 in _PyObject_VectorcallTstate Include/internal/pycore_call.h:168
    #10 0x558a55b523d3 in method_vectorcall Objects/classobject.c:93
    #11 0x558a55b4d5f6 in _PyVectorcall_Call Objects/call.c:273
    #12 0x558a55b4d5f6 in _PyObject_Call Objects/call.c:348
    #13 0x558a55b4d5f6 in PyObject_Call Objects/call.c:373
    #14 0x558a55eb1e8a in _PyEval_EvalFrameDefault Python/generated_cases.c.h:1362
    #15 0x558a55b4e383 in _PyObject_VectorcallDictTstate Objects/call.c:135
    #16 0x558a55b4e383 in _PyObject_Call_Prepend Objects/call.c:504
    #17 0x558a55ce88e6 in slot_tp_call Objects/typeobject.c:9570
    #18 0x558a55b470d1 in _PyObject_MakeTpCall Objects/call.c:242
    #19 0x558a55ebf7de in _PyEval_EvalFrameDefault Python/generated_cases.c.h:1850
    #20 0x558a55b523d3 in _PyObject_VectorcallTstate Include/internal/pycore_call.h:168
    #21 0x558a55b523d3 in method_vectorcall Objects/classobject.c:93
    #22 0x558a55b4d5f6 in _PyVectorcall_Call Objects/call.c:273
    #23 0x558a55b4d5f6 in _PyObject_Call Objects/call.c:348
    #24 0x558a55b4d5f6 in PyObject_Call Objects/call.c:373
    #25 0x558a55eb1e8a in _PyEval_EvalFrameDefault Python/generated_cases.c.h:1362
    #26 0x558a55b4e383 in _PyObject_VectorcallDictTstate Objects/call.c:135
    #27 0x558a55b4e383 in _PyObject_Call_Prepend Objects/call.c:504
    #28 0x558a55ce88e6 in slot_tp_call Objects/typeobject.c:9570
    #29 0x558a55b470d1 in _PyObject_MakeTpCall Objects/call.c:242
    #30 0x558a55ebf7de in _PyEval_EvalFrameDefault Python/generated_cases.c.h:1850
    #31 0x558a55b523d3 in _PyObject_VectorcallTstate Include/internal/pycore_call.h:168
    #32 0x558a55b523d3 in method_vectorcall Objects/classobject.c:93
    #33 0x558a55b4d5f6 in _PyVectorcall_Call Objects/call.c:273
    #34 0x558a55b4d5f6 in _PyObject_Call Objects/call.c:348
    #35 0x558a55b4d5f6 in PyObject_Call Objects/call.c:373
    #36 0x558a55eb1e8a in _PyEval_EvalFrameDefault Python/generated_cases.c.h:1362
    #37 0x558a55b4e383 in _PyObject_VectorcallDictTstate Objects/call.c:135
    #38 0x558a55b4e383 in _PyObject_Call_Prepend Objects/call.c:504
    #39 0x558a55ce88e6 in slot_tp_call Objects/typeobject.c:9570
    #40 0x558a55b470d1 in _PyObject_MakeTpCall Objects/call.c:242
    #41 0x558a55eaa121 in _PyEval_EvalFrameDefault Python/generated_cases.c.h:813
    #42 0x558a55b4e383 in _PyObject_VectorcallDictTstate Objects/call.c:135
    #43 0x558a55b4e383 in _PyObject_Call_Prepend Objects/call.c:504
    #44 0x558a55cfb652 in slot_tp_init Objects/typeobject.c:9816
    #45 0x558a55cd8107 in type_call Objects/typeobject.c:1997

SUMMARY: AddressSanitizer: heap-use-after-free Modules/expat/xmlparse.c:8660 in getRootParserOf
Shadow bytes around the buggy address:
  0x7cda1ad7dd80: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd
  0x7cda1ad7de00: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd
  0x7cda1ad7de80: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd
  0x7cda1ad7df00: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd
  0x7cda1ad7df80: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd
=>0x7cda1ad7e000: fd fd fd fd fd fd fd[fd]fd fd fd fd fd fd fd fd
  0x7cda1ad7e080: fd fd fd fd fd fd fd fd fd fa fa fa fa fa fa fa
  0x7cda1ad7e100: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
  0x7cda1ad7e180: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd
  0x7cda1ad7e200: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd
  0x7cda1ad7e280: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd
Shadow byte legend (one shadow byte represents 8 application bytes):
  Addressable:           00
  Partially addressable: 01 02 03 04 05 06 07 
  Heap left redzone:       fa
  Freed heap region:       fd
  Stack left redzone:      f1
  Stack mid redzone:       f2
  Stack right redzone:     f3
  Stack after return:      f5
  Stack use after scope:   f8
  Global redzone:          f9
  Global init order:       f6
  Poisoned by user:        f7
  Container overflow:      fc
  Array cookie:            ac
  Intra object redzone:    bb
  ASan internal:           fe
  Left alloca redzone:     ca
  Right alloca redzone:    cb
==16187==ABORTING

My understanding is that there is a bug in the graph of object relations and that the same parser instance is being freed twice as a consequence.
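
To make the lifetime requirement concrete, here is a minimal pure-Python model of the object relation, not pyexpat code. The classes `ParentParser` and `SubParser` and the function `parent_alive_after_del` are hypothetical stand-ins. The sketch assumes that one plausible way to satisfy "parent parsers need to outlive subparsers" is for each subparser to hold a strong reference to its parent. It is not meant to describe how pyexpat is or should be implemented.

```python
import gc
import weakref


class ParentParser:
    """Hypothetical stand-in for a parent parser (not a pyexpat type)."""


class SubParser:
    """Hypothetical stand-in for a subparser that depends on its parent."""

    def __init__(self, parent):
        # Strong reference: the parent cannot be collected
        # before this subparser is.
        self._parent = parent


def parent_alive_after_del():
    parent = ParentParser()
    # A weak reference observes the parent without keeping it alive.
    probe = weakref.ref(parent)
    sub = SubParser(parent)

    # Drop the caller's only direct reference, as the test above does.
    del parent
    gc.collect()
    alive_with_sub = probe() is not None

    # Once the subparser is gone, nothing references the parent anymore.
    del sub
    gc.collect()
    alive_without_sub = probe() is not None

    return alive_with_sub, alive_without_sub


if __name__ == '__main__':
    print(parent_alive_after_del())
```

In this model the parent stays alive for as long as the subparser exists and is released only afterwards, which is the ordering the regression test above expects.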

CC @picnixz

CPython versions tested on:

3.15, 3.14, 3.13, 3.12, 3.11, 3.10, 3.9

Operating systems tested on:

Other, Windows, macOS, Linux
