Commit 2007624

hartwork and gpshead authored
[3.9] gh-115398: Expose Expat >=2.6.0 reparse deferral API (CVE-2023-52425) (GH-115623) (GH-116272)
Allow controlling Expat >=2.6.0 reparse deferral (CVE-2023-52425) by adding five new methods:

- `xml.etree.ElementTree.XMLParser.flush`
- `xml.etree.ElementTree.XMLPullParser.flush`
- `xml.parsers.expat.xmlparser.GetReparseDeferralEnabled`
- `xml.parsers.expat.xmlparser.SetReparseDeferralEnabled`
- `xml.sax.expatreader.ExpatParser.flush`

Based on the "flush" idea from #115138 (comment).

Includes code suggested-by: Snild Dolkow <snild@sony.com> and by core dev Serhiy Storchaka.

Co-authored-by: Gregory P. Smith <greg@krypto.org>
1 parent 468ba95 commit 2007624

14 files changed: +435 -20 lines changed

‎Doc/library/pyexpat.rst

Lines changed: 36 additions & 0 deletions
@@ -196,6 +196,42 @@ XMLParser Objects
196196
:exc:`ExpatError` to be raised with the :attr:`code` attribute set to
197197
``errors.codes[errors.XML_ERROR_CANT_CHANGE_FEATURE_ONCE_PARSING]``.
198198

199+
.. method:: xmlparser.SetReparseDeferralEnabled(enabled)
200+
201+
.. warning::
202+
203+
Calling ``SetReparseDeferralEnabled(False)`` has security implications,
204+
as detailed below; please make sure to understand these consequences
205+
prior to using the ``SetReparseDeferralEnabled`` method.
206+
207+
Expat 2.6.0 introduced a security mechanism called "reparse deferral"
208+
where instead of causing denial of service through quadratic runtime
209+
from reparsing large tokens, reparsing of unfinished tokens is now delayed
210+
by default until a sufficient amount of input is reached.
211+
Due to this delay, registered handlers may, depending on the sizing of
212+
input chunks pushed to Expat, no longer be called right after pushing new
213+
input to the parser. Where immediate feedback and taking over responsibility
214+
of protecting against denial of service from large tokens are both wanted,
215+
calling ``SetReparseDeferralEnabled(False)`` disables reparse deferral
216+
for the current Expat parser instance, temporarily or altogether.
217+
Calling ``SetReparseDeferralEnabled(True)`` allows re-enabling reparse
218+
deferral.
219+
220+
Note that :meth:`SetReparseDeferralEnabled` has been backported to some
221+
prior releases of CPython as a security fix. Check for availability of
222+
:meth:`SetReparseDeferralEnabled` using :func:`hasattr` if used in code
223+
running across a variety of Python versions.
224+
225+
.. versionadded:: 3.9.19
226+
227+
.. method:: xmlparser.GetReparseDeferralEnabled()
228+
229+
Returns whether reparse deferral is currently enabled for the given
230+
Expat parser instance.
231+
232+
.. versionadded:: 3.9.19
233+
234+
199235
:class:`xmlparser` objects have the following attributes:
200236

201237
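
The documentation added above describes how reparse deferral changes when handlers fire. The following is a minimal sketch of that behaviour, not part of the commit: the helper name `parse_in_chunks` and its `disable_deferral` parameter are made up for illustration, and the `hasattr` guard follows the availability note in the added text. Disabling deferral gives up the denial-of-service protection described in the warning.

```python
import xml.parsers.expat as expat


def parse_in_chunks(chunks, disable_deferral=False):
    """Feed chunks to an Expat parser; snapshot handler calls after each chunk."""
    started = []    # element names reported by the start handler so far
    snapshots = []  # copy of `started` taken after each chunk

    parser = expat.ParserCreate()
    parser.StartElementHandler = lambda name, attrs: started.append(name)

    # The setter is absent on Python builds that lack the backport.
    if disable_deferral and hasattr(parser, "SetReparseDeferralEnabled"):
        parser.SetReparseDeferralEnabled(False)

    for chunk in chunks:
        parser.Parse(chunk, False)
        snapshots.append(list(started))

    parser.Parse(b"", True)  # final call: any deferred input is parsed now
    return snapshots, started


snapshots, final = parse_in_chunks([b"<doc", b"/>"], disable_deferral=True)
print(snapshots, final)
```

Whether the second snapshot already contains `doc` depends on the linked Expat version and on whether the setter is available. The final list contains it in either case.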

‎Doc/library/xml.etree.elementtree.rst

Lines changed: 39 additions & 0 deletions
@@ -165,6 +165,11 @@ data but would still like to have incremental parsing capabilities, take a look
165165
at :func:`iterparse`. It can be useful when you're reading a large XML document
166166
and don't want to hold it wholly in memory.
167167

168+
Where *immediate* feedback through events is wanted, calling method
169+
:meth:`XMLPullParser.flush` can help reduce delay;
170+
please make sure to study the related security notes.
171+
172+
168173
Finding interesting elements
169174
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
170175

@@ -1352,6 +1357,24 @@ XMLParser Objects
13521357

13531358
Feeds data to the parser. *data* is encoded data.
13541359

1360+
1361+
.. method:: flush()
1362+
1363+
Triggers parsing of any previously fed unparsed data, which can be
1364+
used to ensure more immediate feedback, in particular with Expat >=2.6.0.
1365+
The implementation of :meth:`flush` temporarily disables reparse deferral
1366+
with Expat (if currently enabled) and triggers a reparse.
1367+
Disabling reparse deferral has security consequences; please see
1368+
:meth:`xml.parsers.expat.xmlparser.SetReparseDeferralEnabled` for details.
1369+
1370+
Note that :meth:`flush` has been backported to some prior releases of
1371+
CPython as a security fix. Check for availability of :meth:`flush`
1372+
using :func:`hasattr` if used in code running across a variety of Python
1373+
versions.
1374+
1375+
.. versionadded:: 3.9.19
1376+
1377+
13551378
:meth:`XMLParser.feed` calls *target*\'s ``start(tag, attrs_dict)`` method
13561379
for each opening tag, its ``end(tag)`` method for each closing tag, and data
13571380
is processed by method ``data(data)``. For further supported callback
@@ -1413,6 +1436,22 @@ XMLPullParser Objects
14131436

14141437
Feed the given bytes data to the parser.
14151438

1439+
.. method:: flush()
1440+
1441+
Triggers parsing of any previously fed unparsed data, which can be
1442+
used to ensure more immediate feedback, in particular with Expat >=2.6.0.
1443+
The implementation of :meth:`flush` temporarily disables reparse deferral
1444+
with Expat (if currently enabled) and triggers a reparse.
1445+
Disabling reparse deferral has security consequences; please see
1446+
:meth:`xml.parsers.expat.xmlparser.SetReparseDeferralEnabled` for details.
1447+
1448+
Note that :meth:`flush` has been backported to some prior releases of
1449+
CPython as a security fix. Check for availability of :meth:`flush`
1450+
using :func:`hasattr` if used in code running across a variety of Python
1451+
versions.
1452+
1453+
.. versionadded:: 3.9.19
1454+
14161455
.. method:: close()
14171456

14181457
Signal the parser that the data stream is terminated. Unlike
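
As an illustration of the `XMLPullParser.flush` documentation above, here is a small sketch that flushes after every chunk when the method is available. The function name `feed_and_collect` is hypothetical, and the `hasattr` check follows the backport note in the added text. Since `flush` temporarily disables reparse deferral, this pattern is only appropriate where the caller accepts the related security trade-off.

```python
import xml.etree.ElementTree as ET


def feed_and_collect(chunks):
    """Feed chunks to an XMLPullParser, flushing after each one if possible."""
    parser = ET.XMLPullParser(events=("start", "end"))
    seen = []

    for chunk in chunks:
        parser.feed(chunk)
        # flush() only exists on releases that include this change.
        if hasattr(parser, "flush"):
            parser.flush()
        seen.extend((event, elem.tag) for event, elem in parser.read_events())

    parser.close()
    # Pick up any events that were only produced when the stream ended.
    seen.extend((event, elem.tag) for event, elem in parser.read_events())
    return seen


events = feed_and_collect(["<doc", ">", "</doc>"])
print(events)
```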

‎Include/pyexpat.h

Lines changed: 3 additions & 1 deletion
@@ -48,8 +48,10 @@ struct PyExpat_CAPI
4848
enum XML_Status (*SetEncoding)(XML_Parser parser, const XML_Char *encoding);
4949
int (*DefaultUnknownEncodingHandler)(
5050
void *encodingHandlerData, const XML_Char *name, XML_Encoding *info);
51-
/* might be none for expat < 2.1.0 */
51+
/* might be NULL for expat < 2.1.0 */
5252
int (*SetHashSalt)(XML_Parser parser, unsigned long hash_salt);
53+
/* might be NULL for expat < 2.6.0 */
54+
XML_Bool (*SetReparseDeferralEnabled)(XML_Parser parser, XML_Bool enabled);
5355
/* always add new stuff to the end! */
5456
};
5557

‎Lib/test/test_pyexpat.py

Lines changed: 54 additions & 0 deletions
@@ -730,5 +730,59 @@ def resolve_entity(context, base, system_id, public_id):
730730
self.assertEqual(handler_call_args, [("bar", "baz")])
731731

732732

733+
class ReparseDeferralTest(unittest.TestCase):
734+
def test_getter_setter_round_trip(self):
735+
parser = expat.ParserCreate()
736+
enabled = (expat.version_info >= (2, 6, 0))
737+
738+
self.assertIs(parser.GetReparseDeferralEnabled(), enabled)
739+
parser.SetReparseDeferralEnabled(False)
740+
self.assertIs(parser.GetReparseDeferralEnabled(), False)
741+
parser.SetReparseDeferralEnabled(True)
742+
self.assertIs(parser.GetReparseDeferralEnabled(), enabled)
743+
744+
def test_reparse_deferral_enabled(self):
745+
if expat.version_info < (2, 6, 0):
746+
self.skipTest(f'Expat {expat.version_info} does not '
747+
'support reparse deferral')
748+
749+
started = []
750+
751+
def start_element(name, _):
752+
started.append(name)
753+
754+
parser = expat.ParserCreate()
755+
parser.StartElementHandler = start_element
756+
self.assertTrue(parser.GetReparseDeferralEnabled())
757+
758+
for chunk in (b'<doc', b'/>'):
759+
parser.Parse(chunk, False)
760+
761+
# The key test: Have handlers already fired? Expecting: no.
762+
self.assertEqual(started, [])
763+
764+
parser.Parse(b'', True)
765+
766+
self.assertEqual(started, ['doc'])
767+
768+
def test_reparse_deferral_disabled(self):
769+
started = []
770+
771+
def start_element(name, _):
772+
started.append(name)
773+
774+
parser = expat.ParserCreate()
775+
parser.StartElementHandler = start_element
776+
if expat.version_info >= (2, 6, 0):
777+
parser.SetReparseDeferralEnabled(False)
778+
self.assertFalse(parser.GetReparseDeferralEnabled())
779+
780+
for chunk in (b'<doc', b'/>'):
781+
parser.Parse(chunk, False)
782+
783+
# The key test: Have handlers already fired? Expecting: yes.
784+
self.assertEqual(started, ['doc'])
785+
786+
733787
if __name__ == "__main__":
734788
unittest.main()

‎Lib/test/test_sax.py

Lines changed: 51 additions & 0 deletions
@@ -18,6 +18,7 @@
1818
from io import BytesIO, StringIO
1919
import codecs
2020
import os.path
21+
import pyexpat
2122
import shutil
2223
from urllib.error import URLError
2324
import urllib.request
@@ -1210,6 +1211,56 @@ def test_expat_incremental_reset(self):
12101211

12111212
self.assertEqual(result.getvalue(), start + b"<doc>text</doc>")
12121213

1214+
def test_flush_reparse_deferral_enabled(self):
1215+
if pyexpat.version_info < (2, 6, 0):
1216+
self.skipTest(f'Expat {pyexpat.version_info} does not support reparse deferral')
1217+
1218+
result = BytesIO()
1219+
xmlgen = XMLGenerator(result)
1220+
parser = create_parser()
1221+
parser.setContentHandler(xmlgen)
1222+
1223+
for chunk in ("<doc", ">"):
1224+
parser.feed(chunk)
1225+
1226+
self.assertEqual(result.getvalue(), start) # i.e. no elements started
1227+
self.assertTrue(parser._parser.GetReparseDeferralEnabled())
1228+
1229+
parser.flush()
1230+
1231+
self.assertTrue(parser._parser.GetReparseDeferralEnabled())
1232+
self.assertEqual(result.getvalue(), start + b"<doc>")
1233+
1234+
parser.feed("</doc>")
1235+
parser.close()
1236+
1237+
self.assertEqual(result.getvalue(), start + b"<doc></doc>")
1238+
1239+
def test_flush_reparse_deferral_disabled(self):
1240+
result = BytesIO()
1241+
xmlgen = XMLGenerator(result)
1242+
parser = create_parser()
1243+
parser.setContentHandler(xmlgen)
1244+
1245+
for chunk in ("<doc", ">"):
1246+
parser.feed(chunk)
1247+
1248+
if pyexpat.version_info >= (2, 6, 0):
1249+
parser._parser.SetReparseDeferralEnabled(False)
1250+
1251+
self.assertEqual(result.getvalue(), start) # i.e. no elements started
1252+
self.assertFalse(parser._parser.GetReparseDeferralEnabled())
1253+
1254+
parser.flush()
1255+
1256+
self.assertFalse(parser._parser.GetReparseDeferralEnabled())
1257+
self.assertEqual(result.getvalue(), start + b"<doc>")
1258+
1259+
parser.feed("</doc>")
1260+
parser.close()
1261+
1262+
self.assertEqual(result.getvalue(), start + b"<doc></doc>")
1263+
12131264
# ===== Locator support
12141265

12151266
def test_expat_locator_noinfo(self):
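
Outside the test suite, the SAX `flush` method exercised by the tests above might be used roughly as follows. This is a sketch only: it assumes `xml.sax.make_parser()` returns the Expat-based reader (the default), and it guards the call with `hasattr` because `flush` is missing on releases without this change.

```python
from io import BytesIO
from xml.sax import make_parser
from xml.sax.saxutils import XMLGenerator

out = BytesIO()
parser = make_parser()  # assumed to be the Expat-based reader
parser.setContentHandler(XMLGenerator(out))

for chunk in ("<doc", ">"):
    parser.feed(chunk)

# Ask for pending input to be parsed now, where the method is available.
if hasattr(parser, "flush"):
    parser.flush()
output_after_flush = out.getvalue()

parser.feed("</doc>")
parser.close()
print(out.getvalue())
```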

‎Lib/test/test_xml_etree.py

Lines changed: 63 additions & 17 deletions
@@ -104,11 +104,6 @@
104104
"""
105105

106106

107-
fails_with_expat_2_6_0 = (unittest.expectedFailure
108-
if pyexpat.version_info >= (2, 6, 0) else
109-
lambda test: test)
110-
111-
112107
def checkwarnings(*filters, quiet=False):
113108
def decorator(test):
114109
def newtest(*args, **kwargs):
@@ -1375,12 +1370,14 @@ def test_tree_write_attribute_order(self):
13751370

13761371
class XMLPullParserTest(unittest.TestCase):
13771372

1378-
def _feed(self, parser, data, chunk_size=None):
1373+
def _feed(self, parser, data, chunk_size=None, flush=False):
13791374
if chunk_size is None:
13801375
parser.feed(data)
13811376
else:
13821377
for i in range(0, len(data), chunk_size):
13831378
parser.feed(data[i:i+chunk_size])
1379+
if flush:
1380+
parser.flush()
13841381

13851382
def assert_events(self, parser, expected, max_events=None):
13861383
self.assertEqual(
@@ -1398,34 +1395,32 @@ def assert_event_tags(self, parser, expected, max_events=None):
13981395
self.assertEqual([(action, elem.tag) for action, elem in events],
13991396
expected)
14001397

1401-
def test_simple_xml(self, chunk_size=None):
1398+
def test_simple_xml(self, chunk_size=None, flush=False):
14021399
parser = ET.XMLPullParser()
14031400
self.assert_event_tags(parser, [])
1404-
self._feed(parser, "<!-- comment -->\n", chunk_size)
1401+
self._feed(parser, "<!-- comment -->\n", chunk_size, flush)
14051402
self.assert_event_tags(parser, [])
14061403
self._feed(parser,
14071404
"<root>\n <element key='value'>text</element",
1408-
chunk_size)
1405+
chunk_size, flush)
14091406
self.assert_event_tags(parser, [])
1410-
self._feed(parser, ">\n", chunk_size)
1407+
self._feed(parser, ">\n", chunk_size, flush)
14111408
self.assert_event_tags(parser, [('end', 'element')])
1412-
self._feed(parser, "<element>text</element>tail\n", chunk_size)
1413-
self._feed(parser, "<empty-element/>\n", chunk_size)
1409+
self._feed(parser, "<element>text</element>tail\n", chunk_size, flush)
1410+
self._feed(parser, "<empty-element/>\n", chunk_size, flush)
14141411
self.assert_event_tags(parser, [
14151412
('end', 'element'),
14161413
('end', 'empty-element'),
14171414
])
1418-
self._feed(parser, "</root>\n", chunk_size)
1415+
self._feed(parser, "</root>\n", chunk_size, flush)
14191416
self.assert_event_tags(parser, [('end', 'root')])
14201417
self.assertIsNone(parser.close())
14211418

1422-
@fails_with_expat_2_6_0
14231419
def test_simple_xml_chunk_1(self):
1424-
self.test_simple_xml(chunk_size=1)
1420+
self.test_simple_xml(chunk_size=1, flush=True)
14251421

1426-
@fails_with_expat_2_6_0
14271422
def test_simple_xml_chunk_5(self):
1428-
self.test_simple_xml(chunk_size=5)
1423+
self.test_simple_xml(chunk_size=5, flush=True)
14291424

14301425
def test_simple_xml_chunk_22(self):
14311426
self.test_simple_xml(chunk_size=22)
@@ -1624,6 +1619,57 @@ def test_unknown_event(self):
16241619
with self.assertRaises(ValueError):
16251620
ET.XMLPullParser(events=('start', 'end', 'bogus'))
16261621

1622+
def test_flush_reparse_deferral_enabled(self):
1623+
if pyexpat.version_info < (2, 6, 0):
1624+
self.skipTest(f'Expat {pyexpat.version_info} does not '
1625+
'support reparse deferral')
1626+
1627+
parser = ET.XMLPullParser(events=('start', 'end'))
1628+
1629+
for chunk in ("<doc", ">"):
1630+
parser.feed(chunk)
1631+
1632+
self.assert_event_tags(parser, []) # i.e. no elements started
1633+
if ET is pyET:
1634+
self.assertTrue(parser._parser._parser.GetReparseDeferralEnabled())
1635+
1636+
parser.flush()
1637+
1638+
self.assert_event_tags(parser, [('start', 'doc')])
1639+
if ET is pyET:
1640+
self.assertTrue(parser._parser._parser.GetReparseDeferralEnabled())
1641+
1642+
parser.feed("</doc>")
1643+
parser.close()
1644+
1645+
self.assert_event_tags(parser, [('end', 'doc')])
1646+
1647+
def test_flush_reparse_deferral_disabled(self):
1648+
parser = ET.XMLPullParser(events=('start', 'end'))
1649+
1650+
for chunk in ("<doc", ">"):
1651+
parser.feed(chunk)
1652+
1653+
if pyexpat.version_info >= (2, 6, 0):
1654+
if ET is not pyET:
1655+
self.skipTest('XMLParser.(Get|Set)ReparseDeferralEnabled '
1656+
'methods not available in C')
1657+
parser._parser._parser.SetReparseDeferralEnabled(False)
1658+
1659+
self.assert_event_tags(parser, []) # i.e. no elements started
1660+
if ET is pyET:
1661+
self.assertFalse(parser._parser._parser.GetReparseDeferralEnabled())
1662+
1663+
parser.flush()
1664+
1665+
self.assert_event_tags(parser, [('start', 'doc')])
1666+
if ET is pyET:
1667+
self.assertFalse(parser._parser._parser.GetReparseDeferralEnabled())
1668+
1669+
parser.feed("</doc>")
1670+
parser.close()
1671+
1672+
self.assert_event_tags(parser, [('end', 'doc')])
16271673

16281674
#
16291675
# xinclude tests (samples from appendix C of the xinclude specification)

‎Lib/xml/etree/ElementTree.py

Lines changed: 14 additions & 0 deletions
@@ -1325,6 +1325,11 @@ def read_events(self):
13251325
else:
13261326
yield event
13271327

1328+
def flush(self):
1329+
if self._parser is None:
1330+
raise ValueError("flush() called after end of stream")
1331+
self._parser.flush()
1332+
13281333

13291334
def XML(text, parser=None):
13301335
"""Parse XML document from string constant.
@@ -1733,6 +1738,15 @@ def close(self):
17331738
del self.parser, self._parser
17341739
del self.target, self._target
17351740

1741+
def flush(self):
1742+
was_enabled = self.parser.GetReparseDeferralEnabled()
1743+
try:
1744+
self.parser.SetReparseDeferralEnabled(False)
1745+
self.parser.Parse(b"", False)
1746+
except self._error as v:
1747+
self._raiseerror(v)
1748+
finally:
1749+
self.parser.SetReparseDeferralEnabled(was_enabled)
17361750

17371751
# --------------------------------------------------------------------
17381752
# C14N 2.0
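
The save, disable, reparse, restore pattern used by `XMLParser.flush` above can also be applied to a bare `pyexpat` parser. The sketch below is illustrative and not part of the commit: `flush_expat` is a hypothetical helper, and it simply reports `False` when the deferral API is unavailable on the running Python build.

```python
import xml.parsers.expat as expat


def flush_expat(parser):
    """Hypothetical helper: force a reparse, then restore the deferral setting."""
    if not hasattr(parser, "SetReparseDeferralEnabled"):
        return False  # API not available on this Python build
    was_enabled = parser.GetReparseDeferralEnabled()
    try:
        parser.SetReparseDeferralEnabled(False)
        parser.Parse(b"", False)  # empty, non-final input triggers the reparse
    finally:
        # Restore the previous setting even if parsing raised an error.
        parser.SetReparseDeferralEnabled(was_enabled)
    return True


started = []
parser = expat.ParserCreate()
parser.StartElementHandler = lambda name, attrs: started.append(name)

for chunk in (b"<doc", b">"):
    parser.Parse(chunk, False)

flushed = flush_expat(parser)
started_after_flush = list(started)

parser.Parse(b"</doc>", True)
print(flushed, started_after_flush, started)
```

Restoring the earlier setting in `finally` keeps the protection in place for subsequent input, which matches the intent of disabling deferral only temporarily.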
