Commit 516a6d4

hartwork and gpshead authored
[3.10] gh-115398: Expose Expat >=2.6.0 reparse deferral API (CVE-2023-52425) (GH-115623) (GH-116270)
Allow controlling Expat >=2.6.0 reparse deferral (CVE-2023-52425) by adding five new methods:

- `xml.etree.ElementTree.XMLParser.flush`
- `xml.etree.ElementTree.XMLPullParser.flush`
- `xml.parsers.expat.xmlparser.GetReparseDeferralEnabled`
- `xml.parsers.expat.xmlparser.SetReparseDeferralEnabled`
- `xml.sax.expatreader.ExpatParser.flush`

Based on the "flush" idea from #115138 (comment).

Includes code suggested-by: Snild Dolkow <snild@sony.com> and by core dev Serhiy Storchaka.

Co-authored-by: Gregory P. Smith <greg@krypto.org>
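Because these methods were backported to some older CPython releases as a security fix, whether a given interpreter has them depends on its exact release, and the documentation in this change recommends probing with `hasattr`. A minimal sketch of such a probe for all five methods follows; the function name `reparse_deferral_support` is hypothetical and not part of any standard module.

```python
import xml.etree.ElementTree as ET
import xml.parsers.expat as expat
import xml.sax.expatreader


def reparse_deferral_support():
    """Hypothetical helper: map each method added by this change to whether
    the running interpreter provides it."""
    # The Get/Set methods live on xmlparser instances, so probe a fresh one.
    expat_parser = expat.ParserCreate()
    return {
        "xml.etree.ElementTree.XMLParser.flush":
            hasattr(ET.XMLParser, "flush"),
        "xml.etree.ElementTree.XMLPullParser.flush":
            hasattr(ET.XMLPullParser, "flush"),
        "xml.parsers.expat.xmlparser.GetReparseDeferralEnabled":
            hasattr(expat_parser, "GetReparseDeferralEnabled"),
        "xml.parsers.expat.xmlparser.SetReparseDeferralEnabled":
            hasattr(expat_parser, "SetReparseDeferralEnabled"),
        "xml.sax.expatreader.ExpatParser.flush":
            hasattr(xml.sax.expatreader.ExpatParser, "flush"),
    }


for name, available in reparse_deferral_support().items():
    print(f"{name}: {'yes' if available else 'no'}")
print("Expat version:", expat.version_info)
```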
1 parent b612ec6 commit 516a6d4

14 files changed: 435 additions, 19 deletions

Doc/library/pyexpat.rst (36 additions, 0 deletions)

```diff
@@ -196,6 +196,42 @@ XMLParser Objects
    :exc:`ExpatError` to be raised with the :attr:`code` attribute set to
    ``errors.codes[errors.XML_ERROR_CANT_CHANGE_FEATURE_ONCE_PARSING]``.
 
+.. method:: xmlparser.SetReparseDeferralEnabled(enabled)
+
+   .. warning::
+
+      Calling ``SetReparseDeferralEnabled(False)`` has security implications,
+      as detailed below; please make sure to understand these consequences
+      prior to using the ``SetReparseDeferralEnabled`` method.
+
+   Expat 2.6.0 introduced a security mechanism called "reparse deferral"
+   where instead of causing denial of service through quadratic runtime
+   from reparsing large tokens, reparsing of unfinished tokens is now delayed
+   by default until a sufficient amount of input is reached.
+   Due to this delay, registered handlers may — depending on the sizing of
+   input chunks pushed to Expat — no longer be called right after pushing new
+   input to the parser. Where immediate feedback and taking over responsibility
+   of protecting against denial of service from large tokens are both wanted,
+   calling ``SetReparseDeferralEnabled(False)`` disables reparse deferral
+   for the current Expat parser instance, temporarily or altogether.
+   Calling ``SetReparseDeferralEnabled(True)`` allows re-enabling reparse
+   deferral.
+
+   Note that :meth:`SetReparseDeferralEnabled` has been backported to some
+   prior releases of CPython as a security fix. Check for availability of
+   :meth:`SetReparseDeferralEnabled` using :func:`hasattr` if used in code
+   running across a variety of Python versions.
+
+   .. versionadded:: 3.10.14
+
+.. method:: xmlparser.GetReparseDeferralEnabled()
+
+   Returns whether reparse deferral is currently enabled for the given
+   Expat parser instance.
+
+   .. versionadded:: 3.10.14
+
+
 :class:`xmlparser` objects have the following attributes:
 
 
```
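The deferral switch documented above can be exercised directly on a `pyexpat` parser. The sketch below is an illustration, not part of the commit: it relies only on `ParserCreate`, `StartElementHandler`, `Parse` and the new `SetReparseDeferralEnabled` method, the helper name `feed_chunks` is hypothetical, and the two-chunk `<doc` / `/>` input mirrors `test_reparse_deferral_disabled`.

```python
import xml.parsers.expat as expat


def feed_chunks(chunks, immediate=False):
    """Hypothetical helper: feed *chunks* to a fresh Expat parser and return
    the start tags seen before and after the final Parse() call."""
    started = []
    parser = expat.ParserCreate()
    parser.StartElementHandler = lambda name, attrs: started.append(name)

    # SetReparseDeferralEnabled() only exists on CPython releases that ship
    # this security fix, so probe for it with hasattr() as the docs advise.
    if immediate and hasattr(parser, "SetReparseDeferralEnabled"):
        # Trade-off: with deferral off, the caller (not Expat) becomes
        # responsible for guarding against reparsing of huge tokens.
        parser.SetReparseDeferralEnabled(False)

    for chunk in chunks:
        parser.Parse(chunk, False)  # more input may follow
    seen_before_final = list(started)  # snapshot, not an alias

    parser.Parse(b"", True)  # signal end of input
    return seen_before_final, started


before, after = feed_chunks([b"<doc", b"/>"], immediate=True)
print("before final Parse():", before)
print("after final Parse():", after)
```

With `immediate=False` on Expat 2.6.0 or later, `before` is expected to stay empty, which is what `test_reparse_deferral_enabled` asserts; in both modes the handler has fired once the input is finalized.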
Doc/library/xml.etree.elementtree.rst (39 additions, 0 deletions)

```diff
@@ -165,6 +165,11 @@ data but would still like to have incremental parsing capabilities, take a look
 at :func:`iterparse`. It can be useful when you're reading a large XML document
 and don't want to hold it wholly in memory.
 
+Where *immediate* feedback through events is wanted, calling method
+:meth:`XMLPullParser.flush` can help reduce delay;
+please make sure to study the related security notes.
+
+
 Finding interesting elements
 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
 
@@ -1370,6 +1375,24 @@ XMLParser Objects
 
       Feeds data to the parser. *data* is encoded data.
 
+
+   .. method:: flush()
+
+      Triggers parsing of any previously fed unparsed data, which can be
+      used to ensure more immediate feedback, in particular with Expat >=2.6.0.
+      The implementation of :meth:`flush` temporarily disables reparse deferral
+      with Expat (if currently enabled) and triggers a reparse.
+      Disabling reparse deferral has security consequences; please see
+      :meth:`xml.parsers.expat.xmlparser.SetReparseDeferralEnabled` for details.
+
+      Note that :meth:`flush` has been backported to some prior releases of
+      CPython as a security fix. Check for availability of :meth:`flush`
+      using :func:`hasattr` if used in code running across a variety of Python
+      versions.
+
+      .. versionadded:: 3.10.14
+
+
    :meth:`XMLParser.feed` calls *target*\'s ``start(tag, attrs_dict)`` method
    for each opening tag, its ``end(tag)`` method for each closing tag, and data
    is processed by method ``data(data)``. For further supported callback
@@ -1431,6 +1454,22 @@ XMLPullParser Objects
 
       Feed the given bytes data to the parser.
 
+   .. method:: flush()
+
+      Triggers parsing of any previously fed unparsed data, which can be
+      used to ensure more immediate feedback, in particular with Expat >=2.6.0.
+      The implementation of :meth:`flush` temporarily disables reparse deferral
+      with Expat (if currently enabled) and triggers a reparse.
+      Disabling reparse deferral has security consequences; please see
+      :meth:`xml.parsers.expat.xmlparser.SetReparseDeferralEnabled` for details.
+
+      Note that :meth:`flush` has been backported to some prior releases of
+      CPython as a security fix. Check for availability of :meth:`flush`
+      using :func:`hasattr` if used in code running across a variety of Python
+      versions.
+
+      .. versionadded:: 3.10.14
+
    .. method:: close()
 
       Signal the parser that the data stream is terminated. Unlike
```
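For `XMLPullParser`, the documented pattern is to call `flush()` after `feed()` when events are needed promptly. The following sketch is illustrative only: `pull_events` is a hypothetical helper that records the `(event, tag)` pairs available after each chunk, and it falls back to plain `feed()` on interpreters where `flush()` is missing.

```python
import xml.etree.ElementTree as ET


def pull_events(chunks):
    """Hypothetical helper: feed *chunks* to an XMLPullParser and collect
    the (event, tag) pairs that become readable after each step."""
    parser = ET.XMLPullParser(events=("start", "end"))
    steps = []
    for chunk in chunks:
        parser.feed(chunk)
        # flush() arrived as a security-fix backport; older interpreters
        # lack it, so only call it when present.
        if hasattr(parser, "flush"):
            parser.flush()  # briefly disables reparse deferral (see docs)
        steps.append([(event, elem.tag) for event, elem in parser.read_events()])
    parser.close()
    # Anything still buffered is delivered once the stream is closed.
    steps.append([(event, elem.tag) for event, elem in parser.read_events()])
    return steps


for step in pull_events(["<doc", ">", "</doc>"]):
    print(step)
```

Where `flush()` is available, the `start` event for `doc` should appear right after the `>` chunk, as in `test_flush_reparse_deferral_enabled`; without it, Expat 2.6.0 or later may hold events back until more input or `close()` arrives.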

Include/pyexpat.h (3 additions, 1 deletion)

```diff
@@ -48,8 +48,10 @@ struct PyExpat_CAPI
     enum XML_Status (*SetEncoding)(XML_Parser parser, const XML_Char *encoding);
     int (*DefaultUnknownEncodingHandler)(
         void *encodingHandlerData, const XML_Char *name, XML_Encoding *info);
-    /* might be none for expat < 2.1.0 */
+    /* might be NULL for expat < 2.1.0 */
     int (*SetHashSalt)(XML_Parser parser, unsigned long hash_salt);
+    /* might be NULL for expat < 2.6.0 */
+    XML_Bool (*SetReparseDeferralEnabled)(XML_Parser parser, XML_Bool enabled);
     /* always add new stuff to the end! */
 };
 
```

Lib/test/test_pyexpat.py (54 additions, 0 deletions)

```diff
@@ -730,5 +730,59 @@ def resolve_entity(context, base, system_id, public_id):
         self.assertEqual(handler_call_args, [("bar", "baz")])
 
 
+class ReparseDeferralTest(unittest.TestCase):
+    def test_getter_setter_round_trip(self):
+        parser = expat.ParserCreate()
+        enabled = (expat.version_info >= (2, 6, 0))
+
+        self.assertIs(parser.GetReparseDeferralEnabled(), enabled)
+        parser.SetReparseDeferralEnabled(False)
+        self.assertIs(parser.GetReparseDeferralEnabled(), False)
+        parser.SetReparseDeferralEnabled(True)
+        self.assertIs(parser.GetReparseDeferralEnabled(), enabled)
+
+    def test_reparse_deferral_enabled(self):
+        if expat.version_info < (2, 6, 0):
+            self.skipTest(f'Expat {expat.version_info} does not '
+                          'support reparse deferral')
+
+        started = []
+
+        def start_element(name, _):
+            started.append(name)
+
+        parser = expat.ParserCreate()
+        parser.StartElementHandler = start_element
+        self.assertTrue(parser.GetReparseDeferralEnabled())
+
+        for chunk in (b'<doc', b'/>'):
+            parser.Parse(chunk, False)
+
+        # The key test: Have handlers already fired? Expecting: no.
+        self.assertEqual(started, [])
+
+        parser.Parse(b'', True)
+
+        self.assertEqual(started, ['doc'])
+
+    def test_reparse_deferral_disabled(self):
+        started = []
+
+        def start_element(name, _):
+            started.append(name)
+
+        parser = expat.ParserCreate()
+        parser.StartElementHandler = start_element
+        if expat.version_info >= (2, 6, 0):
+            parser.SetReparseDeferralEnabled(False)
+        self.assertFalse(parser.GetReparseDeferralEnabled())
+
+        for chunk in (b'<doc', b'/>'):
+            parser.Parse(chunk, False)
+
+        # The key test: Have handlers already fired? Expecting: yes.
+        self.assertEqual(started, ['doc'])
+
+
 if __name__ == "__main__":
     unittest.main()
```

Lib/test/test_sax.py (51 additions, 0 deletions)

```diff
@@ -19,6 +19,7 @@
 from io import BytesIO, StringIO
 import codecs
 import os.path
+import pyexpat
 import shutil
 import sys
 from urllib.error import URLError
@@ -1214,6 +1215,56 @@ def test_expat_incremental_reset(self):
 
         self.assertEqual(result.getvalue(), start + b"<doc>text</doc>")
 
+    def test_flush_reparse_deferral_enabled(self):
+        if pyexpat.version_info < (2, 6, 0):
+            self.skipTest(f'Expat {pyexpat.version_info} does not support reparse deferral')
+
+        result = BytesIO()
+        xmlgen = XMLGenerator(result)
+        parser = create_parser()
+        parser.setContentHandler(xmlgen)
+
+        for chunk in ("<doc", ">"):
+            parser.feed(chunk)
+
+        self.assertEqual(result.getvalue(), start) # i.e. no elements started
+        self.assertTrue(parser._parser.GetReparseDeferralEnabled())
+
+        parser.flush()
+
+        self.assertTrue(parser._parser.GetReparseDeferralEnabled())
+        self.assertEqual(result.getvalue(), start + b"<doc>")
+
+        parser.feed("</doc>")
+        parser.close()
+
+        self.assertEqual(result.getvalue(), start + b"<doc></doc>")
+
+    def test_flush_reparse_deferral_disabled(self):
+        result = BytesIO()
+        xmlgen = XMLGenerator(result)
+        parser = create_parser()
+        parser.setContentHandler(xmlgen)
+
+        for chunk in ("<doc", ">"):
+            parser.feed(chunk)
+
+        if pyexpat.version_info >= (2, 6, 0):
+            parser._parser.SetReparseDeferralEnabled(False)
+
+        self.assertEqual(result.getvalue(), start) # i.e. no elements started
+        self.assertFalse(parser._parser.GetReparseDeferralEnabled())
+
+        parser.flush()
+
+        self.assertFalse(parser._parser.GetReparseDeferralEnabled())
+        self.assertEqual(result.getvalue(), start + b"<doc>")
+
+        parser.feed("</doc>")
+        parser.close()
+
+        self.assertEqual(result.getvalue(), start + b"<doc></doc>")
+
     # ===== Locator support
 
     def test_expat_locator_noinfo(self):
```
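The SAX driver gains the same `flush()` method on `xml.sax.expatreader.ExpatParser`. As a sketch modelled on `test_flush_reparse_deferral_enabled`, the hypothetical helper `sax_echo` below re-serialises the input with `XMLGenerator` and records the output after every chunk. Writing into an `io.StringIO` is an assumption made for readability; the test itself uses `BytesIO`.

```python
import io
import xml.sax
from xml.sax.saxutils import XMLGenerator


def sax_echo(chunks):
    """Hypothetical helper: feed *chunks* to an incremental SAX parser that
    re-serialises the document, returning the output seen after each step."""
    out = io.StringIO()
    parser = xml.sax.make_parser()  # an ExpatParser with the default driver list
    parser.setContentHandler(XMLGenerator(out))
    snapshots = []
    for chunk in chunks:
        parser.feed(chunk)
        # ExpatParser.flush() only exists where this fix is present.
        if hasattr(parser, "flush"):
            parser.flush()
        snapshots.append(out.getvalue())
    parser.close()  # finalizes parsing and calls endDocument()
    snapshots.append(out.getvalue())
    return snapshots


for snapshot in sax_echo(["<doc", ">", "</doc>"]):
    print(repr(snapshot))
```

Every snapshot starts with the XML declaration that `XMLGenerator.startDocument()` writes; with `flush()` available, `<doc>` should already be present after the second chunk.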

Lib/test/test_xml_etree.py (63 additions, 16 deletions)

```diff
@@ -121,10 +121,6 @@
 </foo>
 """
 
-fails_with_expat_2_6_0 = (unittest.expectedFailure
-                          if pyexpat.version_info >= (2, 6, 0) else
-                          lambda test: test)
-
 def checkwarnings(*filters, quiet=False):
     def decorator(test):
         def newtest(*args, **kwargs):
@@ -1378,12 +1374,14 @@ def test_attlist_default(self):
 
 class XMLPullParserTest(unittest.TestCase):
 
-    def _feed(self, parser, data, chunk_size=None):
+    def _feed(self, parser, data, chunk_size=None, flush=False):
         if chunk_size is None:
             parser.feed(data)
         else:
             for i in range(0, len(data), chunk_size):
                 parser.feed(data[i:i+chunk_size])
+                if flush:
+                    parser.flush()
 
     def assert_events(self, parser, expected, max_events=None):
         self.assertEqual(
@@ -1401,34 +1399,32 @@ def assert_event_tags(self, parser, expected, max_events=None):
         self.assertEqual([(action, elem.tag) for action, elem in events],
                          expected)
 
-    def test_simple_xml(self, chunk_size=None):
+    def test_simple_xml(self, chunk_size=None, flush=False):
         parser = ET.XMLPullParser()
         self.assert_event_tags(parser, [])
-        self._feed(parser, "<!-- comment -->\n", chunk_size)
+        self._feed(parser, "<!-- comment -->\n", chunk_size, flush)
         self.assert_event_tags(parser, [])
         self._feed(parser,
                    "<root>\n <element key='value'>text</element",
-                   chunk_size)
+                   chunk_size, flush)
         self.assert_event_tags(parser, [])
-        self._feed(parser, ">\n", chunk_size)
+        self._feed(parser, ">\n", chunk_size, flush)
         self.assert_event_tags(parser, [('end', 'element')])
-        self._feed(parser, "<element>text</element>tail\n", chunk_size)
-        self._feed(parser, "<empty-element/>\n", chunk_size)
+        self._feed(parser, "<element>text</element>tail\n", chunk_size, flush)
+        self._feed(parser, "<empty-element/>\n", chunk_size, flush)
         self.assert_event_tags(parser, [
             ('end', 'element'),
             ('end', 'empty-element'),
             ])
-        self._feed(parser, "</root>\n", chunk_size)
+        self._feed(parser, "</root>\n", chunk_size, flush)
         self.assert_event_tags(parser, [('end', 'root')])
         self.assertIsNone(parser.close())
 
-    @fails_with_expat_2_6_0
     def test_simple_xml_chunk_1(self):
-        self.test_simple_xml(chunk_size=1)
+        self.test_simple_xml(chunk_size=1, flush=True)
 
-    @fails_with_expat_2_6_0
     def test_simple_xml_chunk_5(self):
-        self.test_simple_xml(chunk_size=5)
+        self.test_simple_xml(chunk_size=5, flush=True)
 
     def test_simple_xml_chunk_22(self):
         self.test_simple_xml(chunk_size=22)
@@ -1627,6 +1623,57 @@ def test_unknown_event(self):
         with self.assertRaises(ValueError):
             ET.XMLPullParser(events=('start', 'end', 'bogus'))
 
+    def test_flush_reparse_deferral_enabled(self):
+        if pyexpat.version_info < (2, 6, 0):
+            self.skipTest(f'Expat {pyexpat.version_info} does not '
+                          'support reparse deferral')
+
+        parser = ET.XMLPullParser(events=('start', 'end'))
+
+        for chunk in ("<doc", ">"):
+            parser.feed(chunk)
+
+        self.assert_event_tags(parser, []) # i.e. no elements started
+        if ET is pyET:
+            self.assertTrue(parser._parser._parser.GetReparseDeferralEnabled())
+
+        parser.flush()
+
+        self.assert_event_tags(parser, [('start', 'doc')])
+        if ET is pyET:
+            self.assertTrue(parser._parser._parser.GetReparseDeferralEnabled())
+
+        parser.feed("</doc>")
+        parser.close()
+
+        self.assert_event_tags(parser, [('end', 'doc')])
+
+    def test_flush_reparse_deferral_disabled(self):
+        parser = ET.XMLPullParser(events=('start', 'end'))
+
+        for chunk in ("<doc", ">"):
+            parser.feed(chunk)
+
+        if pyexpat.version_info >= (2, 6, 0):
+            if not ET is pyET:
+                self.skipTest(f'XMLParser.(Get|Set)ReparseDeferralEnabled '
+                              'methods not available in C')
+            parser._parser._parser.SetReparseDeferralEnabled(False)
+
+        self.assert_event_tags(parser, []) # i.e. no elements started
+        if ET is pyET:
+            self.assertFalse(parser._parser._parser.GetReparseDeferralEnabled())
+
+        parser.flush()
+
+        self.assert_event_tags(parser, [('start', 'doc')])
+        if ET is pyET:
+            self.assertFalse(parser._parser._parser.GetReparseDeferralEnabled())
+
+        parser.feed("</doc>")
+        parser.close()
+
+        self.assert_event_tags(parser, [('end', 'doc')])
 
 #
 # xinclude tests (samples from appendix C of the xinclude specification)
```

Lib/xml/etree/ElementTree.py (14 additions, 0 deletions)

```diff
@@ -1325,6 +1325,11 @@ def read_events(self):
             else:
                 yield event
 
+    def flush(self):
+        if self._parser is None:
+            raise ValueError("flush() called after end of stream")
+        self._parser.flush()
+
 
 def XML(text, parser=None):
     """Parse XML document from string constant.
@@ -1731,6 +1736,15 @@ def close(self):
             del self.parser, self._parser
             del self.target, self._target
 
+    def flush(self):
+        was_enabled = self.parser.GetReparseDeferralEnabled()
+        try:
+            self.parser.SetReparseDeferralEnabled(False)
+            self.parser.Parse(b"", False)
+        except self._error as v:
+            self._raiseerror(v)
+        finally:
+            self.parser.SetReparseDeferralEnabled(was_enabled)
 
 # --------------------------------------------------------------------
 # C14N 2.0
```
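`XMLParser.flush()` in this change follows a small save, disable, reparse, restore pattern. The same pattern can be applied to a raw `pyexpat` parser, as in the sketch below. `expat_flush` is a hypothetical helper rather than a library function, and unlike the ElementTree method it lets `ExpatError` propagate instead of converting it through `_raiseerror`.

```python
import xml.parsers.expat as expat


def expat_flush(parser):
    """Hypothetical helper: force a reparse of buffered input on a raw
    pyexpat parser. Returns True if a flush was attempted."""
    if not hasattr(parser, "SetReparseDeferralEnabled"):
        return False  # this interpreter offers no deferral control
    was_enabled = parser.GetReparseDeferralEnabled()
    try:
        parser.SetReparseDeferralEnabled(False)
        parser.Parse(b"", False)  # reparse whatever is buffered, right now
    finally:
        # Restore the caller's setting even if Parse() raised ExpatError.
        parser.SetReparseDeferralEnabled(was_enabled)
    return True


started = []
p = expat.ParserCreate()
p.StartElementHandler = lambda name, attrs: started.append(name)
for chunk in (b"<doc", b">"):
    p.Parse(chunk, False)
expat_flush(p)
print(started)
p.Parse(b"</doc>", True)
```

Restoring the previous setting in `finally` keeps the security default intact for later `Parse()` calls, which is the same reason the ElementTree version records `was_enabled` first.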
