Commit 854f645

hartwork and gpshead authored
[3.8] gh-115398: Expose Expat >=2.6.0 reparse deferral API (CVE-2023-52425) (GH-115623) (GH-116275)
Allow controlling Expat >=2.6.0 reparse deferral (CVE-2023-52425) by adding five new methods:

- `xml.etree.ElementTree.XMLParser.flush`
- `xml.etree.ElementTree.XMLPullParser.flush`
- `xml.parsers.expat.xmlparser.GetReparseDeferralEnabled`
- `xml.parsers.expat.xmlparser.SetReparseDeferralEnabled`
- `xml.sax.expatreader.ExpatParser.flush`

Based on the "flush" idea from #115138 (comment).

Includes code suggested-by: Snild Dolkow <snild@sony.com> and by core dev Serhiy Storchaka.

Co-authored-by: Gregory P. Smith <greg@krypto.org>
1 parent 4d58a1d commit 854f645

14 files changed, +435 -20 lines changed

‎Doc/library/pyexpat.rst

Lines changed: 36 additions & 0 deletions
@@ -196,6 +196,42 @@ XMLParser Objects
    :exc:`ExpatError` to be raised with the :attr:`code` attribute set to
    ``errors.codes[errors.XML_ERROR_CANT_CHANGE_FEATURE_ONCE_PARSING]``.
 
+.. method:: xmlparser.SetReparseDeferralEnabled(enabled)
+
+   .. warning::
+
+      Calling ``SetReparseDeferralEnabled(False)`` has security implications,
+      as detailed below; please make sure to understand these consequences
+      prior to using the ``SetReparseDeferralEnabled`` method.
+
+   Expat 2.6.0 introduced a security mechanism called "reparse deferral"
+   where instead of causing denial of service through quadratic runtime
+   from reparsing large tokens, reparsing of unfinished tokens is now delayed
+   by default until a sufficient amount of input is reached.
+   Due to this delay, registered handlers may, depending on the sizing of
+   input chunks pushed to Expat, no longer be called right after pushing new
+   input to the parser. Where immediate feedback and taking over responsibility
+   of protecting against denial of service from large tokens are both wanted,
+   calling ``SetReparseDeferralEnabled(False)`` disables reparse deferral
+   for the current Expat parser instance, temporarily or altogether.
+   Calling ``SetReparseDeferralEnabled(True)`` allows re-enabling reparse
+   deferral.
+
+   Note that :meth:`SetReparseDeferralEnabled` has been backported to some
+   prior releases of CPython as a security fix. Check for availability of
+   :meth:`SetReparseDeferralEnabled` using :func:`hasattr` if used in code
+   running across a variety of Python versions.
+
+   .. versionadded:: 3.8.19
+
+.. method:: xmlparser.GetReparseDeferralEnabled()
+
+   Returns whether reparse deferral is currently enabled for the given
+   Expat parser instance.
+
+   .. versionadded:: 3.8.19
+
+
 :class:`xmlparser` objects have the following attributes:
 
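The documentation added above recommends a :func:`hasattr` check before touching the new methods. A minimal sketch of that guard follows; the helper name `set_reparse_deferral` is hypothetical (it is not part of this commit or of the standard library), and the example assumes only the two methods documented in this hunk.

```python
from xml.parsers import expat


def set_reparse_deferral(parser, enabled):
    """Toggle reparse deferral if this Python build exposes the API.

    Hypothetical helper for illustration. Returns the resulting state,
    or None when SetReparseDeferralEnabled is not available.
    """
    if not hasattr(parser, "SetReparseDeferralEnabled"):
        return None
    parser.SetReparseDeferralEnabled(enabled)
    return parser.GetReparseDeferralEnabled()


started = []
parser = expat.ParserCreate()
parser.StartElementHandler = lambda name, attrs: started.append(name)

# Disabling deferral means the caller takes over responsibility for
# protecting against denial of service from very large tokens.
state = set_reparse_deferral(parser, False)

for chunk in (b"<doc", b"/>"):
    parser.Parse(chunk, False)
parser.Parse(b"", True)  # final call: all pending input gets parsed
```

Returning `None` rather than raising lets calling code fall back to the parser's default behaviour on Python versions that lack the backport.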

‎Doc/library/xml.etree.elementtree.rst

Lines changed: 39 additions & 0 deletions
@@ -163,6 +163,11 @@ data but would still like to have incremental parsing capabilities, take a look
 at :func:`iterparse`. It can be useful when you're reading a large XML document
 and don't want to hold it wholly in memory.
 
+Where *immediate* feedback through events is wanted, calling method
+:meth:`XMLPullParser.flush` can help reduce delay;
+please make sure to study the related security notes.
+
+
 Finding interesting elements
 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
 
@@ -1352,6 +1357,24 @@ XMLParser Objects
 
       Feeds data to the parser. *data* is encoded data.
 
+
+   .. method:: flush()
+
+      Triggers parsing of any previously fed unparsed data, which can be
+      used to ensure more immediate feedback, in particular with Expat >=2.6.0.
+      The implementation of :meth:`flush` temporarily disables reparse deferral
+      with Expat (if currently enabled) and triggers a reparse.
+      Disabling reparse deferral has security consequences; please see
+      :meth:`xml.parsers.expat.xmlparser.SetReparseDeferralEnabled` for details.
+
+      Note that :meth:`flush` has been backported to some prior releases of
+      CPython as a security fix. Check for availability of :meth:`flush`
+      using :func:`hasattr` if used in code running across a variety of Python
+      versions.
+
+      .. versionadded:: 3.8.19
+
+
    :meth:`XMLParser.feed` calls *target*\'s ``start(tag, attrs_dict)`` method
    for each opening tag, its ``end(tag)`` method for each closing tag, and data
    is processed by method ``data(data)``. For further supported callback
@@ -1413,6 +1436,22 @@ XMLPullParser Objects
 
       Feed the given bytes data to the parser.
 
+   .. method:: flush()
+
+      Triggers parsing of any previously fed unparsed data, which can be
+      used to ensure more immediate feedback, in particular with Expat >=2.6.0.
+      The implementation of :meth:`flush` temporarily disables reparse deferral
+      with Expat (if currently enabled) and triggers a reparse.
+      Disabling reparse deferral has security consequences; please see
+      :meth:`xml.parsers.expat.xmlparser.SetReparseDeferralEnabled` for details.
+
+      Note that :meth:`flush` has been backported to some prior releases of
+      CPython as a security fix. Check for availability of :meth:`flush`
+      using :func:`hasattr` if used in code running across a variety of Python
+      versions.
+
+      .. versionadded:: 3.8.19
+
    .. method:: close()
 
       Signal the parser that the data stream is terminated. Unlike
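As an illustration of the documented `XMLPullParser.flush` usage, here is a small sketch that feeds a start tag in two chunks and flushes before reading events. The `hasattr` guard follows the note in the documentation above; the variable names are illustrative only. Whether the `start` event is already visible before the final chunk depends on the Expat version and on whether `flush` is available, so the sketch only relies on the combined event list.

```python
import xml.etree.ElementTree as ET

parser = ET.XMLPullParser(events=("start", "end"))

for chunk in ("<doc", ">"):
    parser.feed(chunk)

# flush() may be missing on Python versions without the backport.
if hasattr(parser, "flush"):
    parser.flush()

events = [(action, elem.tag) for action, elem in parser.read_events()]

parser.feed("</doc>")
parser.close()

# Collect whatever was still pending after the stream ended.
events += [(action, elem.tag) for action, elem in parser.read_events()]
```

Calling `flush` only at points where timely events matter keeps reparse deferral active for the rest of the input.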

‎Include/pyexpat.h

Lines changed: 3 additions & 1 deletion
@@ -48,8 +48,10 @@ struct PyExpat_CAPI
     enum XML_Status (*SetEncoding)(XML_Parser parser, const XML_Char *encoding);
     int (*DefaultUnknownEncodingHandler)(
         void *encodingHandlerData, const XML_Char *name, XML_Encoding *info);
-    /* might be none for expat < 2.1.0 */
+    /* might be NULL for expat < 2.1.0 */
     int (*SetHashSalt)(XML_Parser parser, unsigned long hash_salt);
+    /* might be NULL for expat < 2.6.0 */
+    XML_Bool (*SetReparseDeferralEnabled)(XML_Parser parser, XML_Bool enabled);
     /* always add new stuff to the end! */
 };
 
‎Lib/test/test_pyexpat.py

Lines changed: 54 additions & 0 deletions
@@ -729,5 +729,59 @@ def resolve_entity(context, base, system_id, public_id):
         self.assertEqual(handler_call_args, [("bar", "baz")])
 
 
+class ReparseDeferralTest(unittest.TestCase):
+    def test_getter_setter_round_trip(self):
+        parser = expat.ParserCreate()
+        enabled = (expat.version_info >= (2, 6, 0))
+
+        self.assertIs(parser.GetReparseDeferralEnabled(), enabled)
+        parser.SetReparseDeferralEnabled(False)
+        self.assertIs(parser.GetReparseDeferralEnabled(), False)
+        parser.SetReparseDeferralEnabled(True)
+        self.assertIs(parser.GetReparseDeferralEnabled(), enabled)
+
+    def test_reparse_deferral_enabled(self):
+        if expat.version_info < (2, 6, 0):
+            self.skipTest(f'Expat {expat.version_info} does not '
+                          'support reparse deferral')
+
+        started = []
+
+        def start_element(name, _):
+            started.append(name)
+
+        parser = expat.ParserCreate()
+        parser.StartElementHandler = start_element
+        self.assertTrue(parser.GetReparseDeferralEnabled())
+
+        for chunk in (b'<doc', b'/>'):
+            parser.Parse(chunk, False)
+
+        # The key test: Have handlers already fired? Expecting: no.
+        self.assertEqual(started, [])
+
+        parser.Parse(b'', True)
+
+        self.assertEqual(started, ['doc'])
+
+    def test_reparse_deferral_disabled(self):
+        started = []
+
+        def start_element(name, _):
+            started.append(name)
+
+        parser = expat.ParserCreate()
+        parser.StartElementHandler = start_element
+        if expat.version_info >= (2, 6, 0):
+            parser.SetReparseDeferralEnabled(False)
+            self.assertFalse(parser.GetReparseDeferralEnabled())
+
+        for chunk in (b'<doc', b'/>'):
+            parser.Parse(chunk, False)
+
+        # The key test: Have handlers already fired? Expecting: yes.
+        self.assertEqual(started, ['doc'])
+
+
 if __name__ == "__main__":
     unittest.main()
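The tests above check when handlers fire relative to the final `Parse` call. The same observation can be made outside `unittest` with a short sketch; the function name `started_before_final` is hypothetical, and the result of the "early" snapshot is deliberately not asserted here because it depends on the Expat version the interpreter is linked against.

```python
from xml.parsers import expat


def started_before_final(chunks):
    """Feed chunks as non-final input, then finish the document.

    Hypothetical helper: returns (names reported before the final
    Parse call, names reported overall).
    """
    started = []
    parser = expat.ParserCreate()
    parser.StartElementHandler = lambda name, attrs: started.append(name)

    for chunk in chunks:
        parser.Parse(chunk, False)
    early = list(started)  # snapshot taken before the final call

    parser.Parse(b"", True)  # the final call parses all pending input
    return early, started


early, final = started_before_final((b"<doc", b"/>"))
print(expat.version_info, early, final)
```

Per the tests in this hunk, `early` is expected to be empty with Expat >= 2.6.0 and default settings, and to contain `doc` otherwise.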

‎Lib/test/test_sax.py

Lines changed: 51 additions & 0 deletions
@@ -18,6 +18,7 @@
 from io import BytesIO, StringIO
 import codecs
 import os.path
+import pyexpat
 import shutil
 from urllib.error import URLError
 from test import support
@@ -1206,6 +1207,56 @@ def test_expat_incremental_reset(self):
 
         self.assertEqual(result.getvalue(), start + b"<doc>text</doc>")
 
+    def test_flush_reparse_deferral_enabled(self):
+        if pyexpat.version_info < (2, 6, 0):
+            self.skipTest(f'Expat {pyexpat.version_info} does not support reparse deferral')
+
+        result = BytesIO()
+        xmlgen = XMLGenerator(result)
+        parser = create_parser()
+        parser.setContentHandler(xmlgen)
+
+        for chunk in ("<doc", ">"):
+            parser.feed(chunk)
+
+        self.assertEqual(result.getvalue(), start)  # i.e. no elements started
+        self.assertTrue(parser._parser.GetReparseDeferralEnabled())
+
+        parser.flush()
+
+        self.assertTrue(parser._parser.GetReparseDeferralEnabled())
+        self.assertEqual(result.getvalue(), start + b"<doc>")
+
+        parser.feed("</doc>")
+        parser.close()
+
+        self.assertEqual(result.getvalue(), start + b"<doc></doc>")
+
+    def test_flush_reparse_deferral_disabled(self):
+        result = BytesIO()
+        xmlgen = XMLGenerator(result)
+        parser = create_parser()
+        parser.setContentHandler(xmlgen)
+
+        for chunk in ("<doc", ">"):
+            parser.feed(chunk)
+
+        if pyexpat.version_info >= (2, 6, 0):
+            parser._parser.SetReparseDeferralEnabled(False)
+
+        self.assertEqual(result.getvalue(), start)  # i.e. no elements started
+        self.assertFalse(parser._parser.GetReparseDeferralEnabled())
+
+        parser.flush()
+
+        self.assertFalse(parser._parser.GetReparseDeferralEnabled())
+        self.assertEqual(result.getvalue(), start + b"<doc>")
+
+        parser.feed("</doc>")
+        parser.close()
+
+        self.assertEqual(result.getvalue(), start + b"<doc></doc>")
+
     # ===== Locator support
 
     def test_expat_locator_noinfo(self):
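For readers who use the SAX interface directly, the following sketch shows the same flush pattern with an incremental `ExpatParser`. The `Recorder` handler class is hypothetical and exists only for this example; `create_parser` is the factory the tests above use, and `flush` is guarded with `hasattr` since it may be absent on Python versions without the backport.

```python
from xml.sax.expatreader import create_parser
from xml.sax.handler import ContentHandler


class Recorder(ContentHandler):
    """Hypothetical handler that records the names of started elements."""

    def __init__(self):
        super().__init__()
        self.started = []

    def startElement(self, name, attrs):
        self.started.append(name)


handler = Recorder()
parser = create_parser()
parser.setContentHandler(handler)

for chunk in ("<doc", ">"):
    parser.feed(chunk)

# Ask for pending input to be parsed now, where the method exists.
if hasattr(parser, "flush"):
    parser.flush()

parser.feed("</doc>")
parser.close()
```

After `close()` the recorded list holds the single `doc` element regardless of Expat version; `flush` only affects how early it appears.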

‎Lib/test/test_xml_etree.py

Lines changed: 63 additions & 17 deletions
@@ -105,11 +105,6 @@
 """
 
 
-fails_with_expat_2_6_0 = (unittest.expectedFailure
-                          if pyexpat.version_info >= (2, 6, 0) else
-                          lambda test: test)
-
-
 def checkwarnings(*filters, quiet=False):
     def decorator(test):
         def newtest(*args, **kwargs):
@@ -1250,12 +1245,14 @@ def test_tree_write_attribute_order(self):
 
 class XMLPullParserTest(unittest.TestCase):
 
-    def _feed(self, parser, data, chunk_size=None):
+    def _feed(self, parser, data, chunk_size=None, flush=False):
         if chunk_size is None:
             parser.feed(data)
         else:
             for i in range(0, len(data), chunk_size):
                 parser.feed(data[i:i+chunk_size])
+        if flush:
+            parser.flush()
 
     def assert_events(self, parser, expected, max_events=None):
         self.assertEqual(
@@ -1273,34 +1270,32 @@ def assert_event_tags(self, parser, expected, max_events=None):
         self.assertEqual([(action, elem.tag) for action, elem in events],
                          expected)
 
-    def test_simple_xml(self, chunk_size=None):
+    def test_simple_xml(self, chunk_size=None, flush=False):
         parser = ET.XMLPullParser()
         self.assert_event_tags(parser, [])
-        self._feed(parser, "<!-- comment -->\n", chunk_size)
+        self._feed(parser, "<!-- comment -->\n", chunk_size, flush)
         self.assert_event_tags(parser, [])
         self._feed(parser,
                    "<root>\n <element key='value'>text</element",
-                   chunk_size)
+                   chunk_size, flush)
         self.assert_event_tags(parser, [])
-        self._feed(parser, ">\n", chunk_size)
+        self._feed(parser, ">\n", chunk_size, flush)
         self.assert_event_tags(parser, [('end', 'element')])
-        self._feed(parser, "<element>text</element>tail\n", chunk_size)
-        self._feed(parser, "<empty-element/>\n", chunk_size)
+        self._feed(parser, "<element>text</element>tail\n", chunk_size, flush)
+        self._feed(parser, "<empty-element/>\n", chunk_size, flush)
         self.assert_event_tags(parser, [
             ('end', 'element'),
             ('end', 'empty-element'),
             ])
-        self._feed(parser, "</root>\n", chunk_size)
+        self._feed(parser, "</root>\n", chunk_size, flush)
         self.assert_event_tags(parser, [('end', 'root')])
         self.assertIsNone(parser.close())
 
-    @fails_with_expat_2_6_0
     def test_simple_xml_chunk_1(self):
-        self.test_simple_xml(chunk_size=1)
+        self.test_simple_xml(chunk_size=1, flush=True)
 
-    @fails_with_expat_2_6_0
     def test_simple_xml_chunk_5(self):
-        self.test_simple_xml(chunk_size=5)
+        self.test_simple_xml(chunk_size=5, flush=True)
 
     def test_simple_xml_chunk_22(self):
         self.test_simple_xml(chunk_size=22)
@@ -1499,6 +1494,57 @@ def test_unknown_event(self):
         with self.assertRaises(ValueError):
             ET.XMLPullParser(events=('start', 'end', 'bogus'))
 
+    def test_flush_reparse_deferral_enabled(self):
+        if pyexpat.version_info < (2, 6, 0):
+            self.skipTest(f'Expat {pyexpat.version_info} does not '
+                          'support reparse deferral')
+
+        parser = ET.XMLPullParser(events=('start', 'end'))
+
+        for chunk in ("<doc", ">"):
+            parser.feed(chunk)
+
+        self.assert_event_tags(parser, [])  # i.e. no elements started
+        if ET is pyET:
+            self.assertTrue(parser._parser._parser.GetReparseDeferralEnabled())
+
+        parser.flush()
+
+        self.assert_event_tags(parser, [('start', 'doc')])
+        if ET is pyET:
+            self.assertTrue(parser._parser._parser.GetReparseDeferralEnabled())
+
+        parser.feed("</doc>")
+        parser.close()
+
+        self.assert_event_tags(parser, [('end', 'doc')])
+
+    def test_flush_reparse_deferral_disabled(self):
+        parser = ET.XMLPullParser(events=('start', 'end'))
+
+        for chunk in ("<doc", ">"):
+            parser.feed(chunk)
+
+        if pyexpat.version_info >= (2, 6, 0):
+            if not ET is pyET:
+                self.skipTest(f'XMLParser.(Get|Set)ReparseDeferralEnabled '
+                              'methods not available in C')
+            parser._parser._parser.SetReparseDeferralEnabled(False)
+
+        self.assert_event_tags(parser, [])  # i.e. no elements started
+        if ET is pyET:
+            self.assertFalse(parser._parser._parser.GetReparseDeferralEnabled())
+
+        parser.flush()
+
+        self.assert_event_tags(parser, [('start', 'doc')])
+        if ET is pyET:
+            self.assertFalse(parser._parser._parser.GetReparseDeferralEnabled())
+
+        parser.feed("</doc>")
+        parser.close()
+
+        self.assert_event_tags(parser, [('end', 'doc')])
 
 #
 # xinclude tests (samples from appendix C of the xinclude specification)

‎Lib/xml/etree/ElementTree.py

Lines changed: 14 additions & 0 deletions
@@ -1303,6 +1303,11 @@ def read_events(self):
             else:
                 yield event
 
+    def flush(self):
+        if self._parser is None:
+            raise ValueError("flush() called after end of stream")
+        self._parser.flush()
+
 
 def XML(text, parser=None):
     """Parse XML document from string constant.
@@ -1711,6 +1716,15 @@ def close(self):
             del self.parser, self._parser
             del self.target, self._target
 
+    def flush(self):
+        was_enabled = self.parser.GetReparseDeferralEnabled()
+        try:
+            self.parser.SetReparseDeferralEnabled(False)
+            self.parser.Parse(b"", False)
+        except self._error as v:
+            self._raiseerror(v)
+        finally:
+            self.parser.SetReparseDeferralEnabled(was_enabled)
 
 # --------------------------------------------------------------------
 # C14N 2.0
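The `XMLParser.flush` implementation in this file follows a save, disable, reparse, restore pattern. A minimal sketch of the same pattern applied to a bare `pyexpat` parser is given below; `flush_expat` is a hypothetical helper (not part of this commit), and the `hasattr` guard is an addition for Python versions that lack the new methods.

```python
from xml.parsers import expat


def flush_expat(parser):
    """Parse pending input now, restoring the deferral setting afterwards.

    Hypothetical helper mirroring the pattern of XMLParser.flush().
    """
    has_api = hasattr(parser, "GetReparseDeferralEnabled")
    was_enabled = parser.GetReparseDeferralEnabled() if has_api else None
    try:
        if has_api:
            parser.SetReparseDeferralEnabled(False)
        # An empty, non-final chunk triggers a reparse of buffered data.
        parser.Parse(b"", False)
    finally:
        if has_api:
            parser.SetReparseDeferralEnabled(was_enabled)


started = []
parser = expat.ParserCreate()
parser.StartElementHandler = lambda name, attrs: started.append(name)

for chunk in (b"<doc", b">"):
    parser.Parse(chunk, False)

flush_expat(parser)
after_flush = list(started)  # snapshot right after the flush

parser.Parse(b"</doc>", True)
```

Restoring the previous setting in `finally` keeps the denial-of-service protection in place for later input even if the reparse raises.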
