Commit 0a01ed6

hartwork and gpshead authored
[3.12] gh-115398: Expose Expat >=2.6.0 reparse deferral API (CVE-2023-52425) (GH-115623) (GH-116248)
Allow controlling Expat >=2.6.0 reparse deferral (CVE-2023-52425) by adding five new methods:

- `xml.etree.ElementTree.XMLParser.flush`
- `xml.etree.ElementTree.XMLPullParser.flush`
- `xml.parsers.expat.xmlparser.GetReparseDeferralEnabled`
- `xml.parsers.expat.xmlparser.SetReparseDeferralEnabled`
- `xml.sax.expatreader.ExpatParser.flush`

Based on the "flush" idea from #115138 (comment).

- Please treat as a security fix related to CVE-2023-52425.

(cherry picked from commit 6a95676)
(cherry picked from commit 73807eb)
(cherry picked from commit eda2963)

---------

Includes code suggested-by: Snild Dolkow <snild@sony.com>
and by core dev Serhiy Storchaka.

Co-authored-by: Gregory P. Smith <greg@krypto.org>
1 parent 2528e46 commit 0a01ed6

15 files changed: +439 -21 lines changed

‎Doc/library/pyexpat.rst

Lines changed: 36 additions & 0 deletions
@@ -196,6 +196,42 @@ XMLParser Objects
196196
:exc:`ExpatError` to be raised with the :attr:`code` attribute set to
197197
``errors.codes[errors.XML_ERROR_CANT_CHANGE_FEATURE_ONCE_PARSING]``.
198198

199+
.. method:: xmlparser.SetReparseDeferralEnabled(enabled)
200+
201+
.. warning::
202+
203+
Calling ``SetReparseDeferralEnabled(False)`` has security implications,
204+
as detailed below; please make sure to understand these consequences
205+
prior to using the ``SetReparseDeferralEnabled`` method.
206+
207+
Expat 2.6.0 introduced a security mechanism called "reparse deferral"
208+
where instead of causing denial of service through quadratic runtime
209+
from reparsing large tokens, reparsing of unfinished tokens is now delayed
210+
by default until a sufficient amount of input is reached.
211+
Due to this delay, registered handlers may, depending on the sizing of
212+
input chunks pushed to Expat, no longer be called right after pushing new
213+
input to the parser. Where immediate feedback and taking over responsibility
214+
of protecting against denial of service from large tokens are both wanted,
215+
calling ``SetReparseDeferralEnabled(False)`` disables reparse deferral
216+
for the current Expat parser instance, temporarily or altogether.
217+
Calling ``SetReparseDeferralEnabled(True)`` allows re-enabling reparse
218+
deferral.
219+
220+
Note that :meth:`SetReparseDeferralEnabled` has been backported to some
221+
prior releases of CPython as a security fix. Check for availability of
222+
:meth:`SetReparseDeferralEnabled` using :func:`hasattr` if used in code
223+
running across a variety of Python versions.
224+
225+
.. versionadded:: 3.12.3
226+
227+
.. method:: xmlparser.GetReparseDeferralEnabled()
228+
229+
Returns whether reparse deferral is currently enabled for the given
230+
Expat parser instance.
231+
232+
.. versionadded:: 3.12.3
233+
234+
199235
:class:`xmlparser` objects have the following attributes:
200236

201237
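The availability check recommended in the documentation above can be sketched as follows. This is a minimal illustration and not part of the commit: the helper name `create_low_latency_parser` is hypothetical, and the sketch assumes the caller has accepted responsibility for protecting against denial of service from large tokens.

```python
import xml.parsers.expat as expat


def create_low_latency_parser(start_handler):
    """Create an Expat parser with reparse deferral disabled where possible.

    Hypothetical helper for illustration only. Disabling reparse deferral
    makes the caller responsible for guarding against denial of service
    from very large tokens.
    """
    parser = expat.ParserCreate()
    parser.StartElementHandler = start_handler
    # The method may be missing on Python versions without the backport.
    if hasattr(parser, "SetReparseDeferralEnabled"):
        parser.SetReparseDeferralEnabled(False)
    return parser


started = []
parser = create_low_latency_parser(lambda name, attrs: started.append(name))
for chunk in (b"<doc", b"/>"):
    parser.Parse(chunk, False)
parser.Parse(b"", True)  # signal end of document
```

On interpreters that lack the method, the `hasattr` guard simply leaves the parser in its default configuration, so the same code runs across a variety of Python versions.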

‎Doc/library/xml.etree.elementtree.rst

Lines changed: 39 additions & 0 deletions
@@ -166,6 +166,11 @@ data but would still like to have incremental parsing capabilities, take a look
166166
at :func:`iterparse`. It can be useful when you're reading a large XML document
167167
and don't want to hold it wholly in memory.
168168

169+
Where *immediate* feedback through events is wanted, calling method
170+
:meth:`XMLPullParser.flush` can help reduce delay;
171+
please make sure to study the related security notes.
172+
173+
169174
Finding interesting elements
170175
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
171176

@@ -1382,6 +1387,24 @@ XMLParser Objects
13821387

13831388
Feeds data to the parser. *data* is encoded data.
13841389

1390+
1391+
.. method:: flush()
1392+
1393+
Triggers parsing of any previously fed unparsed data, which can be
1394+
used to ensure more immediate feedback, in particular with Expat >=2.6.0.
1395+
The implementation of :meth:`flush` temporarily disables reparse deferral
1396+
with Expat (if currently enabled) and triggers a reparse.
1397+
Disabling reparse deferral has security consequences; please see
1398+
:meth:`xml.parsers.expat.xmlparser.SetReparseDeferralEnabled` for details.
1399+
1400+
Note that :meth:`flush` has been backported to some prior releases of
1401+
CPython as a security fix. Check for availability of :meth:`flush`
1402+
using :func:`hasattr` if used in code running across a variety of Python
1403+
versions.
1404+
1405+
.. versionadded:: 3.12.3
1406+
1407+
13851408
:meth:`XMLParser.feed` calls *target*\'s ``start(tag, attrs_dict)`` method
13861409
for each opening tag, its ``end(tag)`` method for each closing tag, and data
13871410
is processed by method ``data(data)``. For further supported callback
@@ -1443,6 +1466,22 @@ XMLPullParser Objects
14431466

14441467
Feed the given bytes data to the parser.
14451468

1469+
.. method:: flush()
1470+
1471+
Triggers parsing of any previously fed unparsed data, which can be
1472+
used to ensure more immediate feedback, in particular with Expat >=2.6.0.
1473+
The implementation of :meth:`flush` temporarily disables reparse deferral
1474+
with Expat (if currently enabled) and triggers a reparse.
1475+
Disabling reparse deferral has security consequences; please see
1476+
:meth:`xml.parsers.expat.xmlparser.SetReparseDeferralEnabled` for details.
1477+
1478+
Note that :meth:`flush` has been backported to some prior releases of
1479+
CPython as a security fix. Check for availability of :meth:`flush`
1480+
using :func:`hasattr` if used in code running across a variety of Python
1481+
versions.
1482+
1483+
.. versionadded:: 3.12.3
1484+
14461485
.. method:: close()
14471486

14481487
Signal the parser that the data stream is terminated. Unlike
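As a rough sketch of how the documented `XMLPullParser.flush` method might be used for more immediate events, the following feeds a document in small chunks and flushes after each one. The helper name `feed_and_flush` is hypothetical, and the `getattr` guard follows the availability advice given in the documentation above. The security notes on disabling reparse deferral apply to every call to `flush`.

```python
import xml.etree.ElementTree as ET


def feed_and_flush(parser, chunk):
    """Feed one chunk, flush if supported, and return the events now available.

    Hypothetical helper for illustration only.
    """
    parser.feed(chunk)
    # flush() may be missing on Python versions without the backport.
    flush = getattr(parser, "flush", None)
    if flush is not None:
        flush()
    return [(event, elem.tag) for event, elem in parser.read_events()]


parser = ET.XMLPullParser(events=("start", "end"))
events = []
for chunk in ("<doc", ">", "</doc>"):
    events.extend(feed_and_flush(parser, chunk))
parser.close()
# Collect anything that only became available once the stream was closed.
events.extend((event, elem.tag) for event, elem in parser.read_events())
```

Whether each event appears right after its chunk or only after `close()` depends on the Expat version and on whether `flush` is available; the complete list of events is the same either way.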

‎Include/pyexpat.h

Lines changed: 3 additions & 1 deletion
@@ -48,8 +48,10 @@ struct PyExpat_CAPI
4848
enum XML_Status (*SetEncoding)(XML_Parser parser, const XML_Char *encoding);
4949
int (*DefaultUnknownEncodingHandler)(
5050
void *encodingHandlerData, const XML_Char *name, XML_Encoding *info);
51-
/* might be none for expat < 2.1.0 */
51+
/* might be NULL for expat < 2.1.0 */
5252
int (*SetHashSalt)(XML_Parser parser, unsigned long hash_salt);
53+
/* might be NULL for expat < 2.6.0 */
54+
XML_Bool (*SetReparseDeferralEnabled)(XML_Parser parser, XML_Bool enabled);
5355
/* always add new stuff to the end! */
5456
};
5557

‎Lib/test/test_pyexpat.py

Lines changed: 54 additions & 0 deletions
@@ -758,5 +758,59 @@ def resolve_entity(context, base, system_id, public_id):
758758
self.assertEqual(handler_call_args, [("bar", "baz")])
759759

760760

761+
class ReparseDeferralTest(unittest.TestCase):
762+
def test_getter_setter_round_trip(self):
763+
parser = expat.ParserCreate()
764+
enabled = (expat.version_info >= (2, 6, 0))
765+
766+
self.assertIs(parser.GetReparseDeferralEnabled(), enabled)
767+
parser.SetReparseDeferralEnabled(False)
768+
self.assertIs(parser.GetReparseDeferralEnabled(), False)
769+
parser.SetReparseDeferralEnabled(True)
770+
self.assertIs(parser.GetReparseDeferralEnabled(), enabled)
771+
772+
def test_reparse_deferral_enabled(self):
773+
if expat.version_info < (2, 6, 0):
774+
self.skipTest(f'Expat {expat.version_info} does not '
775+
'support reparse deferral')
776+
777+
started = []
778+
779+
def start_element(name, _):
780+
started.append(name)
781+
782+
parser = expat.ParserCreate()
783+
parser.StartElementHandler = start_element
784+
self.assertTrue(parser.GetReparseDeferralEnabled())
785+
786+
for chunk in (b'<doc', b'/>'):
787+
parser.Parse(chunk, False)
788+
789+
# The key test: Have handlers already fired? Expecting: no.
790+
self.assertEqual(started, [])
791+
792+
parser.Parse(b'', True)
793+
794+
self.assertEqual(started, ['doc'])
795+
796+
def test_reparse_deferral_disabled(self):
797+
started = []
798+
799+
def start_element(name, _):
800+
started.append(name)
801+
802+
parser = expat.ParserCreate()
803+
parser.StartElementHandler = start_element
804+
if expat.version_info >= (2, 6, 0):
805+
parser.SetReparseDeferralEnabled(False)
806+
self.assertFalse(parser.GetReparseDeferralEnabled())
807+
808+
for chunk in (b'<doc', b'/>'):
809+
parser.Parse(chunk, False)
810+
811+
# The key test: Have handlers already fired? Expecting: yes.
812+
self.assertEqual(started, ['doc'])
813+
814+
761815
if __name__ == "__main__":
762816
unittest.main()
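The behaviour these tests check can also be observed outside of `unittest`. The sketch below is illustrative only and uses a hypothetical function name, `handler_calls_per_chunk`. It records how many start-element handler calls have happened after each chunk, using the parser's default settings.

```python
import xml.parsers.expat as expat


def handler_calls_per_chunk(chunks):
    """Return the cumulative number of start-element calls after each chunk.

    The last entry is the count after the final, empty Parse() call.
    Hypothetical helper for illustration only.
    """
    started = []
    parser = expat.ParserCreate()
    parser.StartElementHandler = lambda name, attrs: started.append(name)
    counts = []
    for chunk in chunks:
        parser.Parse(chunk, False)
        counts.append(len(started))
    parser.Parse(b"", True)  # signal end of document
    counts.append(len(started))
    return counts


counts = handler_calls_per_chunk([b"<doc", b"/>"])
print(counts)
```

In line with the tests above, the handler is expected to fire only on the final call when reparse deferral is active (Expat 2.6.0 or later), and right after the second chunk otherwise.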

‎Lib/test/test_sax.py

Lines changed: 51 additions & 0 deletions
@@ -19,6 +19,7 @@
1919
from io import BytesIO, StringIO
2020
import codecs
2121
import os.path
22+
import pyexpat
2223
import shutil
2324
import sys
2425
from urllib.error import URLError
@@ -1214,6 +1215,56 @@ def test_expat_incremental_reset(self):
12141215

12151216
self.assertEqual(result.getvalue(), start + b"<doc>text</doc>")
12161217

1218+
def test_flush_reparse_deferral_enabled(self):
1219+
if pyexpat.version_info < (2, 6, 0):
1220+
self.skipTest(f'Expat {pyexpat.version_info} does not support reparse deferral')
1221+
1222+
result = BytesIO()
1223+
xmlgen = XMLGenerator(result)
1224+
parser = create_parser()
1225+
parser.setContentHandler(xmlgen)
1226+
1227+
for chunk in ("<doc", ">"):
1228+
parser.feed(chunk)
1229+
1230+
self.assertEqual(result.getvalue(), start) # i.e. no elements started
1231+
self.assertTrue(parser._parser.GetReparseDeferralEnabled())
1232+
1233+
parser.flush()
1234+
1235+
self.assertTrue(parser._parser.GetReparseDeferralEnabled())
1236+
self.assertEqual(result.getvalue(), start + b"<doc>")
1237+
1238+
parser.feed("</doc>")
1239+
parser.close()
1240+
1241+
self.assertEqual(result.getvalue(), start + b"<doc></doc>")
1242+
1243+
def test_flush_reparse_deferral_disabled(self):
1244+
result = BytesIO()
1245+
xmlgen = XMLGenerator(result)
1246+
parser = create_parser()
1247+
parser.setContentHandler(xmlgen)
1248+
1249+
for chunk in ("<doc", ">"):
1250+
parser.feed(chunk)
1251+
1252+
if pyexpat.version_info >= (2, 6, 0):
1253+
parser._parser.SetReparseDeferralEnabled(False)
1254+
1255+
self.assertEqual(result.getvalue(), start) # i.e. no elements started
1256+
self.assertFalse(parser._parser.GetReparseDeferralEnabled())
1257+
1258+
parser.flush()
1259+
1260+
self.assertFalse(parser._parser.GetReparseDeferralEnabled())
1261+
self.assertEqual(result.getvalue(), start + b"<doc>")
1262+
1263+
parser.feed("</doc>")
1264+
parser.close()
1265+
1266+
self.assertEqual(result.getvalue(), start + b"<doc></doc>")
1267+
12171268
# ===== Locator support
12181269

12191270
def test_expat_locator_noinfo(self):
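For the SAX interface, the commit message lists `xml.sax.expatreader.ExpatParser.flush` as one of the new methods. A minimal sketch of incremental parsing with it might look like this. The `TagCollector` class and `parse_incrementally` function are hypothetical names, and the sketch assumes `xml.sax.make_parser()` returns the Expat-based reader, which is the default.

```python
import xml.sax
from xml.sax.handler import ContentHandler


class TagCollector(ContentHandler):
    """Hypothetical handler that records the names of started elements."""

    def __init__(self):
        super().__init__()
        self.started = []

    def startElement(self, name, attrs):
        self.started.append(name)


def parse_incrementally(chunks):
    """Feed chunks one by one, flushing after each where flush() exists."""
    handler = TagCollector()
    parser = xml.sax.make_parser()
    parser.setContentHandler(handler)
    for chunk in chunks:
        parser.feed(chunk)
        # flush() may be missing on Python versions without the backport.
        if hasattr(parser, "flush"):
            parser.flush()
    parser.close()
    return handler.started


started = parse_incrementally(["<doc", ">", "</doc>"])
```

As with the other interfaces, each `flush` call temporarily disables reparse deferral, so the security considerations described for `SetReparseDeferralEnabled` apply.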

‎Lib/test/test_xml_etree.py

Lines changed: 63 additions & 16 deletions
@@ -121,10 +121,6 @@
121121
</foo>
122122
"""
123123

124-
fails_with_expat_2_6_0 = (unittest.expectedFailure
125-
if pyexpat.version_info >= (2, 6, 0) else
126-
lambda test: test)
127-
128124
def checkwarnings(*filters, quiet=False):
129125
def decorator(test):
130126
def newtest(*args, **kwargs):
@@ -1382,12 +1378,14 @@ def test_attlist_default(self):
13821378

13831379
class XMLPullParserTest(unittest.TestCase):
13841380

1385-
def _feed(self, parser, data, chunk_size=None):
1381+
def _feed(self, parser, data, chunk_size=None, flush=False):
13861382
if chunk_size is None:
13871383
parser.feed(data)
13881384
else:
13891385
for i in range(0, len(data), chunk_size):
13901386
parser.feed(data[i:i+chunk_size])
1387+
if flush:
1388+
parser.flush()
13911389

13921390
def assert_events(self, parser, expected, max_events=None):
13931391
self.assertEqual(
@@ -1405,34 +1403,32 @@ def assert_event_tags(self, parser, expected, max_events=None):
14051403
self.assertEqual([(action, elem.tag) for action, elem in events],
14061404
expected)
14071405

1408-
def test_simple_xml(self, chunk_size=None):
1406+
def test_simple_xml(self, chunk_size=None, flush=False):
14091407
parser = ET.XMLPullParser()
14101408
self.assert_event_tags(parser, [])
1411-
self._feed(parser, "<!-- comment -->\n", chunk_size)
1409+
self._feed(parser, "<!-- comment -->\n", chunk_size, flush)
14121410
self.assert_event_tags(parser, [])
14131411
self._feed(parser,
14141412
"<root>\n <element key='value'>text</element",
1415-
chunk_size)
1413+
chunk_size, flush)
14161414
self.assert_event_tags(parser, [])
1417-
self._feed(parser, ">\n", chunk_size)
1415+
self._feed(parser, ">\n", chunk_size, flush)
14181416
self.assert_event_tags(parser, [('end', 'element')])
1419-
self._feed(parser, "<element>text</element>tail\n", chunk_size)
1420-
self._feed(parser, "<empty-element/>\n", chunk_size)
1417+
self._feed(parser, "<element>text</element>tail\n", chunk_size, flush)
1418+
self._feed(parser, "<empty-element/>\n", chunk_size, flush)
14211419
self.assert_event_tags(parser, [
14221420
('end', 'element'),
14231421
('end', 'empty-element'),
14241422
])
1425-
self._feed(parser, "</root>\n", chunk_size)
1423+
self._feed(parser, "</root>\n", chunk_size, flush)
14261424
self.assert_event_tags(parser, [('end', 'root')])
14271425
self.assertIsNone(parser.close())
14281426

1429-
@fails_with_expat_2_6_0
14301427
def test_simple_xml_chunk_1(self):
1431-
self.test_simple_xml(chunk_size=1)
1428+
self.test_simple_xml(chunk_size=1, flush=True)
14321429

1433-
@fails_with_expat_2_6_0
14341430
def test_simple_xml_chunk_5(self):
1435-
self.test_simple_xml(chunk_size=5)
1431+
self.test_simple_xml(chunk_size=5, flush=True)
14361432

14371433
def test_simple_xml_chunk_22(self):
14381434
self.test_simple_xml(chunk_size=22)
@@ -1631,6 +1627,57 @@ def test_unknown_event(self):
16311627
with self.assertRaises(ValueError):
16321628
ET.XMLPullParser(events=('start', 'end', 'bogus'))
16331629

1630+
def test_flush_reparse_deferral_enabled(self):
1631+
if pyexpat.version_info < (2, 6, 0):
1632+
self.skipTest(f'Expat {pyexpat.version_info} does not '
1633+
'support reparse deferral')
1634+
1635+
parser = ET.XMLPullParser(events=('start', 'end'))
1636+
1637+
for chunk in ("<doc", ">"):
1638+
parser.feed(chunk)
1639+
1640+
self.assert_event_tags(parser, []) # i.e. no elements started
1641+
if ET is pyET:
1642+
self.assertTrue(parser._parser._parser.GetReparseDeferralEnabled())
1643+
1644+
parser.flush()
1645+
1646+
self.assert_event_tags(parser, [('start', 'doc')])
1647+
if ET is pyET:
1648+
self.assertTrue(parser._parser._parser.GetReparseDeferralEnabled())
1649+
1650+
parser.feed("</doc>")
1651+
parser.close()
1652+
1653+
self.assert_event_tags(parser, [('end', 'doc')])
1654+
1655+
def test_flush_reparse_deferral_disabled(self):
1656+
parser = ET.XMLPullParser(events=('start', 'end'))
1657+
1658+
for chunk in ("<doc", ">"):
1659+
parser.feed(chunk)
1660+
1661+
if pyexpat.version_info >= (2, 6, 0):
1662+
if ET is not pyET:
1663+
self.skipTest(f'XMLParser.(Get|Set)ReparseDeferralEnabled '
1664+
'methods not available in C')
1665+
parser._parser._parser.SetReparseDeferralEnabled(False)
1666+
1667+
self.assert_event_tags(parser, []) # i.e. no elements started
1668+
if ET is pyET:
1669+
self.assertFalse(parser._parser._parser.GetReparseDeferralEnabled())
1670+
1671+
parser.flush()
1672+
1673+
self.assert_event_tags(parser, [('start', 'doc')])
1674+
if ET is pyET:
1675+
self.assertFalse(parser._parser._parser.GetReparseDeferralEnabled())
1676+
1677+
parser.feed("</doc>")
1678+
parser.close()
1679+
1680+
self.assert_event_tags(parser, [('end', 'doc')])
16341681

16351682
#
16361683
# xinclude tests (samples from appendix C of the xinclude specification)

‎Lib/xml/etree/ElementTree.py

Lines changed: 14 additions & 0 deletions
@@ -1313,6 +1313,11 @@ def read_events(self):
13131313
else:
13141314
yield event
13151315

1316+
def flush(self):
1317+
if self._parser is None:
1318+
raise ValueError("flush() called after end of stream")
1319+
self._parser.flush()
1320+
13161321

13171322
def XML(text, parser=None):
13181323
"""Parse XML document from string constant.
@@ -1719,6 +1724,15 @@ def close(self):
17191724
del self.parser, self._parser
17201725
del self.target, self._target
17211726

1727+
def flush(self):
1728+
was_enabled = self.parser.GetReparseDeferralEnabled()
1729+
try:
1730+
self.parser.SetReparseDeferralEnabled(False)
1731+
self.parser.Parse(b"", False)
1732+
except self._error as v:
1733+
self._raiseerror(v)
1734+
finally:
1735+
self.parser.SetReparseDeferralEnabled(was_enabled)
17221736

17231737
# --------------------------------------------------------------------
17241738
# C14N 2.0
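The save, disable, reparse, restore pattern used by `XMLParser.flush` above can be applied to a raw `xml.parsers.expat` parser as well. The following is a sketch under stated assumptions rather than code from the commit: the function name `flush_expat_parser` is hypothetical, and the fallback for interpreters without the new methods is a best-effort guess that does not force a reparse.

```python
import xml.parsers.expat as expat


def flush_expat_parser(parser):
    """Trigger parsing of previously fed, still unparsed data.

    Hypothetical helper mirroring the pattern of XMLParser.flush.
    """
    if not hasattr(parser, "GetReparseDeferralEnabled"):
        # No backport available: an empty Parse() call is all we can do,
        # and it may not force a reparse with Expat >= 2.6.0.
        parser.Parse(b"", False)
        return
    was_enabled = parser.GetReparseDeferralEnabled()
    try:
        # Temporarily disable deferral so the empty chunk triggers a reparse.
        parser.SetReparseDeferralEnabled(False)
        parser.Parse(b"", False)
    finally:
        # Restore the previous setting even if parsing raised an error.
        parser.SetReparseDeferralEnabled(was_enabled)


started = []
parser = expat.ParserCreate()
parser.StartElementHandler = lambda name, attrs: started.append(name)
for chunk in (b"<doc", b">"):
    parser.Parse(chunk, False)
flush_expat_parser(parser)
started_after_flush = list(started)
parser.Parse(b"</doc>", True)  # signal end of document
```

Restoring the previous setting in a `finally` clause keeps the window in which deferral is disabled as short as possible.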
