Describe the issue:
AddressSanitizer (ASan) reports a heap-buffer-overflow (an out-of-bounds read of size 4)
when calling numpy.strings.find with a long haystack and a short needle that does not occur in it.
Reproduce the code example:
import numpy.strings
numpy.strings.find("A" * (2 ** 17), r"[\w]+\Z")
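For reference, here is a minimal sketch of the result the call above would be expected to produce. It assumes that numpy.strings.find performs a literal substring search like the built-in str.find, so the needle is treated as seven plain characters rather than as a regular expression. The helper name expected_find is hypothetical and exists only for this illustration.

```python
def expected_find(haystack: str, needle: str) -> int:
    """Reference result from the built-in literal substring search."""
    return haystack.find(needle)


haystack = "A" * (2 ** 17)   # 131072 characters, all "A"
needle = r"[\w]+\Z"          # 7 literal characters; not interpreted as a regex

# The needle contains no "A", so a correct search reports "not found".
expected = expected_find(haystack, needle)
print(len(haystack), len(needle), expected)
```

On a build that does not abort, the NumPy call should agree with this value; the report concerns the out-of-bounds read during the search, not the returned index.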
Error message:
=================================================================
==1211586==ERROR: AddressSanitizer: heap-buffer-overflow on address 0x60300000c610 at pc 0x7ffff76594b5 bp 0x7fffffffbec0 sp 0x7fffffffb668
READ of size 4 at 0x60300000c610 thread T0
#0 0x7ffff76594b4 in MemcmpInterceptorCommon(void*, int (*)(void const*, void const*, unsigned long), void const*, void const*, unsigned long) ../../../../src/libsanitizer/sanitizer_common/sanitizer_common_interceptors.inc:861
#1 0x7ffff7659bc6 in __interceptor_memcmp ../../../../src/libsanitizer/sanitizer_common/sanitizer_common_interceptors.inc:892
#2 0x7ffff7659bc6 in __interceptor_memcmp ../../../../src/libsanitizer/sanitizer_common/sanitizer_common_interceptors.inc:887
#3 0x7fffb38ff321 in void preprocess<unsigned int>(CheckedIndexer<unsigned int>, long, prework<unsigned int>*) [clone .constprop.0] (/home/danzin/venvs/3.13_upstream_venv/lib/python3.13t/site-packages/numpy/_core/_multiarray_umath.cpython-313t-x86_64-linux-gnu.so+0x2af321)
#4 0x7fffb3909043 in long string_find<(ENCODING)1>(Buffer<(ENCODING)1>, Buffer<(ENCODING)1>, long, long) (/home/danzin/venvs/3.13_upstream_venv/lib/python3.13t/site-packages/numpy/_core/_multiarray_umath.cpython-313t-x86_64-linux-gnu.so+0x2b9043)
#5 0x7fffb38fd8c3 in int string_findlike_loop<(ENCODING)1>(PyArrayMethod_Context_tag*, char* const*, long const*, long const*, NpyAuxData_tag*) (/home/danzin/venvs/3.13_upstream_venv/lib/python3.13t/site-packages/numpy/_core/_multiarray_umath.cpython-313t-x86_64-linux-gnu.so+0x2ad8c3)
#6 0x7fffb38c4905 in ufunc_generic_fastcall (/home/danzin/venvs/3.13_upstream_venv/lib/python3.13t/site-packages/numpy/_core/_multiarray_umath.cpython-313t-x86_64-linux-gnu.so+0x274905)
#7 0x555555977b31 in _PyObject_VectorcallTstate Include/internal/pycore_call.h:168
#8 0x555555977c8c in PyObject_Vectorcall Objects/call.c:327
#9 0x555555d1e173 in _PyEval_EvalFrameDefault Python/generated_cases.c.h:813
#10 0x555555d514ff in _PyEval_EvalFrame Include/internal/pycore_ceval.h:119
#11 0x555555d514ff in _PyEval_Vector Python/ceval.c:1816
#12 0x555555d51726 in PyEval_EvalCode Python/ceval.c:604
#13 0x555555e908a8 in run_eval_code_obj Python/pythonrun.c:1381
#14 0x555555e93eda in run_mod Python/pythonrun.c:1466
#15 0x555555e9421c in pyrun_file Python/pythonrun.c:1295
#16 0x555555e99bb2 in _PyRun_SimpleFileObject Python/pythonrun.c:517
#17 0x555555e9a176 in _PyRun_AnyFileObject Python/pythonrun.c:77
#18 0x555555efde67 in pymain_run_file_obj Modules/main.c:410
#19 0x555555efe648 in pymain_run_file Modules/main.c:429
#20 0x555555f022f1 in pymain_run_python Modules/main.c:696
#21 0x555555f024c5 in Py_RunMain Modules/main.c:775
#22 0x555555f026ac in pymain_main Modules/main.c:805
#23 0x555555f02a24 in Py_BytesMain Modules/main.c:829
#24 0x5555557d6b05 in main Programs/python.c:15
#25 0x7ffff72ded8f in __libc_start_call_main ../sysdeps/nptl/libc_start_call_main.h:58
#26 0x7ffff72dee3f in __libc_start_main_impl ../csu/libc-start.c:392
#27 0x5555557d6a34 in _start (/home/danzin/projects/3.13_upstream_cpython/python+0x282a34)
0x60300000c610 is located 20 bytes to the right of 28-byte region [0x60300000c5e0,0x60300000c5fc)
allocated by thread T0 here:
#0 0x7ffff7679a57 in __interceptor_calloc ../../../../src/libsanitizer/asan/asan_malloc_linux.cpp:154
#1 0x7fffb3778fad in PyDataMem_UserNEW_ZEROED (/home/danzin/venvs/3.13_upstream_venv/lib/python3.13t/site-packages/numpy/_core/_multiarray_umath.cpython-313t-x86_64-linux-gnu.so+0x128fad)
SUMMARY: AddressSanitizer: heap-buffer-overflow ../../../../src/libsanitizer/sanitizer_common/sanitizer_common_interceptors.inc:861 in MemcmpInterceptorCommon(void*, int (*)(void const*, void const*, unsigned long), void const*, void const*, unsigned long)
Shadow bytes around the buggy address:
0x0c067fff9870: 00 00 fa fa 00 00 00 00 fa fa 00 00 00 00 fa fa
0x0c067fff9880: 00 00 00 04 fa fa 00 00 00 00 fa fa 00 00 00 00
0x0c067fff9890: fa fa 00 00 00 00 fa fa 00 00 00 00 fa fa 00 00
0x0c067fff98a0: 00 04 fa fa 00 00 00 00 fa fa 00 00 00 00 fa fa
0x0c067fff98b0: 00 00 00 00 fa fa 00 00 00 00 fa fa 00 00 00 04
=>0x0c067fff98c0: fa fa[fa]fa fa fa fa fa fa fa fa fa fa fa fa fa
0x0c067fff98d0: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
0x0c067fff98e0: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
0x0c067fff98f0: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
0x0c067fff9900: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
0x0c067fff9910: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
Shadow byte legend (one shadow byte represents 8 application bytes):
Addressable: 00
Partially addressable: 01 02 03 04 05 06 07
Heap left redzone: fa
Freed heap region: fd
Stack left redzone: f1
Stack mid redzone: f2
Stack right redzone: f3
Stack after return: f5
Stack use after scope: f8
Global redzone: f9
Global init order: f6
Poisoned by user: f7
Container overflow: fc
Array cookie: ac
Intra object redzone: bb
ASan internal: fe
Left alloca redzone: ca
Right alloca redzone: cb
Shadow gap: cc
==1211586==ABORTING
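The numbers in the report can be cross-checked with a little arithmetic. The sketch below uses only addresses copied from the output above. Two things are assumptions rather than facts taken from the report: that the 28-byte region is the needle stored as 4-byte code points (consistent with the preprocess<unsigned int> frame), and that ASan uses its usual x86-64 Linux shadow mapping, shadow = (addr >> 3) + 0x7fff8000.

```python
# Values copied from the AddressSanitizer report.
fault_addr = 0x60300000C610
region_start = 0x60300000C5E0
region_end = 0x60300000C5FC   # exclusive upper bound of the allocation

region_size = region_end - region_start    # size of the heap block
bytes_past_end = fault_addr - region_end   # how far past the block the read landed

# Assumption: the needle is stored as 4-byte code points.
needle = r"[\w]+\Z"
needle_bytes = len(needle) * 4

# Assumption: default ASan shadow offset on x86-64 Linux.
SHADOW_OFFSET = 0x7FFF8000
shadow_addr = (fault_addr >> 3) + SHADOW_OFFSET

print(region_size, bytes_past_end, needle_bytes, hex(shadow_addr))
```

Under these assumptions the region size matches the needle's byte length, and the computed shadow address falls on the bracketed [fa] byte in the row starting at 0x0c067fff98c0, which suggests the read ran past the end of the needle buffer.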
Python and NumPy Versions:
NumPy: 2.2.5
Python: 3.13.3+ experimental free-threading build (heads/3.13:d8b90117024, Apr 21 2025, 15:20:00) [GCC 11.4.0]
Runtime Environment:
[{'numpy_version': '2.2.5',
'python': '3.13.3+ experimental free-threading build '
'(heads/3.13:d8b90117024, Apr 21 2025, 15:20:00) [GCC 11.4.0]',
'uname': uname_result(system='Linux', node='LAPTOP-CS6PE5KB', release='5.15.167.4-microsoft-standard-WSL2', version='#1 SMP Tue Nov 5 00:21:55 UTC 2024', machine='x86_64')},
{'simd_extensions': {'baseline': ['SSE', 'SSE2', 'SSE3'],
'found': ['SSSE3',
'SSE41',
'POPCNT',
'SSE42',
'AVX',
'F16C',
'FMA3',
'AVX2'],
'not_found': ['AVX512F',
'AVX512CD',
'AVX512_KNL',
'AVX512_KNM',
'AVX512_SKX',
'AVX512_CLX',
'AVX512_CNL',
'AVX512_ICL']}},
{'architecture': 'Haswell',
'filepath': '/home/danzin/venvs/3.13_upstream_venv/lib/python3.13t/site-packages/numpy.libs/libscipy_openblas64_-6bb31eeb.so',
'internal_api': 'openblas',
'num_threads': 8,
'prefix': 'libscipy_openblas',
'threading_layer': 'pthreads',
'user_api': 'blas',
'version': '0.3.28'}]
Context for the issue:
I have been fuzzing NumPy using fusil by @vstinner. I realize these issues are unlikely to be triggered in normal usage and may therefore be low priority.