Description
Describe the issue:
I would really like to use the new `Polynomial` package, but I'm finding that its evaluation performance is much worse than `np.poly1d` on Python 3.10 and 3.11. Running the example code below on different Python versions gives the following results:
| Method | 3.10 time (s) | 3.11 time (s) | 3.12 time (s) | 3.13 time (s) |
|---|---|---|---|---|
| Polynomial | 51.5 | 36.2 | 35.5 | 44.3 |
| np.poly1d | 37.2 | 21.4 | 32.4 | 49.7 |
Notice that for Python versions <= 3.11, `np.poly1d` is significantly faster than the `Polynomial` package machinery. The performance of the two methods converges for Python > 3.11, but mostly because `np.poly1d` is slowing down.
Is this expected behavior? I didn't see any mention of performance in the docs for the new `Polynomial` package. I have seen some reports that the `Polynomial` package is faster than `np.poly1d` (which would be great), but that's not what I'm seeing in my tests.
Any advice or insight would be greatly appreciated. For now, the clear solution is to just use `np.poly1d`.
Reproduce the code example:
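import timeit
import numpy as np
from numpy.polynomial import Polynomial
d = np.random.random((14, 2048, 2048))
# Polynomial takes coefficients in increasing degree; np.poly1d takes them in decreasing degree.
P = Polynomial([1., 2., 3., 4.])
old_p = np.poly1d([4., 3., 2., 1.])
# Total time in seconds for 10 evaluations of each object on the full array.
print(timeit.timeit("P(d)", globals={"P": P, "d": d}, number=10))
print(timeit.timeit("old_p(d)", globals={"old_p": old_p, "d": d}, number=10))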
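A slightly more defensive variant of the benchmark, as a sketch rather than a definitive measurement: it first checks that the two objects evaluate to the same values, then takes the minimum over several `timeit.repeat` runs to reduce noise. The smaller array shape, the fixed seed, and the helper name `best_time` are assumptions introduced here to keep the example quick and are not part of the numbers reported above.

```python
import timeit

import numpy as np
from numpy.polynomial import Polynomial

# Smaller array than in the report so the sketch runs quickly (assumption).
rng = np.random.default_rng(0)
d = rng.random((4, 256, 256))

# Polynomial takes coefficients in increasing degree, np.poly1d in decreasing degree.
P = Polynomial([1., 2., 3., 4.])
old_p = np.poly1d([4., 3., 2., 1.])

# Both objects should represent 1 + 2x + 3x^2 + 4x^3.
same = np.allclose(P(d), old_p(d))


def best_time(stmt, namespace, repeat=5, number=10):
    """Hypothetical helper: fastest of `repeat` runs, each executing `stmt` `number` times."""
    return min(timeit.repeat(stmt, globals=namespace, repeat=repeat, number=number))


t_new = best_time("P(d)", {"P": P, "d": d})
t_old = best_time("old_p(d)", {"old_p": old_p, "d": d})

print(f"results match: {same}")
print(f"Polynomial: {t_new:.4f} s, np.poly1d: {t_old:.4f} s")
```

Taking the minimum rather than a single total is the approach the `timeit` documentation suggests for comparing code snippets, since slower runs mostly reflect interference from other processes.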
Error message:
Python and NumPy Versions:
For 3.10:
2.2.5
3.10.17 | packaged by conda-forge | (main, Apr 10 2025, 22:19:12) [GCC 13.3.0]
For 3.11:
2.2.5
3.11.12 | packaged by conda-forge | (main, Apr 10 2025, 22:23:25) [GCC 13.3.0]
For 3.12:
2.2.5
3.12.10 | packaged by conda-forge | (main, Apr 10 2025, 22:21:13) [GCC 13.3.0]
For 3.13:
2.2.5
3.13.3 | packaged by conda-forge | (main, Apr 14 2025, 20:44:03) [GCC 13.3.0]
Runtime Environment:
For 3.10:
[{'numpy_version': '2.2.5',
'python': '3.10.17 | packaged by conda-forge | (main, Apr 10 2025, 22:19:12) '
'[GCC 13.3.0]',
'uname': uname_result(system='Linux', node='XXX', release='6.14.4-arch1-2', version='#1 SMP PREEMPT_DYNAMIC Tue, 29 Apr 2025 09:23:13 +0000', machine='x86_64')},
{'simd_extensions': {'baseline': ['SSE', 'SSE2', 'SSE3'],
'found': ['SSSE3',
'SSE41',
'POPCNT',
'SSE42',
'AVX',
'F16C',
'FMA3',
'AVX2',
'AVX512F',
'AVX512CD',
'AVX512_SKX',
'AVX512_CLX',
'AVX512_CNL',
'AVX512_ICL'],
'not_found': ['AVX512_KNL', 'AVX512_KNM']}},
{'architecture': 'SkylakeX',
'filepath': '/home/XXX/micromamba/envs/tmp10/lib/python3.10/site-packages/numpy.libs/libscipy_openblas64_-6bb31eeb.so',
'internal_api': 'openblas',
'num_threads': 8,
'prefix': 'libscipy_openblas',
'threading_layer': 'pthreads',
'user_api': 'blas',
'version': '0.3.28'}]
For 3.11:
[{'numpy_version': '2.2.5',
'python': '3.11.12 | packaged by conda-forge | (main, Apr 10 2025, 22:23:25) '
'[GCC 13.3.0]',
'uname': uname_result(system='Linux', node='XXX', release='6.14.4-arch1-2', version='#1 SMP PREEMPT_DYNAMIC Tue, 29 Apr 2025 09:23:13 +0000', machine='x86_64')},
{'simd_extensions': {'baseline': ['SSE', 'SSE2', 'SSE3'],
'found': ['SSSE3',
'SSE41',
'POPCNT',
'SSE42',
'AVX',
'F16C',
'FMA3',
'AVX2',
'AVX512F',
'AVX512CD',
'AVX512_SKX',
'AVX512_CLX',
'AVX512_CNL',
'AVX512_ICL'],
'not_found': ['AVX512_KNL', 'AVX512_KNM']}},
{'architecture': 'SkylakeX',
'filepath': '/home/XXX/micromamba/envs/tmp11/lib/python3.11/site-packages/numpy.libs/libscipy_openblas64_-6bb31eeb.so',
'internal_api': 'openblas',
'num_threads': 8,
'prefix': 'libscipy_openblas',
'threading_layer': 'pthreads',
'user_api': 'blas',
'version': '0.3.28'}]
For 3.12:
[{'numpy_version': '2.2.5',
'python': '3.12.10 | packaged by conda-forge | (main, Apr 10 2025, 22:21:13) '
'[GCC 13.3.0]',
'uname': uname_result(system='Linux', node='XXX', release='6.14.4-arch1-2', version='#1 SMP PREEMPT_DYNAMIC Tue, 29 Apr 2025 09:23:13 +0000', machine='x86_64')},
{'simd_extensions': {'baseline': ['SSE', 'SSE2', 'SSE3'],
'found': ['SSSE3',
'SSE41',
'POPCNT',
'SSE42',
'AVX',
'F16C',
'FMA3',
'AVX2',
'AVX512F',
'AVX512CD',
'AVX512_SKX',
'AVX512_CLX',
'AVX512_CNL',
'AVX512_ICL'],
'not_found': ['AVX512_KNL', 'AVX512_KNM']}},
{'architecture': 'SkylakeX',
'filepath': '/home/XXX/micromamba/envs/tmp12/lib/python3.12/site-packages/numpy.libs/libscipy_openblas64_-6bb31eeb.so',
'internal_api': 'openblas',
'num_threads': 8,
'prefix': 'libscipy_openblas',
'threading_layer': 'pthreads',
'user_api': 'blas',
'version': '0.3.28'}]
For 3.13:
[{'numpy_version': '2.2.5',
'python': '3.13.3 | packaged by conda-forge | (main, Apr 14 2025, 20:44:03) '
'[GCC 13.3.0]',
'uname': uname_result(system='Linux', node='XXX', release='6.14.4-arch1-2', version='#1 SMP PREEMPT_DYNAMIC Tue, 29 Apr 2025 09:23:13 +0000', machine='x86_64')},
{'simd_extensions': {'baseline': ['SSE', 'SSE2', 'SSE3'],
'found': ['SSSE3',
'SSE41',
'POPCNT',
'SSE42',
'AVX',
'F16C',
'FMA3',
'AVX2',
'AVX512F',
'AVX512CD',
'AVX512_SKX',
'AVX512_CLX',
'AVX512_CNL',
'AVX512_ICL'],
'not_found': ['AVX512_KNL', 'AVX512_KNM']}},
{'architecture': 'SkylakeX',
'filepath': '/home/XXX/micromamba/envs/tmp/lib/python3.13/site-packages/numpy.libs/libscipy_openblas64_-6bb31eeb.so',
'internal_api': 'openblas',
'num_threads': 8,
'prefix': 'libscipy_openblas',
'threading_layer': 'pthreads',
'user_api': 'blas',
'version': '0.3.28'}]
Context for the issue:
I need to apply polynomials to many large arrays and do it quickly. I want to use the new-and-improved `Polynomial` package, but its performance is forcing me to use the older `np.poly1d`.
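One comparison that might help narrow this down is calling the function-level `numpy.polynomial.polynomial.polyval` directly on the coefficients, bypassing the `Polynomial` object's call path. As far as I can tell, calling a `Polynomial` instance maps the argument from `domain` to `window` before evaluating, which would mean extra passes over a large array; that reading of the implementation is an assumption on my part and has not been profiled here. The sketch below only checks that the two paths agree for the default `domain` and `window`; the small array is a stand-in.

```python
import numpy as np
from numpy.polynomial import Polynomial
from numpy.polynomial import polynomial as npoly

# Small stand-in array (assumption); the report uses shape (14, 2048, 2048).
rng = np.random.default_rng(0)
d = rng.random((4, 256, 256))

P = Polynomial([1., 2., 3., 4.])

# Evaluation through the class interface.
via_class = P(d)

# Evaluation through the function interface, reusing the same coefficients
# (increasing degree). Equivalent only while P keeps its default domain/window.
direct = npoly.polyval(d, P.coef)

matches = np.allclose(via_class, direct)
print(f"class and function paths agree: {matches}")
```

If the function-level call turns out to be noticeably faster on the large arrays, that would point at the object wrapper rather than the evaluation loop itself.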