-
-
Notifications
You must be signed in to change notification settings - Fork 10.9k
SIMD: add lsx optimization for loongarch, and add Qemu tests #27856
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
All reviews in 27662PR have been modified. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, just a few more requests to optimize the implementation
fe0ba22
to
2e98bce
Compare
2e98bce
to
93062f4
Compare
Hi, @mattip @seiko2plus is there anything else need to do to move this forward ? |
@mattip @seiko2plus The code is ready to be merged. Please take a look. |
@jinboson, @CNClareChen, The current blocker is the CI performance for the loongarch job, which takes 44 minutes (far exceeding the expected maximum of 20 minutes) as I mentioned in #27856 (comment). This is due to the lack of cross-compilation support in the build process. |
Thank you for your reminder. I understand what you mean. I'll try to make some modifications. |
93062f4
to
2f6607f
Compare
2f6607f
to
e7d4081
Compare
369881e
to
afc1b67
Compare
@seiko2plus I add support for the cross-toolchain in the build process, and now the CI performance for the loongarch job takes 12 minutes. However, the Linux tests/lint (pull_request) failed, and I'm not sure what caused the issue. |
If you have FWIW, suspect numpy/_core/tests/test_cpu_features.py is just missing a blank line at the end of the file. (I.e. good if you fix it, but nothing to worry much about) |
afc1b67
to
5d1a16e
Compare
Thank you very much for your help. The Linux tests has passed now. |
@seiko2plus All requested changes have been made. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, just two requests remain:
- Add the
LSX
target to the Meson array of dispatch-able sources (followSSE2
entry). This is necessary because the default value ofcpu-baseline
can be changed to none, andLSX
should be treated as a dispatch-able feature in that case.
Lines 874 to 1013 in c39b903
foreach gen_mtargets : [ | |
[ | |
'loops_arithm_fp.dispatch.h', | |
src_file.process('src/umath/loops_arithm_fp.dispatch.c.src'), | |
[ | |
[AVX2, FMA3], SSE2, | |
ASIMD, NEON, | |
VSX3, VSX2, | |
VXE, VX, | |
] | |
], | |
[ | |
'loops_arithmetic.dispatch.h', | |
src_file.process('src/umath/loops_arithmetic.dispatch.c.src'), | |
[ | |
AVX512_SKX, AVX512F, AVX2, SSE41, SSE2, | |
NEON, | |
VSX4, VSX2, | |
VX, | |
] | |
], | |
[ | |
'loops_comparison.dispatch.h', | |
src_file.process('src/umath/loops_comparison.dispatch.c.src'), | |
[ | |
AVX512_SKX, AVX512F, AVX2, SSE42, SSE2, | |
VSX3, VSX2, | |
NEON, | |
VXE, VX, | |
] | |
], | |
[ | |
'loops_exponent_log.dispatch.h', | |
src_file.process('src/umath/loops_exponent_log.dispatch.c.src'), | |
[ | |
AVX512_SKX, AVX512F, [AVX2, FMA3] | |
] | |
], | |
[ | |
'loops_hyperbolic.dispatch.h', | |
src_file.process('src/umath/loops_hyperbolic.dispatch.cpp.src'), | |
[ | |
AVX512_SKX, [AVX2, FMA3], | |
VSX4, VSX2, | |
NEON_VFPV4, | |
VXE | |
] | |
], | |
[ | |
'loops_logical.dispatch.h', | |
src_file.process('src/umath/loops_logical.dispatch.c.src'), | |
[ | |
ASIMD, NEON, | |
AVX512_SKX, AVX2, SSE2, | |
VSX2, | |
VX, | |
] | |
], | |
[ | |
'loops_minmax.dispatch.h', | |
src_file.process('src/umath/loops_minmax.dispatch.c.src'), | |
[ | |
ASIMD, NEON, | |
AVX512_SKX, AVX2, SSE2, | |
VSX2, | |
VXE, VX, | |
] | |
], | |
[ | |
'loops_modulo.dispatch.h', | |
src_file.process('src/umath/loops_modulo.dispatch.c.src'), | |
[ | |
VSX4 | |
] | |
], | |
[ | |
'loops_trigonometric.dispatch.h', | |
'src/umath/loops_trigonometric.dispatch.cpp', | |
[ | |
AVX512_SKX, [AVX2, FMA3], | |
VSX4, VSX3, VSX2, | |
NEON_VFPV4, | |
VXE2, VXE, | |
] | |
], | |
[ | |
'loops_umath_fp.dispatch.h', | |
src_file.process('src/umath/loops_umath_fp.dispatch.c.src'), | |
[AVX512_SKX] | |
], | |
[ | |
'loops_unary.dispatch.h', | |
src_file.process('src/umath/loops_unary.dispatch.c.src'), | |
[ | |
ASIMD, NEON, | |
AVX512_SKX, AVX2, SSE2, | |
VSX2, | |
VXE, VX | |
] | |
], | |
[ | |
'loops_unary_fp.dispatch.h', | |
src_file.process('src/umath/loops_unary_fp.dispatch.c.src'), | |
[ | |
SSE41, SSE2, | |
VSX2, | |
ASIMD, NEON, | |
VXE, VX | |
] | |
], | |
[ | |
'loops_unary_fp_le.dispatch.h', | |
src_file.process('src/umath/loops_unary_fp_le.dispatch.c.src'), | |
[ | |
SSE41, SSE2, | |
VSX2, | |
ASIMD, NEON, | |
] | |
], | |
[ | |
'loops_unary_complex.dispatch.h', | |
src_file.process('src/umath/loops_unary_complex.dispatch.c.src'), | |
[ | |
AVX512F, [AVX2, FMA3], SSE2, | |
ASIMD, NEON, | |
VSX3, VSX2, | |
VXE, VX, | |
] | |
], | |
[ | |
'loops_autovec.dispatch.h', | |
src_file.process('src/umath/loops_autovec.dispatch.c.src'), | |
[ | |
AVX2, SSE2, | |
NEON, | |
VSX2, | |
VX, | |
] | |
], | |
] |
a113ad3
to
8183efd
Compare
Hi, @seiko2plus Is there anything else that needs to be done to merge this PR? |
073ef26
to
d0d184c
Compare
d0d184c
to
20a3a91
Compare
@seiko2plus @mattip I have submitted a new modification that ensures the CI passes. Do you have any suggestions? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, Thank you.
Thank you @CNClareChen, @ErnstPeng |
This PR's code is same with 27662PR, just switch from the my main branch to another branch. Please review. @seiko2plus @r-devulap