SIMD: add lsx optimization for loongarch, and add Qemu tests #27856

ErnstPeng · Nov 26, 2024

This PR's code is the same as PR #27662; I just moved it from my main branch to a separate branch. Please review. @seiko2plus @r-devulap

ErnstPeng · Nov 26, 2024

All review comments from PR #27662 have been addressed.

seiko2plus

LGTM, just a few more requests to optimize the implementation

numpy/_core/src/common/simd/lsx/arithmetic.h

numpy/_core/src/common/simd/lsx/operators.h

.github/workflows/linux_qemu.yml

jinboson · Dec 17, 2024

Hi @mattip, @seiko2plus, is there anything else that needs to be done to move this forward?

CNClareChen · Dec 19, 2024

@mattip @seiko2plus The code is ready to be merged. Please take a look.

seiko2plus · Dec 19, 2024

@jinboson, @CNClareChen, The current blocker is the CI performance for the loongarch job, which takes 44 minutes (far exceeding the expected maximum of 20 minutes) as I mentioned in #27856 (comment). This is due to the lack of cross-compilation support in the build process.

CNClareChen · Dec 19, 2024

Thank you for your reminder. I understand what you mean. I'll try to make some modifications.

CNClareChen · Dec 19, 2024

@seiko2plus I added cross-toolchain support to the build process, and the loongarch CI job now takes 12 minutes. However, the Linux tests/lint (pull_request) check failed, and I'm not sure what caused it.
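Presumably the speed-up comes from compiling natively on the x86-64 runner with a LoongArch cross compiler, so that only the test run goes through QEMU user-mode emulation. A minimal sketch of a Meson cross file for that setup follows. The file name, the loongarch64-linux-gnu- toolchain prefix, the qemu-loongarch64 binary and the sysroot path are all assumptions about the CI image, not what the PR's workflow necessarily uses.

```ini
# Hypothetical Meson cross file, e.g. loongarch64-cross.ini.
# Toolchain prefix, QEMU binary name and sysroot path are assumptions.
[binaries]
c = 'loongarch64-linux-gnu-gcc'
cpp = 'loongarch64-linux-gnu-g++'
ar = 'loongarch64-linux-gnu-ar'
strip = 'loongarch64-linux-gnu-strip'
# Lets Meson run cross-built test executables under user-mode emulation;
# -L points QEMU at the target sysroot (assumed location).
exe_wrapper = ['qemu-loongarch64', '-L', '/usr/loongarch64-linux-gnu']

[host_machine]
system = 'linux'
cpu_family = 'loongarch64'
cpu = 'loongarch64'
endian = 'little'
```

The file would be passed at configure time, for example with meson setup build --cross-file loongarch64-cross.ini. A pip or meson-python build would forward the same option through its setup arguments.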

seberg · Dec 19, 2024

If you have ruff, you can run the suggested ruff check --fix. The setup is new; I agree it would be nice if it gave more information about the actual failure.

FWIW, I suspect numpy/_core/tests/test_cpu_features.py is just missing a blank line at the end of the file.

(I.e. good if you fix it, but nothing to worry much about)

CNClareChen · Dec 20, 2024

Thank you very much for your help. The Linux tests pass now.

CNClareChen · Dec 20, 2024

@seiko2plus All requested changes have been made.

seiko2plus

LGTM, just two requests remain:

Add the LSX target to the Meson array of dispatchable sources (following the SSE2 entries). This is necessary because the default value of cpu-baseline can be changed to none, in which case LSX should be treated as a dispatchable feature.

numpy/numpy/_core/meson.build

Lines 874 to 1013 in c39b903

foreach gen_mtargets : [
  [
    'loops_arithm_fp.dispatch.h',
    src_file.process('src/umath/loops_arithm_fp.dispatch.c.src'),
    [
      [AVX2, FMA3], SSE2,
      ASIMD, NEON,
      VSX3, VSX2,
      VXE, VX,
    ]
  ],
  [
    'loops_arithmetic.dispatch.h',
    src_file.process('src/umath/loops_arithmetic.dispatch.c.src'),
    [
      AVX512_SKX, AVX512F, AVX2, SSE41, SSE2,
      NEON,
      VSX4, VSX2,
      VX,
    ]
  ],
  [
    'loops_comparison.dispatch.h',
    src_file.process('src/umath/loops_comparison.dispatch.c.src'),
    [
      AVX512_SKX, AVX512F, AVX2, SSE42, SSE2,
      VSX3, VSX2,
      NEON,
      VXE, VX,
    ]
  ],
  [
    'loops_exponent_log.dispatch.h',
    src_file.process('src/umath/loops_exponent_log.dispatch.c.src'),
    [
      AVX512_SKX, AVX512F, [AVX2, FMA3]
    ]
  ],
  [
    'loops_hyperbolic.dispatch.h',
    src_file.process('src/umath/loops_hyperbolic.dispatch.cpp.src'),
    [
      AVX512_SKX, [AVX2, FMA3],
      VSX4, VSX2,
      NEON_VFPV4,
      VXE
    ]
  ],
  [
    'loops_logical.dispatch.h',
    src_file.process('src/umath/loops_logical.dispatch.c.src'),
    [
      ASIMD, NEON,
      AVX512_SKX, AVX2, SSE2,
      VSX2,
      VX,
    ]
  ],
  [
    'loops_minmax.dispatch.h',
    src_file.process('src/umath/loops_minmax.dispatch.c.src'),
    [
      ASIMD, NEON,
      AVX512_SKX, AVX2, SSE2,
      VSX2,
      VXE, VX,
    ]
  ],
  [
    'loops_modulo.dispatch.h',
    src_file.process('src/umath/loops_modulo.dispatch.c.src'),
    [
      VSX4
    ]
  ],
  [
    'loops_trigonometric.dispatch.h',
    'src/umath/loops_trigonometric.dispatch.cpp',
    [
      AVX512_SKX, [AVX2, FMA3],
      VSX4, VSX3, VSX2,
      NEON_VFPV4,
      VXE2, VXE,
    ]
  ],
  [
    'loops_umath_fp.dispatch.h',
    src_file.process('src/umath/loops_umath_fp.dispatch.c.src'),
    [AVX512_SKX]
  ],
  [
    'loops_unary.dispatch.h',
    src_file.process('src/umath/loops_unary.dispatch.c.src'),
    [
      ASIMD, NEON,
      AVX512_SKX, AVX2, SSE2,
      VSX2,
      VXE, VX
    ]
  ],
  [
    'loops_unary_fp.dispatch.h',
    src_file.process('src/umath/loops_unary_fp.dispatch.c.src'),
    [
      SSE41, SSE2,
      VSX2,
      ASIMD, NEON,
      VXE, VX
    ]
  ],
  [
    'loops_unary_fp_le.dispatch.h',
    src_file.process('src/umath/loops_unary_fp_le.dispatch.c.src'),
    [
      SSE41, SSE2,
      VSX2,
      ASIMD, NEON,
    ]
  ],
  [
    'loops_unary_complex.dispatch.h',
    src_file.process('src/umath/loops_unary_complex.dispatch.c.src'),
    [
      AVX512F, [AVX2, FMA3], SSE2,
      ASIMD, NEON,
      VSX3, VSX2,
      VXE, VX,
    ]
  ],
  [
    'loops_autovec.dispatch.h',
    src_file.process('src/umath/loops_autovec.dispatch.c.src'),
    [
      AVX2, SSE2,
      NEON,
      VSX2,
      VX,
    ]
  ],
]
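Below is a minimal sketch of the requested change. It assumes an LSX feature object is in scope in numpy/_core/meson.build, since the review request refers to it as the LSX target. LSX is appended to each entry whose target list contains SSE2, so it remains a dispatch target even when cpu-baseline is set to none. Only two of those entries are shown. Exactly which entries the merged commit changed is not visible on this page.

```meson
# Sketch only: LSX is assumed to be the Meson feature object for
# LoongArch LSX; it is added wherever SSE2 already appears.
[
  'loops_arithm_fp.dispatch.h',
  src_file.process('src/umath/loops_arithm_fp.dispatch.c.src'),
  [
    [AVX2, FMA3], SSE2,
    ASIMD, NEON,
    VSX3, VSX2,
    VXE, VX,
    LSX,
  ]
],
[
  'loops_logical.dispatch.h',
  src_file.process('src/umath/loops_logical.dispatch.c.src'),
  [
    ASIMD, NEON,
    AVX512_SKX, AVX2, SSE2,
    VSX2,
    VX,
    LSX,
  ]
],
```

Each architecture's targets are tried independently at dispatch time, so placing LSX at the end of a list does not affect the ordering among the x86, Arm, POWER or s390x targets.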

.github/workflows/linux_qemu.yml

CNClareChen · Dec 23, 2024

Hi @seiko2plus, is there anything else that needs to be done to merge this PR?

CNClareChen · Dec 24, 2024

@seiko2plus @mattip I have submitted a new modification that ensures the CI passes. Do you have any suggestions?

seiko2plus

LGTM, thank you.

seiko2plus · Dec 26, 2024

Thank you @CNClareChen, @ErnstPeng

seiko2plus requested changes Nov 27, 2024

mattip mentioned this pull request Nov 27, 2024

Loongarch: modify lsx optimization(25215PR) for newest branch, and add Qemu tests #27662

Closed

seiko2plus reviewed Nov 27, 2024

.github/workflows/linux_qemu.yml (resolved)

ErnstPeng force-pushed the loongarch-dev branch 15 times, most recently from fe0ba22 to 2e98bce, December 13, 2024 08:31

jinboson mentioned this pull request Dec 13, 2024

numpy CI failed on loongarch qemu test google/highway#2410

Closed

ErnstPeng force-pushed the loongarch-dev branch from 2e98bce to 93062f4, December 16, 2024 08:10

mattip mentioned this pull request Dec 16, 2024

BUG: build failure for loongarch in Highway cpu-generic code #28011

Open

ErnstPeng force-pushed the loongarch-dev branch from 93062f4 to 2f6607f, December 19, 2024 09:23

ErnstPeng closed this Dec 19, 2024

ErnstPeng force-pushed the loongarch-dev branch from 2f6607f to e7d4081, December 19, 2024 09:28

ErnstPeng reopened this Dec 19, 2024

ErnstPeng force-pushed the loongarch-dev branch from 369881e to afc1b67, December 19, 2024 11:36

ErnstPeng force-pushed the loongarch-dev branch from afc1b67 to 5d1a16e, December 20, 2024 01:14

seiko2plus requested changes Dec 20, 2024

.github/workflows/linux_qemu.yml (outdated, resolved)

seiko2plus added the 01 - Enhancement and component: SIMD labels Dec 20, 2024

pengxu added 3 commits December 20, 2024 14:34

Loongarch: modify lsx optimization(25215PR) for newest branch

7c35c37

Loongarch: add lsx functions

7325cb9

CI: Add CI test for loongarch64

8183efd

ErnstPeng force-pushed the loongarch-dev branch 3 times, most recently from a113ad3 to 8183efd, December 20, 2024 08:17

ErnstPeng force-pushed the loongarch-dev branch 2 times, most recently from 073ef26 to d0d184c, December 24, 2024 08:42

Modify the npyv_shri macro is implemented using lsx.

20a3a91

ErnstPeng force-pushed the loongarch-dev branch from d0d184c to 20a3a91, December 24, 2024 08:53

seiko2plus approved these changes Dec 24, 2024

seiko2plus merged commit a07c6c5 into numpy:main Dec 26, 2024
67 checks passed

ErnstPeng deleted the loongarch-dev branch December 30, 2024 01:01

xen0n mentioned this pull request Jan 3, 2025

MAINT: LoongArch: switch away from the __loongarch64 preprocessor macro #28092

Merged

rgommers added this to the 2.3.0 release milestone Jan 3, 2025
