ENH: Add LSX optimization for LoongArch #25215


Open. loongson-zn wants to merge 1 commit into base: main

loongson-zn
Contributor

@loongson-zn loongson-zn commented Nov 21, 2023

Add LSX optimization for LoongArch.
First, the maximum performance improvement is 42.3x (measured with 'python3.9 runtest.py --bench').
Second, I have completed functional testing on v1.25.2, v1.26.2, and the main branch.

v1.25.2:
'python3.9 runtest.py -v -m full' (python3.9 and gcc 8.3): 35023 passed, 443 skipped, 31 xfailed, 5 xpassed
'spin test -v': 33801 passed, 395 skipped, 1303 deselected, 30 xfailed, 4 xpassed

v1.26.2:
'python3.9 runtest.py -v -m full' (python3.9 and gcc 8.3): 35142 passed, 357 skipped, 30 xfailed, 4 xpassed
'spin test -v': 34020 passed, 176 skipped, 1303 deselected, 30 xfailed, 4 xpassed

main branch:
'spin test -v' (python3.9 and gcc 10.3): 39359 passed, 177 skipped, 1303 deselected, 34 xfailed, 6 xpassed in 450.49s

@rgommers
Member

@loongson-zn thank you for this PR. At first sight, the test/benchmark results look good and the code is clean. My first question is regarding testing, since we don't have access to this hardware. We test other architectures for which we don't have regular hosted CI available through QEMU (see https://github.com/numpy/numpy/blob/main/.github/workflows/linux_qemu.yml). Is there QEMU support for loongarch, and if so can you add a CI job in that linux_qemu.yml file?

@seiko2plus seiko2plus added the component: SIMD Issues in SIMD (fast instruction sets) code or machinery label Nov 22, 2023
@loongson-zn
Contributor Author

> @loongson-zn thank you for this PR. At first sight, the test/benchmark results look good and the code is clean. My first question is regarding testing, since we don't have access to this hardware. We test other architectures for which we don't have regular hosted CI available through QEMU (see https://github.com/numpy/numpy/blob/main/.github/workflows/linux_qemu.yml). Is there QEMU support for loongarch, and if so can you add a CI job in that linux_qemu.yml file?

LoongArch support is currently being promoted in the Debian community, and Ubuntu does not yet have a suitable toolchain (for example, one installable with apt install gcc-xx), so the existing CI setup is not directly applicable. I am verifying with QEMU, but I have run into some trouble. Please give me some time to debug.

@loongson-zn
Contributor Author

@rgommers I have an x86_64 cross-compile toolchain and have enabled native execution via binfmt, but I have run into issues related to Python. Perhaps you could give me some help?

@seiko2plus
Member

> but I encountered issues related to Python. Perhaps you can provide me with some help

I can give you a hand. What kind of issues are you facing? Maybe it would be better to open a separate pull request that adds a CI test for LoongArch.

@loongson-zn
Contributor Author

> > but I encountered issues related to Python. Perhaps you can provide me with some help
>
> I can give you a hand. What kind of issues are you facing? Maybe it would be better to open a separate pull request that adds a CI test for LoongArch.

Thanks @seiko2plus, I will try my best to provide a CI testing environment within a week and start a new issue.

Labels
01 - Enhancement component: SIMD Issues in SIMD (fast instruction sets) code or machinery
3 participants