Skip to content

Navigation Menu

Search code, repositories, users, issues, pull requests...

Provide feedback

We read every piece of feedback, and take your input very seriously.

Saved searches

Use saved searches to filter your results more quickly

Appearance settings

Loongarch: modify lsx optimization(25215PR) for newest branch, and add Qemu tests #27662

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 4 commits into from

Conversation

ErnstPeng
Copy link

No description provided.

@ErnstPeng
Copy link
Author

ErnstPeng commented Oct 29, 2024

Based on #25215, the latest branch code is adapted

@yinshiyou
Copy link

It's better to devide into multiple patches.

numpy/_core/src/common/npy_cpu_features.c Outdated Show resolved Hide resolved
numpy/_core/src/common/simd/lsx/arithmetic.h Outdated Show resolved Hide resolved
numpy/_core/src/common/simd/lsx/lsx.h Show resolved Hide resolved
numpy/_core/src/common/simd/lsx/math.h Outdated Show resolved Hide resolved
@melissawm melissawm added 01 - Enhancement component: SIMD Issues in SIMD (fast instruction sets) code or machinery labels Oct 31, 2024
@XiWeiGu
Copy link

XiWeiGu commented Nov 5, 2024

I think the usage of LoongArch64 and LoongArch is somewhat confusing. For now, we can standardize on LoongArch64, and later, if necessary, we can add support for LoongArch32.

@XiWeiGu
Copy link

XiWeiGu commented Nov 5, 2024

Add Linux Qemu tests for LoongArch64.

@ErnstPeng ErnstPeng force-pushed the main branch 4 times, most recently from 4254c5c to cb22dd3 Compare November 6, 2024 03:01
@ErnstPeng
Copy link
Author

I think the usage of LoongArch64 and LoongArch is somewhat confusing. For now, we can standardize on LoongArch64, and later, if necessary, we can add support for LoongArch32.

Has been uniformly modified to loongarch64

numpy/_core/src/common/simd/lsx/arithmetic.h Outdated Show resolved Hide resolved
numpy/_core/src/common/simd/lsx/misc.h Outdated Show resolved Hide resolved
numpy/_core/src/common/simd/lsx/operators.h Outdated Show resolved Hide resolved
numpy/_core/src/common/simd/lsx/reorder.h Outdated Show resolved Hide resolved
numpy/_core/tests/test_cpu_features.py Outdated Show resolved Hide resolved
@ErnstPeng ErnstPeng force-pushed the main branch 6 times, most recently from 4aab541 to 5ff4af2 Compare November 6, 2024 08:09
@ErnstPeng
Copy link
Author

It's better to devide into multiple patches.

Done

@ErnstPeng
Copy link
Author

Add Linux Qemu tests for LoongArch64.

Already added

@XiWeiGu
Copy link

XiWeiGu commented Nov 18, 2024

We discussed this in a recent traige meeting from the perspective of long-term support for loongarch. We at one point decided to move our kernels to Highway which does not currently support loongarch. The general opinion was that we would like to merge this as-is, with the caveat that we may disable it if the split between the two kernel systems (universal intrinsics and highway) becomes too much of a maintenance burden. The contributors are welcome to join our slack channel to discuss the longer term maintenance going forward, or to join a optimization working group meeting.

Sorry to bother you. How much longer will it take to merge this PR, or is there any additional work we need to do?

@jan-wassenberg
Copy link
Contributor

Hi @XiWeiGu , Highway TL here. We are willing to merge a LoongSIMD patch and maintain it if it can be tested using a commonly available QEMU. Happy to discuss via issue on the Highway github.

@XiWeiGu
Copy link

XiWeiGu commented Nov 18, 2024

Hi @XiWeiGu , Highway TL here. We are willing to merge a LoongSIMD patch and maintain it if it can be tested using a commonly available QEMU. Happy to discuss via issue on the Highway github.

Thank you for your response. Currently, the cross-toolchain and QEMU for LoongArch are fully available. Further discussions on LoongArch support will be carried out on the Highway GitHub issue.

@r-devulap r-devulap self-assigned this Nov 18, 2024
Copy link
Member

@r-devulap r-devulap left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If I am not wrong, you are not building the source file with LSX optimizations. Could you also run benchmark numbers?

Also, there is an already an older PR #27402 which is converting this very source file to Highway. And since Highway doesn't support LoongArch just yet, we might want to prioritize getting Loongarch supported on highway.

numpy/distutils/command/build.py Outdated Show resolved Hide resolved
numpy/distutils/ccompiler_opt.py Outdated Show resolved Hide resolved
@ErnstPeng
Copy link
Author

If I am not wrong, you are not building the source file with LSX optimizations. Could you also run benchmark numbers?

Also, there is an already an older PR #27402 which is converting this very source file to Highway. And since Highway doesn't support LoongArch just yet, we might want to prioritize getting Loongarch supported on highway.

I can run benchmarks, it support LSX on my test environment.

@ErnstPeng ErnstPeng requested a review from r-devulap November 19, 2024 02:34
Copy link
Member

@seiko2plus seiko2plus left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for your contribution! I have a few requests to enhance clarity and optimize the implementation. Please consider the following changes:

.github/workflows/linux_qemu.yml Outdated Show resolved Hide resolved
numpy/_core/src/common/simd/lsx/arithmetic.h Outdated Show resolved Hide resolved
numpy/_core/src/common/simd/lsx/arithmetic.h Show resolved Hide resolved
numpy/_core/src/common/simd/lsx/arithmetic.h Show resolved Hide resolved
numpy/_core/src/common/simd/lsx/arithmetic.h Outdated Show resolved Hide resolved
numpy/_core/src/common/simd/lsx/arithmetic.h Outdated Show resolved Hide resolved
numpy/distutils/ccompiler_opt.py Outdated Show resolved Hide resolved
numpy/_core/src/common/simd/lsx/math.h Show resolved Hide resolved
numpy/_core/src/common/simd/lsx/math.h Show resolved Hide resolved
numpy/_core/src/common/simd/lsx/arithmetic.h Show resolved Hide resolved
@XiWeiGu
Copy link

XiWeiGu commented Nov 19, 2024

If I am not wrong, you are not building the source file with LSX optimizations. Could you also run benchmark numbers?

Also, there is an already an older PR #27402 which is converting this very source file to Highway. And since Highway doesn't support LoongArch just yet, we might want to prioritize getting Loongarch supported on highway.

Considering that Highway supports general scalar instructions, I will submit a PR to Highway as soon as possible to add support for LoongArch, ensuring that PR (#27402) will not cause issues on the LoongArch platform once merged.

@jan-wassenberg
Copy link
Contributor

Considering that Highway supports general scalar instructions, I will submit a PR to Highway as soon as possible to add support for LoongArch, ensuring that PR (#27402) will not cause issues on the LoongArch platform once merged.

Sounds good. It is a little involved to add a target: about 10 places require updating. It is quite doable by searching for all occurrences of say AVX3_ZEN4 (except in x86_*) and adding something there. If you prefer, I can also help by making this change, without actually adding any intrinsics. If so, what should be the name of the targets(s)? HWY_LSX?

@XiWeiGu
Copy link

XiWeiGu commented Nov 19, 2024

Considering that Highway supports general scalar instructions, I will submit a PR to Highway as soon as possible to add support for LoongArch, ensuring that PR (#27402) will not cause issues on the LoongArch platform once merged.

Sounds good. It is a little involved to add a target: about 10 places require updating. It is quite doable by searching for all occurrences of say AVX3_ZEN4 (except in x86_*) and adding something there. If you prefer, I can also help by making this change, without actually adding any intrinsics. If so, what should be the name of the targets(s)? HWY_LSX?

Thank you very much for your kind help; that would be best if possible. Yes, the target name will be HWY_LSX. Do you need any other information about LoongArch?

@charris
Copy link
Member

charris commented Nov 19, 2024

You should not be working in your main branch.

@XiWeiGu
Copy link

XiWeiGu commented Nov 20, 2024

You should not be working in your main branch.

Thank you for your feedback. We will create a new branch to submit the code.

@jan-wassenberg
Copy link
Contributor

Thank you very much for your kind help; that would be best if possible. Yes, the target name will be HWY_LSX. Do you need any other information about LoongArch?

:) I intend to add the new HWY_LSX target on Friday.
There are some capability flags in set_macros-inl.h that I will guess, we can also change them later.
One thing that would be helpful to understand: what are the vector sizes? I saw mention of 128 and 256 bit. In what way does that depend on the CPU?

@XiWeiGu
Copy link

XiWeiGu commented Nov 21, 2024

Thank you very much for your kind help; that would be best if possible. Yes, the target name will be HWY_LSX. Do you need any other information about LoongArch?

:) I intend to add the new HWY_LSX target on Friday. There are some capability flags in set_macros-inl.h that I will guess, we can also change them later. One thing that would be helpful to understand: what are the vector sizes? I saw mention of 128 and 256 bit. In what way does that depend on the CPU?

The vector size of the LSX instruction set is 128 bits, while the vector size of the LASX instruction set is 256 bits.
LoongArch provides the cpucfg instruction to check whether the CPU supports the LSX or LASX instruction sets. However, this may not always be reliable, as some systems might disable LSX and LASX instruction sets for various reasons. Therefore, even if the hardware supports them, we still need to disable the respective instruction sets at runtime. The getauxval interface can be used to determine whether the system supports LSX or LASX.

@jan-wassenberg
Copy link
Contributor

Got it. Highway supports this use case by compiling both LSX and LASX targets, and then dispatching to whichever is present. google/highway#2388 has the basics of this ready.

@mattip
Copy link
Member

mattip commented Nov 27, 2024

Closing, replaced by #27856

@seiko2plus
Copy link
Member

replaced by #27856, Thank you everyone.

@seiko2plus seiko2plus closed this Dec 26, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
01 - Enhancement component: SIMD Issues in SIMD (fast instruction sets) code or machinery
Projects
Development

Successfully merging this pull request may close these issues.

10 participants
Morty Proxy This is a proxified and sanitized view of the page, visit original site.