Skip to content

Navigation Menu

Sign in
Appearance settings

Search code, repositories, users, issues, pull requests...

Provide feedback

We read every piece of feedback, and take your input very seriously.

Saved searches

Use saved searches to filter your results more quickly

Appearance settings

[X86] Remove extra MOV after widening atomic load #138635

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 1 commit into
base: users/jofrn/spr/main/2894ccd1
Choose a base branch
Loading
from

Conversation

jofrn
Copy link
Contributor

@jofrn jofrn commented May 6, 2025

@llvmbot
Copy link
Member

llvmbot commented May 6, 2025

@llvm/pr-subscribers-backend-x86

Author: None (jofrn)

Changes

This change adds patterns to optimize out an extra MOV
present after widening the atomic load.


Stack:

  • #120716
  • #125432
  • #120640
  • #138635 ⬅
  • #120598
  • #120387
  • #120386
  • #120385
  • #120384

⚠️ Part of a stack created by spr. Do not merge manually using the UI - doing so may have unexpected results.


Full diff: https://github.com/llvm/llvm-project/pull/138635.diff

2 Files Affected:

  • (modified) llvm/lib/Target/X86/X86InstrCompiler.td (+7)
  • (modified) llvm/test/CodeGen/X86/atomic-load-store.ll (+23-20)
diff --git a/llvm/lib/Target/X86/X86InstrCompiler.td b/llvm/lib/Target/X86/X86InstrCompiler.td
index 167e27eddd71e..8ad8a0a6194d6 100644
--- a/llvm/lib/Target/X86/X86InstrCompiler.td
+++ b/llvm/lib/Target/X86/X86InstrCompiler.td
@@ -1200,6 +1200,13 @@ def : Pat<(i16 (atomic_load_nonext_16 addr:$src)), (MOV16rm addr:$src)>;
 def : Pat<(i32 (atomic_load_nonext_32 addr:$src)), (MOV32rm addr:$src)>;
 def : Pat<(i64 (atomic_load_nonext_64 addr:$src)), (MOV64rm addr:$src)>;
 
+def : Pat<(v4i32 (scalar_to_vector (i32 (anyext (i16 (atomic_load_16 addr:$src)))))),
+           (MOVDI2PDIrm addr:$src)>;   // load atomic <2 x i8>
+def : Pat<(v4i32 (scalar_to_vector (i32 (atomic_load_32 addr:$src)))),
+           (MOVDI2PDIrm addr:$src)>;   // load atomic <2 x i16>
+def : Pat<(v2i64 (scalar_to_vector (i64 (atomic_load_64 addr:$src)))),
+           (MOV64toPQIrm  addr:$src)>; // load atomic <2 x i32,float>
+
 // Floating point loads/stores.
 def : Pat<(atomic_store_32 (i32 (bitconvert (f32 FR32:$src))), addr:$dst),
           (MOVSSmr addr:$dst, FR32:$src)>, Requires<[UseSSE1]>;
diff --git a/llvm/test/CodeGen/X86/atomic-load-store.ll b/llvm/test/CodeGen/X86/atomic-load-store.ll
index 9ee8b4fc5ac7f..935d058a52f8f 100644
--- a/llvm/test/CodeGen/X86/atomic-load-store.ll
+++ b/llvm/test/CodeGen/X86/atomic-load-store.ll
@@ -149,8 +149,7 @@ define <1 x i64> @atomic_vec1_i64_align(ptr %x) nounwind {
 define <2 x i8> @atomic_vec2_i8(ptr %x) {
 ; CHECK3-LABEL: atomic_vec2_i8:
 ; CHECK3:       ## %bb.0:
-; CHECK3-NEXT:    movzwl (%rdi), %eax
-; CHECK3-NEXT:    movd %eax, %xmm0
+; CHECK3-NEXT:    movss {{.*#+}} xmm0 = mem[0],zero,zero,zero
 ; CHECK3-NEXT:    retq
 ;
 ; CHECK0-LABEL: atomic_vec2_i8:
@@ -165,11 +164,15 @@ define <2 x i8> @atomic_vec2_i8(ptr %x) {
 }
 
 define <2 x i16> @atomic_vec2_i16(ptr %x) {
-; CHECK-LABEL: atomic_vec2_i16:
-; CHECK:       ## %bb.0:
-; CHECK-NEXT:    movl (%rdi), %eax
-; CHECK-NEXT:    movd %eax, %xmm0
-; CHECK-NEXT:    retq
+; CHECK3-LABEL: atomic_vec2_i16:
+; CHECK3:       ## %bb.0:
+; CHECK3-NEXT:    movss {{.*#+}} xmm0 = mem[0],zero,zero,zero
+; CHECK3-NEXT:    retq
+;
+; CHECK0-LABEL: atomic_vec2_i16:
+; CHECK0:       ## %bb.0:
+; CHECK0-NEXT:    movd {{.*#+}} xmm0 = mem[0],zero,zero,zero
+; CHECK0-NEXT:    retq
   %ret = load atomic <2 x i16>, ptr %x acquire, align 4
   ret <2 x i16> %ret
 }
@@ -177,8 +180,7 @@ define <2 x i16> @atomic_vec2_i16(ptr %x) {
 define <2 x ptr addrspace(270)> @atomic_vec2_ptr270(ptr %x) {
 ; CHECK-LABEL: atomic_vec2_ptr270:
 ; CHECK:       ## %bb.0:
-; CHECK-NEXT:    movq (%rdi), %rax
-; CHECK-NEXT:    movq %rax, %xmm0
+; CHECK-NEXT:    movq (%rdi), %xmm0
 ; CHECK-NEXT:    retq
   %ret = load atomic <2 x ptr addrspace(270)>, ptr %x acquire, align 8
   ret <2 x ptr addrspace(270)> %ret
@@ -187,8 +189,7 @@ define <2 x ptr addrspace(270)> @atomic_vec2_ptr270(ptr %x) {
 define <2 x i32> @atomic_vec2_i32_align(ptr %x) {
 ; CHECK-LABEL: atomic_vec2_i32_align:
 ; CHECK:       ## %bb.0:
-; CHECK-NEXT:    movq (%rdi), %rax
-; CHECK-NEXT:    movq %rax, %xmm0
+; CHECK-NEXT:    movq (%rdi), %xmm0
 ; CHECK-NEXT:    retq
   %ret = load atomic <2 x i32>, ptr %x acquire, align 8
   ret <2 x i32> %ret
@@ -197,8 +198,7 @@ define <2 x i32> @atomic_vec2_i32_align(ptr %x) {
 define <2 x float> @atomic_vec2_float_align(ptr %x) {
 ; CHECK-LABEL: atomic_vec2_float_align:
 ; CHECK:       ## %bb.0:
-; CHECK-NEXT:    movq (%rdi), %rax
-; CHECK-NEXT:    movq %rax, %xmm0
+; CHECK-NEXT:    movq (%rdi), %xmm0
 ; CHECK-NEXT:    retq
   %ret = load atomic <2 x float>, ptr %x acquire, align 8
   ret <2 x float> %ret
@@ -354,11 +354,15 @@ define <2 x i32> @atomic_vec2_i32(ptr %x) nounwind {
 }
 
 define <4 x i8> @atomic_vec4_i8(ptr %x) nounwind {
-; CHECK-LABEL: atomic_vec4_i8:
-; CHECK:       ## %bb.0:
-; CHECK-NEXT:    movl (%rdi), %eax
-; CHECK-NEXT:    movd %eax, %xmm0
-; CHECK-NEXT:    retq
+; CHECK3-LABEL: atomic_vec4_i8:
+; CHECK3:       ## %bb.0:
+; CHECK3-NEXT:    movss {{.*#+}} xmm0 = mem[0],zero,zero,zero
+; CHECK3-NEXT:    retq
+;
+; CHECK0-LABEL: atomic_vec4_i8:
+; CHECK0:       ## %bb.0:
+; CHECK0-NEXT:    movd {{.*#+}} xmm0 = mem[0],zero,zero,zero
+; CHECK0-NEXT:    retq
   %ret = load atomic <4 x i8>, ptr %x acquire, align 4
   ret <4 x i8> %ret
 }
@@ -366,8 +370,7 @@ define <4 x i8> @atomic_vec4_i8(ptr %x) nounwind {
 define <4 x i16> @atomic_vec4_i16(ptr %x) nounwind {
 ; CHECK-LABEL: atomic_vec4_i16:
 ; CHECK:       ## %bb.0:
-; CHECK-NEXT:    movq (%rdi), %rax
-; CHECK-NEXT:    movq %rax, %xmm0
+; CHECK-NEXT:    movq (%rdi), %xmm0
 ; CHECK-NEXT:    retq
   %ret = load atomic <4 x i16>, ptr %x acquire, align 8
   ret <4 x i16> %ret

@jofrn jofrn force-pushed the users/jofrn/spr/main/45989503 branch from 4383732 to 0fcd430 Compare May 6, 2025 15:04
@jofrn jofrn force-pushed the users/jofrn/spr/main/2894ccd1 branch from 0b8a020 to 561e379 Compare May 6, 2025 15:04
@@ -1200,6 +1200,13 @@ def : Pat<(i16 (atomic_load_nonext_16 addr:$src)), (MOV16rm addr:$src)>;
def : Pat<(i32 (atomic_load_nonext_32 addr:$src)), (MOV32rm addr:$src)>;
def : Pat<(i64 (atomic_load_nonext_64 addr:$src)), (MOV64rm addr:$src)>;

def : Pat<(v4i32 (scalar_to_vector (i32 (anyext (i16 (atomic_load_16 addr:$src)))))),
(MOVDI2PDIrm addr:$src)>; // load atomic <2 x i8>
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this will dereference 32-bits

Copy link
Contributor Author

@jofrn jofrn May 11, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Switched it to a zext and now it dereferences 16 bits in the asm. Thanks.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Next thing is to add SSE/AVX handling - I've added better test coverage at d27d0c7

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Without loss of generality, do we not need the v2 in --check-prefixes=CHECK,CHECKv2-O0 due to divergence of asm?

; RUN: llc < %s -mtriple=x86_64-- -verify-machineinstrs -O0 -mcpu=x86-64-v2 | FileCheck %s --check-prefixes=CHECK,CHECK-O0

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, you'll have to add extra check-prefixes - base + v2 can share CHECK-SSE-O* and v3/4 can share a CHECK-AVX-O* prefixes

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok. Thanks!

Since you've made the commits in already, I'll interleave the SSE/AVX updates throughout the series rather than making a new PR.

@jofrn jofrn force-pushed the users/jofrn/spr/main/2894ccd1 branch from 561e379 to 99a560f Compare May 7, 2025 12:53
@jofrn jofrn force-pushed the users/jofrn/spr/main/45989503 branch 2 times, most recently from 5b5d948 to 939a68f Compare May 8, 2025 01:53
@jofrn jofrn force-pushed the users/jofrn/spr/main/2894ccd1 branch from 99a560f to 45d0296 Compare May 8, 2025 01:53
@jofrn jofrn force-pushed the users/jofrn/spr/main/45989503 branch from 939a68f to b6c4b48 Compare May 8, 2025 23:38
@jofrn jofrn force-pushed the users/jofrn/spr/main/2894ccd1 branch from e961155 to c46962c Compare May 9, 2025 12:53
@jofrn jofrn force-pushed the users/jofrn/spr/main/45989503 branch 3 times, most recently from 7e560d9 to e8dc4c2 Compare May 9, 2025 20:03
@jofrn jofrn force-pushed the users/jofrn/spr/main/2894ccd1 branch from 7fc4840 to 2ad7651 Compare May 9, 2025 20:03
@jofrn jofrn force-pushed the users/jofrn/spr/main/45989503 branch 3 times, most recently from e5413e4 to 6312f8c Compare May 12, 2025 05:34
RKSimon added a commit that referenced this pull request May 12, 2025
…ctor atomic memory operations

Help #138635 where we need to ensure correct SSE/AVX load instructions
@jofrn jofrn force-pushed the users/jofrn/spr/main/45989503 branch from 6312f8c to 109bc60 Compare May 27, 2025 17:34
@jofrn jofrn changed the base branch from users/jofrn/spr/main/2894ccd1 to main June 1, 2025 20:46
@jofrn jofrn force-pushed the users/jofrn/spr/main/45989503 branch from 109bc60 to 21475ae Compare June 1, 2025 20:46
@jofrn jofrn changed the base branch from main to users/jofrn/spr/main/2894ccd1 June 1, 2025 20:46
This change adds patterns to optimize out an extra MOV
present after widening the atomic load.

commit-id:45989503
@jofrn jofrn force-pushed the users/jofrn/spr/main/2894ccd1 branch from c315792 to 768b1a9 Compare June 2, 2025 04:15
@jofrn jofrn force-pushed the users/jofrn/spr/main/45989503 branch from 21475ae to b2b23bf Compare June 2, 2025 04:15
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants
Morty Proxy This is a proxified and sanitized view of the page, visit original site.