[SelectionDAG][X86] Remove unused elements from atomic vector. #125432
Conversation
@llvm/pr-subscribers-llvm-selectiondag @llvm/pr-subscribers-backend-x86

Author: None (jofrn)

Changes

After splitting, all elements are created. The two components must be found by looking at the upper and lower half of the value. This change extends EltsFromConsecutiveLoads to understand AtomicSDNode so that unused elements can be removed.

Stack:
Full diff: https://github.com/llvm/llvm-project/pull/125432.diff

6 Files Affected:
diff --git a/llvm/include/llvm/CodeGen/SelectionDAG.h b/llvm/include/llvm/CodeGen/SelectionDAG.h
index 461c0c1ead16d2..bea5958ec0bba6 100644
--- a/llvm/include/llvm/CodeGen/SelectionDAG.h
+++ b/llvm/include/llvm/CodeGen/SelectionDAG.h
@@ -1840,7 +1840,7 @@ class SelectionDAG {
/// chain to the token factor. This ensures that the new memory node will have
/// the same relative memory dependency position as the old load. Returns the
/// new merged load chain.
- SDValue makeEquivalentMemoryOrdering(LoadSDNode *OldLoad, SDValue NewMemOp);
+ SDValue makeEquivalentMemoryOrdering(MemSDNode *OldLoad, SDValue NewMemOp);
/// Topological-sort the AllNodes list and a
/// assign a unique node id for each node in the DAG based on their
@@ -2264,7 +2264,7 @@ class SelectionDAG {
/// merged. Check that both are nonvolatile and if LD is loading
/// 'Bytes' bytes from a location that is 'Dist' units away from the
/// location that the 'Base' load is loading from.
- bool areNonVolatileConsecutiveLoads(LoadSDNode *LD, LoadSDNode *Base,
+ bool areNonVolatileConsecutiveLoads(MemSDNode *LD, MemSDNode *Base,
unsigned Bytes, int Dist) const;
/// Infer alignment of a load / store address. Return std::nullopt if it
diff --git a/llvm/lib/CodeGen/SelectionDAG/SelectionDAG.cpp b/llvm/lib/CodeGen/SelectionDAG/SelectionDAG.cpp
index b416c0efbbc4fc..5f274fabfe8d64 100644
--- a/llvm/lib/CodeGen/SelectionDAG/SelectionDAG.cpp
+++ b/llvm/lib/CodeGen/SelectionDAG/SelectionDAG.cpp
@@ -12167,7 +12167,7 @@ SDValue SelectionDAG::makeEquivalentMemoryOrdering(SDValue OldChain,
return TokenFactor;
}
-SDValue SelectionDAG::makeEquivalentMemoryOrdering(LoadSDNode *OldLoad,
+SDValue SelectionDAG::makeEquivalentMemoryOrdering(MemSDNode *OldLoad,
SDValue NewMemOp) {
assert(isa<MemSDNode>(NewMemOp.getNode()) && "Expected a memop node");
SDValue OldChain = SDValue(OldLoad, 1);
@@ -12879,17 +12879,21 @@ std::pair<SDValue, SDValue> SelectionDAG::UnrollVectorOverflowOp(
getBuildVector(NewOvVT, dl, OvScalars));
}
-bool SelectionDAG::areNonVolatileConsecutiveLoads(LoadSDNode *LD,
- LoadSDNode *Base,
+bool SelectionDAG::areNonVolatileConsecutiveLoads(MemSDNode *LD,
+ MemSDNode *Base,
unsigned Bytes,
int Dist) const {
if (LD->isVolatile() || Base->isVolatile())
return false;
- // TODO: probably too restrictive for atomics, revisit
- if (!LD->isSimple())
- return false;
- if (LD->isIndexed() || Base->isIndexed())
- return false;
+ if (auto Ld = dyn_cast<LoadSDNode>(LD)) {
+ if (!Ld->isSimple())
+ return false;
+ if (Ld->isIndexed())
+ return false;
+ }
+ if (auto Ld = dyn_cast<LoadSDNode>(Base))
+ if (Ld->isIndexed())
+ return false;
if (LD->getChain() != Base->getChain())
return false;
EVT VT = LD->getMemoryVT();
diff --git a/llvm/lib/CodeGen/SelectionDAG/SelectionDAGAddressAnalysis.cpp b/llvm/lib/CodeGen/SelectionDAG/SelectionDAGAddressAnalysis.cpp
index f2ab88851b780e..a19af64a796229 100644
--- a/llvm/lib/CodeGen/SelectionDAG/SelectionDAGAddressAnalysis.cpp
+++ b/llvm/lib/CodeGen/SelectionDAG/SelectionDAGAddressAnalysis.cpp
@@ -194,8 +194,8 @@ bool BaseIndexOffset::contains(const SelectionDAG &DAG, int64_t BitSize,
return false;
}
-/// Parses tree in Ptr for base, index, offset addresses.
-static BaseIndexOffset matchLSNode(const LSBaseSDNode *N,
+template <typename T>
+static BaseIndexOffset matchSDNode(const T *N,
const SelectionDAG &DAG) {
SDValue Ptr = N->getBasePtr();
@@ -206,16 +206,18 @@ static BaseIndexOffset matchLSNode(const LSBaseSDNode *N,
bool IsIndexSignExt = false;
// pre-inc/pre-dec ops are components of EA.
- if (N->getAddressingMode() == ISD::PRE_INC) {
- if (auto *C = dyn_cast<ConstantSDNode>(N->getOffset()))
- Offset += C->getSExtValue();
- else // If unknown, give up now.
- return BaseIndexOffset(SDValue(), SDValue(), 0, false);
- } else if (N->getAddressingMode() == ISD::PRE_DEC) {
- if (auto *C = dyn_cast<ConstantSDNode>(N->getOffset()))
- Offset -= C->getSExtValue();
- else // If unknown, give up now.
- return BaseIndexOffset(SDValue(), SDValue(), 0, false);
+ if constexpr (std::is_same_v<T, LSBaseSDNode>) {
+ if (N->getAddressingMode() == ISD::PRE_INC) {
+ if (auto *C = dyn_cast<ConstantSDNode>(N->getOffset()))
+ Offset += C->getSExtValue();
+ else // If unknown, give up now.
+ return BaseIndexOffset(SDValue(), SDValue(), 0, false);
+ } else if (N->getAddressingMode() == ISD::PRE_DEC) {
+ if (auto *C = dyn_cast<ConstantSDNode>(N->getOffset()))
+ Offset -= C->getSExtValue();
+ else // If unknown, give up now.
+ return BaseIndexOffset(SDValue(), SDValue(), 0, false);
+ }
}
// Consume constant adds & ors with appropriate masking.
@@ -300,8 +302,10 @@ static BaseIndexOffset matchLSNode(const LSBaseSDNode *N,
BaseIndexOffset BaseIndexOffset::match(const SDNode *N,
const SelectionDAG &DAG) {
+ if (const auto *AN = dyn_cast<AtomicSDNode>(N))
+ return matchSDNode(AN, DAG);
if (const auto *LS0 = dyn_cast<LSBaseSDNode>(N))
- return matchLSNode(LS0, DAG);
+ return matchSDNode(LS0, DAG);
if (const auto *LN = dyn_cast<LifetimeSDNode>(N)) {
if (LN->hasOffset())
return BaseIndexOffset(LN->getOperand(1), SDValue(), LN->getOffset(),
diff --git a/llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp b/llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp
index 428e7a316d247b..7e784b2919c2a6 100644
--- a/llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp
+++ b/llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp
@@ -5218,7 +5218,11 @@ void SelectionDAGBuilder::visitAtomicLoad(const LoadInst &I) {
L = DAG.getPtrExtOrTrunc(L, dl, VT);
setValue(&I, L);
- DAG.setRoot(OutChain);
+
+ if (VT.isVector())
+ DAG.setRoot(InChain);
+ else
+ DAG.setRoot(OutChain);
}
void SelectionDAGBuilder::visitAtomicStore(const StoreInst &I) {
diff --git a/llvm/lib/Target/X86/X86ISelLowering.cpp b/llvm/lib/Target/X86/X86ISelLowering.cpp
index ba9ac2f21c7564..3b8f3dd1e9a5e9 100644
--- a/llvm/lib/Target/X86/X86ISelLowering.cpp
+++ b/llvm/lib/Target/X86/X86ISelLowering.cpp
@@ -7074,15 +7074,20 @@ static SDValue LowerAsSplatVectorLoad(SDValue SrcOp, MVT VT, const SDLoc &dl,
}
// Recurse to find a LoadSDNode source and the accumulated ByteOffest.
-static bool findEltLoadSrc(SDValue Elt, LoadSDNode *&Ld, int64_t &ByteOffset) {
- if (ISD::isNON_EXTLoad(Elt.getNode())) {
- auto *BaseLd = cast<LoadSDNode>(Elt);
- if (!BaseLd->isSimple())
- return false;
+static bool findEltLoadSrc(SDValue Elt, MemSDNode *&Ld, int64_t &ByteOffset) {
+ if (auto *BaseLd = dyn_cast<AtomicSDNode>(Elt)) {
Ld = BaseLd;
ByteOffset = 0;
return true;
}
+ else if (auto *BaseLd = dyn_cast<LoadSDNode>(Elt))
+ if (ISD::isNON_EXTLoad(Elt.getNode())) {
+ if (!BaseLd->isSimple())
+ return false;
+ Ld = BaseLd;
+ ByteOffset = 0;
+ return true;
+ }
switch (Elt.getOpcode()) {
case ISD::BITCAST:
@@ -7135,7 +7140,7 @@ static SDValue EltsFromConsecutiveLoads(EVT VT, ArrayRef<SDValue> Elts,
APInt ZeroMask = APInt::getZero(NumElems);
APInt UndefMask = APInt::getZero(NumElems);
- SmallVector<LoadSDNode*, 8> Loads(NumElems, nullptr);
+ SmallVector<MemSDNode*, 8> Loads(NumElems, nullptr);
SmallVector<int64_t, 8> ByteOffsets(NumElems, 0);
// For each element in the initializer, see if we've found a load, zero or an
@@ -7185,7 +7190,7 @@ static SDValue EltsFromConsecutiveLoads(EVT VT, ArrayRef<SDValue> Elts,
EVT EltBaseVT = EltBase.getValueType();
assert(EltBaseVT.getSizeInBits() == EltBaseVT.getStoreSizeInBits() &&
"Register/Memory size mismatch");
- LoadSDNode *LDBase = Loads[FirstLoadedElt];
+ MemSDNode *LDBase = Loads[FirstLoadedElt];
assert(LDBase && "Did not find base load for merging consecutive loads");
unsigned BaseSizeInBits = EltBaseVT.getStoreSizeInBits();
unsigned BaseSizeInBytes = BaseSizeInBits / 8;
@@ -7199,8 +7204,8 @@ static SDValue EltsFromConsecutiveLoads(EVT VT, ArrayRef<SDValue> Elts,
// Check to see if the element's load is consecutive to the base load
// or offset from a previous (already checked) load.
- auto CheckConsecutiveLoad = [&](LoadSDNode *Base, int EltIdx) {
- LoadSDNode *Ld = Loads[EltIdx];
+ auto CheckConsecutiveLoad = [&](MemSDNode *Base, int EltIdx) {
+ MemSDNode *Ld = Loads[EltIdx];
int64_t ByteOffset = ByteOffsets[EltIdx];
if (ByteOffset && (ByteOffset % BaseSizeInBytes) == 0) {
int64_t BaseIdx = EltIdx - (ByteOffset / BaseSizeInBytes);
@@ -7228,7 +7233,7 @@ static SDValue EltsFromConsecutiveLoads(EVT VT, ArrayRef<SDValue> Elts,
}
}
- auto CreateLoad = [&DAG, &DL, &Loads](EVT VT, LoadSDNode *LDBase) {
+ auto CreateLoad = [&DAG, &DL, &Loads](EVT VT, MemSDNode *LDBase) {
auto MMOFlags = LDBase->getMemOperand()->getFlags();
assert(LDBase->isSimple() &&
"Cannot merge volatile or atomic loads.");
@@ -9271,8 +9276,9 @@ X86TargetLowering::LowerBUILD_VECTOR(SDValue Op, SelectionDAG &DAG) const {
{
SmallVector<SDValue, 64> Ops(Op->op_begin(), Op->op_begin() + NumElems);
if (SDValue LD =
- EltsFromConsecutiveLoads(VT, Ops, dl, DAG, Subtarget, false))
+ EltsFromConsecutiveLoads(VT, Ops, dl, DAG, Subtarget, false)) {
return LD;
+ }
}
// If this is a splat of pairs of 32-bit elements, we can use a narrower
diff --git a/llvm/test/CodeGen/X86/atomic-load-store.ll b/llvm/test/CodeGen/X86/atomic-load-store.ll
index 42b09558242934..08d0405345f573 100644
--- a/llvm/test/CodeGen/X86/atomic-load-store.ll
+++ b/llvm/test/CodeGen/X86/atomic-load-store.ll
@@ -205,63 +205,19 @@ define <2 x float> @atomic_vec2_float_align(ptr %x) {
}
define <2 x half> @atomic_vec2_half(ptr %x) {
-; CHECK3-LABEL: atomic_vec2_half:
-; CHECK3: ## %bb.0:
-; CHECK3-NEXT: movl (%rdi), %eax
-; CHECK3-NEXT: pinsrw $0, %eax, %xmm0
-; CHECK3-NEXT: shrl $16, %eax
-; CHECK3-NEXT: pinsrw $0, %eax, %xmm1
-; CHECK3-NEXT: punpcklwd {{.*#+}} xmm0 = xmm0[0],xmm1[0],xmm0[1],xmm1[1],xmm0[2],xmm1[2],xmm0[3],xmm1[3]
-; CHECK3-NEXT: retq
-;
-; CHECK0-LABEL: atomic_vec2_half:
-; CHECK0: ## %bb.0:
-; CHECK0-NEXT: movl (%rdi), %eax
-; CHECK0-NEXT: movl %eax, %ecx
-; CHECK0-NEXT: shrl $16, %ecx
-; CHECK0-NEXT: movw %cx, %dx
-; CHECK0-NEXT: ## implicit-def: $ecx
-; CHECK0-NEXT: movw %dx, %cx
-; CHECK0-NEXT: ## implicit-def: $xmm1
-; CHECK0-NEXT: pinsrw $0, %ecx, %xmm1
-; CHECK0-NEXT: movw %ax, %cx
-; CHECK0-NEXT: ## implicit-def: $eax
-; CHECK0-NEXT: movw %cx, %ax
-; CHECK0-NEXT: ## implicit-def: $xmm0
-; CHECK0-NEXT: pinsrw $0, %eax, %xmm0
-; CHECK0-NEXT: punpcklwd {{.*#+}} xmm0 = xmm0[0],xmm1[0],xmm0[1],xmm1[1],xmm0[2],xmm1[2],xmm0[3],xmm1[3]
-; CHECK0-NEXT: retq
+; CHECK-LABEL: atomic_vec2_half:
+; CHECK: ## %bb.0:
+; CHECK-NEXT: movss {{.*#+}} xmm0 = mem[0],zero,zero,zero
+; CHECK-NEXT: retq
%ret = load atomic <2 x half>, ptr %x acquire, align 4
ret <2 x half> %ret
}
define <2 x bfloat> @atomic_vec2_bfloat(ptr %x) {
-; CHECK3-LABEL: atomic_vec2_bfloat:
-; CHECK3: ## %bb.0:
-; CHECK3-NEXT: movl (%rdi), %eax
-; CHECK3-NEXT: pinsrw $0, %eax, %xmm0
-; CHECK3-NEXT: shrl $16, %eax
-; CHECK3-NEXT: pinsrw $0, %eax, %xmm1
-; CHECK3-NEXT: punpcklwd {{.*#+}} xmm0 = xmm0[0],xmm1[0],xmm0[1],xmm1[1],xmm0[2],xmm1[2],xmm0[3],xmm1[3]
-; CHECK3-NEXT: retq
-;
-; CHECK0-LABEL: atomic_vec2_bfloat:
-; CHECK0: ## %bb.0:
-; CHECK0-NEXT: movl (%rdi), %eax
-; CHECK0-NEXT: movl %eax, %ecx
-; CHECK0-NEXT: shrl $16, %ecx
-; CHECK0-NEXT: ## kill: def $cx killed $cx killed $ecx
-; CHECK0-NEXT: movw %ax, %dx
-; CHECK0-NEXT: ## implicit-def: $eax
-; CHECK0-NEXT: movw %dx, %ax
-; CHECK0-NEXT: ## implicit-def: $xmm0
-; CHECK0-NEXT: pinsrw $0, %eax, %xmm0
-; CHECK0-NEXT: ## implicit-def: $eax
-; CHECK0-NEXT: movw %cx, %ax
-; CHECK0-NEXT: ## implicit-def: $xmm1
-; CHECK0-NEXT: pinsrw $0, %eax, %xmm1
-; CHECK0-NEXT: punpcklwd {{.*#+}} xmm0 = xmm0[0],xmm1[0],xmm0[1],xmm1[1],xmm0[2],xmm1[2],xmm0[3],xmm1[3]
-; CHECK0-NEXT: retq
+; CHECK-LABEL: atomic_vec2_bfloat:
+; CHECK: ## %bb.0:
+; CHECK-NEXT: movss {{.*#+}} xmm0 = mem[0],zero,zero,zero
+; CHECK-NEXT: retq
%ret = load atomic <2 x bfloat>, ptr %x acquire, align 4
ret <2 x bfloat> %ret
}
@@ -439,110 +395,19 @@ define <4 x i16> @atomic_vec4_i16(ptr %x) nounwind {
}
define <4 x half> @atomic_vec4_half(ptr %x) nounwind {
-; CHECK3-LABEL: atomic_vec4_half:
-; CHECK3: ## %bb.0:
-; CHECK3-NEXT: movq (%rdi), %rax
-; CHECK3-NEXT: movl %eax, %ecx
-; CHECK3-NEXT: shrl $16, %ecx
-; CHECK3-NEXT: pinsrw $0, %ecx, %xmm1
-; CHECK3-NEXT: pinsrw $0, %eax, %xmm0
-; CHECK3-NEXT: movq %rax, %rcx
-; CHECK3-NEXT: shrq $32, %rcx
-; CHECK3-NEXT: pinsrw $0, %ecx, %xmm2
-; CHECK3-NEXT: shrq $48, %rax
-; CHECK3-NEXT: pinsrw $0, %eax, %xmm3
-; CHECK3-NEXT: punpcklwd {{.*#+}} xmm2 = xmm2[0],xmm3[0],xmm2[1],xmm3[1],xmm2[2],xmm3[2],xmm2[3],xmm3[3]
-; CHECK3-NEXT: punpcklwd {{.*#+}} xmm0 = xmm0[0],xmm1[0],xmm0[1],xmm1[1],xmm0[2],xmm1[2],xmm0[3],xmm1[3]
-; CHECK3-NEXT: punpckldq {{.*#+}} xmm0 = xmm0[0],xmm2[0],xmm0[1],xmm2[1]
-; CHECK3-NEXT: retq
-;
-; CHECK0-LABEL: atomic_vec4_half:
-; CHECK0: ## %bb.0:
-; CHECK0-NEXT: movq (%rdi), %rax
-; CHECK0-NEXT: movl %eax, %ecx
-; CHECK0-NEXT: shrl $16, %ecx
-; CHECK0-NEXT: movw %cx, %dx
-; CHECK0-NEXT: ## implicit-def: $ecx
-; CHECK0-NEXT: movw %dx, %cx
-; CHECK0-NEXT: ## implicit-def: $xmm2
-; CHECK0-NEXT: pinsrw $0, %ecx, %xmm2
-; CHECK0-NEXT: movw %ax, %dx
-; CHECK0-NEXT: ## implicit-def: $ecx
-; CHECK0-NEXT: movw %dx, %cx
-; CHECK0-NEXT: ## implicit-def: $xmm0
-; CHECK0-NEXT: pinsrw $0, %ecx, %xmm0
-; CHECK0-NEXT: movq %rax, %rcx
-; CHECK0-NEXT: shrq $32, %rcx
-; CHECK0-NEXT: movw %cx, %dx
-; CHECK0-NEXT: ## implicit-def: $ecx
-; CHECK0-NEXT: movw %dx, %cx
-; CHECK0-NEXT: ## implicit-def: $xmm1
-; CHECK0-NEXT: pinsrw $0, %ecx, %xmm1
-; CHECK0-NEXT: shrq $48, %rax
-; CHECK0-NEXT: movw %ax, %cx
-; CHECK0-NEXT: ## implicit-def: $eax
-; CHECK0-NEXT: movw %cx, %ax
-; CHECK0-NEXT: ## implicit-def: $xmm3
-; CHECK0-NEXT: pinsrw $0, %eax, %xmm3
-; CHECK0-NEXT: punpcklwd {{.*#+}} xmm1 = xmm1[0],xmm3[0],xmm1[1],xmm3[1],xmm1[2],xmm3[2],xmm1[3],xmm3[3]
-; CHECK0-NEXT: punpcklwd {{.*#+}} xmm0 = xmm0[0],xmm2[0],xmm0[1],xmm2[1],xmm0[2],xmm2[2],xmm0[3],xmm2[3]
-; CHECK0-NEXT: unpcklps {{.*#+}} xmm0 = xmm0[0],xmm1[0],xmm0[1],xmm1[1]
-; CHECK0-NEXT: retq
+; CHECK-LABEL: atomic_vec4_half:
+; CHECK: ## %bb.0:
+; CHECK-NEXT: movsd {{.*#+}} xmm0 = mem[0],zero
+; CHECK-NEXT: retq
%ret = load atomic <4 x half>, ptr %x acquire, align 8
ret <4 x half> %ret
}
define <4 x bfloat> @atomic_vec4_bfloat(ptr %x) nounwind {
-; CHECK3-LABEL: atomic_vec4_bfloat:
-; CHECK3: ## %bb.0:
-; CHECK3-NEXT: movq (%rdi), %rax
-; CHECK3-NEXT: movq %rax, %rcx
-; CHECK3-NEXT: movq %rax, %rdx
-; CHECK3-NEXT: pinsrw $0, %eax, %xmm0
-; CHECK3-NEXT: ## kill: def $eax killed $eax killed $rax
-; CHECK3-NEXT: shrl $16, %eax
-; CHECK3-NEXT: shrq $32, %rcx
-; CHECK3-NEXT: shrq $48, %rdx
-; CHECK3-NEXT: pinsrw $0, %edx, %xmm1
-; CHECK3-NEXT: pinsrw $0, %ecx, %xmm2
-; CHECK3-NEXT: punpcklwd {{.*#+}} xmm2 = xmm2[0],xmm1[0],xmm2[1],xmm1[1],xmm2[2],xmm1[2],xmm2[3],xmm1[3]
-; CHECK3-NEXT: pinsrw $0, %eax, %xmm1
-; CHECK3-NEXT: punpcklwd {{.*#+}} xmm0 = xmm0[0],xmm1[0],xmm0[1],xmm1[1],xmm0[2],xmm1[2],xmm0[3],xmm1[3]
-; CHECK3-NEXT: punpckldq {{.*#+}} xmm0 = xmm0[0],xmm2[0],xmm0[1],xmm2[1]
-; CHECK3-NEXT: retq
-;
-; CHECK0-LABEL: atomic_vec4_bfloat:
-; CHECK0: ## %bb.0:
-; CHECK0-NEXT: movq (%rdi), %rax
-; CHECK0-NEXT: movl %eax, %ecx
-; CHECK0-NEXT: shrl $16, %ecx
-; CHECK0-NEXT: ## kill: def $cx killed $cx killed $ecx
-; CHECK0-NEXT: movw %ax, %dx
-; CHECK0-NEXT: movq %rax, %rsi
-; CHECK0-NEXT: shrq $32, %rsi
-; CHECK0-NEXT: ## kill: def $si killed $si killed $rsi
-; CHECK0-NEXT: shrq $48, %rax
-; CHECK0-NEXT: movw %ax, %di
-; CHECK0-NEXT: ## implicit-def: $eax
-; CHECK0-NEXT: movw %di, %ax
-; CHECK0-NEXT: ## implicit-def: $xmm0
-; CHECK0-NEXT: pinsrw $0, %eax, %xmm0
-; CHECK0-NEXT: ## implicit-def: $eax
-; CHECK0-NEXT: movw %si, %ax
-; CHECK0-NEXT: ## implicit-def: $xmm1
-; CHECK0-NEXT: pinsrw $0, %eax, %xmm1
-; CHECK0-NEXT: punpcklwd {{.*#+}} xmm1 = xmm1[0],xmm0[0],xmm1[1],xmm0[1],xmm1[2],xmm0[2],xmm1[3],xmm0[3]
-; CHECK0-NEXT: ## implicit-def: $eax
-; CHECK0-NEXT: movw %dx, %ax
-; CHECK0-NEXT: ## implicit-def: $xmm0
-; CHECK0-NEXT: pinsrw $0, %eax, %xmm0
-; CHECK0-NEXT: ## implicit-def: $eax
-; CHECK0-NEXT: movw %cx, %ax
-; CHECK0-NEXT: ## implicit-def: $xmm2
-; CHECK0-NEXT: pinsrw $0, %eax, %xmm2
-; CHECK0-NEXT: punpcklwd {{.*#+}} xmm0 = xmm0[0],xmm2[0],xmm0[1],xmm2[1],xmm0[2],xmm2[2],xmm0[3],xmm2[3]
-; CHECK0-NEXT: unpcklps {{.*#+}} xmm0 = xmm0[0],xmm1[0],xmm0[1],xmm1[1]
-; CHECK0-NEXT: retq
+; CHECK-LABEL: atomic_vec4_bfloat:
+; CHECK: ## %bb.0:
+; CHECK-NEXT: movsd {{.*#+}} xmm0 = mem[0],zero
+; CHECK-NEXT: retq
%ret = load atomic <4 x bfloat>, ptr %x acquire, align 8
ret <4 x bfloat> %ret
}
✅ With the latest revision this PR passed the C/C++ code formatter.
-bool SelectionDAG::areNonVolatileConsecutiveLoads(LoadSDNode *LD,
-                                                  LoadSDNode *Base,
+bool SelectionDAG::areNonVolatileConsecutiveLoads(MemSDNode *LD,
+                                                  MemSDNode *Base,
                                                   unsigned Bytes,
                                                   int Dist) const {
   if (LD->isVolatile() || Base->isVolatile())
assert/earlyout if either of LD/BASE aren't read-only?
I don't think you should need to do this. I suspect this is a consequence of your vector legalization patch bypassing the legalization maps
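A minimal sketch of the early-out asked about here, not part of the patch: it assumes MemSDNode's readMem()/writeMem() accessors and reuses the function's LD/Base parameters, rejecting any memory op that is not a plain read (e.g. an atomic read-modify-write).

// Hypothetical early-out for non-read-only memops, per the review question.
// writeMem() is true when the underlying MachineMemOperand can store.
if (!LD->readMem() || LD->writeMem() || !Base->readMem() || Base->writeMem())
  return false;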
@@ -5172,7 +5172,11 @@ void SelectionDAGBuilder::visitAtomicLoad(const LoadInst &I) {
   L = DAG.getPtrExtOrTrunc(L, dl, VT);
 
   setValue(&I, L);
-  DAG.setRoot(OutChain);
+
+  if (VT.isVector())
If we remove this, the associated tests will each get one extra MOV; the combineVZEXT_LOAD method is meant to remove it.
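For context, a restatement of the hunk under discussion with the thread's rationale folded in as comments; the comments are interpretive, not from the patch.

// For vector atomic loads, root the DAG on the incoming chain. Per the
// comment above, rooting on OutChain instead costs one extra MOV in each
// associated test, which combineVZEXT_LOAD is meant to eliminate.
if (VT.isVector())
  DAG.setRoot(InChain);
else
  DAG.setRoot(OutChain);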
Title seems to not match the implementation anymore
@@ -7230,6 +7234,20 @@ static bool findEltLoadSrc(SDValue Elt, LoadSDNode *&Ld, int64_t &ByteOffset) {
       }
     }
     break;
+  case ISD::EXTRACT_ELEMENT:
+    if (auto *IdxC = dyn_cast<ConstantSDNode>(Elt.getOperand(1))) {
Pretty sure this must be a constant. But this could also be done separately, it's not related to the atomic
But also, can we avoid this by not using EXTRACT_ELEMENT in the first place? EXTRACT_ELEMENT has weird handling I've never understood where it's hardly used
> Pretty sure this must be a constant. But this could also be done separately, it's not related to the atomic

If it isn't a constant, shall we assert false or abort the transform? Right now it aborts the transform. If we know it always is and will be constant, I guess asserting is better. Separately as in another PR? There would be no associated test change, as this is required for the optimization.

> But also, can we avoid this by not using EXTRACT_ELEMENT in the first place? EXTRACT_ELEMENT has weird handling I've never understood where it's hardly used

Yes, if we use the ones already implemented here, then we will be able to discover the ByteOffset and it'll work. Why do we want to avoid EXTRACT_ELEMENT? It seems to work here.
> If it isn't a constant, shall we assert false or abort the transform?

Ignore it as a possibility and just let it crash. No efforts should be made to support invalid constructs.

> Why do we want to avoid EXTRACT_ELEMENT? It seems to work here.

Of course it works, but it seems poorly supported by optimizations. In particular, the comment on it says "This is only for use before legalization, for values that will be broken into multiple registers." Yet this is used during legalization (I don't really know why we have this, or why this fake restriction is documented). You could get the same with just shifts.
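A sketch of that shift-based alternative, assuming the wide scalar value is already in hand; WideVal, WideVT, HalfVT, and HalfBits are illustrative names, not from the patch.

// Split a wide scalar into halves with SRL + TRUNCATE rather than
// ISD::EXTRACT_ELEMENT. The low half is a plain truncate; the high half
// shifts the upper bits down first.
SDValue Lo = DAG.getNode(ISD::TRUNCATE, DL, HalfVT, WideVal);
SDValue Hi = DAG.getNode(ISD::SRL, DL, WideVT, WideVal,
                         DAG.getShiftAmountConstant(HalfBits, WideVT, DL));
Hi = DAG.getNode(ISD::TRUNCATE, DL, HalfVT, Hi);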
Closing pull request: commit has gone away