Skip to content

Navigation Menu

Sign in
Appearance settings

Search code, repositories, users, issues, pull requests...

Provide feedback

We read every piece of feedback, and take your input very seriously.

Saved searches

Use saved searches to filter your results more quickly

Appearance settings

[SelectionDAG][X86] Remove unused elements from atomic vector. #125432

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed

Conversation

jofrn
Copy link
Contributor

@jofrn jofrn commented Feb 2, 2025

@llvmbot
Copy link
Member

llvmbot commented Feb 2, 2025

@llvm/pr-subscribers-llvm-selectiondag

@llvm/pr-subscribers-backend-x86

Author: None (jofrn)

Changes

After splitting, all elements are created. The elements are
placed back into a concat_vectors. This change extends EltsFromConsecutiveLoads
to understand AtomicSDNode so that its concat_vectors can be
mapped to a BUILD_VECTOR and so unused elements are no longer
referenced.


Stack:

  • #120716
  • #125432 ⬅
  • #120640
  • #120598
  • #120387
  • #120386
  • #120385
  • #120384

⚠️ Part of a stack created by spr. Do not merge manually using the UI - doing so may have unexpected results.


Full diff: https://github.com/llvm/llvm-project/pull/125432.diff

6 Files Affected:

  • (modified) llvm/include/llvm/CodeGen/SelectionDAG.h (+2-2)
  • (modified) llvm/lib/CodeGen/SelectionDAG/SelectionDAG.cpp (+12-8)
  • (modified) llvm/lib/CodeGen/SelectionDAG/SelectionDAGAddressAnalysis.cpp (+17-13)
  • (modified) llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp (+5-1)
  • (modified) llvm/lib/Target/X86/X86ISelLowering.cpp (+17-11)
  • (modified) llvm/test/CodeGen/X86/atomic-load-store.ll (+16-151)
diff --git a/llvm/include/llvm/CodeGen/SelectionDAG.h b/llvm/include/llvm/CodeGen/SelectionDAG.h
index 461c0c1ead16d2..bea5958ec0bba6 100644
--- a/llvm/include/llvm/CodeGen/SelectionDAG.h
+++ b/llvm/include/llvm/CodeGen/SelectionDAG.h
@@ -1840,7 +1840,7 @@ class SelectionDAG {
   /// chain to the token factor. This ensures that the new memory node will have
   /// the same relative memory dependency position as the old load. Returns the
   /// new merged load chain.
-  SDValue makeEquivalentMemoryOrdering(LoadSDNode *OldLoad, SDValue NewMemOp);
+  SDValue makeEquivalentMemoryOrdering(MemSDNode *OldLoad, SDValue NewMemOp);
 
   /// Topological-sort the AllNodes list and a
   /// assign a unique node id for each node in the DAG based on their
@@ -2264,7 +2264,7 @@ class SelectionDAG {
   /// merged. Check that both are nonvolatile and if LD is loading
   /// 'Bytes' bytes from a location that is 'Dist' units away from the
   /// location that the 'Base' load is loading from.
-  bool areNonVolatileConsecutiveLoads(LoadSDNode *LD, LoadSDNode *Base,
+  bool areNonVolatileConsecutiveLoads(MemSDNode *LD, MemSDNode *Base,
                                       unsigned Bytes, int Dist) const;
 
   /// Infer alignment of a load / store address. Return std::nullopt if it
diff --git a/llvm/lib/CodeGen/SelectionDAG/SelectionDAG.cpp b/llvm/lib/CodeGen/SelectionDAG/SelectionDAG.cpp
index b416c0efbbc4fc..5f274fabfe8d64 100644
--- a/llvm/lib/CodeGen/SelectionDAG/SelectionDAG.cpp
+++ b/llvm/lib/CodeGen/SelectionDAG/SelectionDAG.cpp
@@ -12167,7 +12167,7 @@ SDValue SelectionDAG::makeEquivalentMemoryOrdering(SDValue OldChain,
   return TokenFactor;
 }
 
-SDValue SelectionDAG::makeEquivalentMemoryOrdering(LoadSDNode *OldLoad,
+SDValue SelectionDAG::makeEquivalentMemoryOrdering(MemSDNode *OldLoad,
                                                    SDValue NewMemOp) {
   assert(isa<MemSDNode>(NewMemOp.getNode()) && "Expected a memop node");
   SDValue OldChain = SDValue(OldLoad, 1);
@@ -12879,17 +12879,21 @@ std::pair<SDValue, SDValue> SelectionDAG::UnrollVectorOverflowOp(
                         getBuildVector(NewOvVT, dl, OvScalars));
 }
 
-bool SelectionDAG::areNonVolatileConsecutiveLoads(LoadSDNode *LD,
-                                                  LoadSDNode *Base,
+bool SelectionDAG::areNonVolatileConsecutiveLoads(MemSDNode *LD,
+                                                  MemSDNode *Base,
                                                   unsigned Bytes,
                                                   int Dist) const {
   if (LD->isVolatile() || Base->isVolatile())
     return false;
-  // TODO: probably too restrictive for atomics, revisit
-  if (!LD->isSimple())
-    return false;
-  if (LD->isIndexed() || Base->isIndexed())
-    return false;
+  if (auto Ld = dyn_cast<LoadSDNode>(LD)) {
+    if (!Ld->isSimple())
+      return false;
+    if (Ld->isIndexed())
+      return false;
+  }
+  if (auto Ld = dyn_cast<LoadSDNode>(Base))
+    if (Ld->isIndexed())
+      return false;
   if (LD->getChain() != Base->getChain())
     return false;
   EVT VT = LD->getMemoryVT();
diff --git a/llvm/lib/CodeGen/SelectionDAG/SelectionDAGAddressAnalysis.cpp b/llvm/lib/CodeGen/SelectionDAG/SelectionDAGAddressAnalysis.cpp
index f2ab88851b780e..a19af64a796229 100644
--- a/llvm/lib/CodeGen/SelectionDAG/SelectionDAGAddressAnalysis.cpp
+++ b/llvm/lib/CodeGen/SelectionDAG/SelectionDAGAddressAnalysis.cpp
@@ -194,8 +194,8 @@ bool BaseIndexOffset::contains(const SelectionDAG &DAG, int64_t BitSize,
   return false;
 }
 
-/// Parses tree in Ptr for base, index, offset addresses.
-static BaseIndexOffset matchLSNode(const LSBaseSDNode *N,
+template <typename T>
+static BaseIndexOffset matchSDNode(const T *N,
                                    const SelectionDAG &DAG) {
   SDValue Ptr = N->getBasePtr();
 
@@ -206,16 +206,18 @@ static BaseIndexOffset matchLSNode(const LSBaseSDNode *N,
   bool IsIndexSignExt = false;
 
   // pre-inc/pre-dec ops are components of EA.
-  if (N->getAddressingMode() == ISD::PRE_INC) {
-    if (auto *C = dyn_cast<ConstantSDNode>(N->getOffset()))
-      Offset += C->getSExtValue();
-    else // If unknown, give up now.
-      return BaseIndexOffset(SDValue(), SDValue(), 0, false);
-  } else if (N->getAddressingMode() == ISD::PRE_DEC) {
-    if (auto *C = dyn_cast<ConstantSDNode>(N->getOffset()))
-      Offset -= C->getSExtValue();
-    else // If unknown, give up now.
-      return BaseIndexOffset(SDValue(), SDValue(), 0, false);
+  if constexpr (std::is_same_v<T, LSBaseSDNode>) {
+    if (N->getAddressingMode() == ISD::PRE_INC) {
+      if (auto *C = dyn_cast<ConstantSDNode>(N->getOffset()))
+        Offset += C->getSExtValue();
+      else // If unknown, give up now.
+        return BaseIndexOffset(SDValue(), SDValue(), 0, false);
+    } else if (N->getAddressingMode() == ISD::PRE_DEC) {
+      if (auto *C = dyn_cast<ConstantSDNode>(N->getOffset()))
+        Offset -= C->getSExtValue();
+      else // If unknown, give up now.
+        return BaseIndexOffset(SDValue(), SDValue(), 0, false);
+    }
   }
 
   // Consume constant adds & ors with appropriate masking.
@@ -300,8 +302,10 @@ static BaseIndexOffset matchLSNode(const LSBaseSDNode *N,
 
 BaseIndexOffset BaseIndexOffset::match(const SDNode *N,
                                        const SelectionDAG &DAG) {
+  if (const auto *AN = dyn_cast<AtomicSDNode>(N))
+    return matchSDNode(AN, DAG);
   if (const auto *LS0 = dyn_cast<LSBaseSDNode>(N))
-    return matchLSNode(LS0, DAG);
+    return matchSDNode(LS0, DAG);
   if (const auto *LN = dyn_cast<LifetimeSDNode>(N)) {
     if (LN->hasOffset())
       return BaseIndexOffset(LN->getOperand(1), SDValue(), LN->getOffset(),
diff --git a/llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp b/llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp
index 428e7a316d247b..7e784b2919c2a6 100644
--- a/llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp
+++ b/llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp
@@ -5218,7 +5218,11 @@ void SelectionDAGBuilder::visitAtomicLoad(const LoadInst &I) {
     L = DAG.getPtrExtOrTrunc(L, dl, VT);
 
   setValue(&I, L);
-  DAG.setRoot(OutChain);
+
+  if (VT.isVector())
+    DAG.setRoot(InChain);
+  else
+    DAG.setRoot(OutChain);
 }
 
 void SelectionDAGBuilder::visitAtomicStore(const StoreInst &I) {
diff --git a/llvm/lib/Target/X86/X86ISelLowering.cpp b/llvm/lib/Target/X86/X86ISelLowering.cpp
index ba9ac2f21c7564..3b8f3dd1e9a5e9 100644
--- a/llvm/lib/Target/X86/X86ISelLowering.cpp
+++ b/llvm/lib/Target/X86/X86ISelLowering.cpp
@@ -7074,15 +7074,20 @@ static SDValue LowerAsSplatVectorLoad(SDValue SrcOp, MVT VT, const SDLoc &dl,
 }
 
 // Recurse to find a LoadSDNode source and the accumulated ByteOffest.
-static bool findEltLoadSrc(SDValue Elt, LoadSDNode *&Ld, int64_t &ByteOffset) {
-  if (ISD::isNON_EXTLoad(Elt.getNode())) {
-    auto *BaseLd = cast<LoadSDNode>(Elt);
-    if (!BaseLd->isSimple())
-      return false;
+static bool findEltLoadSrc(SDValue Elt, MemSDNode *&Ld, int64_t &ByteOffset) {
+  if (auto *BaseLd = dyn_cast<AtomicSDNode>(Elt)) {
     Ld = BaseLd;
     ByteOffset = 0;
     return true;
   }
+  else if (auto *BaseLd = dyn_cast<LoadSDNode>(Elt))
+    if (ISD::isNON_EXTLoad(Elt.getNode())) {
+      if (!BaseLd->isSimple())
+        return false;
+      Ld = BaseLd;
+      ByteOffset = 0;
+      return true;
+    }
 
   switch (Elt.getOpcode()) {
   case ISD::BITCAST:
@@ -7135,7 +7140,7 @@ static SDValue EltsFromConsecutiveLoads(EVT VT, ArrayRef<SDValue> Elts,
   APInt ZeroMask = APInt::getZero(NumElems);
   APInt UndefMask = APInt::getZero(NumElems);
 
-  SmallVector<LoadSDNode*, 8> Loads(NumElems, nullptr);
+  SmallVector<MemSDNode*, 8> Loads(NumElems, nullptr);
   SmallVector<int64_t, 8> ByteOffsets(NumElems, 0);
 
   // For each element in the initializer, see if we've found a load, zero or an
@@ -7185,7 +7190,7 @@ static SDValue EltsFromConsecutiveLoads(EVT VT, ArrayRef<SDValue> Elts,
   EVT EltBaseVT = EltBase.getValueType();
   assert(EltBaseVT.getSizeInBits() == EltBaseVT.getStoreSizeInBits() &&
          "Register/Memory size mismatch");
-  LoadSDNode *LDBase = Loads[FirstLoadedElt];
+  MemSDNode *LDBase = Loads[FirstLoadedElt];
   assert(LDBase && "Did not find base load for merging consecutive loads");
   unsigned BaseSizeInBits = EltBaseVT.getStoreSizeInBits();
   unsigned BaseSizeInBytes = BaseSizeInBits / 8;
@@ -7199,8 +7204,8 @@ static SDValue EltsFromConsecutiveLoads(EVT VT, ArrayRef<SDValue> Elts,
 
   // Check to see if the element's load is consecutive to the base load
   // or offset from a previous (already checked) load.
-  auto CheckConsecutiveLoad = [&](LoadSDNode *Base, int EltIdx) {
-    LoadSDNode *Ld = Loads[EltIdx];
+  auto CheckConsecutiveLoad = [&](MemSDNode *Base, int EltIdx) {
+    MemSDNode *Ld = Loads[EltIdx];
     int64_t ByteOffset = ByteOffsets[EltIdx];
     if (ByteOffset && (ByteOffset % BaseSizeInBytes) == 0) {
       int64_t BaseIdx = EltIdx - (ByteOffset / BaseSizeInBytes);
@@ -7228,7 +7233,7 @@ static SDValue EltsFromConsecutiveLoads(EVT VT, ArrayRef<SDValue> Elts,
     }
   }
 
-  auto CreateLoad = [&DAG, &DL, &Loads](EVT VT, LoadSDNode *LDBase) {
+  auto CreateLoad = [&DAG, &DL, &Loads](EVT VT, MemSDNode *LDBase) {
     auto MMOFlags = LDBase->getMemOperand()->getFlags();
     assert(LDBase->isSimple() &&
            "Cannot merge volatile or atomic loads.");
@@ -9271,8 +9276,9 @@ X86TargetLowering::LowerBUILD_VECTOR(SDValue Op, SelectionDAG &DAG) const {
   {
     SmallVector<SDValue, 64> Ops(Op->op_begin(), Op->op_begin() + NumElems);
     if (SDValue LD =
-            EltsFromConsecutiveLoads(VT, Ops, dl, DAG, Subtarget, false))
+            EltsFromConsecutiveLoads(VT, Ops, dl, DAG, Subtarget, false)) {
       return LD;
+    }
   }
 
   // If this is a splat of pairs of 32-bit elements, we can use a narrower
diff --git a/llvm/test/CodeGen/X86/atomic-load-store.ll b/llvm/test/CodeGen/X86/atomic-load-store.ll
index 42b09558242934..08d0405345f573 100644
--- a/llvm/test/CodeGen/X86/atomic-load-store.ll
+++ b/llvm/test/CodeGen/X86/atomic-load-store.ll
@@ -205,63 +205,19 @@ define <2 x float> @atomic_vec2_float_align(ptr %x) {
 }
 
 define <2 x half> @atomic_vec2_half(ptr %x) {
-; CHECK3-LABEL: atomic_vec2_half:
-; CHECK3:       ## %bb.0:
-; CHECK3-NEXT:    movl (%rdi), %eax
-; CHECK3-NEXT:    pinsrw $0, %eax, %xmm0
-; CHECK3-NEXT:    shrl $16, %eax
-; CHECK3-NEXT:    pinsrw $0, %eax, %xmm1
-; CHECK3-NEXT:    punpcklwd {{.*#+}} xmm0 = xmm0[0],xmm1[0],xmm0[1],xmm1[1],xmm0[2],xmm1[2],xmm0[3],xmm1[3]
-; CHECK3-NEXT:    retq
-;
-; CHECK0-LABEL: atomic_vec2_half:
-; CHECK0:       ## %bb.0:
-; CHECK0-NEXT:    movl (%rdi), %eax
-; CHECK0-NEXT:    movl %eax, %ecx
-; CHECK0-NEXT:    shrl $16, %ecx
-; CHECK0-NEXT:    movw %cx, %dx
-; CHECK0-NEXT:    ## implicit-def: $ecx
-; CHECK0-NEXT:    movw %dx, %cx
-; CHECK0-NEXT:    ## implicit-def: $xmm1
-; CHECK0-NEXT:    pinsrw $0, %ecx, %xmm1
-; CHECK0-NEXT:    movw %ax, %cx
-; CHECK0-NEXT:    ## implicit-def: $eax
-; CHECK0-NEXT:    movw %cx, %ax
-; CHECK0-NEXT:    ## implicit-def: $xmm0
-; CHECK0-NEXT:    pinsrw $0, %eax, %xmm0
-; CHECK0-NEXT:    punpcklwd {{.*#+}} xmm0 = xmm0[0],xmm1[0],xmm0[1],xmm1[1],xmm0[2],xmm1[2],xmm0[3],xmm1[3]
-; CHECK0-NEXT:    retq
+; CHECK-LABEL: atomic_vec2_half:
+; CHECK:       ## %bb.0:
+; CHECK-NEXT:    movss {{.*#+}} xmm0 = mem[0],zero,zero,zero
+; CHECK-NEXT:    retq
   %ret = load atomic <2 x half>, ptr %x acquire, align 4
   ret <2 x half> %ret
 }
 
 define <2 x bfloat> @atomic_vec2_bfloat(ptr %x) {
-; CHECK3-LABEL: atomic_vec2_bfloat:
-; CHECK3:       ## %bb.0:
-; CHECK3-NEXT:    movl (%rdi), %eax
-; CHECK3-NEXT:    pinsrw $0, %eax, %xmm0
-; CHECK3-NEXT:    shrl $16, %eax
-; CHECK3-NEXT:    pinsrw $0, %eax, %xmm1
-; CHECK3-NEXT:    punpcklwd {{.*#+}} xmm0 = xmm0[0],xmm1[0],xmm0[1],xmm1[1],xmm0[2],xmm1[2],xmm0[3],xmm1[3]
-; CHECK3-NEXT:    retq
-;
-; CHECK0-LABEL: atomic_vec2_bfloat:
-; CHECK0:       ## %bb.0:
-; CHECK0-NEXT:    movl (%rdi), %eax
-; CHECK0-NEXT:    movl %eax, %ecx
-; CHECK0-NEXT:    shrl $16, %ecx
-; CHECK0-NEXT:    ## kill: def $cx killed $cx killed $ecx
-; CHECK0-NEXT:    movw %ax, %dx
-; CHECK0-NEXT:    ## implicit-def: $eax
-; CHECK0-NEXT:    movw %dx, %ax
-; CHECK0-NEXT:    ## implicit-def: $xmm0
-; CHECK0-NEXT:    pinsrw $0, %eax, %xmm0
-; CHECK0-NEXT:    ## implicit-def: $eax
-; CHECK0-NEXT:    movw %cx, %ax
-; CHECK0-NEXT:    ## implicit-def: $xmm1
-; CHECK0-NEXT:    pinsrw $0, %eax, %xmm1
-; CHECK0-NEXT:    punpcklwd {{.*#+}} xmm0 = xmm0[0],xmm1[0],xmm0[1],xmm1[1],xmm0[2],xmm1[2],xmm0[3],xmm1[3]
-; CHECK0-NEXT:    retq
+; CHECK-LABEL: atomic_vec2_bfloat:
+; CHECK:       ## %bb.0:
+; CHECK-NEXT:    movss {{.*#+}} xmm0 = mem[0],zero,zero,zero
+; CHECK-NEXT:    retq
   %ret = load atomic <2 x bfloat>, ptr %x acquire, align 4
   ret <2 x bfloat> %ret
 }
@@ -439,110 +395,19 @@ define <4 x i16> @atomic_vec4_i16(ptr %x) nounwind {
 }
 
 define <4 x half> @atomic_vec4_half(ptr %x) nounwind {
-; CHECK3-LABEL: atomic_vec4_half:
-; CHECK3:       ## %bb.0:
-; CHECK3-NEXT:    movq (%rdi), %rax
-; CHECK3-NEXT:    movl %eax, %ecx
-; CHECK3-NEXT:    shrl $16, %ecx
-; CHECK3-NEXT:    pinsrw $0, %ecx, %xmm1
-; CHECK3-NEXT:    pinsrw $0, %eax, %xmm0
-; CHECK3-NEXT:    movq %rax, %rcx
-; CHECK3-NEXT:    shrq $32, %rcx
-; CHECK3-NEXT:    pinsrw $0, %ecx, %xmm2
-; CHECK3-NEXT:    shrq $48, %rax
-; CHECK3-NEXT:    pinsrw $0, %eax, %xmm3
-; CHECK3-NEXT:    punpcklwd {{.*#+}} xmm2 = xmm2[0],xmm3[0],xmm2[1],xmm3[1],xmm2[2],xmm3[2],xmm2[3],xmm3[3]
-; CHECK3-NEXT:    punpcklwd {{.*#+}} xmm0 = xmm0[0],xmm1[0],xmm0[1],xmm1[1],xmm0[2],xmm1[2],xmm0[3],xmm1[3]
-; CHECK3-NEXT:    punpckldq {{.*#+}} xmm0 = xmm0[0],xmm2[0],xmm0[1],xmm2[1]
-; CHECK3-NEXT:    retq
-;
-; CHECK0-LABEL: atomic_vec4_half:
-; CHECK0:       ## %bb.0:
-; CHECK0-NEXT:    movq (%rdi), %rax
-; CHECK0-NEXT:    movl %eax, %ecx
-; CHECK0-NEXT:    shrl $16, %ecx
-; CHECK0-NEXT:    movw %cx, %dx
-; CHECK0-NEXT:    ## implicit-def: $ecx
-; CHECK0-NEXT:    movw %dx, %cx
-; CHECK0-NEXT:    ## implicit-def: $xmm2
-; CHECK0-NEXT:    pinsrw $0, %ecx, %xmm2
-; CHECK0-NEXT:    movw %ax, %dx
-; CHECK0-NEXT:    ## implicit-def: $ecx
-; CHECK0-NEXT:    movw %dx, %cx
-; CHECK0-NEXT:    ## implicit-def: $xmm0
-; CHECK0-NEXT:    pinsrw $0, %ecx, %xmm0
-; CHECK0-NEXT:    movq %rax, %rcx
-; CHECK0-NEXT:    shrq $32, %rcx
-; CHECK0-NEXT:    movw %cx, %dx
-; CHECK0-NEXT:    ## implicit-def: $ecx
-; CHECK0-NEXT:    movw %dx, %cx
-; CHECK0-NEXT:    ## implicit-def: $xmm1
-; CHECK0-NEXT:    pinsrw $0, %ecx, %xmm1
-; CHECK0-NEXT:    shrq $48, %rax
-; CHECK0-NEXT:    movw %ax, %cx
-; CHECK0-NEXT:    ## implicit-def: $eax
-; CHECK0-NEXT:    movw %cx, %ax
-; CHECK0-NEXT:    ## implicit-def: $xmm3
-; CHECK0-NEXT:    pinsrw $0, %eax, %xmm3
-; CHECK0-NEXT:    punpcklwd {{.*#+}} xmm1 = xmm1[0],xmm3[0],xmm1[1],xmm3[1],xmm1[2],xmm3[2],xmm1[3],xmm3[3]
-; CHECK0-NEXT:    punpcklwd {{.*#+}} xmm0 = xmm0[0],xmm2[0],xmm0[1],xmm2[1],xmm0[2],xmm2[2],xmm0[3],xmm2[3]
-; CHECK0-NEXT:    unpcklps {{.*#+}} xmm0 = xmm0[0],xmm1[0],xmm0[1],xmm1[1]
-; CHECK0-NEXT:    retq
+; CHECK-LABEL: atomic_vec4_half:
+; CHECK:       ## %bb.0:
+; CHECK-NEXT:    movsd {{.*#+}} xmm0 = mem[0],zero
+; CHECK-NEXT:    retq
   %ret = load atomic <4 x half>, ptr %x acquire, align 8
   ret <4 x half> %ret
 }
 
 define <4 x bfloat> @atomic_vec4_bfloat(ptr %x) nounwind {
-; CHECK3-LABEL: atomic_vec4_bfloat:
-; CHECK3:       ## %bb.0:
-; CHECK3-NEXT:    movq (%rdi), %rax
-; CHECK3-NEXT:    movq %rax, %rcx
-; CHECK3-NEXT:    movq %rax, %rdx
-; CHECK3-NEXT:    pinsrw $0, %eax, %xmm0
-; CHECK3-NEXT:    ## kill: def $eax killed $eax killed $rax
-; CHECK3-NEXT:    shrl $16, %eax
-; CHECK3-NEXT:    shrq $32, %rcx
-; CHECK3-NEXT:    shrq $48, %rdx
-; CHECK3-NEXT:    pinsrw $0, %edx, %xmm1
-; CHECK3-NEXT:    pinsrw $0, %ecx, %xmm2
-; CHECK3-NEXT:    punpcklwd {{.*#+}} xmm2 = xmm2[0],xmm1[0],xmm2[1],xmm1[1],xmm2[2],xmm1[2],xmm2[3],xmm1[3]
-; CHECK3-NEXT:    pinsrw $0, %eax, %xmm1
-; CHECK3-NEXT:    punpcklwd {{.*#+}} xmm0 = xmm0[0],xmm1[0],xmm0[1],xmm1[1],xmm0[2],xmm1[2],xmm0[3],xmm1[3]
-; CHECK3-NEXT:    punpckldq {{.*#+}} xmm0 = xmm0[0],xmm2[0],xmm0[1],xmm2[1]
-; CHECK3-NEXT:    retq
-;
-; CHECK0-LABEL: atomic_vec4_bfloat:
-; CHECK0:       ## %bb.0:
-; CHECK0-NEXT:    movq (%rdi), %rax
-; CHECK0-NEXT:    movl %eax, %ecx
-; CHECK0-NEXT:    shrl $16, %ecx
-; CHECK0-NEXT:    ## kill: def $cx killed $cx killed $ecx
-; CHECK0-NEXT:    movw %ax, %dx
-; CHECK0-NEXT:    movq %rax, %rsi
-; CHECK0-NEXT:    shrq $32, %rsi
-; CHECK0-NEXT:    ## kill: def $si killed $si killed $rsi
-; CHECK0-NEXT:    shrq $48, %rax
-; CHECK0-NEXT:    movw %ax, %di
-; CHECK0-NEXT:    ## implicit-def: $eax
-; CHECK0-NEXT:    movw %di, %ax
-; CHECK0-NEXT:    ## implicit-def: $xmm0
-; CHECK0-NEXT:    pinsrw $0, %eax, %xmm0
-; CHECK0-NEXT:    ## implicit-def: $eax
-; CHECK0-NEXT:    movw %si, %ax
-; CHECK0-NEXT:    ## implicit-def: $xmm1
-; CHECK0-NEXT:    pinsrw $0, %eax, %xmm1
-; CHECK0-NEXT:    punpcklwd {{.*#+}} xmm1 = xmm1[0],xmm0[0],xmm1[1],xmm0[1],xmm1[2],xmm0[2],xmm1[3],xmm0[3]
-; CHECK0-NEXT:    ## implicit-def: $eax
-; CHECK0-NEXT:    movw %dx, %ax
-; CHECK0-NEXT:    ## implicit-def: $xmm0
-; CHECK0-NEXT:    pinsrw $0, %eax, %xmm0
-; CHECK0-NEXT:    ## implicit-def: $eax
-; CHECK0-NEXT:    movw %cx, %ax
-; CHECK0-NEXT:    ## implicit-def: $xmm2
-; CHECK0-NEXT:    pinsrw $0, %eax, %xmm2
-; CHECK0-NEXT:    punpcklwd {{.*#+}} xmm0 = xmm0[0],xmm2[0],xmm0[1],xmm2[1],xmm0[2],xmm2[2],xmm0[3],xmm2[3]
-; CHECK0-NEXT:    unpcklps {{.*#+}} xmm0 = xmm0[0],xmm1[0],xmm0[1],xmm1[1]
-; CHECK0-NEXT:    retq
+; CHECK-LABEL: atomic_vec4_bfloat:
+; CHECK:       ## %bb.0:
+; CHECK-NEXT:    movsd {{.*#+}} xmm0 = mem[0],zero
+; CHECK-NEXT:    retq
   %ret = load atomic <4 x bfloat>, ptr %x acquire, align 8
   ret <4 x bfloat> %ret
 }

Copy link

github-actions bot commented Feb 2, 2025

✅ With the latest revision this PR passed the C/C++ code formatter.

@jofrn jofrn force-pushed the users/jofrn/spr/main/b83937a8 branch from fbe99a2 to 1c10a4c Compare February 2, 2025 20:43
@jofrn jofrn force-pushed the users/jofrn/spr/main/b83937a8 branch from 1c10a4c to d7d7c3b Compare March 3, 2025 23:26
@RKSimon RKSimon requested a review from arsenm March 4, 2025 10:51
bool SelectionDAG::areNonVolatileConsecutiveLoads(LoadSDNode *LD,
LoadSDNode *Base,
bool SelectionDAG::areNonVolatileConsecutiveLoads(MemSDNode *LD,
MemSDNode *Base,
unsigned Bytes,
int Dist) const {
if (LD->isVolatile() || Base->isVolatile())
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

assert/earlyout if either of LD/BASE aren't read-only?

Copy link
Contributor

@arsenm arsenm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think you should need to do this. I suspect this is a consequence of your vector legalization patch bypassing the legalization maps

@jofrn jofrn force-pushed the users/jofrn/spr/main/b83937a8 branch from d7d7c3b to ed8d4f8 Compare April 25, 2025 20:51
@jofrn jofrn force-pushed the users/jofrn/spr/main/3a045357 branch from 0d766dd to bf3f6b0 Compare April 25, 2025 20:51
@jofrn jofrn force-pushed the users/jofrn/spr/main/b83937a8 branch 2 times, most recently from de6a810 to b0d976b Compare April 26, 2025 07:57
@jofrn jofrn force-pushed the users/jofrn/spr/main/3a045357 branch from 7565845 to 1dd74be Compare April 26, 2025 07:57
@jofrn jofrn force-pushed the users/jofrn/spr/main/b83937a8 branch 3 times, most recently from 33a5ca1 to abb646b Compare May 1, 2025 06:45
@jofrn jofrn force-pushed the users/jofrn/spr/main/3a045357 branch from 2faa227 to 71d49aa Compare May 5, 2025 17:45
@jofrn jofrn force-pushed the users/jofrn/spr/main/b83937a8 branch 2 times, most recently from d108ebd to fc2debe Compare May 6, 2025 03:50
@jofrn jofrn force-pushed the users/jofrn/spr/main/3a045357 branch from 71d49aa to db5b862 Compare May 6, 2025 03:50
@jofrn jofrn force-pushed the users/jofrn/spr/main/b83937a8 branch from fc2debe to 6897d31 Compare May 6, 2025 04:15
@jofrn jofrn force-pushed the users/jofrn/spr/main/3a045357 branch from db5b862 to 28f6bf3 Compare May 6, 2025 04:15
@jofrn jofrn changed the base branch from main to users/jofrn/spr/main/3a045357 May 6, 2025 06:04
@jofrn jofrn force-pushed the users/jofrn/spr/main/3a045357 branch from 309d817 to cd4402a Compare May 6, 2025 15:04
@jofrn jofrn force-pushed the users/jofrn/spr/main/b83937a8 branch 2 times, most recently from 7a786f2 to 0890009 Compare May 7, 2025 12:53
@@ -5172,7 +5172,11 @@ void SelectionDAGBuilder::visitAtomicLoad(const LoadInst &I) {
L = DAG.getPtrExtOrTrunc(L, dl, VT);

setValue(&I, L);
DAG.setRoot(OutChain);

if (VT.isVector())
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we remove this, the tests associated will each get one extra MOV. combineVZEXT_LOAD method is made to remove this.

@jofrn jofrn force-pushed the users/jofrn/spr/main/3a045357 branch from 0e4399d to 4783f04 Compare May 8, 2025 01:53
@jofrn jofrn force-pushed the users/jofrn/spr/main/b83937a8 branch 2 times, most recently from c5cfc13 to b50f678 Compare May 8, 2025 09:13
Copy link
Contributor

@arsenm arsenm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Title seems to not match the implementation anymore

@@ -7230,6 +7234,20 @@ static bool findEltLoadSrc(SDValue Elt, LoadSDNode *&Ld, int64_t &ByteOffset) {
}
}
break;
case ISD::EXTRACT_ELEMENT:
if (auto *IdxC = dyn_cast<ConstantSDNode>(Elt.getOperand(1))) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pretty sure this must be a constant. But this could also be done separately, it's not related to the atomic

But also, can we avoid this by not using EXTRACT_ELEMENT in the first place? EXTRACT_ELEMENT has weird handling I've never understood where it's hardly used

Copy link
Contributor Author

@jofrn jofrn May 8, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pretty sure this must be a constant. But this could also be done separately, it's not related to the atomic

If it isn't a constant, shall we assert false or abort the transform? Right now it aborts the transform. If we know it always is and will be constant, I guess asserting is better. Separately as in another PR? There would be no associated test change as this is required for the optimization.

But also, can we avoid this by not using EXTRACT_ELEMENT in the first place? EXTRACT_ELEMENT has weird handling I've never understood where it's hardly used

Yes, if we use the ones already implemented here, then we will be able to discover the ByteOffset and it'll work. Why do we want to avoid EXTRACT_ELEMENT? It seems to work here.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If it isn't a constant, shall we assert false or abort the transform?

Ignore it as a possibility and just let it crash. No efforts should be made to support invalid constructs

Why do we want to avoid EXTRACT_ELEMENT? It seems to work here.

Of course it works, but seems poorly supported by optimizations. In particular the comment on it says "This is only for use before legalization, for values that will be broken into multiple registers.". This is use during legalization (dont' really know why we have this, or this fake restriction documented). You could get the same with just shifts.

@jofrn jofrn force-pushed the users/jofrn/spr/main/3a045357 branch from 4783f04 to 13a5e87 Compare May 8, 2025 23:38
@jofrn jofrn force-pushed the users/jofrn/spr/main/b83937a8 branch 2 times, most recently from 479a227 to 6f96cf8 Compare May 9, 2025 12:53
@jofrn jofrn force-pushed the users/jofrn/spr/main/3a045357 branch 2 times, most recently from 3e8de67 to b51658d Compare May 9, 2025 19:43
@jofrn jofrn force-pushed the users/jofrn/spr/main/b83937a8 branch from 6f96cf8 to 053d34b Compare May 9, 2025 19:43
@jofrn jofrn force-pushed the users/jofrn/spr/main/3a045357 branch from b51658d to 507069a Compare May 9, 2025 20:03
@jofrn jofrn force-pushed the users/jofrn/spr/main/b83937a8 branch 2 times, most recently from 3a1f677 to bf8fc80 Compare May 10, 2025 08:27
@jofrn jofrn force-pushed the users/jofrn/spr/main/3a045357 branch from 507069a to 40b0a4e Compare May 10, 2025 08:27
@jofrn jofrn force-pushed the users/jofrn/spr/main/b83937a8 branch from bf8fc80 to 9fe563b Compare May 11, 2025 00:05
@jofrn jofrn force-pushed the users/jofrn/spr/main/3a045357 branch from 40b0a4e to eda6b72 Compare May 11, 2025 00:05
llvm/lib/Target/X86/X86ISelLowering.cpp Outdated Show resolved Hide resolved
@jofrn jofrn force-pushed the users/jofrn/spr/main/3a045357 branch from eda6b72 to 4fccbd6 Compare May 12, 2025 05:34
@jofrn jofrn force-pushed the users/jofrn/spr/main/b83937a8 branch from 9fe563b to 684a542 Compare May 12, 2025 05:34
After splitting, all elements are created. The two components must
be found by looking at the upper and lower half of the value.
This change extends EltsFromConsecutiveLoads
to understand AtomicSDNode so that unused elements can be removed.

commit-id:b83937a8
@jofrn jofrn force-pushed the users/jofrn/spr/main/3a045357 branch from 4fccbd6 to f916347 Compare May 27, 2025 17:34
@jofrn jofrn force-pushed the users/jofrn/spr/main/b83937a8 branch from 684a542 to 7ba3e69 Compare May 27, 2025 17:34
@jofrn
Copy link
Contributor Author

jofrn commented Jun 1, 2025

Closing pull request: commit has gone away

@jofrn jofrn closed this Jun 1, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
backend:X86 llvm:SelectionDAG SelectionDAGISel as well
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants
Morty Proxy This is a proxified and sanitized view of the page, visit original site.