[SelectionDAG][X86] Remove unused elements from atomic vector. #125432
Conversation
@llvm/pr-subscribers-llvm-selectiondag @llvm/pr-subscribers-backend-x86

Author: None (jofrn)

Changes

After splitting, all elements are created. The two components must be found by looking at the upper and lower half of the value. This change extends EltsFromConsecutiveLoads to understand AtomicSDNode so that unused elements can be removed.

Stack:
Full diff: https://github.com/llvm/llvm-project/pull/125432.diff

6 Files Affected:
diff --git a/llvm/include/llvm/CodeGen/SelectionDAG.h b/llvm/include/llvm/CodeGen/SelectionDAG.h
index 461c0c1ead16d2..bea5958ec0bba6 100644
--- a/llvm/include/llvm/CodeGen/SelectionDAG.h
+++ b/llvm/include/llvm/CodeGen/SelectionDAG.h
@@ -1840,7 +1840,7 @@ class SelectionDAG {
/// chain to the token factor. This ensures that the new memory node will have
/// the same relative memory dependency position as the old load. Returns the
/// new merged load chain.
- SDValue makeEquivalentMemoryOrdering(LoadSDNode *OldLoad, SDValue NewMemOp);
+ SDValue makeEquivalentMemoryOrdering(MemSDNode *OldLoad, SDValue NewMemOp);
/// Topological-sort the AllNodes list and a
/// assign a unique node id for each node in the DAG based on their
@@ -2264,7 +2264,7 @@ class SelectionDAG {
/// merged. Check that both are nonvolatile and if LD is loading
/// 'Bytes' bytes from a location that is 'Dist' units away from the
/// location that the 'Base' load is loading from.
- bool areNonVolatileConsecutiveLoads(LoadSDNode *LD, LoadSDNode *Base,
+ bool areNonVolatileConsecutiveLoads(MemSDNode *LD, MemSDNode *Base,
unsigned Bytes, int Dist) const;
/// Infer alignment of a load / store address. Return std::nullopt if it
diff --git a/llvm/lib/CodeGen/SelectionDAG/SelectionDAG.cpp b/llvm/lib/CodeGen/SelectionDAG/SelectionDAG.cpp
index b416c0efbbc4fc..5f274fabfe8d64 100644
--- a/llvm/lib/CodeGen/SelectionDAG/SelectionDAG.cpp
+++ b/llvm/lib/CodeGen/SelectionDAG/SelectionDAG.cpp
@@ -12167,7 +12167,7 @@ SDValue SelectionDAG::makeEquivalentMemoryOrdering(SDValue OldChain,
return TokenFactor;
}
-SDValue SelectionDAG::makeEquivalentMemoryOrdering(LoadSDNode *OldLoad,
+SDValue SelectionDAG::makeEquivalentMemoryOrdering(MemSDNode *OldLoad,
SDValue NewMemOp) {
assert(isa<MemSDNode>(NewMemOp.getNode()) && "Expected a memop node");
SDValue OldChain = SDValue(OldLoad, 1);
@@ -12879,17 +12879,21 @@ std::pair<SDValue, SDValue> SelectionDAG::UnrollVectorOverflowOp(
getBuildVector(NewOvVT, dl, OvScalars));
}
-bool SelectionDAG::areNonVolatileConsecutiveLoads(LoadSDNode *LD,
- LoadSDNode *Base,
+bool SelectionDAG::areNonVolatileConsecutiveLoads(MemSDNode *LD,
+ MemSDNode *Base,
unsigned Bytes,
int Dist) const {
if (LD->isVolatile() || Base->isVolatile())
return false;
- // TODO: probably too restrictive for atomics, revisit
- if (!LD->isSimple())
- return false;
- if (LD->isIndexed() || Base->isIndexed())
- return false;
+ if (auto Ld = dyn_cast<LoadSDNode>(LD)) {
+ if (!Ld->isSimple())
+ return false;
+ if (Ld->isIndexed())
+ return false;
+ }
+ if (auto Ld = dyn_cast<LoadSDNode>(Base))
+ if (Ld->isIndexed())
+ return false;
if (LD->getChain() != Base->getChain())
return false;
EVT VT = LD->getMemoryVT();
diff --git a/llvm/lib/CodeGen/SelectionDAG/SelectionDAGAddressAnalysis.cpp b/llvm/lib/CodeGen/SelectionDAG/SelectionDAGAddressAnalysis.cpp
index f2ab88851b780e..a19af64a796229 100644
--- a/llvm/lib/CodeGen/SelectionDAG/SelectionDAGAddressAnalysis.cpp
+++ b/llvm/lib/CodeGen/SelectionDAG/SelectionDAGAddressAnalysis.cpp
@@ -194,8 +194,8 @@ bool BaseIndexOffset::contains(const SelectionDAG &DAG, int64_t BitSize,
return false;
}
-/// Parses tree in Ptr for base, index, offset addresses.
-static BaseIndexOffset matchLSNode(const LSBaseSDNode *N,
+template <typename T>
+static BaseIndexOffset matchSDNode(const T *N,
const SelectionDAG &DAG) {
SDValue Ptr = N->getBasePtr();
@@ -206,16 +206,18 @@ static BaseIndexOffset matchLSNode(const LSBaseSDNode *N,
bool IsIndexSignExt = false;
// pre-inc/pre-dec ops are components of EA.
- if (N->getAddressingMode() == ISD::PRE_INC) {
- if (auto *C = dyn_cast<ConstantSDNode>(N->getOffset()))
- Offset += C->getSExtValue();
- else // If unknown, give up now.
- return BaseIndexOffset(SDValue(), SDValue(), 0, false);
- } else if (N->getAddressingMode() == ISD::PRE_DEC) {
- if (auto *C = dyn_cast<ConstantSDNode>(N->getOffset()))
- Offset -= C->getSExtValue();
- else // If unknown, give up now.
- return BaseIndexOffset(SDValue(), SDValue(), 0, false);
+ if constexpr (std::is_same_v<T, LSBaseSDNode>) {
+ if (N->getAddressingMode() == ISD::PRE_INC) {
+ if (auto *C = dyn_cast<ConstantSDNode>(N->getOffset()))
+ Offset += C->getSExtValue();
+ else // If unknown, give up now.
+ return BaseIndexOffset(SDValue(), SDValue(), 0, false);
+ } else if (N->getAddressingMode() == ISD::PRE_DEC) {
+ if (auto *C = dyn_cast<ConstantSDNode>(N->getOffset()))
+ Offset -= C->getSExtValue();
+ else // If unknown, give up now.
+ return BaseIndexOffset(SDValue(), SDValue(), 0, false);
+ }
}
// Consume constant adds & ors with appropriate masking.
@@ -300,8 +302,10 @@ static BaseIndexOffset matchLSNode(const LSBaseSDNode *N,
BaseIndexOffset BaseIndexOffset::match(const SDNode *N,
const SelectionDAG &DAG) {
+ if (const auto *AN = dyn_cast<AtomicSDNode>(N))
+ return matchSDNode(AN, DAG);
if (const auto *LS0 = dyn_cast<LSBaseSDNode>(N))
- return matchLSNode(LS0, DAG);
+ return matchSDNode(LS0, DAG);
if (const auto *LN = dyn_cast<LifetimeSDNode>(N)) {
if (LN->hasOffset())
return BaseIndexOffset(LN->getOperand(1), SDValue(), LN->getOffset(),
diff --git a/llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp b/llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp
index 428e7a316d247b..7e784b2919c2a6 100644
--- a/llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp
+++ b/llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp
@@ -5218,7 +5218,11 @@ void SelectionDAGBuilder::visitAtomicLoad(const LoadInst &I) {
L = DAG.getPtrExtOrTrunc(L, dl, VT);
setValue(&I, L);
- DAG.setRoot(OutChain);
+
+ if (VT.isVector())
+ DAG.setRoot(InChain);
+ else
+ DAG.setRoot(OutChain);
}
void SelectionDAGBuilder::visitAtomicStore(const StoreInst &I) {
diff --git a/llvm/lib/Target/X86/X86ISelLowering.cpp b/llvm/lib/Target/X86/X86ISelLowering.cpp
index ba9ac2f21c7564..3b8f3dd1e9a5e9 100644
--- a/llvm/lib/Target/X86/X86ISelLowering.cpp
+++ b/llvm/lib/Target/X86/X86ISelLowering.cpp
@@ -7074,15 +7074,20 @@ static SDValue LowerAsSplatVectorLoad(SDValue SrcOp, MVT VT, const SDLoc &dl,
}
// Recurse to find a LoadSDNode source and the accumulated ByteOffest.
-static bool findEltLoadSrc(SDValue Elt, LoadSDNode *&Ld, int64_t &ByteOffset) {
- if (ISD::isNON_EXTLoad(Elt.getNode())) {
- auto *BaseLd = cast<LoadSDNode>(Elt);
- if (!BaseLd->isSimple())
- return false;
+static bool findEltLoadSrc(SDValue Elt, MemSDNode *&Ld, int64_t &ByteOffset) {
+ if (auto *BaseLd = dyn_cast<AtomicSDNode>(Elt)) {
Ld = BaseLd;
ByteOffset = 0;
return true;
}
+ else if (auto *BaseLd = dyn_cast<LoadSDNode>(Elt))
+ if (ISD::isNON_EXTLoad(Elt.getNode())) {
+ if (!BaseLd->isSimple())
+ return false;
+ Ld = BaseLd;
+ ByteOffset = 0;
+ return true;
+ }
switch (Elt.getOpcode()) {
case ISD::BITCAST:
@@ -7135,7 +7140,7 @@ static SDValue EltsFromConsecutiveLoads(EVT VT, ArrayRef<SDValue> Elts,
APInt ZeroMask = APInt::getZero(NumElems);
APInt UndefMask = APInt::getZero(NumElems);
- SmallVector<LoadSDNode*, 8> Loads(NumElems, nullptr);
+ SmallVector<MemSDNode*, 8> Loads(NumElems, nullptr);
SmallVector<int64_t, 8> ByteOffsets(NumElems, 0);
// For each element in the initializer, see if we've found a load, zero or an
@@ -7185,7 +7190,7 @@ static SDValue EltsFromConsecutiveLoads(EVT VT, ArrayRef<SDValue> Elts,
EVT EltBaseVT = EltBase.getValueType();
assert(EltBaseVT.getSizeInBits() == EltBaseVT.getStoreSizeInBits() &&
"Register/Memory size mismatch");
- LoadSDNode *LDBase = Loads[FirstLoadedElt];
+ MemSDNode *LDBase = Loads[FirstLoadedElt];
assert(LDBase && "Did not find base load for merging consecutive loads");
unsigned BaseSizeInBits = EltBaseVT.getStoreSizeInBits();
unsigned BaseSizeInBytes = BaseSizeInBits / 8;
@@ -7199,8 +7204,8 @@ static SDValue EltsFromConsecutiveLoads(EVT VT, ArrayRef<SDValue> Elts,
// Check to see if the element's load is consecutive to the base load
// or offset from a previous (already checked) load.
- auto CheckConsecutiveLoad = [&](LoadSDNode *Base, int EltIdx) {
- LoadSDNode *Ld = Loads[EltIdx];
+ auto CheckConsecutiveLoad = [&](MemSDNode *Base, int EltIdx) {
+ MemSDNode *Ld = Loads[EltIdx];
int64_t ByteOffset = ByteOffsets[EltIdx];
if (ByteOffset && (ByteOffset % BaseSizeInBytes) == 0) {
int64_t BaseIdx = EltIdx - (ByteOffset / BaseSizeInBytes);
@@ -7228,7 +7233,7 @@ static SDValue EltsFromConsecutiveLoads(EVT VT, ArrayRef<SDValue> Elts,
}
}
- auto CreateLoad = [&DAG, &DL, &Loads](EVT VT, LoadSDNode *LDBase) {
+ auto CreateLoad = [&DAG, &DL, &Loads](EVT VT, MemSDNode *LDBase) {
auto MMOFlags = LDBase->getMemOperand()->getFlags();
assert(LDBase->isSimple() &&
"Cannot merge volatile or atomic loads.");
@@ -9271,8 +9276,9 @@ X86TargetLowering::LowerBUILD_VECTOR(SDValue Op, SelectionDAG &DAG) const {
{
SmallVector<SDValue, 64> Ops(Op->op_begin(), Op->op_begin() + NumElems);
if (SDValue LD =
- EltsFromConsecutiveLoads(VT, Ops, dl, DAG, Subtarget, false))
+ EltsFromConsecutiveLoads(VT, Ops, dl, DAG, Subtarget, false)) {
return LD;
+ }
}
// If this is a splat of pairs of 32-bit elements, we can use a narrower
diff --git a/llvm/test/CodeGen/X86/atomic-load-store.ll b/llvm/test/CodeGen/X86/atomic-load-store.ll
index 42b09558242934..08d0405345f573 100644
--- a/llvm/test/CodeGen/X86/atomic-load-store.ll
+++ b/llvm/test/CodeGen/X86/atomic-load-store.ll
@@ -205,63 +205,19 @@ define <2 x float> @atomic_vec2_float_align(ptr %x) {
}
define <2 x half> @atomic_vec2_half(ptr %x) {
-; CHECK3-LABEL: atomic_vec2_half:
-; CHECK3: ## %bb.0:
-; CHECK3-NEXT: movl (%rdi), %eax
-; CHECK3-NEXT: pinsrw $0, %eax, %xmm0
-; CHECK3-NEXT: shrl $16, %eax
-; CHECK3-NEXT: pinsrw $0, %eax, %xmm1
-; CHECK3-NEXT: punpcklwd {{.*#+}} xmm0 = xmm0[0],xmm1[0],xmm0[1],xmm1[1],xmm0[2],xmm1[2],xmm0[3],xmm1[3]
-; CHECK3-NEXT: retq
-;
-; CHECK0-LABEL: atomic_vec2_half:
-; CHECK0: ## %bb.0:
-; CHECK0-NEXT: movl (%rdi), %eax
-; CHECK0-NEXT: movl %eax, %ecx
-; CHECK0-NEXT: shrl $16, %ecx
-; CHECK0-NEXT: movw %cx, %dx
-; CHECK0-NEXT: ## implicit-def: $ecx
-; CHECK0-NEXT: movw %dx, %cx
-; CHECK0-NEXT: ## implicit-def: $xmm1
-; CHECK0-NEXT: pinsrw $0, %ecx, %xmm1
-; CHECK0-NEXT: movw %ax, %cx
-; CHECK0-NEXT: ## implicit-def: $eax
-; CHECK0-NEXT: movw %cx, %ax
-; CHECK0-NEXT: ## implicit-def: $xmm0
-; CHECK0-NEXT: pinsrw $0, %eax, %xmm0
-; CHECK0-NEXT: punpcklwd {{.*#+}} xmm0 = xmm0[0],xmm1[0],xmm0[1],xmm1[1],xmm0[2],xmm1[2],xmm0[3],xmm1[3]
-; CHECK0-NEXT: retq
+; CHECK-LABEL: atomic_vec2_half:
+; CHECK: ## %bb.0:
+; CHECK-NEXT: movss {{.*#+}} xmm0 = mem[0],zero,zero,zero
+; CHECK-NEXT: retq
%ret = load atomic <2 x half>, ptr %x acquire, align 4
ret <2 x half> %ret
}
define <2 x bfloat> @atomic_vec2_bfloat(ptr %x) {
-; CHECK3-LABEL: atomic_vec2_bfloat:
-; CHECK3: ## %bb.0:
-; CHECK3-NEXT: movl (%rdi), %eax
-; CHECK3-NEXT: pinsrw $0, %eax, %xmm0
-; CHECK3-NEXT: shrl $16, %eax
-; CHECK3-NEXT: pinsrw $0, %eax, %xmm1
-; CHECK3-NEXT: punpcklwd {{.*#+}} xmm0 = xmm0[0],xmm1[0],xmm0[1],xmm1[1],xmm0[2],xmm1[2],xmm0[3],xmm1[3]
-; CHECK3-NEXT: retq
-;
-; CHECK0-LABEL: atomic_vec2_bfloat:
-; CHECK0: ## %bb.0:
-; CHECK0-NEXT: movl (%rdi), %eax
-; CHECK0-NEXT: movl %eax, %ecx
-; CHECK0-NEXT: shrl $16, %ecx
-; CHECK0-NEXT: ## kill: def $cx killed $cx killed $ecx
-; CHECK0-NEXT: movw %ax, %dx
-; CHECK0-NEXT: ## implicit-def: $eax
-; CHECK0-NEXT: movw %dx, %ax
-; CHECK0-NEXT: ## implicit-def: $xmm0
-; CHECK0-NEXT: pinsrw $0, %eax, %xmm0
-; CHECK0-NEXT: ## implicit-def: $eax
-; CHECK0-NEXT: movw %cx, %ax
-; CHECK0-NEXT: ## implicit-def: $xmm1
-; CHECK0-NEXT: pinsrw $0, %eax, %xmm1
-; CHECK0-NEXT: punpcklwd {{.*#+}} xmm0 = xmm0[0],xmm1[0],xmm0[1],xmm1[1],xmm0[2],xmm1[2],xmm0[3],xmm1[3]
-; CHECK0-NEXT: retq
+; CHECK-LABEL: atomic_vec2_bfloat:
+; CHECK: ## %bb.0:
+; CHECK-NEXT: movss {{.*#+}} xmm0 = mem[0],zero,zero,zero
+; CHECK-NEXT: retq
%ret = load atomic <2 x bfloat>, ptr %x acquire, align 4
ret <2 x bfloat> %ret
}
@@ -439,110 +395,19 @@ define <4 x i16> @atomic_vec4_i16(ptr %x) nounwind {
}
define <4 x half> @atomic_vec4_half(ptr %x) nounwind {
-; CHECK3-LABEL: atomic_vec4_half:
-; CHECK3: ## %bb.0:
-; CHECK3-NEXT: movq (%rdi), %rax
-; CHECK3-NEXT: movl %eax, %ecx
-; CHECK3-NEXT: shrl $16, %ecx
-; CHECK3-NEXT: pinsrw $0, %ecx, %xmm1
-; CHECK3-NEXT: pinsrw $0, %eax, %xmm0
-; CHECK3-NEXT: movq %rax, %rcx
-; CHECK3-NEXT: shrq $32, %rcx
-; CHECK3-NEXT: pinsrw $0, %ecx, %xmm2
-; CHECK3-NEXT: shrq $48, %rax
-; CHECK3-NEXT: pinsrw $0, %eax, %xmm3
-; CHECK3-NEXT: punpcklwd {{.*#+}} xmm2 = xmm2[0],xmm3[0],xmm2[1],xmm3[1],xmm2[2],xmm3[2],xmm2[3],xmm3[3]
-; CHECK3-NEXT: punpcklwd {{.*#+}} xmm0 = xmm0[0],xmm1[0],xmm0[1],xmm1[1],xmm0[2],xmm1[2],xmm0[3],xmm1[3]
-; CHECK3-NEXT: punpckldq {{.*#+}} xmm0 = xmm0[0],xmm2[0],xmm0[1],xmm2[1]
-; CHECK3-NEXT: retq
-;
-; CHECK0-LABEL: atomic_vec4_half:
-; CHECK0: ## %bb.0:
-; CHECK0-NEXT: movq (%rdi), %rax
-; CHECK0-NEXT: movl %eax, %ecx
-; CHECK0-NEXT: shrl $16, %ecx
-; CHECK0-NEXT: movw %cx, %dx
-; CHECK0-NEXT: ## implicit-def: $ecx
-; CHECK0-NEXT: movw %dx, %cx
-; CHECK0-NEXT: ## implicit-def: $xmm2
-; CHECK0-NEXT: pinsrw $0, %ecx, %xmm2
-; CHECK0-NEXT: movw %ax, %dx
-; CHECK0-NEXT: ## implicit-def: $ecx
-; CHECK0-NEXT: movw %dx, %cx
-; CHECK0-NEXT: ## implicit-def: $xmm0
-; CHECK0-NEXT: pinsrw $0, %ecx, %xmm0
-; CHECK0-NEXT: movq %rax, %rcx
-; CHECK0-NEXT: shrq $32, %rcx
-; CHECK0-NEXT: movw %cx, %dx
-; CHECK0-NEXT: ## implicit-def: $ecx
-; CHECK0-NEXT: movw %dx, %cx
-; CHECK0-NEXT: ## implicit-def: $xmm1
-; CHECK0-NEXT: pinsrw $0, %ecx, %xmm1
-; CHECK0-NEXT: shrq $48, %rax
-; CHECK0-NEXT: movw %ax, %cx
-; CHECK0-NEXT: ## implicit-def: $eax
-; CHECK0-NEXT: movw %cx, %ax
-; CHECK0-NEXT: ## implicit-def: $xmm3
-; CHECK0-NEXT: pinsrw $0, %eax, %xmm3
-; CHECK0-NEXT: punpcklwd {{.*#+}} xmm1 = xmm1[0],xmm3[0],xmm1[1],xmm3[1],xmm1[2],xmm3[2],xmm1[3],xmm3[3]
-; CHECK0-NEXT: punpcklwd {{.*#+}} xmm0 = xmm0[0],xmm2[0],xmm0[1],xmm2[1],xmm0[2],xmm2[2],xmm0[3],xmm2[3]
-; CHECK0-NEXT: unpcklps {{.*#+}} xmm0 = xmm0[0],xmm1[0],xmm0[1],xmm1[1]
-; CHECK0-NEXT: retq
+; CHECK-LABEL: atomic_vec4_half:
+; CHECK: ## %bb.0:
+; CHECK-NEXT: movsd {{.*#+}} xmm0 = mem[0],zero
+; CHECK-NEXT: retq
%ret = load atomic <4 x half>, ptr %x acquire, align 8
ret <4 x half> %ret
}
define <4 x bfloat> @atomic_vec4_bfloat(ptr %x) nounwind {
-; CHECK3-LABEL: atomic_vec4_bfloat:
-; CHECK3: ## %bb.0:
-; CHECK3-NEXT: movq (%rdi), %rax
-; CHECK3-NEXT: movq %rax, %rcx
-; CHECK3-NEXT: movq %rax, %rdx
-; CHECK3-NEXT: pinsrw $0, %eax, %xmm0
-; CHECK3-NEXT: ## kill: def $eax killed $eax killed $rax
-; CHECK3-NEXT: shrl $16, %eax
-; CHECK3-NEXT: shrq $32, %rcx
-; CHECK3-NEXT: shrq $48, %rdx
-; CHECK3-NEXT: pinsrw $0, %edx, %xmm1
-; CHECK3-NEXT: pinsrw $0, %ecx, %xmm2
-; CHECK3-NEXT: punpcklwd {{.*#+}} xmm2 = xmm2[0],xmm1[0],xmm2[1],xmm1[1],xmm2[2],xmm1[2],xmm2[3],xmm1[3]
-; CHECK3-NEXT: pinsrw $0, %eax, %xmm1
-; CHECK3-NEXT: punpcklwd {{.*#+}} xmm0 = xmm0[0],xmm1[0],xmm0[1],xmm1[1],xmm0[2],xmm1[2],xmm0[3],xmm1[3]
-; CHECK3-NEXT: punpckldq {{.*#+}} xmm0 = xmm0[0],xmm2[0],xmm0[1],xmm2[1]
-; CHECK3-NEXT: retq
-;
-; CHECK0-LABEL: atomic_vec4_bfloat:
-; CHECK0: ## %bb.0:
-; CHECK0-NEXT: movq (%rdi), %rax
-; CHECK0-NEXT: movl %eax, %ecx
-; CHECK0-NEXT: shrl $16, %ecx
-; CHECK0-NEXT: ## kill: def $cx killed $cx killed $ecx
-; CHECK0-NEXT: movw %ax, %dx
-; CHECK0-NEXT: movq %rax, %rsi
-; CHECK0-NEXT: shrq $32, %rsi
-; CHECK0-NEXT: ## kill: def $si killed $si killed $rsi
-; CHECK0-NEXT: shrq $48, %rax
-; CHECK0-NEXT: movw %ax, %di
-; CHECK0-NEXT: ## implicit-def: $eax
-; CHECK0-NEXT: movw %di, %ax
-; CHECK0-NEXT: ## implicit-def: $xmm0
-; CHECK0-NEXT: pinsrw $0, %eax, %xmm0
-; CHECK0-NEXT: ## implicit-def: $eax
-; CHECK0-NEXT: movw %si, %ax
-; CHECK0-NEXT: ## implicit-def: $xmm1
-; CHECK0-NEXT: pinsrw $0, %eax, %xmm1
-; CHECK0-NEXT: punpcklwd {{.*#+}} xmm1 = xmm1[0],xmm0[0],xmm1[1],xmm0[1],xmm1[2],xmm0[2],xmm1[3],xmm0[3]
-; CHECK0-NEXT: ## implicit-def: $eax
-; CHECK0-NEXT: movw %dx, %ax
-; CHECK0-NEXT: ## implicit-def: $xmm0
-; CHECK0-NEXT: pinsrw $0, %eax, %xmm0
-; CHECK0-NEXT: ## implicit-def: $eax
-; CHECK0-NEXT: movw %cx, %ax
-; CHECK0-NEXT: ## implicit-def: $xmm2
-; CHECK0-NEXT: pinsrw $0, %eax, %xmm2
-; CHECK0-NEXT: punpcklwd {{.*#+}} xmm0 = xmm0[0],xmm2[0],xmm0[1],xmm2[1],xmm0[2],xmm2[2],xmm0[3],xmm2[3]
-; CHECK0-NEXT: unpcklps {{.*#+}} xmm0 = xmm0[0],xmm1[0],xmm0[1],xmm1[1]
-; CHECK0-NEXT: retq
+; CHECK-LABEL: atomic_vec4_bfloat:
+; CHECK: ## %bb.0:
+; CHECK-NEXT: movsd {{.*#+}} xmm0 = mem[0],zero
+; CHECK-NEXT: retq
%ret = load atomic <4 x bfloat>, ptr %x acquire, align 8
ret <4 x bfloat> %ret
}
✅ With the latest revision this PR passed the C/C++ code formatter.
-bool SelectionDAG::areNonVolatileConsecutiveLoads(LoadSDNode *LD,
-                                                  LoadSDNode *Base,
+bool SelectionDAG::areNonVolatileConsecutiveLoads(MemSDNode *LD,
+                                                  MemSDNode *Base,
                                                   unsigned Bytes,
                                                   int Dist) const {
   if (LD->isVolatile() || Base->isVolatile())
assert/earlyout if either of LD/BASE aren't read-only?
I don't think you should need to do this. I suspect this is a consequence of your vector legalization patch bypassing the legalization maps
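A minimal sketch of the early-out asked about here, not part of the patch: it assumes MemSDNode's readMem()/writeMem() accessors and reuses the function's LD/Base parameters, rejecting any memory op that is not a plain read (e.g. an atomic read-modify-write).

// Hypothetical early-out for non-read-only memops, per the review question.
// writeMem() is true when the underlying MachineMemOperand can store.
if (!LD->readMem() || LD->writeMem() || !Base->readMem() || Base->writeMem())
  return false;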
@@ -5172,7 +5172,11 @@ void SelectionDAGBuilder::visitAtomicLoad(const LoadInst &I) {
   L = DAG.getPtrExtOrTrunc(L, dl, VT);
 
   setValue(&I, L);
-  DAG.setRoot(OutChain);
+
+  if (VT.isVector())
If we remove this, the associated tests will each get one extra MOV; the combineVZEXT_LOAD method is meant to remove it.
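For context, a restatement of the hunk under discussion with the thread's rationale folded in as comments; the comments are interpretive, not from the patch.

// For vector atomic loads, root the DAG on the incoming chain. Per the
// comment above, rooting on OutChain instead costs one extra MOV in each
// associated test, which combineVZEXT_LOAD is meant to eliminate.
if (VT.isVector())
  DAG.setRoot(InChain);
else
  DAG.setRoot(OutChain);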
Title seems to not match the implementation anymore
@@ -7230,6 +7234,20 @@ static bool findEltLoadSrc(SDValue Elt, LoadSDNode *&Ld, int64_t &ByteOffset) {
       }
     }
     break;
+  case ISD::EXTRACT_ELEMENT:
+    if (auto *IdxC = dyn_cast<ConstantSDNode>(Elt.getOperand(1))) {
Pretty sure this must be a constant. But this could also be done separately, it's not related to the atomic
But also, can we avoid this by not using EXTRACT_ELEMENT in the first place? EXTRACT_ELEMENT has weird handling I've never understood where it's hardly used
> Pretty sure this must be a constant. But this could also be done separately, it's not related to the atomic

If it isn't a constant, shall we assert false or abort the transform? Right now it aborts the transform. If we know it always is and will be constant, I guess asserting is better. Separately as in another PR? There would be no associated test change, as this is required for the optimization.

> But also, can we avoid this by not using EXTRACT_ELEMENT in the first place? EXTRACT_ELEMENT has weird handling I've never understood where it's hardly used

Yes, if we use the ones already implemented here, then we will be able to discover the ByteOffset and it'll work. Why do we want to avoid EXTRACT_ELEMENT? It seems to work here.
> If it isn't a constant, shall we assert false or abort the transform?

Ignore it as a possibility and just let it crash. No efforts should be made to support invalid constructs.

> Why do we want to avoid EXTRACT_ELEMENT? It seems to work here.

Of course it works, but it seems poorly supported by optimizations. In particular, the comment on it says "This is only for use before legalization, for values that will be broken into multiple registers." Yet this is used during legalization (I don't really know why we have this, or why this fake restriction is documented). You could get the same with just shifts.
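A sketch of that shift-based alternative, assuming the wide scalar value is already in hand; WideVal, WideVT, HalfVT, and HalfBits are illustrative names, not from the patch.

// Split a wide scalar into halves with SRL + TRUNCATE rather than
// ISD::EXTRACT_ELEMENT. The low half is a plain truncate; the high half
// shifts the upper bits down first.
SDValue Lo = DAG.getNode(ISD::TRUNCATE, DL, HalfVT, WideVal);
SDValue Hi = DAG.getNode(ISD::SRL, DL, WideVT, WideVal,
                         DAG.getShiftAmountConstant(HalfBits, WideVT, DL));
Hi = DAG.getNode(ISD::TRUNCATE, DL, HalfVT, Hi);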
Closing pull request: commit has gone away