-
Notifications
You must be signed in to change notification settings - Fork 13.6k
[LV] Initial support for stores in early exit loops #137774
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Changes from all commits
620f879
d1e6062
5ea9208
de34099
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -407,6 +407,13 @@ class LoopVectorizationLegality { | |
return hasUncountableEarlyExit() ? getUncountableEdge()->second : nullptr; | ||
} | ||
|
||
/// Returns true if this is an early exit loop containing a store. | ||
bool isConditionCopyRequired() const { return EarlyExitLoad.has_value(); } | ||
|
||
/// Returns the load instruction, if any, nearest to an uncountable early | ||
/// exit. | ||
std::optional<LoadInst *> getEarlyExitLoad() const { return EarlyExitLoad; } | ||
|
||
/// Return true if there is store-load forwarding dependencies. | ||
bool isSafeForAnyStoreLoadForwardDistances() const { | ||
return LAI->getDepChecker().isSafeForAnyStoreLoadForwardDistances(); | ||
|
@@ -536,6 +543,12 @@ class LoopVectorizationLegality { | |
/// additional cases safely. | ||
bool isVectorizableEarlyExitLoop(); | ||
|
||
/// Clears any current early exit data gathered if a check failed. | ||
void clearEarlyExitData() { | ||
UncountableEdge = std::nullopt; | ||
EarlyExitLoad = std::nullopt; | ||
} | ||
|
||
/// Return true if all of the instructions in the block can be speculatively | ||
/// executed, and record the loads/stores that require masking. | ||
/// \p SafePtrs is a list of addresses that are known to be legal and we know | ||
|
@@ -654,6 +667,15 @@ class LoopVectorizationLegality { | |
/// Keep track of the loop edge to an uncountable exit, comprising a pair | ||
/// of (Exiting, Exit) blocks, if there is exactly one early exit. | ||
std::optional<std::pair<BasicBlock *, BasicBlock *>> UncountableEdge; | ||
|
||
/// The load used to determine an uncountable early-exit condition. This is | ||
/// only used to allow further analysis in canVectorizeMemory if we found | ||
/// what looks like a valid early exit loop with store beforehand. | ||
/// | ||
/// Also indicates that we will need to copy the early exit condition into | ||
/// the vector preheader, as we will need to mask some operations in | ||
/// the loop (e.g. stores) or bail out to a scalar loop. | ||
std::optional<LoadInst *> EarlyExitLoad; | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. What happens in the case of two loads?, i.e. %ld1 = load i8, ... There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. This would fail the check that the second operand to the compare is loop invariant. There's a backup check in the vplan transform that would reject it at that point too. In future, it would be nice to increase the number of cases handled, but I've tried to keep it simple for now and just come up with a list of extra cases to handle later. Some of the possible future work:
|
||
}; | ||
|
||
} // namespace llvm | ||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -17,6 +17,7 @@ | |
#include "llvm/Transforms/Vectorize/LoopVectorizationLegality.h" | ||
#include "llvm/Analysis/Loads.h" | ||
#include "llvm/Analysis/LoopInfo.h" | ||
#include "llvm/Analysis/MustExecute.h" | ||
#include "llvm/Analysis/OptimizationRemarkEmitter.h" | ||
#include "llvm/Analysis/ScalarEvolutionExpressions.h" | ||
#include "llvm/Analysis/TargetLibraryInfo.h" | ||
|
@@ -1209,7 +1210,33 @@ bool LoopVectorizationLegality::canVectorizeMemory() { | |
}); | ||
} | ||
|
||
if (!LAI->canVectorizeMemory()) | ||
if (LAI->canVectorizeMemory()) { | ||
// FIXME: Remove or reduce this restriction. We're in a bit of an odd spot | ||
// since we're (potentially) doing the load out of its normal order | ||
// in the loop and that may throw off dependency checking. | ||
// A forward dependency should be fine, but a backwards dep may not | ||
// be even if LAA thinks it is due to performing the load for the | ||
// vector iteration i+1 in vector iteration i. | ||
if (isConditionCopyRequired()) { | ||
const MemoryDepChecker &DepChecker = LAI->getDepChecker(); | ||
const auto *Deps = DepChecker.getDependences(); | ||
|
||
for (const MemoryDepChecker::Dependence &Dep : *Deps) { | ||
if (Dep.getDestination(DepChecker) == EarlyExitLoad || | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Do we have to worry about other loads too? The header block could have several loads, with only one of them contributing to the branch condition. |
||
Dep.getSource(DepChecker) == EarlyExitLoad) { | ||
// Refine language a little? This currently only applies when a store | ||
// is present in the early exit loop. | ||
reportVectorizationFailure( | ||
"No dependencies allowed for early exit condition load", | ||
"Early exit condition loads may not have a dependence with " | ||
"another" | ||
" memory operation.", | ||
"CantVectorizeStoreToLoopInvariantAddress", ORE, TheLoop); | ||
return false; | ||
} | ||
} | ||
} | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. What happens in the else case? Looks like we fall through to:
and so we may still treat the loop as legal. Worth adding a test for this case I think. |
||
} else if (!isConditionCopyRequired()) | ||
return canVectorizeIndirectUnsafeDependences(); | ||
|
||
if (LAI->hasLoadStoreDependenceInvolvingLoopInvariantAddress()) { | ||
|
@@ -1708,16 +1735,31 @@ bool LoopVectorizationLegality::isVectorizableEarlyExitLoop() { | |
} | ||
}; | ||
|
||
bool HasStore = false; | ||
for (auto *BB : TheLoop->blocks()) | ||
for (auto &I : *BB) { | ||
if (StoreInst *SI = dyn_cast<StoreInst>(&I)) { | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Does it matter if the store is prior to or after the early exit? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. No. Since this prototype bails out to a scalar loop if the next vector iteration would exit partway through, it doesn't matter where the store is in the loop. If/when we do tail-folding for these loops, we will need to generate different masks for state-changing operations before or after an exit. |
||
HasStore = true; | ||
if (SI->isSimple()) | ||
continue; | ||
|
||
reportVectorizationFailure( | ||
"Complex writes to memory unsupported in early exit loops", | ||
"Cannot vectorize early exit loop with complex writes to memory", | ||
"WritesInEarlyExitLoop", ORE, TheLoop); | ||
return false; | ||
} | ||
|
||
if (I.mayWriteToMemory()) { | ||
// We don't support writes to memory. | ||
reportVectorizationFailure( | ||
"Writes to memory unsupported in early exit loops", | ||
"Cannot vectorize early exit loop with writes to memory", | ||
"Complex writes to memory unsupported in early exit loops", | ||
"Cannot vectorize early exit loop with complex writes to memory", | ||
"WritesInEarlyExitLoop", ORE, TheLoop); | ||
return false; | ||
} else if (!IsSafeOperation(&I)) { | ||
} | ||
|
||
if (!IsSafeOperation(&I)) { | ||
reportVectorizationFailure("Early exit loop contains operations that " | ||
"cannot be speculatively executed", | ||
"UnsafeOperationsEarlyExitLoop", ORE, | ||
|
@@ -1732,13 +1774,73 @@ bool LoopVectorizationLegality::isVectorizableEarlyExitLoop() { | |
|
||
// TODO: Handle loops that may fault. | ||
Predicates.clear(); | ||
if (!isDereferenceableReadOnlyLoop(TheLoop, PSE.getSE(), DT, AC, | ||
&Predicates)) { | ||
reportVectorizationFailure( | ||
"Loop may fault", | ||
"Cannot vectorize potentially faulting early exit loop", | ||
"PotentiallyFaultingEarlyExitLoop", ORE, TheLoop); | ||
return false; | ||
|
||
std::optional<LoadInst *> EELoad; | ||
if (HasStore) { | ||
// Record load for analysis by isDereferenceableAndAlignedInLoop | ||
// and later by dependence analysis. | ||
if (BranchInst *Br = dyn_cast<BranchInst>( | ||
SingleUncountableEdge->first->getTerminator())) { | ||
// FIXME: Handle exit conditions with multiple users, more complex exit | ||
// conditions than br(icmp(load, loop_inv)). | ||
ICmpInst *Cmp = dyn_cast<ICmpInst>(Br->getCondition()); | ||
if (Cmp && Cmp->hasOneUse() && | ||
TheLoop->isLoopInvariant(Cmp->getOperand(1))) { | ||
LoadInst *Load = dyn_cast<LoadInst>(Cmp->getOperand(0)); | ||
if (Load && Load->hasOneUse() && !TheLoop->isLoopInvariant(Load)) { | ||
if (isDereferenceableAndAlignedInLoop(Load, TheLoop, *PSE.getSE(), | ||
*DT, AC, &Predicates)) { | ||
ICFLoopSafetyInfo SafetyInfo; | ||
SafetyInfo.computeLoopSafetyInfo(TheLoop); | ||
// FIXME: We may have multiple levels of conditional loads, so will | ||
// need to improve on outright rejection at some point. | ||
if (SafetyInfo.isGuaranteedToExecute(*Load, DT, TheLoop)) { | ||
EELoad = Load; | ||
} else { | ||
LLVM_DEBUG( | ||
dbgs() | ||
<< "Early exit condition load not guaranteed to execute.\n"); | ||
reportVectorizationFailure( | ||
"Early exit condition load not guaranteed to execute", | ||
"Cannot vectorize early exit loop when condition load is not " | ||
"guaranteed to execute", | ||
"EarlyExitLoadNotGuaranteed", ORE, TheLoop); | ||
} | ||
} else { | ||
LLVM_DEBUG(dbgs() | ||
<< "Early exit condition load potentially unsafe.\n"); | ||
reportVectorizationFailure( | ||
"Uncounted loop condition not known safe", | ||
"Cannot vectorize early exit loop with " | ||
"possibly unsafe condition load", | ||
"PotentiallyFaultingEarlyExitLoop", ORE, TheLoop); | ||
return false; | ||
} | ||
} | ||
} | ||
} | ||
|
||
if (!EELoad.has_value()) { | ||
LLVM_DEBUG(dbgs() << "Found early exit store but no condition load.\n"); | ||
reportVectorizationFailure( | ||
"Early exit loop with store but no condition load", | ||
"Cannot vectorize early exit loop with store but no condition load", | ||
"NoConditionLoadForEarlyExitLoop", ORE, TheLoop); | ||
return false; | ||
} | ||
} else { | ||
// Read-only loop. | ||
// FIXME: as with the loops with stores, only the loads contributing to | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. At least prior to this patch, this statement is untrue because the way the vectorised loop is constructed in vplan the latch block is always executed regardless of whether we would exit early in the scalar loop, i.e.
will get vectorised into a single block:
Are you saying that with this patch we now create a masked load for There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. No, that's why it's a FIXME. Future work :) |
||
// the loop condition need to be guaranteed dereferenceable and | ||
// aligned. | ||
if (!isDereferenceableReadOnlyLoop(TheLoop, PSE.getSE(), DT, AC, | ||
&Predicates)) { | ||
reportVectorizationFailure( | ||
"Loop may fault", | ||
"Cannot vectorize potentially faulting early exit loop", | ||
"PotentiallyFaultingEarlyExitLoop", ORE, TheLoop); | ||
return false; | ||
} | ||
} | ||
|
||
[[maybe_unused]] const SCEV *SymbolicMaxBTC = | ||
|
@@ -1751,6 +1853,9 @@ bool LoopVectorizationLegality::isVectorizableEarlyExitLoop() { | |
"backedge taken count: " | ||
<< *SymbolicMaxBTC << '\n'); | ||
UncountableEdge = SingleUncountableEdge; | ||
if (HasStore) | ||
EarlyExitLoad = EELoad; | ||
|
||
return true; | ||
} | ||
|
||
|
@@ -1822,7 +1927,7 @@ bool LoopVectorizationLegality::canVectorize(bool UseVPlanNativePath) { | |
return false; | ||
} else { | ||
if (!isVectorizableEarlyExitLoop()) { | ||
UncountableEdge = std::nullopt; | ||
clearEarlyExitData(); | ||
if (DoExtraAnalysis) | ||
Result = false; | ||
else | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does
nearest
refer to nearest load prior to the early exit, after, or either?There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The load closest in the IR graph to the exit (ideally, the load whose result is used directly in a comparison). I didn't spend much time on the naming ;)