Skip to content

Navigation Menu

Sign in
Appearance settings

Search code, repositories, users, issues, pull requests...

Provide feedback

We read every piece of feedback, and take your input very seriously.

Saved searches

Use saved searches to filter your results more quickly

Appearance settings

[LV] Initial support for stores in early exit loops #137774

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Draft
wants to merge 4 commits into
base: main
Choose a base branch
Loading
from
Draft
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
22 changes: 22 additions & 0 deletions 22 llvm/include/llvm/Transforms/Vectorize/LoopVectorizationLegality.h
Original file line number Diff line number Diff line change
Expand Up @@ -407,6 +407,13 @@ class LoopVectorizationLegality {
return hasUncountableEarlyExit() ? getUncountableEdge()->second : nullptr;
}

/// Returns true if this is an early exit loop containing a store.
bool isConditionCopyRequired() const { return EarlyExitLoad.has_value(); }

/// Returns the load instruction, if any, nearest to an uncountable early
/// exit.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does nearest refer to nearest load prior to the early exit, after, or either?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The load closest in the IR graph to the exit (ideally, the load whose result is used directly in a comparison). I didn't spend much time on the naming ;)

std::optional<LoadInst *> getEarlyExitLoad() const { return EarlyExitLoad; }

/// Return true if there is store-load forwarding dependencies.
bool isSafeForAnyStoreLoadForwardDistances() const {
return LAI->getDepChecker().isSafeForAnyStoreLoadForwardDistances();
Expand Down Expand Up @@ -536,6 +543,12 @@ class LoopVectorizationLegality {
/// additional cases safely.
bool isVectorizableEarlyExitLoop();

/// Clears any current early exit data gathered if a check failed.
void clearEarlyExitData() {
UncountableEdge = std::nullopt;
EarlyExitLoad = std::nullopt;
}

/// Return true if all of the instructions in the block can be speculatively
/// executed, and record the loads/stores that require masking.
/// \p SafePtrs is a list of addresses that are known to be legal and we know
Expand Down Expand Up @@ -654,6 +667,15 @@ class LoopVectorizationLegality {
/// Keep track of the loop edge to an uncountable exit, comprising a pair
/// of (Exiting, Exit) blocks, if there is exactly one early exit.
std::optional<std::pair<BasicBlock *, BasicBlock *>> UncountableEdge;

/// The load used to determine an uncountable early-exit condition. This is
/// only used to allow further analysis in canVectorizeMemory if we found
/// what looks like a valid early exit loop with store beforehand.
///
/// Also indicates that we will need to copy the early exit condition into
/// the vector preheader, as we will need to mask some operations in
/// the loop (e.g. stores) or bail out to a scalar loop.
std::optional<LoadInst *> EarlyExitLoad;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What happens in the case of two loads?, i.e.

%ld1 = load i8, ...
%ld2 = load i8, ...
%cmp = icmp eq i8 %ld1, %ld2
br i1 %cmp, label %early.exit, %loop.inc

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This would fail the check that the second operand to the compare is loop invariant. There's a backup check in the vplan transform that would reject it at that point too.

In future, it would be nice to increase the number of cases handled, but I've tried to keep it simple for now and just come up with a list of extra cases to handle later.

Some of the possible future work:

  1. Supporting a combined condition; e.g. no escaping value, but two exit conditions or'd together. Needs work in ScalarEvolution? (This actually applies to the sample loop I chose; for the IR test I manually edited the IR to create 2 exits again)

  2. Multiple uses of loads or comparisons in the exit IR chain; requires introducing a PHI node for current vector iteration value.

  3. Supporting a chain of conditional loads; e.g. early exit is itself in a conditional block so might not always execute.

  4. Supporting a non-const second term for comparison, or more varied comparisons (e.g. bit test instead of icmp).

  5. Tail folding of EE loops with a store.

  6. Dynamic bounds for known-dereferenceable load (via e.g. call void @llvm.assume(i1 true) [ "align"(ptr %pred, i64 2), "dereferenceable"(ptr %pred, i32 %n_bytes) ])

};

} // namespace llvm
Expand Down
129 changes: 117 additions & 12 deletions 129 llvm/lib/Transforms/Vectorize/LoopVectorizationLegality.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -17,6 +17,7 @@
#include "llvm/Transforms/Vectorize/LoopVectorizationLegality.h"
#include "llvm/Analysis/Loads.h"
#include "llvm/Analysis/LoopInfo.h"
#include "llvm/Analysis/MustExecute.h"
#include "llvm/Analysis/OptimizationRemarkEmitter.h"
#include "llvm/Analysis/ScalarEvolutionExpressions.h"
#include "llvm/Analysis/TargetLibraryInfo.h"
Expand Down Expand Up @@ -1209,7 +1210,33 @@ bool LoopVectorizationLegality::canVectorizeMemory() {
});
}

if (!LAI->canVectorizeMemory())
if (LAI->canVectorizeMemory()) {
// FIXME: Remove or reduce this restriction. We're in a bit of an odd spot
// since we're (potentially) doing the load out of its normal order
// in the loop and that may throw off dependency checking.
// A forward dependency should be fine, but a backwards dep may not
// be even if LAA thinks it is due to performing the load for the
// vector iteration i+1 in vector iteration i.
if (isConditionCopyRequired()) {
const MemoryDepChecker &DepChecker = LAI->getDepChecker();
const auto *Deps = DepChecker.getDependences();

for (const MemoryDepChecker::Dependence &Dep : *Deps) {
if (Dep.getDestination(DepChecker) == EarlyExitLoad ||
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we have to worry about other loads too? The header block could have several loads, with only one of them contributing to the branch condition.

Dep.getSource(DepChecker) == EarlyExitLoad) {
// Refine language a little? This currently only applies when a store
// is present in the early exit loop.
reportVectorizationFailure(
"No dependencies allowed for early exit condition load",
"Early exit condition loads may not have a dependence with "
"another"
" memory operation.",
"CantVectorizeStoreToLoopInvariantAddress", ORE, TheLoop);
return false;
}
}
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What happens in the else case? Looks like we fall through to:

  if (!LAI->canVectorizeMemory())
    return canVectorizeIndirectUnsafeDependences();

and so we may still treat the loop as legal. Worth adding a test for this case I think.

} else if (!isConditionCopyRequired())
return canVectorizeIndirectUnsafeDependences();

if (LAI->hasLoadStoreDependenceInvolvingLoopInvariantAddress()) {
Expand Down Expand Up @@ -1708,16 +1735,31 @@ bool LoopVectorizationLegality::isVectorizableEarlyExitLoop() {
}
};

bool HasStore = false;
for (auto *BB : TheLoop->blocks())
for (auto &I : *BB) {
if (StoreInst *SI = dyn_cast<StoreInst>(&I)) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does it matter if the store is prior to or after the early exit?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No. Since this prototype bails out to a scalar loop if the next vector iteration would exit partway through, it doesn't matter where the store is in the loop.

If/when we do tail-folding for these loops, we will need to generate different masks for state-changing operations before or after an exit.

HasStore = true;
if (SI->isSimple())
continue;

reportVectorizationFailure(
"Complex writes to memory unsupported in early exit loops",
"Cannot vectorize early exit loop with complex writes to memory",
"WritesInEarlyExitLoop", ORE, TheLoop);
return false;
}

if (I.mayWriteToMemory()) {
// We don't support writes to memory.
reportVectorizationFailure(
"Writes to memory unsupported in early exit loops",
"Cannot vectorize early exit loop with writes to memory",
"Complex writes to memory unsupported in early exit loops",
"Cannot vectorize early exit loop with complex writes to memory",
"WritesInEarlyExitLoop", ORE, TheLoop);
return false;
} else if (!IsSafeOperation(&I)) {
}

if (!IsSafeOperation(&I)) {
reportVectorizationFailure("Early exit loop contains operations that "
"cannot be speculatively executed",
"UnsafeOperationsEarlyExitLoop", ORE,
Expand All @@ -1732,13 +1774,73 @@ bool LoopVectorizationLegality::isVectorizableEarlyExitLoop() {

// TODO: Handle loops that may fault.
Predicates.clear();
if (!isDereferenceableReadOnlyLoop(TheLoop, PSE.getSE(), DT, AC,
&Predicates)) {
reportVectorizationFailure(
"Loop may fault",
"Cannot vectorize potentially faulting early exit loop",
"PotentiallyFaultingEarlyExitLoop", ORE, TheLoop);
return false;

std::optional<LoadInst *> EELoad;
if (HasStore) {
// Record load for analysis by isDereferenceableAndAlignedInLoop
// and later by dependence analysis.
if (BranchInst *Br = dyn_cast<BranchInst>(
SingleUncountableEdge->first->getTerminator())) {
// FIXME: Handle exit conditions with multiple users, more complex exit
// conditions than br(icmp(load, loop_inv)).
ICmpInst *Cmp = dyn_cast<ICmpInst>(Br->getCondition());
if (Cmp && Cmp->hasOneUse() &&
TheLoop->isLoopInvariant(Cmp->getOperand(1))) {
LoadInst *Load = dyn_cast<LoadInst>(Cmp->getOperand(0));
if (Load && Load->hasOneUse() && !TheLoop->isLoopInvariant(Load)) {
if (isDereferenceableAndAlignedInLoop(Load, TheLoop, *PSE.getSE(),
*DT, AC, &Predicates)) {
ICFLoopSafetyInfo SafetyInfo;
SafetyInfo.computeLoopSafetyInfo(TheLoop);
// FIXME: We may have multiple levels of conditional loads, so will
// need to improve on outright rejection at some point.
if (SafetyInfo.isGuaranteedToExecute(*Load, DT, TheLoop)) {
EELoad = Load;
} else {
LLVM_DEBUG(
dbgs()
<< "Early exit condition load not guaranteed to execute.\n");
reportVectorizationFailure(
"Early exit condition load not guaranteed to execute",
"Cannot vectorize early exit loop when condition load is not "
"guaranteed to execute",
"EarlyExitLoadNotGuaranteed", ORE, TheLoop);
}
} else {
LLVM_DEBUG(dbgs()
<< "Early exit condition load potentially unsafe.\n");
reportVectorizationFailure(
"Uncounted loop condition not known safe",
"Cannot vectorize early exit loop with "
"possibly unsafe condition load",
"PotentiallyFaultingEarlyExitLoop", ORE, TheLoop);
return false;
}
}
}
}

if (!EELoad.has_value()) {
LLVM_DEBUG(dbgs() << "Found early exit store but no condition load.\n");
reportVectorizationFailure(
"Early exit loop with store but no condition load",
"Cannot vectorize early exit loop with store but no condition load",
"NoConditionLoadForEarlyExitLoop", ORE, TheLoop);
return false;
}
} else {
// Read-only loop.
// FIXME: as with the loops with stores, only the loads contributing to
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

At least prior to this patch, this statement is untrue because the way the vectorised loop is constructed in vplan the latch block is always executed regardless of whether we would exit early in the scalar loop, i.e.

for.body:
  %ld1 = load i8, ...
  ...

for.inc:
  %ld2 = load i8, ...
  ...

will get vectorised into a single block:

vector.body:
  ...
  %vld1 = load <16 x i8>, ...
  %vld2 = load <16 x i8>, ...
  ...
  %vcmp = icmp eq <16 x i8> %vld1, splat (i8 3)
  %early.exit.cmp = reduce or %vcmp
  %latch.exit.cmp = icmp ...
  %final.cmp = or i1 %early.exit.cmp, %latch.exit.cmp
  br i1 %final.cmp, label %middle.split, label %vector.body

Are you saying that with this patch we now create a masked load for %vld2 instead based on the early exit condition?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, that's why it's a FIXME. Future work :)

// the loop condition need to be guaranteed dereferenceable and
// aligned.
if (!isDereferenceableReadOnlyLoop(TheLoop, PSE.getSE(), DT, AC,
&Predicates)) {
reportVectorizationFailure(
"Loop may fault",
"Cannot vectorize potentially faulting early exit loop",
"PotentiallyFaultingEarlyExitLoop", ORE, TheLoop);
return false;
}
}

[[maybe_unused]] const SCEV *SymbolicMaxBTC =
Expand All @@ -1751,6 +1853,9 @@ bool LoopVectorizationLegality::isVectorizableEarlyExitLoop() {
"backedge taken count: "
<< *SymbolicMaxBTC << '\n');
UncountableEdge = SingleUncountableEdge;
if (HasStore)
EarlyExitLoad = EELoad;

return true;
}

Expand Down Expand Up @@ -1822,7 +1927,7 @@ bool LoopVectorizationLegality::canVectorize(bool UseVPlanNativePath) {
return false;
} else {
if (!isVectorizableEarlyExitLoop()) {
UncountableEdge = std::nullopt;
clearEarlyExitData();
if (DoExtraAnalysis)
Result = false;
else
Expand Down
23 changes: 21 additions & 2 deletions 23 llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -2530,8 +2530,10 @@ BasicBlock *InnerLoopVectorizer::emitMemRuntimeChecks(BasicBlock *Bypass) {
static void replaceVPBBWithIRVPBB(VPBasicBlock *VPBB, BasicBlock *IRBB) {
VPIRBasicBlock *IRVPBB = VPBB->getPlan()->createVPIRBasicBlock(IRBB);
for (auto &R : make_early_inc_range(*VPBB)) {
assert(!R.isPhi() && "Tried to move phi recipe to end of block");
R.moveBefore(*IRVPBB, IRVPBB->end());
if (R.isPhi())
R.moveBefore(*IRVPBB, IRVPBB->getFirstNonPhi());
else
R.moveBefore(*IRVPBB, IRVPBB->end());
}

VPBlockUtils::reassociateBlocks(VPBB, IRVPBB);
Expand Down Expand Up @@ -9100,6 +9102,15 @@ void LoopVectorizationPlanner::buildVPlansWithVPRecipes(ElementCount MinVF,
VPlanTransforms::runPass(VPlanTransforms::truncateToMinimalBitwidths,
*Plan, CM.getMinimalBitwidths());
VPlanTransforms::runPass(VPlanTransforms::optimize, *Plan);

// See if we can convert an early exit vplan to bail out to a scalar
// loop if state-changing operations (like stores) are present and
// an exit will be taken in the next vector iteration.
// If not, discard the plan.
if (Legal->isConditionCopyRequired() && !HasScalarVF &&
!VPlanTransforms::runPass(VPlanTransforms::tryEarlyExitConversion,
*Plan))
break;
// TODO: try to put it close to addActiveLaneMask().
// Discard the plan if it is not EVL-compatible
if (CM.foldTailWithEVL() && !HasScalarVF &&
Expand Down Expand Up @@ -9380,6 +9391,10 @@ LoopVectorizationPlanner::tryToBuildVPlanWithVPRecipes(VFRange &Range,
Range);
DenseMap<const VPBlockBase *, BasicBlock *> VPB2IRBB;
auto Plan = VPlanTransforms::buildPlainCFG(OrigLoop, *LI, VPB2IRBB);
// FIXME: Better place to put this? Or maybe an enum for how to handle
// early exits?
if (Legal->hasUncountableEarlyExit())
Plan->setEarlyExitContinuesInScalarLoop(Legal->isConditionCopyRequired());
VPlanTransforms::prepareForVectorization(
*Plan, Legal->getWidestInductionType(), PSE, RequiresScalarEpilogueCheck,
CM.foldTailByMasking(), OrigLoop,
Expand Down Expand Up @@ -9681,6 +9696,10 @@ VPlanPtr LoopVectorizationPlanner::tryToBuildVPlan(VFRange &Range) {

DenseMap<const VPBlockBase *, BasicBlock *> VPB2IRBB;
auto Plan = VPlanTransforms::buildPlainCFG(OrigLoop, *LI, VPB2IRBB);
// FIXME: Better place to put this? Or maybe an enum for how to handle
// early exits?
if (Legal->hasUncountableEarlyExit())
Plan->setEarlyExitContinuesInScalarLoop(Legal->isConditionCopyRequired());
VPlanTransforms::prepareForVectorization(
*Plan, Legal->getWidestInductionType(), PSE, true, false, OrigLoop,
getDebugLocFromInstOrOperands(Legal->getPrimaryInduction()), false,
Expand Down
17 changes: 17 additions & 0 deletions 17 llvm/lib/Transforms/Vectorize/VPlan.h
Original file line number Diff line number Diff line change
Expand Up @@ -3636,6 +3636,13 @@ class VPlan {
/// VPlan is destroyed.
SmallVector<VPBlockBase *> CreatedBlocks;

/// Indicates that an early exit loop will exit before the condition is
/// reached, and that the scalar loop must perform the last few iterations.
/// FIXME: Is this the right place? We mainly want to make sure that we
/// know about this for transforming the plan to copy&move the exit
/// condition, but maybe it doesn't need to be in the plan itself.
bool EarlyExitContinuesInScalarLoop = false;

/// Construct a VPlan with \p Entry to the plan and with \p ScalarHeader
/// wrapping the original header of the scalar loop.
VPlan(VPBasicBlock *Entry, VPIRBasicBlock *ScalarHeader)
Expand Down Expand Up @@ -3939,6 +3946,16 @@ class VPlan {
return ExitBlocks.size() > 1 || ExitBlocks[0]->getNumPredecessors() > 1;
}

/// Returns true if all exit paths should reach the scalar loop.
bool shouldEarlyExitContinueInScalarLoop() const {
return EarlyExitContinuesInScalarLoop;
}

/// Set early exit vectorization to always reach the scalar loop.
void setEarlyExitContinuesInScalarLoop(bool Continues) {
EarlyExitContinuesInScalarLoop = Continues;
}

/// Returns true if the scalar tail may execute after the vector loop. Note
/// that this relies on unneeded branches to the scalar tail loop being
/// removed.
Expand Down
44 changes: 44 additions & 0 deletions 44 llvm/lib/Transforms/Vectorize/VPlanPatternMatch.h
Original file line number Diff line number Diff line change
Expand Up @@ -41,6 +41,17 @@ template <typename Class> struct class_match {
/// Match an arbitrary VPValue and ignore it.
inline class_match<VPValue> m_VPValue() { return class_match<VPValue>(); }

struct loop_invariant_vpvalue {
template <typename ITy> bool match(ITy *V) const {
VPValue *Val = dyn_cast<VPValue>(V);
return Val && Val->isDefinedOutsideLoopRegions();
}
};

inline loop_invariant_vpvalue m_LoopInvVPValue() {
return loop_invariant_vpvalue();
}

template <typename Class> struct bind_ty {
Class *&VR;

Expand Down Expand Up @@ -324,6 +335,12 @@ m_Not(const Op0_t &Op0) {
return m_VPInstruction<VPInstruction::Not>(Op0);
}

template <typename Op0_t>
inline UnaryVPInstruction_match<Op0_t, VPInstruction::AnyOf>
m_AnyOf(const Op0_t &Op0) {
return m_VPInstruction<VPInstruction::AnyOf>(Op0);
}

template <typename Op0_t>
inline UnaryVPInstruction_match<Op0_t, VPInstruction::BranchOnCond>
m_BranchOnCond(const Op0_t &Op0) {
Expand Down Expand Up @@ -431,6 +448,19 @@ inline GEPLikeRecipe_match<Op0_t, Op1_t> m_GetElementPtr(const Op0_t &Op0,
return GEPLikeRecipe_match<Op0_t, Op1_t>(Op0, Op1);
}

// FIXME: Separate Commutative matcher? Share result type?
// FIXME: Are there other recipe types for ICmp?
template <typename Op0_t, typename Op1_t>
using ICmpRecipe_match =
BinaryRecipe_match<Op0_t, Op1_t, Instruction::ICmp, false, VPWidenRecipe,
VPReplicateRecipe>;

template <typename Op0_t, typename Op1_t>
inline ICmpRecipe_match<Op0_t, Op1_t> m_ICmp(const Op0_t &Op0,
const Op1_t &Op1) {
return ICmpRecipe_match<Op0_t, Op1_t>(Op0, Op1);
}

template <typename Op0_t, typename Op1_t, typename Op2_t, unsigned Opcode>
using AllTernaryRecipe_match =
Recipe_match<std::tuple<Op0_t, Op1_t, Op2_t>, Opcode, false,
Expand Down Expand Up @@ -581,6 +611,20 @@ m_Intrinsic(const T0 &Op0, const T1 &Op1, const T2 &Op2, const T3 &Op3) {
return m_CombineAnd(m_Intrinsic<IntrID>(Op0, Op1, Op2), m_Argument<3>(Op3));
}

template <typename SubPattern_t> struct OneUse_match {
SubPattern_t SubPattern;

OneUse_match(const SubPattern_t &SP) : SubPattern(SP) {}

template <typename OpTy> bool match(OpTy *V) {
return V->hasOneUse() && SubPattern.match(V);
}
};

template <typename T> inline OneUse_match<T> m_OneUse(const T &SubPattern) {
return SubPattern;
}

} // namespace VPlanPatternMatch
} // namespace llvm

Expand Down
4 changes: 3 additions & 1 deletion 4 llvm/lib/Transforms/Vectorize/VPlanRecipes.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -276,7 +276,9 @@ InstructionCost VPRecipeBase::computeCost(ElementCount VF,
bool VPRecipeBase::isPhi() const {
return (getVPDefID() >= VPFirstPHISC && getVPDefID() <= VPLastPHISC) ||
(isa<VPInstruction>(this) &&
cast<VPInstruction>(this)->getOpcode() == Instruction::PHI) ||
(cast<VPInstruction>(this)->getOpcode() == Instruction::PHI ||
cast<VPInstruction>(this)->getOpcode() ==
VPInstruction::ResumePhi)) ||
isa<VPIRPhi>(this);
}

Expand Down
Loading
Morty Proxy This is a proxified and sanitized view of the page, visit original site.