[VPlan] Move predication to VPlanTransform (NFC). #128420

Open
fhahn wants to merge 4 commits into main from vplan-predication

Conversation

@fhahn (Contributor) commented Feb 23, 2025

This patch moves the logic to predicate and linearize a VPlan to a dedicated VPlan transform.

The main logic to perform predication is ready to review, although there are a few things to note that should be improved, either directly in the PR or in the future (a conceptual sketch of the masking rules follows the list):

  • Edge and block masks are cached in VPPredicator, but the block masks are still made available via VPRecipeBuilder so they can be accessed during recipe construction. As a follow-up, this should be replaced by adding mask operands to all VPInstructions that need them and using those operands during recipe construction.
  • Caching the masks in a map also means that the map needs updating each time a new recipe replaces a VPInstruction; this too would be handled by adding mask operands.
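For readers new to the masking scheme, here is a small standalone C++ sketch of the rules the transform applies when flattening the CFG: a block's mask is the OR of its incoming edge masks, and an edge mask is the source block's mask combined with the branch condition (negated for the second successor) via a logical and. All names below (Block, Mask, edgeMask, orMasks) are invented stand-ins for illustration; this is not the VPPredicator or VPlan API, and the all-true mask is modelled as an empty string, following the "no mask" convention used by the vectorizer.

#include <cassert>
#include <map>
#include <string>
#include <vector>

// Hypothetical stand-ins for VPlan blocks and mask values. A mask is just a
// printable expression; the empty string models the all-true / "no mask" case.
struct Block {
  std::string Name;
  std::vector<Block *> Preds;
  std::string Cond; // condition under which the first successor is taken
  std::vector<Block *> Succs;
};

using Mask = std::string;

// Mirrors select(A, B, false) rather than a bitwise and, so a poison B cannot
// leak into lanes where A is false.
static Mask logicalAnd(const Mask &A, const Mask &B) {
  if (A.empty()) return B;
  if (B.empty()) return A;
  return "select(" + A + ", " + B + ", false)";
}

static Mask orMasks(const Mask &A, const Mask &B) {
  if (A.empty() || B.empty()) return ""; // any all-true incoming edge makes the block all-true
  return "or(" + A + ", " + B + ")";
}

// Edge mask: the source block's mask, restricted by the branch condition
// (negated when the edge targets the second successor).
static Mask edgeMask(const std::map<Block *, Mask> &BlockMask, Block *Src, Block *Dst) {
  Mask SrcMask = BlockMask.at(Src);
  if (Src->Cond.empty() || Src->Succs.size() < 2)
    return SrcMask; // unconditional branch: edge mask equals the block mask
  Mask C = Src->Succs[0] == Dst ? Src->Cond : "not(" + Src->Cond + ")";
  return logicalAnd(SrcMask, C);
}

int main() {
  // Diamond: header -> {then, else} -> latch, visited in reverse post-order.
  Block Header{"header", {}, "c", {}}, Then{"then"}, Else{"else"}, Latch{"latch"};
  Header.Succs = {&Then, &Else};
  Then.Preds = {&Header};
  Else.Preds = {&Header};
  Latch.Preds = {&Then, &Else};

  std::map<Block *, Mask> BlockMask;
  BlockMask[&Header] = ""; // no tail folding: the header mask is all-true
  for (Block *B : {&Then, &Else, &Latch}) {
    Mask M;
    bool First = true;
    for (Block *P : B->Preds) {
      Mask E = edgeMask(BlockMask, P, B);
      M = First ? E : orMasks(M, E);
      First = false;
    }
    BlockMask[B] = M;
  }
  assert(BlockMask[&Then] == "c");
  assert(BlockMask[&Else] == "not(c)");
  assert(BlockMask[&Latch] == "or(c, not(c))"); // simplification is not modelled here
  return 0;
}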

@llvmbot (Member) commented Feb 23, 2025

@llvm/pr-subscribers-vectorizers

Author: Florian Hahn (fhahn)

Changes

This patch moves the logic to predicate and linearize a VPlan to a dedicated VPlan transform.

The main logic to perform predication is ready to review, although there are a few things to note that should be improved, either directly in the PR or in the future:

  • Edge and block masks are cached in VPRecipeBuilder, so they can be accessed during recipe construction. A better alternative may be to add mask operands to all VPInstructions that need them and use them during recipe construction.
  • The mask caching in a map also means that this map needs updating each time a new recipe replaces a VPInstruction; this would also be handled by adding mask operands.

Currently this is still WIP because early-exit loop handling does not work yet, as the exit conditions are not available in the initial VPlans. This will be fixed with #128419 and follow-ups.

All tests except those for early-exit loops are passing.


Patch is 38.23 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/128420.diff

8 Files Affected:

  • (modified) llvm/lib/Transforms/Vectorize/CMakeLists.txt (+1)
  • (modified) llvm/lib/Transforms/Vectorize/LoopVectorize.cpp (+27-259)
  • (modified) llvm/lib/Transforms/Vectorize/VPRecipeBuilder.h (+18-27)
  • (modified) llvm/lib/Transforms/Vectorize/VPlanHCFGBuilder.cpp (+13-11)
  • (modified) llvm/lib/Transforms/Vectorize/VPlanHCFGBuilder.h (-12)
  • (added) llvm/lib/Transforms/Vectorize/VPlanPredicator.cpp (+274)
  • (modified) llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp (+3-2)
  • (modified) llvm/lib/Transforms/Vectorize/VPlanTransforms.h (+3)
diff --git a/llvm/lib/Transforms/Vectorize/CMakeLists.txt b/llvm/lib/Transforms/Vectorize/CMakeLists.txt
index 38670ba304e53..74ae61440327c 100644
--- a/llvm/lib/Transforms/Vectorize/CMakeLists.txt
+++ b/llvm/lib/Transforms/Vectorize/CMakeLists.txt
@@ -23,6 +23,7 @@ add_llvm_component_library(LLVMVectorize
   VPlan.cpp
   VPlanAnalysis.cpp
   VPlanHCFGBuilder.cpp
+  VPlanPredicator.cpp
   VPlanRecipes.cpp
   VPlanSLP.cpp
   VPlanTransforms.cpp
diff --git a/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp b/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
index ced01df7b0d44..a2e20a701d612 100644
--- a/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
+++ b/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
@@ -8115,185 +8115,6 @@ void EpilogueVectorizerEpilogueLoop::printDebugTracesAtEnd() {
   });
 }
 
-void VPRecipeBuilder::createSwitchEdgeMasks(SwitchInst *SI) {
-  BasicBlock *Src = SI->getParent();
-  assert(!OrigLoop->isLoopExiting(Src) &&
-         all_of(successors(Src),
-                [this](BasicBlock *Succ) {
-                  return OrigLoop->getHeader() != Succ;
-                }) &&
-         "unsupported switch either exiting loop or continuing to header");
-  // Create masks where the terminator in Src is a switch. We create mask for
-  // all edges at the same time. This is more efficient, as we can create and
-  // collect compares for all cases once.
-  VPValue *Cond = getVPValueOrAddLiveIn(SI->getCondition());
-  BasicBlock *DefaultDst = SI->getDefaultDest();
-  MapVector<BasicBlock *, SmallVector<VPValue *>> Dst2Compares;
-  for (auto &C : SI->cases()) {
-    BasicBlock *Dst = C.getCaseSuccessor();
-    assert(!EdgeMaskCache.contains({Src, Dst}) && "Edge masks already created");
-    // Cases whose destination is the same as default are redundant and can be
-    // ignored - they will get there anyhow.
-    if (Dst == DefaultDst)
-      continue;
-    auto &Compares = Dst2Compares[Dst];
-    VPValue *V = getVPValueOrAddLiveIn(C.getCaseValue());
-    Compares.push_back(Builder.createICmp(CmpInst::ICMP_EQ, Cond, V));
-  }
-
-  // We need to handle 2 separate cases below for all entries in Dst2Compares,
-  // which excludes destinations matching the default destination.
-  VPValue *SrcMask = getBlockInMask(Src);
-  VPValue *DefaultMask = nullptr;
-  for (const auto &[Dst, Conds] : Dst2Compares) {
-    // 1. Dst is not the default destination. Dst is reached if any of the cases
-    // with destination == Dst are taken. Join the conditions for each case
-    // whose destination == Dst using an OR.
-    VPValue *Mask = Conds[0];
-    for (VPValue *V : ArrayRef<VPValue *>(Conds).drop_front())
-      Mask = Builder.createOr(Mask, V);
-    if (SrcMask)
-      Mask = Builder.createLogicalAnd(SrcMask, Mask);
-    EdgeMaskCache[{Src, Dst}] = Mask;
-
-    // 2. Create the mask for the default destination, which is reached if none
-    // of the cases with destination != default destination are taken. Join the
-    // conditions for each case where the destination is != Dst using an OR and
-    // negate it.
-    DefaultMask = DefaultMask ? Builder.createOr(DefaultMask, Mask) : Mask;
-  }
-
-  if (DefaultMask) {
-    DefaultMask = Builder.createNot(DefaultMask);
-    if (SrcMask)
-      DefaultMask = Builder.createLogicalAnd(SrcMask, DefaultMask);
-  }
-  EdgeMaskCache[{Src, DefaultDst}] = DefaultMask;
-}
-
-VPValue *VPRecipeBuilder::createEdgeMask(BasicBlock *Src, BasicBlock *Dst) {
-  assert(is_contained(predecessors(Dst), Src) && "Invalid edge");
-
-  // Look for cached value.
-  std::pair<BasicBlock *, BasicBlock *> Edge(Src, Dst);
-  EdgeMaskCacheTy::iterator ECEntryIt = EdgeMaskCache.find(Edge);
-  if (ECEntryIt != EdgeMaskCache.end())
-    return ECEntryIt->second;
-
-  if (auto *SI = dyn_cast<SwitchInst>(Src->getTerminator())) {
-    createSwitchEdgeMasks(SI);
-    assert(EdgeMaskCache.contains(Edge) && "Mask for Edge not created?");
-    return EdgeMaskCache[Edge];
-  }
-
-  VPValue *SrcMask = getBlockInMask(Src);
-
-  // The terminator has to be a branch inst!
-  BranchInst *BI = dyn_cast<BranchInst>(Src->getTerminator());
-  assert(BI && "Unexpected terminator found");
-  if (!BI->isConditional() || BI->getSuccessor(0) == BI->getSuccessor(1))
-    return EdgeMaskCache[Edge] = SrcMask;
-
-  // If source is an exiting block, we know the exit edge is dynamically dead
-  // in the vector loop, and thus we don't need to restrict the mask.  Avoid
-  // adding uses of an otherwise potentially dead instruction unless we are
-  // vectorizing a loop with uncountable exits. In that case, we always
-  // materialize the mask.
-  if (OrigLoop->isLoopExiting(Src) &&
-      Src != Legal->getUncountableEarlyExitingBlock())
-    return EdgeMaskCache[Edge] = SrcMask;
-
-  VPValue *EdgeMask = getVPValueOrAddLiveIn(BI->getCondition());
-  assert(EdgeMask && "No Edge Mask found for condition");
-
-  if (BI->getSuccessor(0) != Dst)
-    EdgeMask = Builder.createNot(EdgeMask, BI->getDebugLoc());
-
-  if (SrcMask) { // Otherwise block in-mask is all-one, no need to AND.
-    // The bitwise 'And' of SrcMask and EdgeMask introduces new UB if SrcMask
-    // is false and EdgeMask is poison. Avoid that by using 'LogicalAnd'
-    // instead which generates 'select i1 SrcMask, i1 EdgeMask, i1 false'.
-    EdgeMask = Builder.createLogicalAnd(SrcMask, EdgeMask, BI->getDebugLoc());
-  }
-
-  return EdgeMaskCache[Edge] = EdgeMask;
-}
-
-VPValue *VPRecipeBuilder::getEdgeMask(BasicBlock *Src, BasicBlock *Dst) const {
-  assert(is_contained(predecessors(Dst), Src) && "Invalid edge");
-
-  // Look for cached value.
-  std::pair<BasicBlock *, BasicBlock *> Edge(Src, Dst);
-  EdgeMaskCacheTy::const_iterator ECEntryIt = EdgeMaskCache.find(Edge);
-  assert(ECEntryIt != EdgeMaskCache.end() &&
-         "looking up mask for edge which has not been created");
-  return ECEntryIt->second;
-}
-
-void VPRecipeBuilder::createHeaderMask() {
-  BasicBlock *Header = OrigLoop->getHeader();
-
-  // When not folding the tail, use nullptr to model all-true mask.
-  if (!CM.foldTailByMasking()) {
-    BlockMaskCache[Header] = nullptr;
-    return;
-  }
-
-  // Introduce the early-exit compare IV <= BTC to form header block mask.
-  // This is used instead of IV < TC because TC may wrap, unlike BTC. Start by
-  // constructing the desired canonical IV in the header block as its first
-  // non-phi instructions.
-
-  VPBasicBlock *HeaderVPBB = Plan.getVectorLoopRegion()->getEntryBasicBlock();
-  auto NewInsertionPoint = HeaderVPBB->getFirstNonPhi();
-  auto *IV = new VPWidenCanonicalIVRecipe(Plan.getCanonicalIV());
-  HeaderVPBB->insert(IV, NewInsertionPoint);
-
-  VPBuilder::InsertPointGuard Guard(Builder);
-  Builder.setInsertPoint(HeaderVPBB, NewInsertionPoint);
-  VPValue *BlockMask = nullptr;
-  VPValue *BTC = Plan.getOrCreateBackedgeTakenCount();
-  BlockMask = Builder.createICmp(CmpInst::ICMP_ULE, IV, BTC);
-  BlockMaskCache[Header] = BlockMask;
-}
-
-VPValue *VPRecipeBuilder::getBlockInMask(BasicBlock *BB) const {
-  // Return the cached value.
-  BlockMaskCacheTy::const_iterator BCEntryIt = BlockMaskCache.find(BB);
-  assert(BCEntryIt != BlockMaskCache.end() &&
-         "Trying to access mask for block without one.");
-  return BCEntryIt->second;
-}
-
-void VPRecipeBuilder::createBlockInMask(BasicBlock *BB) {
-  assert(OrigLoop->contains(BB) && "Block is not a part of a loop");
-  assert(BlockMaskCache.count(BB) == 0 && "Mask for block already computed");
-  assert(OrigLoop->getHeader() != BB &&
-         "Loop header must have cached block mask");
-
-  // All-one mask is modelled as no-mask following the convention for masked
-  // load/store/gather/scatter. Initialize BlockMask to no-mask.
-  VPValue *BlockMask = nullptr;
-  // This is the block mask. We OR all unique incoming edges.
-  for (auto *Predecessor :
-       SetVector<BasicBlock *>(pred_begin(BB), pred_end(BB))) {
-    VPValue *EdgeMask = createEdgeMask(Predecessor, BB);
-    if (!EdgeMask) { // Mask of predecessor is all-one so mask of block is too.
-      BlockMaskCache[BB] = EdgeMask;
-      return;
-    }
-
-    if (!BlockMask) { // BlockMask has its initialized nullptr value.
-      BlockMask = EdgeMask;
-      continue;
-    }
-
-    BlockMask = Builder.createOr(BlockMask, EdgeMask, {});
-  }
-
-  BlockMaskCache[BB] = BlockMask;
-}
-
 VPWidenMemoryRecipe *
 VPRecipeBuilder::tryToWidenMemory(Instruction *I, ArrayRef<VPValue *> Operands,
                                   VFRange &Range) {
@@ -8318,7 +8139,7 @@ VPRecipeBuilder::tryToWidenMemory(Instruction *I, ArrayRef<VPValue *> Operands,
 
   VPValue *Mask = nullptr;
   if (Legal->isMaskRequired(I))
-    Mask = getBlockInMask(I->getParent());
+    Mask = getBlockInMask(Builder.getInsertBlock());
 
   // Determine if the pointer operand of the access is either consecutive or
   // reverse consecutive.
@@ -8437,38 +8258,6 @@ VPWidenIntOrFpInductionRecipe *VPRecipeBuilder::tryToOptimizeInductionTruncate(
   return nullptr;
 }
 
-VPBlendRecipe *VPRecipeBuilder::tryToBlend(PHINode *Phi,
-                                           ArrayRef<VPValue *> Operands) {
-  unsigned NumIncoming = Phi->getNumIncomingValues();
-
-  // We know that all PHIs in non-header blocks are converted into selects, so
-  // we don't have to worry about the insertion order and we can just use the
-  // builder. At this point we generate the predication tree. There may be
-  // duplications since this is a simple recursive scan, but future
-  // optimizations will clean it up.
-
-  // Map incoming IR BasicBlocks to incoming VPValues, for lookup below.
-  // TODO: Add operands and masks in order from the VPlan predecessors.
-  DenseMap<BasicBlock *, VPValue *> VPIncomingValues;
-  for (const auto &[Idx, Pred] : enumerate(predecessors(Phi->getParent())))
-    VPIncomingValues[Pred] = Operands[Idx];
-
-  SmallVector<VPValue *, 2> OperandsWithMask;
-  for (unsigned In = 0; In < NumIncoming; In++) {
-    BasicBlock *Pred = Phi->getIncomingBlock(In);
-    OperandsWithMask.push_back(VPIncomingValues.lookup(Pred));
-    VPValue *EdgeMask = getEdgeMask(Pred, Phi->getParent());
-    if (!EdgeMask) {
-      assert(In == 0 && "Both null and non-null edge masks found");
-      assert(all_equal(Operands) &&
-             "Distinct incoming values with one having a full mask");
-      break;
-    }
-    OperandsWithMask.push_back(EdgeMask);
-  }
-  return new VPBlendRecipe(Phi, OperandsWithMask);
-}
-
 VPSingleDefRecipe *VPRecipeBuilder::tryToWidenCall(CallInst *CI,
                                                    ArrayRef<VPValue *> Operands,
                                                    VFRange &Range) {
@@ -8544,7 +8333,7 @@ VPSingleDefRecipe *VPRecipeBuilder::tryToWidenCall(CallInst *CI,
       //      all-true mask.
       VPValue *Mask = nullptr;
       if (Legal->isMaskRequired(CI))
-        Mask = getBlockInMask(CI->getParent());
+        Mask = getBlockInMask(Builder.getInsertBlock());
       else
         Mask = Plan.getOrAddLiveIn(
             ConstantInt::getTrue(IntegerType::getInt1Ty(CI->getContext())));
@@ -8586,7 +8375,7 @@ VPWidenRecipe *VPRecipeBuilder::tryToWiden(Instruction *I,
     // div/rem operation itself.  Otherwise fall through to general handling below.
     if (CM.isPredicatedInst(I)) {
       SmallVector<VPValue *> Ops(Operands);
-      VPValue *Mask = getBlockInMask(I->getParent());
+      VPValue *Mask = getBlockInMask(Builder.getInsertBlock());
       VPValue *One =
           Plan.getOrAddLiveIn(ConstantInt::get(I->getType(), 1u, false));
       auto *SafeRHS = Builder.createSelect(Mask, Ops[1], One, I->getDebugLoc());
@@ -8668,7 +8457,7 @@ VPRecipeBuilder::tryToWidenHistogram(const HistogramInfo *HI,
   // In case of predicated execution (due to tail-folding, or conditional
   // execution, or both), pass the relevant mask.
   if (Legal->isMaskRequired(HI->Store))
-    HGramOps.push_back(getBlockInMask(HI->Store->getParent()));
+    HGramOps.push_back(getBlockInMask(Builder.getInsertBlock()));
 
   return new VPHistogramRecipe(Opcode,
                                make_range(HGramOps.begin(), HGramOps.end()),
@@ -8724,7 +8513,7 @@ VPRecipeBuilder::handleReplication(Instruction *I, ArrayRef<VPValue *> Operands,
     // added initially. Masked replicate recipes will later be placed under an
     // if-then construct to prevent side-effects. Generate recipes to compute
     // the block mask for this region.
-    BlockInMask = getBlockInMask(I->getParent());
+    BlockInMask = getBlockInMask(Builder.getInsertBlock());
   }
 
   // Note that there is some custom logic to mark some intrinsics as uniform
@@ -8857,9 +8646,8 @@ VPRecipeBase *VPRecipeBuilder::tryToCreateWidenRecipe(
   // nodes, calls and memory operations.
   VPRecipeBase *Recipe;
   if (auto *Phi = dyn_cast<PHINode>(Instr)) {
-    if (Phi->getParent() != OrigLoop->getHeader())
-      return tryToBlend(Phi, Operands);
-
+    assert(Phi->getParent() == OrigLoop->getHeader() &&
+           "Non-header phis should have been handled during predication");
     assert(Operands.size() == 2 && "Must have 2 operands for header phis");
     if ((Recipe = tryToOptimizeInductionPHI(Phi, Operands, Range)))
       return Recipe;
@@ -8964,7 +8752,7 @@ VPRecipeBuilder::tryToCreatePartialReduction(Instruction *Reduction,
             ReductionOpcode == Instruction::Sub) &&
            "Expected an ADD or SUB operation for predicated partial "
            "reductions (because the neutral element in the mask is zero)!");
-    VPValue *Mask = getBlockInMask(Reduction->getParent());
+    VPValue *Mask = getBlockInMask(Builder.getInsertBlock());
     VPValue *Zero =
         Plan.getOrAddLiveIn(ConstantInt::get(Reduction->getType(), 0));
     BinOp = Builder.createSelect(Mask, BinOp, Zero, Reduction->getDebugLoc());
@@ -9332,9 +9120,6 @@ LoopVectorizationPlanner::tryToBuildVPlanWithVPRecipes(VFRange &Range) {
   bool HasNUW = !IVUpdateMayOverflow || Style == TailFoldingStyle::None;
   addCanonicalIVRecipes(*Plan, Legal->getWidestInductionType(), HasNUW, DL);
 
-  VPRecipeBuilder RecipeBuilder(*Plan, OrigLoop, TLI, &TTI, Legal, CM, PSE,
-                                Builder);
-
   // ---------------------------------------------------------------------------
   // Pre-construction: record ingredients whose recipes we'll need to further
   // process after constructing the initial VPlan.
@@ -9375,39 +9160,24 @@ LoopVectorizationPlanner::tryToBuildVPlanWithVPRecipes(VFRange &Range) {
         return Legal->blockNeedsPredication(BB) || NeedsBlends;
       });
 
-  RecipeBuilder.collectScaledReductions(Range);
 
   auto *MiddleVPBB = Plan->getMiddleBlock();
 
+  VPRecipeBuilder RecipeBuilder(*Plan, OrigLoop, TLI, &TTI, Legal, CM, PSE,
+                                Builder);
+  if (NeedsMasks) {
+    VPlanTransforms::predicateAndLinearize(*Plan, CM.foldTailByMasking(),
+                                           RecipeBuilder);
+  }
+  RecipeBuilder.collectScaledReductions(Range);
+
   // Scan the body of the loop in a topological order to visit each basic block
   // after having visited its predecessor basic blocks.
   ReversePostOrderTraversal<VPBlockShallowTraversalWrapper<VPBlockBase *>> RPOT(
       HeaderVPBB);
 
   VPBasicBlock::iterator MBIP = MiddleVPBB->getFirstNonPhi();
-  VPBlockBase *PrevVPBB = nullptr;
   for (VPBasicBlock *VPBB : VPBlockUtils::blocksOnly<VPBasicBlock>(RPOT)) {
-    // Handle VPBBs down to the latch.
-    if (VPBB == LoopRegion->getExiting()) {
-      assert(!HCFGBuilder.getIRBBForVPB(VPBB) &&
-             "the latch block shouldn't have a corresponding IRBB");
-      VPBlockUtils::connectBlocks(PrevVPBB, VPBB);
-      break;
-    }
-
-    // Create mask based on the IR BB corresponding to VPBB.
-    // TODO: Predicate directly based on VPlan.
-    Builder.setInsertPoint(VPBB, VPBB->begin());
-    if (VPBB == HeaderVPBB) {
-      Builder.setInsertPoint(VPBB, VPBB->getFirstNonPhi());
-      RecipeBuilder.createHeaderMask();
-    } else if (NeedsMasks) {
-      // FIXME: At the moment, masks need to be placed at the beginning of the
-      // block, as blends introduced for phi nodes need to use it. The created
-      // blends should be sunk after the mask recipes.
-      RecipeBuilder.createBlockInMask(HCFGBuilder.getIRBBForVPB(VPBB));
-    }
-
     // Convert input VPInstructions to widened recipes.
     for (VPRecipeBase &R : make_early_inc_range(*VPBB)) {
       auto *SingleDef = cast<VPSingleDefRecipe>(&R);
@@ -9417,7 +9187,8 @@ LoopVectorizationPlanner::tryToBuildVPlanWithVPRecipes(VFRange &Range) {
       // latter are added above for masking.
       // FIXME: Migrate code relying on the underlying instruction from VPlan0
       // to construct recipes below to not use the underlying instruction.
-      if (isa<VPCanonicalIVPHIRecipe, VPWidenCanonicalIVRecipe>(&R) ||
+      if (isa<VPCanonicalIVPHIRecipe, VPWidenCanonicalIVRecipe, VPBlendRecipe>(
+              &R) ||
           (isa<VPInstruction>(&R) && !UnderlyingValue))
         continue;
 
@@ -9469,22 +9240,18 @@ LoopVectorizationPlanner::tryToBuildVPlanWithVPRecipes(VFRange &Range) {
       } else {
         Builder.insert(Recipe);
       }
-      if (Recipe->getNumDefinedValues() == 1)
+      if (Recipe->getNumDefinedValues() == 1) {
         SingleDef->replaceAllUsesWith(Recipe->getVPSingleValue());
-      else
+        for (auto &[_, V] : RecipeBuilder.BlockMaskCache) {
+          if (V == SingleDef)
+            V = Recipe->getVPSingleValue();
+        }
+      } else
         assert(Recipe->getNumDefinedValues() == 0 &&
                "Unexpected multidef recipe");
       R.eraseFromParent();
     }
 
-    // Flatten the CFG in the loop. Masks for blocks have already been generated
-    // and added to recipes as needed. To do so, first disconnect VPBB from its
-    // successors. Then connect VPBB to the previously visited VPBB.
-    for (auto *Succ : to_vector(VPBB->getSuccessors()))
-      VPBlockUtils::disconnectBlocks(VPBB, Succ);
-    if (PrevVPBB)
-      VPBlockUtils::connectBlocks(PrevVPBB, VPBB);
-    PrevVPBB = VPBB;
   }
 
   assert(isa<VPRegionBlock>(Plan->getVectorLoopRegion()) &&
@@ -9783,7 +9550,7 @@ void LoopVectorizationPlanner::adjustRecipesForReductions(
       BasicBlock *BB = CurrentLinkI->getParent();
       VPValue *CondOp = nullptr;
       if (CM.blockNeedsPredicationForAnyReason(BB))
-        CondOp = RecipeBuilder.getBlockInMask(BB);
+        CondOp = RecipeBuilder.getBlockInMask(CurrentLink->getParent());
 
       auto *RedRecipe = new VPReductionRecipe(
           RdxDesc, CurrentLinkI, PreviousLink, VecOp, CondOp,
@@ -9818,7 +9585,8 @@ void LoopVectorizationPlanner::adjustRecipesForReductions(
     // different numbers of lanes. Partial reductions mask the input instead.
     if (!PhiR->isInLoop() && CM.foldTailByMasking() &&
         !isa<VPPartialReductionRecipe>(OrigExitingVPV->getDefiningRecipe())) {
-      VPValue *Cond = RecipeBuilder.getBlockInMask(OrigLoop->getHeader());
+      VPValue *Cond =
+          RecipeBuilder.getBlockInMask(VectorLoopRegion->getEntryBasicBlock());
       assert(OrigExitingVPV->getDefiningRecipe()->getParent() != LatchVPBB &&
              "reduction recipe must be defined before latch");
       Type *PhiTy = PhiR->getOperand(0)->getLiveInIRValue()->getType();
diff --git a/llvm/lib/Transforms/Vectorize/VPRecipeBuilder.h b/llvm/lib/Transforms/Vectorize/VPRecipeBuilder.h
index 334cfbad8bd7c..9900c4117c5f6 100644
--- a/llvm/lib/Transforms/Vectorize/VPRecipeBuilder.h
+++ b/llvm/lib/Transforms/Vectorize/VPRecipeBuilder.h
@@ -73,11 +73,14 @@ class VPRecipeBuilder {
   /// if-conversion currently takes place during VPlan-construction, so these
   /// caches are only used at that stage.
   using EdgeMaskCacheTy =
-      DenseMap<std::pair<BasicBlock *, BasicBlock *>, VPValue *>;
-  using BlockMaskCacheTy = DenseMap<BasicBlock *, VPValue *>;
+      DenseMap<std::pair<VPBasicBlock *, VPBasicBlock *>, VPValue *>;
+  using BlockMaskCacheTy = DenseMap<VPBasicBlock *, VPValue *>;
   EdgeMaskCacheTy EdgeMaskCache;
+
+public:
   BlockMaskCacheTy BlockMaskCache;
 
+private:
   // VPlan construction support: Hold a mapping from ingredients to
   // their recipe.
   DenseMap<Instruction *, VPRecipeBase *> Ingredient2Recipe;
@@ -114,11 +117,6 @@ class VPRecipeBuilder {
   tryToOptimizeInductionTruncate(TruncInst *I, ArrayRef<VPValue *> Operands,
                                  VFRange &Range);
 
-  /// Handle non-...
[truncated]

@llvmbot (Member) commented Feb 23, 2025

@llvm/pr-subscribers-llvm-transforms



github-actions bot commented Feb 23, 2025

✅ With the latest revision this PR passed the C/C++ code formatter.

@fhahn (Contributor, Author) commented Mar 30, 2025

Still WIP, but early exits are now handled properly as well, by retaining exit branches during initial construction.

This needs to be split up, which I'll start once #129402 lands

fhahn force-pushed the vplan-predication branch from 915b55b to a06af46 on April 5, 2025 at 13:19
fhahn force-pushed the vplan-predication branch from a06af46 to 7f61860 on April 28, 2025 at 12:35
fhahn added a commit to fhahn/llvm-project that referenced this pull request Apr 28, 2025
Update initial VPlan construction to include exit conditions and
edges.

For now, all early exits are disconnected before forming the regions,
but a follow-up will update uncountable exit handling to also happen
here. This is required to enable VPlan predication and remove the
dependence on any IR BBs (llvm#128420).

This includes updates in a few places to use
replaceSuccessor/replacePredecessor to preserve the order of predecessors
and successors, to reduce the need of fixing up phi operand orderings.
This unfortunately required making them public, not sure if there's a
fhahn added a commit to fhahn/llvm-project that referenced this pull request May 3, 2025
Update initial VPlan construction to include exit conditions and
edges.

For now, all early exits are disconnected before forming the regions,
but a follow-up will update uncountable exit handling to also happen
here. This is required to enable VPlan predication and remove the
dependence on any IR BBs (llvm#128420).

This includes updates in a few places to use
replaceSuccessor/replacePredecessor to preserve the order of predecessors
and successors, to reduce the need of fixing up phi operand orderings.
This unfortunately required making them public, not sure if there's a
fhahn added a commit to fhahn/llvm-project that referenced this pull request May 3, 2025
Move early-exit handling up front to original VPlan construction, before
introducing early exits.

This builds on llvm#137709, which
adds exiting edges to the original VPlan, instead of adding exit blocks
later.

This retains the exit conditions early, and means we can handle early
exits before forming regions, without the reliance on VPRecipeBuilder.

Once we retain all exits initially, handling early exits before region
construction ensures the regions are valid; otherwise we would leave
edges exiting the region from elsewhere than the latch.

Removing the reliance on VPRecipeBuilder removes the dependence on
mapping IR BBs to VPBBs and unblocks predication as VPlan transform:
llvm#128420.

Depends on llvm#137709.
fhahn added a commit to fhahn/llvm-project that referenced this pull request May 5, 2025
Update initial VPlan construction to include exit conditions and
edges.

For now, all early exits are disconnected before forming the regions,
but a follow-up will update uncountable exit handling to also happen
here. This is required to enable VPlan predication and remove the
dependence on any IR BBs (llvm#128420).

This includes updates in a few places to use
replaceSuccessor/replacePredecessor to preserve the order of predecessors
and successors, to reduce the need of fixing up phi operand orderings.
This unfortunately required making them public, not sure if there's a
fhahn added a commit to fhahn/llvm-project that referenced this pull request May 5, 2025
Update initial VPlan construction to include exit conditions and
edges.

For now, all early exits are disconnected before forming the regions,
but a follow-up will update uncountable exit handling to also happen
here. This is required to enable VPlan predication and remove the
dependence on any IR BBs (llvm#128420).

This includes updates in a few places to use
replaceSuccessor/replacePredecessor to preserve the order of predecessors
and successors, to reduce the need of fixing up phi operand orderings.
This unfortunately required making them public, not sure if there's a
fhahn added a commit to fhahn/llvm-project that referenced this pull request May 5, 2025
Move early-exit handling up front to original VPlan construction, before
introducing early exits.

This builds on llvm#137709, which
adds exiting edges to the original VPlan, instead of adding exit blocks
later.

This retains the exit conditions early, and means we can handle early
exits before forming regions, without the reliance on VPRecipeBuilder.

Once we retain all exits initially, handling early exits before region
construction ensures the regions are valid; otherwise we would leave
edges exiting the region from elsewhere than the latch.

Removing the reliance on VPRecipeBuilder removes the dependence on
mapping IR BBs to VPBBs and unblocks predication as VPlan transform:
llvm#128420.

Depends on llvm#137709.
fhahn added a commit to fhahn/llvm-project that referenced this pull request May 6, 2025
Move early-exit handling up front to original VPlan construction, before
introducing early exits.

This builds on llvm#137709, which
adds exiting edges to the original VPlan, instead of adding exit blocks
later.

This retains the exit conditions early, and means we can handle early
exits before forming regions, without the reliance on VPRecipeBuilder.

Once we retain all exits initially, handling early exits before region
construction ensures the regions are valid; otherwise we would leave
edges exiting the region from elsewhere than the latch.

Removing the reliance on VPRecipeBuilder removes the dependence on
mapping IR BBs to VPBBs and unblocks predication as VPlan transform:
llvm#128420.

Depends on llvm#137709.
fhahn added a commit that referenced this pull request May 8, 2025
…7709)

Update initial VPlan construction to include exit conditions and edges.

The loop region is now first constructed without entry/exiting. Those
are set after inserting the region in the CFG, to preserve the original
predecessor/successor order of blocks.

For now, all early exits are disconnected before forming the regions,
but a follow-up will update uncountable exit handling to also happen
here. This is required to enable VPlan predication and remove the
dependence on any IR BBs
(#128420).

PR: #137709
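To make the ordering concern in the commit message above concrete (why successors and predecessors are replaced in place rather than disconnected and re-connected), here is a minimal standalone sketch. TinyBlock and both helpers are invented for the illustration and are not the VPBlockUtils API; the point is only that appending the new edge at the end changes the position that phi-like recipes rely on.

#include <algorithm>
#include <cassert>
#include <vector>

struct TinyBlock {
  std::vector<TinyBlock *> Succs;
};

// In-place replacement: the position of the edge, and therefore the order in
// which a successor's phi-like recipes see their incoming values, is kept.
void replaceSuccessor(TinyBlock &B, TinyBlock *Old, TinyBlock *New) {
  auto It = std::find(B.Succs.begin(), B.Succs.end(), Old);
  assert(It != B.Succs.end() && "not a successor");
  *It = New;
}

// Disconnect + reconnect appends at the end, so a successor that used to be
// first can end up last, forcing phi operand reordering downstream.
void disconnectAndAppend(TinyBlock &B, TinyBlock *Old, TinyBlock *New) {
  B.Succs.erase(std::remove(B.Succs.begin(), B.Succs.end(), Old), B.Succs.end());
  B.Succs.push_back(New);
}

int main() {
  TinyBlock A, S0, S1, R;
  A.Succs = {&S0, &S1};

  TinyBlock B = A;
  replaceSuccessor(B, &S0, &R);
  assert(B.Succs[0] == &R && B.Succs[1] == &S1); // order preserved

  TinyBlock C = A;
  disconnectAndAppend(C, &S0, &R);
  assert(C.Succs[0] == &S1 && C.Succs[1] == &R); // order changed
  return 0;
}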
llvm-sync bot pushed a commit to arm/arm-toolchain that referenced this pull request May 8, 2025
…(NFC). (#137709)

Update initial VPlan construction to include exit conditions and edges.

The loop region is now first constructed without entry/exiting. Those
are set after inserting the region in the CFG, to preserve the original
predecessor/successor order of blocks.

For now, all early exits are disconnected before forming the regions,
but a follow-up will update uncountable exit handling to also happen
here. This is required to enable VPlan predication and remove the
dependence on any IR BBs
(llvm/llvm-project#128420).

PR: llvm/llvm-project#137709
fhahn added a commit to fhahn/llvm-project that referenced this pull request May 8, 2025
Move early-exit handling up front to original VPlan construction, before
introducing early exits.

This builds on llvm#137709, which
adds exiting edges to the original VPlan, instead of adding exit blocks
later.

This retains the exit conditions early, and means we can handle early
exits before forming regions, without the reliance on VPRecipeBuilder.

Once we retain all exits initially, handling early exits before region
construction ensures the regions are valid; otherwise we would leave
edges exiting the region from elsewhere than the latch.

Removing the reliance on VPRecipeBuilder removes the dependence on
mapping IR BBs to VPBBs and unblocks predication as VPlan transform:
llvm#128420.

Depends on llvm#137709.
fhahn force-pushed the vplan-predication branch 2 times, most recently from fcfde33 to 4129042 on May 10, 2025 at 11:47
fhahn added a commit that referenced this pull request May 10, 2025
This allows migrating some more code to be based on VPBBs in
VPRecipeBuilder, in preparation for
#128420.
llvm-sync bot pushed a commit to arm/arm-toolchain that referenced this pull request May 10, 2025
This allows migrating some more code to be based on VPBBs in
VPRecipeBuilder, in preparation for
llvm/llvm-project#128420.
fhahn added a commit that referenced this pull request May 11, 2025
Update recipe construction to use VPBBs to look up masks, in preparation
for #128420.
llvm-sync bot pushed a commit to arm/arm-toolchain that referenced this pull request May 11, 2025
…es (NFC).

Update recipe construction to use VPBBs to look up masks, in preparation
for llvm/llvm-project#128420.
fhahn force-pushed the vplan-predication branch from 4129042 to 08d5e17 on May 11, 2025 at 21:40
fhahn changed the title from "[VPlan] Move predication to VPlanTransform (NFC) (WIP)." to "[VPlan] Move predication to VPlanTransform (NFC)." on May 11, 2025
@fhahn (Contributor, Author) left a comment

ping :)

This should be ready to review now.

It still includes #138393, which should land very soon.

fhahn added a commit that referenced this pull request May 12, 2025
Move early-exit handling up front to original VPlan construction, before
introducing early exits.

This builds on #137709, which
adds exiting edges to the original VPlan, instead of adding exit blocks
later.

This retains the exit conditions early, and means we can handle early
exits before forming regions, without the reliance on VPRecipeBuilder.

Once we retain all exits initially, handling early exits before region
construction ensures the regions are valid; otherwise we would leave
edges exiting the region from elsewhere than the latch.

Removing the reliance on VPRecipeBuilder removes the dependence on
mapping IR BBs to VPBBs and unblocks predication as VPlan transform:
#128420.

Depends on #137709 (included in
PR).

PR: #138393
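A minimal standalone illustration of the region invariant mentioned in the commit message above (inside a loop region, only the latch may have an edge leaving the region). ToyBlock and regionIsValid are invented names for this sketch, not the VPlan verifier; the second assert shows how a still-attached early exit would violate the invariant.

#include <cassert>
#include <set>
#include <vector>

struct ToyBlock {
  std::vector<ToyBlock *> Succs;
};

bool regionIsValid(const std::set<const ToyBlock *> &Region, const ToyBlock *Latch) {
  for (const ToyBlock *B : Region)
    for (const ToyBlock *S : B->Succs)
      if (!Region.count(S) && B != Latch)
        return false; // an edge leaves the region from a non-latch block
  return true;
}

int main() {
  ToyBlock Header, Body, Latch, Exit, EarlyExit;
  Header.Succs = {&Body};
  Body.Succs = {&Latch};
  Latch.Succs = {&Header, &Exit};
  std::set<const ToyBlock *> Region{&Header, &Body, &Latch};
  assert(regionIsValid(Region, &Latch));

  // If an uncountable early exit were still attached when the region is
  // formed, a non-latch block would exit the region and the check fails.
  Body.Succs.push_back(&EarlyExit);
  assert(!regionIsValid(Region, &Latch));
  return 0;
}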
llvm-sync bot pushed a commit to arm/arm-toolchain that referenced this pull request May 12, 2025
@fhahn fhahn force-pushed the vplan-predication branch from 08d5e17 to 5426a2e May 12, 2025 13:01
This patch moves the logic to predicate and linearize a VPlan to a
dedicated VPlan transform.

The main logic to perform predication is ready to review, although
there are a few things to note that should be improved, either directly in
the PR or in the future:
 * Edge and block masks are cached in VPRecipeBuilder, so they can be
   accessed during recipe construction. A better alternative may be to
   add mask operands to all VPInstructions that need them and use that
   during recipe construction.
 * The mask caching in a map also means that this map needs updating
   each time a new recipe replaces a VPInstruction; this would also be
   handled by adding mask operands.

Currently this is still WIP because early-exit loop handling does not work
yet: the exit conditions are not available in the initial VPlans. This will
be fixed with llvm#128419 and follow-ups.

All tests except those for early-exit loops are passing.
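
For illustration, the per-block and per-edge mask caching the notes above describe can be modeled roughly as in the sketch below. This is an editorial sketch with made-up stub types and member names, not the actual VPRecipeBuilder interface; it only shows the shape of the cache that recipe construction queries, with a null mask standing for "all-true".

```cpp
#include "llvm/ADT/DenseMap.h"

#include <utility>

// Stub types standing in for the real VPlan classes, purely to keep the
// sketch self-contained.
struct StubVPValue {};
struct StubVPBasicBlock {};

// Shape of the cache: one mask per block (guarding all recipes in it) and one
// mask per (Src, Dst) edge. A null mask means "all-true", i.e. unpredicated.
struct MaskCacheSketch {
  llvm::DenseMap<StubVPBasicBlock *, StubVPValue *> BlockMaskCache;
  llvm::DenseMap<std::pair<StubVPBasicBlock *, StubVPBasicBlock *>,
                 StubVPValue *>
      EdgeMaskCache;

  StubVPValue *getBlockInMask(StubVPBasicBlock *VPBB) const {
    return BlockMaskCache.lookup(VPBB);
  }
  void setBlockInMask(StubVPBasicBlock *VPBB, StubVPValue *Mask) {
    BlockMaskCache[VPBB] = Mask;
  }
  StubVPValue *getEdgeMask(StubVPBasicBlock *Src, StubVPBasicBlock *Dst) const {
    return EdgeMaskCache.lookup({Src, Dst});
  }
  void setEdgeMask(StubVPBasicBlock *Src, StubVPBasicBlock *Dst,
                   StubVPValue *Mask) {
    EdgeMaskCache[{Src, Dst}] = Mask;
  }
};
```

Switching to mask operands on the VPInstructions themselves, as suggested above, would make such an external cache unnecessary and avoid having to patch it whenever a VPInstruction is replaced by a recipe.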
@fhahn fhahn force-pushed the vplan-predication branch from 5426a2e to 2532eb7 May 14, 2025 14:06

@ayalz ayalz left a comment

Very nice milestone!
Various comments after a first pass.

return getEdgeMask(Src, Dst);
}

auto *BI = cast<VPInstruction>(Src->getTerminator());
Collaborator

Suggested change
auto *BI = cast<VPInstruction>(Src->getTerminator());

BI is already known as Term?

Contributor Author

Yep removed thanks


auto *BI = cast<VPInstruction>(Src->getTerminator());
assert(BI->getOpcode() == VPInstruction::BranchOnCond);
if (Src->getSuccessors()[0] == Src->getSuccessors()[1]) {
Collaborator

Better eliminate such redundant conditional branches, along with branch-on-false/true, before predication/blending? Potentially as part of canonicalizing loop regions, aka prepareForVectorization().

Contributor Author

Will look into that separately; it will probably warrant a generalization of removeBranchOnTrue, as we need to take care of updating the phi nodes.


EdgeMask = BI->getOperand(0);
assert(EdgeMask && "No Edge Mask found for condition");

Collaborator

Insertion point of Builder assumed to be set - to start of Dst - before used below. Better set it here instead?

Can alternatively set it to Term, which retains SSA def/use semantics, but in any case the BlockInMask operations violate them when using edge masks - relying on linearization.

Contributor Author

Re-setting here would lead to re-ordering of the mask recipes. I left it as is for now.

EdgeMask = Builder.createLogicalAnd(SrcMask, EdgeMask, BI->getDebugLoc());
}

EdgeMaskCache[{Src, Dst}] = EdgeMask;
Collaborator

Suggested change
EdgeMaskCache[{Src, Dst}] = EdgeMask;
setEdgeMask(Src, Dst, EdgeMask);

Contributor Author

Done thanks
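
Putting the excerpts from this thread together, the edge-mask logic under review follows roughly the shape sketched below. The sketch uses a toy string representation instead of building VPlan recipes, and the type and helper names are invented; only the overall flow is meant to match: short-circuit when both successors are the same block, negate the condition on the false edge, logical-and with the source block mask, then cache the result.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Toy stand-in for a VPlan basic block; masks are plain strings here so the
// sketch stays self-contained.
struct ToyBlock {
  std::string BranchCond;            // condition of the two-way terminator
  std::vector<ToyBlock *> Successors;
  std::string BlockMask;             // "" means all-true (unpredicated)
};

static std::map<std::pair<ToyBlock *, ToyBlock *>, std::string> EdgeMaskCache;

// Mask for edge Src->Dst: the branch condition (negated when Dst is the false
// successor), and'ed with Src's block mask; if both successors are the same
// block, the edge mask is simply Src's block mask.
std::string createEdgeMask(ToyBlock *Src, ToyBlock *Dst) {
  assert(Src->Successors.size() == 2 && "expected a two-way branch");
  std::string EdgeMask;
  if (Src->Successors[0] == Src->Successors[1]) {
    EdgeMask = Src->BlockMask;
  } else {
    EdgeMask = Src->BranchCond;
    if (Src->Successors[1] == Dst)            // false edge: negate condition
      EdgeMask = "not(" + EdgeMask + ")";
    if (!Src->BlockMask.empty())              // "" block mask means all-true
      EdgeMask = "logical-and(" + Src->BlockMask + ", " + EdgeMask + ")";
  }
  EdgeMaskCache[{Src, Dst}] = EdgeMask;       // cf. setEdgeMask in the thread
  return EdgeMask;
}
```

In the actual patch the result is cached via the suggested setEdgeMask helper and the mask recipes are created through a VPBuilder at an insertion point, which is what the insertion-point discussion above is about.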

// load/store/gather/scatter. Initialize BlockMask to no-mask.
VPValue *BlockMask = nullptr;
// This is the block mask. We OR all unique incoming edges.
for (auto *Predecessor : SetVector<VPBlockBase *>(
Collaborator

to_vector?
Or simply traverse predecessors, as they are retained at this time?

Contributor Author

This is to de-duplicate predecessors, avoiding redundant masks. Could be cleaned up as a follow-up by improved VPlan folding.
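
Continuing the toy model from the previous sketch, the block-in mask is the OR of the edge masks of all unique incoming edges; the explicit de-duplication below plays the role of the SetVector in the excerpt, and an unpredicated incoming edge makes the whole block mask all-true. Again, the names are illustrative only.

```cpp
#include <algorithm>
#include <string>
#include <vector>

struct ToyBlock;                                          // from the sketch above
std::string createEdgeMask(ToyBlock *Src, ToyBlock *Dst); // from the sketch above

// Block mask = OR over the edge masks of all *unique* predecessors. An
// all-true incoming edge (modeled as "") makes the whole block mask all-true.
std::string createBlockInMask(ToyBlock *VPBB,
                              const std::vector<ToyBlock *> &Preds) {
  std::vector<ToyBlock *> Unique;
  for (ToyBlock *Pred : Preds)
    if (std::find(Unique.begin(), Unique.end(), Pred) == Unique.end())
      Unique.push_back(Pred);                 // de-duplicate predecessors

  std::string BlockMask;                      // "" means all-true so far
  for (ToyBlock *Pred : Unique) {
    std::string EdgeMask = createEdgeMask(Pred, VPBB);
    if (EdgeMask.empty())                     // unpredicated incoming edge
      return std::string();                   // block executes unconditionally
    BlockMask = BlockMask.empty()
                    ? EdgeMask
                    : "or(" + BlockMask + ", " + EdgeMask + ")";
  }
  return BlockMask;
}
```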

Comment on lines 287 to 288
// Flatten the CFG in the loop. Masks for blocks have already been
// generated and added to recipes as needed. To do so, first disconnect
Collaborator

Suggested change
// Flatten the CFG in the loop. Masks for blocks have already been
// generated and added to recipes as needed. To do so, first disconnect
// Flatten the CFG in the loop. To do so, first disconnect

Contributor Author

Done thanks
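
The flattening described in the quoted comment can be pictured with a small stand-alone CFG sketch: once the needed masks have been attached to recipes, each block is disconnected from its successors and the blocks are re-connected into a single straight-line chain. The node type and helpers below are stand-ins, not the VPlan CFG-update utilities.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Minimal stand-in CFG node, only to illustrate linearization.
struct Node {
  std::vector<Node *> Succs;
  std::vector<Node *> Preds;
};

void disconnect(Node *From, Node *To) {
  From->Succs.erase(std::remove(From->Succs.begin(), From->Succs.end(), To),
                    From->Succs.end());
  To->Preds.erase(std::remove(To->Preds.begin(), To->Preds.end(), From),
                  To->Preds.end());
}

void connect(Node *From, Node *To) {
  From->Succs.push_back(To);
  To->Preds.push_back(From);
}

// Flatten: drop all edges between the given blocks, then chain them in the
// given order (e.g. reverse post-order). Control dependences must already
// have been turned into data dependences on masks by predication.
void linearize(const std::vector<Node *> &Ordered) {
  for (Node *B : Ordered) {
    std::vector<Node *> Succs = B->Succs;     // copy: disconnect mutates Succs
    for (Node *Succ : Succs)
      disconnect(B, Succ);
  }
  for (std::size_t I = 0; I + 1 < Ordered.size(); ++I)
    connect(Ordered[I], Ordered[I + 1]);
}
```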

@@ -217,6 +215,16 @@ struct VPlanTransforms {
/// candidates.
static void narrowInterleaveGroups(VPlan &Plan, ElementCount VF,
unsigned VectorRegWidth);

/// Predicate and linearize the control-flow in the top-level loop region of
Collaborator

"top-level loop region" - if this top-level loop region contains nested loop regions, their control-flow also needs to be predicated and linearized.

Contributor Author

For now it must be a region without child regions. Tried to clarify the comment.

/// Predicate and linearize the control-flow in the top-level loop region of
/// \p Plan. If \p FoldTail is true, also create a mask guarding the loop
/// header, otherwise use all-true for the header mask. Masks for blocks are
/// added to \p BlockMaskCache, which in turn is temporarily used for wide
Collaborator

Suggested change
/// added to \p BlockMaskCache, which in turn is temporarily used for wide
/// added to \p BlockMaskCache, which in turn will temporarily be used later for wide

Contributor Author

Updated, thanks

Comment on lines +8757 to +8758
assert(LoopRegionOf && LoopRegionOf->getEntry() == Parent &&
"Non-header phis should have been handled during predication");
Collaborator

Follow-up: wonder if predication should introduce scalar blends, to be widened here.
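
For context on the blend question: per the assertion message above, non-header phis are handled during predication, i.e. their incoming values are blended under the corresponding masks, which lowers to a chain of selects. The sketch below only illustrates those semantics in a toy string model; it is not the recipe the patch creates.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Toy illustration of blend semantics: incoming (value, mask) pairs fold into
// a chain of selects, starting from the first incoming value.
std::string blendSemantics(
    const std::vector<std::pair<std::string, std::string>> &Incoming) {
  assert(!Incoming.empty() && "phi needs at least one incoming value");
  std::string Result = Incoming.front().first;
  for (std::size_t I = 1; I < Incoming.size(); ++I)
    Result = "select(" + Incoming[I].second + ", " + Incoming[I].first + ", " +
             Result + ")";
  return Result;
}
```

For example, blendSemantics({{"a", "m0"}, {"b", "m1"}}) yields "select(m1, b, a)".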

Comment on lines 9320 to 9321
RecipeBuilder.updateBlockMaskCache(SingleDef,
Recipe->getVPSingleValue());
Collaborator

One possible alternative to traversing all block masks for each such SingleDef update is to keep another Old2New mapping and look it up when traversing the block masks, updating them once (or do two lookups when querying the cache). Plus, avoid erasing Old from its parent below, and instead do so for all Olds in Old2New when no longer needed.

Contributor Author

Updated with an extra map, thanks
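
The alternative sketched in the comment above, resolving replacements lazily through a second map instead of rewriting every cached block mask eagerly, could look roughly like this; the stub types and member names are illustrative, not the code that actually landed.

```cpp
#include "llvm/ADT/DenseMap.h"

// Stub types, for illustration only.
struct StubBlock {};
struct StubMask {};

struct LazyBlockMaskCache {
  llvm::DenseMap<StubBlock *, StubMask *> BlockMaskCache; // block -> mask
  // Replacements recorded when a mask-defining VPInstruction is substituted by
  // a newly constructed recipe; entries are resolved on lookup instead of
  // rewriting the whole cache each time.
  llvm::DenseMap<StubMask *, StubMask *> Old2New;

  void recordReplacement(StubMask *Old, StubMask *New) { Old2New[Old] = New; }

  StubMask *getBlockInMask(StubBlock *B) {
    StubMask *Mask = BlockMaskCache.lookup(B);
    StubMask *Resolved = Mask;
    while (StubMask *New = Old2New.lookup(Resolved)) // follow replacement chain
      Resolved = New;
    if (Resolved != Mask)
      BlockMaskCache[B] = Resolved; // remember the up-to-date mask
    return Resolved;
  }
};
```

Deferring the erasure of replaced definitions until the Old2New entries are no longer needed, as also suggested above, would sit on top of this.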
