Skip to content

Navigation Menu

Sign in
Appearance settings

Search code, repositories, users, issues, pull requests...

Provide feedback

We read every piece of feedback, and take your input very seriously.

Saved searches

Use saved searches to filter your results more quickly

Appearance settings

[IR] Add CallBr intrinsics support #133907

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 13 commits into
base: main
Choose a base branch
Loading
from
Open

Conversation

ro-i
Copy link
Contributor

@ro-i ro-i commented Apr 1, 2025

This commit adds support for using intrinsics with callbr. The uses of this will most of the time look like this example:

  callbr void @llvm.amdgcn.kill(i1 %c) to label %cont [label %kill]
kill:
  unreachable
cont:
  ...

@arsenm

@ro-i ro-i requested a review from arsenm April 1, 2025 13:07
@arsenm arsenm added the llvm:ir label Apr 2, 2025
@llvmbot
Copy link
Member

llvmbot commented Apr 2, 2025

@llvm/pr-subscribers-llvm-ir

Author: Robert Imschweiler (ro-i)

Changes

This commit adds support for using intrinsics with callbr. The uses of this will most of the time look like this example:

  callbr void @<!-- -->llvm.amdgcn.kill(i1 %c) to label %cont [label %kill]
kill:
  unreachable
cont:
  ...

@arsenm


Full diff: https://github.com/llvm/llvm-project/pull/133907.diff

8 Files Affected:

  • (modified) llvm/include/llvm/Analysis/RegionInfoImpl.h (+12)
  • (modified) llvm/include/llvm/Transforms/Utils/BasicBlockUtils.h (+3-3)
  • (modified) llvm/lib/CodeGen/GlobalISel/IRTranslator.cpp (+58-2)
  • (modified) llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp (+35-11)
  • (modified) llvm/lib/IR/Verifier.cpp (+24-5)
  • (modified) llvm/lib/Transforms/Scalar/StructurizeCFG.cpp (+5-6)
  • (modified) llvm/lib/Transforms/Utils/BasicBlockUtils.cpp (+2-2)
  • (added) llvm/test/CodeGen/AMDGPU/callbr.ll (+70)
diff --git a/llvm/include/llvm/Analysis/RegionInfoImpl.h b/llvm/include/llvm/Analysis/RegionInfoImpl.h
index eb99d8bc6fb23..759e9c47bebb8 100644
--- a/llvm/include/llvm/Analysis/RegionInfoImpl.h
+++ b/llvm/include/llvm/Analysis/RegionInfoImpl.h
@@ -553,6 +553,18 @@ bool RegionInfoBase<Tr>::isRegion(BlockT *entry, BlockT *exit) const {
 
   using DST = typename DomFrontierT::DomSetType;
 
+  // TODO? post domination frontier?
+  if constexpr (std::is_same_v<BlockT, BasicBlock>) {
+    if (DomTreeNodeT *PDTNode = PDT->getNode(exit); PDTNode) {
+      for (DomTreeNodeT *PredNode : *PDTNode) {
+        for (BasicBlock *Pred : predecessors(PredNode->getBlock())) {
+          if (isa<CallBrInst>(Pred->getTerminator()))
+            return false;
+        }
+      }
+    }
+  }
+
   DST *entrySuccs = &DF->find(entry)->second;
 
   // Exit is the header of a loop that contains the entry. In this case,
diff --git a/llvm/include/llvm/Transforms/Utils/BasicBlockUtils.h b/llvm/include/llvm/Transforms/Utils/BasicBlockUtils.h
index 6faff3d1fd8e3..59143d235eb93 100644
--- a/llvm/include/llvm/Transforms/Utils/BasicBlockUtils.h
+++ b/llvm/include/llvm/Transforms/Utils/BasicBlockUtils.h
@@ -606,9 +606,9 @@ bool SplitIndirectBrCriticalEdges(Function &F, bool IgnoreBlocksWithoutPHI,
 // successors
 void InvertBranch(BranchInst *PBI, IRBuilderBase &Builder);
 
-// Check whether the function only has simple terminator:
-// br/brcond/unreachable/ret
-bool hasOnlySimpleTerminator(const Function &F);
+// Check whether the function only has blocks with simple terminators:
+// br/brcond/unreachable/ret (or callbr if AllowCallBr)
+bool hasOnlySimpleTerminator(const Function &F, bool AllowCallBr = true);
 
 // Returns true if these basic blocks belong to a presplit coroutine and the
 // edge corresponds to the 'default' case in the switch statement in the
diff --git a/llvm/lib/CodeGen/GlobalISel/IRTranslator.cpp b/llvm/lib/CodeGen/GlobalISel/IRTranslator.cpp
index f8afb42bf5535..0f698375ad6cf 100644
--- a/llvm/lib/CodeGen/GlobalISel/IRTranslator.cpp
+++ b/llvm/lib/CodeGen/GlobalISel/IRTranslator.cpp
@@ -3009,8 +3009,64 @@ bool IRTranslator::translateInvoke(const User &U,
 
 bool IRTranslator::translateCallBr(const User &U,
                                    MachineIRBuilder &MIRBuilder) {
-  // FIXME: Implement this.
-  return false;
+  const CallBrInst &I = cast<CallBrInst>(U);
+  MachineBasicBlock *CallBrMBB = &MIRBuilder.getMBB();
+
+  // TODO: operand bundles (see SelDAG implementation of callbr)?
+  assert(!I.hasOperandBundles() &&
+         "Cannot lower callbrs with operand bundles yet");
+
+  if (I.isInlineAsm()) {
+    // FIXME: inline asm not yet supported
+    if (!translateInlineAsm(I, MIRBuilder))
+      return false;
+  } else if (I.getIntrinsicID() != Intrinsic::not_intrinsic) {
+    switch (I.getIntrinsicID()) {
+    default:
+      report_fatal_error("Unsupported intrinsic for callbr");
+    case Intrinsic::amdgcn_kill:
+      if (I.getNumIndirectDests() != 1)
+        report_fatal_error(
+            "amdgcn.kill supportes exactly one indirect destination");
+      CallInst *CI =
+          CallInst::Create(I.getFunctionType(), I.getCalledFunction(),
+                           SmallVector<Value *, 1>(I.args()));
+      bool Success = translateCall(*CI, MIRBuilder);
+      CI->deleteValue();
+      if (!Success)
+        return false;
+      break;
+    }
+  } else {
+    report_fatal_error("Only know how to handle inlineasm/intrinsic callbr");
+  }
+
+  // Retrieve successors.
+  SmallPtrSet<BasicBlock *, 8> Dests;
+  Dests.insert(I.getDefaultDest());
+  MachineBasicBlock *Return = &getMBB(*I.getDefaultDest());
+
+  // Update successor info.
+  addSuccessorWithProb(CallBrMBB, Return, BranchProbability::getOne());
+  // TODO: For most of the cases where there is an intrinsic callbr, we're
+  // having exactly one indirect target, which will be unreachable. As soon as
+  // this changes, we might need to enhance
+  // Target->setIsInlineAsmBrIndirectTarget or add something similar for
+  // intrinsic indirect branches.
+  if (I.isInlineAsm()) {
+    for (BasicBlock *Dest : I.getIndirectDests()) {
+      MachineBasicBlock *Target = &getMBB(*Dest);
+      Target->setIsInlineAsmBrIndirectTarget();
+      Target->setMachineBlockAddressTaken();
+      Target->setLabelMustBeEmitted();
+      // Don't add duplicate machine successors.
+      if (Dests.insert(Dest).second)
+        addSuccessorWithProb(CallBrMBB, Target, BranchProbability::getZero());
+    }
+  }
+  CallBrMBB->normalizeSuccProbs();
+
+  return true;
 }
 
 bool IRTranslator::translateLandingPad(const User &U,
diff --git a/llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp b/llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp
index 6db2a5ffbfb84..c9501128cd593 100644
--- a/llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp
+++ b/llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp
@@ -3385,8 +3385,26 @@ void SelectionDAGBuilder::visitCallBr(const CallBrInst &I) {
              {LLVMContext::OB_deopt, LLVMContext::OB_funclet}) &&
          "Cannot lower callbrs with arbitrary operand bundles yet!");
 
-  assert(I.isInlineAsm() && "Only know how to handle inlineasm callbr");
-  visitInlineAsm(I);
+  if (I.isInlineAsm()) {
+    visitInlineAsm(I);
+  } else if (I.getIntrinsicID() != Intrinsic::not_intrinsic) {
+    switch (I.getIntrinsicID()) {
+    default:
+      report_fatal_error("Unsupported intrinsic for callbr");
+    case Intrinsic::amdgcn_kill:
+      if (I.getNumIndirectDests() != 1)
+        report_fatal_error(
+            "amdgcn.kill supportes exactly one indirect destination");
+      CallInst *CI =
+          CallInst::Create(I.getFunctionType(), I.getCalledFunction(),
+                           SmallVector<Value *, 1>(I.args()));
+      visitCall(*CI);
+      CI->deleteValue();
+      break;
+    }
+  } else {
+    report_fatal_error("Only know how to handle inlineasm/intrinsic callbr");
+  }
   CopyToExportRegsIfNeeded(&I);
 
   // Retrieve successors.
@@ -3396,15 +3414,21 @@ void SelectionDAGBuilder::visitCallBr(const CallBrInst &I) {
 
   // Update successor info.
   addSuccessorWithProb(CallBrMBB, Return, BranchProbability::getOne());
-  for (unsigned i = 0, e = I.getNumIndirectDests(); i < e; ++i) {
-    BasicBlock *Dest = I.getIndirectDest(i);
-    MachineBasicBlock *Target = FuncInfo.getMBB(Dest);
-    Target->setIsInlineAsmBrIndirectTarget();
-    Target->setMachineBlockAddressTaken();
-    Target->setLabelMustBeEmitted();
-    // Don't add duplicate machine successors.
-    if (Dests.insert(Dest).second)
-      addSuccessorWithProb(CallBrMBB, Target, BranchProbability::getZero());
+  // TODO: For most of the cases where there is an intrinsic callbr, we're
+  // having exactly one indirect target, which will be unreachable. As soon as
+  // this changes, we might need to enhance
+  // Target->setIsInlineAsmBrIndirectTarget or add something similar for
+  // intrinsic indirect branches.
+  if (I.isInlineAsm()) {
+    for (BasicBlock *Dest : I.getIndirectDests()) {
+      MachineBasicBlock *Target = FuncInfo.getMBB(Dest);
+      Target->setIsInlineAsmBrIndirectTarget();
+      Target->setMachineBlockAddressTaken();
+      Target->setLabelMustBeEmitted();
+      // Don't add duplicate machine successors.
+      if (Dests.insert(Dest).second)
+        addSuccessorWithProb(CallBrMBB, Target, BranchProbability::getZero());
+    }
   }
   CallBrMBB->normalizeSuccProbs();
 
diff --git a/llvm/lib/IR/Verifier.cpp b/llvm/lib/IR/Verifier.cpp
index ed86a10c3a25f..fbf6e087177c6 100644
--- a/llvm/lib/IR/Verifier.cpp
+++ b/llvm/lib/IR/Verifier.cpp
@@ -3249,11 +3249,30 @@ void Verifier::visitIndirectBrInst(IndirectBrInst &BI) {
 }
 
 void Verifier::visitCallBrInst(CallBrInst &CBI) {
-  Check(CBI.isInlineAsm(), "Callbr is currently only used for asm-goto!", &CBI);
-  const InlineAsm *IA = cast<InlineAsm>(CBI.getCalledOperand());
-  Check(!IA->canThrow(), "Unwinding from Callbr is not allowed");
+  if (!CBI.isInlineAsm()) {
+    switch (CBI.getIntrinsicID()) {
+    case Intrinsic::amdgcn_kill: {
+      Check(CBI.getNumIndirectDests() == 1,
+            "Callbr amdgcn_kill only supports one indirect dest");
+      bool Unreachable = isa<UnreachableInst>(CBI.getIndirectDest(0)->begin());
+      CallInst *Call = dyn_cast<CallInst>(CBI.getIndirectDest(0)->begin());
+      Check(Unreachable || (Call && Call->getIntrinsicID() ==
+                                        Intrinsic::amdgcn_unreachable),
+            "Callbr amdgcn_kill indirect dest needs to be unreachable");
+      visitIntrinsicCall(Intrinsic::amdgcn_kill, CBI);
+      break;
+    }
+    default:
+      CheckFailed(
+          "Callbr currently only supports asm-goto and selected intrinsics");
+    }
+    visitIntrinsicCall(CBI.getIntrinsicID(), CBI);
+  } else {
+    const InlineAsm *IA = cast<InlineAsm>(CBI.getCalledOperand());
+    Check(!IA->canThrow(), "Unwinding from Callbr is not allowed");
 
-  verifyInlineAsmCall(CBI);
+    verifyInlineAsmCall(CBI);
+  }
   visitTerminator(CBI);
 }
 
@@ -5211,7 +5230,7 @@ void Verifier::visitInstruction(Instruction &I) {
              (CBI && &CBI->getCalledOperandUse() == &I.getOperandUse(i)) ||
              IsAttachedCallOperand(F, CBI, i)),
             "Cannot take the address of an intrinsic!", &I);
-      Check(!F->isIntrinsic() || isa<CallInst>(I) ||
+      Check(!F->isIntrinsic() || isa<CallInst>(I) || isa<CallBrInst>(I) ||
                 F->getIntrinsicID() == Intrinsic::donothing ||
                 F->getIntrinsicID() == Intrinsic::seh_try_begin ||
                 F->getIntrinsicID() == Intrinsic::seh_try_end ||
diff --git a/llvm/lib/Transforms/Scalar/StructurizeCFG.cpp b/llvm/lib/Transforms/Scalar/StructurizeCFG.cpp
index d1054b9b045ca..bdd8b5fbb3212 100644
--- a/llvm/lib/Transforms/Scalar/StructurizeCFG.cpp
+++ b/llvm/lib/Transforms/Scalar/StructurizeCFG.cpp
@@ -486,11 +486,10 @@ void StructurizeCFG::analyzeLoops(RegionNode *N) {
   } else {
     // Test for successors as back edge
     BasicBlock *BB = N->getNodeAs<BasicBlock>();
-    BranchInst *Term = cast<BranchInst>(BB->getTerminator());
-
-    for (BasicBlock *Succ : Term->successors())
-      if (Visited.count(Succ))
-        Loops[Succ] = BB;
+    if (BranchInst *Term = dyn_cast<BranchInst>(BB->getTerminator()); Term)
+      for (BasicBlock *Succ : Term->successors())
+        if (Visited.count(Succ))
+          Loops[Succ] = BB;
   }
 }
 
@@ -522,7 +521,7 @@ void StructurizeCFG::gatherPredicates(RegionNode *N) {
 
   for (BasicBlock *P : predecessors(BB)) {
     // Ignore it if it's a branch from outside into our region entry
-    if (!ParentRegion->contains(P))
+    if (!ParentRegion->contains(P) || !dyn_cast<BranchInst>(P->getTerminator()))
       continue;
 
     Region *R = RI->getRegionFor(P);
diff --git a/llvm/lib/Transforms/Utils/BasicBlockUtils.cpp b/llvm/lib/Transforms/Utils/BasicBlockUtils.cpp
index ce5bf0c7207c7..3090f65fac627 100644
--- a/llvm/lib/Transforms/Utils/BasicBlockUtils.cpp
+++ b/llvm/lib/Transforms/Utils/BasicBlockUtils.cpp
@@ -1907,11 +1907,11 @@ void llvm::InvertBranch(BranchInst *PBI, IRBuilderBase &Builder) {
   PBI->swapSuccessors();
 }
 
-bool llvm::hasOnlySimpleTerminator(const Function &F) {
+bool llvm::hasOnlySimpleTerminator(const Function &F, bool AllowCallBr) {
   for (auto &BB : F) {
     auto *Term = BB.getTerminator();
     if (!(isa<ReturnInst>(Term) || isa<UnreachableInst>(Term) ||
-          isa<BranchInst>(Term)))
+          isa<BranchInst>(Term) || (AllowCallBr && isa<CallBrInst>(Term))))
       return false;
   }
   return true;
diff --git a/llvm/test/CodeGen/AMDGPU/callbr.ll b/llvm/test/CodeGen/AMDGPU/callbr.ll
new file mode 100644
index 0000000000000..e2e84dca96cbf
--- /dev/null
+++ b/llvm/test/CodeGen/AMDGPU/callbr.ll
@@ -0,0 +1,70 @@
+; RUN: rm -rf %t && split-file %s %t
+; RUN: llc -mtriple=amdgcn -mcpu=gfx90a -o %t/with-callbr-seldag.s < %t/with-callbr.ll
+; RUN: FileCheck --check-prefix=SELDAG %s < %t/with-callbr-seldag.s
+; RUN: llc -mtriple=amdgcn -mcpu=gfx90a -o %t/with-callbr-gisel.s -global-isel < %t/with-callbr.ll
+; RUN: FileCheck --check-prefix=GISEL %s < %t/with-callbr-gisel.s
+; RUN: llc -mtriple=amdgcn -mcpu=gfx90a -o %t/without-callbr-seldag.s < %t/without-callbr.ll
+; RUN: llc -mtriple=amdgcn -mcpu=gfx90a -o %t/without-callbr-gisel.s -global-isel < %t/without-callbr.ll
+; RUN: diff %t/with-callbr-seldag.s %t/without-callbr-seldag.s
+; RUN: diff %t/with-callbr-gisel.s %t/without-callbr-gisel.s
+
+;--- with-callbr.ll
+
+; SELDAG-LABEL: test_kill:
+; SELDAG-NEXT:  ; %bb.0:
+; SELDAG-NEXT:      s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; SELDAG-NEXT:      flat_load_dword v0, v[0:1]
+; SELDAG-NEXT:      v_and_b32_e32 v1, 1, v4
+; SELDAG-NEXT:      v_cmp_eq_u32_e32 vcc, 1, v1
+; SELDAG-NEXT:      s_mov_b64 s[4:5], exec
+; SELDAG-NEXT:      s_andn2_b64 s[6:7], exec, vcc
+; SELDAG-NEXT:      s_andn2_b64 s[4:5], s[4:5], s[6:7]
+; SELDAG-NEXT:      s_cbranch_scc0 .LBB0_2
+; SELDAG-NEXT:  ; %bb.1:
+; SELDAG-NEXT:      s_and_b64 exec, exec, s[4:5]
+; SELDAG-NEXT:      s_waitcnt vmcnt(0) lgkmcnt(0)
+; SELDAG-NEXT:      flat_store_dword v[2:3], v0
+; SELDAG-NEXT:      s_waitcnt vmcnt(0) lgkmcnt(0)
+; SELDAG-NEXT:      s_setpc_b64 s[30:31]
+; SELDAG-NEXT:  .LBB0_2:
+; SELDAG-NEXT:      s_mov_b64 exec, 0
+; SELDAG-NEXT:      s_endpgm
+
+; GISEL-LABEL: test_kill:
+; GISEL-NEXT:  ; %bb.0:
+; GISEL-NEXT:      s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GISEL-NEXT:      flat_load_dword v0, v[0:1]
+; GISEL-NEXT:      v_and_b32_e32 v1, 1, v4
+; GISEL-NEXT:      v_cmp_ne_u32_e32 vcc, 0, v1
+; GISEL-NEXT:      s_mov_b64 s[4:5], exec
+; GISEL-NEXT:      s_andn2_b64 s[6:7], exec, vcc
+; GISEL-NEXT:      s_andn2_b64 s[4:5], s[4:5], s[6:7]
+; GISEL-NEXT:      s_cbranch_scc0 .LBB0_2
+; GISEL-NEXT:  ; %bb.1:
+; GISEL-NEXT:      s_and_b64 exec, exec, s[4:5]
+; GISEL-NEXT:      s_waitcnt vmcnt(0) lgkmcnt(0)
+; GISEL-NEXT:      flat_store_dword v[2:3], v0
+; GISEL-NEXT:      s_waitcnt vmcnt(0) lgkmcnt(0)
+; GISEL-NEXT:      s_setpc_b64 s[30:31]
+; GISEL-NEXT:  .LBB0_2:
+; GISEL-NEXT:      s_mov_b64 exec, 0
+; GISEL-NEXT:      s_endpgm
+
+define void @test_kill(ptr %src, ptr %dst, i1 %c) {
+  %a = load i32, ptr %src, align 4
+  callbr void @llvm.amdgcn.kill(i1 %c) to label %cont [label %kill]
+kill:
+  unreachable
+cont:
+  store i32 %a, ptr %dst, align 4
+  ret void
+}
+
+;--- without-callbr.ll
+
+define void @test_kill(ptr %src, ptr %dst, i1 %c) {
+  %a = load i32, ptr %src, align 4
+  call void @llvm.amdgcn.kill(i1 %c)
+  store i32 %a, ptr %dst, align 4
+  ret void
+}

llvm/lib/CodeGen/GlobalISel/IRTranslator.cpp Outdated Show resolved Hide resolved
llvm/lib/CodeGen/GlobalISel/IRTranslator.cpp Outdated Show resolved Hide resolved
llvm/lib/CodeGen/GlobalISel/IRTranslator.cpp Outdated Show resolved Hide resolved
llvm/lib/CodeGen/GlobalISel/IRTranslator.cpp Show resolved Hide resolved
llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp Outdated Show resolved Hide resolved
llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp Outdated Show resolved Hide resolved
llvm/test/CodeGen/AMDGPU/callbr.ll Outdated Show resolved Hide resolved
llvm/lib/IR/Verifier.cpp Show resolved Hide resolved
llvm/include/llvm/CodeGen/GlobalISel/IRTranslator.h Outdated Show resolved Hide resolved
llvm/include/llvm/IR/IntrinsicInst.h Outdated Show resolved Hide resolved
Copy link
Contributor

@nikic nikic left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(I'm assuming you'll still get to that as this is a draft, but this needs LangRef changes and a small RFC on discourse.)

@ro-i
Copy link
Contributor Author

ro-i commented Apr 2, 2025

as background info: the idea for this PR originated in #128162 (see #128162 (comment))

@nickdesaulniers
Copy link
Member

Re the example:

  callbr void @llvm.amdgcn.kill(i1 %c) to label %cont [label %kill]
kill:
  unreachable
cont:
  ...

the to label is supposed to be the "fallthrough case" where the instruction doesn't take a branch to anything in the indirect label list [label <foo>]. Should the example instead be:

  callbr void @llvm.amdgcn.kill(i1 %c) to label %cont [label %kill]
cont:
  ...
kill:
  unreachable

I guess are those the same? idk, example just looked funny.

@ro-i
Copy link
Contributor Author

ro-i commented Apr 9, 2025

@nickdesaulniers yes, it's the same. (I added a test to verify it.)

llvm/lib/CodeGen/GlobalISel/IRTranslator.cpp Outdated Show resolved Hide resolved
llvm/lib/CodeGen/GlobalISel/IRTranslator.cpp Outdated Show resolved Hide resolved
llvm/lib/CodeGen/GlobalISel/IRTranslator.cpp Outdated Show resolved Hide resolved
Comment on lines 3045 to 3048
case Intrinsic::amdgcn_kill:
if (!translateTargetIntrinsic(I, Intrinsic::amdgcn_kill, MIRBuilder))
return false;
break;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should push the target intrinsic handling into TargetLowering, shouldn't need to refer to specific intrinsics here

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok, removed the switch over the intrinsics

@ro-i
Copy link
Contributor Author

ro-i commented Apr 29, 2025

@ro-i ro-i force-pushed the callbr-intrinsics branch from d58d925 to d16bbbd Compare May 12, 2025 08:58
@ro-i ro-i marked this pull request as ready for review May 12, 2025 08:58
@ro-i
Copy link
Contributor Author

ro-i commented May 12, 2025

Since the RFC seems to be fine, I rebased, resolved conflicts, and added the LangRef update.
This PR thus is converted from a draft to a regular PR

@ro-i
Copy link
Contributor Author

ro-i commented May 12, 2025

Ah, I think I shouldn't start to (partly) support inline-asm for callbr in GlobalISel with this patch. This would need to be a separate PR

@ro-i
Copy link
Contributor Author

ro-i commented May 12, 2025

@Meinersbur you added polly/test/ScopDetect/callbr.ll a few years ago. As far as I understand, this currently conflicts with my interpretation of how callbr should be handled during region detection (see failing test). I think that the region of the callbr instruction should include every following block that does not postdominate the block that has callbr as its terminator.
Or, simpler said, a region is only a region if the exit block postdominates every preceding callbr - that's what I implemented in llvm/include/llvm/Analysis/RegionInfoImpl.h.
How do you think about that?

@Meinersbur
Copy link
Member

Meinersbur commented May 12, 2025

That test case was added only to avoid Polly from crashing for some input. 91f46bb reads:

SplitBlockPredecessors is unable to insert an additional BasicBlock between an indirectbr/callbr terminator and the successor blocks. This is needed by Polly to normalize the control flow before emitting its optimzed code.
This patches rejects regions entered by an indirectbr/callbr to not fail later at code generation.

I don't understand why you change the region detection code, but I think it just it has the same goal: Don't use regions involving CallBr. With this PR RegionInfo doesn't even return a region, so the check for CallBr becomes unused. You can just change the test to accept the new output. Since it seems there is no output at all (no region that would trigger output), use something like:

; RUN: opt %loadNPMPolly '-passes=print<polly-detect>' -disable-output < %s 2>&1 | FileCheck %s
; CHECK-LABEL: func
; CHECK-NOT: Valid

@ro-i
Copy link
Contributor Author

ro-i commented May 12, 2025

Alright, thanks! The region detection change was due to StructurizeCFG ripping callbr apart, e.g.

*** IR Dump After Fixup each natural loop to have a single exit block (unify-loop-exits) ***
define void @foo(i32 %x) {
  callbr void asm "", "r,!i"(i32 %x)
          to label %UnifiedReturnBlock [label %indirect]

indirect:                                         ; preds = %0
  call void @llvm.amdgcn.unreachable()
  br label %UnifiedReturnBlock

UnifiedReturnBlock:                               ; preds = %0, %indirect
  ret void
}
*** IR Dump After Structurize control flow (structurizecfg) ***
  br i1 poison, label %indirect, label %UnifiedReturnBlock

indirect:                                         ; preds = %0
  call void @llvm.amdgcn.unreachable()
  br label %UnifiedReturnBlock
*** IR Dump After Structurize control flow (structurizecfg) ***
  br i1 poison, label %indirect, label %UnifiedReturnBlock

indirect:                                         ; preds = %0
  call void @llvm.amdgcn.unreachable()
  br label %UnifiedReturnBlock

UnifiedReturnBlock:                               ; preds = %indirect, %0
  ret void

@Meinersbur
Copy link
Member

Meinersbur commented May 12, 2025

Alright, thanks! The region detection change was due to StructurizeCFG ripping callbr apart, e.g.

As did Polly. Same applies to indirectbr btw.

In my undestanding, RegionInfo is an analysis and whether a Region is Single-Entry-Single-Exit does not depend on which branch instructions jump into the region's entry block, but it would be the responsibility of the transformation pass to not split the callbr/indirectbr from its basic blocks (as Polly did) or make the callbr/indirectbr reference the new BB instead. I guess this is to be discussed in this PR.

@ro-i ro-i requested a review from arsenm May 12, 2025 22:47
ro-i added 7 commits May 26, 2025 04:07
This commit adds support for using intrinsics with callbr.
The uses of this will most of the time look like this example:
```llvm
  callbr void @llvm.amdgcn.kill(i1 %c) to label %cont [label %kill]
kill:
  unreachable
cont:
  ...
```
This reverts commit db57ce2.
@ro-i ro-i force-pushed the callbr-intrinsics branch from e37cf05 to 4670b7a Compare May 26, 2025 12:53
@ro-i
Copy link
Contributor Author

ro-i commented May 26, 2025

Rebased and resolved merge conflicts

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants
Morty Proxy This is a proxified and sanitized view of the page, visit original site.