AMDGPU/GlobalISel: add RegBankLegalize rules for bitfield extract #132381


Merged
1 commit merged into main from users/petar-avramovic/bfe on May 26, 2025

Conversation

petar-avramovic (Collaborator)

The divergent S32 instruction is available; S64 needs to be lowered to S32. Uniform instructions are available for both S32 and S64, but the bitfield offset and size have to be packed into a single S32 operand. The uniform instruction is selected directly, since there is no isel pattern available.
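
For illustration (not part of the patch): per the Uni_BFE lowering in the diff below, the packed S_BFE source-1 operand carries the field offset in its low 6 bits and the field width starting at bit 16. A minimal sketch, with a hypothetical helper name of our choosing:

```cpp
#include <cstdint>

// Pack (offset, width) the way the Uni_BFE lowering does: the offset is
// masked to 6 bits and kept in the low half, the width starts at bit 16.
constexpr uint32_t packBFEOperand(uint32_t Offset, uint32_t Width) {
  return (Width << 16) | (Offset & 0x3f);
}

// For offset 1, width 10 this yields 655361, the exact G_CONSTANT that
// shows up in the updated regbankselect-sbfx.mir checks.
static_assert(packBFEOperand(1, 10) == 655361);
```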

llvmbot (Member) commented Mar 21, 2025

@llvm/pr-subscribers-llvm-globalisel

@llvm/pr-subscribers-backend-amdgpu

Author: Petar Avramovic (petar-avramovic)

Changes

The divergent S32 instruction is available; S64 needs to be lowered to S32. Uniform instructions are available for both S32 and S64, but the bitfield offset and size have to be packed into a single S32 operand. The uniform instruction is selected directly, since there is no isel pattern available.


Patch is 28.10 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/132381.diff

7 Files Affected:

  • (modified) llvm/lib/Target/AMDGPU/AMDGPURegBankLegalizeHelper.cpp (+100)
  • (modified) llvm/lib/Target/AMDGPU/AMDGPURegBankLegalizeRules.cpp (+12-1)
  • (modified) llvm/lib/Target/AMDGPU/AMDGPURegBankLegalizeRules.h (+2)
  • (modified) llvm/test/CodeGen/AMDGPU/GlobalISel/regbankselect-sbfx.mir (+36-30)
  • (modified) llvm/test/CodeGen/AMDGPU/GlobalISel/regbankselect-ubfx.mir (+37-31)
  • (modified) llvm/test/CodeGen/AMDGPU/GlobalISel/sbfx.ll (+3-3)
  • (modified) llvm/test/CodeGen/AMDGPU/GlobalISel/ubfx.ll (+3-3)
diff --git a/llvm/lib/Target/AMDGPU/AMDGPURegBankLegalizeHelper.cpp b/llvm/lib/Target/AMDGPU/AMDGPURegBankLegalizeHelper.cpp
index 3383175fc1bdb..e4eaa01951a7f 100644
--- a/llvm/lib/Target/AMDGPU/AMDGPURegBankLegalizeHelper.cpp
+++ b/llvm/lib/Target/AMDGPU/AMDGPURegBankLegalizeHelper.cpp
@@ -15,9 +15,12 @@
 #include "AMDGPUGlobalISelUtils.h"
 #include "AMDGPUInstrInfo.h"
 #include "AMDGPURegisterBankInfo.h"
+#include "GCNSubtarget.h"
 #include "MCTargetDesc/AMDGPUMCTargetDesc.h"
+#include "llvm/CodeGen/GlobalISel/GenericMachineInstrs.h"
 #include "llvm/CodeGen/GlobalISel/MachineIRBuilder.h"
 #include "llvm/CodeGen/MachineUniformityAnalysis.h"
+#include "llvm/IR/IntrinsicsAMDGPU.h"
 
 #define DEBUG_TYPE "amdgpu-regbanklegalize"
 
@@ -225,6 +228,103 @@ void RegBankLegalizeHelper::lower(MachineInstr &MI,
     MI.eraseFromParent();
     break;
   }
+  case Div_BFE: {
+    Register Dst = MI.getOperand(0).getReg();
+    assert(MRI.getType(Dst) == LLT::scalar(64));
+    bool Signed = isa<GIntrinsic>(MI) ? MI.getOperand(1).getIntrinsicID() ==
+                                            Intrinsic::amdgcn_sbfe
+                                      : MI.getOpcode() == AMDGPU::G_SBFX;
+    unsigned FirstOpnd = isa<GIntrinsic>(MI) ? 2 : 1;
+    // Extract bitfield from Src, LSBit is the least-significant bit for the
+    // extraction (field offset) and Width is size of bitfield.
+    Register Src = MI.getOperand(FirstOpnd).getReg();
+    Register LSBit = MI.getOperand(FirstOpnd + 1).getReg();
+    Register Width = MI.getOperand(FirstOpnd + 2).getReg();
+    // Comments are for signed bitfield extract, similar for unsigned. x is sign
+    // bit. s is sign, l is LSB and y are remaining bits of bitfield to extract.
+
+    // Src >> LSBit Hi|Lo: x?????syyyyyyl??? -> xxxx?????syyyyyyl
+    unsigned SHROpc = Signed ? AMDGPU::G_ASHR : AMDGPU::G_LSHR;
+    auto SHRSrc = B.buildInstr(SHROpc, {{VgprRB, S64}}, {Src, LSBit});
+
+    auto ConstWidth = getIConstantVRegValWithLookThrough(Width, MRI);
+
+    // Expand to Src >> LSBit << (64 - Width) >> (64 - Width)
+    // << (64 - Width): Hi|Lo: xxxx?????syyyyyyl -> syyyyyyl000000000
+    // >> (64 - Width): Hi|Lo: syyyyyyl000000000 -> ssssssssssyyyyyyl
+    if (!ConstWidth) {
+      auto Amt = B.buildSub(VgprRB_S32, B.buildConstant(SgprRB_S32, 64), Width);
+      auto SignBit = B.buildShl({VgprRB, S64}, SHRSrc, Amt);
+      B.buildInstr(SHROpc, {Dst}, {SignBit, Amt});
+      MI.eraseFromParent();
+      return;
+    }
+
+    auto WidthImm = ConstWidth->Value.getZExtValue();
+    auto UnmergeSHRSrc = B.buildUnmerge(VgprRB_S32, SHRSrc);
+    Register SHRSrcLo = UnmergeSHRSrc.getReg(0);
+    Register SHRSrcHi = UnmergeSHRSrc.getReg(1);
+    auto Zero = B.buildConstant({VgprRB, S32}, 0);
+    unsigned BFXOpc = Signed ? AMDGPU::G_SBFX : AMDGPU::G_UBFX;
+
+    if (WidthImm <= 32) {
+      // SHRSrc Hi|Lo: ????????|???syyyl -> ????????|ssssyyyl
+      auto Lo = B.buildInstr(BFXOpc, {VgprRB_S32}, {SHRSrcLo, Zero, Width});
+      MachineInstrBuilder Hi;
+      if (Signed) {
+        // SHRSrc Hi|Lo: ????????|ssssyyyl -> ssssssss|ssssyyyl
+        Hi = B.buildAShr(VgprRB_S32, Lo, B.buildConstant(VgprRB_S32, 31));
+      } else {
+        // SHRSrc Hi|Lo: ????????|000syyyl -> 00000000|000syyyl
+        Hi = Zero;
+      }
+      B.buildMergeLikeInstr(Dst, {Lo, Hi});
+    } else {
+      auto Amt = B.buildConstant(VgprRB_S32, WidthImm - 32);
+      // SHRSrc Hi|Lo: ??????sy|yyyyyyyl -> sssssssy|yyyyyyyl
+      auto Hi = B.buildInstr(BFXOpc, {VgprRB_S32}, {SHRSrcHi, Zero, Amt});
+      B.buildMergeLikeInstr(Dst, {SHRSrcLo, Hi});
+    }
+
+    MI.eraseFromParent();
+    return;
+  }
+  case Uni_BFE: {
+    Register DstReg = MI.getOperand(0).getReg();
+    LLT Ty = MRI.getType(DstReg);
+    bool Signed = isa<GIntrinsic>(MI) ? MI.getOperand(1).getIntrinsicID() ==
+                                            Intrinsic::amdgcn_sbfe
+                                      : MI.getOpcode() == AMDGPU::G_SBFX;
+    unsigned FirstOpnd = isa<GIntrinsic>(MI) ? 2 : 1;
+    Register Src = MI.getOperand(FirstOpnd).getReg();
+    Register LSBit = MI.getOperand(FirstOpnd + 1).getReg();
+    Register Width = MI.getOperand(FirstOpnd + 2).getReg();
+    // For uniform bit field extract there are 4 available instructions, but
+    // LSBit(field offset) and Width(size of bitfield) need to be packed in S32,
+    // field offset in low and size in high 16 bits.
+
+    // Src1 Hi16|Lo16 = Size|FieldOffset
+    auto Mask = B.buildConstant(SgprRB_S32, maskTrailingOnes<unsigned>(6));
+    auto FieldOffset = B.buildAnd(SgprRB_S32, LSBit, Mask);
+    auto Size = B.buildShl(SgprRB_S32, Width, B.buildConstant(SgprRB_S32, 16));
+    auto Src1 = B.buildOr(SgprRB_S32, FieldOffset, Size);
+    unsigned Opc32 = Signed ? AMDGPU::S_BFE_I32 : AMDGPU::S_BFE_U32;
+    unsigned Opc64 = Signed ? AMDGPU::S_BFE_I64 : AMDGPU::S_BFE_U64;
+    unsigned Opc = Ty == S32 ? Opc32 : Opc64;
+
+    // Select machine instruction, because of reg class constraining, insert
+    // copies from reg class to reg bank.
+    auto S_BFE = B.buildInstr(Opc, {{SgprRB, Ty}},
+                              {B.buildCopy(Ty, Src), B.buildCopy(S32, Src1)});
+    const GCNSubtarget &ST = B.getMF().getSubtarget<GCNSubtarget>();
+    if (!constrainSelectedInstRegOperands(*S_BFE, *ST.getInstrInfo(),
+                                          *ST.getRegisterInfo(), RBI))
+      llvm_unreachable("failed to constrain BFE");
+
+    B.buildCopy(DstReg, S_BFE->getOperand(0).getReg());
+    MI.eraseFromParent();
+    return;
+  }
   case SplitLoad: {
     LLT DstTy = MRI.getType(MI.getOperand(0).getReg());
     unsigned Size = DstTy.getSizeInBits();
diff --git a/llvm/lib/Target/AMDGPU/AMDGPURegBankLegalizeRules.cpp b/llvm/lib/Target/AMDGPU/AMDGPURegBankLegalizeRules.cpp
index 6ee15709d2fa6..7959bf30ca27d 100644
--- a/llvm/lib/Target/AMDGPU/AMDGPURegBankLegalizeRules.cpp
+++ b/llvm/lib/Target/AMDGPU/AMDGPURegBankLegalizeRules.cpp
@@ -450,6 +450,14 @@ RegBankLegalizeRules::RegBankLegalizeRules(const GCNSubtarget &_ST,
       .Uni(S64, {{Sgpr64}, {Sgpr64, Sgpr32}})
       .Div(S64, {{Vgpr64}, {Vgpr64, Vgpr32}});
 
+  addRulesForGOpcs({G_LSHR}, Standard).Uni(S32, {{Sgpr32}, {Sgpr32, Sgpr32}});
+
+  addRulesForGOpcs({G_UBFX, G_SBFX}, Standard)
+      .Uni(S32, {{Sgpr32}, {Sgpr32, Sgpr32, Sgpr32}, Uni_BFE})
+      .Div(S32, {{Vgpr32}, {Vgpr32, Vgpr32, Vgpr32}})
+      .Uni(S64, {{Sgpr64}, {Sgpr64, Sgpr32, Sgpr32}, Uni_BFE})
+      .Div(S64, {{Vgpr64}, {Vgpr64, Vgpr32, Vgpr32}, Div_BFE});
+
   // Note: we only write S1 rules for G_IMPLICIT_DEF, G_CONSTANT, G_FCONSTANT
   // and G_FREEZE here, rest is trivially regbankselected earlier
   addRulesForGOpcs({G_IMPLICIT_DEF}).Any({{UniS1}, {{Sgpr32Trunc}, {}}});
@@ -628,6 +636,9 @@ RegBankLegalizeRules::RegBankLegalizeRules(const GCNSubtarget &_ST,
       .Div(S32, {{}, {Vgpr32, None, Vgpr32, Vgpr32}});
 
   addRulesForIOpcs({amdgcn_readfirstlane})
-      .Any({{UniS32, _, DivS32}, {{}, {Sgpr32, None, Vgpr32}}});
+      .Any({{UniS32, _, DivS32}, {{}, {Sgpr32, None, Vgpr32}}})
+      // this should not exist in the first place, it is from call lowering
+      // readfirstlaning just in case register is not in sgpr.
+      .Any({{UniS32, _, UniS32}, {{}, {Sgpr32, None, Vgpr32}}});
 
 } // end initialize rules
diff --git a/llvm/lib/Target/AMDGPU/AMDGPURegBankLegalizeRules.h b/llvm/lib/Target/AMDGPU/AMDGPURegBankLegalizeRules.h
index 6bde7f2cd676d..5edc88aaff73e 100644
--- a/llvm/lib/Target/AMDGPU/AMDGPURegBankLegalizeRules.h
+++ b/llvm/lib/Target/AMDGPU/AMDGPURegBankLegalizeRules.h
@@ -168,6 +168,8 @@ enum LoweringMethodID {
   DoNotLower,
   VccExtToSel,
   UniExtToSel,
+  Uni_BFE,
+  Div_BFE,
   VgprToVccCopy,
   SplitTo32,
   Ext32To64,
diff --git a/llvm/test/CodeGen/AMDGPU/GlobalISel/regbankselect-sbfx.mir b/llvm/test/CodeGen/AMDGPU/GlobalISel/regbankselect-sbfx.mir
index 97c006a1a7216..572f1ea2516f1 100644
--- a/llvm/test/CodeGen/AMDGPU/GlobalISel/regbankselect-sbfx.mir
+++ b/llvm/test/CodeGen/AMDGPU/GlobalISel/regbankselect-sbfx.mir
@@ -1,6 +1,5 @@
 # NOTE: Assertions have been autogenerated by utils/update_mir_test_checks.py
-# RUN: llc -mtriple=amdgcn -mcpu=fiji -run-pass=regbankselect -regbankselect-fast -verify-machineinstrs -o - %s | FileCheck %s
-# RUN: llc -mtriple=amdgcn -mcpu=fiji -run-pass=regbankselect -regbankselect-greedy -verify-machineinstrs -o - %s | FileCheck %s
+# RUN: llc -mtriple=amdgcn -mcpu=fiji -run-pass="amdgpu-regbankselect,amdgpu-regbanklegalize" -verify-machineinstrs -o - %s | FileCheck %s
 
 ...
 
@@ -96,12 +95,11 @@ body: |
     ; CHECK-NEXT: [[COPY1:%[0-9]+]]:vgpr(s32) = COPY $vgpr2
     ; CHECK-NEXT: [[COPY2:%[0-9]+]]:vgpr(s32) = COPY $vgpr3
     ; CHECK-NEXT: [[ASHR:%[0-9]+]]:vgpr(s64) = G_ASHR [[COPY]], [[COPY1]](s32)
-    ; CHECK-NEXT: [[UV:%[0-9]+]]:vgpr(s32), [[UV1:%[0-9]+]]:vgpr(s32) = G_UNMERGE_VALUES [[ASHR]](s64)
-    ; CHECK-NEXT: [[C:%[0-9]+]]:vgpr(s32) = G_CONSTANT i32 64
+    ; CHECK-NEXT: [[C:%[0-9]+]]:sgpr(s32) = G_CONSTANT i32 64
     ; CHECK-NEXT: [[SUB:%[0-9]+]]:vgpr(s32) = G_SUB [[C]], [[COPY2]]
     ; CHECK-NEXT: [[SHL:%[0-9]+]]:vgpr(s64) = G_SHL [[ASHR]], [[SUB]](s32)
     ; CHECK-NEXT: [[ASHR1:%[0-9]+]]:vgpr(s64) = G_ASHR [[SHL]], [[SUB]](s32)
-    ; CHECK-NEXT: $vgpr0_vgpr1 = COPY %3:vgpr(s64)
+    ; CHECK-NEXT: $vgpr0_vgpr1 = COPY [[ASHR1]](s64)
     %0:_(s64) = COPY $vgpr0_vgpr1
     %1:_(s32) = COPY $vgpr2
     %2:_(s32) = COPY $vgpr3
@@ -124,12 +122,11 @@ body: |
     ; CHECK-NEXT: [[COPY1:%[0-9]+]]:vgpr(s32) = COPY $vgpr0
     ; CHECK-NEXT: [[COPY2:%[0-9]+]]:vgpr(s32) = COPY $vgpr1
     ; CHECK-NEXT: [[ASHR:%[0-9]+]]:vgpr(s64) = G_ASHR [[COPY]], [[COPY1]](s32)
-    ; CHECK-NEXT: [[UV:%[0-9]+]]:vgpr(s32), [[UV1:%[0-9]+]]:vgpr(s32) = G_UNMERGE_VALUES [[ASHR]](s64)
-    ; CHECK-NEXT: [[C:%[0-9]+]]:vgpr(s32) = G_CONSTANT i32 64
+    ; CHECK-NEXT: [[C:%[0-9]+]]:sgpr(s32) = G_CONSTANT i32 64
     ; CHECK-NEXT: [[SUB:%[0-9]+]]:vgpr(s32) = G_SUB [[C]], [[COPY2]]
     ; CHECK-NEXT: [[SHL:%[0-9]+]]:vgpr(s64) = G_SHL [[ASHR]], [[SUB]](s32)
     ; CHECK-NEXT: [[ASHR1:%[0-9]+]]:vgpr(s64) = G_ASHR [[SHL]], [[SUB]](s32)
-    ; CHECK-NEXT: $vgpr0_vgpr1 = COPY %3:vgpr(s64)
+    ; CHECK-NEXT: $vgpr0_vgpr1 = COPY [[ASHR1]](s64)
     %0:_(s64) = COPY $vgpr0_vgpr1
     %1:_(s32) = COPY $vgpr0
     %2:_(s32) = COPY $vgpr1
@@ -216,12 +213,11 @@ body: |
     ; CHECK-NEXT: [[COPY2:%[0-9]+]]:vgpr(s32) = COPY $vgpr1
     ; CHECK-NEXT: [[COPY3:%[0-9]+]]:vgpr(s64) = COPY [[COPY]](s64)
     ; CHECK-NEXT: [[ASHR:%[0-9]+]]:vgpr(s64) = G_ASHR [[COPY3]], [[COPY1]](s32)
-    ; CHECK-NEXT: [[UV:%[0-9]+]]:vgpr(s32), [[UV1:%[0-9]+]]:vgpr(s32) = G_UNMERGE_VALUES [[ASHR]](s64)
-    ; CHECK-NEXT: [[C:%[0-9]+]]:vgpr(s32) = G_CONSTANT i32 64
+    ; CHECK-NEXT: [[C:%[0-9]+]]:sgpr(s32) = G_CONSTANT i32 64
     ; CHECK-NEXT: [[SUB:%[0-9]+]]:vgpr(s32) = G_SUB [[C]], [[COPY2]]
     ; CHECK-NEXT: [[SHL:%[0-9]+]]:vgpr(s64) = G_SHL [[ASHR]], [[SUB]](s32)
     ; CHECK-NEXT: [[ASHR1:%[0-9]+]]:vgpr(s64) = G_ASHR [[SHL]], [[SUB]](s32)
-    ; CHECK-NEXT: $vgpr0_vgpr1 = COPY %3:vgpr(s64)
+    ; CHECK-NEXT: $vgpr0_vgpr1 = COPY [[ASHR1]](s64)
     %0:_(s64) = COPY $sgpr0_sgpr1
     %1:_(s32) = COPY $vgpr0
     %2:_(s32) = COPY $vgpr1
@@ -266,16 +262,19 @@ body: |
     ; CHECK-LABEL: name: test_sbfx_s32_sss
     ; CHECK: liveins: $sgpr0, $sgpr1, $sgpr3
     ; CHECK-NEXT: {{  $}}
-    ; CHECK-NEXT: [[COPY:%[0-9]+]]:sreg_32(s32) = COPY $sgpr0
+    ; CHECK-NEXT: [[COPY:%[0-9]+]]:sgpr(s32) = COPY $sgpr0
     ; CHECK-NEXT: [[COPY1:%[0-9]+]]:sgpr(s32) = COPY $sgpr1
     ; CHECK-NEXT: [[COPY2:%[0-9]+]]:sgpr(s32) = COPY $sgpr2
     ; CHECK-NEXT: [[C:%[0-9]+]]:sgpr(s32) = G_CONSTANT i32 63
     ; CHECK-NEXT: [[AND:%[0-9]+]]:sgpr(s32) = G_AND [[COPY1]], [[C]]
     ; CHECK-NEXT: [[C1:%[0-9]+]]:sgpr(s32) = G_CONSTANT i32 16
     ; CHECK-NEXT: [[SHL:%[0-9]+]]:sgpr(s32) = G_SHL [[COPY2]], [[C1]](s32)
-    ; CHECK-NEXT: [[OR:%[0-9]+]]:sreg_32(s32) = G_OR [[AND]], [[SHL]]
-    ; CHECK-NEXT: [[S_BFE_I32_:%[0-9]+]]:sreg_32(s32) = S_BFE_I32 [[COPY]](s32), [[OR]](s32), implicit-def $scc
-    ; CHECK-NEXT: $sgpr0 = COPY [[S_BFE_I32_]](s32)
+    ; CHECK-NEXT: [[OR:%[0-9]+]]:sgpr(s32) = G_OR [[AND]], [[SHL]]
+    ; CHECK-NEXT: [[COPY3:%[0-9]+]]:sreg_32(s32) = COPY [[COPY]](s32)
+    ; CHECK-NEXT: [[COPY4:%[0-9]+]]:sreg_32(s32) = COPY [[OR]](s32)
+    ; CHECK-NEXT: [[S_BFE_I32_:%[0-9]+]]:sreg_32(s32) = S_BFE_I32 [[COPY3]](s32), [[COPY4]](s32), implicit-def $scc
+    ; CHECK-NEXT: [[COPY5:%[0-9]+]]:sgpr(s32) = COPY [[S_BFE_I32_]](s32)
+    ; CHECK-NEXT: $sgpr0 = COPY [[COPY5]](s32)
     %0:_(s32) = COPY $sgpr0
     %1:_(s32) = COPY $sgpr1
     %2:_(s32) = COPY $sgpr2
@@ -294,16 +293,18 @@ body: |
     ; CHECK-LABEL: name: test_sbfx_s32_sii
     ; CHECK: liveins: $sgpr0
     ; CHECK-NEXT: {{  $}}
-    ; CHECK-NEXT: [[COPY:%[0-9]+]]:sreg_32(s32) = COPY $sgpr0
+    ; CHECK-NEXT: [[COPY:%[0-9]+]]:sgpr(s32) = COPY $sgpr0
     ; CHECK-NEXT: [[C:%[0-9]+]]:sgpr(s32) = G_CONSTANT i32 1
     ; CHECK-NEXT: [[C1:%[0-9]+]]:sgpr(s32) = G_CONSTANT i32 10
     ; CHECK-NEXT: [[C2:%[0-9]+]]:sgpr(s32) = G_CONSTANT i32 63
-    ; CHECK-NEXT: [[AND:%[0-9]+]]:sgpr(s32) = G_AND [[C]], [[C2]]
     ; CHECK-NEXT: [[C3:%[0-9]+]]:sgpr(s32) = G_CONSTANT i32 16
-    ; CHECK-NEXT: [[SHL:%[0-9]+]]:sgpr(s32) = G_SHL [[C1]], [[C3]](s32)
-    ; CHECK-NEXT: [[OR:%[0-9]+]]:sreg_32(s32) = G_OR [[AND]], [[SHL]]
-    ; CHECK-NEXT: [[S_BFE_I32_:%[0-9]+]]:sreg_32(s32) = S_BFE_I32 [[COPY]](s32), [[OR]](s32), implicit-def $scc
-    ; CHECK-NEXT: $sgpr0 = COPY [[S_BFE_I32_]](s32)
+    ; CHECK-NEXT: [[C4:%[0-9]+]]:sgpr(s32) = G_CONSTANT i32 655360
+    ; CHECK-NEXT: [[C5:%[0-9]+]]:sgpr(s32) = G_CONSTANT i32 655361
+    ; CHECK-NEXT: [[COPY1:%[0-9]+]]:sreg_32(s32) = COPY [[COPY]](s32)
+    ; CHECK-NEXT: [[COPY2:%[0-9]+]]:sreg_32(s32) = COPY [[C5]](s32)
+    ; CHECK-NEXT: [[S_BFE_I32_:%[0-9]+]]:sreg_32(s32) = S_BFE_I32 [[COPY1]](s32), [[COPY2]](s32), implicit-def $scc
+    ; CHECK-NEXT: [[COPY3:%[0-9]+]]:sgpr(s32) = COPY [[S_BFE_I32_]](s32)
+    ; CHECK-NEXT: $sgpr0 = COPY [[COPY3]](s32)
     %0:_(s32) = COPY $sgpr0
     %1:_(s32) = G_CONSTANT i32 1
     %2:_(s32) = G_CONSTANT i32 10
@@ -324,16 +325,19 @@ body: |
     ; CHECK-LABEL: name: test_sbfx_s64_sss
     ; CHECK: liveins: $sgpr0_sgpr1, $sgpr0, $sgpr1
     ; CHECK-NEXT: {{  $}}
-    ; CHECK-NEXT: [[COPY:%[0-9]+]]:sreg_64(s64) = COPY $sgpr0_sgpr1
+    ; CHECK-NEXT: [[COPY:%[0-9]+]]:sgpr(s64) = COPY $sgpr0_sgpr1
     ; CHECK-NEXT: [[COPY1:%[0-9]+]]:sgpr(s32) = COPY $sgpr0
     ; CHECK-NEXT: [[COPY2:%[0-9]+]]:sgpr(s32) = COPY $sgpr1
     ; CHECK-NEXT: [[C:%[0-9]+]]:sgpr(s32) = G_CONSTANT i32 63
     ; CHECK-NEXT: [[AND:%[0-9]+]]:sgpr(s32) = G_AND [[COPY1]], [[C]]
     ; CHECK-NEXT: [[C1:%[0-9]+]]:sgpr(s32) = G_CONSTANT i32 16
     ; CHECK-NEXT: [[SHL:%[0-9]+]]:sgpr(s32) = G_SHL [[COPY2]], [[C1]](s32)
-    ; CHECK-NEXT: [[OR:%[0-9]+]]:sreg_32(s32) = G_OR [[AND]], [[SHL]]
-    ; CHECK-NEXT: [[S_BFE_I64_:%[0-9]+]]:sreg_64(s64) = S_BFE_I64 [[COPY]](s64), [[OR]](s32), implicit-def $scc
-    ; CHECK-NEXT: $sgpr0_sgpr1 = COPY [[S_BFE_I64_]](s64)
+    ; CHECK-NEXT: [[OR:%[0-9]+]]:sgpr(s32) = G_OR [[AND]], [[SHL]]
+    ; CHECK-NEXT: [[COPY3:%[0-9]+]]:sreg_64(s64) = COPY [[COPY]](s64)
+    ; CHECK-NEXT: [[COPY4:%[0-9]+]]:sreg_32(s32) = COPY [[OR]](s32)
+    ; CHECK-NEXT: [[S_BFE_I64_:%[0-9]+]]:sreg_64(s64) = S_BFE_I64 [[COPY3]](s64), [[COPY4]](s32), implicit-def $scc
+    ; CHECK-NEXT: [[COPY5:%[0-9]+]]:sgpr(s64) = COPY [[S_BFE_I64_]](s64)
+    ; CHECK-NEXT: $sgpr0_sgpr1 = COPY [[COPY5]](s64)
     %0:_(s64) = COPY $sgpr0_sgpr1
     %1:_(s32) = COPY $sgpr0
     %2:_(s32) = COPY $sgpr1
@@ -352,15 +356,17 @@ body: |
     ; CHECK-LABEL: name: test_sbfx_s64_sii
     ; CHECK: liveins: $sgpr0_sgpr1
     ; CHECK-NEXT: {{  $}}
-    ; CHECK-NEXT: [[COPY:%[0-9]+]]:sreg_64(s64) = COPY $sgpr0_sgpr1
+    ; CHECK-NEXT: [[COPY:%[0-9]+]]:sgpr(s64) = COPY $sgpr0_sgpr1
     ; CHECK-NEXT: [[C:%[0-9]+]]:sgpr(s32) = G_CONSTANT i32 1
     ; CHECK-NEXT: [[C1:%[0-9]+]]:sgpr(s32) = G_CONSTANT i32 10
     ; CHECK-NEXT: [[C2:%[0-9]+]]:sgpr(s32) = G_CONSTANT i32 63
-    ; CHECK-NEXT: [[AND:%[0-9]+]]:sgpr(s32) = G_AND [[C]], [[C2]]
     ; CHECK-NEXT: [[C3:%[0-9]+]]:sgpr(s32) = G_CONSTANT i32 16
-    ; CHECK-NEXT: [[SHL:%[0-9]+]]:sgpr(s32) = G_SHL [[C1]], [[C3]](s32)
-    ; CHECK-NEXT: [[OR:%[0-9]+]]:sreg_32(s32) = G_OR [[AND]], [[SHL]]
-    ; CHECK-NEXT: [[S_BFE_I64_:%[0-9]+]]:sreg_64(s64) = S_BFE_I64 [[COPY]](s64), [[OR]](s32), implicit-def $scc
+    ; CHECK-NEXT: [[C4:%[0-9]+]]:sgpr(s32) = G_CONSTANT i32 655360
+    ; CHECK-NEXT: [[C5:%[0-9]+]]:sgpr(s32) = G_CONSTANT i32 655361
+    ; CHECK-NEXT: [[COPY1:%[0-9]+]]:sreg_64(s64) = COPY [[COPY]](s64)
+    ; CHECK-NEXT: [[COPY2:%[0-9]+]]:sreg_32(s32) = COPY [[C5]](s32)
+    ; CHECK-NEXT: [[S_BFE_I64_:%[0-9]+]]:sreg_64(s64) = S_BFE_I64 [[COPY1]](s64), [[COPY2]](s32), implicit-def $scc
+    ; CHECK-NEXT: [[COPY3:%[0-9]+]]:sgpr(s64) = COPY [[S_BFE_I64_]](s64)
     %0:_(s64) = COPY $sgpr0_sgpr1
     %1:_(s32) = G_CONSTANT i32 1
     %2:_(s32) = G_CONSTANT i32 10
diff --git a/llvm/test/CodeGen/AMDGPU/GlobalISel/regbankselect-ubfx.mir b/llvm/test/CodeGen/AMDGPU/GlobalISel/regbankselect-ubfx.mir
index 20d280680a09d..267960ad74eff 100644
--- a/llvm/test/CodeGen/AMDGPU/GlobalISel/regbankselect-ubfx.mir
+++ b/llvm/test/CodeGen/AMDGPU/GlobalISel/regbankselect-ubfx.mir
@@ -1,6 +1,5 @@
 # NOTE: Assertions have been autogenerated by utils/update_mir_test_checks.py
-# RUN: llc -mtriple=amdgcn -mcpu=fiji -run-pass=regbankselect -regbankselect-fast -verify-machineinstrs -o - %s | FileCheck %s
-# RUN: llc -mtriple=amdgcn -mcpu=fiji -run-pass=regbankselect -regbankselect-greedy -verify-machineinstrs -o - %s | FileCheck %s
+# RUN: llc -mtriple=amdgcn -mcpu=fiji -run-pass="amdgpu-regbankselect,amdgpu-regbanklegalize" -verify-machineinstrs -o - %s | FileCheck %s
 
 ...
 
@@ -96,12 +95,11 @@ body: |
     ; CHECK-NEXT: [[COPY1:%[0-9]+]]:vgpr(s32) = COPY $vgpr2
     ; CHECK-NEXT: [[COPY2:%[0-9]+]]:vgpr(s32) = COPY $vgpr3
     ; CHECK-NEXT: [[LSHR:%[0-9]+]]:vgpr(s64) = G_LSHR [[COPY]], [[COPY1]](s32)
-    ; CHECK-NEXT: [[UV:%[0-9]+]]:vgpr(s32), [[UV1:%[0-9]+]]:vgpr(s32) = G_UNMERGE_VALUES [[LSHR]](s64)
-    ; CHECK-NEXT: [[C:%[0-9]+]]:vgpr(s32) = G_CONSTANT i32 64
+    ; CHECK-NEXT: [[C:%[0-9]+]]:sgpr(s32) = G_CONSTANT i32 64
     ; CHECK-NEXT: [[SUB:%[0-9]+]]:vgpr(s32) = G_SUB [[C]], [[COPY2]]
     ; CHECK-NEXT: [[SHL:%[0-9]+]]:vgpr(s64) = G_SHL [[LSHR]], [[SUB]](s32)
     ; CHECK-NEXT: [[LSHR1:%[0-9]+]]:vgpr(s64) = G_LSHR [[SHL]], [[SUB]](s32)
-    ; CHECK-NEXT: $vgpr0_vgpr1 = COPY %3:vgpr(s64)
+    ; CHECK-NEXT: $vgpr0_vgpr1 = COPY [[LSHR1]](s64)
     %0:_(s64) = COPY $vgpr0_vgpr1
     %1:_(s32) = COPY $vgpr2
     %2:_(s32) = COPY $vgpr3
@@ -124,12 +122,11 @@ body: |
     ; CHECK-NEXT: [[COPY1:%[0-9]+]]:vgpr(s32) = COPY $vgpr0
     ; CHECK-NEXT: [[COPY2:%[0-9]+]]:vgpr(s32) = COPY $vgpr1
     ; CHECK-NEXT: [[LSHR:%[0-9]+]]:vgpr(s64) = G_LSHR [[COPY]], [[COPY1]](s32)
-    ; CHECK-NEXT: [[UV:%[0-9]+]]:vgpr(s32), [[UV1:%[0-9]+]]:vgpr(s32) = G_UNMERGE_VALUES [[LSHR]](s64)
-    ; CHECK-NEXT: [[C:%[0-9]+]]:vgpr(s32) = G_CONSTANT i32 64
+    ; CHECK-NEXT: [[C:%[0-9]+]]:sgpr(s32) = G_CONSTANT i32 64
     ; CHECK-NEXT: [[SUB:%[0-9]+]]:vgpr(s32) = G_SUB [[C]], [[COPY2]]
     ; CHECK-NEXT: [[SHL:%[0-9]+]]:vgpr(s64) = G_SHL [[LSHR]], [[SUB]](s32)
     ; CHECK-NEXT: [[LSHR1:%[0-9]+]]:vgpr(s64) = G_LSHR [[SHL]], [[SUB]](s32)
-    ; CHECK-NEXT: $vgpr0_vgpr1 = COPY %3:vgpr(s64)
+    ; CHECK-NEXT: $vgpr0_vgpr1 = COPY [[LSHR1]](s64)
     %0:_(s64) = COPY $vgpr0_vgpr1
     %1:_(s32) = COPY $vgpr0
     %2:_(s32) = COPY $vgpr1
@@ -214,12 +211,11 @@ body: |
     ; CHECK-NEXT: [[COPY2:%[0-9]+]]:vgpr(s32) = COPY $vgpr1
     ; CHECK-NEXT: [[COPY3:%[0-9]+]]:vgpr(s64) = COPY [[COPY]](s64)
     ; CHECK-NEXT: [[LSHR:%[0-9]+]]:vgpr(s64) = G_LSHR [[COPY3]], [[COPY1]](s32)
-    ; CHECK-NEXT: [[UV:%[0-9]+]]:vgpr(s32), [[UV1:%[0-9]+]]:vgpr(s32) = G_UNMERGE_VALUES [[LSHR]](s64)
-    ; CHECK-NEXT: [[C:%[0-9]+]]:vgpr(s32) = G_CONSTANT i32 64
+    ; CHECK-NEXT: [[C:%[0-9]+]]:sgpr(s32) = G_CONSTANT i32 64
     ; CHECK-NEXT: [[SUB:%[0-9]+]]:vgpr(s32) = G_SUB [[C]], [[COPY2]]
     ; CHECK-NEXT: [[SHL:%[0-9]+]]:vgpr(s64) = G_SHL [[LSHR]], [[SUB]](s32)
     ; CHECK-NEXT: [[LSHR1:%[0-9]+]]:vgpr(s64) = G_LSHR [[SHL]], [[SUB]](s32)
-    ; CHECK-NEXT: $vgpr0_vgpr1 = COPY %3:vgpr(s64)
+    ; CHECK-NEXT: $vgpr0_vgpr1 = COPY [[LSHR1]](s64)
     %0:_(s64) = COPY $sgpr0_sgpr1
     %1:_(s32) = C...
[truncated]

@@ -225,6 +228,103 @@ void RegBankLegalizeHelper::lower(MachineInstr &MI,
MI.eraseFromParent();
break;
}
case Div_BFE: {
Contributor

I feel like the contents of the case should be a helper, otherwise the switch becomes hard to read.
The code is also easier to follow with less indentation, and you can add a comment on the function to explain why/how it's lowered like this

Collaborator

Yes, that makes sense.

Actually, this could be a motivation for being able to directly specify a helper function to be used as part of the legalizer rule definition...

Collaborator Author

Interesting, will look into that.

Register Src = MI.getOperand(FirstOpnd).getReg();
Register LSBit = MI.getOperand(FirstOpnd + 1).getReg();
Register Width = MI.getOperand(FirstOpnd + 2).getReg();
// Comments are for signed bitfield extract, similar for unsigned. x is sign
Contributor

IMO these comments are confusing, what do other reviewers think?

Collaborator

I think they're helpful.

Comment on lines 234 to 236
bool Signed = isa<GIntrinsic>(MI) ? MI.getOperand(1).getIntrinsicID() ==
Intrinsic::amdgcn_sbfe
: MI.getOpcode() == AMDGPU::G_SBFX;
Contributor

Move this into an isSignedBFE helper and reuse it below too? A 3-line ternary is a bit hard to read IMO.

Collaborator

Considering that the next line also inspects the exact type of MI, I'm not convinced that this would make things cleaner.


Comment on lines 246 to 261
// Src >> LSBit Hi|Lo: x?????syyyyyyl??? -> xxxx?????syyyyyyl
unsigned SHROpc = Signed ? AMDGPU::G_ASHR : AMDGPU::G_LSHR;
auto SHRSrc = B.buildInstr(SHROpc, {{VgprRB, S64}}, {Src, LSBit});

auto ConstWidth = getIConstantVRegValWithLookThrough(Width, MRI);

// Expand to Src >> LSBit << (64 - Width) >> (64 - Width)
// << (64 - Width): Hi|Lo: xxxx?????syyyyyyl -> syyyyyyl000000000
// >> (64 - Width): Hi|Lo: syyyyyyl000000000 -> ssssssssssyyyyyyl
if (!ConstWidth) {
auto Amt = B.buildSub(VgprRB_S32, B.buildConstant(SgprRB_S32, 64), Width);
auto SignBit = B.buildShl({VgprRB, S64}, SHRSrc, Amt);
B.buildInstr(SHROpc, {Dst}, {SignBit, Amt});
MI.eraseFromParent();
return;
}
Collaborator

Why do you do shr -> shl -> shr? It's cheaper to just do shl -> shr with appropriate widths.

Another issue here: What if only the data source operand is divergent, and the other inputs are uniform? I feel like we should handle that appropriately.

Collaborator Author
petar-avramovic commented Mar 27, 2025

Didn't give it too much thought; I just copied the original implementation. We need 4 instructions either way: two shifts and a sub, and the fourth can be either a sub or a shift. Is a shift faster than a sub?

Current (shr -> shl -> shr):

SHRSrc = Src >> LSBit
Amt = 64 - Width
SignBit = SHRSrc << Amt
Res = SignBit >> Amt

Suggested (shl -> shr):

AmtR = 64 - Width
AmtL = AmtR - LSBit
SignBit = Src << AmtL
Res = SignBit >> AmtR

> What if only the data source operand is divergent, and the other inputs are uniform?

What should be done differently then?
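
For reference, a scalar model of what the variable-width Div_BFE expansion above computes (plain C++ rather than MIR; the function name is ours, and we assume 0 < Width <= 64):

```cpp
#include <cstdint>

// Model of the shr -> shl -> shr sequence: extract Width bits of Src
// starting at LSBit and sign-extend the result.
int64_t sbfe64(int64_t Src, unsigned LSBit, unsigned Width) {
  int64_t SHRSrc = Src >> LSBit;              // G_ASHR Src, LSBit
  unsigned Amt = 64 - Width;                  // G_SUB 64, Width
  uint64_t SignBit = (uint64_t)SHRSrc << Amt; // G_SHL (unsigned shift to
                                              // avoid signed-overflow UB)
  return (int64_t)SignBit >> Amt;             // G_ASHR by the same amount
}
```

The shl -> shr variant would instead compute AmtL = 64 - Width - LSBit and shift Src left by AmtL before the final arithmetic shift right by 64 - Width.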



@petar-avramovic force-pushed the users/petar-avramovic/bfe branch from a0088e7 to c127ee1 on March 27, 2025, 14:42
// Expand to Src >> LSBit << (64 - Width) >> (64 - Width)
// << (64 - Width): Hi|Lo: xxxx?????syyyyyyl -> syyyyyyl000000000
// >> (64 - Width): Hi|Lo: syyyyyyl000000000 -> ssssssssssyyyyyyl
if (!ConstWidth) {
Contributor

Is this functionally the same as the intrinsic case?

Collaborator Author

I assume so; it looks like the intrinsics were introduced first, then the G_ opcodes.

Comment on lines +223 to +215
auto Mask = B.buildConstant(SgprRB_S32, maskTrailingOnes<unsigned>(6));
auto FieldOffset = B.buildAnd(SgprRB_S32, LSBit, Mask);
Contributor

I think there's a buildZextInReg helper

Collaborator Author

buildZextInReg builds multiple instructions and does not set reg banks; this way we set all the register banks.

Comment on lines +227 to +219
unsigned Opc32 = Signed ? AMDGPU::S_BFE_I32 : AMDGPU::S_BFE_U32;
unsigned Opc64 = Signed ? AMDGPU::S_BFE_I64 : AMDGPU::S_BFE_U64;
Contributor

I'm worried about this harming the ability of downstream transforms and analyses to do anything with the operation. We should still be able to handle known and demanded bits. Also don't want to break any patterns that read BFE inputs. Except for the intrinsic with variable inputs case, can't you just use the G_* opcodes?

Collaborator Author

There are no tablegen patterns for S_BFE, and GlobalISel known-bits is missing support for many generic opcodes, not to mention target intrinsics. Can we leave that for another patch?

Contributor

> There are no tablegen patterns for S_BFE

Since the two operands are merged into one operand, I think you need to manually select it. We should add a FIXME to stop directly selecting it.

@petar-avramovic force-pushed the users/petar-avramovic/bfe branch from 3bf2052 to 19a9421 on April 14, 2025, 14:38
petar-avramovic (Collaborator Author)

ping

@petar-avramovic force-pushed the users/petar-avramovic/bfe branch from 19a9421 to eb742ec on April 24, 2025, 10:03
@Pierre-vh (Contributor) left a comment

LGTM but please wait in case @arsenm has more comments

@@ -127,6 +131,112 @@ void RegBankLegalizeHelper::widenLoad(MachineInstr &MI, LLT WideTy,
MI.eraseFromParent();
}

static bool isSignedBFE(MachineInstr &MI) {
if (isa<GIntrinsic>(MI)) {
Contributor

nit: How about dyn_cast + use the .is(IID) helper?

@petar-avramovic force-pushed the users/petar-avramovic/bfe branch from eb742ec to dd430d2 on May 8, 2025, 10:14
Comment on lines 137 to 138
if (GI->is(Intrinsic::amdgcn_sbfe))
return true;
Contributor

You can just return the GI->is result; I don't think it's particularly important that this function only be called with the BFE ops.
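
For concreteness, the suggested shape would look roughly like this (a sketch assuming the isSignedBFE helper from the patch and the usual LLVM GlobalISel headers):

```cpp
static bool isSignedBFE(MachineInstr &MI) {
  // dyn_cast + GIntrinsic::is, as suggested; fall back to checking the
  // generic opcode for the non-intrinsic case.
  if (auto *GI = dyn_cast<GIntrinsic>(&MI))
    return GI->is(Intrinsic::amdgcn_sbfe);
  return MI.getOpcode() == AMDGPU::G_SBFX;
}
```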

@petar-avramovic force-pushed the users/petar-avramovic/bfe branch from dd430d2 to ec4200d on May 22, 2025, 12:18
The divergent S32 instruction is available; S64 needs to be lowered to S32. Uniform instructions are available for both S32 and S64, but the bitfield offset and size have to be packed into a single S32 operand. The uniform instruction is selected directly, since there is no isel pattern available.
@petar-avramovic force-pushed the users/petar-avramovic/bfe branch from ec4200d to fa35907 on May 26, 2025, 09:56
Collaborator Author

petar-avramovic commented May 26, 2025

Merge activity

  • May 26, 9:58 AM UTC: A user started a stack merge that includes this pull request via Graphite.
  • May 26, 9:58 AM UTC: @petar-avramovic merged this pull request with Graphite.

@petar-avramovic merged commit 38cec04 into main on May 26, 2025
6 of 11 checks passed
@petar-avramovic deleted the users/petar-avramovic/bfe branch on May 26, 2025, 09:59