AMDGPU/GlobalISel: add RegBankLegalize rules for extends and trunc #132383


Merged: 1 commit, May 26, 2025

Conversation

petar-avramovic
Collaborator

Uniform S1:
Truncs to uniform S1 and AnyExts from S1 are left as-is, since they are meant
to be combined away. Uniform S1 ZExt and SExt are lowered using select.
Divergent S1:
Trunc of a VGPR to VCC is lowered as a compare.
Extends of VCC are lowered using select.

For the remaining types:
S32-to-S64 ZExt and SExt are lowered using merge values; AnyExt and Trunc
are again left as-is to be combined away.
Notably, uniform S16 SExt and ZExt are not lowered to S32 and are left as-is
for instruction selection to deal with, because there are selection patterns
that check for the S16 type.
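The two extending lowerings named above can be sketched outside of MIR. Below is a minimal Python model (an illustration, not the actual C++ helper code) of what the Ext32To64 lowering produces: the high half is a zero constant for ZExt, or the sign bit replicated via an arithmetic shift right by 31 for SExt, and the result is the merge of the two halves, mirroring G_MERGE_VALUES.

```python
def ext32_to_64(lo32: int, mode: str) -> int:
    """Model of the Ext32To64 lowering: build the high 32 bits, then
    merge them with the low 32 bits (as G_MERGE_VALUES does)."""
    lo32 &= 0xFFFFFFFF
    if mode == "zext":
        hi = 0  # high half is a zero constant
    elif mode == "sext":
        # Replicate the sign bit: an arithmetic shift right by 31
        # yields all-zeros or all-ones in 32 bits.
        hi = 0xFFFFFFFF if (lo32 >> 31) else 0
    else:
        raise ValueError(mode)
    return (hi << 32) | lo32

# -1 as a 32-bit value sign-extends to -1 as a 64-bit value.
assert ext32_to_64(0xFFFFFFFF, "sext") == 0xFFFFFFFFFFFFFFFF
assert ext32_to_64(0x7FFFFFFF, "sext") == 0x000000007FFFFFFF
assert ext32_to_64(0xFFFFFFFF, "zext") == 0x00000000FFFFFFFF
```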

@llvmbot
Member

llvmbot commented Mar 21, 2025

@llvm/pr-subscribers-llvm-globalisel

@llvm/pr-subscribers-backend-amdgpu

Author: Petar Avramovic (petar-avramovic)


Patch is 42.29 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/132383.diff

8 Files Affected:

  • (modified) llvm/lib/Target/AMDGPU/AMDGPURegBankLegalize.cpp (+2-1)
  • (modified) llvm/lib/Target/AMDGPU/AMDGPURegBankLegalizeHelper.cpp (+45-6)
  • (modified) llvm/lib/Target/AMDGPU/AMDGPURegBankLegalizeRules.cpp (+43-4)
  • (modified) llvm/lib/Target/AMDGPU/AMDGPURegBankLegalizeRules.h (+3)
  • (modified) llvm/test/CodeGen/AMDGPU/GlobalISel/regbankselect-anyext.mir (+30-31)
  • (modified) llvm/test/CodeGen/AMDGPU/GlobalISel/regbankselect-sext.mir (+66-34)
  • (modified) llvm/test/CodeGen/AMDGPU/GlobalISel/regbankselect-trunc.mir (+16-6)
  • (modified) llvm/test/CodeGen/AMDGPU/GlobalISel/regbankselect-zext.mir (+57-32)
diff --git a/llvm/lib/Target/AMDGPU/AMDGPURegBankLegalize.cpp b/llvm/lib/Target/AMDGPU/AMDGPURegBankLegalize.cpp
index d5a83903e2b13..44f1b5419abb9 100644
--- a/llvm/lib/Target/AMDGPU/AMDGPURegBankLegalize.cpp
+++ b/llvm/lib/Target/AMDGPU/AMDGPURegBankLegalize.cpp
@@ -216,7 +216,8 @@ class AMDGPURegBankLegalizeCombiner {
       return;
     }
 
-    if (DstTy == S32 && TruncSrcTy == S16) {
+    if ((DstTy == S64 && TruncSrcTy == S32) ||
+        (DstTy == S32 && TruncSrcTy == S16)) {
       B.buildAnyExt(Dst, TruncSrc);
       cleanUpAfterCombine(MI, Trunc);
       return;
diff --git a/llvm/lib/Target/AMDGPU/AMDGPURegBankLegalizeHelper.cpp b/llvm/lib/Target/AMDGPU/AMDGPURegBankLegalizeHelper.cpp
index 5dbaa9488d668..7301cba9e8ed3 100644
--- a/llvm/lib/Target/AMDGPU/AMDGPURegBankLegalizeHelper.cpp
+++ b/llvm/lib/Target/AMDGPU/AMDGPURegBankLegalizeHelper.cpp
@@ -173,13 +173,23 @@ void RegBankLegalizeHelper::lower(MachineInstr &MI,
   case Ext32To64: {
     const RegisterBank *RB = MRI.getRegBank(MI.getOperand(0).getReg());
     MachineInstrBuilder Hi;
-
-    if (MI.getOpcode() == AMDGPU::G_ZEXT) {
+    switch (MI.getOpcode()) {
+    case AMDGPU::G_ZEXT: {
       Hi = B.buildConstant({RB, S32}, 0);
-    } else {
+      break;
+    }
+    case AMDGPU::G_SEXT: {
       // Replicate sign bit from 32-bit extended part.
       auto ShiftAmt = B.buildConstant({RB, S32}, 31);
       Hi = B.buildAShr({RB, S32}, MI.getOperand(1).getReg(), ShiftAmt);
+      break;
+    }
+    case AMDGPU::G_ANYEXT: {
+      Hi = B.buildUndef({RB, S32});
+      break;
+    }
+    default:
+      llvm_unreachable("Unsupported Opcode in Ext32To64");
     }
 
     B.buildMergeLikeInstr(MI.getOperand(0).getReg(),
@@ -202,7 +212,7 @@ void RegBankLegalizeHelper::lower(MachineInstr &MI,
     // compares all bits in register.
     Register BoolSrc = MRI.createVirtualRegister({VgprRB, Ty});
     if (Ty == S64) {
-      auto Src64 = B.buildUnmerge({VgprRB, Ty}, Src);
+      auto Src64 = B.buildUnmerge(VgprRB_S32, Src);
       auto One = B.buildConstant(VgprRB_S32, 1);
       auto AndLo = B.buildAnd(VgprRB_S32, Src64.getReg(0), One);
       auto Zero = B.buildConstant(VgprRB_S32, 0);
@@ -396,8 +406,11 @@ LLT RegBankLegalizeHelper::getTyFromID(RegBankLLTMappingApplyID ID) {
   case Sgpr32AExt:
   case Sgpr32AExtBoolInReg:
   case Sgpr32SExt:
+  case Sgpr32ZExt:
   case UniInVgprS32:
   case Vgpr32:
+  case Vgpr32SExt:
+  case Vgpr32ZExt:
     return LLT::scalar(32);
   case Sgpr64:
   case Vgpr64:
@@ -508,6 +521,7 @@ RegBankLegalizeHelper::getRegBankFromID(RegBankLLTMappingApplyID ID) {
   case Sgpr32AExt:
   case Sgpr32AExtBoolInReg:
   case Sgpr32SExt:
+  case Sgpr32ZExt:
     return SgprRB;
   case Vgpr16:
   case Vgpr32:
@@ -524,6 +538,8 @@ RegBankLegalizeHelper::getRegBankFromID(RegBankLLTMappingApplyID ID) {
   case VgprB128:
   case VgprB256:
   case VgprB512:
+  case Vgpr32SExt:
+  case Vgpr32ZExt:
     return VgprRB;
   default:
     return nullptr;
@@ -729,8 +745,8 @@ void RegBankLegalizeHelper::applyMappingSrc(
       assert(Ty.getSizeInBits() == 1);
       assert(RB == SgprRB);
       auto Aext = B.buildAnyExt(SgprRB_S32, Reg);
-      // Zext SgprS1 is not legal, this instruction is most of times meant to be
-      // combined away in RB combiner, so do not make AND with 1.
+      // Zext SgprS1 is not legal, make AND with 1 instead. This instruction
+      // is most of the time meant to be combined away in AMDGPURegBankCombiner.
       auto Cst1 = B.buildConstant(SgprRB_S32, 1);
       auto BoolInReg = B.buildAnd(SgprRB_S32, Aext, Cst1);
       Op.setReg(BoolInReg.getReg(0));
@@ -743,6 +759,29 @@ void RegBankLegalizeHelper::applyMappingSrc(
       Op.setReg(Sext.getReg(0));
       break;
     }
+    case Sgpr32ZExt: {
+      assert(1 < Ty.getSizeInBits() && Ty.getSizeInBits() < 32);
+      assert(RB == SgprRB);
+      auto Zext = B.buildZExt({SgprRB, S32}, Reg);
+      Op.setReg(Zext.getReg(0));
+      break;
+    }
+    case Vgpr32SExt: {
+      // Note this ext allows S1, and it is meant to be combined away.
+      assert(Ty.getSizeInBits() < 32);
+      assert(RB == VgprRB);
+      auto Sext = B.buildSExt({VgprRB, S32}, Reg);
+      Op.setReg(Sext.getReg(0));
+      break;
+    }
+    case Vgpr32ZExt: {
+      // Note this ext allows S1, and it is meant to be combined away.
+      assert(Ty.getSizeInBits() < 32);
+      assert(RB == VgprRB);
+      auto Zext = B.buildZExt({VgprRB, S32}, Reg);
+      Op.setReg(Zext.getReg(0));
+      break;
+    }
     default:
       llvm_unreachable("ID not supported");
     }
diff --git a/llvm/lib/Target/AMDGPU/AMDGPURegBankLegalizeRules.cpp b/llvm/lib/Target/AMDGPU/AMDGPURegBankLegalizeRules.cpp
index 96bc969dd1f40..b4ef4ecc3fe28 100644
--- a/llvm/lib/Target/AMDGPU/AMDGPURegBankLegalizeRules.cpp
+++ b/llvm/lib/Target/AMDGPU/AMDGPURegBankLegalizeRules.cpp
@@ -489,22 +489,61 @@ RegBankLegalizeRules::RegBankLegalizeRules(const GCNSubtarget &_ST,
       .Uni(B32, {{SgprB32}, {Sgpr32AExtBoolInReg, SgprB32, SgprB32}});
 
   addRulesForGOpcs({G_ANYEXT})
+      .Any({{UniS16, S1}, {{None}, {None}}}) // should be combined away
       .Any({{UniS32, S1}, {{None}, {None}}}) // should be combined away
-      .Any({{UniS32, S16}, {{Sgpr32}, {Sgpr16}}});
+      .Any({{UniS64, S1}, {{None}, {None}}}) // should be combined away
+      .Any({{{DivS16, S1}}, {{Vgpr16}, {Vcc}, VccExtToSel}})
+      .Any({{{DivS32, S1}}, {{Vgpr32}, {Vcc}, VccExtToSel}})
+      .Any({{{DivS64, S1}}, {{Vgpr64}, {Vcc}, VccExtToSel}})
+      .Any({{UniS64, S32}, {{Sgpr64}, {Sgpr32}, Ext32To64}})
+      .Any({{DivS64, S32}, {{Vgpr64}, {Vgpr32}, Ext32To64}})
+      .Any({{UniS32, S16}, {{Sgpr32}, {Sgpr16}}})
+      .Any({{DivS32, S16}, {{Vgpr32}, {Vgpr16}}});
 
   // In global-isel G_TRUNC in-reg is treated as no-op, inst selected into COPY.
   // It is up to user to deal with truncated bits.
   addRulesForGOpcs({G_TRUNC})
+      .Any({{UniS1, UniS16}, {{None}, {None}}}) // should be combined away
       .Any({{UniS1, UniS32}, {{None}, {None}}}) // should be combined away
+      .Any({{UniS1, UniS64}, {{None}, {None}}}) // should be combined away
       .Any({{UniS16, S32}, {{Sgpr16}, {Sgpr32}}})
+      .Any({{DivS16, S32}, {{Vgpr16}, {Vgpr32}}})
+      .Any({{UniS32, S64}, {{Sgpr32}, {Sgpr64}}})
+      .Any({{DivS32, S64}, {{Vgpr32}, {Vgpr64}}})
       // This is non-trivial. VgprToVccCopy is done using compare instruction.
-      .Any({{DivS1, DivS32}, {{Vcc}, {Vgpr32}, VgprToVccCopy}});
+      .Any({{DivS1, DivS16}, {{Vcc}, {Vgpr16}, VgprToVccCopy}})
+      .Any({{DivS1, DivS32}, {{Vcc}, {Vgpr32}, VgprToVccCopy}})
+      .Any({{DivS1, DivS64}, {{Vcc}, {Vgpr64}, VgprToVccCopy}});
 
-  addRulesForGOpcs({G_ZEXT, G_SEXT})
+  addRulesForGOpcs({G_ZEXT})
+      .Any({{UniS16, S1}, {{Sgpr32Trunc}, {Sgpr32AExtBoolInReg}, UniExtToSel}})
+      .Any({{UniS32, S1}, {{Sgpr32}, {Sgpr32AExtBoolInReg}, UniExtToSel}})
+      .Any({{UniS64, S1}, {{Sgpr64}, {Sgpr32AExtBoolInReg}, UniExtToSel}})
+      .Any({{DivS16, S1}, {{Vgpr16}, {Vcc}, VccExtToSel}})
+      .Any({{DivS32, S1}, {{Vgpr32}, {Vcc}, VccExtToSel}})
+      .Any({{DivS64, S1}, {{Vgpr64}, {Vcc}, VccExtToSel}})
+      .Any({{UniS64, S32}, {{Sgpr64}, {Sgpr32}, Ext32To64}})
+      .Any({{DivS64, S32}, {{Vgpr64}, {Vgpr32}, Ext32To64}})
+      // not extending S16 to S32 is questionable.
+      .Any({{UniS64, S16}, {{Sgpr64}, {Sgpr32ZExt}, Ext32To64}})
+      .Any({{DivS64, S16}, {{Vgpr64}, {Vgpr32ZExt}, Ext32To64}})
+      .Any({{UniS32, S16}, {{Sgpr32}, {Sgpr16}}})
+      .Any({{DivS32, S16}, {{Vgpr32}, {Vgpr16}}});
+
+  addRulesForGOpcs({G_SEXT})
+      .Any({{UniS16, S1}, {{Sgpr32Trunc}, {Sgpr32AExtBoolInReg}, UniExtToSel}})
       .Any({{UniS32, S1}, {{Sgpr32}, {Sgpr32AExtBoolInReg}, UniExtToSel}})
+      .Any({{UniS64, S1}, {{Sgpr64}, {Sgpr32AExtBoolInReg}, UniExtToSel}})
+      .Any({{DivS16, S1}, {{Vgpr16}, {Vcc}, VccExtToSel}})
       .Any({{DivS32, S1}, {{Vgpr32}, {Vcc}, VccExtToSel}})
+      .Any({{DivS64, S1}, {{Vgpr64}, {Vcc}, VccExtToSel}})
       .Any({{UniS64, S32}, {{Sgpr64}, {Sgpr32}, Ext32To64}})
-      .Any({{DivS64, S32}, {{Vgpr64}, {Vgpr32}, Ext32To64}});
+      .Any({{DivS64, S32}, {{Vgpr64}, {Vgpr32}, Ext32To64}})
+      // not extending S16 to S32 is questionable.
+      .Any({{UniS64, S16}, {{Sgpr64}, {Sgpr32SExt}, Ext32To64}})
+      .Any({{DivS64, S16}, {{Vgpr64}, {Vgpr32SExt}, Ext32To64}})
+      .Any({{UniS32, S16}, {{Sgpr32}, {Sgpr16}}})
+      .Any({{DivS32, S16}, {{Vgpr32}, {Vgpr16}}});
 
   bool hasUnalignedLoads = ST->getGeneration() >= AMDGPUSubtarget::GFX12;
   bool hasSMRDSmall = ST->hasScalarSubwordLoads();
diff --git a/llvm/lib/Target/AMDGPU/AMDGPURegBankLegalizeRules.h b/llvm/lib/Target/AMDGPU/AMDGPURegBankLegalizeRules.h
index 685cc0acc0cc4..cdf70d99d4a9e 100644
--- a/llvm/lib/Target/AMDGPU/AMDGPURegBankLegalizeRules.h
+++ b/llvm/lib/Target/AMDGPU/AMDGPURegBankLegalizeRules.h
@@ -159,6 +159,9 @@ enum RegBankLLTMappingApplyID {
   Sgpr32AExt,
   Sgpr32AExtBoolInReg,
   Sgpr32SExt,
+  Sgpr32ZExt,
+  Vgpr32SExt,
+  Vgpr32ZExt,
 };
 
 // Instruction needs to be replaced with sequence of instructions. Lowering was
diff --git a/llvm/test/CodeGen/AMDGPU/GlobalISel/regbankselect-anyext.mir b/llvm/test/CodeGen/AMDGPU/GlobalISel/regbankselect-anyext.mir
index 22de4d00c2a51..a5077fc89240d 100644
--- a/llvm/test/CodeGen/AMDGPU/GlobalISel/regbankselect-anyext.mir
+++ b/llvm/test/CodeGen/AMDGPU/GlobalISel/regbankselect-anyext.mir
@@ -1,6 +1,5 @@
 # NOTE: Assertions have been autogenerated by utils/update_mir_test_checks.py
-# RUN: llc -mtriple=amdgcn -mcpu=fiji -run-pass=regbankselect %s -verify-machineinstrs -o - -regbankselect-fast | FileCheck %s
-# RUN: llc -mtriple=amdgcn -mcpu=fiji -run-pass=regbankselect %s -verify-machineinstrs -o - -regbankselect-greedy | FileCheck %s
+# RUN: llc -mtriple=amdgcn -mcpu=fiji -run-pass="amdgpu-regbankselect,amdgpu-regbanklegalize" %s -verify-machineinstrs -o - | FileCheck %s
 
 ---
 name: anyext_s32_to_s64_s
@@ -13,7 +12,8 @@ body: |
     ; CHECK: liveins: $sgpr0
     ; CHECK-NEXT: {{  $}}
     ; CHECK-NEXT: [[COPY:%[0-9]+]]:sgpr(s32) = COPY $sgpr0
-    ; CHECK-NEXT: [[ANYEXT:%[0-9]+]]:sgpr(s64) = G_ANYEXT [[COPY]](s32)
+    ; CHECK-NEXT: [[DEF:%[0-9]+]]:sgpr(s32) = G_IMPLICIT_DEF
+    ; CHECK-NEXT: [[MV:%[0-9]+]]:sgpr(s64) = G_MERGE_VALUES [[COPY]](s32), [[DEF]](s32)
     %0:_(s32) = COPY $sgpr0
     %1:_(s64) = G_ANYEXT %0
 ...
@@ -29,9 +29,8 @@ body: |
     ; CHECK: liveins: $vgpr0
     ; CHECK-NEXT: {{  $}}
     ; CHECK-NEXT: [[COPY:%[0-9]+]]:vgpr(s32) = COPY $vgpr0
-    ; CHECK-NEXT: [[COPY1:%[0-9]+]]:vgpr(s32) = COPY [[COPY]](s32)
     ; CHECK-NEXT: [[DEF:%[0-9]+]]:vgpr(s32) = G_IMPLICIT_DEF
-    ; CHECK-NEXT: [[MV:%[0-9]+]]:vgpr(s64) = G_MERGE_VALUES [[COPY1]](s32), [[DEF]](s32)
+    ; CHECK-NEXT: [[MV:%[0-9]+]]:vgpr(s64) = G_MERGE_VALUES [[COPY]](s32), [[DEF]](s32)
     %0:_(s32) = COPY $vgpr0
     %1:_(s64) = G_ANYEXT %0
 ...
@@ -49,8 +48,7 @@ body: |
     ; CHECK-NEXT: [[COPY:%[0-9]+]]:sgpr(s32) = COPY $sgpr0
     ; CHECK-NEXT: [[COPY1:%[0-9]+]]:sgpr(s32) = COPY $sgpr1
     ; CHECK-NEXT: [[ICMP:%[0-9]+]]:sgpr(s32) = G_ICMP intpred(eq), [[COPY]](s32), [[COPY1]]
-    ; CHECK-NEXT: [[TRUNC:%[0-9]+]]:sgpr(s1) = G_TRUNC [[ICMP]](s32)
-    ; CHECK-NEXT: [[ANYEXT:%[0-9]+]]:sgpr(s16) = G_ANYEXT [[TRUNC]](s1)
+    ; CHECK-NEXT: [[TRUNC:%[0-9]+]]:sgpr(s16) = G_TRUNC [[ICMP]](s32)
     %0:_(s32) = COPY $sgpr0
     %1:_(s32) = COPY $sgpr1
     %2:_(s1) = G_ICMP intpred(eq), %0, %1
@@ -70,8 +68,6 @@ body: |
     ; CHECK-NEXT: [[COPY:%[0-9]+]]:sgpr(s32) = COPY $sgpr0
     ; CHECK-NEXT: [[COPY1:%[0-9]+]]:sgpr(s32) = COPY $sgpr1
     ; CHECK-NEXT: [[ICMP:%[0-9]+]]:sgpr(s32) = G_ICMP intpred(eq), [[COPY]](s32), [[COPY1]]
-    ; CHECK-NEXT: [[TRUNC:%[0-9]+]]:sgpr(s1) = G_TRUNC [[ICMP]](s32)
-    ; CHECK-NEXT: [[ANYEXT:%[0-9]+]]:sgpr(s32) = G_ANYEXT [[TRUNC]](s1)
     %0:_(s32) = COPY $sgpr0
     %1:_(s32) = COPY $sgpr1
     %2:_(s1) = G_ICMP intpred(eq), %0, %1
@@ -91,8 +87,7 @@ body: |
     ; CHECK-NEXT: [[COPY:%[0-9]+]]:sgpr(s32) = COPY $sgpr0
     ; CHECK-NEXT: [[COPY1:%[0-9]+]]:sgpr(s32) = COPY $sgpr1
     ; CHECK-NEXT: [[ICMP:%[0-9]+]]:sgpr(s32) = G_ICMP intpred(eq), [[COPY]](s32), [[COPY1]]
-    ; CHECK-NEXT: [[TRUNC:%[0-9]+]]:sgpr(s1) = G_TRUNC [[ICMP]](s32)
-    ; CHECK-NEXT: [[ANYEXT:%[0-9]+]]:sgpr(s64) = G_ANYEXT [[TRUNC]](s1)
+    ; CHECK-NEXT: [[ANYEXT:%[0-9]+]]:sgpr(s64) = G_ANYEXT [[ICMP]](s32)
     %0:_(s32) = COPY $sgpr0
     %1:_(s32) = COPY $sgpr1
     %2:_(s1) = G_ICMP intpred(eq), %0, %1
@@ -112,10 +107,9 @@ body: |
     ; CHECK-NEXT: [[COPY:%[0-9]+]]:vgpr(s32) = COPY $vgpr0
     ; CHECK-NEXT: [[COPY1:%[0-9]+]]:vgpr(s32) = COPY $vgpr1
     ; CHECK-NEXT: [[ICMP:%[0-9]+]]:vcc(s1) = G_ICMP intpred(eq), [[COPY]](s32), [[COPY1]]
-    ; CHECK-NEXT: [[C:%[0-9]+]]:vgpr(s32) = G_CONSTANT i32 1
-    ; CHECK-NEXT: [[C1:%[0-9]+]]:vgpr(s32) = G_CONSTANT i32 0
-    ; CHECK-NEXT: [[SELECT:%[0-9]+]]:vgpr(s32) = G_SELECT [[ICMP]](s1), [[C]], [[C1]]
-    ; CHECK-NEXT: [[TRUNC:%[0-9]+]]:vgpr(s16) = G_TRUNC [[SELECT]](s32)
+    ; CHECK-NEXT: [[C:%[0-9]+]]:vgpr(s16) = G_CONSTANT i16 1
+    ; CHECK-NEXT: [[C1:%[0-9]+]]:vgpr(s16) = G_CONSTANT i16 0
+    ; CHECK-NEXT: [[SELECT:%[0-9]+]]:vgpr(s16) = G_SELECT [[ICMP]](s1), [[C]], [[C1]]
     %0:_(s32) = COPY $vgpr0
     %1:_(s32) = COPY $vgpr1
     %2:_(s1) = G_ICMP intpred(eq), %0, %1
@@ -160,8 +154,7 @@ body: |
     ; CHECK-NEXT: [[C:%[0-9]+]]:vgpr(s32) = G_CONSTANT i32 1
     ; CHECK-NEXT: [[C1:%[0-9]+]]:vgpr(s32) = G_CONSTANT i32 0
     ; CHECK-NEXT: [[SELECT:%[0-9]+]]:vgpr(s32) = G_SELECT [[ICMP]](s1), [[C]], [[C1]]
-    ; CHECK-NEXT: [[DEF:%[0-9]+]]:vgpr(s32) = G_IMPLICIT_DEF
-    ; CHECK-NEXT: [[MV:%[0-9]+]]:vgpr(s64) = G_MERGE_VALUES [[SELECT]](s32), [[DEF]](s32)
+    ; CHECK-NEXT: [[MV:%[0-9]+]]:vgpr(s64) = G_MERGE_VALUES [[SELECT]](s32), [[C1]](s32)
     %0:_(s32) = COPY $vgpr0
     %1:_(s32) = COPY $vgpr1
     %2:_(s1) = G_ICMP intpred(eq), %0, %1
@@ -179,8 +172,7 @@ body: |
     ; CHECK: liveins: $sgpr0
     ; CHECK-NEXT: {{  $}}
     ; CHECK-NEXT: [[COPY:%[0-9]+]]:sgpr(s32) = COPY $sgpr0
-    ; CHECK-NEXT: [[TRUNC:%[0-9]+]]:sgpr(s1) = G_TRUNC [[COPY]](s32)
-    ; CHECK-NEXT: [[ANYEXT:%[0-9]+]]:sgpr(s16) = G_ANYEXT [[TRUNC]](s1)
+    ; CHECK-NEXT: [[TRUNC:%[0-9]+]]:sgpr(s16) = G_TRUNC [[COPY]](s32)
     %0:_(s32) = COPY $sgpr0
     %1:_(s1) = G_TRUNC %0
     %2:_(s16) = G_ANYEXT %1
@@ -197,8 +189,6 @@ body: |
     ; CHECK: liveins: $sgpr0
     ; CHECK-NEXT: {{  $}}
     ; CHECK-NEXT: [[COPY:%[0-9]+]]:sgpr(s32) = COPY $sgpr0
-    ; CHECK-NEXT: [[TRUNC:%[0-9]+]]:sgpr(s1) = G_TRUNC [[COPY]](s32)
-    ; CHECK-NEXT: [[ANYEXT:%[0-9]+]]:sgpr(s32) = G_ANYEXT [[TRUNC]](s1)
     %0:_(s32) = COPY $sgpr0
     %1:_(s1) = G_TRUNC %0
     %2:_(s32) = G_ANYEXT %1
@@ -215,8 +205,7 @@ body: |
     ; CHECK: liveins: $sgpr0
     ; CHECK-NEXT: {{  $}}
     ; CHECK-NEXT: [[COPY:%[0-9]+]]:sgpr(s32) = COPY $sgpr0
-    ; CHECK-NEXT: [[TRUNC:%[0-9]+]]:sgpr(s1) = G_TRUNC [[COPY]](s32)
-    ; CHECK-NEXT: [[ANYEXT:%[0-9]+]]:sgpr(s64) = G_ANYEXT [[TRUNC]](s1)
+    ; CHECK-NEXT: [[ANYEXT:%[0-9]+]]:sgpr(s64) = G_ANYEXT [[COPY]](s32)
     %0:_(s32) = COPY $sgpr0
     %1:_(s1) = G_TRUNC %0
     %2:_(s64) = G_ANYEXT %1
@@ -233,8 +222,13 @@ body: |
     ; CHECK: liveins: $vgpr0
     ; CHECK-NEXT: {{  $}}
     ; CHECK-NEXT: [[COPY:%[0-9]+]]:vgpr(s32) = COPY $vgpr0
-    ; CHECK-NEXT: [[TRUNC:%[0-9]+]]:vgpr(s1) = G_TRUNC [[COPY]](s32)
-    ; CHECK-NEXT: [[ANYEXT:%[0-9]+]]:vgpr(s16) = G_ANYEXT [[TRUNC]](s1)
+    ; CHECK-NEXT: [[C:%[0-9]+]]:vgpr(s32) = G_CONSTANT i32 1
+    ; CHECK-NEXT: [[AND:%[0-9]+]]:vgpr(s32) = G_AND [[COPY]], [[C]]
+    ; CHECK-NEXT: [[C1:%[0-9]+]]:vgpr(s32) = G_CONSTANT i32 0
+    ; CHECK-NEXT: [[ICMP:%[0-9]+]]:vcc(s1) = G_ICMP intpred(ne), [[AND]](s32), [[C1]]
+    ; CHECK-NEXT: [[C2:%[0-9]+]]:vgpr(s16) = G_CONSTANT i16 1
+    ; CHECK-NEXT: [[C3:%[0-9]+]]:vgpr(s16) = G_CONSTANT i16 0
+    ; CHECK-NEXT: [[SELECT:%[0-9]+]]:vgpr(s16) = G_SELECT [[ICMP]](s1), [[C2]], [[C3]]
     %0:_(s32) = COPY $vgpr0
     %1:_(s1) = G_TRUNC %0
     %2:_(s16) = G_ANYEXT %1
@@ -251,8 +245,11 @@ body: |
     ; CHECK: liveins: $vgpr0
     ; CHECK-NEXT: {{  $}}
     ; CHECK-NEXT: [[COPY:%[0-9]+]]:vgpr(s32) = COPY $vgpr0
-    ; CHECK-NEXT: [[TRUNC:%[0-9]+]]:vgpr(s1) = G_TRUNC [[COPY]](s32)
-    ; CHECK-NEXT: [[ANYEXT:%[0-9]+]]:vgpr(s32) = G_ANYEXT [[TRUNC]](s1)
+    ; CHECK-NEXT: [[C:%[0-9]+]]:vgpr(s32) = G_CONSTANT i32 1
+    ; CHECK-NEXT: [[AND:%[0-9]+]]:vgpr(s32) = G_AND [[COPY]], [[C]]
+    ; CHECK-NEXT: [[C1:%[0-9]+]]:vgpr(s32) = G_CONSTANT i32 0
+    ; CHECK-NEXT: [[ICMP:%[0-9]+]]:vcc(s1) = G_ICMP intpred(ne), [[AND]](s32), [[C1]]
+    ; CHECK-NEXT: [[SELECT:%[0-9]+]]:vgpr(s32) = G_SELECT [[ICMP]](s1), [[C]], [[C1]]
     %0:_(s32) = COPY $vgpr0
     %1:_(s1) = G_TRUNC %0
     %2:_(s32) = G_ANYEXT %1
@@ -269,10 +266,12 @@ body: |
     ; CHECK: liveins: $vgpr0
     ; CHECK-NEXT: {{  $}}
     ; CHECK-NEXT: [[COPY:%[0-9]+]]:vgpr(s32) = COPY $vgpr0
-    ; CHECK-NEXT: [[TRUNC:%[0-9]+]]:vgpr(s1) = G_TRUNC [[COPY]](s32)
-    ; CHECK-NEXT: [[ANYEXT:%[0-9]+]]:vgpr(s32) = G_ANYEXT [[TRUNC]](s1)
-    ; CHECK-NEXT: [[DEF:%[0-9]+]]:vgpr(s32) = G_IMPLICIT_DEF
-    ; CHECK-NEXT: [[MV:%[0-9]+]]:vgpr(s64) = G_MERGE_VALUES [[ANYEXT]](s32), [[DEF]](s32)
+    ; CHECK-NEXT: [[C:%[0-9]+]]:vgpr(s32) = G_CONSTANT i32 1
+    ; CHECK-NEXT: [[AND:%[0-9]+]]:vgpr(s32) = G_AND [[COPY]], [[C]]
+    ; CHECK-NEXT: [[C1:%[0-9]+]]:vgpr(s32) = G_CONSTANT i32 0
+    ; CHECK-NEXT: [[ICMP:%[0-9]+]]:vcc(s1) = G_ICMP intpred(ne), [[AND]](s32), [[C1]]
+    ; CHECK-NEXT: [[SELECT:%[0-9]+]]:vgpr(s32) = G_SELECT [[ICMP]](s1), [[C]], [[C1]]
+    ; CHECK-NEXT: [[MV:%[0-9]+]]:vgpr(s64) = G_MERGE_VALUES [[SELECT]](s32), [[C1]](s32)
     %0:_(s32) = COPY $vgpr0
     %1:_(s1) = G_TRUNC %0
     %2:_(s64) = G_ANYEXT %1
diff --git a/llvm/test/CodeGen/AMDGPU/GlobalISel/regbankselect-sext.mir b/llvm/test/CodeGen/AMDGPU/GlobalISel/regbankselect-sext.mir
index 1e0ba2e79b82e..7378c9366ec36 100644
--- a/llvm/test/CodeGen/AMDGPU/GlobalISel/regbankselect-sext.mir
+++ b/llvm/test/CodeGen/AMDGPU/GlobalISel/regbankselect-sext.mir
@@ -1,6 +1,5 @@
 # NOTE: Assertions have been autogenerated by utils/update_mir_test_checks.py
-# RUN: llc -mtriple=amdgcn -mcpu=fiji -run-pass=regbankselect %s -verify-machineinstrs -o - -regbankselect-fast | FileCheck %s
-# RUN: llc -mtriple=amdgcn -mcpu=fiji -run-pass=regbankselect %s -verify-machineinstrs -o - -regbankselect-greedy | FileCheck %s
+# RUN: llc -mtriple=amdgcn -mcpu=fiji -run-pass="amdgpu-regbankselect,amdgpu-regbanklegalize" %s -verify-machineinstrs -o - | FileCheck %s
 
 ---
 name: sext_s32_to_s64_s
@@ -13,7 +12,9 @@ body: |
     ; CHECK: liveins: $sgpr0
     ; CHECK-NEXT: {{  $}}
     ; CHECK-NEXT: [[COPY:%[0-9]+]]:sgpr(s32) = COPY $sgpr0
-    ; CHECK-NEXT: [[SEXT:%[0-9]+]]:sgpr(s64) = G_SEXT [[COPY]](s32)
+    ; CHECK-NEXT: [[C:%[0-9]+]]:sgpr(s32) = G_CONSTANT i32 31
+    ; CHECK-NEXT: [[ASHR:%[0-9]+]]:sgpr(s32) = G_ASHR [[COPY]], [[C]](s32)
+    ; CHECK-NEXT: [[MV:%[0-9]+]]:sgpr(s64) = G_MERGE_VALUES [[COPY]](s32), [[ASHR]](s32)
     %0:_(s32) = COPY $sgpr0
     %1:_(s64) = G_SEXT %0
 ...
@@ -30,7 +31,10 @@ body: |
     ; CHECK-NEXT: {{  $}}
     ; CHECK-NEXT: [[COPY:%[0-9]+]]:sgpr(s32) = COPY $sgpr0
     ; CHECK-NEXT: [[TRUNC:%[0-9]+]]:sgpr(s16) = G_TRUNC [[COPY]](s32)
-    ; CHECK-NEXT: [[SEXT:%[0-9]+]]:sgpr(s64) = G_SEXT [[TRUNC]](s16)
+    ; CHECK-NEXT: [[SEXT:%[0-9]+]]:sgpr(s32) = G_SEXT [[TRUNC]](s16)
+    ; CHECK-NEXT: [[C:%[0-9]+]]:sgpr(s32) = G_CONSTANT i32 31
+    ; CHECK-NEXT: [[ASHR:%[0-9]+]]:sgpr(s32) = G_ASHR [[SEXT]], [[C]](s32)
+    ; CHECK-NEXT: [[MV:%[0-9]+]]:sgpr(s64) = G_MERGE_VALUES [[SEXT]](s32), [[ASHR]](s32)
     %0:_(s32) = COPY $sgpr0
     %1:_(s16) = G_TRUNC %0
     %2:_(s64) = G_SEXT %1
@@ -47,10 +51,9 @@ body: |
     ; CHECK: liveins: $vgpr0_vgpr1
     ; CHECK-NEXT: {{  $}}
     ; CHECK-NEXT: [[COPY:%[0-9]+]]:vgpr(s32) = COPY $vgpr0
-    ; CHECK-NEXT: [[COPY1:%[0-9]+]]:vgpr(s32) = COPY [[COPY]](s32)
     ; CHECK-NEXT: [[C:%[0-9]+]]:vgpr(s32) = G_CONSTANT i32 31
-    ; CHECK-NEXT: [[ASHR:%[0-9]+]]:vgpr(s32) = G_ASHR [[COPY1]], [[C]](s32)
-    ; CHECK-NEXT: [[MV:%[0-9]+]]:vgpr(s64) = G_MERGE_VALUES [[COPY1]](s32), [[ASHR]](s32)
+    ; CHECK-NEXT: [[ASHR:%[0-9]+]]:vgpr(s32) = G_ASHR [[COPY]], [[C]](s32)
+    ; CHECK-NEXT: [[MV:%[0-9]+]]:vgpr(s64) = G_MERGE_VALUES [[COPY]](s32), [[ASHR]](s32)
     %0:_(s32) = COPY $vgpr0
     %1:_(s64) = G_SEXT %0
 ...
@@ -68,8 +71,12 @@ body: |
     ; CHECK-...
[truncated]

.Any({{{DivS16, S1}}, {{Vgpr16}, {Vcc}, VccExtToSel}})
.Any({{{DivS32, S1}}, {{Vgpr32}, {Vcc}, VccExtToSel}})
.Any({{{DivS64, S1}}, {{Vgpr64}, {Vcc}, VccExtToSel}})
.Any({{UniS64, S32}, {{Sgpr64}, {Sgpr32}, Ext32To64}})
Contributor

unrelated to the patch: These should be better documented, otherwise it's very hard to read what's actually happening here. I had to go find 2 different struct signatures before getting an idea of what these lines do.

A small comment on top of RegBankLegalizeRules that explains how many braces are needed and how the arguments are laid out could go a long way.

I also feel like we could eliminate one or even two sets of braces by just making them arguments, further helping readability. It could just be an overload that's preferred when manually writing the rules, and keep the current signature if we're pushing rules using a loop or something?

Collaborator Author

Probably could improve this one a bit. Originally I wanted to keep the rules as one-liners. There are Uni and Div variants that are specialized and have fewer braces, and I think almost all remaining opcodes use them.

Member

+1 on the need for documentation: It's hard to follow which of the parts serve, e.g., as patterns, replacements, or asserts.

Collaborator Author

@petar-avramovic petar-avramovic Mar 27, 2025


Quick explanation for now:
.Any({{DivS32, S1}, {{Vgpr32}, {Vcc}, VccExtToSel}}) — the first list is the predicate: it checks uniformity/divergence and the types of the operands. Usually one entry is enough (just the dst), but here we check for a divergent S32 dst and an S1 source.
There is also a place for a C++ check here (see loads).

.Any({{DivS32, S1}, {{Vgpr32}, {Vcc}, VccExtToSel}}) — next is the list of which register bank to apply to the dst registers (see RegBankLegalizeHelper for details).

.Any({{DivS32, S1}, {{Vgpr32}, {Vcc}, VccExtToSel}}) — then the list of which register bank to apply to the source registers.

.Any({{DivS32, S1}, {{Vgpr32}, {Vcc}, VccExtToSel}}) — last is the ID of a more complicated lowering method; this one, for example, transforms G_ANYEXT into G_SELECT.

There is a shorter, faster version when checking just the dst operand. For example,
.Div(S32, {{Vgpr32}, {Vgpr32, Vgpr32, Vgpr32}})
is equivalent to
.Any({DivS32}, {{Vgpr32}, {Vgpr32, Vgpr32, Vgpr32}})

In the first list you don't have to check all operands — check just enough to decide what to do. The second two lists (for destination and source operands) need to cover all operands.
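To make the VccExtToSel/UniExtToSel lowering concrete, here is a small Python model (illustrative only, not the actual helper) of the select-of-constants the boolean extend becomes: ZExt selects between 1 and 0, SExt between all-ones (-1 at the result width) and 0.

```python
def ext_to_sel(bit: bool, kind: str, width: int) -> int:
    """Model of VccExtToSel/UniExtToSel: an extend of an S1 value
    becomes a G_SELECT between two constants of the result width."""
    all_ones = (1 << width) - 1
    if kind == "zext":
        true_val, false_val = 1, 0
    elif kind == "sext":
        true_val, false_val = all_ones, 0  # all_ones is -1 at this width
    else:
        raise ValueError(kind)
    return true_val if bit else false_val

assert ext_to_sel(True, "sext", 32) == 0xFFFFFFFF
assert ext_to_sel(True, "zext", 16) == 1
assert ext_to_sel(False, "sext", 64) == 0
```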

Collaborator

@nhaehnle nhaehnle left a comment

I didn't look at everything, I just went through some of the tests.

Comment on lines +15 to +16
; CHECK-NEXT: [[DEF:%[0-9]+]]:sgpr(s32) = G_IMPLICIT_DEF
; CHECK-NEXT: [[MV:%[0-9]+]]:sgpr(s64) = G_MERGE_VALUES [[COPY]](s32), [[DEF]](s32)
Collaborator

Why are we legalizing this G_ANYEXT to G_MERGE_VALUES, but in anyext_s1_to_s64_scc we generate a new G_ANYEXT?

Collaborator Author

That was the combiner; fixed.

@@ -160,8 +154,7 @@ body: |
; CHECK-NEXT: [[C:%[0-9]+]]:vgpr(s32) = G_CONSTANT i32 1
; CHECK-NEXT: [[C1:%[0-9]+]]:vgpr(s32) = G_CONSTANT i32 0
; CHECK-NEXT: [[SELECT:%[0-9]+]]:vgpr(s32) = G_SELECT [[ICMP]](s1), [[C]], [[C1]]
; CHECK-NEXT: [[DEF:%[0-9]+]]:vgpr(s32) = G_IMPLICIT_DEF
; CHECK-NEXT: [[MV:%[0-9]+]]:vgpr(s64) = G_MERGE_VALUES [[SELECT]](s32), [[DEF]](s32)
; CHECK-NEXT: [[MV:%[0-9]+]]:vgpr(s64) = G_MERGE_VALUES [[SELECT]](s32), [[C1]](s32)
Collaborator

This change is a code quality regression: the input has G_ANYEXT, so the high half can be undefined.

@@ -215,8 +205,7 @@ body: |
; CHECK: liveins: $sgpr0
; CHECK-NEXT: {{ $}}
; CHECK-NEXT: [[COPY:%[0-9]+]]:sgpr(s32) = COPY $sgpr0
; CHECK-NEXT: [[TRUNC:%[0-9]+]]:sgpr(s1) = G_TRUNC [[COPY]](s32)
; CHECK-NEXT: [[ANYEXT:%[0-9]+]]:sgpr(s64) = G_ANYEXT [[TRUNC]](s1)
; CHECK-NEXT: [[ANYEXT:%[0-9]+]]:sgpr(s64) = G_ANYEXT [[COPY]](s32)
Collaborator

Isn't this a correctness regression? I'm not entirely certain because I remember there was some weirdness around what G_TRUNC means semantically. Can you explain why there is no need for a trunc or bitwise and or something like that in the output?

Note that anyext_s1_to_s32_vgpr does leave a G_AND, so either that test shows a code quality issue or this test is incorrect.

Comment on lines +225 to +234
; CHECK-NEXT: [[C:%[0-9]+]]:vgpr(s32) = G_CONSTANT i32 1
; CHECK-NEXT: [[AND:%[0-9]+]]:vgpr(s32) = G_AND [[COPY]], [[C]]
; CHECK-NEXT: [[C1:%[0-9]+]]:vgpr(s32) = G_CONSTANT i32 0
; CHECK-NEXT: [[ICMP:%[0-9]+]]:vcc(s1) = G_ICMP intpred(ne), [[AND]](s32), [[C1]]
; CHECK-NEXT: [[C2:%[0-9]+]]:vgpr(s16) = G_CONSTANT i16 1
; CHECK-NEXT: [[C3:%[0-9]+]]:vgpr(s16) = G_CONSTANT i16 0
; CHECK-NEXT: [[SELECT:%[0-9]+]]:vgpr(s16) = G_SELECT [[ICMP]](s1), [[C2]], [[C3]]
Collaborator

This is unnecessarily convoluted. A single G_TRUNC should do the trick. (Isn't that something a combiner could do?)

Collaborator Author

I think there is not much we can do here. The test by itself is not appropriate for regbankselect, considering our rules in the legalizer. Input like this should not reach regbankselect; it should have been picked up by the basic trunc + anyext combine from the artifact combiner in the legalizer:
%0:_(s32) = COPY $vgpr0
%2:_(s16) = G_TRUNC %0
I see no reason to try to fix this, as it would require running the combiner before regbanklegalize.

Comment on lines 247 to +255
; CHECK-NEXT: [[COPY:%[0-9]+]]:vgpr(s32) = COPY $vgpr0
; CHECK-NEXT: [[TRUNC:%[0-9]+]]:vgpr(s1) = G_TRUNC [[COPY]](s32)
; CHECK-NEXT: [[ANYEXT:%[0-9]+]]:vgpr(s32) = G_ANYEXT [[TRUNC]](s1)
; CHECK-NEXT: [[C:%[0-9]+]]:vgpr(s32) = G_CONSTANT i32 1
; CHECK-NEXT: [[AND:%[0-9]+]]:vgpr(s32) = G_AND [[COPY]], [[C]]
; CHECK-NEXT: [[C1:%[0-9]+]]:vgpr(s32) = G_CONSTANT i32 0
; CHECK-NEXT: [[ICMP:%[0-9]+]]:vcc(s1) = G_ICMP intpred(ne), [[AND]](s32), [[C1]]
; CHECK-NEXT: [[SELECT:%[0-9]+]]:vgpr(s32) = G_SELECT [[ICMP]](s1), [[C]], [[C1]]
Collaborator

Similar here: This could be combined down to just a no-op -- don't combiners do that already? They should, and so this should probably not be handled separately by legalization....

Collaborator Author

@petar-avramovic petar-avramovic Mar 27, 2025

This could be combined down to just a no-op -- don't combiners do that already?

They do, I just did not want to delete any old tests

Comment on lines 269 to 274
; CHECK-NEXT: [[C:%[0-9]+]]:vgpr(s32) = G_CONSTANT i32 1
; CHECK-NEXT: [[AND:%[0-9]+]]:vgpr(s32) = G_AND [[COPY]], [[C]]
; CHECK-NEXT: [[C1:%[0-9]+]]:vgpr(s32) = G_CONSTANT i32 0
; CHECK-NEXT: [[ICMP:%[0-9]+]]:vcc(s1) = G_ICMP intpred(ne), [[AND]](s32), [[C1]]
; CHECK-NEXT: [[SELECT:%[0-9]+]]:vgpr(s32) = G_SELECT [[ICMP]](s1), [[C]], [[C1]]
; CHECK-NEXT: [[MV:%[0-9]+]]:vgpr(s64) = G_MERGE_VALUES [[SELECT]](s32), [[C1]](s32)
Collaborator

Could just be a single G_ANYEXT

Collaborator Author

Again, a job for the artifact combiner from the legalizer. We need to lower the vgpr-to-vcc trunc first.

@petar-avramovic petar-avramovic force-pushed the users/petar-avramovic/extends branch from 2ac46e4 to fdca6f3 Compare March 27, 2025 14:42
@petar-avramovic petar-avramovic force-pushed the users/petar-avramovic/andorxor branch 2 times, most recently from 4e74a7e to 54ff035 Compare March 27, 2025 15:04
@petar-avramovic petar-avramovic force-pushed the users/petar-avramovic/extends branch from fdca6f3 to 55cb190 Compare March 27, 2025 15:04
@petar-avramovic petar-avramovic force-pushed the users/petar-avramovic/extends branch from 55cb190 to c57e522 Compare April 14, 2025 14:38
@petar-avramovic petar-avramovic force-pushed the users/petar-avramovic/andorxor branch from 54ff035 to ba48a1a Compare April 14, 2025 14:38
@@ -179,8 +174,7 @@ body: |
; CHECK: liveins: $sgpr0
Collaborator Author:

[Re: line +159]

> This change is a code quality regression: the input has G_ANYEXT, so the high half can be undefined.

Fixed.

; CHECK-NEXT: [[TRUNC:%[0-9]+]]:sgpr(s1) = G_TRUNC [[COPY]](s32)
; CHECK-NEXT: [[ANYEXT:%[0-9]+]]:sgpr(s64) = G_ANYEXT [[TRUNC]](s1)
; CHECK-NEXT: [[DEF:%[0-9]+]]:sgpr(s32) = G_IMPLICIT_DEF
; CHECK-NEXT: [[MV:%[0-9]+]]:sgpr(s64) = G_MERGE_VALUES [[COPY]](s32), [[DEF]](s32)
Collaborator Author:

Not sure if this is the correct spot for the original comment, but:

> Isn't this a correctness regression? I'm not entirely certain because I remember there was some weirdness around what G_TRUNC means semantically. Can you explain why there is no need for a trunc or bitwise and or something like that in the output?

G_TRUNC and G_ANYEXT are no-ops, except when one operand is vcc. Here we have a uniform S1, so trunc + anyext is a no-op.
Trunc to vcc is: clear the high bits, then compare.
Anyext from vcc is a select.

> Note that anyext_s1_to_s32_vgpr does leave a G_AND, so either that test shows a code quality issue or this test is incorrect.

In anyext_s1_to_s32_vgpr we need to lower the vgpr trunc to vcc; the G_AND comes from clearing the high bits for the icmp.
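To make the value-level semantics concrete, here is a hedged sketch of what the lowered sequences compute. The function names are made up for illustration; they are not the RegBankLegalize helpers, just plain C++ mirroring the G_AND + G_ICMP and G_SELECT patterns in the checks above.

```cpp
#include <cstdint>

// G_TRUNC vgpr(s32) -> vcc(s1): clear the high bits with G_AND, then
// G_ICMP(ne, ..., 0) -- the and+icmp pair from the CHECK lines above.
bool truncS32ToVcc(uint32_t X) { return (X & 1u) != 0u; }

// G_ZEXT vcc(s1) -> vgpr(s32): G_SELECT between the constants 1 and 0.
uint32_t zextVccToS32(bool B) { return B ? 1u : 0u; }

// G_SEXT vcc(s1) -> vgpr(s32): the same select, but the true value is -1.
uint32_t sextVccToS32(bool B) { return B ? 0xFFFFFFFFu : 0u; }
```

Note that only the low bit of the s32 source is significant for the trunc, which is exactly why the high bits must be cleared before the compare.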

@petar-avramovic petar-avramovic force-pushed the users/petar-avramovic/extends branch from c57e522 to 61c28d2 Compare April 24, 2025 10:03
@petar-avramovic petar-avramovic force-pushed the users/petar-avramovic/andorxor branch from ba48a1a to 5cc4f82 Compare April 24, 2025 10:03
Comment on lines 143 to 144
}
if (Ty == S64) {
Contributor:

Suggested change
}
if (Ty == S64) {
} else if (Ty == S64) {

You could also add a final `else` with `llvm_unreachable`.

Register Src = MI.getOperand(1).getReg();
unsigned Opc = MI.getOpcode();
if (Ty == S32 || Ty == S16) {
auto True = B.buildConstant({VgprRB, Ty}, Opc == G_SEXT ? -1 : 1);
Contributor:

The `Opc == G_SEXT ? -1 : 1` can go in a variable and be reused in both cases to remove the repetition.

@petar-avramovic petar-avramovic force-pushed the users/petar-avramovic/andorxor branch from 5cc4f82 to e6b6d99 Compare May 8, 2025 10:14
@petar-avramovic petar-avramovic force-pushed the users/petar-avramovic/extends branch from 61c28d2 to 915bee3 Compare May 8, 2025 10:14
@petar-avramovic petar-avramovic force-pushed the users/petar-avramovic/extends branch from 915bee3 to 649adb1 Compare May 22, 2025 12:18
@petar-avramovic petar-avramovic force-pushed the users/petar-avramovic/andorxor branch from e6b6d99 to 5b98486 Compare May 22, 2025 12:18
Collaborator Author (@petar-avramovic):

ping

break;
}
case AMDGPU::G_ANYEXT: {
Hi = B.buildUndef({RB, S32});
Contributor:

Use poison here, though I'm not sure the poison PR has been merged yet.

Collaborator Author:

This one: #127825?

Contributor:

yes

Hi = False;
break;
case G_ANYEXT:
Hi = B.buildUndef({VgprRB_S32});
Contributor:

poison, but not sure the poison PR got merged yet

auto True = B.buildConstant({VgprRB, Ty}, TrueExtCst);
auto False = B.buildConstant({VgprRB, Ty}, 0);
B.buildSelect(Dst, Src, True, False);
} else if (Ty == S64) {
Contributor:

What about pointers and vectors?

Collaborator Author:

This is used for S1 {S|Z|A}EXT to S16/S32/S64. IIRC pointers are not legal here, and vector types are scalarized in the legalizer.

LLT Ty = MRI.getType(Dst);
Register Src = MI.getOperand(1).getReg();
unsigned Opc = MI.getOpcode();
int TrueExtCst = (Opc == G_SEXT ? -1 : 1);
Contributor:

Suggested change
int TrueExtCst = (Opc == G_SEXT ? -1 : 1);
int TrueExtCst = Opc == G_SEXT ? -1 : 1;

@petar-avramovic petar-avramovic force-pushed the users/petar-avramovic/extends branch from 649adb1 to ffc8507 Compare May 22, 2025 15:10
@petar-avramovic petar-avramovic force-pushed the users/petar-avramovic/andorxor branch from 5b98486 to c08c9f0 Compare May 22, 2025 15:10
@petar-avramovic petar-avramovic force-pushed the users/petar-avramovic/andorxor branch from c08c9f0 to 4b82d75 Compare May 26, 2025 09:56
@petar-avramovic petar-avramovic force-pushed the users/petar-avramovic/extends branch from ffc8507 to 5083994 Compare May 26, 2025 09:56
@petar-avramovic petar-avramovic force-pushed the users/petar-avramovic/andorxor branch from 4b82d75 to 7074bc4 Compare May 26, 2025 10:00
@petar-avramovic petar-avramovic force-pushed the users/petar-avramovic/extends branch from 5083994 to 9716154 Compare May 26, 2025 10:01
Collaborator Author (@petar-avramovic) commented May 26, 2025:

Merge activity

  • May 26, 10:03 AM UTC: A user started a stack merge that includes this pull request via Graphite.
  • May 26, 10:08 AM UTC: Graphite rebased this pull request as part of a merge.
  • May 26, 10:10 AM UTC: @petar-avramovic merged this pull request with Graphite.

@petar-avramovic petar-avramovic force-pushed the users/petar-avramovic/andorxor branch from 7074bc4 to f4ded85 Compare May 26, 2025 10:05
Base automatically changed from users/petar-avramovic/andorxor to main May 26, 2025 10:07
Uniform S1:
Truncs to uniform S1 and AnyExts from S1 are left as is as they are meant
to be combined away. Uniform S1 ZExt and SExt are lowered using select.
Divergent S1:
Trunc of VGPR to VCC is lowered as compare.
Extends of VCC are lowered using select.

For remaining types:
S32 to S64 ZExt and SExt are lowered using merge values, AnyExt and Trunc
are again left as is to be combined away.
Notably, uniform S16 SExt and ZExt are not lowered to S32 and are left as is
for instruction selection to deal with. This is because there are patterns
that check for the S16 type.
@petar-avramovic petar-avramovic force-pushed the users/petar-avramovic/extends branch from 9716154 to dc8b75b Compare May 26, 2025 10:08
@petar-avramovic petar-avramovic merged commit 66915b5 into main May 26, 2025
5 of 7 checks passed
@petar-avramovic petar-avramovic deleted the users/petar-avramovic/extends branch May 26, 2025 10:10