Skip to content

Navigation Menu

Sign in
Appearance settings

Search code, repositories, users, issues, pull requests...

Provide feedback

We read every piece of feedback, and take your input very seriously.

Saved searches

Use saved searches to filter your results more quickly

Appearance settings

[MachineScheduler][AMDGPU] Allow scheduling of single-MI regions #128739

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 5 commits into from
Feb 27, 2025

Conversation

lucas-rami
Copy link
Contributor

@lucas-rami lucas-rami commented Feb 25, 2025

The MI scheduler skips regions containing a single MI during scheduling. This can prevent targets that perform multi-stage scheduling and move MIs between regions during some stages to reason correctly about the entire IR, since some MIs will not be assigned to a region at the beginning.

This makes the machine scheduler no longer skip single-MI regions. Only a few unit tests are affected (mainly those which check for the scheduler's debug output).

@lucas-rami lucas-rami added backend:AMDGPU mi-sched machine instruction scheduler labels Feb 25, 2025
@llvmbot
Copy link
Member

llvmbot commented Feb 25, 2025

@llvm/pr-subscribers-backend-powerpc
@llvm/pr-subscribers-backend-arm

@llvm/pr-subscribers-backend-amdgpu

Author: Lucas Ramirez (lucas-rami)

Changes

The MI scheduler skips regions containing a single MI during scheduling. This can prevent targets that perform multi-stage scheduling and move MIs between regions during some stages to reason correctly about the entire IR, since some MIs will not be assigned to a region at the beginning.

This adds a flag to ScheduleDAGInstrs to tell the scheduler to not skip over single-MI regions. Only the AMDGPU target currently enables this flag, so scheduling behavior is unaffected for all other targets.


Full diff: https://github.com/llvm/llvm-project/pull/128739.diff

5 Files Affected:

  • (modified) llvm/include/llvm/CodeGen/ScheduleDAGInstrs.h (+8)
  • (modified) llvm/lib/CodeGen/MachineScheduler.cpp (+2-1)
  • (modified) llvm/lib/CodeGen/ScheduleDAGInstrs.cpp (+3-2)
  • (modified) llvm/lib/Target/AMDGPU/GCNSchedStrategy.cpp (+4-1)
  • (modified) llvm/test/CodeGen/AMDGPU/debug-value-scheduler-liveins.mir (+2)
diff --git a/llvm/include/llvm/CodeGen/ScheduleDAGInstrs.h b/llvm/include/llvm/CodeGen/ScheduleDAGInstrs.h
index aaa10e684687c..82240745c2772 100644
--- a/llvm/include/llvm/CodeGen/ScheduleDAGInstrs.h
+++ b/llvm/include/llvm/CodeGen/ScheduleDAGInstrs.h
@@ -124,6 +124,9 @@ namespace llvm {
     /// rescheduling).
     bool RemoveKillFlags;
 
+    /// True if regions with a single MI should be scheduled.
+    bool ScheduleSingleMIRegions = false;
+
     /// The standard DAG builder does not normally include terminators as DAG
     /// nodes because it does not create the necessary dependencies to prevent
     /// reordering. A specialized scheduler can override
@@ -288,6 +291,11 @@ namespace llvm {
       return Topo.IsReachable(SU, TargetSU);
     }
 
+    /// Whether regions with a single MI should be scheduled.
+    bool shouldScheduleSingleMIRegions() const {
+      return ScheduleSingleMIRegions;
+    }
+
     /// Returns an iterator to the top of the current scheduling region.
     MachineBasicBlock::iterator begin() const { return RegionBegin; }
 
diff --git a/llvm/lib/CodeGen/MachineScheduler.cpp b/llvm/lib/CodeGen/MachineScheduler.cpp
index 0da7535031a7d..b9903ee832d31 100644
--- a/llvm/lib/CodeGen/MachineScheduler.cpp
+++ b/llvm/lib/CodeGen/MachineScheduler.cpp
@@ -770,6 +770,7 @@ void MachineSchedulerBase::scheduleRegions(ScheduleDAGInstrs &Scheduler,
 
     MBBRegionsVector MBBRegions;
     getSchedRegions(&*MBB, MBBRegions, Scheduler.doMBBSchedRegionsTopDown());
+    bool ScheduleSingleMI = Scheduler.shouldScheduleSingleMIRegions();
     for (const SchedRegion &R : MBBRegions) {
       MachineBasicBlock::iterator I = R.RegionBegin;
       MachineBasicBlock::iterator RegionEnd = R.RegionEnd;
@@ -780,7 +781,7 @@ void MachineSchedulerBase::scheduleRegions(ScheduleDAGInstrs &Scheduler,
       Scheduler.enterRegion(&*MBB, I, RegionEnd, NumRegionInstrs);
 
       // Skip empty scheduling regions (0 or 1 schedulable instructions).
-      if (I == RegionEnd || I == std::prev(RegionEnd)) {
+      if (I == RegionEnd || (!ScheduleSingleMI && I == std::prev(RegionEnd))) {
         // Close the current region. Bundle the terminator if needed.
         // This invalidates 'RegionEnd' and 'I'.
         Scheduler.exitRegion();
diff --git a/llvm/lib/CodeGen/ScheduleDAGInstrs.cpp b/llvm/lib/CodeGen/ScheduleDAGInstrs.cpp
index a26804707dd1f..cd652659dfdef 100644
--- a/llvm/lib/CodeGen/ScheduleDAGInstrs.cpp
+++ b/llvm/lib/CodeGen/ScheduleDAGInstrs.cpp
@@ -116,8 +116,9 @@ ScheduleDAGInstrs::ScheduleDAGInstrs(MachineFunction &mf,
                                      bool RemoveKillFlags)
     : ScheduleDAG(mf), MLI(mli), MFI(mf.getFrameInfo()),
       RemoveKillFlags(RemoveKillFlags),
-      UnknownValue(UndefValue::get(
-                             Type::getVoidTy(mf.getFunction().getContext()))), Topo(SUnits, &ExitSU) {
+      UnknownValue(
+          UndefValue::get(Type::getVoidTy(mf.getFunction().getContext()))),
+      Topo(SUnits, &ExitSU) {
   DbgValues.clear();
 
   const TargetSubtargetInfo &ST = mf.getSubtarget();
diff --git a/llvm/lib/Target/AMDGPU/GCNSchedStrategy.cpp b/llvm/lib/Target/AMDGPU/GCNSchedStrategy.cpp
index 176586e3fbbb6..dbab18b7ae46f 100644
--- a/llvm/lib/Target/AMDGPU/GCNSchedStrategy.cpp
+++ b/llvm/lib/Target/AMDGPU/GCNSchedStrategy.cpp
@@ -759,7 +759,10 @@ GCNScheduleDAGMILive::GCNScheduleDAGMILive(
       MFI(*MF.getInfo<SIMachineFunctionInfo>()),
       StartingOccupancy(MFI.getOccupancy()), MinOccupancy(StartingOccupancy),
       RegionLiveOuts(this, /*IsLiveOut=*/true) {
-
+  // We want regions with a single MI to be scheduled so that we can reason on
+  // them correctlt during scheduling stages that move MIs between regions (e.g.
+  // rematerialization).
+  ScheduleSingleMIRegions = true;
   LLVM_DEBUG(dbgs() << "Starting occupancy is " << StartingOccupancy << ".\n");
   if (RelaxedOcc) {
     MinOccupancy = std::min(MFI.getMinAllowedOccupancy(), StartingOccupancy);
diff --git a/llvm/test/CodeGen/AMDGPU/debug-value-scheduler-liveins.mir b/llvm/test/CodeGen/AMDGPU/debug-value-scheduler-liveins.mir
index d415346b49b28..2a08c52e447ba 100644
--- a/llvm/test/CodeGen/AMDGPU/debug-value-scheduler-liveins.mir
+++ b/llvm/test/CodeGen/AMDGPU/debug-value-scheduler-liveins.mir
@@ -2,6 +2,8 @@
 # RUN: llc -mtriple=amdgcn -mcpu=gfx908 -passes=machine-scheduler %s -o - -debug-only=machine-scheduler 2>&1 | FileCheck %s
 # REQUIRES: asserts
 
+# CHECK: ********** MI Scheduling **********
+# CHECK-NEXT: test_get_liveins:%bb.0
 # CHECK: ********** MI Scheduling **********
 # CHECK-NEXT: test_get_liveins:%bb.1
 # CHECK: Region live-in pressure:  VGPRs: 1 AGPRs: 0, SGPRs: 0, LVGPR WT: 0, LSGPR WT: 0

@lucas-rami lucas-rami requested review from arsenm, bcahoon, davemgreen and jrbyrnes and removed request for arsenm and bcahoon February 25, 2025 16:32
llvm/lib/Target/AMDGPU/GCNSchedStrategy.cpp Outdated Show resolved Hide resolved
Co-authored-by: Jay Foad <jay.foad@gmail.com>
Copy link
Contributor

@arsenm arsenm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Testcase where this makes a difference?

@@ -124,6 +124,9 @@ namespace llvm {
/// rescheduling).
bool RemoveKillFlags;

/// True if regions with a single MI should be scheduled.
bool ScheduleSingleMIRegions = false;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Over-configuration? Can you just unconditionally do this?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I removed the flag and made this the generic scheduler behavior. The only unit test that breaks in a way I cannot really understand is misched-branch-targets.mir. I "fixed" it anyway, but not sure if the change break something here.

Copy link
Contributor

@shiltian shiltian left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Am I looking at the right diff here? The only code change shown here is the removal of I == std::prev(RegionEnd).

@lucas-rami
Copy link
Contributor Author

lucas-rami commented Feb 26, 2025

Testcase where this makes a difference?

I spent some time trying to come up with one but didn't manage to. This change is motivated by ongoing pre-RA rematerialization work on the AMDGPU backend (see #125885) which requires that we are able to track regions with a single MI. The current implementation follows a different logic to identify rematerializable instructions, and I can't manage to make this fail/succeed depending on whether single-MI regions are filtered out/in.

Am I looking at the right diff here? The only code change shown here is the removal of I == std::prev(RegionEnd).

A few unit tests across various targets were also updated manually, and some previous changes reverted.

llvm/lib/CodeGen/MachineScheduler.cpp Outdated Show resolved Hide resolved
Copy link
Contributor

@arsenm arsenm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I removed the comment about the flag from the message

@lucas-rami lucas-rami merged commit 15e295d into llvm:main Feb 27, 2025
6 of 10 checks passed
@nikic
Copy link
Contributor

nikic commented Feb 27, 2025

@jayfoad
Copy link
Contributor

jayfoad commented Feb 28, 2025

The MI scheduler skips regions containing a single MI during scheduling. This can prevent targets that perform multi-stage scheduling and move MIs between regions during some stages to reason correctly about the entire IR, since some MIs will not be assigned to a region at the beginning.

I thought the scheduler only ever built the DAG for one region at a time. What's the mechanism for having DAG for multiple regions live at the same time?

@lucas-rami
Copy link
Contributor Author

lucas-rami commented Feb 28, 2025

Can we not do this on targets that don't need it?

I am ok reintroducing the flag to only conditionally do this for targets that care (off-by-default).

I thought the scheduler only ever built the DAG for one region at a time. What's the mechanism for having DAG for multiple regions live at the same time?

The AMDGPU backend doesn't actually schedule in the ScheduleDAGInstrs::schedule hook, it only collects the regions there. Scheduling only happens in the ScheduleDAGInstrs::finalizeSchedule hook, once all regions have been collected. In the end it still schedule regions one at a time, but there may be some instruction movement across regions between scheduling stages.

joaosaffran pushed a commit to joaosaffran/llvm-project that referenced this pull request Mar 3, 2025
…m#128739)

The MI scheduler skips regions containing a single MI during scheduling.
This can prevent targets that perform multi-stage scheduling and move
MIs between regions during some stages to reason correctly about the
entire IR, since some MIs will not be assigned to a region at the
beginning.

This makes the machine scheduler no longer skip single-MI regions. Only
a few unit tests are affected (mainly those which check for the
scheduler's debug output).
lucas-rami added a commit that referenced this pull request Mar 4, 2025
Following 15e295d the machine scheduler no longer filters-out single-MI
regions when emitting regions to schedule. While this has no functional
impact at the moment, it generally has a negative compile-time impact
(see #128739).

Since all targets but AMDGPU do not care for this behavior, this
introduces an off-by-default flag to `ScheduleDAGInstrs` to control
whether such regions are going to be scheduled, effectively reverting
15e295d for all targets but AMDGPU (currently the only target enabling
this flag).
jph-13 pushed a commit to jph-13/llvm-project that referenced this pull request Mar 21, 2025
)

Following 15e295d the machine scheduler no longer filters-out single-MI
regions when emitting regions to schedule. While this has no functional
impact at the moment, it generally has a negative compile-time impact
(see llvm#128739).

Since all targets but AMDGPU do not care for this behavior, this
introduces an off-by-default flag to `ScheduleDAGInstrs` to control
whether such regions are going to be scheduled, effectively reverting
15e295d for all targets but AMDGPU (currently the only target enabling
this flag).
jrbyrnes pushed a commit to jrbyrnes/llvm-project that referenced this pull request May 1, 2025
)

Following 15e295d the machine scheduler no longer filters-out single-MI
regions when emitting regions to schedule. While this has no functional
impact at the moment, it generally has a negative compile-time impact
(see llvm#128739).

Since all targets but AMDGPU do not care for this behavior, this
introduces an off-by-default flag to `ScheduleDAGInstrs` to control
whether such regions are going to be scheduled, effectively reverting
15e295d for all targets but AMDGPU (currently the only target enabling
this flag).

Change-Id: Ib38f9b7e8d2bb1073cb43ed7e58eaf251ffdce48
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants
Morty Proxy This is a proxified and sanitized view of the page, visit original site.