-
Notifications
You must be signed in to change notification settings - Fork 13.6k
[MachineScheduler][AMDGPU] Allow scheduling of single-MI regions #128739
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
@llvm/pr-subscribers-backend-powerpc @llvm/pr-subscribers-backend-amdgpu Author: Lucas Ramirez (lucas-rami) ChangesThe MI scheduler skips regions containing a single MI during scheduling. This can prevent targets that perform multi-stage scheduling and move MIs between regions during some stages to reason correctly about the entire IR, since some MIs will not be assigned to a region at the beginning. This adds a flag to Full diff: https://github.com/llvm/llvm-project/pull/128739.diff 5 Files Affected:
diff --git a/llvm/include/llvm/CodeGen/ScheduleDAGInstrs.h b/llvm/include/llvm/CodeGen/ScheduleDAGInstrs.h
index aaa10e684687c..82240745c2772 100644
--- a/llvm/include/llvm/CodeGen/ScheduleDAGInstrs.h
+++ b/llvm/include/llvm/CodeGen/ScheduleDAGInstrs.h
@@ -124,6 +124,9 @@ namespace llvm {
/// rescheduling).
bool RemoveKillFlags;
+ /// True if regions with a single MI should be scheduled.
+ bool ScheduleSingleMIRegions = false;
+
/// The standard DAG builder does not normally include terminators as DAG
/// nodes because it does not create the necessary dependencies to prevent
/// reordering. A specialized scheduler can override
@@ -288,6 +291,11 @@ namespace llvm {
return Topo.IsReachable(SU, TargetSU);
}
+ /// Whether regions with a single MI should be scheduled.
+ bool shouldScheduleSingleMIRegions() const {
+ return ScheduleSingleMIRegions;
+ }
+
/// Returns an iterator to the top of the current scheduling region.
MachineBasicBlock::iterator begin() const { return RegionBegin; }
diff --git a/llvm/lib/CodeGen/MachineScheduler.cpp b/llvm/lib/CodeGen/MachineScheduler.cpp
index 0da7535031a7d..b9903ee832d31 100644
--- a/llvm/lib/CodeGen/MachineScheduler.cpp
+++ b/llvm/lib/CodeGen/MachineScheduler.cpp
@@ -770,6 +770,7 @@ void MachineSchedulerBase::scheduleRegions(ScheduleDAGInstrs &Scheduler,
MBBRegionsVector MBBRegions;
getSchedRegions(&*MBB, MBBRegions, Scheduler.doMBBSchedRegionsTopDown());
+ bool ScheduleSingleMI = Scheduler.shouldScheduleSingleMIRegions();
for (const SchedRegion &R : MBBRegions) {
MachineBasicBlock::iterator I = R.RegionBegin;
MachineBasicBlock::iterator RegionEnd = R.RegionEnd;
@@ -780,7 +781,7 @@ void MachineSchedulerBase::scheduleRegions(ScheduleDAGInstrs &Scheduler,
Scheduler.enterRegion(&*MBB, I, RegionEnd, NumRegionInstrs);
// Skip empty scheduling regions (0 or 1 schedulable instructions).
- if (I == RegionEnd || I == std::prev(RegionEnd)) {
+ if (I == RegionEnd || (!ScheduleSingleMI && I == std::prev(RegionEnd))) {
// Close the current region. Bundle the terminator if needed.
// This invalidates 'RegionEnd' and 'I'.
Scheduler.exitRegion();
diff --git a/llvm/lib/CodeGen/ScheduleDAGInstrs.cpp b/llvm/lib/CodeGen/ScheduleDAGInstrs.cpp
index a26804707dd1f..cd652659dfdef 100644
--- a/llvm/lib/CodeGen/ScheduleDAGInstrs.cpp
+++ b/llvm/lib/CodeGen/ScheduleDAGInstrs.cpp
@@ -116,8 +116,9 @@ ScheduleDAGInstrs::ScheduleDAGInstrs(MachineFunction &mf,
bool RemoveKillFlags)
: ScheduleDAG(mf), MLI(mli), MFI(mf.getFrameInfo()),
RemoveKillFlags(RemoveKillFlags),
- UnknownValue(UndefValue::get(
- Type::getVoidTy(mf.getFunction().getContext()))), Topo(SUnits, &ExitSU) {
+ UnknownValue(
+ UndefValue::get(Type::getVoidTy(mf.getFunction().getContext()))),
+ Topo(SUnits, &ExitSU) {
DbgValues.clear();
const TargetSubtargetInfo &ST = mf.getSubtarget();
diff --git a/llvm/lib/Target/AMDGPU/GCNSchedStrategy.cpp b/llvm/lib/Target/AMDGPU/GCNSchedStrategy.cpp
index 176586e3fbbb6..dbab18b7ae46f 100644
--- a/llvm/lib/Target/AMDGPU/GCNSchedStrategy.cpp
+++ b/llvm/lib/Target/AMDGPU/GCNSchedStrategy.cpp
@@ -759,7 +759,10 @@ GCNScheduleDAGMILive::GCNScheduleDAGMILive(
MFI(*MF.getInfo<SIMachineFunctionInfo>()),
StartingOccupancy(MFI.getOccupancy()), MinOccupancy(StartingOccupancy),
RegionLiveOuts(this, /*IsLiveOut=*/true) {
-
+ // We want regions with a single MI to be scheduled so that we can reason on
+ // them correctlt during scheduling stages that move MIs between regions (e.g.
+ // rematerialization).
+ ScheduleSingleMIRegions = true;
LLVM_DEBUG(dbgs() << "Starting occupancy is " << StartingOccupancy << ".\n");
if (RelaxedOcc) {
MinOccupancy = std::min(MFI.getMinAllowedOccupancy(), StartingOccupancy);
diff --git a/llvm/test/CodeGen/AMDGPU/debug-value-scheduler-liveins.mir b/llvm/test/CodeGen/AMDGPU/debug-value-scheduler-liveins.mir
index d415346b49b28..2a08c52e447ba 100644
--- a/llvm/test/CodeGen/AMDGPU/debug-value-scheduler-liveins.mir
+++ b/llvm/test/CodeGen/AMDGPU/debug-value-scheduler-liveins.mir
@@ -2,6 +2,8 @@
# RUN: llc -mtriple=amdgcn -mcpu=gfx908 -passes=machine-scheduler %s -o - -debug-only=machine-scheduler 2>&1 | FileCheck %s
# REQUIRES: asserts
+# CHECK: ********** MI Scheduling **********
+# CHECK-NEXT: test_get_liveins:%bb.0
# CHECK: ********** MI Scheduling **********
# CHECK-NEXT: test_get_liveins:%bb.1
# CHECK: Region live-in pressure: VGPRs: 1 AGPRs: 0, SGPRs: 0, LVGPR WT: 0, LSGPR WT: 0
|
Co-authored-by: Jay Foad <jay.foad@gmail.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Testcase where this makes a difference?
@@ -124,6 +124,9 @@ namespace llvm { | ||
/// rescheduling). | ||
bool RemoveKillFlags; | ||
|
||
/// True if regions with a single MI should be scheduled. | ||
bool ScheduleSingleMIRegions = false; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Over-configuration? Can you just unconditionally do this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I removed the flag and made this the generic scheduler behavior. The only unit test that breaks in a way I cannot really understand is misched-branch-targets.mir
. I "fixed" it anyway, but not sure if the change break something here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Am I looking at the right diff here? The only code change shown here is the removal of I == std::prev(RegionEnd)
.
I spent some time trying to come up with one but didn't manage to. This change is motivated by ongoing pre-RA rematerialization work on the AMDGPU backend (see #125885) which requires that we are able to track regions with a single MI. The current implementation follows a different logic to identify rematerializable instructions, and I can't manage to make this fail/succeed depending on whether single-MI regions are filtered out/in.
A few unit tests across various targets were also updated manually, and some previous changes reverted. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I removed the comment about the flag from the message
This has some compile-time impact (https://llvm-compile-time-tracker.com/compare.php?from=c0b5451129bba52e33cd7957d58af897a58d14c6&to=15e295d30aa356a0ab1d83e477375cf3ef314947&stat=instructions:u). Can we not do this on targets that don't need it? |
I thought the scheduler only ever built the DAG for one region at a time. What's the mechanism for having DAG for multiple regions live at the same time? |
I am ok reintroducing the flag to only conditionally do this for targets that care (off-by-default).
The AMDGPU backend doesn't actually schedule in the |
…m#128739) The MI scheduler skips regions containing a single MI during scheduling. This can prevent targets that perform multi-stage scheduling and move MIs between regions during some stages to reason correctly about the entire IR, since some MIs will not be assigned to a region at the beginning. This makes the machine scheduler no longer skip single-MI regions. Only a few unit tests are affected (mainly those which check for the scheduler's debug output).
Following 15e295d the machine scheduler no longer filters-out single-MI regions when emitting regions to schedule. While this has no functional impact at the moment, it generally has a negative compile-time impact (see #128739). Since all targets but AMDGPU do not care for this behavior, this introduces an off-by-default flag to `ScheduleDAGInstrs` to control whether such regions are going to be scheduled, effectively reverting 15e295d for all targets but AMDGPU (currently the only target enabling this flag).
) Following 15e295d the machine scheduler no longer filters-out single-MI regions when emitting regions to schedule. While this has no functional impact at the moment, it generally has a negative compile-time impact (see llvm#128739). Since all targets but AMDGPU do not care for this behavior, this introduces an off-by-default flag to `ScheduleDAGInstrs` to control whether such regions are going to be scheduled, effectively reverting 15e295d for all targets but AMDGPU (currently the only target enabling this flag).
) Following 15e295d the machine scheduler no longer filters-out single-MI regions when emitting regions to schedule. While this has no functional impact at the moment, it generally has a negative compile-time impact (see llvm#128739). Since all targets but AMDGPU do not care for this behavior, this introduces an off-by-default flag to `ScheduleDAGInstrs` to control whether such regions are going to be scheduled, effectively reverting 15e295d for all targets but AMDGPU (currently the only target enabling this flag). Change-Id: Ib38f9b7e8d2bb1073cb43ed7e58eaf251ffdce48
The MI scheduler skips regions containing a single MI during scheduling. This can prevent targets that perform multi-stage scheduling and move MIs between regions during some stages to reason correctly about the entire IR, since some MIs will not be assigned to a region at the beginning.
This makes the machine scheduler no longer skip single-MI regions. Only a few unit tests are affected (mainly those which check for the scheduler's debug output).