[AMDGPU][LLVM] Improve unrolling for user-requested loop unrolling via pragma directive #140320

doru1004 · May 17, 2025

In some cases where the user requests loop unrolling via pragma unroll, the unroll pass still decides that the transformation is not profitable because the cost model is too conservative. This patch relaxes the unrolling thresholds when pragma unroll is used. The relaxation is controlled by a new UnrollingPreferences flag, RelaxPragmaUnrollThresholds, which defaults to false and is enabled here only for AMDGPU.

llvmbot · May 17, 2025

@llvm/pr-subscribers-llvm-transforms

Author: Gheorghe-Teodor Bercea (doru1004)

Changes

In some cases where the user requests loop unrolling via pragma unroll, the unroll pass still decides that the transformation is not profitable because the cost model is too conservative. This patch relaxes the unrolling thresholds when pragma unroll is used.

Full diff: https://github.com/llvm/llvm-project/pull/140320.diff

3 Files Affected:

(modified) llvm/include/llvm/Analysis/TargetTransformInfo.h (+2)
(modified) llvm/lib/Target/AMDGPU/AMDGPUTargetTransformInfo.cpp (+3)
(modified) llvm/lib/Transforms/Scalar/LoopUnrollPass.cpp (+8-2)

diff --git a/llvm/include/llvm/Analysis/TargetTransformInfo.h b/llvm/include/llvm/Analysis/TargetTransformInfo.h
index 4e2d37be3a2b2..305a5181ce3cd 100644
--- a/llvm/include/llvm/Analysis/TargetTransformInfo.h
+++ b/llvm/include/llvm/Analysis/TargetTransformInfo.h
@@ -633,6 +633,8 @@ class TargetTransformInfo {
     /// Fall back to the generic logic to determine whether multi-exit unrolling
     /// is profitable if set to false.
     bool RuntimeUnrollMultiExit;
+    // Relax conditions for unrolling when user requests unrolling via pragma.
+    bool RelaxPragmaUnrollThresholds;
   };
 
   /// Get target-customized preferences for the generic loop unrolling
diff --git a/llvm/lib/Target/AMDGPU/AMDGPUTargetTransformInfo.cpp b/llvm/lib/Target/AMDGPU/AMDGPUTargetTransformInfo.cpp
index c26726c445401..b135c58e52550 100644
--- a/llvm/lib/Target/AMDGPU/AMDGPUTargetTransformInfo.cpp
+++ b/llvm/lib/Target/AMDGPU/AMDGPUTargetTransformInfo.cpp
@@ -116,6 +116,9 @@ void AMDGPUTTIImpl::getUnrollingPreferences(
   UP.MaxCount = std::numeric_limits<unsigned>::max();
   UP.Partial = true;
 
+  // Relax conditions for unrolling when user requests unrolling via pragma.
+  UP.RelaxPragmaUnrollThresholds = true;
+
   // Conditional branch in a loop back edge needs 3 additional exec
   // manipulations in average.
   UP.BEInsns += 3;
diff --git a/llvm/lib/Transforms/Scalar/LoopUnrollPass.cpp b/llvm/lib/Transforms/Scalar/LoopUnrollPass.cpp
index d84b74dd0eecc..030fe54091ba4 100644
--- a/llvm/lib/Transforms/Scalar/LoopUnrollPass.cpp
+++ b/llvm/lib/Transforms/Scalar/LoopUnrollPass.cpp
@@ -221,6 +221,7 @@ TargetTransformInfo::UnrollingPreferences llvm::gatherUnrollingPreferences(
   UP.MaxIterationsCountToAnalyze = UnrollMaxIterationsCountToAnalyze;
   UP.SCEVExpansionBudget = SCEVCheapExpansionBudget;
   UP.RuntimeUnrollMultiExit = false;
+  UP.RelaxPragmaUnrollThresholds = false;
 
   // Override with any target specific settings
   TTI.getUnrollingPreferences(L, SE, UP, &ORE);
@@ -939,6 +940,10 @@ bool llvm::computeUnrollCount(
 
   const bool ExplicitUnroll = PragmaCount > 0 || PragmaFullUnroll ||
                               PragmaEnableUnroll || UserUnrollCount;
+  // If enabled, relax unrolling thresholds when pragma unroll is used.
+  const bool RelaxUnrollThrehsholds = UP.RelaxPragmaUnrollThresholds &&
+                                      (PragmaEnableUnroll && !UserUnrollCount &&
+                                       !PragmaFullUnroll && PragmaCount == 0);
 
   PragmaInfo PInfo(UserUnrollCount, PragmaFullUnroll, PragmaCount,
                    PragmaEnableUnroll);
@@ -967,7 +972,7 @@ bool llvm::computeUnrollCount(
     UP.Runtime |= (PragmaCount > 0);
     return ExplicitUnroll;
   } else {
-    if (ExplicitUnroll && TripCount != 0) {
+    if (RelaxUnrollThrehsholds || (ExplicitUnroll && TripCount != 0)) {
       // If the loop has an unrolling pragma, we want to be more aggressive with
       // unrolling limits. Set thresholds to at least the PragmaUnrollThreshold
       // value which is larger than the default limits.
@@ -1077,7 +1082,8 @@ bool llvm::computeUnrollCount(
   }
 
   // Don't unroll a small upper bound loop unless user or TTI asked to do so.
-  if (MaxTripCount && !UP.Force && MaxTripCount < UP.MaxUpperBound) {
+  if (!RelaxUnrollThrehsholds && MaxTripCount && !UP.Force &&
+      MaxTripCount < UP.MaxUpperBound) {
     UP.Count = 0;
     return false;
   }
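
To make the new condition easier to follow outside the diff, here is a minimal standalone sketch of the predicate logic. It is a model only, not LLVM code: PragmaState, isExplicitUnroll, shouldRelaxThresholds, and usesPragmaThresholds are hypothetical names introduced for illustration, and the struct fields merely mirror the local variables visible in the hunks above.

```cpp
// Standalone model of the conditions touched by this patch. Not LLVM code:
// every type and function name below is hypothetical.
struct PragmaState {
  unsigned PragmaCount = 0;        // count from an unroll-count pragma, 0 if absent
  bool PragmaFullUnroll = false;   // a full-unroll pragma is present
  bool PragmaEnableUnroll = false; // an enable-unroll pragma is present
  bool UserUnrollCount = false;    // assumed: an explicit user-supplied count
};

// Mirrors ExplicitUnroll from the diff context.
bool isExplicitUnroll(const PragmaState &P) {
  return P.PragmaCount > 0 || P.PragmaFullUnroll || P.PragmaEnableUnroll ||
         P.UserUnrollCount;
}

// Mirrors the new flag: the target must opt in, and the only request present
// must be the plain enable pragma (no count, no full unroll, no user count).
bool shouldRelaxThresholds(bool TargetOptIn, const PragmaState &P) {
  return TargetOptIn && P.PragmaEnableUnroll && !P.UserUnrollCount &&
         !P.PragmaFullUnroll && P.PragmaCount == 0;
}

// Mirrors the modified guard that raises thresholds to the pragma level.
// Before the patch, a TripCount of 0 always skipped this path.
bool usesPragmaThresholds(bool TargetOptIn, const PragmaState &P,
                          unsigned TripCount) {
  return shouldRelaxThresholds(TargetOptIn, P) ||
         (isExplicitUnroll(P) && TripCount != 0);
}
```

In this model, a loop carrying only the enable pragma with a TripCount of 0 takes the pragma-threshold path only when the target has opted in. With the opt-in off, the result matches the pre-patch condition.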

llvmbot · May 17, 2025

@llvm/pr-subscribers-llvm-analysis

llvmbot · May 17, 2025

@llvm/pr-subscribers-backend-amdgpu

Improve unrolling for user-requested loop unrolling via pragma directive

b34e885

llvmbot added the backend:AMDGPU, llvm:analysis (Includes value tracking, cost tables and constant folding), and llvm:transforms labels May 17, 2025

doru1004 requested review from bcahoon and jrbyrnes May 17, 2025 00:51
