[AMDGPU][LLVM] Improve unrolling for user-requested loop unrolling via pragma directive #140320

Open: wants to merge 1 commit into main

doru1004
Contributor

In some cases where the user requests loop unrolling with the unroll pragma, the unroll pass still rejects the transformation as unprofitable because its cost model is too conservative. This patch relaxes the unrolling thresholds when the unroll pragma is used. The behavior is controlled by a new unrolling preference, RelaxPragmaUnrollThresholds, which is off by default and which this patch enables for AMDGPU.
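
As an illustration only, the sketch below shows the kind of loop this description appears to be about: a plain `#pragma unroll` (no count, not a full-unroll request) on a loop whose trip count is known only at run time but has a small upper bound. The function name `sumPrefix` and the masking trick are hypothetical and do not come from the patch; whether this exact loop hits the conservative path is an assumption.

```cpp
#include <cassert>

// Hypothetical helper, not taken from the patch.
// The trip count is only known at run time, but masking with 3 bounds it to
// at most 3 iterations, so the loop has a small maximum trip count.
int sumPrefix(const int *data, unsigned n) {
  assert(data != nullptr);
  const unsigned count = n & 3u;
  int sum = 0;
  // Plain unroll request: no count and no "full" keyword.
#pragma unroll
  for (unsigned i = 0; i < count; ++i)
    sum += data[i];
  return sum;
}
```

The pragma does not change the result; it only asks the compiler to unroll. Compilers that do not recognize `#pragma unroll` ignore it, possibly with a warning.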

@llvmbot
Member

llvmbot commented May 17, 2025

@llvm/pr-subscribers-llvm-transforms

Author: Gheorghe-Teodor Bercea (doru1004)

Full diff: https://github.com/llvm/llvm-project/pull/140320.diff

3 Files Affected:

  • (modified) llvm/include/llvm/Analysis/TargetTransformInfo.h (+2)
  • (modified) llvm/lib/Target/AMDGPU/AMDGPUTargetTransformInfo.cpp (+3)
  • (modified) llvm/lib/Transforms/Scalar/LoopUnrollPass.cpp (+8-2)
diff --git a/llvm/include/llvm/Analysis/TargetTransformInfo.h b/llvm/include/llvm/Analysis/TargetTransformInfo.h
index 4e2d37be3a2b2..305a5181ce3cd 100644
--- a/llvm/include/llvm/Analysis/TargetTransformInfo.h
+++ b/llvm/include/llvm/Analysis/TargetTransformInfo.h
@@ -633,6 +633,8 @@ class TargetTransformInfo {
     /// Fall back to the generic logic to determine whether multi-exit unrolling
     /// is profitable if set to false.
     bool RuntimeUnrollMultiExit;
+    // Relax conditions for unrolling when user requests unrolling via pragma.
+    bool RelaxPragmaUnrollThresholds;
   };
 
   /// Get target-customized preferences for the generic loop unrolling
diff --git a/llvm/lib/Target/AMDGPU/AMDGPUTargetTransformInfo.cpp b/llvm/lib/Target/AMDGPU/AMDGPUTargetTransformInfo.cpp
index c26726c445401..b135c58e52550 100644
--- a/llvm/lib/Target/AMDGPU/AMDGPUTargetTransformInfo.cpp
+++ b/llvm/lib/Target/AMDGPU/AMDGPUTargetTransformInfo.cpp
@@ -116,6 +116,9 @@ void AMDGPUTTIImpl::getUnrollingPreferences(
   UP.MaxCount = std::numeric_limits<unsigned>::max();
   UP.Partial = true;
 
+  // Relax conditions for unrolling when user requests unrolling via pragma.
+  UP.RelaxPragmaUnrollThresholds = true;
+
   // Conditional branch in a loop back edge needs 3 additional exec
   // manipulations in average.
   UP.BEInsns += 3;
diff --git a/llvm/lib/Transforms/Scalar/LoopUnrollPass.cpp b/llvm/lib/Transforms/Scalar/LoopUnrollPass.cpp
index d84b74dd0eecc..030fe54091ba4 100644
--- a/llvm/lib/Transforms/Scalar/LoopUnrollPass.cpp
+++ b/llvm/lib/Transforms/Scalar/LoopUnrollPass.cpp
@@ -221,6 +221,7 @@ TargetTransformInfo::UnrollingPreferences llvm::gatherUnrollingPreferences(
   UP.MaxIterationsCountToAnalyze = UnrollMaxIterationsCountToAnalyze;
   UP.SCEVExpansionBudget = SCEVCheapExpansionBudget;
   UP.RuntimeUnrollMultiExit = false;
+  UP.RelaxPragmaUnrollThresholds = false;
 
   // Override with any target specific settings
   TTI.getUnrollingPreferences(L, SE, UP, &ORE);
@@ -939,6 +940,10 @@ bool llvm::computeUnrollCount(
 
   const bool ExplicitUnroll = PragmaCount > 0 || PragmaFullUnroll ||
                               PragmaEnableUnroll || UserUnrollCount;
+  // If enabled, relax unrolling thresholds when pragma unroll is used.
+  const bool RelaxUnrollThrehsholds = UP.RelaxPragmaUnrollThresholds &&
+                                      (PragmaEnableUnroll && !UserUnrollCount &&
+                                       !PragmaFullUnroll && PragmaCount == 0);
 
   PragmaInfo PInfo(UserUnrollCount, PragmaFullUnroll, PragmaCount,
                    PragmaEnableUnroll);
@@ -967,7 +972,7 @@ bool llvm::computeUnrollCount(
     UP.Runtime |= (PragmaCount > 0);
     return ExplicitUnroll;
   } else {
-    if (ExplicitUnroll && TripCount != 0) {
+    if (RelaxUnrollThrehsholds || (ExplicitUnroll && TripCount != 0)) {
       // If the loop has an unrolling pragma, we want to be more aggressive with
       // unrolling limits. Set thresholds to at least the PragmaUnrollThreshold
       // value which is larger than the default limits.
@@ -1077,7 +1082,8 @@ bool llvm::computeUnrollCount(
   }
 
   // Don't unroll a small upper bound loop unless user or TTI asked to do so.
-  if (MaxTripCount && !UP.Force && MaxTripCount < UP.MaxUpperBound) {
+  if (!RelaxUnrollThrehsholds && MaxTripCount && !UP.Force &&
+      MaxTripCount < UP.MaxUpperBound) {
     UP.Count = 0;
     return false;
   }
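
The conditions touched in LoopUnrollPass.cpp can be read more easily as a small standalone model. The sketch below mirrors the boolean expressions from the diff, but it is not LLVM code: the `UnrollRequest` struct, the helper function names, and the spelling of the relax flag are made up for illustration, `UserUnrollCount` is modelled as a simple flag, and the upper bound of 8 in the usage function is an arbitrary value.

```cpp
#include <cassert>

// Standalone model of the pragma state read by computeUnrollCount.
// Plain values for illustration; these are not LLVM's own types.
struct UnrollRequest {
  bool UserUnrollCount = false;    // modelled as a flag: a count was forced by option
  bool PragmaFullUnroll = false;   // a full-unroll pragma is present
  unsigned PragmaCount = 0;        // count given by a pragma, 0 if none
  bool PragmaEnableUnroll = false; // a plain unroll pragma is present
};

// Mirrors the ExplicitUnroll expression in the diff.
bool isExplicitUnroll(const UnrollRequest &R) {
  return R.PragmaCount > 0 || R.PragmaFullUnroll || R.PragmaEnableUnroll ||
         R.UserUnrollCount;
}

// Mirrors the new flag: the target opted in and the only unroll request is
// the plain pragma.
bool relaxThresholds(bool TargetOptIn, const UnrollRequest &R) {
  return TargetOptIn && R.PragmaEnableUnroll && !R.UserUnrollCount &&
         !R.PragmaFullUnroll && R.PragmaCount == 0;
}

// Condition guarding the raised pragma thresholds, as written after the patch.
bool raisesThresholds(bool Relax, const UnrollRequest &R, unsigned TripCount) {
  return Relax || (isExplicitUnroll(R) && TripCount != 0);
}

// Small-upper-bound bail-out, as written after the patch. True means the
// unroll count is reset to 0 and the function gives up.
bool bailsOnSmallUpperBound(bool Relax, unsigned MaxTripCount, bool Force,
                            unsigned MaxUpperBound) {
  return !Relax && MaxTripCount != 0 && !Force && MaxTripCount < MaxUpperBound;
}

// Usage: a plain unroll pragma on an opted-in target raises the thresholds
// even with an unknown trip count and skips the small-upper-bound bail-out.
void demoPlainPragma() {
  UnrollRequest R;
  R.PragmaEnableUnroll = true;
  const bool Relax = relaxThresholds(/*TargetOptIn=*/true, R);
  assert(Relax);
  assert(raisesThresholds(Relax, R, /*TripCount=*/0));
  assert(!bailsOnSmallUpperBound(Relax, /*MaxTripCount=*/4, /*Force=*/false,
                                 /*MaxUpperBound=*/8));
}
```

In this model, a target that leaves the preference at its default of false sees exactly the old behavior, since both changed conditions reduce to their previous form when the relax flag is false.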

@llvmbot
Member

llvmbot commented May 17, 2025

@llvm/pr-subscribers-llvm-analysis


@llvmbot
Member

llvmbot commented May 17, 2025

@llvm/pr-subscribers-backend-amdgpu


doru1004 requested review from bcahoon and jrbyrnes on May 17, 2025, 00:51