[llvm][opt][Transforms] Preserve AMDGPU atomic metadata #140314

AlexVlx · May 16, 2025

The AMDGPU backend has migrated to using dedicated metadata to drive how atomic instructions are lowered. At the moment, this metadata is treated as unknown when instructions are merged and is therefore dropped, which can lead to serious performance issues. This patch addresses that by adding specific handling for these metadata kinds.

llvmbot · May 16, 2025

@llvm/pr-subscribers-llvm-transforms

@llvm/pr-subscribers-backend-amdgpu

Author: Alex Voicu (AlexVlx)

Full diff: https://github.com/llvm/llvm-project/pull/140314.diff

2 Files Affected:

(modified) llvm/lib/Transforms/Utils/Local.cpp (+10-1)
(added) llvm/test/Transforms/SimplifyCFG/merge-amdgpu-atomic-md.ll (+42)

diff --git a/llvm/lib/Transforms/Utils/Local.cpp b/llvm/lib/Transforms/Utils/Local.cpp
index 3dbd605e19c3a..0d2a82d407170 100644
--- a/llvm/lib/Transforms/Utils/Local.cpp
+++ b/llvm/lib/Transforms/Utils/Local.cpp
@@ -3303,6 +3303,12 @@ static void combineMetadata(Instruction *K, const Instruction *J,
                             bool DoesKMove, bool AAOnly = false) {
   SmallVector<std::pair<unsigned, MDNode *>, 4> Metadata;
   K->getAllMetadataOtherThanDebugLoc(Metadata);
+
+  const unsigned AMDGPUMD[] = {
+      K->getContext().getMDKindID("amdgpu.no.fine.grained.memory"),
+      K->getContext().getMDKindID("amdgpu.no.remote.memory"),
+      K->getContext().getMDKindID("amdgpu.ignore.denormal.mode")};
+
   for (const auto &MD : Metadata) {
     unsigned Kind = MD.first;
     MDNode *JMD = J->getMetadata(Kind);
@@ -3311,7 +3317,10 @@ static void combineMetadata(Instruction *K, const Instruction *J,
     // TODO: Assert that this switch is exhaustive for fixed MD kinds.
     switch (Kind) {
       default:
-        K->setMetadata(Kind, nullptr); // Remove unknown metadata
+        if (K->isAtomic() && (find(AMDGPUMD, Kind) != std::cend(AMDGPUMD)))
+         break; // Preserve AMDGPU atomic metadata.
+        else
+          K->setMetadata(Kind, nullptr); // Remove unknown metadata
         break;
       case LLVMContext::MD_dbg:
         llvm_unreachable("getAllMetadataOtherThanDebugLoc returned a MD_dbg");
diff --git a/llvm/test/Transforms/SimplifyCFG/merge-amdgpu-atomic-md.ll b/llvm/test/Transforms/SimplifyCFG/merge-amdgpu-atomic-md.ll
new file mode 100644
index 0000000000000..1cd574e714b43
--- /dev/null
+++ b/llvm/test/Transforms/SimplifyCFG/merge-amdgpu-atomic-md.ll
@@ -0,0 +1,42 @@
+; NOTE: Assertions have been autogenerated by utils/update_test_checks.py UTC_ARGS: --version 5
+;; Test to ensure that AMDGPU atomic related metadata is not dropped when
+;; instructions are sunk. Currently the metadata from the first instruction
+;; is kept, which prevents full loss of optimisation information.
+
+; RUN: opt < %s -passes=simplifycfg -passes=simplifycfg -sink-common-insts -S | FileCheck %s
+
+define amdgpu_kernel void @f(i1 %pred0, i1 %pred1, ptr captures(none) %p, double %d) local_unnamed_addr {
+; CHECK-LABEL: define amdgpu_kernel void @f(
+; CHECK-SAME: i1 [[PRED0:%.*]], i1 [[PRED1:%.*]], ptr captures(none) [[P:%.*]], double [[D:%.*]]) local_unnamed_addr {
+; CHECK-NEXT:  [[ENTRY:.*:]]
+; CHECK-NEXT:    [[P_GLOBAL:%.*]] = addrspacecast ptr [[P]] to ptr addrspace(1)
+; CHECK-NEXT:    [[BRMERGE:%.*]] = select i1 [[PRED0]], i1 true, i1 [[PRED1]]
+; CHECK-NEXT:    br i1 [[BRMERGE]], label %[[IF_END_SINK_SPLIT:.*]], label %[[IF_END:.*]]
+; CHECK:       [[IF_END_SINK_SPLIT]]:
+; CHECK-NEXT:    [[TMP0:%.*]] = atomicrmw fadd ptr addrspace(1) [[P_GLOBAL]], double [[D]] monotonic, align 8, !amdgpu.no.fine.grained.memory [[META0:![0-9]+]], !amdgpu.no.remote.memory [[META0]]
+; CHECK-NEXT:    br label %[[IF_END]]
+; CHECK:       [[IF_END]]:
+; CHECK-NEXT:    ret void
+;
+entry:
+  %p.global = addrspacecast ptr %p to ptr addrspace(1)
+  br i1 %pred0, label %for.body, label %for.body1
+
+for.body:
+  %0 = atomicrmw fadd ptr addrspace(1) %p.global, double %d monotonic, align 8, !amdgpu.no.fine.grained.memory !0, !amdgpu.no.remote.memory !0
+  br label %if.end
+
+for.body1:
+  br i1 %pred1, label %if.then, label %if.end
+
+if.then:
+  %1 = atomicrmw fadd ptr addrspace(1) %p.global, double %d monotonic, align 8, !amdgpu.no.fine.grained.memory !0, !amdgpu.no.remote.memory !0
+  br label %if.end
+
+if.end:
+  ret void
+}
+
+!0 = !{!"float", !1, i64 0}
+!1 = !{!"omnipotent char", !2, i64 0}
+!2 = !{!"Simple C++ TBAA"}

github-actions · May 16, 2025

✅ With the latest revision this PR passed the C/C++ code formatter.

arsenm · May 17, 2025

llvm/lib/Transforms/Utils/Local.cpp

+  const unsigned AMDGPUMD[] = {
+      K->getContext().getMDKindID("amdgpu.no.fine.grained.memory"),
+      K->getContext().getMDKindID("amdgpu.no.remote.memory"),
+      K->getContext().getMDKindID("amdgpu.ignore.denormal.mode")};


Should avoid looking up these IDs until they are finally needed below.

If we're going to handle these here, we probably ought to promote these to recognized enum metadata.

AlexVlx · May 19, 2025

Possibly, but there's no other example of target-specific MD in the fixed enum. It's also not quite clear where else we could handle these (if you have something in mind, please share). I will refactor this a bit anyway because it's incomplete (we're missing cases where the replaced instruction carries the MD).

arsenm · May 22, 2025

But it shouldn't really be an issue to add one with the target prefix in the name. It's not really any different from how calling conventions are defined: with a target prefix, but as an IR global constant.
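The first review point (defer the kind-ID lookups until they are needed) can be illustrated with a small self-contained model. This is only a sketch: `MockContext`, `isAMDGPUAtomicMD` and `preservedUnknownKinds` are hypothetical names invented here, and the mock merely imitates the name-to-ID interning that `getMDKindID` performs. It does not use LLVM headers.

```cpp
#include <cassert>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical stand-in for LLVMContext. It interns metadata kind names and
// counts lookups so that eager vs. lazy resolution becomes observable.
struct MockContext {
  enum : unsigned { NumFixedKinds = 4 }; // IDs below this model "fixed" kinds.
  std::unordered_map<std::string, unsigned> Custom;
  unsigned Lookups = 0;

  unsigned getMDKindID(const std::string &Name) {
    assert(!Name.empty() && "metadata kind names must be non-empty");
    ++Lookups;
    auto It = Custom.find(Name);
    if (It != Custom.end())
      return It->second;
    unsigned ID = NumFixedKinds + static_cast<unsigned>(Custom.size());
    Custom.emplace(Name, ID);
    return ID;
  }
};

// Resolves the three kind names only when called, i.e. only for kinds that
// actually reach the "unknown metadata" path.
bool isAMDGPUAtomicMD(MockContext &Ctx, unsigned Kind) {
  static const char *const Names[] = {"amdgpu.no.fine.grained.memory",
                                      "amdgpu.no.remote.memory",
                                      "amdgpu.ignore.denormal.mode"};
  for (const char *Name : Names)
    if (Ctx.getMDKindID(Name) == Kind)
      return true;
  return false;
}

// Imitates the shape of the switch in combineMetadata: fixed kinds take their
// dedicated cases (not modelled here) and never trigger a name lookup; all
// other kinds fall through to the lazy check. Returns the kinds that would be
// preserved.
std::vector<unsigned> preservedUnknownKinds(MockContext &Ctx,
                                            const std::vector<unsigned> &Kinds) {
  std::vector<unsigned> Kept;
  for (unsigned Kind : Kinds) {
    if (Kind < MockContext::NumFixedKinds)
      continue;
    if (isAMDGPUAtomicMD(Ctx, Kind))
      Kept.push_back(Kind);
  }
  return Kept;
}
```

With this shape, an instruction that carries only fixed kinds never pays for a string lookup. Promoting the kinds to the fixed enum, as suggested in the thread, would remove the lookups altogether.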

nikic · May 22, 2025

llvm/test/Transforms/SimplifyCFG/merge-amdgpu-atomic-md.ll

+; CHECK-NEXT:    [[BRMERGE:%.*]] = select i1 [[PRED0]], i1 true, i1 [[PRED1]]
+; CHECK-NEXT:    br i1 [[BRMERGE]], label %[[IF_END_SINK_SPLIT:.*]], label %[[IF_END:.*]]
+; CHECK:       [[IF_END_SINK_SPLIT]]:
+; CHECK-NEXT:    [[TMP0:%.*]] = atomicrmw fadd ptr addrspace(1) [[P_GLOBAL]], double [[D]] monotonic, align 8, !amdgpu.no.fine.grained.memory [[META0]], !amdgpu.no.remote.memory [[META0]]


Why is it valid to preserve the metadata if it's only present on one of the branches?

AlexVlx · May 23, 2025

I believe I might've gotten this wrong, actually, and it is probably not valid to do this. I'll elaborate a bit when replying to @arsenm's point about the intersection.

arsenm · May 22, 2025

llvm/test/Transforms/SimplifyCFG/merge-amdgpu-atomic-md.ll

+
+; RUN: opt < %s -passes=simplifycfg -sink-common-insts -S | FileCheck %s
+
+define amdgpu_kernel void @both(i1 %pred0, i1 %pred1, ptr captures(none) %p, double %d) local_unnamed_addr {


Suggested change

-define amdgpu_kernel void @both(i1 %pred0, i1 %pred1, ptr captures(none) %p, double %d) local_unnamed_addr {
+define void @both(i1 %pred0, i1 %pred1, ptr captures(none) %p, double %d) {

arsenm · May 22, 2025

llvm/lib/Transforms/Utils/Local.cpp

+  // Preserve AMDGPU atomic metadata from J, if present. K might already be
+  // carrying this but overwriting should cause no issue.
+  if (K->isAtomic()) {
+    if (auto *JMD = J->getMetadata("amdgpu.no.fine.grained.memory"))
+      K->setMetadata("amdgpu.no.fine.grained.memory", JMD);
+    if (auto *JMD = J->getMetadata("amdgpu.no.remote.memory"))
+      K->setMetadata("amdgpu.no.remote.memory", JMD);
+    if (auto *JMD = J->getMetadata("amdgpu.ignore.denormal.mode"))
+      K->setMetadata("amdgpu.ignore.denormal.mode", JMD);
+  }


I don't follow this part; it should be an intersection.

AlexVlx · May 23, 2025

I think I misunderstood how these work and assumed they were per-module global settings. I'll add @yxsamliu to the review since I know you and he worked on this. Intersection seems correct; however, I wonder if we should actually prevent sinking / combining altogether when it would result in removal of the metadata, since that could incur a pretty significant performance penalty, and atomics with different MD aren't really the same instruction to begin with. Thoughts?
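The two options discussed in this thread (intersect the metadata, or refuse to sink when the intersection would lose something) can be sketched in a self-contained model. It assumes an atomic instruction can be reduced to the set of metadata kind names attached to it; `MDKinds`, `intersectKinds` and `mergeWouldDropMetadata` are hypothetical names, not LLVM APIs.

```cpp
#include <algorithm>
#include <cassert>
#include <iterator>
#include <set>
#include <string>

// Hypothetical model: an atomic instruction is reduced to the set of
// annotation-style metadata kinds attached to it.
using MDKinds = std::set<std::string>;

// Intersection: the merged instruction keeps a kind only if both inputs
// carried it, so it never claims more than either original did.
MDKinds intersectKinds(const MDKinds &K, const MDKinds &J) {
  MDKinds Out;
  std::set_intersection(K.begin(), K.end(), J.begin(), J.end(),
                        std::inserter(Out, Out.begin()));
  assert(Out.size() <= K.size() && Out.size() <= J.size());
  return Out;
}

// The more conservative option floated above: report whether combining would
// drop metadata from either instruction, so a caller could decline to sink.
bool mergeWouldDropMetadata(const MDKinds &K, const MDKinds &J) {
  MDKinds Merged = intersectKinds(K, J);
  return Merged.size() != K.size() || Merged.size() != J.size();
}
```

Under this model, a union (keeping a kind that only one side carries) would attach a guarantee to the merged instruction that one of the original code paths never made, which is the concern raised in the earlier question about metadata present on only one branch.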

nikic

I wonder whether it would make sense to have a fallback behavior that allows preserving metadata if it is exactly the same on both instructions, even if it's unknown.

Do we have any metadata where preserving it in case of exact equality would be invalid?
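The fallback proposed here can be sketched as follows. This is a minimal model, assuming metadata nodes are uniqued so that two attachments are identical exactly when they point at the same node; `Node`, `MDMap` and `combineUnknown` are hypothetical names. The sketch only shows the mechanics and does not answer the open question of whether such a rule is valid for every metadata kind.

```cpp
#include <cassert>
#include <map>

// Hypothetical stand-in for a uniqued metadata node; identity is by address.
struct Node {};

// Kind ID -> attached node, standing in for an instruction's metadata list.
using MDMap = std::map<unsigned, const Node *>;

// Fallback for kinds with no dedicated merge rule: keep K's attachment only
// if J carries the very same node for that kind, otherwise drop it.
void combineUnknown(MDMap &K, const MDMap &J) {
  for (auto It = K.begin(); It != K.end();) {
    assert(It->second && "attachments are expected to be non-null");
    auto JIt = J.find(It->first);
    if (JIt != J.end() && JIt->second == It->second)
      ++It;
    else
      It = K.erase(It);
  }
}
```

Compared with a per-kind allow-list, this rule needs no knowledge of individual target-specific kinds, at the price of dropping anything that differs in any way between the two instructions.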

nikic · Jun 2, 2025

llvm/lib/Transforms/Utils/Local.cpp

@@ -3311,7 +3318,10 @@ static void combineMetadata(Instruction *K, const Instruction *J,
    // TODO: Assert that this switch is exhaustive for fixed MD kinds.
    switch (Kind) {
      default:
-        K->setMetadata(Kind, nullptr); // Remove unknown metadata
+        if (K->isAtomic() && IsAMDGPUMD(Kind))
+          K->setMetadata(Kind, MDNode::intersect(JMD, KMD));


It looks like all of this metadata accepts an empty node? Using intersect for this doesn't really make sense.
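One way to read this objection: when the attached node has no operands, an operand-wise intersection has nothing to intersect, and the only information left is whether the attachment is present on both instructions. The sketch below models that under stated assumptions. `Attachment`, `intersectOperands` and `keepIfBothPresent` are hypothetical names, and `intersectOperands` is a loose model of an intersect-style merge rather than LLVM's implementation.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical model of one metadata attachment: either absent, or a node
// with a (possibly empty) operand list.
struct Attachment {
  bool Present;
  std::vector<int> Ops;
};

// Operand-wise intersection: keeps the operands of A that also occur in B.
Attachment intersectOperands(const Attachment &A, const Attachment &B) {
  if (!A.Present || !B.Present)
    return Attachment{false, {}};
  Attachment Out{true, {}};
  for (int Op : A.Ops)
    if (std::find(B.Ops.begin(), B.Ops.end(), Op) != B.Ops.end())
      Out.Ops.push_back(Op);
  assert(Out.Ops.size() <= A.Ops.size());
  return Out;
}

// Presence-only rule: keep K's node untouched when J also carries one.
Attachment keepIfBothPresent(const Attachment &K, const Attachment &J) {
  if (K.Present && J.Present)
    return K;
  return Attachment{false, {}};
}
```

For empty nodes the two functions give the same result, so the operand intersection adds nothing beyond the presence check; the presence-only form states the intent more directly.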

Do not drop atomic metadata when combining instructions.

31ceada

llvmbot added backend:AMDGPU llvm:transforms labels May 16, 2025

AlexVlx requested review from nikic and arsenm May 16, 2025 22:17

AlexVlx added tools:opt and removed backend:AMDGPU labels May 16, 2025

Fix formatting.

9e7ca22

llvmbot added the backend:AMDGPU label May 16, 2025

arsenm reviewed May 17, 2025

AlexVlx added 3 commits May 20, 2025 02:22

Clean up test.

8ea68b4

Handle all cases.

c6cfed6

Fix formatting.

401b882

nikic reviewed May 22, 2025

arsenm reviewed May 22, 2025

AlexVlx requested review from arsenm, nikic and yxsamliu May 23, 2025 15:53

AlexVlx added 3 commits May 23, 2025 23:36

Intersection, not union.

c309620

Fix stray whitespace.

c724475

Merge branch 'main' of https://github.com/llvm/llvm-project into fix_…

768c92e

…metadata_sinking

nikic reviewed Jun 2, 2025
