[InstCombine] Canonicalize xor with disjoint ops to or disjoint #133139


Closed (wanted to merge 1 commit)

Conversation

jrbyrnes (Contributor)

`or disjoint` works better with other optimizations (e.g. SeparateConstOffsetFromGEP). This feels like the right place for this canonicalization, but perhaps not.

Change-Id: Ib278c8c6a2893b0f9f75a0a6fb0430437b9cff14
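The identity behind this canonicalization is easy to sanity-check outside of LLVM: when two values share no set bits, xor, or, and even add all produce the same result. A minimal illustrative Python sketch (not part of the patch):

```python
def check_disjoint_identity(a: int, b: int) -> bool:
    # If a & b == 0 (the operands are "disjoint"), then xor, or, and add
    # agree, so `xor` can be rewritten as `or disjoint` without changing
    # the result.
    assert (a & b) == 0, "operands must share no set bits"
    return (a ^ b) == (a | b) == (a + b)

# Mirror of the first test case in the diff: a value known to be in
# [0, 2048) xor'ed with 4096 -- the constant's bit lies above the range.
assert all(check_disjoint_identity(v, 4096) for v in range(2048))
```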
@llvmbot (Member)

llvmbot commented Mar 26, 2025

@llvm/pr-subscribers-llvm-transforms

Author: Jeffrey Byrnes (jrbyrnes)

Changes

`or disjoint` works better with other optimizations (e.g. SeparateConstOffsetFromGEP). This feels like the right place for this canonicalization, but perhaps not.


Full diff: https://github.com/llvm/llvm-project/pull/133139.diff

2 Files Affected:

  • (modified) llvm/lib/Transforms/InstCombine/InstCombineAndOrXor.cpp (+5)
  • (modified) llvm/test/Transforms/InstCombine/xor.ll (+49)
diff --git a/llvm/lib/Transforms/InstCombine/InstCombineAndOrXor.cpp b/llvm/lib/Transforms/InstCombine/InstCombineAndOrXor.cpp
index 6cc241781d112..c8d2e5d960ec8 100644
--- a/llvm/lib/Transforms/InstCombine/InstCombineAndOrXor.cpp
+++ b/llvm/lib/Transforms/InstCombine/InstCombineAndOrXor.cpp
@@ -4993,6 +4993,11 @@ Instruction *InstCombinerImpl::visitXor(BinaryOperator &I) {
   if (Instruction *Abs = canonicalizeAbs(I, Builder))
     return Abs;
 
+  if (KnownBits::haveNoCommonBitsSet(
+          computeKnownBits(I.getOperand(0), /*Depth=*/0, &I),
+          computeKnownBits(I.getOperand(1), /*Depth=*/0, &I)))
+    return BinaryOperator::CreateDisjointOr(I.getOperand(0), I.getOperand(1));
+
   // Otherwise, if all else failed, try to hoist the xor-by-constant:
   //   (X ^ C) ^ Y --> (X ^ Y) ^ C
   // Just like we do in other places, we completely avoid the fold
diff --git a/llvm/test/Transforms/InstCombine/xor.ll b/llvm/test/Transforms/InstCombine/xor.ll
index 3abaf74285cc0..0cfbb2c47e21c 100644
--- a/llvm/test/Transforms/InstCombine/xor.ll
+++ b/llvm/test/Transforms/InstCombine/xor.ll
@@ -1664,3 +1664,52 @@ entry:
   %or = or <2 x i32> %add, %c
   ret <2 x i32> %or
 }
+
+declare i32 @callee()
+
+define i32 @xor_disjoint() {
+; CHECK-LABEL: @xor_disjoint(
+; CHECK-NEXT:    [[CALL1:%.*]] = call i32 @callee(), !range [[RNG0:![0-9]+]]
+; CHECK-NEXT:    [[XOR:%.*]] = or disjoint i32 [[CALL1]], 4096
+; CHECK-NEXT:    ret i32 [[XOR]]
+;
+  %call1 = call i32 @callee(), !range !0
+  %xor = xor i32 %call1, 4096
+  ret i32 %xor
+}
+
+define i32 @xor_disjoint2() {
+; CHECK-LABEL: @xor_disjoint2(
+; CHECK-NEXT:    [[CALL1:%.*]] = call i32 @callee(), !range [[RNG1:![0-9]+]]
+; CHECK-NEXT:    [[XOR:%.*]] = or disjoint i32 [[CALL1]], 512
+; CHECK-NEXT:    ret i32 [[XOR]]
+;
+  %call1 = call i32 @callee(), !range !1
+  %xor = xor i32 %call1, 512
+  ret i32 %xor
+}
+
+define i32 @xor_non_disjoint() {
+; CHECK-LABEL: @xor_non_disjoint(
+; CHECK-NEXT:    [[CALL1:%.*]] = call i32 @callee(), !range [[RNG0]]
+; CHECK-NEXT:    [[XOR:%.*]] = xor i32 [[CALL1]], 1024
+; CHECK-NEXT:    ret i32 [[XOR]]
+;
+  %call1 = call i32 @callee(), !range !0
+  %xor = xor i32 %call1, 1024
+  ret i32 %xor
+}
+
+define i32 @xor_non_disjoint2() {
+; CHECK-LABEL: @xor_non_disjoint2(
+; CHECK-NEXT:    [[CALL1:%.*]] = call i32 @callee(), !range [[RNG1]]
+; CHECK-NEXT:    [[XOR:%.*]] = and i32 [[CALL1]], 511
+; CHECK-NEXT:    ret i32 [[XOR]]
+;
+  %call1 = call i32 @callee(), !range !1
+  %xor = xor i32 %call1, 1024
+  ret i32 %xor
+}
+
+!0 = !{ i32 0, i32 2048 }
+!1 = !{ i32 1024, i32 1536 }
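The `KnownBits::haveNoCommonBitsSet` query used in the patch can be modeled with a toy known-bits pair (a hedged sketch; the real API lives in llvm/Support/KnownBits.h and this mirrors only its disjointness check):

```python
from dataclasses import dataclass

@dataclass
class KnownBits:
    zero: int  # bits proven zero
    one: int   # bits proven one (a bit in neither mask is unknown)

def have_no_common_bits_set(lhs: KnownBits, rhs: KnownBits,
                            width: int = 32) -> bool:
    # Two values are provably disjoint when, at every bit position,
    # at least one side is known zero.
    mask = (1 << width) - 1
    return (lhs.zero | rhs.zero) & mask == mask

# !0 = !{ i32 0, i32 2048 } proves bits 11 and up are zero;
# the constant 4096 has only bit 12 set, so the xor is disjoint.
call = KnownBits(zero=~0x7FF & 0xFFFFFFFF, one=0)
const = KnownBits(zero=~4096 & 0xFFFFFFFF, one=4096)
assert have_no_common_bits_set(call, const)
```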

@jrbyrnes jrbyrnes requested a review from arsenm March 27, 2025 03:01
%xor = xor i32 %call1, 1024
ret i32 %xor
}

Contributor
Test a vector version. Also, this could use an alive2 proof link.

@@ -4993,6 +4993,11 @@ Instruction *InstCombinerImpl::visitXor(BinaryOperator &I) {
if (Instruction *Abs = canonicalizeAbs(I, Builder))
return Abs;

if (KnownBits::haveNoCommonBitsSet(
computeKnownBits(I.getOperand(0), /*Depth=*/0, &I),
computeKnownBits(I.getOperand(1), /*Depth=*/0, &I)))
Contributor
Can try to short-circuit the first operand query if the second is unknown.

@jrbyrnes (Contributor, Author)

jrbyrnes commented Mar 27, 2025

> alive2 proof link

xor_disjoint
https://alive2.llvm.org/ce/z/fQm9-T

xor_disjoint2
https://alive2.llvm.org/ce/z/AMPAib

xor_non_disjoint
https://alive2.llvm.org/ce/z/Vdw5W9

xor_non_disjoint2
https://alive2.llvm.org/ce/z/R-_RWG

@jrbyrnes (Contributor, Author)

Going to have to rethink this -- it's not that we're missing this canonicalization; rather, my test case does not canonicalize because of the recursion limit.

@arsenm (Contributor)

arsenm commented Mar 28, 2025

Disjoint is set on or in InstCombineSimplifyDemanded. Move this there with the opcode switch?

@jrbyrnes (Contributor, Author)

jrbyrnes commented Mar 28, 2025

> Disjoint is set on or in InstCombineSimplifyDemanded. Move this there with the opcode switch?

Yeah, that handling actually already covers the easy xor -> or disjoint cases:

https://godbolt.org/z/5b95qvhqo

So this PR, as it is, is redundant.

However, we also care about the non-easy cases, which aren't covered due to the recursion limit:

https://godbolt.org/z/Edr4brT1j

`%17 = xor i32 %16, 4096` should be canonicalized to `or disjoint`.

My case was accidentally covered since I did computeKnownBits on the operands instead of the instruction. I need to investigate how we can canonicalize these complex cases, and will update the PR with that implementation.
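The failure mode can be illustrated with a toy depth-limited bit analysis (illustrative Python; MAX_DEPTH is a stand-in for InstCombine's MaxAnalysisRecursionDepth, and the expression encoding is made up for this sketch):

```python
MAX_DEPTH = 6  # stand-in for LLVM's analysis recursion limit

def known_zero_bits(expr, depth=0):
    """Toy known-bits query: mask of bits proven zero, bailing past the limit."""
    if depth > MAX_DEPTH:
        return 0  # analysis gives up: nothing is known
    kind = expr[0]
    if kind == 'range':
        # Leaf known to be < expr[1] (a power of two), e.g. from !range metadata.
        return ~(expr[1] - 1) & 0xFFFFFFFF
    if kind == 'wrap':
        # An op whose known bits come only from recursing into its operand.
        return known_zero_bits(expr[1], depth + 1)
    return 0  # opaque value

leaf = ('range', 2048)               # value < 2048: bits 11 and up are zero
assert known_zero_bits(leaf) & 4096  # so the constant's bit 12 is provably free

# Bury the same fact under more layers than the limit allows and it is lost,
# so a later `xor ..., 4096` can no longer be proven disjoint.
deep = leaf
for _ in range(MAX_DEPTH + 2):
    deep = ('wrap', deep)
assert known_zero_bits(deep) == 0
```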

@jrbyrnes (Contributor, Author)

jrbyrnes commented Apr 1, 2025

@nikic -- do you have any insight on how to resolve issues of this type?

We are missing the xor -> or disjoint canonicalization because we hit the recursion depth limit in SimplifyDemandedBits; the result of this is bad addressing schemes.

@arsenm (Contributor)

arsenm commented Apr 2, 2025

What was the original sequence that exceeds the limit?

@jrbyrnes (Contributor, Author)

jrbyrnes commented Apr 2, 2025

> What was the original sequence that exceeds the limit?

See below for a representation of the original sequence:

https://godbolt.org/z/9rE9dneT9

There is an opportunity for another instcombine in this sequence -- we can transform

%i48 = and i32 %i28, 1
%i.not2 = icmp eq i32 %i48, 0
%i144 = select i1 %i.not2, i32 0, i32 72

into

%i48 = and i32 %i28, 1
%i144 = mul i32 %i48, 72

There are a couple of similar patterns combined together with an or disjoint -- these can basically be transformed into a single and->mul. There are a couple of other, more complex opportunities, but I haven't been able to find a way to reduce the depth enough to transform the final xor -> or disjoint and extract the constant. Nor am I sure that, if I were to find the right set of combines for this specific case, the same set would reduce the depth enough in similar cases.
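The single-bit select-to-mul rewrite sketched above can be checked exhaustively over the low bits (a plain Python re-statement of the IR for verification, not a claim about what InstCombine emits):

```python
def before(i28: int) -> int:
    # %i48 = and i32 %i28, 1 ; %i.not2 = icmp eq i32 %i48, 0
    # %i144 = select i1 %i.not2, i32 0, i32 72
    return 0 if (i28 & 1) == 0 else 72

def after(i28: int) -> int:
    # %i48 = and i32 %i28, 1 ; %i144 = mul i32 %i48, 72
    return (i28 & 1) * 72

# Only the low bit matters, so a 16-bit sweep is an exhaustive check.
assert all(before(x) == after(x) for x in range(1 << 16))
```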

I'm leaning towards providing an API to computeKnownBits to override the depth limit, for cases where getting it right is worth a bit more compile time (e.g. SeparateConstOffsetFromGEP).

@dtcxzyw (Member)

dtcxzyw commented Apr 3, 2025

> > What was the original sequence that exceeds the limit?
>
> See below for a representation of the original sequence:
>
> https://godbolt.org/z/9rE9dneT9
>
> There is an opportunity for another instcombine in this sequence -- we can transform
>
> %i48 = and i32 %i28, 1
> %i.not2 = icmp eq i32 %i48, 0
> %i144 = select i1 %i.not2, i32 0, i32 72
>
> into
>
> %i48 = and i32 %i28, 1
> %i144 = mul i32 %i48, 72

I don't think this transformation is profitable. Generally, cmov is cheaper than mul. InstCombiner performs similar transformations in foldSelectICmpAndBinOp for add/xor/shl/lshr/ashr/..., but not for mul.

@arsenm (Contributor)

arsenm commented Apr 3, 2025

> I don't think this transformation is profitable. Generally cmov is cheaper than mul. InstCombiner performs similar transformations in foldSelectICmpAndBinOp for add/xor/shl/lshr/ashr/..., but not for mul.

It's fewer IR instructions, so it's better canonical form. Undoing it for profitability is a backend issue. Though I guess mul is harder to analyze.

But you don't need mul, you just need 0-or-nonzero. Can you use `shl 72, %i48` equivalently?

@dtcxzyw (Member)

dtcxzyw commented Apr 3, 2025

> It's fewer IR instructions, so it's better canonical form.

I think the canonical form should be:

%i.not2.not = trunc i32 %i28 to i1
%i144 = select i1 %i.not2.not, i32 72, i32 0

cc @andjo403
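The proposed trunc-to-i1 canonical form can be checked against the original icmp/select pattern the same way (an illustrative Python re-statement of both IR snippets):

```python
def original(i28: int) -> int:
    # %i48 = and i32 %i28, 1 ; %i.not2 = icmp eq i32 %i48, 0
    # %i144 = select i1 %i.not2, i32 0, i32 72
    return 0 if (i28 & 1) == 0 else 72

def canonical(i28: int) -> int:
    # %i.not2.not = trunc i32 %i28 to i1  (trunc keeps only the low bit)
    # %i144 = select i1 %i.not2.not, i32 72, i32 0
    return 72 if (i28 & 1) else 0

assert all(original(x) == canonical(x) for x in range(256))
```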

@jrbyrnes (Contributor, Author)

jrbyrnes commented Apr 3, 2025

For my purposes, what is most beneficial is:

%i48 = and i32 %i28, 1
%i.not2 = icmp eq i32 %i48, 0
%i49 = and i32 %i28, 2
%i50 = icmp eq i32 %i49, 0
%i144 = select i1 %i.not2, i32 0, i32 72
%i145 = select i1 %i50, i32 0, i32 144
%i146 = or disjoint i32 %i144, %i145

->

%temp = and i32 %i28, 3
%i146 = mul i32 %temp, 72

Which I imagine produces better code even for hardware that prefers icmp->select over mul.
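The arithmetic behind this two-select fusion checks out: 72*b0 + 144*b1 = 72*(b0 + 2*b1) = 72*(x & 3). A quick exhaustive Python check of both forms (illustrative, mirroring the IR above):

```python
def selects(i28: int) -> int:
    # The two icmp/select chains combined by the `or disjoint`.
    i144 = 0 if (i28 & 1) == 0 else 72
    i145 = 0 if (i28 & 2) == 0 else 144
    assert (i144 & i145) == 0  # the `or` really is disjoint (72 & 144 == 0)
    return i144 | i145

def fused(i28: int) -> int:
    # Single and->mul replacement.
    return (i28 & 3) * 72

# Only the low two bits matter, so a 16-bit sweep is exhaustive.
assert all(selects(x) == fused(x) for x in range(1 << 16))
```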

@dtcxzyw (Member)

dtcxzyw commented Apr 9, 2025

> For my purposes, what is most beneficial is:
>
> %i48 = and i32 %i28, 1
> %i.not2 = icmp eq i32 %i48, 0
> %i49 = and i32 %i28, 2
> %i50 = icmp eq i32 %i49, 0
> %i144 = select i1 %i.not2, i32 0, i32 72
> %i145 = select i1 %i50, i32 0, i32 144
> %i146 = or disjoint i32 %i144, %i145
>
> ->
>
> %temp = and i32 %i28, 3
> %i146 = mul i32 %temp, 72
>
> Which I imagine produces better code even for hardware that prefers icmp->select over mul.

https://alive2.llvm.org/ce/z/1kmtuR

@jrbyrnes (Contributor, Author)

> For my purposes, what is most beneficial is: [...]

Added: #135274

@jrbyrnes (Contributor, Author)

This canonicalization already exists.
