-
Notifications
You must be signed in to change notification settings - Fork 13.6k
[InstCombine] Canonicalize xor with disjoint ops to or disjoint #133139
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Change-Id: Ib278c8c6a2893b0f9f75a0a6fb0430437b9cff14
@llvm/pr-subscribers-llvm-transforms Author: Jeffrey Byrnes (jrbyrnes) Changes
Full diff: https://github.com/llvm/llvm-project/pull/133139.diff 2 Files Affected:
diff --git a/llvm/lib/Transforms/InstCombine/InstCombineAndOrXor.cpp b/llvm/lib/Transforms/InstCombine/InstCombineAndOrXor.cpp
index 6cc241781d112..c8d2e5d960ec8 100644
--- a/llvm/lib/Transforms/InstCombine/InstCombineAndOrXor.cpp
+++ b/llvm/lib/Transforms/InstCombine/InstCombineAndOrXor.cpp
@@ -4993,6 +4993,11 @@ Instruction *InstCombinerImpl::visitXor(BinaryOperator &I) {
if (Instruction *Abs = canonicalizeAbs(I, Builder))
return Abs;
+ if (KnownBits::haveNoCommonBitsSet(
+ computeKnownBits(I.getOperand(0), /*Depth=*/0, &I),
+ computeKnownBits(I.getOperand(1), /*Depth=*/0, &I)))
+ return BinaryOperator::CreateDisjointOr(I.getOperand(0), I.getOperand(1));
+
// Otherwise, if all else failed, try to hoist the xor-by-constant:
// (X ^ C) ^ Y --> (X ^ Y) ^ C
// Just like we do in other places, we completely avoid the fold
diff --git a/llvm/test/Transforms/InstCombine/xor.ll b/llvm/test/Transforms/InstCombine/xor.ll
index 3abaf74285cc0..0cfbb2c47e21c 100644
--- a/llvm/test/Transforms/InstCombine/xor.ll
+++ b/llvm/test/Transforms/InstCombine/xor.ll
@@ -1664,3 +1664,52 @@ entry:
%or = or <2 x i32> %add, %c
ret <2 x i32> %or
}
+
+declare i32 @callee()
+
+define i32 @xor_disjoint() {
+; CHECK-LABEL: @xor_disjoint(
+; CHECK-NEXT: [[CALL1:%.*]] = call i32 @callee(), !range [[RNG0:![0-9]+]]
+; CHECK-NEXT: [[XOR:%.*]] = or disjoint i32 [[CALL1]], 4096
+; CHECK-NEXT: ret i32 [[XOR]]
+;
+ %call1 = call i32 @callee(), !range !0
+ %xor = xor i32 %call1, 4096
+ ret i32 %xor
+}
+
+define i32 @xor_disjoint2() {
+; CHECK-LABEL: @xor_disjoint2(
+; CHECK-NEXT: [[CALL1:%.*]] = call i32 @callee(), !range [[RNG1:![0-9]+]]
+; CHECK-NEXT: [[XOR:%.*]] = or disjoint i32 [[CALL1]], 512
+; CHECK-NEXT: ret i32 [[XOR]]
+;
+ %call1 = call i32 @callee(), !range !1
+ %xor = xor i32 %call1, 512
+ ret i32 %xor
+}
+
+define i32 @xor_non_disjoint() {
+; CHECK-LABEL: @xor_non_disjoint(
+; CHECK-NEXT: [[CALL1:%.*]] = call i32 @callee(), !range [[RNG0]]
+; CHECK-NEXT: [[XOR:%.*]] = xor i32 [[CALL1]], 1024
+; CHECK-NEXT: ret i32 [[XOR]]
+;
+ %call1 = call i32 @callee(), !range !0
+ %xor = xor i32 %call1, 1024
+ ret i32 %xor
+}
+
+define i32 @xor_non_disjoint2() {
+; CHECK-LABEL: @xor_non_disjoint2(
+; CHECK-NEXT: [[CALL1:%.*]] = call i32 @callee(), !range [[RNG1]]
+; CHECK-NEXT: [[XOR:%.*]] = and i32 [[CALL1]], 511
+; CHECK-NEXT: ret i32 [[XOR]]
+;
+ %call1 = call i32 @callee(), !range !1
+ %xor = xor i32 %call1, 1024
+ ret i32 %xor
+}
+
+!0 = !{ i32 0, i32 2048 }
+!1 = !{ i32 1024, i32 1536 }
|
%xor = xor i32 %call1, 1024 | ||
ret i32 %xor | ||
} | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Test a vector version. Also could use alive2 proof link
@@ -4993,6 +4993,11 @@ Instruction *InstCombinerImpl::visitXor(BinaryOperator &I) { | ||
if (Instruction *Abs = canonicalizeAbs(I, Builder)) | ||
return Abs; | ||
|
||
if (KnownBits::haveNoCommonBitsSet( | ||
computeKnownBits(I.getOperand(0), /*Depth=*/0, &I), | ||
computeKnownBits(I.getOperand(1), /*Depth=*/0, &I))) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can try to short circuit the first operand query if the second is unknown
xor_disjoint xor_disjoint2 xor_non_disjoint xor_non_disjoint2 |
Going to have to rethink this -- it's not that we're missing this canonicalization, but my test case does not canonicalize because of the recursion limit. |
Disjoint is set on or in InstCombineSimplifyDemanded. Move this there with the opcode switch? |
Yeah that handling actually already covers the easy https://godbolt.org/z/5b95qvhqo So this PR as it is is redundant. However, we also care about the non easy cases which aren't covered due to the recursion limit https://godbolt.org/z/Edr4brT1j
My case was accidentally covered since I did CKB on the operands instead of the instruction. I need to investigate how we can canonicalize these complex cases, and will update the PR with that implementation. |
@nikic -- do you have any insight on how to resolve issues of this type? We are missing xor -> or disjoint canonicalization because we hit the recursion depth limit in SimplifyDemandedBits ; the result of this is bad addressing schemes. |
What was the original sequence that exceeds the limit? |
See for a representation of the original sequence https://godbolt.org/z/9rE9dneT9 There is an opportunity for another instcombine in this sequence -- we can transform
into
There are a couple similar patterns combined together with an I'm leaning towards providing an API to computeKnownBits to override the depth limit, for cases where getting it right is worth a bit more compile time (e.g. separateConstOffsetFromGEP). |
I don't think this transformation is profitable. Generally cmov is cheaper than mul. |
It's fewer IR instructions, so it's better canonical form. Undoing for profitability is a backend issue. Though I guess mul is harder to analyze . But you don't need mul, you just need 0-or-non0. Can you use `shl 72, %i48" equivalently |
I think the canonical form should be:
cc @andjo403 |
For my purposes, what is most beneficial is:
->
Which I imagine produces better code even for hardwares which prefer icmp->select over mul. |
|
Added: #135274 |
This canonicalization already exists |
or disjoint
s work better with other optimizations (e.g. SeparateConstOffsetFromGEP). This feels like the right place for this canonicalization, but perhaps not.