[DAG][AArch64] Handle truncated buildvectors to allow and(subvector(anyext)) fold. #133915

Open
wants to merge 2 commits into
base: users/davemgreen/gh-a64-v4i8subvec
6 changes (4 additions & 2 deletions): llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
@@ -7166,7 +7166,8 @@ SDValue DAGCombiner::visitAND(SDNode *N) {

// if (and x, c) is known to be zero, return 0
unsigned BitWidth = VT.getScalarSizeInBits();
-ConstantSDNode *N1C = isConstOrConstSplat(N1);
+ConstantSDNode *N1C =
+    isConstOrConstSplat(N1, /*AllowUndef*/ false, /*AllowTrunc*/ true);
Collaborator

Did you check all the uses of N1C? Some of them look a little suspicious.

Collaborator Author

Line 7170 is checking DAG.MaskedValueIsZero.
7200 is being truncated, and 7208 is the one being updated.
7346/7353 are checking for a mask, which I believe should be OK.
7389 will be checked inside reduceLoadWidth.
7491 is checking a constant, which I believe should again be OK.

Do any of them in particular seem suspicious to you? We can make them more defensive if necessary.

Collaborator

AllowTrunc means the input constant is truncated. But we don't promise zero-extension, I think, so the high bits could be anything. Therefore any comparison that checks those high bits is broken. The check on 7491 is most suspect in this respect, but 7346/7353 is also slightly suspect.
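
A minimal standalone sketch of this concern, written in plain C++ rather than with LLVM's own types. It assumes a splat operand that is stored wider than the vector element, with arbitrary bits above the element width. The helpers `truncToWidth`, `isLowBitMask` and `checkTruncationExample` are hypothetical names introduced only for illustration; they model what truncating the constant and running a mask-style predicate on it would do.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: keep only the low `bits` bits of a raw operand value.
static uint64_t truncToWidth(uint64_t raw, unsigned bits) {
  return bits >= 64 ? raw : raw & ((uint64_t{1} << bits) - 1);
}

// Hypothetical helper: true if `v` is a non-empty run of ones starting at bit 0.
static bool isLowBitMask(uint64_t v) {
  return v != 0 && ((v + 1) & v) == 0;
}

// Assumed scenario: a 16-bit element whose operand is stored in 32 bits,
// with unspecified bits above bit 15.
static void checkTruncationExample() {
  const uint64_t raw = 0xABCD00FF;
  const unsigned elementBits = 16;

  // Inspecting the raw operand does not see a mask at all ...
  assert(!isLowBitMask(raw));
  // ... while the element value itself is the 8-bit mask 0xFF.
  assert(truncToWidth(raw, elementBits) == 0xFF);
  assert(isLowBitMask(truncToWidth(raw, elementBits)));
}
```

In this model, any check that reads the raw value is looking at bits the element never contains, so its answer depends on whatever happens to sit in the high part.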

Collaborator

We could replace this with something like:

std::optional<APInt> N1C;
if (ConstantSDNode *C1 = isConstOrConstSplat(N1, /*AllowUndef*/ false, /*AllowTrunc*/ true))
  N1C = C1->getAPIntValue().zextOrTrunc(BitWidth);

WDYT?
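
The same idea can be sketched outside LLVM, assuming plain 64-bit integers in place of `APInt`. `RawSplatConstant`, `lowBits`, `normalizedSplat` and `checkNormalizedSplat` are hypothetical names for illustration only, not LLVM API. The point is that callers only ever see a value already brought to the element width, or nothing at all.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>

// Keep only the low `bits` bits of `v` (all of it when bits >= 64).
static uint64_t lowBits(uint64_t v, unsigned bits) {
  return bits >= 64 ? v : v & ((uint64_t{1} << bits) - 1);
}

// Stand-in for a matched splat operand: the stored width may differ
// from the width of the vector element it feeds.
struct RawSplatConstant {
  uint64_t value;      // raw operand bits
  unsigned storedBits; // width the operand is stored in
};

// Hypothetical normaliser: zero-extend or truncate to exactly `elementBits`,
// or return nothing when no splat constant was matched.
static std::optional<uint64_t>
normalizedSplat(const std::optional<RawSplatConstant> &c, unsigned elementBits) {
  if (!c)
    return std::nullopt;
  return lowBits(lowBits(c->value, c->storedBits), elementBits);
}

// Small self-check showing the intended use.
static void checkNormalizedSplat() {
  // 0x1FFFF stored in 32 bits, feeding a 16-bit element: the element is 0xFFFF.
  assert(normalizedSplat(RawSplatConstant{0x1FFFF, 32}, 16).value() == 0xFFFF);
  // No match stays "no constant" rather than becoming a null pointer to check.
  assert(!normalizedSplat(std::nullopt, 16).has_value());
}
```

Holding the normalised value in an optional means a later use cannot accidentally read the untruncated operand, because the raw form is no longer in scope.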

Collaborator Author

> AllowTrunc means the input constant is truncated. But we don't promise zero-extension, I think, so the high bits could be anything. Therefore any comparison that checks those high bits is broken. The check on 7491 is most suspect in this respect, but 7346/7353 is also slightly suspect.

Checking for a mask (either via isMask with a shorter ScalarWidth or by checking == 0xffff as in 7491) would check that the top bits are 0. (MatchBSwapHWordLow also only handles scalars.) So they should be OK, and truncating the constant should just make it more likely to match in cases where the top bits are non-zero.

It would be a good idea to protect against future uses not realizing that the constant had been truncated, so an optional sounds like a good idea.

if (N1C && DAG.MaskedValueIsZero(SDValue(N, 0), APInt::getAllOnes(BitWidth)))
return DAG.getConstant(0, DL, VT);

@@ -7205,7 +7206,8 @@ SDValue DAGCombiner::visitAND(SDNode *N) {
return DAG.getNode(ISD::ZERO_EXTEND, DL, VT, N0Op0);

// fold (and (any_ext V), c) -> (zero_ext (and (trunc V), c)) if profitable.
-if (N1C->getAPIntValue().countLeadingZeros() >= (BitWidth - SrcBitWidth) &&
+APInt N1APInt = N1C->getAPIntValue().trunc(VT.getScalarSizeInBits());
+if (N1APInt.countLeadingZeros() >= (BitWidth - SrcBitWidth) &&
TLI.isTruncateFree(VT, SrcVT) && TLI.isZExtFree(SrcVT, VT) &&
TLI.isTypeDesirableForOp(ISD::AND, SrcVT) &&
TLI.isNarrowingProfitable(N, VT, SrcVT))
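
As a rough illustration of why the leading-zero condition in this fold matters, here is a standalone model in plain C++, assuming an 8-bit source widened to 16 bits. It is not LLVM code: `andOfAnyExt`, `zextOfNarrowAnd`, `foldIsSafe` and the other names are hypothetical, and the unspecified high bits produced by any_ext are modelled by an explicit `junk` byte.

```cpp
#include <cassert>
#include <cstdint>

// Count leading zeros of a 16-bit value (returns 16 for zero).
static unsigned countLeadingZeros16(uint16_t v) {
  unsigned n = 0;
  for (uint16_t bit = 0x8000; bit != 0 && (v & bit) == 0; bit >>= 1)
    ++n;
  return n;
}

// Left-hand side: (and (any_ext V), c). The high byte after any_ext is
// unspecified, so it is passed in as `junk`.
static uint16_t andOfAnyExt(uint8_t v, uint8_t junk, uint16_t c) {
  const uint16_t widened = static_cast<uint16_t>((junk << 8) | v);
  return static_cast<uint16_t>(widened & c);
}

// Right-hand side: AND in the narrow type, then zero-extend.
static uint16_t zextOfNarrowAnd(uint8_t v, uint16_t c) {
  return static_cast<uint16_t>(v & static_cast<uint8_t>(c));
}

// The guard from the fold, with BitWidth = 16 and SrcBitWidth = 8.
static bool foldIsSafe(uint16_t c) {
  return countLeadingZeros16(c) >= 16 - 8;
}

// Exhaustively compare both sides for one constant.
static bool foldMatchesForAllInputs(uint16_t c) {
  for (unsigned v = 0; v < 256; ++v)
    for (unsigned junk = 0; junk < 256; ++junk)
      if (andOfAnyExt(static_cast<uint8_t>(v), static_cast<uint8_t>(junk), c) !=
          zextOfNarrowAnd(static_cast<uint8_t>(v), c))
        return false;
  return true;
}

static void checkFoldModel() {
  assert(foldIsSafe(0x00FF) && foldMatchesForAllInputs(0x00FF));
  // One set bit above the source width already breaks the equivalence.
  assert(!foldIsSafe(0x01FF) && !foldMatchesForAllInputs(0x01FF));
}
```

The model also suggests why the leading zeros should be counted at the element width: counted on a wider stored constant, the number of leading zeros would no longer be comparable with BitWidth - SrcBitWidth.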

12 changes (4 additions & 8 deletions): llvm/test/CodeGen/AArch64/aarch64-neon-vector-insert-uaddlv.ll
@@ -282,8 +282,7 @@ define void @insert_vec_v16i8_uaddlv_from_v8i8(ptr %0) {
; CHECK-NEXT: uaddlv.8b h1, v0
; CHECK-NEXT: stp q0, q0, [x0, #32]
; CHECK-NEXT: mov.b v2[0], v1[0]
-; CHECK-NEXT: zip1.8b v2, v2, v2
-; CHECK-NEXT: bic.4h v2, #255, lsl #8
+; CHECK-NEXT: ushll.8h v2, v2, #0
; CHECK-NEXT: ushll.4s v2, v2, #0
; CHECK-NEXT: ucvtf.4s v2, v2
; CHECK-NEXT: stp q2, q0, [x0]
@@ -305,8 +304,7 @@ define void @insert_vec_v8i8_uaddlv_from_v8i8(ptr %0) {
; CHECK-NEXT: stp xzr, xzr, [x0, #16]
; CHECK-NEXT: uaddlv.8b h1, v0
; CHECK-NEXT: mov.b v0[0], v1[0]
-; CHECK-NEXT: zip1.8b v0, v0, v0
-; CHECK-NEXT: bic.4h v0, #255, lsl #8
+; CHECK-NEXT: ushll.8h v0, v0, #0
; CHECK-NEXT: ushll.4s v0, v0, #0
; CHECK-NEXT: ucvtf.4s v0, v0
; CHECK-NEXT: str q0, [x0]
@@ -436,8 +434,7 @@ define void @insert_vec_v8i8_uaddlv_from_v4i32(ptr %0) {
; CHECK-NEXT: stp xzr, xzr, [x0, #16]
; CHECK-NEXT: uaddlv.4s d0, v0
; CHECK-NEXT: mov.b v1[0], v0[0]
-; CHECK-NEXT: zip1.8b v1, v1, v1
-; CHECK-NEXT: bic.4h v1, #255, lsl #8
+; CHECK-NEXT: ushll.8h v1, v1, #0
; CHECK-NEXT: ushll.4s v1, v1, #0
; CHECK-NEXT: ucvtf.4s v1, v1
; CHECK-NEXT: str q1, [x0]
@@ -461,8 +458,7 @@ define void @insert_vec_v16i8_uaddlv_from_v4i32(ptr %0) {
; CHECK-NEXT: uaddlv.4s d0, v0
; CHECK-NEXT: stp q2, q2, [x0, #32]
; CHECK-NEXT: mov.b v1[0], v0[0]
-; CHECK-NEXT: zip1.8b v1, v1, v1
-; CHECK-NEXT: bic.4h v1, #255, lsl #8
+; CHECK-NEXT: ushll.8h v1, v1, #0
; CHECK-NEXT: ushll.4s v1, v1, #0
; CHECK-NEXT: ucvtf.4s v1, v1
; CHECK-NEXT: stp q1, q2, [x0]

4 changes (2 additions & 2 deletions): llvm/test/CodeGen/AArch64/bitcast-extend.ll
@@ -6,8 +6,8 @@ define <4 x i16> @z_i32_v4i16(i32 %x) {
; CHECK-SD-LABEL: z_i32_v4i16:
; CHECK-SD: // %bb.0:
; CHECK-SD-NEXT: fmov s0, w0
-; CHECK-SD-NEXT: zip1 v0.8b, v0.8b, v0.8b
-; CHECK-SD-NEXT: bic v0.4h, #255, lsl #8
+; CHECK-SD-NEXT: ushll v0.8h, v0.8b, #0
+; CHECK-SD-NEXT: // kill: def $d0 killed $d0 killed $q0
; CHECK-SD-NEXT: ret
;
; CHECK-GI-LABEL: z_i32_v4i16:
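
The recurring change in these test updates replaces a zip1 plus bic pair with a single ushll. A small standalone model in plain C++ can show why the two sequences agree on the low 64 bits. It assumes little-endian lane layout and models only the lanes that matter here; `zip1_8b`, `asHalves`, `bicHighByte` and `ushllLow4` are hypothetical helper names, not intrinsics.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

using Bytes8 = std::array<uint8_t, 8>;
using Halves4 = std::array<uint16_t, 4>;

// Model of `zip1 v.8b, a.8b, b.8b`: interleave the low four bytes of a and b.
static Bytes8 zip1_8b(const Bytes8 &a, const Bytes8 &b) {
  Bytes8 r{};
  for (int i = 0; i < 4; ++i) {
    r[2 * i] = a[i];
    r[2 * i + 1] = b[i];
  }
  return r;
}

// Reinterpret eight bytes as four little-endian 16-bit lanes.
static Halves4 asHalves(const Bytes8 &v) {
  Halves4 r{};
  for (int i = 0; i < 4; ++i)
    r[i] = static_cast<uint16_t>(v[2 * i] | (v[2 * i + 1] << 8));
  return r;
}

// Model of `bic v.4h, #255, lsl #8`: clear the high byte of every lane.
static Halves4 bicHighByte(Halves4 v) {
  for (auto &lane : v)
    lane &= 0x00FF;
  return v;
}

// Model of the low four lanes of `ushll v.8h, a.8b, #0`:
// zero-extend each of the first four bytes to 16 bits.
static Halves4 ushllLow4(const Bytes8 &a) {
  Halves4 r{};
  for (int i = 0; i < 4; ++i)
    r[i] = a[i];
  return r;
}

static void checkZipBicMatchesUshll() {
  const Bytes8 in{0x12, 0xFF, 0x00, 0x80, 0xAA, 0xBB, 0xCC, 0xDD};
  assert(bicHighByte(asHalves(zip1_8b(in, in))) == ushllLow4(in));
}
```

In this model zip1 duplicates each byte into both halves of a 16-bit lane and bic then clears the upper copy, which is exactly a zero-extension of the byte.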

3 changes (1 addition & 2 deletions): llvm/test/CodeGen/AArch64/ctlz.ll
@@ -44,8 +44,7 @@ define void @v3i8(ptr %p1) {
; CHECK-SD-NEXT: .cfi_def_cfa_offset 16
; CHECK-SD-NEXT: ldr s1, [x0]
; CHECK-SD-NEXT: movi v0.4h, #8
-; CHECK-SD-NEXT: zip1 v1.8b, v1.8b, v1.8b
-; CHECK-SD-NEXT: bic v1.4h, #255, lsl #8
+; CHECK-SD-NEXT: ushll v1.8h, v1.8b, #0
; CHECK-SD-NEXT: clz v1.4h, v1.4h
; CHECK-SD-NEXT: sub v0.4h, v1.4h, v0.4h
; CHECK-SD-NEXT: uzp1 v1.8b, v0.8b, v0.8b

3 changes (1 addition & 2 deletions): llvm/test/CodeGen/AArch64/ctpop.ll
@@ -43,8 +43,7 @@ define void @v3i8(ptr %p1) {
; CHECK-SD-NEXT: sub sp, sp, #16
; CHECK-SD-NEXT: .cfi_def_cfa_offset 16
; CHECK-SD-NEXT: ldr s0, [x0]
-; CHECK-SD-NEXT: zip1 v0.8b, v0.8b, v0.8b
-; CHECK-SD-NEXT: bic v0.4h, #255, lsl #8
+; CHECK-SD-NEXT: ushll v0.8h, v0.8b, #0
; CHECK-SD-NEXT: cnt v0.8b, v0.8b
; CHECK-SD-NEXT: uaddlp v0.4h, v0.8b
; CHECK-SD-NEXT: uzp1 v1.8b, v0.8b, v0.8b

90 changes (31 additions & 59 deletions): llvm/test/CodeGen/AArch64/itofp.ll
@@ -5503,14 +5503,10 @@ define <8 x float> @utofp_v8i8_v8f32(<8 x i8> %a) {
; CHECK-SD-LABEL: utofp_v8i8_v8f32:
; CHECK-SD: // %bb.0: // %entry
; CHECK-SD-NEXT: ushll v0.8h, v0.8b, #0
-; CHECK-SD-NEXT: ext v1.16b, v0.16b, v0.16b, #8
-; CHECK-SD-NEXT: // kill: def $d0 killed $d0 killed $q0
-; CHECK-SD-NEXT: bic v0.4h, #255, lsl #8
-; CHECK-SD-NEXT: bic v1.4h, #255, lsl #8
+; CHECK-SD-NEXT: ushll2 v1.4s, v0.8h, #0
; CHECK-SD-NEXT: ushll v0.4s, v0.4h, #0
-; CHECK-SD-NEXT: ushll v1.4s, v1.4h, #0
-; CHECK-SD-NEXT: ucvtf v0.4s, v0.4s
; CHECK-SD-NEXT: ucvtf v1.4s, v1.4s
+; CHECK-SD-NEXT: ucvtf v0.4s, v0.4s
; CHECK-SD-NEXT: ret
;
; CHECK-GI-LABEL: utofp_v8i8_v8f32:
@@ -5562,24 +5558,16 @@ entry:
define <16 x float> @utofp_v16i8_v16f32(<16 x i8> %a) {
; CHECK-SD-LABEL: utofp_v16i8_v16f32:
; CHECK-SD: // %bb.0: // %entry
-; CHECK-SD-NEXT: ushll2 v1.8h, v0.16b, #0
-; CHECK-SD-NEXT: ushll v0.8h, v0.8b, #0
-; CHECK-SD-NEXT: ext v2.16b, v1.16b, v1.16b, #8
-; CHECK-SD-NEXT: ext v3.16b, v0.16b, v0.16b, #8
-; CHECK-SD-NEXT: // kill: def $d0 killed $d0 killed $q0
-; CHECK-SD-NEXT: // kill: def $d1 killed $d1 killed $q1
-; CHECK-SD-NEXT: bic v0.4h, #255, lsl #8
-; CHECK-SD-NEXT: bic v1.4h, #255, lsl #8
-; CHECK-SD-NEXT: bic v2.4h, #255, lsl #8
-; CHECK-SD-NEXT: bic v3.4h, #255, lsl #8
-; CHECK-SD-NEXT: ushll v0.4s, v0.4h, #0
-; CHECK-SD-NEXT: ushll v1.4s, v1.4h, #0
-; CHECK-SD-NEXT: ushll v4.4s, v2.4h, #0
-; CHECK-SD-NEXT: ushll v5.4s, v3.4h, #0
-; CHECK-SD-NEXT: ucvtf v0.4s, v0.4s
-; CHECK-SD-NEXT: ucvtf v2.4s, v1.4s
-; CHECK-SD-NEXT: ucvtf v3.4s, v4.4s
-; CHECK-SD-NEXT: ucvtf v1.4s, v5.4s
+; CHECK-SD-NEXT: ushll v1.8h, v0.8b, #0
+; CHECK-SD-NEXT: ushll2 v0.8h, v0.16b, #0
+; CHECK-SD-NEXT: ushll v2.4s, v1.4h, #0
+; CHECK-SD-NEXT: ushll2 v3.4s, v0.8h, #0
+; CHECK-SD-NEXT: ushll2 v1.4s, v1.8h, #0
+; CHECK-SD-NEXT: ushll v4.4s, v0.4h, #0
+; CHECK-SD-NEXT: ucvtf v0.4s, v2.4s
+; CHECK-SD-NEXT: ucvtf v3.4s, v3.4s
+; CHECK-SD-NEXT: ucvtf v1.4s, v1.4s
+; CHECK-SD-NEXT: ucvtf v2.4s, v4.4s
; CHECK-SD-NEXT: ret
;
; CHECK-GI-LABEL: utofp_v16i8_v16f32:
@@ -5656,42 +5644,26 @@ entry:
define <32 x float> @utofp_v32i8_v32f32(<32 x i8> %a) {
; CHECK-SD-LABEL: utofp_v32i8_v32f32:
; CHECK-SD: // %bb.0: // %entry
-; CHECK-SD-NEXT: ushll2 v2.8h, v0.16b, #0
-; CHECK-SD-NEXT: ushll v0.8h, v0.8b, #0
-; CHECK-SD-NEXT: ushll2 v3.8h, v1.16b, #0
-; CHECK-SD-NEXT: ushll v1.8h, v1.8b, #0
-; CHECK-SD-NEXT: ext v4.16b, v2.16b, v2.16b, #8
-; CHECK-SD-NEXT: ext v5.16b, v0.16b, v0.16b, #8
-; CHECK-SD-NEXT: ext v6.16b, v3.16b, v3.16b, #8
-; CHECK-SD-NEXT: ext v7.16b, v1.16b, v1.16b, #8
-; CHECK-SD-NEXT: // kill: def $d2 killed $d2 killed $q2
-; CHECK-SD-NEXT: // kill: def $d3 killed $d3 killed $q3
-; CHECK-SD-NEXT: // kill: def $d0 killed $d0 killed $q0
-; CHECK-SD-NEXT: // kill: def $d1 killed $d1 killed $q1
-; CHECK-SD-NEXT: bic v2.4h, #255, lsl #8
-; CHECK-SD-NEXT: bic v0.4h, #255, lsl #8
-; CHECK-SD-NEXT: bic v1.4h, #255, lsl #8
-; CHECK-SD-NEXT: bic v3.4h, #255, lsl #8
-; CHECK-SD-NEXT: bic v4.4h, #255, lsl #8
-; CHECK-SD-NEXT: bic v5.4h, #255, lsl #8
-; CHECK-SD-NEXT: bic v6.4h, #255, lsl #8
-; CHECK-SD-NEXT: bic v7.4h, #255, lsl #8
+; CHECK-SD-NEXT: ushll v2.8h, v0.8b, #0
+; CHECK-SD-NEXT: ushll2 v0.8h, v0.16b, #0
+; CHECK-SD-NEXT: ushll v3.8h, v1.8b, #0
+; CHECK-SD-NEXT: ushll2 v1.8h, v1.16b, #0
+; CHECK-SD-NEXT: ushll2 v4.4s, v2.8h, #0
; CHECK-SD-NEXT: ushll v2.4s, v2.4h, #0
-; CHECK-SD-NEXT: ushll v0.4s, v0.4h, #0
-; CHECK-SD-NEXT: ushll v1.4s, v1.4h, #0
-; CHECK-SD-NEXT: ushll v17.4s, v3.4h, #0
-; CHECK-SD-NEXT: ushll v16.4s, v4.4h, #0
-; CHECK-SD-NEXT: ushll v5.4s, v5.4h, #0
-; CHECK-SD-NEXT: ushll v18.4s, v6.4h, #0
-; CHECK-SD-NEXT: ushll v19.4s, v7.4h, #0
-; CHECK-SD-NEXT: ucvtf v2.4s, v2.4s
-; CHECK-SD-NEXT: ucvtf v0.4s, v0.4s
-; CHECK-SD-NEXT: ucvtf v4.4s, v1.4s
-; CHECK-SD-NEXT: ucvtf v6.4s, v17.4s
-; CHECK-SD-NEXT: ucvtf v3.4s, v16.4s
-; CHECK-SD-NEXT: ucvtf v1.4s, v5.4s
-; CHECK-SD-NEXT: ucvtf v7.4s, v18.4s
-; CHECK-SD-NEXT: ucvtf v5.4s, v19.4s
+; CHECK-SD-NEXT: ushll2 v5.4s, v0.8h, #0
+; CHECK-SD-NEXT: ushll v6.4s, v0.4h, #0
+; CHECK-SD-NEXT: ushll v7.4s, v3.4h, #0
+; CHECK-SD-NEXT: ushll2 v16.4s, v1.8h, #0
+; CHECK-SD-NEXT: ushll2 v17.4s, v3.8h, #0
+; CHECK-SD-NEXT: ushll v18.4s, v1.4h, #0
+; CHECK-SD-NEXT: ucvtf v1.4s, v4.4s
+; CHECK-SD-NEXT: ucvtf v0.4s, v2.4s
+; CHECK-SD-NEXT: ucvtf v3.4s, v5.4s
+; CHECK-SD-NEXT: ucvtf v2.4s, v6.4s
+; CHECK-SD-NEXT: ucvtf v4.4s, v7.4s
+; CHECK-SD-NEXT: ucvtf v7.4s, v16.4s
+; CHECK-SD-NEXT: ucvtf v5.4s, v17.4s
+; CHECK-SD-NEXT: ucvtf v6.4s, v18.4s
; CHECK-SD-NEXT: ret
;
; CHECK-GI-LABEL: utofp_v32i8_v32f32:

23 changes (8 additions & 15 deletions): llvm/test/CodeGen/AArch64/vec3-loads-ext-trunc-stores.ll
@@ -444,8 +444,7 @@ define void @load_ext_to_64bits(ptr %src, ptr %dst) {
; CHECK-NEXT: orr w8, w9, w8, lsl #16
; CHECK-NEXT: fmov s0, w8
; CHECK-NEXT: add x8, x1, #4
-; CHECK-NEXT: zip1.8b v0, v0, v0
-; CHECK-NEXT: bic.4h v0, #255, lsl #8
+; CHECK-NEXT: ushll.8h v0, v0, #0
; CHECK-NEXT: st1.h { v0 }[2], [x8]
; CHECK-NEXT: str s0, [x1]
; CHECK-NEXT: ret
@@ -480,8 +479,7 @@ define void @load_ext_to_64bits_default_align(ptr %src, ptr %dst) {
; CHECK: ; %bb.0: ; %entry
; CHECK-NEXT: ldr s0, [x0]
; CHECK-NEXT: add x8, x1, #4
-; CHECK-NEXT: zip1.8b v0, v0, v0
-; CHECK-NEXT: bic.4h v0, #255, lsl #8
+; CHECK-NEXT: ushll.8h v0, v0, #0
; CHECK-NEXT: st1.h { v0 }[2], [x8]
; CHECK-NEXT: str s0, [x1]
; CHECK-NEXT: ret
@@ -491,8 +489,7 @@ define void @load_ext_to_64bits_default_align(ptr %src, ptr %dst) {
; BE-NEXT: ldr s0, [x0]
; BE-NEXT: add x8, x1, #4
; BE-NEXT: rev32 v0.8b, v0.8b
-; BE-NEXT: zip1 v0.8b, v0.8b, v0.8b
-; BE-NEXT: bic v0.4h, #255, lsl #8
+; BE-NEXT: ushll v0.8h, v0.8b, #0
; BE-NEXT: rev32 v1.8h, v0.8h
; BE-NEXT: st1 { v0.h }[2], [x8]
; BE-NEXT: str s1, [x1]
@@ -509,8 +506,7 @@ define void @load_ext_to_64bits_align_4(ptr %src, ptr %dst) {
; CHECK: ; %bb.0: ; %entry
; CHECK-NEXT: ldr s0, [x0]
; CHECK-NEXT: add x8, x1, #4
-; CHECK-NEXT: zip1.8b v0, v0, v0
-; CHECK-NEXT: bic.4h v0, #255, lsl #8
+; CHECK-NEXT: ushll.8h v0, v0, #0
; CHECK-NEXT: st1.h { v0 }[2], [x8]
; CHECK-NEXT: str s0, [x1]
; CHECK-NEXT: ret
@@ -520,8 +516,7 @@ define void @load_ext_to_64bits_align_4(ptr %src, ptr %dst) {
; BE-NEXT: ldr s0, [x0]
; BE-NEXT: add x8, x1, #4
; BE-NEXT: rev32 v0.8b, v0.8b
-; BE-NEXT: zip1 v0.8b, v0.8b, v0.8b
-; BE-NEXT: bic v0.4h, #255, lsl #8
+; BE-NEXT: ushll v0.8h, v0.8b, #0
; BE-NEXT: rev32 v1.8h, v0.8h
; BE-NEXT: st1 { v0.h }[2], [x8]
; BE-NEXT: str s1, [x1]
@@ -541,13 +536,11 @@ define void @load_ext_add_to_64bits(ptr %src, ptr %dst) {
; CHECK-NEXT: Lloh2:
; CHECK-NEXT: adrp x8, lCPI15_0@PAGE
; CHECK-NEXT: Lloh3:
-; CHECK-NEXT: ldr d1, [x8, lCPI15_0@PAGEOFF]
+; CHECK-NEXT: ldr d0, [x8, lCPI15_0@PAGEOFF]
; CHECK-NEXT: add x8, x1, #4
; CHECK-NEXT: orr w9, w10, w9, lsl #16
-; CHECK-NEXT: fmov s0, w9
-; CHECK-NEXT: zip1.8b v0, v0, v0
-; CHECK-NEXT: bic.4h v0, #255, lsl #8
-; CHECK-NEXT: add.4h v0, v0, v1
+; CHECK-NEXT: fmov s1, w9
+; CHECK-NEXT: uaddw.8h v0, v0, v1
; CHECK-NEXT: st1.h { v0 }[2], [x8]
; CHECK-NEXT: str s0, [x1]
; CHECK-NEXT: ret

36 changes (12 additions & 24 deletions): llvm/test/CodeGen/AArch64/vector-fcvt.ll
@@ -114,14 +114,10 @@ define <8 x float> @uitofp_v8i8_float(<8 x i8> %a) {
; CHECK-LABEL: uitofp_v8i8_float:
; CHECK: // %bb.0:
; CHECK-NEXT: ushll v0.8h, v0.8b, #0
-; CHECK-NEXT: ext v1.16b, v0.16b, v0.16b, #8
-; CHECK-NEXT: // kill: def $d0 killed $d0 killed $q0
-; CHECK-NEXT: bic v0.4h, #255, lsl #8
-; CHECK-NEXT: bic v1.4h, #255, lsl #8
+; CHECK-NEXT: ushll2 v1.4s, v0.8h, #0
; CHECK-NEXT: ushll v0.4s, v0.4h, #0
-; CHECK-NEXT: ushll v1.4s, v1.4h, #0
-; CHECK-NEXT: ucvtf v0.4s, v0.4s
; CHECK-NEXT: ucvtf v1.4s, v1.4s
+; CHECK-NEXT: ucvtf v0.4s, v0.4s
; CHECK-NEXT: ret
%1 = uitofp <8 x i8> %a to <8 x float>
ret <8 x float> %1
@@ -130,24 +126,16 @@ define <8 x float> @uitofp_v8i8_float(<8 x i8> %a) {
define <16 x float> @uitofp_v16i8_float(<16 x i8> %a) {
; CHECK-LABEL: uitofp_v16i8_float:
; CHECK: // %bb.0:
-; CHECK-NEXT: ushll2 v1.8h, v0.16b, #0
-; CHECK-NEXT: ushll v0.8h, v0.8b, #0
-; CHECK-NEXT: ext v2.16b, v1.16b, v1.16b, #8
-; CHECK-NEXT: ext v3.16b, v0.16b, v0.16b, #8
-; CHECK-NEXT: // kill: def $d0 killed $d0 killed $q0
-; CHECK-NEXT: // kill: def $d1 killed $d1 killed $q1
-; CHECK-NEXT: bic v0.4h, #255, lsl #8
-; CHECK-NEXT: bic v1.4h, #255, lsl #8
-; CHECK-NEXT: bic v2.4h, #255, lsl #8
-; CHECK-NEXT: bic v3.4h, #255, lsl #8
-; CHECK-NEXT: ushll v0.4s, v0.4h, #0
-; CHECK-NEXT: ushll v1.4s, v1.4h, #0
-; CHECK-NEXT: ushll v4.4s, v2.4h, #0
-; CHECK-NEXT: ushll v5.4s, v3.4h, #0
-; CHECK-NEXT: ucvtf v0.4s, v0.4s
-; CHECK-NEXT: ucvtf v2.4s, v1.4s
-; CHECK-NEXT: ucvtf v3.4s, v4.4s
-; CHECK-NEXT: ucvtf v1.4s, v5.4s
+; CHECK-NEXT: ushll v1.8h, v0.8b, #0
+; CHECK-NEXT: ushll2 v0.8h, v0.16b, #0
+; CHECK-NEXT: ushll v2.4s, v1.4h, #0
+; CHECK-NEXT: ushll2 v3.4s, v0.8h, #0
+; CHECK-NEXT: ushll2 v1.4s, v1.8h, #0
+; CHECK-NEXT: ushll v4.4s, v0.4h, #0
+; CHECK-NEXT: ucvtf v0.4s, v2.4s
+; CHECK-NEXT: ucvtf v3.4s, v3.4s
+; CHECK-NEXT: ucvtf v1.4s, v1.4s
+; CHECK-NEXT: ucvtf v2.4s, v4.4s
; CHECK-NEXT: ret
%1 = uitofp <16 x i8> %a to <16 x float>
ret <16 x float> %1