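When lowering a shift by a variable amount, the compiler emits a shift followed by a select that replaces the result with 0 when the shift amount is at least the bit width of the operand.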
This normally works fine, but on AVR, the shift is lowered to a loop that uses the shift length as a counter.
For example:
package main

func main() {
	for i := 0; i < 256; i++ {
		println("1 <<", i, "=", shl(uint8(i)))
	}
}

//go:noinline
func shl(sh uint8) uint16 {
	return 1 << sh
}
This is turned into the following LLVM IR:
; Function Attrs: minsize mustprogress nofree noinline norecurse nosync nounwind optsize willreturn memory(none)
define internal fastcc range(i16 0, -32767) i16 @main.shl(i8 %sh) unnamed_addr addrspace(1) #11 !dbg !4130 {
entry:
#dbg_value(i8 %sh, !4134, !DIExpression(), !4135)
#dbg_value(i8 %sh, !4134, !DIExpression(), !4136)
%shift.overflow = icmp ugt i8 %sh, 15, !dbg !4137
%0 = zext nneg i8 %sh to i16, !dbg !4137
%1 = shl nuw i16 1, %0, !dbg !4137
%shift.result = select i1 %shift.overflow, i16 0, i16 %1, !dbg !4137
ret i16 %shift.result, !dbg !4138
}
Which compiles to:
00000ff4 <main.shl>:
ff4: 28 2f mov r18, r24
ff6: 81 e0 ldi r24, 0x1
ff8: 90 e0 ldi r25, 0x0
ffa: 32 2f mov r19, r18
ffc: 3a 95 dec r19
ffe: 1a f0 brmi .+6
1000: 88 0f lsl r24
1002: 99 1f rol r25
1004: fb cf rjmp .-10
1006: 20 31 cpi r18, 0x10
1008: 10 f0 brlo .+4
100a: 80 e0 ldi r24, 0x0
100c: 90 e0 ldi r25, 0x0
100e: 08 95 ret
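The lsl + rol loop runs once per unit of shift amount, so it can run up to 128 times, for a shift amount of 128 (larger amounts make the brmi exit immediately). Every iteration beyond the 15th is wasted work, because the overflow check after the loop replaces the result with 0 anyway. I do not think this performance penalty is expected.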
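To make the iteration count concrete, here is a minimal Go sketch that models the dec/brmi loop from the disassembly. The function name `loopIterations` is hypothetical and exists only for illustration. It is a model of the loop counter, not TinyGo or LLVM code, and it assumes brmi exits exactly when the decremented counter is negative as a signed 8-bit value.

```go
package main

import "fmt"

// loopIterations models the counter logic of the AVR shift loop:
// the counter is decremented first, and the loop exits as soon as the
// result is negative when interpreted as a signed 8-bit value.
// It returns how many lsl + rol pairs would execute.
func loopIterations(sh uint8) int {
	n := 0
	counter := sh
	for {
		counter--              // dec r19 (wraps from 0 to 255)
		if int8(counter) < 0 { // brmi: taken when the N flag is set
			return n
		}
		n++ // one lsl + rol pair
	}
}

func main() {
	for _, sh := range []uint8{0, 1, 15, 16, 127, 128, 129, 255} {
		fmt.Println("shift amount", sh, "->", loopIterations(sh), "iterations")
	}
}
```

Under this model, shift amounts from 16 through 128 each cost that many loop iterations even though the final result is always 0, while amounts of 129 and above cost none.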