37 events
when what by license comment
Mar 4 at 3:41 history edited Peter Cordes
edited tags
Aug 2, 2019 at 14:15 audit Triage
Aug 2, 2019 at 14:48
Jul 30, 2019 at 19:13 audit Triage
Jul 30, 2019 at 19:43
Jul 30, 2019 at 15:16 audit Close votes
Jul 31, 2019 at 8:23
Jul 29, 2019 at 17:54 audit Close votes
Jul 30, 2019 at 0:18
Jul 29, 2019 at 14:12 audit Triage
Jul 29, 2019 at 14:36
Jul 26, 2019 at 6:18 audit Close votes
Jul 26, 2019 at 6:18
Jul 25, 2019 at 15:00 audit Triage
Jul 25, 2019 at 15:12
Jul 25, 2019 at 7:37 audit Triage
Jul 25, 2019 at 7:57
Jul 22, 2019 at 14:06 audit Triage
Jul 22, 2019 at 14:53
Jul 19, 2019 at 6:21 audit Triage
Jul 19, 2019 at 6:21
Jul 17, 2019 at 9:45 audit Triage
Jul 17, 2019 at 10:01
Jul 16, 2019 at 5:45 audit Triage
Jul 16, 2019 at 6:13
Jul 4, 2019 at 0:16 comment added BeeOnRope @HCSF important routines in libc are generally compiled multiple times for different ISAs and then the version appropriate for the current CPU is selected at runtime using the dynamic loader's IFUNC capability. So you'll usually get a version optimized for your CPU (unless your libc is quite old and your CPU quite new).
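The selection step described in this comment can be illustrated outside the loader. The following is a minimal sketch in Python, not how IFUNC itself is implemented: it assumes the CPU feature list comes from the `flags` line of Linux's `/proc/cpuinfo`, and the function names `parse_cpuinfo_flags` and `resolve`, as well as any variant names passed in, are hypothetical.

```python
def parse_cpuinfo_flags(cpuinfo_text):
    """Return the CPU feature flags from /proc/cpuinfo-style text as a set."""
    for line in cpuinfo_text.splitlines():
        key, _, value = line.partition(":")
        if key.strip() == "flags":
            return set(value.split())
    return set()  # no flags line found


def resolve(cpu_flags, variants, fallback):
    """Pick the first variant whose required flags are all present.

    variants: list of (required_flags, implementation) pairs, ordered from
    most to least specialized. fallback is returned if none match.
    """
    available = set(cpu_flags)
    for required, implementation in variants:
        if set(required) <= available:
            return implementation
    return fallback
```

Calling `resolve` once at startup and reusing the result mirrors the idea that the choice is made a single time for the current CPU rather than on every call.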
Jul 3, 2019 at 15:06 comment added HCSF @BeeOnRope you brought up an interesting point -- "other libraries, at a minimum libc - and these have 256-bit instructions". I thought most libraries that come with Linux distros are not compiled for a specific x86 CPU, and since some x86 CPUs don't support 256-bit AVX, a library like libc shouldn't contain any 256-bit instructions. No?
Jul 3, 2019 at 14:31 comment added BeeOnRope Yeah, vzeroupper makes more sense after using ymm registers, to avoid transition penalties for "dirty upper", and probably isn't needed for xmm-only code. I think there is a flag to turn its emission off.
Jul 3, 2019 at 14:30 comment added BeeOnRope Yeah, that's a reasonable way to check the binary. Keep in mind that at runtime you'll likely use other libraries, at a minimum libc - and these have 256-bit instructions, e.g. in their memcpy implementation. So you really have to do a runtime check to be sure you aren't executing any "forbidden" instructions. I don't think the 256b instructions in libc are likely to be a problem wrt the licenses since they are light.
Jul 3, 2019 at 7:34 comment added HCSF I compiled with -march=skylake-avx512 -mtune=skylake-avx512 -mprefer-vector-width=128, then disassembled the result with objdump -d my binary > binary.asm, and then ran grep -i ymm binary.asm. I guess it is safe to conclude that it doesn't use any 256- or 512-bit registers, and so no 256-bit AVX or AVX-512 instructions are emitted? @BeeOnRope Though, I still see many vzeroupper instructions. I thought they were only used with ymm registers. No?
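The check described in this comment can be sketched as a small script. This is a minimal sketch, assuming the input is the plain text produced by objdump -d; the function name `count_vector_registers` is hypothetical. It looks for zmm as well as ymm register names, since a grep for ymm alone would not match 512-bit registers. It covers only the disassembled binary itself, not libraries loaded at runtime.

```python
import re
from collections import Counter

# Matches xmm/ymm/zmm register names such as %ymm0 or zmm31
# (works for both AT&T and Intel syntax).
_VECTOR_REG = re.compile(r"\b([xyz]mm)(\d{1,2})\b", re.IGNORECASE)


def count_vector_registers(disassembly: str) -> Counter:
    """Count disassembly lines that mention each vector register class.

    A line that uses several registers of the same class is counted once
    for that class.
    """
    counts = Counter()
    for line in disassembly.splitlines():
        classes = {m.group(1).lower() for m in _VECTOR_REG.finditer(line)}
        counts.update(classes)
    return counts
```

Passing the contents of binary.asm to `count_vector_registers` and checking that the `ymm` and `zmm` counts are both zero corresponds to the grep-based check above. Instructions such as vzeroupper name no register and are therefore not counted.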
Jul 3, 2019 at 6:59 comment added HCSF I can try it now. But is there a way to check whether L1/L2 instructions are in the binary?
Jul 3, 2019 at 6:57 comment added BeeOnRope Try those options and then check if any L1/L2 instructions pop up using the performance counter events for L1 and L2 licenses.
Jul 3, 2019 at 6:57 comment added BeeOnRope Peter mentioned the -mprefer-vector-width=256 option. I don't know if it prevents gcc from ever producing AVX-512 instructions (outside of direct intrinsic use), but it is certainly possible. I am not aware of any option which distinguishes between "heavy" and "light" instructions, however. Usually this isn't a problem, since if you turn off AVX-512 and don't have a bunch of FP ops, you are probably targeting L0 anyways, and AVX-512 light is still L1.
Jul 3, 2019 at 6:54 comment added HCSF @BeeOnRope Based on your answer, is there any way to tell GCC not to generate any AVX-512 and AVX-256-heavy instructions? All other instructions are okay.
Jul 3, 2019 at 6:51 history edited BeeOnRope CC BY-SA 4.0
edited title
Jul 3, 2019 at 6:25 history edited HCSF CC BY-SA 4.0
added 5 characters in body
Jul 3, 2019 at 3:37 history edited phuclv CC BY-SA 4.0
Improved Formatting
Jul 3, 2019 at 0:03 comment added BeeOnRope @HCSF - you can avoid the ldd related penalty by issuing a vzeroupper at the start of your program.
Jul 3, 2019 at 0:02 answer added BeeOnRope timeline score: 99
Jul 2, 2019 at 20:35 history edited Peter Cordes CC BY-SA 4.0
edited tags
Jul 2, 2019 at 20:34 answer added Peter Cordes timeline score: 18
Jul 2, 2019 at 14:52 comment added Margaret Bloom @HCSF You can make three builds, one without AVX, one with AVX/AVX2 and one with AVX-512 (if applicable) and profile them. Then take the fastest one.
Jul 2, 2019 at 14:33 comment added HCSF @MargaretBloom thanks for sharing your thoughts and all the links. I also read BeeOnRope's post about the penalty in ld. Given my ld is very old, I think it is best for me to avoid AVX and AVX-512 related instructions. And as you pointed out, the ratio of vector to scalar is also important. Given I write high-level C++ code, it is hard to figure out the ratio unless I check the assembly output each time, slowing down development...
Jul 2, 2019 at 13:10 comment added Margaret Bloom ... two cycles and drop the frequency according to their tier (e.g. AVX-512 heavy instrs drop the frequency to the AVX-512 base). Travis also shared the code he used to test here. You can find the behaviour of each instruction with a bit of patience or by his rule of thumb. Finally, note that this frequency scaling is a problem only if the ratio of vector to scalar instructions is low enough that the drop in frequency is not balanced by the bigger width at which data is processed. Check the final binary to see if you really gained anything.
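The trade-off in the last point can be made concrete with a simple model. This is a rough sketch under stated assumptions, not a measurement: it assumes the whole program runs at the reduced clock, that a fixed fraction of the work is sped up by a constant factor from the wider vectors, and it ignores transition stalls. The function names and any numbers fed into them are hypothetical.

```python
def relative_runtime(vector_fraction, width_speedup, freq_ratio):
    """Runtime of the vectorized build relative to the baseline (1.0 = equal).

    vector_fraction: share of baseline runtime that benefits from wider vectors (0..1)
    width_speedup:   speedup of that share from the wider vectors (> 1)
    freq_ratio:      reduced clock divided by baseline clock (0 < ratio <= 1)
    """
    if not 0.0 <= vector_fraction <= 1.0:
        raise ValueError("vector_fraction must be in [0, 1]")
    if width_speedup <= 1.0 or not 0.0 < freq_ratio <= 1.0:
        raise ValueError("need width_speedup > 1 and 0 < freq_ratio <= 1")
    work = (1.0 - vector_fraction) + vector_fraction / width_speedup
    return work / freq_ratio


def break_even_fraction(width_speedup, freq_ratio):
    """Smallest vector_fraction at which the vectorized build is no slower.

    Solves (1 - v) + v / s = r for v, giving v = (1 - r) / (1 - 1 / s).
    A result above 1.0 means the wider vectors can never pay off in this model.
    """
    if width_speedup <= 1.0 or not 0.0 < freq_ratio <= 1.0:
        raise ValueError("need width_speedup > 1 and 0 < freq_ratio <= 1")
    return (1.0 - freq_ratio) / (1.0 - 1.0 / width_speedup)
```

For instance, with a hypothetical 2x speedup on the vector part and a clock reduced to 90% of baseline, `break_even_fraction(2.0, 0.9)` is about 0.2: in this model, vectorization only pays off if more than roughly a fifth of the baseline runtime benefits from it. Profiling the real builds remains the way to confirm any gain.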
Jul 2, 2019 at 13:06 comment added Margaret Bloom Travis Downs (aka BeeOnRope on SO) wrote about this in the comments in this post and continued the discussion here. He found that each tier (scalar, AVX/AVX2, AVX-512) has "cheap" (no FP, simple operations) instructions and "heavy" instructions. Cheap instructions drop the frequency to that of the next higher tier (e.g. cheap AVX-512 instructions use the AVX/AVX2 tier) even if used sparsely. Heavy instructions must be used more than 1 every ...
Jul 2, 2019 at 12:51 comment added HCSF @500-InternalServerError in order to avoid jitter in the system. Think about a laser arm getting jitters.
Jul 2, 2019 at 12:50 history edited HCSF CC BY-SA 4.0
added 99 characters in body
Jul 2, 2019 at 12:49 comment added 500 - Internal Server Error instructions to avoid in order to accomplish what exactly?
Jul 2, 2019 at 12:45 history asked HCSF CC BY-SA 4.0