
SHA instructions mixed with AVX degrade performance by 10x #142368

Open
@chfast

Description


In the following SHA-256 implementation, the SHA intrinsics are used to accelerate hashing on CPUs that support them. Other basic operations such as load, store, and shuffle are written with SSE intrinsics, but when AVX is enabled LLVM emits the AVX (VEX-encoded) variants of those instructions. Because the SHA instructions themselves (e.g. SHA256RNDS2) only have legacy SSE encodings, the generated code mixes AVX and SSE-encoded instructions, and this mixture appears to degrade performance significantly (by more than 10x).

https://github.com/noloader/SHA-Intrinsics/blob/master/sha256-x86.c

Here is a small fragment of the implementation:

#include <stdint.h>
#include <x86intrin.h>  /* build with e.g. -msha -mssse3; add -mavx to get the AVX encodings */

void sha256_process_x86(uint32_t state[8], const uint8_t data[])
{
    __m128i STATE0, STATE1;
    __m128i MSG, TMP;
    __m128i MSG0, MSG1, MSG2, MSG3;
    __m128i ABEF_SAVE, CDGH_SAVE;
    const __m128i MASK = _mm_set_epi64x(0x0c0d0e0f08090a0bULL, 0x0405060700010203ULL);

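    /* Load the current state so STATE0/STATE1 are not read uninitialized.
       (The full implementation also permutes it into the ABEF/CDGH order
       that SHA256RNDS2 expects; that step is omitted in this fragment.) */
    STATE0 = _mm_loadu_si128((const __m128i*) &state[0]);
    STATE1 = _mm_loadu_si128((const __m128i*) &state[4]);

    /* Rounds 0-3 */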
    MSG = _mm_loadu_si128((const __m128i*) (data+0));
    MSG0 = _mm_shuffle_epi8(MSG, MASK);
    MSG = _mm_add_epi32(MSG0, _mm_set_epi64x(0xE9B5DBA5B5C0FBCFULL, 0x71374491428A2F98ULL));
    STATE1 = _mm_sha256rnds2_epu32(STATE1, STATE0, MSG);
    MSG = _mm_shuffle_epi32(MSG, 0x0E);
    STATE0 = _mm_sha256rnds2_epu32(STATE0, STATE1, MSG);

    _mm_storeu_si128((__m128i*)&state[0], STATE0);
    _mm_storeu_si128((__m128i*)&state[4], STATE1);
}

https://godbolt.org/z/4Ks53fjeo
