Skip to content

Navigation Menu

Sign in
Appearance settings

Search code, repositories, users, issues, pull requests...

Provide feedback

We read every piece of feedback, and take your input very seriously.

Saved searches

Use saved searches to filter your results more quickly

Appearance settings

Commit 7f9ee73

Browse filesBrowse files
committed
Auto merge of rust-lang#103779 - the8472:simd-str-contains, r=thomcc
x86_64 SSE2 fast-path for str.contains(&str) and short needles Based on Wojciech Muła's [SIMD-friendly algorithms for substring searching](http://0x80.pl/articles/simd-strfind.html#sse-avx2) The two-way algorithm is Big-O efficient but it needs to preprocess the needle to find a "critical factorization" of it. This additional work is significant for short needles. Additionally it mostly advances needle.len() bytes at a time. The SIMD-based approach used here on the other hand can advance based on its vector width, which can exceed the needle length. Except for pathological cases, but due to being limited to small needles the worst case blowup is also small. benchmarks taken on a Zen2, compiled with `-Ccodegen-units=1`: ``` OLD: test str::bench_contains_16b_in_long ... bench: 504 ns/iter (+/- 14) = 5061 MB/s test str::bench_contains_2b_repeated_long ... bench: 948 ns/iter (+/- 175) = 2690 MB/s test str::bench_contains_32b_in_long ... bench: 445 ns/iter (+/- 6) = 5732 MB/s test str::bench_contains_bad_naive ... bench: 130 ns/iter (+/- 1) = 569 MB/s test str::bench_contains_bad_simd ... bench: 84 ns/iter (+/- 8) = 880 MB/s test str::bench_contains_equal ... bench: 142 ns/iter (+/- 7) = 394 MB/s test str::bench_contains_short_long ... bench: 677 ns/iter (+/- 25) = 3768 MB/s test str::bench_contains_short_short ... bench: 27 ns/iter (+/- 2) = 2074 MB/s NEW: test str::bench_contains_16b_in_long ... bench: 82 ns/iter (+/- 0) = 31109 MB/s test str::bench_contains_2b_repeated_long ... bench: 73 ns/iter (+/- 0) = 34945 MB/s test str::bench_contains_32b_in_long ... bench: 71 ns/iter (+/- 1) = 35929 MB/s test str::bench_contains_bad_naive ... bench: 7 ns/iter (+/- 0) = 10571 MB/s test str::bench_contains_bad_simd ... bench: 97 ns/iter (+/- 41) = 762 MB/s test str::bench_contains_equal ... bench: 4 ns/iter (+/- 0) = 14000 MB/s test str::bench_contains_short_long ... bench: 73 ns/iter (+/- 0) = 34945 MB/s test str::bench_contains_short_short ... bench: 12 ns/iter (+/- 0) = 4666 MB/s ```
2 parents 0d2bdef + c096c0a commit 7f9ee73
Copy full SHA for 7f9ee73

File tree

Expand file treeCollapse file tree

0 file changed

+0
-0
lines changed
Filter options
    Expand file treeCollapse file tree

    0 file changed

    +0
    -0
    lines changed

    0 commit comments

    Comments
    0 (0)
    Morty Proxy This is a proxified and sanitized view of the page, visit original site.