Skip to content

Navigation Menu

Sign in
Appearance settings

Search code, repositories, users, issues, pull requests...

Provide feedback

We read every piece of feedback, and take your input very seriously.

Saved searches

Use saved searches to filter your results more quickly

Appearance settings

Conversation

martinvuyk
Copy link
Contributor

Greatly optimize unicode casing by adding fast paths for ASCII and Latin-1 encodings.

Benchmark results

CPU: Intel® Core™ i7-7700HQ

bench old value (ms) new value (ms) markdown perc magnitude
bench_string_lower[10] 72.42 15.50 0.79 4.67
bench_string_upper[10] 121.40 22.11 0.82 5.49
bench_string_lower[30] 74.95 15.39 0.79 4.87
bench_string_upper[30] 125.36 21.35 0.83 5.87
bench_string_lower[50] 77.73 14.76 0.81 5.27
bench_string_upper[50] 132.22 20.05 0.85 6.59
bench_string_lower[100] 86.46 14.48 0.83 5.97
bench_string_upper[100] 119.76 19.96 0.83 6.00
bench_string_lower[1000] 69.18 14.03 0.80 4.93
bench_string_upper[1000] 125.47 19.48 0.84 6.44
bench_string_lower[10000] 67.78 13.90 0.79 4.88
bench_string_upper[10000] 125.44 19.44 0.85 6.45
bench_string_lower[100000] 67.40 14.68 0.78 4.59
bench_string_upper[100000] 125.52 20.99 0.83 5.98
bench_string_lower[1000000] 67.39 15.11 0.78 4.46
bench_string_upper[1000000] 125.58 20.58 0.84 6.10

@martinvuyk martinvuyk requested a review from a team as a code owner October 5, 2025 20:40
@martinvuyk martinvuyk force-pushed the fix-unicode-lowercasing branch 3 times, most recently from c86ae88 to 46a2469 Compare October 5, 2025 20:47
Signed-off-by: martinvuyk <martin.vuyklop@gmail.com>
@barcharcraz
Copy link
Contributor

This is obviously an improvement, but I think we should optimize the whole algorithm as well. _unicode_lookups appears to be a very simple translation of the UCD case folding data, and we could probably get better performance (and better code size) by preprocessing that data to something more compact.

A starting point would be figuring out the offsets for each character, then merging ranges with the same offsets and storing just the range start and length. Obviously this is more "work" to look up but it's a lot less thrashing. If you start the search by looking at the first range then you end up doing basically the same thing as this optimization except you might not get the bitwise version (which probably won't matter, arithmetic has the same TP as bitwise ops and there aren't any dependency chains on the output).

You'd probably want to first check at the beginning of your map (in case of ascii) then start searching from wherever the last processed character was.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants

Morty Proxy This is a proxified and sanitized view of the page, visit original site.