Use of two table lookups instead of existing three table lookups for Utf8Validator. #69

@jatin-bhateja

Description


I have been experimenting with Utf8Validator and found that the existing implementation uses three lookup tables, indexed by the upper and lower nibbles of the first byte and the upper nibble of the second byte in each pair of consecutive bytes, to catch the various error scenarios.

Effectively, we use twelve bits, 8 from the first byte and 4 from the second byte, for lookups in 16-byte tables. The following PoC implementation[1] uses two 64-byte lookup tables accessed with 6-bit indices. For the first lookup, the index is composed of the least significant 6 bits of the first byte; for the second lookup, the index concatenates the upper nibble of the second byte with the most significant two bits of the first byte.
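
To make the index derivation concrete, here is a minimal scalar sketch of how the indices for both schemes could be computed from one pair of consecutive bytes. It is an illustration only: the class and method names are hypothetical, the table contents are omitted because they are not given here, and the bit order inside the second 6-bit index (second byte's upper nibble in the high bits, first byte's top two bits in the low bits) is an assumption. The linked PoC[1] is the authoritative version.

```java
// Sketch only: index derivation for the two schemes, scalar, no tables.
// All names here are hypothetical and not taken from the PoC.
final class LookupIndexSketch {

    /** Three-table scheme: three 4-bit indices into 16-entry tables. */
    static int[] threeTableIndices(byte first, byte second) {
        int b1 = first & 0xFF;   // treat Java's signed byte as unsigned
        int b2 = second & 0xFF;
        return new int[] {
            b1 >>> 4,            // upper nibble of the first byte
            b1 & 0x0F,           // lower nibble of the first byte
            b2 >>> 4             // upper nibble of the second byte
        };
    }

    /** Two-table scheme: two 6-bit indices into 64-entry tables. */
    static int[] twoTableIndices(byte first, byte second) {
        int b1 = first & 0xFF;
        int b2 = second & 0xFF;
        int idx1 = b1 & 0x3F;    // least significant 6 bits of the first byte
        // Assumed bit order: upper nibble of the second byte in bits 5..2,
        // most significant two bits of the first byte in bits 1..0.
        int idx2 = ((b2 >>> 4) << 2) | (b1 >>> 6);
        return new int[] { idx1, idx2 };
    }

    public static void main(String[] args) {
        byte first = (byte) 0xE2;
        byte second = (byte) 0x82;
        System.out.println(java.util.Arrays.toString(threeTableIndices(first, second)));
        System.out.println(java.util.Arrays.toString(twoTableIndices(first, second)));
    }
}
```

Under this assumed layout, the two 6-bit indices together carry exactly the same twelve bits as the three 4-bit indices, just split across two lookups instead of three.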

I see a performance improvement of around 5-7%[2] over the three-table lookup.

The algorithm can be ported directly to Utf8Validator.

Best Regards,
Jatin

[1] https://github.com/jatin-bhateja/external_staging/blob/main/Code/java/vector-api/simd_json/ThreeVsTwoTableLookup.java
[2] https://github.com/jatin-bhateja/external_staging/blob/main/Code/java/vector-api/simd_json/performance_3Tvs2T_lookup.txt
