Generic method for byte classification? #2033

Jul 16, 2023

QuarticCat
Jul 16, 2023

In a talk, @lemire mentioned that in simdjson he used a technique to classify bytes:

1. Construct two tables, each with 16 entries, mapping a nibble (4 bits) to a byte.
2. Split a byte into its upper nibble and lower nibble.
3. Look up each nibble in its respective table to get two bytes.
4. Combine these two bytes with bitwise AND (or OR) to get the final class of the original byte.
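To make the steps concrete, here is a minimal scalar sketch of the AND variant. In SIMD code the two lookups are typically done many bytes at a time with a byte-shuffle instruction such as x86 `pshufb`; here they are plain list indexing. The classes, bit assignments, and the names `classify`, `HI`, and `LO` are hypothetical, chosen only for illustration and not taken from simdjson.

```python
# Scalar emulation of the two-table nibble lookup (AND variant).
# Hypothetical classes, one output bit per rectangle of bytes:
DIGIT = 0x01    # '0'..'9'          = high nibble 0x3 x low nibbles 0x0..0x9
SPACE = 0x02    # ' ' (0x20)        = high nibble 0x2 x low nibble 0x0
CTRL_WS = 0x04  # '\t', '\n', '\r'  = high nibble 0x0 x low nibbles 0x9, 0xA, 0xD

HI = [0] * 16  # indexed by the upper nibble
LO = [0] * 16  # indexed by the lower nibble

HI[0x3] |= DIGIT
for lo in range(10):
    LO[lo] |= DIGIT
HI[0x2] |= SPACE
LO[0x0] |= SPACE
HI[0x0] |= CTRL_WS
for lo in (0x9, 0xA, 0xD):
    LO[lo] |= CTRL_WS


def classify(data: bytes, hi_table: list, lo_table: list) -> list:
    """Return hi_table[upper nibble] & lo_table[lower nibble] for each byte."""
    assert len(hi_table) == 16 and len(lo_table) == 16
    return [hi_table[b >> 4] & lo_table[b & 0x0F] for b in data]


# A byte is whitespace if either whitespace bit is set.
WHITESPACE = SPACE | CTRL_WS
```

For example, `classify(b"1 a\t", HI, LO)` yields `[DIGIT, SPACE, 0, CTRL_WS]`. Whitespace needs two bits here because the set {0x20, 0x09, 0x0A, 0x0D} cannot be written as (set of high nibbles) × (set of low nibbles): giving it a single bit would also accept bytes such as 0x00 and 0x29 `)`.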

My question is: is there a generic algorithm to construct such tables, and to identify the situations where the technique is not applicable?
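One partial observation, for the AND variant only: a single output bit k is set exactly for the bytes whose upper nibble is in some set H and whose lower nibble is in some set L, so one bit can recognize a set of bytes only if that set is a "rectangle" H × L. Below is a sketch of a simple greedy construction built on that fact. It splits each class into disjoint rectangles by grouping upper nibbles that share the same set of lower nibbles, and spends one bit per rectangle. It is only a sufficient construction, not an answer to the general question: it is not minimal, it ignores the OR variant, and the names `is_rectangle` and `build_tables` are hypothetical.

```python
from collections import defaultdict


def is_rectangle(s: set) -> bool:
    """True if s == H x L for a set H of upper nibbles and a set L of lower
    nibbles, i.e. s can be recognized by one bit under AND-combination."""
    highs = {b >> 4 for b in s}
    lows = {b & 0x0F for b in s}
    return s == {(h << 4) | l for h in highs for l in lows}


def build_tables(classes: dict):
    """Greedy sketch: one output bit per rectangle. Classes are assumed to be
    pairwise-disjoint sets of byte values. Returns (hi, lo, masks), where a
    byte b is in class c iff (hi[b >> 4] & lo[b & 0xF] & masks[c]) != 0,
    or None if this scheme needs more than 8 bits."""
    hi = [0] * 16
    lo = [0] * 16
    masks = {}
    bit = 0
    for name, members in classes.items():
        rows = defaultdict(set)  # upper nibble -> lower nibbles in the class
        for b in members:
            rows[b >> 4].add(b & 0x0F)
        groups = defaultdict(list)  # set of lower nibbles -> upper nibbles
        for h, lows in rows.items():
            groups[frozenset(lows)].append(h)
        masks[name] = 0
        for lows, highs in groups.items():
            if bit == 8:
                return None  # out of bits in a single output byte
            for h in highs:
                hi[h] |= 1 << bit
            for l in lows:
                lo[l] |= 1 << bit
            masks[name] |= 1 << bit
            bit += 1
    return hi, lo, masks
```

For instance, the ASCII digits form one rectangle and cost one bit, while the whitespace set {0x20, 0x09, 0x0A, 0x0D} costs two. A "diagonal" class such as {0x00, 0x11, ..., 0x88} makes this sketch return `None`, since any rectangle inside it is a single byte and nine bits would be needed.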

Jul 16, 2023

We call this vectorized classification. To my knowledge, it has not been formalized yet.

I will work on it.
