Closed
Description
- lazy builtins are mostly used for finite automaton engines (re2, regex)
- the use of large bounded repetitions in the default regex set is an issue for finite automaton engines (Specify and regularise non-functional bounded repetitions uap-core#596)
- Graal also uses a finite automaton engine under the hood
- TRegex seems to cope poorly with the writing style of regexes.yaml (re is much slower than cpython oracle/graalpython#445)
As such, it would make sense to:
- transform the builtin regexes set for better compatibility (similar to ua-parser/uap-rust@29b9195)
- use (or at least bench) the lazy regexes for graal
- maybe implement ASCII mode (Measure memory / performance if regexes are compiled in ASCII mode #212) as a back-transformation as well, for both performance and semantic correctness
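As a rough illustration of what such a transformation could look like, here is a minimal sketch that rewrites large bounded repetitions into unbounded ones. It is not the uap-rust implementation: the function name `relax_repetitions` and the `threshold` parameter are hypothetical, and it assumes the bounds being rewritten are non-functional (replacing `{0,100}` with `*` lets the pattern match more than before, so this is only safe where the bound was never meant to reject input).

```python
import re

# Matches only {0,n} and {1,n}; other forms ({n}, {2,n}, {n,}) are left alone.
_BOUNDED = re.compile(r"\{([01]),(\d+)\}")


def relax_repetitions(pattern: str, threshold: int = 50) -> str:
    """Rewrite {0,n} to * and {1,n} to + when n >= threshold.

    Escaped characters and character classes are copied verbatim, so a
    literal "\\{" or a "{" inside "[...]" is never rewritten.
    """
    out = []
    i, n = 0, len(pattern)
    in_class = False
    while i < n:
        c = pattern[i]
        if c == "\\" and i + 1 < n:
            # Escape sequence: copy the backslash and the next character.
            out.append(pattern[i:i + 2])
            i += 2
            continue
        if in_class:
            if c == "]":
                in_class = False
        elif c == "[":
            in_class = True
            out.append(c)
            i += 1
            # A "^" and then a "]" right after "[" are part of the class.
            if i < n and pattern[i] == "^":
                out.append("^")
                i += 1
            if i < n and pattern[i] == "]":
                out.append("]")
                i += 1
            continue
        elif c == "{":
            m = _BOUNDED.match(pattern, i)
            if m and int(m.group(2)) >= threshold:
                out.append("*" if m.group(1) == "0" else "+")
                i = m.end()
                continue
        out.append(c)
        i += 1
    return "".join(out)


print(relax_repetitions(r"Foo.{0,100}Bar/(\d+)"))
```

Small bounds such as `{0,3}` are kept as they are, since they are cheap for automaton engines and more likely to be intentional.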
The one major drawback is that re2 currently uses re for its actual extraction. So here there are two options:
- use re2 for extraction, if the back-transformation leads to better performance
- or have the finite automaton engines perform the transformation internally at load time
The latter would make loading costlier, but it would mean fewer compilation woes, it would also work with YAML data, and it would cause less trouble in graal (in that it would patch the lazy callbacks with a transformation pass).
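The second option could look roughly like the sketch below, where the transformation runs inside the lazy compilation step rather than ahead of time. This is only an illustration under stated assumptions: `LazyPattern` and `toy_transform` are hypothetical names, the stdlib `re` module stands in for whichever finite automaton engine is actually used, and `toy_transform` is a deliberately naive placeholder for a real rewriting pass.

```python
import re
from typing import Callable, Optional


class LazyPattern:
    """Hypothetical lazy matcher: transform and compile on first use only."""

    def __init__(self, source: str, transform: Callable[[str], str]):
        self.source = source          # regex exactly as loaded from the data
        self._transform = transform   # rewriting pass applied before compiling
        self._compiled: Optional[re.Pattern] = None

    def _get(self) -> re.Pattern:
        if self._compiled is None:
            # The transformation cost is paid once, when the pattern is first needed.
            self._compiled = re.compile(self._transform(self.source))
        return self._compiled

    def search(self, text: str) -> Optional[re.Match]:
        return self._get().search(text)


def toy_transform(source: str) -> str:
    # Naive stand-in: a real pass would parse the regex instead of
    # doing a plain string replacement.
    return source.replace("{0,100}", "*")


matcher = LazyPattern(r"Foo.{0,100}/(\d+)", toy_transform)
match = matcher.search("Foo bar /12")
print(match.group(1) if match else None)
```

Because the raw source is kept alongside the compiled form, the same regex data can feed both the transformed and untransformed engines.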