Commit 18fab27

Use an atom length of 2 for the regex filtering
As it turns out, a *significant* number of regexes have distinguishing atoms of length 2 rather than 3, so prefiltering under-performs significantly with the default settings. For example, when parsing sample 9997 (`sort -u` of sample file), the default setting prefilters from 633 regexes down to 61, of which the matching regex is number 50, leading to a lot of `Regex::is_match` calls.

Looking at the "extra" regexes: while they do have pretty long atoms, those tend to be optional, and the only required atoms are very short.

By reducing the atom length to 2, the prefiltered set goes down to 20, of which the regex we're looking for is the 14th. This cuts the post-prefiltering filtering from 6µs to 2µs (in addition to about 2µs of prefiltering, which barely changes: it goes from 2.2µs to 2.3µs).

This leads to a 15% perf increase on the benchmark, at no visible memory cost (maximum RSS and peak footprint are lost in the noise).

Before:

    Lines: 751580
    Total time: 8.139572291s
    10µs / line

    8.25 real  8.21 user  0.03 sys
       57655296  maximum resident set size
              0  average shared memory size
              0  average unshared data size
              0  average unshared stack size
           3732  page reclaims
              0  page faults
              0  swaps
              0  block input operations
              0  block output operations
              0  messages sent
              0  messages received
              0  signals received
              0  voluntary context switches
             85  involuntary context switches
    74982477832  instructions retired
    26557964231  cycles elapsed
       54461952  peak memory footprint

After:

    Lines: 751580
    Total time: 6.797529459s
    9µs / line

    6.91 real  6.86 user  0.04 sys
       57802752  maximum resident set size
              0  average shared memory size
              0  average unshared data size
              0  average unshared stack size
           3741  page reclaims
              0  page faults
              0  swaps
              0  block input operations
              0  block output operations
              0  messages sent
              0  messages received
              0  signals received
              0  voluntary context switches
            154  involuntary context switches
    65652792138  instructions retired
    22207284899  cycles elapsed
       54478080  peak memory footprint
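To illustrate why the minimum atom length matters, here is a minimal toy sketch of atom-based prefiltering. It is not the `regex_filtered` implementation: the `Pattern` struct, the `prefilter` function, and the sample patterns are all hypothetical, and each "regex" is reduced to a hand-written list of required literal atoms. The point it shows is that a pattern whose only required atoms are shorter than the minimum cannot be ruled out, so it always survives prefiltering and must be checked by the full regex.

```rust
/// A toy stand-in for a regex: the literal atoms that must all appear in the
/// input for the regex to have any chance of matching. (Hypothetical; a real
/// prefilter derives these atoms from the regex itself.)
struct Pattern {
    name: &'static str,
    required_atoms: &'static [&'static str],
}

/// Return the indices of the patterns that survive prefiltering.
///
/// Only atoms of at least `min_atom_len` bytes are usable. A pattern with no
/// usable atom cannot be excluded, so it is always kept as a candidate.
fn prefilter(patterns: &[Pattern], haystack: &str, min_atom_len: usize) -> Vec<usize> {
    patterns
        .iter()
        .enumerate()
        .filter(|(_, p)| {
            p.required_atoms
                .iter()
                .filter(|atom| atom.len() >= min_atom_len)
                .all(|atom| haystack.contains(*atom))
        })
        .map(|(index, _)| index)
        .collect()
}

/// Made-up patterns: two of them only have a 2-byte required atom.
fn sample_patterns() -> Vec<Pattern> {
    vec![
        Pattern { name: "firefox", required_atoms: &["Firefox/"] },
        Pattern { name: "ie", required_atoms: &["IE"] },
        Pattern { name: "uc", required_atoms: &["UC"] },
        Pattern { name: "opera", required_atoms: &["Opera"] },
    ]
}

fn main() {
    let ua = "Mozilla/5.0 (X11; Linux x86_64; rv:109.0) Gecko/20100101 Firefox/115.0";
    let patterns = sample_patterns();
    for min_atom_len in [3, 2] {
        let names: Vec<&str> = prefilter(&patterns, ua, min_atom_len)
            .into_iter()
            .map(|index| patterns[index].name)
            .collect();
        println!("min atom length {min_atom_len}: candidates {names:?}");
    }
}
```

With a minimum length of 3, the two patterns that only have 2-byte atoms stay in the candidate set even though the input contains neither atom; with a minimum length of 2 they are excluded before any full regex is run.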
1 parent 2d289cb commit 18fab27
1 file changed, +12 -3 lines changed
‎ua-parser/src/lib.rs
@@ -175,7 +175,10 @@ pub mod user_agent {
     impl<'a> Builder<'a> {
         /// Initialise an empty builder.
         pub fn new() -> Self {
-            Self::default()
+            Self {
+                builder: regex_filtered::Builder::new_atom_len(2),
+                repl: Vec::new(),
+            }
         }
 
         /// Build the extractor, may be called without pushing any
@@ -339,7 +342,10 @@ pub mod os {
     impl<'a> Builder<'a> {
         ///
         pub fn new() -> Self {
-            Self::default()
+            Self {
+                builder: regex_filtered::Builder::new_atom_len(2),
+                repl: Vec::new(),
+            }
         }
 
         /// Builds the [`Extractor`], may fail if building the
@@ -503,7 +509,10 @@ pub mod device {
         /// Creates a builder in the default configurtion, which is
         /// the only configuration.
         pub fn new() -> Self {
-            Self::default()
+            Self {
+                builder: regex_filtered::Builder::new_atom_len(2),
+                repl: Vec::new(),
+            }
         }
 
         /// Builds an Extractor, may fail if compiling the prefilter fails.
