Add C accelerator for fnmatch

Enhancement

By adding a C accelerator, it is possible to improve the running time of fnmatch.filter and fnmatch.translate (the previous proposal is no longer correct since fnmatch(3) does not produce the same results as Python) by 1.1x and 6.5x respectively.

Machine specs:

OS: openSUSE Leap 15.5 x86_64
Kernel: 5.14.21-150500.55.68-default
CPU: 13th Gen Intel i9-13980HX (32) @ 5.400GHz
gcc (SUSE Linux) 7.5.0

`fnmatch.translate`

Without --with-pydebug

+------------------------------------------+-----------------------+-----------------------+
| Benchmark                                | fnmatch-translate-ref | fnmatch-translate-c   |
+==========================================+=======================+=======================+
| abc/[!]b-ac-z9-1]/def/\*?/*/**c/?*[][!]  | 6.09 us               | 1.26 us: 4.84x faster |
+------------------------------------------+-----------------------+-----------------------+
| !abc/[!]b-ac-z9-1]/def/\*?/*/**c/?*[][!] | 6.40 us               | 1.29 us: 4.97x faster |
+------------------------------------------+-----------------------+-----------------------+
| a**?**cd**?**??k***                      | 2.17 us               | 303 ns: 7.16x faster  |
+------------------------------------------+-----------------------+-----------------------+
| a/**/b/**/c                              | 2.00 us               | 286 ns: 7.00x faster  |
+------------------------------------------+-----------------------+-----------------------+
| man/man1/bash.1                          | 3.04 us               | 459 ns: 6.63x faster  |
+------------------------------------------+-----------------------+-----------------------+
| a*b*c*d*e*f*g*h*i*j*k*l*m*n*o**          | 5.14 us               | 685 ns: 7.50x faster  |
+------------------------------------------+-----------------------+-----------------------+
| Geometric mean                           | (ref)                 | 6.26x faster          |
+------------------------------------------+-----------------------+-----------------------+

With --enable-optimizations

+------------------------------------------+-----------------------+-----------------------+
| Benchmark                                | fnmatch-translate-ref | fnmatch-translate-c   |
+==========================================+=======================+=======================+
| abc/[!]b-ac-z9-1]/def/\*?/*/**c/?*[][!]  | 6.15 us               | 1.21 us: 5.07x faster |
+------------------------------------------+-----------------------+-----------------------+
| !abc/[!]b-ac-z9-1]/def/\*?/*/**c/?*[][!] | 6.45 us               | 1.24 us: 5.19x faster |
+------------------------------------------+-----------------------+-----------------------+
| a**?**cd**?**??k***                      | 2.15 us               | 287 ns: 7.49x faster  |
+------------------------------------------+-----------------------+-----------------------+
| a/**/b/**/c                              | 1.94 us               | 269 ns: 7.21x faster  |
+------------------------------------------+-----------------------+-----------------------+
| man/man1/bash.1                          | 3.00 us               | 427 ns: 7.03x faster  |
+------------------------------------------+-----------------------+-----------------------+
| a*b*c*d*e*f*g*h*i*j*k*l*m*n*o**          | 5.07 us               | 636 ns: 7.98x faster  |
+------------------------------------------+-----------------------+-----------------------+
| Geometric mean                           | (ref)                 | 6.56x faster          |
+------------------------------------------+-----------------------+-----------------------+

Benchmark script

import pyperf

import _fnmatch
from fnmatch import _translate, _join_translated_parts

def translate_py(pattern):
    STAR = object()
    parts = _translate(pattern, STAR, '.')
    return _join_translated_parts(parts, STAR)

def translate_ref(loops, pattern):
    __ = range(loops)
    t0 = pyperf.perf_counter()
    for _ in __:
        translated = translate_py(pattern)
    return pyperf.perf_counter() - t0

def translate_c(loops, pattern):
    __ = range(loops)
    t0 = pyperf.perf_counter()
    for _ in __:
        res = _fnmatch.translate(pattern)
    return pyperf.perf_counter() - t0

PATTERNS = [
    'abc/[!]b-ac-z9-1]/def/\\*?/*/**c/?*[][!]',
    '!abc/[!]b-ac-z9-1]/def/\\*?/*/**c/?*[][!]',
    'a**?**cd**?**??k***',
    'a/**/b/**/c',
    'man/man1/bash.1',
    'a*b*c*d*e*f*g*h*i*j*k*l*m*n*o**',
]

def bench(runner, func):
    for pattern in PATTERNS:
        runner.bench_time_func(pattern, func, pattern)

def add_cmdline_args(cmd, args):
    cmd.append(args.impl)

if __name__ == '__main__':
    runner = pyperf.Runner(add_cmdline_args=add_cmdline_args)
    runner.argparser.add_argument('impl', choices=['ref', 'c'])
    args = runner.parse_args()

    if args.impl == 'ref':
        bench(runner, translate_ref)
    elif args.impl == 'c':
        bench(runner, translate_c)

`fnmatch.filter` (POSIX)

Here are the true timings for using fnmatch.filter. I think we can change the Python implementation as well but I did not bother doing it in the PR (too lazy...). By the way, the timings are for POSIX platforms. On non-posix platforms os.path.normcase is called on every name before performing the match, but if the POSIX implementation is faster, I doubt the non-posix one would be slower (the code is essentially the same). I did not include the DEBUG builds since they are not really of interest to us.

Without --with-pydebug

+-------------------+--------------------+-----------------------+-----------------------+
| Benchmark         | fnmatch-filter-ref | fnmatch-filter-py     | fnmatch-filter-c      |
+===================+====================+=======================+=======================+
| small_match_none  | 353 ns             | 432 ns: 1.22x slower  | 383 ns: 1.08x slower  |
+-------------------+--------------------+-----------------------+-----------------------+
| small_match_some  | 429 ns             | 480 ns: 1.12x slower  | 440 ns: 1.03x slower  |
+-------------------+--------------------+-----------------------+-----------------------+
| small_match_all   | 485 ns             | 525 ns: 1.08x slower  | not significant       |
+-------------------+--------------------+-----------------------+-----------------------+
| medium_match_none | 2.46 us            | 2.36 us: 1.04x faster | 2.28 us: 1.08x faster |
+-------------------+--------------------+-----------------------+-----------------------+
| medium_match_some | 3.22 us            | 2.90 us: 1.11x faster | 2.76 us: 1.16x faster |
+-------------------+--------------------+-----------------------+-----------------------+
| medium_match_all  | 3.89 us            | 3.51 us: 1.11x faster | 3.33 us: 1.17x faster |
+-------------------+--------------------+-----------------------+-----------------------+
| large_match_none  | 25.5 us            | 22.6 us: 1.13x faster | 22.1 us: 1.16x faster |
+-------------------+--------------------+-----------------------+-----------------------+
| large_match_some  | 32.3 us            | 27.5 us: 1.18x faster | 26.8 us: 1.20x faster |
+-------------------+--------------------+-----------------------+-----------------------+
| large_match_all   | 38.8 us            | 33.0 us: 1.18x faster | 32.1 us: 1.21x faster |
+-------------------+--------------------+-----------------------+-----------------------+
| Geometric mean    | (ref)              | 1.03x faster          | 1.09x faster          |
+-------------------+--------------------+-----------------------+-----------------------+

With --enable-optimizations

+-------------------+--------------------+-----------------------+-----------------------+
| Benchmark         | fnmatch-filter-ref | fnmatch-filter-py     | fnmatch-filter-c      |
+===================+====================+=======================+=======================+
| small_match_none  | 355 ns             | 427 ns: 1.20x slower  | 364 ns: 1.03x slower  |
+-------------------+--------------------+-----------------------+-----------------------+
| small_match_some  | 437 ns             | 475 ns: 1.09x slower  | 427 ns: 1.02x faster  |
+-------------------+--------------------+-----------------------+-----------------------+
| small_match_all   | 476 ns             | 520 ns: 1.09x slower  | 468 ns: 1.02x faster  |
+-------------------+--------------------+-----------------------+-----------------------+
| medium_match_none | 2.50 us            | 2.37 us: 1.05x faster | 2.20 us: 1.14x faster |
+-------------------+--------------------+-----------------------+-----------------------+
| medium_match_some | 3.17 us            | 2.79 us: 1.14x faster | 2.71 us: 1.17x faster |
+-------------------+--------------------+-----------------------+-----------------------+
| medium_match_all  | 3.68 us            | 3.29 us: 1.12x faster | 3.11 us: 1.18x faster |
+-------------------+--------------------+-----------------------+-----------------------+
| large_match_none  | 24.9 us            | 21.5 us: 1.15x faster | 20.8 us: 1.20x faster |
+-------------------+--------------------+-----------------------+-----------------------+
| large_match_some  | 30.6 us            | 24.8 us: 1.23x faster | 24.5 us: 1.25x faster |
+-------------------+--------------------+-----------------------+-----------------------+
| large_match_all   | 35.8 us            | 29.6 us: 1.21x faster | 27.6 us: 1.30x faster |
+-------------------+--------------------+-----------------------+-----------------------+
| Geometric mean    | (ref)              | 1.06x faster          | 1.13x faster          |
+-------------------+--------------------+-----------------------+-----------------------+

Benchmark script

import pyperf

import _fnmatch
import os
from fnmatch import _compile_pattern

def fnmatch_ref(loops, names, pat0):
    __ = range(loops)
    t0 = pyperf.perf_counter()
    for _ in __:
        result = []
        pat = os.path.normcase(pat0)
        match = _compile_pattern(pat)
        for name in names:
            if match(name):
                result.append(name)
    return pyperf.perf_counter() - t0

def fnmatch_py(loops, names, pat0):
    __ = range(loops)
    t0 = pyperf.perf_counter()
    for _ in __:
        pat = os.path.normcase(pat0)
        match = _compile_pattern(pat)
        res = list(filter(match, names))
    return pyperf.perf_counter() - t0

def fnmatch_c(loops, names, pat0):
    __ = range(loops)
    t0 = pyperf.perf_counter()
    for _ in __:
        res = _fnmatch.filter(names, pat0)
    return pyperf.perf_counter() - t0

def bench(runner, func):
    base_data = ['Python', 'Ruby', 'Perl', 'Tcl']
    multipliers = [('small', 1), ('medium', 10), ('large', 100)]
    patterns = [('none', 'A*'), ('some', 'P*'), ('all', '*')]
    for data_label, data_size in multipliers:
        for pat_label, pattern in patterns:
            name = f'{data_label}_match_{pat_label}'
            data = base_data * data_size
            _compile_pattern.cache_clear()
            runner.bench_time_func(name, func, data, pattern)

def add_cmdline_args(cmd, args):
    cmd.append(args.impl)

if __name__ == '__main__':
    runner = pyperf.Runner(add_cmdline_args=add_cmdline_args)
    runner.argparser.add_argument('impl', choices=['ref', 'py', 'c'])
    args = runner.parse_args()

    if args.impl == 'ref':
        bench(runner, fnmatch_ref)
    elif args.impl == 'py':
        bench(runner, fnmatch_py)
    elif args.impl == 'c':
        bench(runner, fnmatch_c)

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Uh oh!

Add C accelerator for `fnmatch` #121445

Enhancement

`fnmatch.translate`

`fnmatch.filter` (POSIX)

Linked PRs

Metadata

Assignees

Labels

Projects

Milestone

Relationships

Development

Search code, repositories, users, issues, pull requests...

Uh oh!

Add C accelerator for fnmatch #121445

Description

Enhancement

fnmatch.translate

fnmatch.filter (POSIX)

Linked PRs

Metadata

Metadata

Assignees

Labels

Projects

Milestone

Relationships

Development

Issue actions

Add C accelerator for `fnmatch` #121445

`fnmatch.translate`

`fnmatch.filter` (POSIX)