GH-102613: Fast recursive globbing in pathlib.Path.glob() #104512

Merged: 17 commits, merged Jun 6, 2023
Changes from 1 commit
Optimize walk-and-match logic.
barneygale committed Jun 1, 2023
commit 14c6a587a22b2492438d1b5dbe90189bd8ee0c24
Lib/pathlib.py: 28 changes (12 additions, 16 deletions)
@@ -1058,7 +1058,7 @@ def _glob(self, pattern, case_sensitive, follow_symlinks):
         # build a `re.Pattern` object. This pattern is used to filter the
         # recursive walk. As a result, pattern parts following a '**' wildcard
         # do not perform any filesystem access, which can be much faster!
-        filter_paths_supported = follow_symlinks is not None and '..' not in pattern_parts
+        filter_paths = follow_symlinks is not None and '..' not in pattern_parts
         deduplicate_paths = False
         paths = iter([self] if self.is_dir() else [])
         part_idx = 0
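The comment in this hunk describes the "walk-and-match" strategy: once a `**` is reached, the remaining pattern parts become one regular expression that filters a single recursive walk, so those parts trigger no further directory listings. The sketch below illustrates the idea outside pathlib. `compile_pattern` and `walk_and_match` are hypothetical names, not pathlib APIs. Assumptions: the translator handles only `**`, `*`, `?` and literal characters (no `[...]` classes), matches case-sensitively, expects no trailing slash or trailing `**`, and compiles the whole pattern rather than only the parts after the first `**`. It also cannot handle `..` components, since a downward walk never yields them; the diff likewise disables filtering when `'..'` appears in the pattern.

```python
import os
import re


def compile_pattern(pattern_parts):
    """Translate glob parts into one regex over '/'-joined relative paths.

    Hypothetical helper: supports only '**', '*', '?' and literal
    characters, and assumes the pattern has no trailing '**' or slash.
    """
    regex = ''
    for idx, part in enumerate(pattern_parts):
        if part == '**':
            # Zero or more whole directory levels, each ending in '/'.
            regex += '(?:.+/)?'
            continue
        for ch in part:
            if ch == '*':
                regex += '[^/]*'   # any run of characters within one part
            elif ch == '?':
                regex += '[^/]'    # exactly one character within one part
            else:
                regex += re.escape(ch)
        if idx < len(pattern_parts) - 1:
            regex += '/'
    return re.compile(regex + r'\Z', re.DOTALL)


def walk_and_match(root, pattern):
    """Yield '/'-separated paths under root that match pattern.

    One recursive walk; each walked path is tested against a single regex
    instead of listing directories again for every pattern part.
    """
    match = compile_pattern(pattern.split('/')).match
    for dirpath, dirnames, filenames in os.walk(root):
        rel_dir = os.path.relpath(dirpath, root).replace(os.sep, '/')
        for name in dirnames + filenames:
            rel = name if rel_dir == '.' else f'{rel_dir}/{name}'
            if match(rel):
                yield rel
```

Because every path is visited once and checked against the compiled regex, a pattern such as `a/**/*.py` costs one traversal no matter how many parts follow the `**`.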
@@ -1071,26 +1071,22 @@ def _glob(self, pattern, case_sensitive, follow_symlinks):
             elif part == '..':
                 paths = (path._make_child_relpath('..') for path in paths)
             elif part == '**':
-                filter_paths = False
-                if filter_paths_supported:
-                    # Consume remaining path components, except trailing slash.
-                    while part_idx < len(pattern_parts) and pattern_parts[part_idx] != '':
-                        filter_paths = True
-                        part_idx += 1
-                else:
-                    # Consume adjacent '**' components.
-                    while part_idx < len(pattern_parts) and pattern_parts[part_idx] == '**':
-                        part_idx += 1
+                if filter_paths and part_idx < len(pattern_parts) and pattern_parts[part_idx] != '':
+                    dir_only = pattern_parts[-1] == ''
+                    paths = _select_recursive(paths, dir_only, follow_symlinks)
 
-                dir_only = part_idx < len(pattern_parts)
-                paths = _select_recursive(paths, dir_only, follow_symlinks)
-
-                if filter_paths:
                     # Filter out paths that don't match pattern.
                     prefix_len = len(self._make_child_relpath('_')._lines) - 1
                     match = _compile_pattern_lines(path_pattern._lines, case_sensitive).match
                     paths = (path for path in paths if match(path._lines[prefix_len:]))
-                elif deduplicate_paths:
+                    return paths
+
+                # Consume adjacent '**' components.
+                while part_idx < len(pattern_parts) and pattern_parts[part_idx] == '**':
+                    part_idx += 1
+                dir_only = part_idx < len(pattern_parts)
+                paths = _select_recursive(paths, dir_only, follow_symlinks)
+                if deduplicate_paths:
                     # De-duplicate if we've already seen a '**' component.
                     paths = _select_unique(paths)
                 deduplicate_paths = True
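After this commit the `'**'` branch decides up front: if filtering is possible and a non-empty part follows the `**`, it walks once, filters, and returns; otherwise it skips adjacent `**` parts, selects recursively, and de-duplicates from the second `**` onward. The sketch below restates that decision as plain functions over a list of pattern parts, so it can be exercised without a filesystem. `plan_recursive_step` and `select_unique` are hypothetical names; `part_idx` is assumed to point just past the `**` being expanded, and `select_unique` only approximates the role of the private `_select_unique` helper seen in the diff.

```python
def plan_recursive_step(pattern_parts, part_idx, filter_paths):
    """Decide how to expand the '**' just before pattern_parts[part_idx].

    Hypothetical helper mirroring the branch in this commit. Returns a tuple
    (strategy, dir_only, next_idx):
      'filter' - one recursive walk filtered by a regex built from the
                 remaining parts; the caller would return immediately.
      'select' - a plain recursive selection, after skipping adjacent '**'.
    """
    if (filter_paths and part_idx < len(pattern_parts)
            and pattern_parts[part_idx] != ''):
        # A trailing '' (trailing slash) means only directories can match.
        return 'filter', pattern_parts[-1] == '', part_idx
    # Consume adjacent '**' components: '**/**' matches the same as '**'.
    while part_idx < len(pattern_parts) and pattern_parts[part_idx] == '**':
        part_idx += 1
    # If more parts follow, only directories are worth yielding here.
    dir_only = part_idx < len(pattern_parts)
    return 'select', dir_only, part_idx


def select_unique(paths):
    """Yield each path once, keeping first-seen order."""
    seen = set()
    for path in paths:
        if path not in seen:
            seen.add(path)
            yield path
```

De-duplication matters only once a second `**` has been seen: with a pattern like `**/x/**`, a path below nested `x` directories can be reached through more than one expansion of the two wildcards.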