Revisit adding lexical normalization support to pathlib #124825

Open
@ncoghlan

Description

Feature or enhancement

Proposal:

I'd like to add a resolve_lexical method to concrete pathlib objects:

def resolve_lexical(self, /, strict=False):
    """Make the path absolute, and also normalize it, *without* resolving symlinks."""

As with resolve(), if strict is True and any segment of the given path doesn't exist, FileNotFoundError is raised (so /some/dir/_nonexistent_/.. would fail). If strict is False (the default), all path segments are processed without checking whether they exist (so /some/dir/_nonexistent_/.. lexically resolves to /some/dir).
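As a rough illustration of the semantics described above, here is a minimal sketch written as a standalone function rather than a pathlib method. The function name mirrors the proposal, but the body is only an assumption about how it might behave: in particular, the strict check (every named segment must exist at its lexical location) is one possible reading of the description, not a confirmed design.

```python
import errno
import os
from pathlib import Path


def resolve_lexical(path, /, strict=False):
    """Hypothetical stand-in for the proposed Path.resolve_lexical()."""
    # Path.absolute() prepends the current directory but keeps ".." segments.
    absolute = Path(path).absolute()
    if strict:
        # Assumed strict behaviour: walk the segments lexically and require
        # each named segment to exist where the path text says it is.
        current = Path(absolute.anchor)
        for part in absolute.parts[1:]:
            if part == "..":
                current = current.parent
                continue
            current = current / part
            if not os.path.lexists(current):
                raise FileNotFoundError(
                    errno.ENOENT, os.strerror(errno.ENOENT), str(current)
                )
    # os.path.abspath() collapses "." and ".." without touching the filesystem.
    return Path(os.path.abspath(absolute))
```

With this sketch, a path such as /some/dir/_nonexistent_/.. normalizes to /some/dir by default and raises FileNotFoundError when strict=True.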

While theoretically this could be added to PurePath without the strict option, I don't see any significant benefit to that (whereas I do see benefits to paralleling the Path.resolve() API as closely as possible).

As a minor note, adding this method would give a more direct way of checking if a path contains any symlinks at any level: path.resolve() == path.resolve_lexical() (vs the current path.is_symlink() or any(segment.is_symlink() for segment in path.parents)).
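Until such a method exists, the same comparison can be approximated with os.path.abspath standing in for resolve_lexical(). The helper name contains_symlink below is hypothetical and used only for illustration.

```python
import os
from pathlib import Path


def contains_symlink(path):
    """Return True if symlink resolution changes where `path` points."""
    # Stand-in for the proposed path.resolve_lexical(): purely textual.
    lexical = Path(os.path.abspath(path))
    # Path.resolve() follows symlinks at every level of the path.
    return Path(path).resolve() != lexical
```

Note that this reports a difference between the two resolutions, so a symlink that happens to resolve back to the same lexical location would not be detected.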

Chaining the two resolution methods would also be valid (path.resolve().resolve_lexical()): symlinks would be resolved in the segments that actually exist, and the rest of the path, if any, would be resolved lexically.

On my current project, I recently ran into a pair of subtle symlink-and-relative-path-handling bugs.

  • The original code used path.resolve() to fully resolve a path to its actual target. This gave a dynamic loading error on macOS, because one of the dynamic libraries the referenced executable needed was stored relative to the symlink, not relative to the actual binary. While that feels like a bug in the way the offending executable was packaged, handling it meant actively avoiding fully resolving paths, and instead respecting their nominal locations.
  • Switching to path.absolute() not only turned off the symlink resolution, it also turned off the path normalisation that removes /../ segments. This also resulted in dynamic loading errors on macOS when those segments were present in the executable reference (probably because an underlying Python or OS API conditionally does its own equivalent of path.resolve() when /../ is present in the path, since the resulting dynamic loading errors looked very similar to those I saw when hitting the first bug).
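The difference between the two calls in the second bullet can be seen without touching the filesystem. This is a small sketch using an arbitrary example path; the exact printed form depends on the platform (drive letters and separators differ on Windows).

```python
import os.path
from pathlib import Path

raw = Path("/some/dir/child/../sibling")

# Path.absolute() only makes the path absolute; ".." segments are kept.
kept = raw.absolute()

# os.path.abspath() also collapses ".." segments, purely lexically.
collapsed = Path(os.path.abspath(raw))

print(kept)       # still contains the ".." segment
print(collapsed)  # the "child/.." pair has been removed
```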

The resolution that handled both situations ended up being to use os.path to perform lexical normalisation (via os.path.abspath):

import os
import os.path

from pathlib import Path

def as_normalized_path(path: str | os.PathLike[str], /) -> Path:
    """Normalize given path and make it absolute, *without* resolving symlinks

    Expands user directory references, but *not* environment variable references.
    """
    # Ensure user directory references are handled as absolute paths
    expanded_path = os.path.expanduser(path)
    return Path(os.path.abspath(expanded_path))

Having to drop down to the lower level API to request "resolve /../ relative to the path as given" instead of the default "resolve /../ relative to symlink targets" feels like an unnecessary gap in the abstraction layer.

Has this already been discussed elsewhere?

This is a minor feature, which does not need previous discussion elsewhere

Links to previous discussion of this feature:

Previously suggested here: #83105

Symlink security vulnerabilities arising from parent directory traversal mostly relate to using symlink resolution to reach an unexpected location:

  • a human reading /some/dir/symlink/../sibling will expect it to refer to /some/dir/sibling (lexical normalization)
  • Path("/some/dir/symlink/../sibling").resolve() actually refers to /parent_of_symlink_target/sibling (i.e. you have no idea where it points without access to the filesystem state that specifies the destination of /some/dir/symlink)
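The two outcomes in the list above can be reproduced in a throwaway directory. The layout below (dir, elsewhere/target, and the symlink name) is invented purely for this illustration, and the sketch assumes a platform where the current user is allowed to create symlinks.

```python
import os
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    # Resolve the temp dir first so its own location contains no symlinks.
    base = Path(tmp).resolve()
    (base / "dir").mkdir()
    (base / "elsewhere" / "target").mkdir(parents=True)
    (base / "dir" / "symlink").symlink_to(
        base / "elsewhere" / "target", target_is_directory=True
    )

    path = base / "dir" / "symlink" / ".." / "sibling"

    # Lexical normalization: what a human reading the path expects.
    lexical = Path(os.path.abspath(path))
    # Symlink resolution: ".." is applied to the symlink's target instead.
    resolved = path.resolve()

    print(lexical.relative_to(base))   # dir/sibling on POSIX
    print(resolved.relative_to(base))  # elsewhere/sibling on POSIX
```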

As a result, the rationale for rejection doesn't feel strong to me, since the intuitive behaviour is unavailable in the high level API and only the subtle, system-state-dependent behaviour is offered.

Also noting that Java does offer lexical normalisation on its Path abstraction: https://docs.oracle.com/en/java/javase/23/docs/api/java.base/java/nio/file/Path.html#normalize()
