Skip to content

Navigation Menu

Search code, repositories, users, issues, pull requests...

Provide feedback

We read every piece of feedback, and take your input very seriously.

Saved searches

Use saved searches to filter your results more quickly

Appearance settings

Revert "[lit][clang] Avoid realpath on Windows due to MAX_PATH limitations" #139739

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 2 commits into
base: main
Choose a base branch
Loading
from

Conversation

guillem-bartrina-sonarsource
Copy link

@guillem-bartrina-sonarsource guillem-bartrina-sonarsource commented May 13, 2025

This reverts commit 05d613e.

Only one single test has been added after the reverted patch that makes use of the functionality it introduced.

  • The macro %{/t:real} in case-insensitive-include-absolute.c has been replaced by the good ol' %/t.

Original discussion: https://reviews.llvm.org/D154130


Although it has been almost two years since the patch was merged and apparently no major problems associated with it have arisen, we still believe that the patch is unsound overall.

Rationale:

The whole point of canonical paths is to be unique for a given entity (directory or file), so that they can be directly compared to determine whether or not two entities are the same.

This patch outright renders this fundamental property useless by allowing the same entity to have more than one canonical path on a Windows machine, depending on whether it is accessed via the real path or a substituted path.

And the only motivation behind these deceptively innocuous changes seems to be to help circumvent a platform-dependent limitation (MAX_PATH on Windows) that apparently makes it difficult for some users (?) to run lit tests. This limitation has workarounds (using directory junctions, enabling long paths or using UNC) that do not require changing llvm at all but simply some settings on the host machine. Furthermore, this limitation may no longer be relevant two years later.

We also believe that the original patch was a bit incomplete. It indirectly affected the resolution of canonical paths used by certain parts of other llvm subsystems (module maps, frameworks, ...), and were not thoroughly tested for regressions.

In reality, it is not easy to find a regression in llvm itself because the fundamental property of canonical paths is not widely exploited, except in some form of caches. But it is certainly exploited by many of the “users” of llvm.

Also, some simplifications in the resolution of canonical paths in (Text|SARIF)Diagnostic.cpp may have been overlooked.

But the biggest concern, aside from breaking the fundamental property, is that the patch also inadvertently changes the way certain symbolic links are resolved in Windows, particularly those between different drives. [If the path contains a symbolic link to another drive, the absolute and real paths have different drive letters, even though they are not substituted drives... ] This behavior change is not documented in the patch description so it is definitely a regression.

Finally, the last comment in the original discussion, which was posted after the merge, also raised concerns about the changes introduced by the patch.

In summary, we believe that this patch is more detrimental than beneficial and should be reverted.

Similar to the original patch, it is not easy to create any kind of test for this behavior change. Since we are reversing a change, the need for tests is not as relevant.

Copy link

Thank you for submitting a Pull Request (PR) to the LLVM Project!

This PR will be automatically labeled and the relevant teams will be notified.

If you wish to, you can add reviewers by using the "Reviewers" section on this page.

If this is not working for you, it is probably because you do not have write permissions for the repository. In which case you can instead tag reviewers by name in a comment by using @ followed by their GitHub username.

If you have received no comments on your PR for a week, you can request a review by "ping"ing the PR by adding a comment “Ping”. The common courtesy "ping" rate is once a week. Please remember that you are asking for valuable time from other developers.

If you have further questions, they may be answered by the LLVM GitHub User Guide.

You can also ask questions in a comment on this PR, on the LLVM Discord or on the forums.

@steakhal steakhal marked this pull request as ready for review May 13, 2025 14:55
@llvmbot llvmbot added cmake Build system in general and CMake in particular clang Clang issues not falling into any other category clang:frontend Language frontend issues, e.g. anything involving "Sema" llvm-lit testing-tools labels May 13, 2025
@llvmbot
Copy link
Member

llvmbot commented May 13, 2025

@llvm/pr-subscribers-testing-tools

@llvm/pr-subscribers-clang

Author: None (guillem-bartrina-sonarsource)

Changes

This reverts commit 05d613e.

Only one single test has been added after the reverted patch that makes use of the functionality it introduced.

  • The macro %{/t:real} in case-insensitive-include-absolute.c has been replaced by the good ol' %/t.

Original discussion: https://reviews.llvm.org/D154130


Although it has been almost two years since the patch was merged and apparently no major problems associated with it have arisen, we still believe that the patch is unsound overall.

Rationale:

The whole point of canonical paths is to be unique for a given entity (directory or file), so that they can be directly compared to determine whether or not two entities are the same.

This patch outright renders this fundamental property useless by allowing the same entity to have more than one canonical path on a Windows machine, depending on whether it is accessed via the real path or a substituted path.

And the only motivation behind these deceptively innocuous changes seems to be to help circumvent a platform-dependent limitation (MAX_PATH on Windows) that apparently makes it difficult for some users (?) to run on lit tests. This limitation has workarounds (using directory junctions, enabling long paths or using UNC) that do not require changing llvm at all but simply some settings on the host machine. Furthermore, this limitation may no longer be relevant two years later.

We also believe that the original patch was a bit incomplete. It indirectly affected the resolution of canonical paths used by certain parts of other llvm subsystems (module maps, frameworks, ...), and were not thoroughly tested for regressions.
> In reality, it is not easy to find a regression in llvm itself because the fundamental property of canonical paths is not widely exploited, except in some form of caches. But it is certainly exploited by many of the “users” of llvm.

Also, some simplifications in the resolution of canonical paths in (Text|SARIF)Diagnostic.cpp may have been overlooked.

But the biggest concern, aside from breaking the fundamental property, is that the patch also inadvertently changes the way certain symbolic links are resolved in Windows, particularly those between different drives. [If the path contains a symbolic link to another drive, the absolute and real paths have different drive letters, even though they are not substituted drives... ] This behavior change is not documented in the patch description so it is definitely a regression.

Finally, the last comment in the original discussion, which was posted after the merge, also raised concerns about the changes introduced by the patch.

In summary, we believe that this patch is more detrimental than beneficial and should be reverted.

As for the original patch, it is not easy to create any kind of test for this change in behavior. Since we are reverting a change, the need for tests is not so relevant.


Patch is 20.43 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/139739.diff

16 Files Affected:

  • (modified) clang/include/clang/Basic/FileManager.h (-7)
  • (modified) clang/lib/Basic/FileManager.cpp (+19-34)
  • (modified) clang/test/Lexer/case-insensitive-include-absolute.c (+2-2)
  • (modified) clang/test/Lexer/case-insensitive-include-win.c (+1-7)
  • (modified) llvm/cmake/modules/AddLLVM.cmake (+2-7)
  • (modified) llvm/docs/CommandGuide/lit.rst (-10)
  • (modified) llvm/docs/TestingGuide.rst (+2-12)
  • (modified) llvm/utils/lit/lit/TestRunner.py (+52-42)
  • (modified) llvm/utils/lit/lit/builtin_commands/diff.py (+1-1)
  • (modified) llvm/utils/lit/lit/discovery.py (+5-5)
  • (modified) llvm/utils/lit/lit/util.py (-17)
  • (modified) llvm/utils/lit/tests/Inputs/config-map-discovery/driver.py (+2-1)
  • (modified) llvm/utils/lit/tests/Inputs/config-map-discovery/lit.alt.cfg (+1-1)
  • (modified) llvm/utils/lit/tests/Inputs/use-llvm-tool-required/lit.cfg (+1-2)
  • (modified) llvm/utils/lit/tests/Inputs/use-llvm-tool/lit.cfg (+1-2)
  • (modified) llvm/utils/llvm-lit/llvm-lit.in (+1-1)
diff --git a/clang/include/clang/Basic/FileManager.h b/clang/include/clang/Basic/FileManager.h
index e83a61d6ff00c..769cfedae6018 100644
--- a/clang/include/clang/Basic/FileManager.h
+++ b/clang/include/clang/Basic/FileManager.h
@@ -319,13 +319,6 @@ class FileManager : public RefCountedBase<FileManager> {
   /// required, which is (almost) never.
   StringRef getCanonicalName(FileEntryRef File);
 
-private:
-  /// Retrieve the canonical name for a given file or directory.
-  ///
-  /// The first param is a key in the CanonicalNames array.
-  StringRef getCanonicalName(const void *Entry, StringRef Name);
-
-public:
   void PrintStats() const;
 
   /// Import statistics from a child FileManager and add them to this current
diff --git a/clang/lib/Basic/FileManager.cpp b/clang/lib/Basic/FileManager.cpp
index 86fe352df0461..f0285f56c9476 100644
--- a/clang/lib/Basic/FileManager.cpp
+++ b/clang/lib/Basic/FileManager.cpp
@@ -618,48 +618,33 @@ void FileManager::GetUniqueIDMapping(
 }
 
 StringRef FileManager::getCanonicalName(DirectoryEntryRef Dir) {
-  return getCanonicalName(Dir, Dir.getName());
-}
+  auto Known = CanonicalNames.find(Dir);
+  if (Known != CanonicalNames.end())
+    return Known->second;
 
-StringRef FileManager::getCanonicalName(FileEntryRef File) {
-  return getCanonicalName(File, File.getName());
+  StringRef CanonicalName(Dir.getName());
+
+  SmallString<4096> CanonicalNameBuf;
+  if (!FS->getRealPath(Dir.getName(), CanonicalNameBuf))
+    CanonicalName = CanonicalNameBuf.str().copy(CanonicalNameStorage);
+
+  CanonicalNames.insert({Dir, CanonicalName});
+  return CanonicalName;
 }
 
-StringRef FileManager::getCanonicalName(const void *Entry, StringRef Name) {
+StringRef FileManager::getCanonicalName(FileEntryRef File) {
   llvm::DenseMap<const void *, llvm::StringRef>::iterator Known =
-      CanonicalNames.find(Entry);
+      CanonicalNames.find(File);
   if (Known != CanonicalNames.end())
     return Known->second;
 
-  // Name comes from FileEntry/DirectoryEntry::getName(), so it is safe to
-  // store it in the DenseMap below.
-  StringRef CanonicalName(Name);
-
-  SmallString<256> AbsPathBuf;
-  SmallString<256> RealPathBuf;
-  if (!FS->getRealPath(Name, RealPathBuf)) {
-    if (is_style_windows(llvm::sys::path::Style::native)) {
-      // For Windows paths, only use the real path if it doesn't resolve
-      // a substitute drive, as those are used to avoid MAX_PATH issues.
-      AbsPathBuf = Name;
-      if (!FS->makeAbsolute(AbsPathBuf)) {
-        if (llvm::sys::path::root_name(RealPathBuf) ==
-            llvm::sys::path::root_name(AbsPathBuf)) {
-          CanonicalName = RealPathBuf.str().copy(CanonicalNameStorage);
-        } else {
-          // Fallback to using the absolute path.
-          // Simplifying /../ is semantically valid on Windows even in the
-          // presence of symbolic links.
-          llvm::sys::path::remove_dots(AbsPathBuf, /*remove_dot_dot=*/true);
-          CanonicalName = AbsPathBuf.str().copy(CanonicalNameStorage);
-        }
-      }
-    } else {
-      CanonicalName = RealPathBuf.str().copy(CanonicalNameStorage);
-    }
-  }
+  StringRef CanonicalName(File.getName());
+
+  SmallString<4096> CanonicalNameBuf;
+  if (!FS->getRealPath(File.getName(), CanonicalNameBuf))
+    CanonicalName = CanonicalNameBuf.str().copy(CanonicalNameStorage);
 
-  CanonicalNames.insert({Entry, CanonicalName});
+  CanonicalNames.insert({File, CanonicalName});
   return CanonicalName;
 }
 
diff --git a/clang/test/Lexer/case-insensitive-include-absolute.c b/clang/test/Lexer/case-insensitive-include-absolute.c
index ef9e131c45df5..4274d05045ecf 100644
--- a/clang/test/Lexer/case-insensitive-include-absolute.c
+++ b/clang/test/Lexer/case-insensitive-include-absolute.c
@@ -1,8 +1,8 @@
 // REQUIRES: case-insensitive-filesystem
 
 // RUN: rm -rf %t && split-file %s %t
-// RUN: sed "s|DIR|%{/t:real}|g" %t/tu.c.in > %t/tu.c
-// RUN: %clang_cc1 -fsyntax-only %t/tu.c 2>&1 | FileCheck %s -DDIR=%{/t:real}
+// RUN: sed "s|DIR|%/t|g" %t/tu.c.in > %t/tu.c
+// RUN: %clang_cc1 -fsyntax-only %t/tu.c 2>&1 | FileCheck %s -DDIR=%/t
 
 //--- header.h
 //--- tu.c.in
diff --git a/clang/test/Lexer/case-insensitive-include-win.c b/clang/test/Lexer/case-insensitive-include-win.c
index ed722feb3d994..6a17d0b552448 100644
--- a/clang/test/Lexer/case-insensitive-include-win.c
+++ b/clang/test/Lexer/case-insensitive-include-win.c
@@ -2,15 +2,9 @@
 // This file should only include code that really needs a Windows host OS to
 // run.
 
-// Note: We must use the real path here, because the logic to detect case
-// mismatches must resolve the real path to figure out the original casing.
-// If we use %t and we are on a substitute drive S: mapping to C:\subst,
-// then we will compare "S:\test.dir\FOO.h" to "C:\subst\test.dir\foo.h"
-// and avoid emitting the diagnostic because the structure is different.
-
 // REQUIRES: system-windows
 // RUN: mkdir -p %t.dir
 // RUN: touch %t.dir/foo.h
-// RUN: not %clang_cl /FI \\?\%{t:real}.dir\FOO.h /WX -fsyntax-only %s 2>&1 | FileCheck %s
+// RUN: not %clang_cl /FI\\?\%t.dir\FOO.h /WX -fsyntax-only %s 2>&1 | FileCheck %s
 
 // CHECK: non-portable path to file '"\\?\
diff --git a/llvm/cmake/modules/AddLLVM.cmake b/llvm/cmake/modules/AddLLVM.cmake
index 52bf7ca0906a3..2360f72e59d08 100644
--- a/llvm/cmake/modules/AddLLVM.cmake
+++ b/llvm/cmake/modules/AddLLVM.cmake
@@ -1887,15 +1887,10 @@ endfunction()
 # use it and can't be in a lit module. Use with make_paths_relative().
 string(CONCAT LLVM_LIT_PATH_FUNCTION
   "# Allow generated file to be relocatable.\n"
-  "import os\n"
-  "import platform\n"
+  "from pathlib import Path\n"
   "def path(p):\n"
   "    if not p: return ''\n"
-  "    # Follows lit.util.abs_path_preserve_drive, which cannot be imported here.\n"
-  "    if platform.system() == 'Windows':\n"
-  "        return os.path.abspath(os.path.join(os.path.dirname(__file__), p))\n"
-  "    else:\n"
-  "        return os.path.realpath(os.path.join(os.path.dirname(__file__), p))\n"
+  "    return str((Path(__file__).parent / p).resolve())\n"
   )
 
 # This function provides an automatic way to 'configure'-like generate a file
diff --git a/llvm/docs/CommandGuide/lit.rst b/llvm/docs/CommandGuide/lit.rst
index 2a0ddd0ea04b4..46c5b9936239e 100644
--- a/llvm/docs/CommandGuide/lit.rst
+++ b/llvm/docs/CommandGuide/lit.rst
@@ -616,16 +616,6 @@ TestRunner.py:
  %/T                     %T but ``\`` is replaced by ``/``
  %{s:basename}           The last path component of %s
  %{t:stem}               The last path component of %t but without the ``.tmp`` extension (alias for %basename_t)
- %{s:real}               %s after expanding all symbolic links and substitute drives
- %{S:real}               %S after expanding all symbolic links and substitute drives
- %{p:real}               %p after expanding all symbolic links and substitute drives
- %{t:real}               %t after expanding all symbolic links and substitute drives
- %{T:real}               %T after expanding all symbolic links and substitute drives
- %{/s:real}              %/s after expanding all symbolic links and substitute drives
- %{/S:real}              %/S after expanding all symbolic links and substitute drives
- %{/p:real}              %/p after expanding all symbolic links and substitute drives
- %{/t:real}              %/t after expanding all symbolic links and substitute drives
- %{/T:real}              %/T after expanding all symbolic links and substitute drives
  %{/s:regex_replacement} %/s but escaped for use in the replacement of a ``s@@@`` command in sed
  %{/S:regex_replacement} %/S but escaped for use in the replacement of a ``s@@@`` command in sed
  %{/p:regex_replacement} %/p but escaped for use in the replacement of a ``s@@@`` command in sed
diff --git a/llvm/docs/TestingGuide.rst b/llvm/docs/TestingGuide.rst
index b6dda6a732405..3a555d6bea5aa 100644
--- a/llvm/docs/TestingGuide.rst
+++ b/llvm/docs/TestingGuide.rst
@@ -757,7 +757,7 @@ RUN lines:
 ``%{fs-sep}``
    Expands to the file system separator, i.e. ``/`` or ``\`` on Windows.
 
-``%/s, %/S, %/t, %/T``
+``%/s, %/S, %/t, %/T:``
 
   Act like the corresponding substitution above but replace any ``\``
   character with a ``/``. This is useful to normalize path separators.
@@ -766,17 +766,7 @@ RUN lines:
 
    Example: ``%/s: C:/Desktop Files/foo_test.s.tmp``
 
-``%{s:real}, %{S:real}, %{t:real}, %{T:real}``
-``%{/s:real}, %{/S:real}, %{/t:real}, %{/T:real}``
-
-  Act like the corresponding substitution, including with ``/``, but use
-  the real path by expanding all symbolic links and substitute drives.
-
-   Example: ``%s:  S:\foo_test.s.tmp``
-
-   Example: ``%{/s:real}: C:/SDrive/foo_test.s.tmp``
-
-``%:s, %:S, %:t, %:T``
+``%:s, %:S, %:t, %:T:``
 
   Act like the corresponding substitution above but remove colons at
   the beginning of Windows paths. This is useful to allow concatenation
diff --git a/llvm/utils/lit/lit/TestRunner.py b/llvm/utils/lit/lit/TestRunner.py
index 16e9c7fbf45c5..fbef1aa6ec9dc 100644
--- a/llvm/utils/lit/lit/TestRunner.py
+++ b/llvm/utils/lit/lit/TestRunner.py
@@ -96,7 +96,7 @@ def change_dir(self, newdir):
         if os.path.isabs(newdir):
             self.cwd = newdir
         else:
-            self.cwd = lit.util.abs_path_preserve_drive(os.path.join(self.cwd, newdir))
+            self.cwd = os.path.realpath(os.path.join(self.cwd, newdir))
 
 
 class TimeoutHelper(object):
@@ -455,7 +455,7 @@ def executeBuiltinMkdir(cmd, cmd_shenv):
         dir = to_unicode(dir) if kIsWindows else to_bytes(dir)
         cwd = to_unicode(cwd) if kIsWindows else to_bytes(cwd)
         if not os.path.isabs(dir):
-            dir = lit.util.abs_path_preserve_drive(os.path.join(cwd, dir))
+            dir = os.path.realpath(os.path.join(cwd, dir))
         if parent:
             lit.util.mkdir_p(dir)
         else:
@@ -501,7 +501,7 @@ def on_rm_error(func, path, exc_info):
         path = to_unicode(path) if kIsWindows else to_bytes(path)
         cwd = to_unicode(cwd) if kIsWindows else to_bytes(cwd)
         if not os.path.isabs(path):
-            path = lit.util.abs_path_preserve_drive(os.path.join(cwd, path))
+            path = os.path.realpath(os.path.join(cwd, path))
         if force and not os.path.exists(path):
             continue
         try:
@@ -1399,11 +1399,19 @@ def getDefaultSubstitutions(test, tmpDir, tmpBase, normalize_slashes=False):
     tmpBaseName = os.path.basename(tmpBase)
     sourceBaseName = os.path.basename(sourcepath)
 
-    substitutions.append(("%{pathsep}", os.pathsep))
-    substitutions.append(("%basename_t", tmpBaseName))
-
-    substitutions.append(("%{s:basename}", sourceBaseName))
-    substitutions.append(("%{t:stem}", tmpBaseName))
+    substitutions.extend(
+        [
+            ("%s", sourcepath),
+            ("%S", sourcedir),
+            ("%p", sourcedir),
+            ("%{pathsep}", os.pathsep),
+            ("%t", tmpName),
+            ("%basename_t", tmpBaseName),
+            ("%T", tmpDir),
+            ("%{s:basename}", sourceBaseName),
+            ("%{t:stem}", tmpBaseName)
+        ]
+    )
 
     substitutions.extend(
         [
@@ -1413,47 +1421,49 @@ def getDefaultSubstitutions(test, tmpDir, tmpBase, normalize_slashes=False):
         ]
     )
 
-    substitutions.append(("%/et", tmpName.replace("\\", "\\\\\\\\\\\\\\\\")))
+    # "%/[STpst]" should be normalized.
+    substitutions.extend(
+        [
+            ("%/s", sourcepath.replace("\\", "/")),
+            ("%/S", sourcedir.replace("\\", "/")),
+            ("%/p", sourcedir.replace("\\", "/")),
+            ("%/t", tmpBase.replace("\\", "/") + ".tmp"),
+            ("%/T", tmpDir.replace("\\", "/")),
+            ("%/et", tmpName.replace("\\", "\\\\\\\\\\\\\\\\")),
+        ]
+    )
 
+    # "%{/[STpst]:regex_replacement}" should be normalized like "%/[STpst]" but we're
+    # also in a regex replacement context of a s@@@ regex.
     def regex_escape(s):
         s = s.replace("@", r"\@")
         s = s.replace("&", r"\&")
         return s
 
-    path_substitutions = [
-        ("s", sourcepath), ("S", sourcedir), ("p", sourcedir),
-        ("t", tmpName), ("T", tmpDir)
-    ]
-    for path_substitution in path_substitutions:
-        letter = path_substitution[0]
-        path = path_substitution[1]
-
-        # Original path variant
-        substitutions.append(("%" + letter, path))
-
-        # Normalized path separator variant
-        substitutions.append(("%/" + letter, path.replace("\\", "/")))
-
-        # realpath variants
-        # Windows paths with substitute drives are not expanded by default
-        # as they are used to avoid MAX_PATH issues, but sometimes we do
-        # need the fully expanded path.
-        real_path = os.path.realpath(path)
-        substitutions.append(("%{" + letter + ":real}", real_path))
-        substitutions.append(("%{/" + letter + ":real}",
-            real_path.replace("\\", "/")))
-
-        # "%{/[STpst]:regex_replacement}" should be normalized like
-        # "%/[STpst]" but we're also in a regex replacement context
-        # of a s@@@ regex.
-        substitutions.append(
-            ("%{/" + letter + ":regex_replacement}",
-            regex_escape(path.replace("\\", "/"))))
-
-        # "%:[STpst]" are normalized paths without colons and without
-        # a leading slash.
-        substitutions.append(("%:" + letter, colonNormalizePath(path)))
+    substitutions.extend(
+        [
+            ("%{/s:regex_replacement}", regex_escape(sourcepath.replace("\\", "/"))),
+            ("%{/S:regex_replacement}", regex_escape(sourcedir.replace("\\", "/"))),
+            ("%{/p:regex_replacement}", regex_escape(sourcedir.replace("\\", "/"))),
+            (
+                "%{/t:regex_replacement}",
+                regex_escape(tmpBase.replace("\\", "/")) + ".tmp",
+            ),
+            ("%{/T:regex_replacement}", regex_escape(tmpDir.replace("\\", "/"))),
+        ]
+    )
 
+    # "%:[STpst]" are normalized paths without colons and without a leading
+    # slash.
+    substitutions.extend(
+        [
+            ("%:s", colonNormalizePath(sourcepath)),
+            ("%:S", colonNormalizePath(sourcedir)),
+            ("%:p", colonNormalizePath(sourcedir)),
+            ("%:t", colonNormalizePath(tmpBase + ".tmp")),
+            ("%:T", colonNormalizePath(tmpDir)),
+        ]
+    )
     return substitutions
 
 
diff --git a/llvm/utils/lit/lit/builtin_commands/diff.py b/llvm/utils/lit/lit/builtin_commands/diff.py
index fbf4eb0e137b3..3a91920f9b5ed 100644
--- a/llvm/utils/lit/lit/builtin_commands/diff.py
+++ b/llvm/utils/lit/lit/builtin_commands/diff.py
@@ -281,7 +281,7 @@ def main(argv):
     try:
         for file in args:
             if file != "-" and not os.path.isabs(file):
-                file = util.abs_path_preserve_drive(file)
+                file = os.path.realpath(os.path.join(os.getcwd(), file))
 
             if flags.recursive_diff:
                 if file == "-":
diff --git a/llvm/utils/lit/lit/discovery.py b/llvm/utils/lit/lit/discovery.py
index 2e7f90c6bb0c9..a0a2549d8df99 100644
--- a/llvm/utils/lit/lit/discovery.py
+++ b/llvm/utils/lit/lit/discovery.py
@@ -7,7 +7,7 @@
 import sys
 
 from lit.TestingConfig import TestingConfig
-from lit import LitConfig, Test, util
+from lit import LitConfig, Test
 
 
 def chooseConfigFileFromDir(dir, config_names):
@@ -56,7 +56,7 @@ def search1(path):
         # configuration to load instead.
         config_map = litConfig.params.get("config_map")
         if config_map:
-            cfgpath = util.abs_path_preserve_drive(cfgpath)
+            cfgpath = os.path.realpath(cfgpath)
             target = config_map.get(os.path.normcase(cfgpath))
             if target:
                 cfgpath = target
@@ -67,13 +67,13 @@ def search1(path):
 
         cfg = TestingConfig.fromdefaults(litConfig)
         cfg.load_from_path(cfgpath, litConfig)
-        source_root = util.abs_path_preserve_drive(cfg.test_source_root or path)
-        exec_root = util.abs_path_preserve_drive(cfg.test_exec_root or path)
+        source_root = os.path.realpath(cfg.test_source_root or path)
+        exec_root = os.path.realpath(cfg.test_exec_root or path)
         return Test.TestSuite(cfg.name, source_root, exec_root, cfg), ()
 
     def search(path):
         # Check for an already instantiated test suite.
-        real_path = util.abs_path_preserve_drive(path)
+        real_path = os.path.realpath(path)
         res = cache.get(real_path)
         if res is None:
             cache[real_path] = res = search1(path)
diff --git a/llvm/utils/lit/lit/util.py b/llvm/utils/lit/lit/util.py
index b03fd8bc22693..bd32d760f8ff1 100644
--- a/llvm/utils/lit/lit/util.py
+++ b/llvm/utils/lit/lit/util.py
@@ -128,23 +128,6 @@ def usable_core_count():
 
     return n
 
-def abs_path_preserve_drive(path):
-    """Return the absolute path without resolving drive mappings on Windows.
-
-    """
-    if platform.system() == "Windows":
-        # Windows has limitations on path length (MAX_PATH) that
-        # can be worked around using substitute drives, which map
-        # a drive letter to a longer path on another drive.
-        # Since Python 3.8, os.path.realpath resolves sustitute drives,
-        # so we should not use it. In Python 3.7, os.path.realpath
-        # was implemented as os.path.abspath.
-        return os.path.abspath(path)
-    else:
-        # On UNIX, the current directory always has symbolic links resolved,
-        # so any program accepting relative paths cannot preserve symbolic
-        # links in paths and we should always use os.path.realpath.
-        return os.path.realpath(path)
 
 def mkdir(path):
     try:
diff --git a/llvm/utils/lit/tests/Inputs/config-map-discovery/driver.py b/llvm/utils/lit/tests/Inputs/config-map-discovery/driver.py
index 50ad2cd65cb1b..d22b84d6a15cf 100644
--- a/llvm/utils/lit/tests/Inputs/config-map-discovery/driver.py
+++ b/llvm/utils/lit/tests/Inputs/config-map-discovery/driver.py
@@ -2,7 +2,8 @@
 import os
 import sys
 
-main_config = lit.util.abs_path_preserve_drive(sys.argv[1])
+main_config = sys.argv[1]
+main_config = os.path.realpath(main_config)
 main_config = os.path.normcase(main_config)
 
 config_map = {main_config: sys.argv[2]}
diff --git a/llvm/utils/lit/tests/Inputs/config-map-discovery/lit.alt.cfg b/llvm/utils/lit/tests/Inputs/config-map-discovery/lit.alt.cfg
index 27860e192fdc1..c7b303f50a05c 100644
--- a/llvm/utils/lit/tests/Inputs/config-map-discovery/lit.alt.cfg
+++ b/llvm/utils/lit/tests/Inputs/config-map-discovery/lit.alt.cfg
@@ -5,5 +5,5 @@ config.suffixes = ['.txt']
 config.test_format = lit.formats.ShTest()
 
 import os
-config.test_exec_root = os.path.dirname(lit.util.abs_path_preserve_drive(__file__))
+config.test_exec_root = os.path.realpath(os.path.dirname(__file__))
 config.test_source_root = os.path.join(config.test_exec_root, "tests")
diff --git a/llvm/utils/lit/tests/Inputs/use-llvm-tool-required/lit.cfg b/llvm/utils/lit/tests/Inputs/use-llvm-tool-required/lit.cfg
index 5062e38c82f14..e41207bc2f05d 100644
--- a/llvm/utils/lit/tests/Inputs/use-llvm-tool-required/lit.cfg
+++ b/llvm/utils/lit/tests/Inputs/use-llvm-tool-required/lit.cfg
@@ -1,5 +1,4 @@
 import lit.formats
-import lit.util
 
 config.name = "use-llvm-tool-required"
 config.suffixes = [".txt"]
@@ -8,7 +7,7 @@ config.test_source_root = None
 config.test_exec_root = None
 import os.path
 
-config.llvm_tools_dir = os.path.dirname(lit.util.abs_path_preserve_drive(__file__))
+config.llvm_tools_dir = os.path.realpath(os.path.dirname(__file__))
 import lit.llvm
 
 lit.llvm.initialize(lit_config, config)
diff --git a/llvm/utils/lit/tests/Inputs/use-llvm-tool/lit.cfg b/llvm/utils/lit/tests/Inputs/use-llvm-tool/lit.cfg
index 2a52db2183edf..8fe62d98c1349 100644
--- a/llvm/utils/lit/tests/Inputs/use-llvm-tool/lit.cfg
+++ b/llvm/utils/lit/tests/Inputs/use-llvm-tool/lit.cfg
@@ -1,5 +1,4 @@
 import lit.formats
-import lit.util
 
 config.name = "use-llvm-tool"
 config.suffixes = [".txt"]
@@ -8,7 +7,7 @@ config.test_source_root = None
 config.test_exec_root = None
 import os.path
 
-this_dir = os.path.dirname(lit.util.abs_path_preserve_drive(__file__))
+this_dir = os.path.realpath(os.path.dirname(__file__))
 config.llvm_tools_dir = os.path.join(this_dir, "build")
 import lit.llvm
 
diff --git a/llvm/utils/llvm-lit/llvm-lit.in b/llvm/utils/llvm-lit/llvm-lit.in
index 0b6683...
[truncated]

Copy link

github-actions bot commented May 13, 2025

✅ With the latest revision this PR passed the Python code formatter.

@compnerd
Copy link
Member

The original patch is needed and is actively in use. The intention is to be able to support testing of the LLVM (and Swift) test suites where the paths are substituted away. The tiny path limit in Win32 is a problem, and creating a drive substitution is a reasonable solution here. Without this change, a file on a substituted drive will have two separate paths.

@michael-jabbour-sonarsource
Copy link
Contributor

michael-jabbour-sonarsource commented May 14, 2025

Hello @compnerd, and thank you for sharing the information and for the prompt response.

Without this change, a file on a substituted drive will have two separate paths.

I am not sure I understand this part. For me, a file on a substituted drive (or under a symlink) may always have multiple different valid paths to reference it. What is important, is that it always has a single canonical path. This is the reason I don’t fully see “root-preserving canonicalization” as a form of a “canonical path” anymore. To summarize my concerns:

  1. The patch seems to work slightly differently from what is described in the patch description:

    1. It doesn't only avoid resolving the mapped drive as claimed, but it also skips the symlink resolution in mapped drives (it only removes dots in that case).
    2. The impact above is not only limited to substituted/mapped drives, but to any case where the roots are not identical (for example, also in cross-drive links).
  2. It affects clang’s usability as a library. I feel that such fallback should be applied on a higher-level API, and ideally should impact only lit tests (as they are the motivation stated in the patch). For example, by looking at usages in clang, I could find that many of them still assume that FileManager::getCanonicalName resolves symlinks (at least by looking at the comments):

    1. clangd’s SourceCode::getCanonicalPath which claims to resolve symlinks in its docs. IIUC, it no longer does so if the code is checked out in a substituted drive (see here).
    2. Inside clang itself, there is code that handles directory resolution in module map and framework header search (for example here and here). This may not impact Windows users, but I am not entirely sure.
  3. Skipping symlink resolution makes it possible for getCanonicalName to return different paths for the same file (breaking the canonical property). Note that the patch, not only alters the behavior for users of clang, but also breaks the lit setup for some other clang developers on Windows (see here), and the stated motivation IIUC is to work around the MAX_PATH limit for other Windows developers' convenience.

Basically, we have concerns about the correctness of all existing usages of getCanonicalName (inside and outside clang). It also impacted us as users of clang as a library. If the motivation is only to make running lit tests more convenient w.r.t. MAX_PATH, I can see a few alternatives:

  1. Try to apply such a fix on a higher-level API where needed, to avoid unexpected impact? I didn't try to look for such an API myself yet. Do you know if this been explored while working on the previous patch?

  2. Switch to a solution that adapts the test as needed without altering functionality (e.g. prior to what was explored after this comment)

  3. Revert the patch, and switch to a user-specific workaround which does not require changing LLVM (as suggested in this PR).

Any thoughts about this are appreciated.

Best regards,
Michael

@compnerd
Copy link
Member

The patch is meant to support the following pattern (which IMO is actually the proper way to deal with the path canonicalisation):

subst X: %UserProfile%\SourceCache
X:
cmake -B X:\b -G Ninja -S llvm-project\llvm
ninja -C \b

Everything is being built in S:, there is no reason that the path should be "canonicalised" to %UserProfile%\SourceCache\llvm-project\llvm\... instead of X:\llvm-project which is what you told the tools the path is.

I do not remember the details off hand, but IIRC, the symbolic link evaluation would implicitly traverse the substitution and give you the path of the storage rather than the actual disk path. Unlikely Unix, Windows does not have a single path tree and therefore the assumptions around paths do not hold up nearly as well in clang.

This isn't a higher level construction issue - the problem is the path representation in the internals of clang. It should be able to remember that the paths we are working with are based on a particular drive rather than trying to create a Unix-like approach of a singular path tree.

I do not know if there is an API that will let you resolve a path into a particular drive, but if so, that would likely be the proper tool to use here.

@tristanlabelle
Copy link

Hi there, I'm the author of the original change.

The whole point of canonical paths is to be unique for a given entity (directory or file), so that they can be directly compared to determine whether or not two entities are the same.

I agree with that intended definition, but that is not the usage in practice. Code will arbitrarily canonicalize some paths and keep going, rather than using canonicalized paths only for comparisons. If I recall, we ended up with different clang output files (source map files?) encoding the same info but canonicalized inconsistently in a way that was not solvable.

We need to use substitute drives to avoid MAX_PATH issues because llvm's build and test directories get very deep. MAX_PATH is a hopeless area of Windows programming. The "enable long paths" setting sounds like a silver bullet but it is not. It's not possible to set it in all environments since it's machine-wide, and it only applies to applications declared to be long-path aware, which is not the norm.

If I recall, the current implementation has issues, but they would be resolved by making it more low level: In the presence of chains of filesystem decorators, we'll undo the real-path mapping of the entire chain if it changes the drive, whereas we just want the real filesystem at the bottom of the chain to be decorated to avoid changing the drive.

I would love this change to not be needed and for Windows to grow up regarding paths, but I fully expect things (at least lit tests) on Windows to break again if we revert it.

@michael-jabbour-sonarsource
Copy link
Contributor

michael-jabbour-sonarsource commented May 14, 2025

Thank you very much @compnerd and @tristanlabelle for the information. I am writing my thoughts below:

Everything is being built in S:, there is no reason that the path should be "canonicalised" to %UserProfile%\SourceCache\llvm-project\llvm... instead of X:\llvm-project which is what you told the tools the path is.

I feel I am missing why being inside X: is any different from being inside a symlink to another directory. For me, both cases should be treated similarly: The canonical path of a file is the one inside %UserProfile%. The parts of the compiler that don't need the canonical path should always see X:, whereas the ones that ask for the canonical path should see %UserProfile%. Could you clarify why should a substituted drive be treated differently from any symlink?

I agree with that intended definition, but that is not the usage in practice. Code will arbitrarily canonicalize some paths and keep going, rather than using canonicalized paths only for comparisons.

I agree with this, and for me it sounds more like it is a problem in how the function is used, not how it behaves. The fix should be to go through the usages and make sure they really need to canonicalize IMO.

I do not know if there is an API that will let you resolve a path into a particular drive, but if so, that would likely be the proper tool to use here.

In the presence of chains of filesystem decorators, we'll undo the real-path mapping of the entire chain if it changes the drive, whereas we just want the real filesystem at the bottom of the chain to be decorated to avoid changing the drive.

IIUC, both these suggestions aim to address some of the concerns in my previous response. This might be an improvement over the existing patch (as it would start resolving symlinks inside the drive). However, I would still be concerned about calling such functionality a "canonical" path, as it doesn't satisfy the canonical property: One may still get different canonical paths for the same file, depending on whether they access %UserProfile%/SourceCache/file or X:/file. In other words, I don't agree with calling this "root-preserving transformation" a "canonicalization". Could you clarify if I am missing something here?

Best regards,
Michael

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
clang:frontend Language frontend issues, e.g. anything involving "Sema" clang Clang issues not falling into any other category cmake Build system in general and CMake in particular llvm-lit testing-tools
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants
Morty Proxy This is a proxified and sanitized view of the page, visit original site.