gh-127146: Skip test_open_undecodable_uri on Emscripten#136510
gh-127146: Skip test_open_undecodable_uri on Emscripten#136510hoodmane wants to merge 1 commit intopython:mainpython/cpython:mainfrom hoodmane:emscripten-skip-test_open_undecodable_urihoodmane/cpython:emscripten-skip-test_open_undecodable_uriCopy head branch name to clipboard
Conversation
PR python#136326 removed the Emscripten skip for this file but it is still broken.
serhiy-storchaka
left a comment
There was a problem hiding this comment.
I am surprised. Is test_open_with_undecodable_path passed?
|
@serhiy-storchaka Both FWIW - in both cases, the test path being used is When I run this locally, I get a warning in the console: which suggests to me that the bad unicode isn't round-tripping consistently through the filesystem. I agree that this should be a "skip both or skip neither" situation; and I'd definitely prefer to understand better why it needs to be skipped explicitly before restoring the skip. |
|
I suppose that paths are Unicode strings on Emscripten. Python and SQLite can use different ways to decode bytes path to Unicode, so This is similar to Windows where we need to keep a separate skip. |
|
Well |
Interesting. Locally only |
|
I changed the test to this and added syscall tracing: def get_undecodable_path(self):
path = TESTFN_UNDECODABLE
print("open", path)
f = open(path, 'wb')
print("close", path)
f.close()
print("unlink", path)
unlink(path)
return path
@unittest.skipIf(sys.platform == "win32", "skipped on Windows")
def test_open_with_undecodable_path(self):
path = self.get_undecodable_path()
self.addCleanup(unlink, path)
print("sqlite.connect", path)
c = sqlite.connect(path)
with contextlib.closing(c) as cx:
print("exists", path)
exists = os.path.exists(path)
print(" .. ", exists)
self.assertTrue(exists)The relevant part of the log looks like this: |
|
Okay I got it: the problem is that // When using conditional TextDecoder, skip it for short strings as the overhead of the native call is not worth it.
if (endPtr - idx > 16 && heapOrArray.buffer && UTF8Decoder) {
return UTF8Decoder.decode({{{ getUnsharedTextDecoderView('heapOrArray', 'idx', 'endPtr') }}});
}https://github.com/emscripten-core/emscripten/blob/main/src/lib/libstrings.js#L57-L59 So when the file system decodes the string, it checks the length. If it's short, it uses the JS decoder, if it's long it uses the native decoder. sqlite fully resolves the path before opening the file, and the absolute path is longer than 16 bytes. Whereas the not fully resolved path is 15 bytes long and gets the correct slow path. Adding two extra bytes to |
|
Upstream report: |
PR #136326 removed the Emscripten skip for this file but it is still broken.