Merged
Lib/platform.py: 5 changes (4 additions, 1 deletion)
@@ -194,7 +194,10 @@ def libc_ver(executable=sys.executable,lib='',version='', chunksize=2048):
         binary = f.read(chunksize)
         pos = 0
         while pos < len(binary):
-            m = _libc_search.search(binary,pos)

Member

Wow. That must have been slow.

Member Author

Honestly, I'm disappointed by the bad performance of re.search(). re should be faster, since it is supposed to search for the "GLIBC" and "libc" patterns "at the same time"; for example, it could use two bloom filters at the "same time". But no, checking with "in" first makes the function 16x faster. I don't get it, but I never looked into _sre.c.
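A minimal benchmark sketch of the prefilter idea, written for Python 3, so the literals are bytes (the str literals in the diff only work on Python 2, where file contents are str). `LIBC_RE` is a hypothetical stand-in modeled on `_libc_search`, not copied from platform.py, and the test block is synthetic; the 16x figure comes from this thread, and timings elsewhere will differ.

```python
import re
import timeit

# Hypothetical stand-in for platform._libc_search (an assumption, not copied).
LIBC_RE = re.compile(
    rb'(__libc_init)|(GLIBC_([0-9.]+))|(libc(_\w+)?\.so(?:\.(\d[0-9.]*))?)')


def regex_only(block):
    """Always run the regex over the whole block."""
    return LIBC_RE.search(block)


def with_prefilter(block):
    """Run cheap substring tests first; only call the regex if one hits."""
    if b'libc' in block or b'GLIBC' in block:
        return LIBC_RE.search(block)
    return None


# Synthetic 2048-byte block (the default chunksize) containing neither
# b'libc' nor b'GLIBC', like most blocks of a typical binary.
block = bytes(range(256)) * 8

if __name__ == '__main__':
    for func in (regex_only, with_prefilter):
        seconds = timeit.timeit(lambda: func(block), number=10000)
        print(f'{func.__name__}: {seconds:.4f}s')
```

The prefilter only pays off because most blocks contain neither substring, so the regex is skipped entirely for them.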

+            if 'libc' in binary or 'GLIBC' in binary:

Member

Maybe use find()?

Suggested change
-            if 'libc' in binary or 'GLIBC' in binary:
+            if binary.find('libc', pos) >= 0 or binary.find('GLIBC', pos) >= 0:

Member Author

What is the difference between ('libc' in binary) and (binary.find('libc', pos) >= 0)? They are supposed to be equivalent, no? Last time I looked at micro-optimizations, an operator was faster than a method call.
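To check the operator-vs-method claim on a given interpreter, a small `timeit` sketch could look like this. The buffer is hypothetical (Python 3 bytes), and the comparison is only fair at pos == 0, where both forms return the same result.

```python
import timeit

# Hypothetical 2048-byte block with no match.
binary = b'\x00' * 2048


def with_in():
    return b'libc' in binary


def with_find():
    # Same result as with_in() only because the search starts at offset 0.
    return binary.find(b'libc', 0) >= 0


if __name__ == '__main__':
    for func in (with_in, with_find):
        # Results vary by machine and interpreter version.
        print(func.__name__, timeit.timeit(func, number=200000))
```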

Member

They are equivalent only when pos == 0. If pos != 0, 'libc' in binary may be True while binary.find('libc', pos) >= 0 is False; for example, if binary is 'libc' + 'x'*1000000 and pos >= 4.
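The counterexample, spelled out in Python 3 (bytes literals instead of the Python 2 str literals used in the diff):

```python
binary = b'libc' + b'x' * 1000000
pos = 4

# 'in' always scans the whole buffer from offset 0...
print(b'libc' in binary)               # True
# ...while find() starts at pos, so it misses the match at offset 0.
print(binary.find(b'libc', pos) >= 0)  # False
# With pos == 0 the two tests agree.
print(binary.find(b'libc', 0) >= 0)    # True
```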

Member Author

If you know that the regex will never match before offset N, maybe we should use file.seek(N)? I don't know where the string is supposed to match, so I prefer to avoid making any assumption.

... By the way, parsing a binary file to find a string in order to extract a version number is really ugly. I would prefer that the libc provide its own version at runtime.

IMHO running "ldd --version" or directly "/lib64/libc.so.6" would be less ugly:

$  ldd --version
ldd (GNU libc) 2.28
...

$ /lib64/libc.so.6 
GNU C Library (GNU libc) stable release version 2.28.
...
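A sketch of asking for the version at runtime instead of scanning the binary. It assumes a glibc system, where `os.confstr('CS_GNU_LIBC_VERSION')` returns a string like `'glibc 2.28'`; on other platforms the call raises or returns None. The `ldd --version` fallback and the `parse_ldd_version()` helper are hypothetical, and the parser assumes GNU ldd output whose first line ends with the version, as in the sample above.

```python
import os
import re
import subprocess


def parse_ldd_version(text):
    """Return the version from 'ldd --version' output, e.g. '2.28', or None."""
    # Assumption: the version is the last dotted number on the first line,
    # as in "ldd (GNU libc) 2.28".
    lines = text.splitlines()
    if not lines:
        return None
    m = re.search(r'(\d+(?:\.\d+)+)\s*$', lines[0])
    return m.group(1) if m else None


def glibc_version():
    """Return ('glibc', version) from the running C library, or None."""
    # os.confstr() is Unix-only; an unknown name raises ValueError.
    try:
        value = os.confstr('CS_GNU_LIBC_VERSION')  # e.g. 'glibc 2.28'
    except (AttributeError, ValueError, OSError):
        value = None
    if value:
        name, _, version = value.partition(' ')
        return name, version
    # Fallback from the thread: ask ldd (assumes GNU ldd prints to stdout).
    try:
        result = subprocess.run(['ldd', '--version'], capture_output=True,
                                text=True, check=False)
    except OSError:  # ldd not installed
        return None
    version = parse_ldd_version(result.stdout)
    return ('glibc', version) if version else None


if __name__ == '__main__':
    print(glibc_version())
```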

Member

Alternate suggestion:

Suggested change
-            if 'libc' in binary or 'GLIBC' in binary:
+            if pos or 'libc' in binary or 'GLIBC' in binary:

Member Author

I don't see the point of avoiding the two "in" tests if pos == 0. Does it provide any speedup?

This code comes from the master branch. If you have a clever optimization, maybe write it in the master branch first, no?

This change already makes the function 16x faster; it should be enough, no?

Member

It avoids the two "in" tests if pos != 0.

If pos == 0, we are testing a block that was just read. In the common case it doesn't contain 'libc', so this optimization makes sense. If pos != 0, then 'libc' was already found in this block, so 'libc' in binary will always be true, and performing this test just wastes time.
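A self-contained sketch of the whole loop with the `pos or ...` prefilter, in Python 3. It follows the structure of the diff hunk, but `LIBC_RE` and `scan_glibc_versions()` are hypothetical names, and unlike `libc_ver()` it simply yields every `GLIBC_x.y` version it finds instead of picking one.

```python
import io
import re

# Hypothetical stand-in for platform._libc_search (an assumption, not copied).
LIBC_RE = re.compile(
    rb'(__libc_init)|(GLIBC_([0-9.]+))|(libc(_\w+)?\.so(?:\.(\d[0-9.]*))?)')


def scan_glibc_versions(f, chunksize=2048):
    """Yield GLIBC_x.y version strings found in a binary stream."""
    binary = f.read(chunksize)
    pos = 0
    while pos < len(binary):
        # pos != 0 means the regex already matched in this block, so the
        # substring tests would always succeed: short-circuit past them.
        if pos or b'libc' in binary or b'GLIBC' in binary:
            m = LIBC_RE.search(binary, pos)
        else:
            m = None
        if not m or m.end() == len(binary):
            chunk = f.read(chunksize)
            if chunk:
                # Keep a tail of the old block so a match split across the
                # chunk boundary (or cut off at its end) is found again.
                binary = binary[max(pos, len(binary) - 1000):] + chunk
                pos = 0
                continue
            if not m:
                break
        if m.group(3):
            yield m.group(3).decode('ascii')
        pos = m.end()


if __name__ == '__main__':
    data = b'\x00' * 2044 + b'GLIBC_2.28\x00'  # straddles the 2048 boundary
    print(list(scan_glibc_versions(io.BytesIO(data))))
```

When pos != 0, the short-circuiting `or` skips both substring scans, which is exactly the case where they could never fail.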

+                m = _libc_search.search(binary, pos)
+            else:
+                m = None
             if not m or m.end() == len(binary):
                 chunk = f.read(chunksize)
                 if chunk: