Commit 1b22d80

gh-118518: Allow perf to work without frame pointers (#112254)
1 parent 999f0c5 commit 1b22d80

19 files changed (+892, -39 lines)

Doc/c-api/init_config.rst

Lines changed: 4 additions & 1 deletion
@@ -1251,7 +1251,10 @@ PyConfig
       for more information.
 
       Set by :option:`-X perf <-X>` command line option and by the
-      :envvar:`PYTHONPERFSUPPORT` environment variable.
+      :envvar:`PYTHONPERFSUPPORT` environment variable for perf support
+      with frame pointers and :option:`-X perfjit <-X>` command line option
+      and by the :envvar:`PYTHONPERFJITSUPPORT` environment variable for perf
+      support with DWARF JIT information.
 
       Default: ``-1``.
 

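As a rough illustration of how the two command-line options and the two environment variables in this hunk map onto the two perf modes, here is a minimal Python sketch. `requested_perf_mode` and `_env_flag` are hypothetical helpers, not CPython APIs; the precedence (`-X` options before environment variables, `perfjit` before `perf`) and the integer parsing of the variables are assumptions made for the sketch only.

```python
import os
import sys
from typing import Mapping, Optional


def _env_flag(environ: Mapping[str, str], name: str) -> Optional[bool]:
    """Interpret a PYTHONPERF*SUPPORT-style variable.

    Assumption: an integer value enables the feature when nonzero and
    disables it when 0; unset, empty or non-integer values give None.
    """
    value = environ.get(name, "")
    try:
        return int(value) != 0
    except ValueError:
        return None


def requested_perf_mode(xoptions: Mapping[str, object],
                        environ: Mapping[str, str]) -> Optional[str]:
    """Hypothetical helper: report which perf mode was asked for.

    Returns "perfjit" (DWARF JIT information), "perf" (frame pointers) or
    None. Checking -X options before environment variables, and perfjit
    before perf, is an assumption made for this sketch.
    """
    for option in ("perfjit", "perf"):
        if option in xoptions:
            return option
    if _env_flag(environ, "PYTHONPERFJITSUPPORT"):
        return "perfjit"
    if _env_flag(environ, "PYTHONPERFSUPPORT"):
        return "perf"
    return None


if __name__ == "__main__":
    # sys._xoptions holds the -X options given on the command line.
    print(requested_perf_mode(sys._xoptions, os.environ))
```

Saved as a script and run with `python -Xperfjit sketch.py`, this prints `perfjit`, because `-X` options are exposed through `sys._xoptions`.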
Doc/howto/perf_profiling.rst

Lines changed: 33 additions & 0 deletions
@@ -205,3 +205,36 @@ You can check if your system has been compiled with this flag by running::
 If you don't see any output it means that your interpreter has not been compiled with
 frame pointers and therefore it may not be able to show Python functions in the output
 of ``perf``.
+
+
+How to work without frame pointers
+----------------------------------
+
+If you are working with a Python interpreter that has been compiled without frame pointers,
+you can still use the ``perf`` profiler, but the overhead will be a bit higher because Python
+needs to generate unwinding information for every Python function call on the fly. Additionally,
+``perf`` will take more time to process the data because it will need to use the DWARF debugging
+information to unwind the stack, which is a slow process.
+
+To enable this mode, you can use the environment variable :envvar:`PYTHONPERFJITSUPPORT` or the
+:option:`-X perfjit <-X>` option, which will enable the JIT mode for the ``perf`` profiler.
+
+When using the perf JIT mode, you need an extra step before you can run ``perf report``: call
+``perf inject`` to inject the JIT information, writing the result to a new file with ``--output``::
+
+  $ perf record -F 9999 -g -k 1 --call-graph dwarf -o perf.data python -Xperfjit my_script.py
+  $ perf inject -i perf.data --jit --output perf.jit.data
+  $ perf report -g -i perf.jit.data
+
+or using the environment variable::
+
+  $ PYTHONPERFJITSUPPORT=1 perf record -F 9999 -g -k 1 --call-graph dwarf -o perf.data python my_script.py
+  $ perf inject -i perf.data --jit --output perf.jit.data
+  $ perf report -g -i perf.jit.data
+
+Notice that when using ``--call-graph dwarf`` the ``perf`` tool will take snapshots of the stack of
+the process being profiled and save the information in the ``perf.data`` file. By default the size of
+the stack dump is 8192 bytes, but the user can change it by passing the size after a comma, as in
+``--call-graph dwarf,4096``. The size of the stack dump is important because if it is too small,
+``perf`` will not be able to unwind the stack and the output will be incomplete.
+

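The record / inject / report sequence, together with the stack dump size, can be wrapped in a small command builder, much as `run_perf` in the test changes does. This is a sketch under stated assumptions: `jit_profile_commands` is a hypothetical helper; the `-k 1` clock option mirrors the `-k1` flag the tests in this commit pass to `perf record`, and the explicit `--output` file is used because `perf inject` writes to standard output when no output file is given.

```python
import shlex
from typing import List


def jit_profile_commands(script: str,
                         python: str = "python",
                         data: str = "perf.data",
                         jit_data: str = "perf.jit.data",
                         stack_dump_size: int = 8192) -> List[List[str]]:
    """Hypothetical helper: build the three steps for -Xperfjit profiling.

    stack_dump_size is the N in ``--call-graph dwarf,N`` (8192 bytes is
    perf's default); too small a value leaves stacks partially unwound.
    """
    if stack_dump_size <= 0:
        raise ValueError("stack_dump_size must be positive")
    record = [
        "perf", "record", "-F", "9999", "-g",
        "-k", "1",  # assumption: same clock id as the test's "-k1"
        f"--call-graph=dwarf,{stack_dump_size}",
        "-o", data,
        python, "-Xperfjit", script,
    ]
    # Merge the JIT dump information into a new data file.
    inject = ["perf", "inject", "-i", data, "--jit", "--output", jit_data]
    report = ["perf", "report", "-g", "-i", jit_data]
    return [record, inject, report]


if __name__ == "__main__":
    for step in jit_profile_commands("my_script.py", stack_dump_size=16384):
        print(shlex.join(step))
```

Returning argument lists rather than shell strings keeps the commands safe to pass straight to `subprocess.run`; `shlex.join` is only used to display them.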
Doc/using/cmdline.rst

Lines changed: 24 additions & 0 deletions
@@ -586,6 +586,15 @@ Miscellaneous options
 
      .. versionadded:: 3.12
 
+   * ``-X perfjit`` enables support for the Linux ``perf`` profiler with DWARF
+     support. When this option is provided, the ``perf`` profiler will be able
+     to report Python calls using DWARF information. This option is only available on
+     some platforms and will do nothing if it is not supported on the current
+     system. The default value is "off". See also :envvar:`PYTHONPERFJITSUPPORT`
+     and :ref:`perf_profiling`.
+
+     .. versionadded:: 3.13
+
    * :samp:`-X cpu_count={n}` overrides :func:`os.cpu_count`,
      :func:`os.process_cpu_count`, and :func:`multiprocessing.cpu_count`.
      *n* must be greater than or equal to 1.
@@ -1127,6 +1136,21 @@ conflict.
 
    .. versionadded:: 3.12
 
+.. envvar:: PYTHONPERFJITSUPPORT
+
+   If this variable is set to a nonzero value, it enables support for
+   the Linux ``perf`` profiler so Python calls can be detected by it
+   using DWARF information.
+
+   If set to ``0``, disable Linux ``perf`` profiler support.
+
+   See also the :option:`-X perfjit <-X>` command-line option
+   and :ref:`perf_profiling`.
+
+   .. versionadded:: 3.13
+
+
+
 .. envvar:: PYTHON_CPU_COUNT
 
    If this variable is set to a positive integer, it overrides the return

Doc/whatsnew/3.13.rst

Lines changed: 5 additions & 0 deletions
@@ -231,6 +231,11 @@ Other Language Changes
   equivalent of the :option:`-X frozen_modules <-X>` command-line option.
   (Contributed by Yilei Yang in :gh:`111374`.)
 
+* Add :ref:`support for the perf profiler <perf_profiling>` working without
+  frame pointers through the new environment variable
+  :envvar:`PYTHONPERFJITSUPPORT` and command-line option :option:`-X perfjit
+  <-X>`. (Contributed by Pablo Galindo in :gh:`118518`.)
+
 * The new :envvar:`PYTHON_HISTORY` environment variable can be used to change
   the location of a ``.python_history`` file.
   (Contributed by Levi Sabah, Zackery Spytz and Hugo van Kemenade in

Include/internal/pycore_ceval.h

Lines changed: 1 addition & 0 deletions
@@ -108,6 +108,7 @@ extern int _PyIsPerfTrampolineActive(void);
 extern PyStatus _PyPerfTrampoline_AfterFork_Child(void);
 #ifdef PY_HAVE_PERF_TRAMPOLINE
 extern _PyPerf_Callbacks _Py_perfmap_callbacks;
+extern _PyPerf_Callbacks _Py_perfmap_jit_callbacks;
 #endif
 
 static inline PyObject*

Include/internal/pycore_ceval_state.h

Lines changed: 2 additions & 0 deletions
@@ -75,6 +75,7 @@ struct trampoline_api_st {
                                 unsigned int code_size, PyCodeObject* code);
     int (*free_state)(void* state);
     void *state;
+    Py_ssize_t code_padding;
 };
 #endif
 
@@ -83,6 +84,7 @@ struct _ceval_runtime_state {
     struct {
 #ifdef PY_HAVE_PERF_TRAMPOLINE
         perf_status_t status;
+        int perf_trampoline_type;
         Py_ssize_t extra_code_index;
         struct code_arena_st *code_arena;
         struct trampoline_api_st trampoline_api;

Lib/test/test_perf_profiler.py

Lines changed: 114 additions & 32 deletions
@@ -5,6 +5,7 @@
 import sysconfig
 import os
 import pathlib
+import shutil
 from test import support
 from test.support.script_helper import (
     make_script,
@@ -76,14 +77,27 @@ def baz():
         perf_file = pathlib.Path(f"/tmp/perf-{process.pid}.map")
         self.assertTrue(perf_file.exists())
         perf_file_contents = perf_file.read_text()
-        perf_lines = perf_file_contents.splitlines();
-        expected_symbols = [f"py::foo:{script}", f"py::bar:{script}", f"py::baz:{script}"]
+        perf_lines = perf_file_contents.splitlines()
+        expected_symbols = [
+            f"py::foo:{script}",
+            f"py::bar:{script}",
+            f"py::baz:{script}",
+        ]
         for expected_symbol in expected_symbols:
-            perf_line = next((line for line in perf_lines if expected_symbol in line), None)
-            self.assertIsNotNone(perf_line, f"Could not find {expected_symbol} in perf file")
+            perf_line = next(
+                (line for line in perf_lines if expected_symbol in line), None
+            )
+            self.assertIsNotNone(
+                perf_line, f"Could not find {expected_symbol} in perf file"
+            )
             perf_addr = perf_line.split(" ")[0]
-            self.assertFalse(perf_addr.startswith("0x"), "Address should not be prefixed with 0x")
-            self.assertTrue(set(perf_addr).issubset(string.hexdigits), "Address should contain only hex characters")
+            self.assertFalse(
+                perf_addr.startswith("0x"), "Address should not be prefixed with 0x"
+            )
+            self.assertTrue(
+                set(perf_addr).issubset(string.hexdigits),
+                "Address should contain only hex characters",
+            )
 
     def test_trampoline_works_with_forks(self):
         code = """if 1:
@@ -212,7 +226,7 @@ def test_sys_api_get_status(self):
         assert_python_ok("-c", code)
 
 
-def is_unwinding_reliable():
+def is_unwinding_reliable_with_frame_pointers():
     cflags = sysconfig.get_config_var("PY_CORE_CFLAGS")
     if not cflags:
         return False
@@ -259,24 +273,49 @@ def perf_command_works():
     return True
 
 
-def run_perf(cwd, *args, **env_vars):
+def run_perf(cwd, *args, use_jit=False, **env_vars):
     if env_vars:
         env = os.environ.copy()
         env.update(env_vars)
     else:
         env = None
     output_file = cwd + "/perf_output.perf"
-    base_cmd = ("perf", "record", "-g", "--call-graph=fp", "-o", output_file, "--")
+    if not use_jit:
+        base_cmd = ("perf", "record", "-g", "--call-graph=fp", "-o", output_file, "--")
+    else:
+        base_cmd = (
+            "perf",
+            "record",
+            "-g",
+            "--call-graph=dwarf,65528",
+            "-F99",
+            "-k1",
+            "-o",
+            output_file,
+            "--",
+        )
     proc = subprocess.run(
         base_cmd + args,
         stdout=subprocess.PIPE,
         stderr=subprocess.PIPE,
         env=env,
     )
     if proc.returncode:
-        print(proc.stderr)
+        print(proc.stderr, file=sys.stderr)
         raise ValueError(f"Perf failed with return code {proc.returncode}")
 
+    if use_jit:
+        jit_output_file = cwd + "/jit_output.dump"
+        command = ("perf", "inject", "-j", "-i", output_file, "-o", jit_output_file)
+        proc = subprocess.run(
+            command, stderr=subprocess.PIPE, stdout=subprocess.PIPE, env=env
+        )
+        if proc.returncode:
+            print(proc.stderr)
+            raise ValueError(f"Perf failed with return code {proc.returncode}")
+        # Copy the jit_output_file to the output_file
+        os.rename(jit_output_file, output_file)
+
     base_cmd = ("perf", "script")
     proc = subprocess.run(
         ("perf", "script", "-i", output_file),
@@ -290,20 +329,9 @@ def run_perf(cwd, *args, **env_vars):
     )
 
 
-@unittest.skipUnless(perf_command_works(), "perf command doesn't work")
-@unittest.skipUnless(is_unwinding_reliable(), "Unwinding is unreliable")
-class TestPerfProfiler(unittest.TestCase):
-    def setUp(self):
-        super().setUp()
-        self.perf_files = set(pathlib.Path("/tmp/").glob("perf-*.map"))
-
-    def tearDown(self) -> None:
-        super().tearDown()
-        files_to_delete = (
-            set(pathlib.Path("/tmp/").glob("perf-*.map")) - self.perf_files
-        )
-        for file in files_to_delete:
-            file.unlink()
+class TestPerfProfilerMixin:
+    def run_perf(self, script_dir, perf_mode, script):
+        raise NotImplementedError()
 
     def test_python_calls_appear_in_the_stack_if_perf_activated(self):
         with temp_dir() as script_dir:
@@ -322,14 +350,14 @@ def baz(n):
                 baz(10000000)
                 """
             script = make_script(script_dir, "perftest", code)
-            stdout, stderr = run_perf(script_dir, sys.executable, "-Xperf", script)
+            stdout, stderr = self.run_perf(script_dir, script)
             self.assertEqual(stderr, "")
 
             self.assertIn(f"py::foo:{script}", stdout)
             self.assertIn(f"py::bar:{script}", stdout)
             self.assertIn(f"py::baz:{script}", stdout)
 
-    def test_python_calls_do_not_appear_in_the_stack_if_perf_activated(self):
+    def test_python_calls_do_not_appear_in_the_stack_if_perf_deactivated(self):
         with temp_dir() as script_dir:
             code = """if 1:
                 def foo(n):
@@ -346,13 +374,38 @@ def baz(n):
                 baz(10000000)
                 """
             script = make_script(script_dir, "perftest", code)
-            stdout, stderr = run_perf(script_dir, sys.executable, script)
+            stdout, stderr = self.run_perf(
+                script_dir, script, activate_trampoline=False
+            )
             self.assertEqual(stderr, "")
 
             self.assertNotIn(f"py::foo:{script}", stdout)
             self.assertNotIn(f"py::bar:{script}", stdout)
             self.assertNotIn(f"py::baz:{script}", stdout)
 
+@unittest.skipUnless(perf_command_works(), "perf command doesn't work")
+@unittest.skipUnless(
+    is_unwinding_reliable_with_frame_pointers(),
+    "Unwinding is unreliable with frame pointers",
+)
+class TestPerfProfiler(unittest.TestCase, TestPerfProfilerMixin):
+    def run_perf(self, script_dir, script, activate_trampoline=True):
+        if activate_trampoline:
+            return run_perf(script_dir, sys.executable, "-Xperf", script)
+        return run_perf(script_dir, sys.executable, script)
+
+    def setUp(self):
+        super().setUp()
+        self.perf_files = set(pathlib.Path("/tmp/").glob("perf-*.map"))
+
+    def tearDown(self) -> None:
+        super().tearDown()
+        files_to_delete = (
+            set(pathlib.Path("/tmp/").glob("perf-*.map")) - self.perf_files
+        )
+        for file in files_to_delete:
+            file.unlink()
+
     def test_pre_fork_compile(self):
         code = """if 1:
                 import sys
@@ -370,7 +423,7 @@ def bar_fork():
                     foo_fork()
 
                 def foo():
-                    pass
+                    import time; time.sleep(1)
 
                 def bar():
                     foo()
@@ -423,12 +476,41 @@ def compile_trampolines_for_all_functions():
             # identical in both the parent and child perf-map files.
             perf_file_lines = perf_file_contents.split("\n")
             for line in perf_file_lines:
-                if (
-                    f"py::foo_fork:{script}" in line
-                    or f"py::bar_fork:{script}" in line
-                ):
+                if f"py::foo_fork:{script}" in line or f"py::bar_fork:{script}" in line:
                     self.assertIn(line, child_perf_file_contents)
 
+def _is_kernel_version_at_least(major, minor):
+    try:
+        with open("/proc/version") as f:
+            version = f.readline().split()[2]
+    except FileNotFoundError:
+        return False
+    version = version.split(".")
+    return int(version[0]) > major or (int(version[0]) == major and int(version[1]) >= minor)
+
+@unittest.skipUnless(perf_command_works(), "perf command doesn't work")
+@unittest.skipUnless(_is_kernel_version_at_least(6, 6), "perf command may not work due to a perf bug")
+class TestPerfProfilerWithDwarf(unittest.TestCase, TestPerfProfilerMixin):
+    def run_perf(self, script_dir, script, activate_trampoline=True):
+        if activate_trampoline:
+            return run_perf(
+                script_dir, sys.executable, "-Xperfjit", script, use_jit=True
+            )
+        return run_perf(script_dir, sys.executable, script, use_jit=True)
+
+    def setUp(self):
+        super().setUp()
+        self.perf_files = set(pathlib.Path("/tmp/").glob("jit*.dump"))
+        self.perf_files |= set(pathlib.Path("/tmp/").glob("jitted-*.so"))
+
+    def tearDown(self) -> None:
+        super().tearDown()
+        files_to_delete = set(pathlib.Path("/tmp/").glob("jit*.dump"))
+        files_to_delete |= set(pathlib.Path("/tmp/").glob("jitted-*.so"))
+        files_to_delete = files_to_delete - self.perf_files
+        for file in files_to_delete:
+            file.unlink()
+
 
 if __name__ == "__main__":
     unittest.main()

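The `_is_kernel_version_at_least` helper added above calls `int()` on the first two dot-separated fields of the kernel release, so a release with a non-numeric suffix on the minor component (for example a hypothetical `6.6-custom`) would raise `ValueError`, and one without a dot would raise `IndexError`, instead of returning `False`. A more tolerant variant could look like the sketch below. It is not part of the commit: `kernel_version_at_least` and `parse_kernel_release` are hypothetical names, and the sketch reads `os.uname().release` instead of `/proc/version`, only trusting it when `sysname` is `Linux`.

```python
import os
import re
from typing import Optional, Tuple


def parse_kernel_release(release: str) -> Optional[Tuple[int, int]]:
    """Return (major, minor) from a string such as '6.6.8-arch1-1'.

    Returns None when the string does not start with 'MAJOR.MINOR'.
    """
    match = re.match(r"(\d+)\.(\d+)", release)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


def kernel_version_at_least(major: int, minor: int,
                            release: Optional[str] = None) -> bool:
    """Hypothetical, more tolerant variant of _is_kernel_version_at_least.

    When ``release`` is None, the running kernel is queried with os.uname()
    and anything other than Linux counts as "not new enough".
    """
    if release is None:
        if not hasattr(os, "uname"):  # e.g. Windows
            return False
        uname = os.uname()
        if uname.sysname != "Linux":
            return False
        release = uname.release
    parsed = parse_kernel_release(release)
    if parsed is None:
        return False
    # Tuple comparison: (6, 8) >= (6, 6) and (7, 0) >= (6, 6) are both True.
    return parsed >= (major, minor)
```

Comparing `(major, minor)` tuples expresses the same "greater major, or equal major and greater-or-equal minor" rule as the original expression in a single step.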
Makefile.pre.in

Lines changed: 1 addition & 0 deletions
@@ -488,6 +488,7 @@ PYTHON_OBJS= \
 		Python/fileutils.o \
 		Python/suggestions.o \
 		Python/perf_trampoline.o \
+		Python/perf_jit_trampoline.o \
 		Python/$(DYNLOADFILE) \
 		$(LIBOBJS) \
 		$(MACHDEP_OBJS) \
Lines changed: 4 additions & 0 deletions
@@ -0,0 +1,4 @@
+Allow the Linux perf support to work without frame pointers using perf's
+advanced JIT support. The feature is activated when using the
+``PYTHONPERFJITSUPPORT`` environment variable or when running Python with
+``-Xperfjit``. Patch by Pablo Galindo.

PCbuild/_freeze_module.vcxproj

Lines changed: 1 addition & 0 deletions
@@ -240,6 +240,7 @@
     <ClCompile Include="..\Python\parking_lot.c" />
     <ClCompile Include="..\Python\pathconfig.c" />
     <ClCompile Include="..\Python\perf_trampoline.c" />
+    <ClCompile Include="..\Python\perf_jit_trampoline.c" />
     <ClCompile Include="..\Python\preconfig.c" />
     <ClCompile Include="..\Python\pyarena.c" />
     <ClCompile Include="..\Python\pyctype.c" />

PCbuild/_freeze_module.vcxproj.filters

Lines changed: 3 additions & 0 deletions
@@ -94,6 +94,9 @@
     <ClCompile Include="..\Python\perf_trampoline.c">
       <Filter>Source Files</Filter>
     </ClCompile>
+    <ClCompile Include="..\Python\perf_jit_trampoline.c">
+      <Filter>Source Files</Filter>
+    </ClCompile>
     <ClCompile Include="..\Python\compile.c">
       <Filter>Source Files</Filter>
     </ClCompile>
