Commit b9caa09

gh-118518: Improve perf docs (#118708)
1 parent a94ac56 commit b9caa09
1 file changed: 48 additions and 22 deletions
‎Doc/howto/perf_profiling.rst

@@ -162,12 +162,12 @@ the :option:`!-X` option takes precedence over the environment variable.
162162

163163
Example, using the environment variable::
164164

165-
$ PYTHONPERFSUPPORT=1 python script.py
165+
$ PYTHONPERFSUPPORT=1 perf record -F 9999 -g -o perf.data python script.py
166166
$ perf report -g -i perf.data
167167

168168
Example, using the :option:`!-X` option::
169169

170-
$ python -X perf script.py
170+
$ perf record -F 9999 -g -o perf.data python -X perf script.py
171171
$ perf report -g -i perf.data
172172

173173
Example, using the :mod:`sys` APIs in file :file:`example.py`:
@@ -184,7 +184,7 @@ Example, using the :mod:`sys` APIs in file :file:`example.py`:
184184
185185
...then::
186186

187-
$ python ./example.py
187+
$ perf record -F 9999 -g -o perf.data python ./example.py
188188
$ perf report -g -i perf.data
189189

190190

@@ -210,31 +210,57 @@ of ``perf``.
210210
How to work without frame pointers
211211
----------------------------------
212212

213-
If you are working with a Python interpreter that has been compiled without frame pointers
214-
you can still use the ``perf`` profiler but the overhead will be a bit higher because Python
215-
needs to generate unwinding information for every Python function call on the fly. Additionally,
216-
``perf`` will take more time to process the data because it will need to use the DWARF debugging
217-
information to unwind the stack and this is a slow process.
213+
If you are working with a Python interpreter that has been compiled without
214+
frame pointers, you can still use the ``perf`` profiler, but the overhead will be
215+
a bit higher because Python needs to generate unwinding information for every
216+
Python function call on the fly. Additionally, ``perf`` will take more time to
217+
process the data because it will need to use the DWARF debugging information to
218+
unwind the stack, which is a slow process.
218219

219-
To enable this mode, you can use the environment variable :envvar:`PYTHON_PERF_JIT_SUPPORT` or the
220-
:option:`-X perf_jit <-X>` option, which will enable the JIT mode for the ``perf`` profiler.
220+
To enable this mode, you can use the environment variable
221+
:envvar:`PYTHON_PERF_JIT_SUPPORT` or the :option:`-X perf_jit <-X>` option,
222+
which will enable the JIT mode for the ``perf`` profiler.
221223

222-
When using the perf JIT mode, you need an extra step before you can run ``perf report``. You need to
223-
call the ``perf inject`` command to inject the JIT information into the ``perf.data`` file.
224+
.. note::
225+
226+
Due to a bug in the ``perf`` tool, only ``perf`` versions higher than v6.8
227+
will work with the JIT mode. The fix was also backported to the v6.7.2
228+
version of the tool.
229+
230+
Note that when checking the version of the ``perf`` tool (which can be done
231+
by running ``perf version``) you must take into account that some distros
232+
add some custom version numbers including a ``-`` character. This means
233+
that ``perf 6.7-3`` is not necessarily ``perf 6.7.3``.
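
The version caveat above can also be checked programmatically. The following is a minimal sketch, not part of CPython or ``perf``: ``parse_perf_version`` and ``supports_perf_jit`` are hypothetical helper names. It assumes that v6.8 itself carries the fix (the note says "higher than v6.8", so tighten the comparison if in doubt), and it ignores everything after a ``-`` so that a distro string such as ``perf 6.7-3`` is read as upstream 6.7 rather than 6.7.3.

```python
import re


def parse_perf_version(text):
    """Extract the upstream version from ``perf version`` output.

    Only the leading dotted number is kept, so a distro suffix such as
    the "-3" in "6.7-3" is dropped instead of being read as a patch level.
    """
    match = re.search(r"\d+(?:\.\d+)*", text)
    if match is None:
        raise ValueError(f"no version number found in {text!r}")
    return tuple(int(part) for part in match.group(0).split("."))


def supports_perf_jit(version):
    """Return True if this upstream version should have the JIT-mode fix.

    Assumption: v6.8 and later include the fix, as does the v6.7.2 backport
    and later 6.7.x releases. A distro build may carry its own backports,
    which this check cannot detect.
    """
    return version >= (6, 7, 2)
```

To use it, pass in the text printed by ``perf version``, for example the ``stdout`` captured by ``subprocess.run(["perf", "version"], capture_output=True, text=True)``. A ``False`` result for a distro build is only a hint to check the package changelog, not proof that the fix is missing.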
234+
235+
When using the perf JIT mode, you need an extra step before you can run ``perf
236+
report``. You need to call the ``perf inject`` command to inject the JIT
237+
information into the ``perf.data`` file::
224238

225239
$ perf record -F 9999 -g --call-graph dwarf -o perf.data python -Xperf_jit my_script.py
226-
$ perf inject -i perf.data --jit
227-
$ perf report -g -i perf.data
240+
$ perf inject -i perf.data --jit --output perf.jit.data
241+
$ perf report -g -i perf.jit.data
228242

229243
or using the environment variable::
230244

231245
$ PYTHON_PERF_JIT_SUPPORT=1 perf record -F 9999 -g --call-graph dwarf -o perf.data python my_script.py
232-
$ perf inject -i perf.data --jit
233-
$ perf report -g -i perf.data
234-
235-
Notice that when using ``--call-graph dwarf`` the ``perf`` tool will take snapshots of the stack of
236-
the process being profiled and save the information in the ``perf.data`` file. By default the size of
237-
the stack dump is 8192 bytes but the user can change the size by passing the size after comma like
238-
``--call-graph dwarf,4096``. The size of the stack dump is important because if the size is too small
239-
``perf`` will not be able to unwind the stack and the output will be incomplete.
246+
$ perf inject -i perf.data --jit --output perf.jit.data
247+
$ perf report -g -i perf.jit.data
248+
249+
The ``perf inject --jit`` command will read ``perf.data``,
250+
automatically pick up the perf dump file that Python creates (in
251+
``/tmp/perf-$PID.dump``), and then create ``perf.jit.data`` which merges all the
252+
JIT information together. It should also create a lot of ``jitted-XXXX-N.so``
253+
files in the current directory which are ELF images for all the JIT trampolines
254+
that were created by Python.
255+
256+
.. warning::
257+
Notice that when using ``--call-graph dwarf`` the ``perf`` tool will take
258+
snapshots of the stack of the process being profiled and save the
259+
information in the ``perf.data`` file. By default the size of the stack dump
260+
is 8192 bytes but the user can change the size by passing the size after
261+
a comma, like ``--call-graph dwarf,4096``. The size of the stack dump is
262+
important because if the size is too small ``perf`` will not be able to
263+
unwind the stack and the output will be incomplete. On the other hand, if
264+
the size is too big, then ``perf`` won't be able to sample the process as
265+
frequently as it would like as the overhead will be higher.
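
The record, inject, and report steps of the JIT mode can be scripted together. The following is a minimal sketch under stated assumptions: ``perf_jit_commands`` is a hypothetical helper name, it only builds the command lines shown in this document and does not run them, and the optional ``stack_dump_size`` argument maps to the ``--call-graph dwarf,SIZE`` form described in the warning.

```python
import shlex


def perf_jit_commands(script, data="perf.data", jit_data="perf.jit.data",
                      frequency=9999, stack_dump_size=None):
    """Build the three command lines for profiling with the perf JIT mode.

    The flags mirror the shell examples in this document; nothing is executed.
    """
    # "dwarf" uses the default stack dump size; "dwarf,N" overrides it.
    call_graph = "dwarf" if stack_dump_size is None else f"dwarf,{stack_dump_size}"
    record = ["perf", "record", "-F", str(frequency), "-g",
              "--call-graph", call_graph, "-o", data,
              "python", "-X", "perf_jit", script]
    # Merge the JIT information into a separate output file.
    inject = ["perf", "inject", "-i", data, "--jit", "--output", jit_data]
    # Report from the merged file, not from the raw recording.
    report = ["perf", "report", "-g", "-i", jit_data]
    return [record, inject, report]


def format_commands(commands):
    """Render each command as a shell-quoted line, for display or logging."""
    return [shlex.join(command) for command in commands]
```

Each returned list can be passed to ``subprocess.run(command, check=True)`` in order. Keeping the commands as lists avoids shell quoting problems when the script path contains spaces.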
240266
