summaryrefslogtreecommitdiff
path: root/docs/perfetto.rst
blob: c9b39ec7f732214760973e0af831933eb189ad65 (plain)
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
Perfetto Tracing
================

Mesa has experimental support for `Perfetto <https://perfetto.dev>`__ for
GPU performance monitoring.  Perfetto supports multiple
`producers <https://perfetto.dev/docs/concepts/service-model>`__ each with
one or more data-sources.  Perfetto already provides various producers and
data-sources for things like:

- CPU scheduling events (``linux.ftrace``)
- CPU frequency scaling (``linux.ftrace``)
- System calls (``linux.ftrace``)
- Process memory utilization (``linux.process_stats``)

As well as various domain specific producers.

The mesa perfetto support adds additional producers, to allow for visualizing
GPU performance (frequency, utilization, performance counters, etc) on the
same timeline, to better understand and tune/debug system level performance:

- pps-producer: A systemwide daemon that can collect global performance
  counters.
- mesa: Per-process producer within mesa to capture render-stage traces
  on the GPU timeline, track events, etc.

The exact supported features vary per driver:

.. list-table:: Supported data-sources
   :header-rows: 1

   * - Driver
     - PPS Counters
     - Render Stages
   * - Freedreno
     - ``gpu.counters.msm``
     - ``gpu.renderstages.msm``
   * - Turnip
     - ``gpu.counters.msm``
     -
   * - Intel
     - ``gpu.counters.i915``
     -
   * - Panfrost
     - ``gpu.counters.panfrost``
     -

Run
---

To capture a trace with perfetto you need to take the following steps:

1. Build perfetto from sources available at ``subprojects/perfetto`` following
   `this guide <https://perfetto.dev/docs/quickstart/linux-tracing>`__.

2. Create a `trace config <https://perfetto.dev/#/trace-config.md>`__, which is
   a json formatted text file with extension ``.cfg``, or use one of the config
   files under the ``src/tool/pps/cfg`` directory. More examples of config files
   can be found in ``subprojects/perfetto/test/configs``.

3. Change directory to ``subprojects/perfetto`` and run a
   `convenience script <https://perfetto.dev/#/running.md>`__ to start the
   tracing service:

   .. code-block:: console

      cd subprojects/perfetto
      CONFIG=<path/to/gpu.cfg> OUT=out/linux_clang_release ./tools/tmux -n

4. Start other producers you may need, e.g. ``pps-producer``.

5. Start ``perfetto`` under the tmux session initiated in step 3.

6. Once tracing has finished, you can detach from tmux with :kbd:`Ctrl+b`,
   :kbd:`d`, and the convenience script should automatically copy the trace
   files into ``$HOME/Downloads``.

7. Go to `ui.perfetto.dev <https://ui.perfetto.dev>`__ and upload
   ``$HOME/Downloads/trace.protobuf`` by clicking on **Open trace file**.

8. Alternatively you can open the trace in `AGI <https://gpuinspector.dev/>`__
   (which despite the name can be used to view non-android traces).

To be a bit more explicit, here is a listing of commands reproducing
the steps above :

.. code-block:: console

   # Configure Mesa with perfetto
   mesa $ meson . build -Dperfetto=true -Dvulkan-drivers=intel,broadcom -Dgallium-drivers=
   # Build mesa
   mesa $ ninja -C build

   # Within the Mesa repo, build perfetto
   mesa $ cd subprojects/perfetto
   perfetto $ ./tools/install-build-deps
   perfetto $ ./tools/gn gen --args='is_debug=false' out/linux
   perfetto $ ./tools/ninja -C out/linux

   # Start perfetto
   perfetto $ CONFIG=../../src/tool/pps/cfg/gpu.cfg OUT=out/linux/ ./tools/tmux -n

   # In parallel from the Mesa repo, start the PPS producer
   mesa $ ./build/src/tool/pps/pps-producer

   # Back in the perfetto tmux, press enter to start the capture

Vulkan data sources
~~~~~~~~~~~~~~~~~~~

The Vulkan API gives the application control over recording of command
buffers as well as when they are submitted to the hardware. As a
consequence, we need to ensure command buffers are properly
instrumented for the perfetto driver data sources prior to Perfetto
actually collecting traces.

This can be achieved by setting the ``GPU_TRACE_INSTRUMENT``
environment variable before starting a Vulkan application :

.. code-block:: console

   GPU_TRACE_INSTRUMENT=1 ./build/my_vulkan_app

Driver Specifics
~~~~~~~~~~~~~~~~

Below is driver specific information/instructions for the PPS producer.

Freedreno / Turnip
^^^^^^^^^^^^^^^^^^

The Freedreno PPS driver needs root access to read system-wide
performance counters, so you can simply run it with sudo:

.. code-block:: console

   sudo ./build/src/tool/pps/pps-producer

Intel
^^^^^

The Intel PPS driver needs root access to read system-wide
`RenderBasic <https://software.intel.com/content/www/us/en/develop/documentation/vtune-help/top/reference/gpu-metrics-reference.html>`__
performance counters, so you can simply run it with sudo:

.. code-block:: console

   sudo ./build/src/tool/pps/pps-producer

Another option to enable access wide data without root permissions would be running the following:

.. code-block:: console

   sudo sysctl dev.i915.perf_stream_paranoid=0

Alternatively using the ``CAP_PERFMON`` permission on the binary should work too.

A particular metric set can also be selected to capture a different
set of HW counters :

.. code-block:: console

   INTEL_PERFETTO_METRIC_SET=RasterizerAndPixelBackend ./build/src/tool/pps/pps-producer

Vulkan applications can also be instrumented to be Perfetto producers.
To enable this for given application, set the environment variable as
follow :

.. code-block:: console

   PERFETTO_TRACE=1 my_vulkan_app

Panfrost
^^^^^^^^

The Panfrost PPS driver uses unstable ioctls that behave correctly on
kernel version `5.4.23+ <https://lwn.net/Articles/813601/>`__ and
`5.5.7+ <https://lwn.net/Articles/813600/>`__.

To run the producer, follow these two simple steps:

1. Enable Panfrost unstable ioctls via kernel parameter:

   .. code-block:: console

      modprobe panfrost unstable_ioctls=1

   Alternatively you could add ``panfrost.unstable_ioctls=1`` to your kernel command line, or ``echo 1 > /sys/module/panfrost/parameters/unstable_ioctls``.

2. Run the producer:

   .. code-block:: console

      ./build/pps-producer

Troubleshooting
---------------

Tmux
~~~~

If the convenience script ``tools/tmux`` keeps copying artifacts to your
``SSH_TARGET`` without starting the tmux session, make sure you have ``tmux``
installed in your system.

.. code-block:: console

   apt install tmux

Missing counter names
~~~~~~~~~~~~~~~~~~~~~

If the trace viewer shows a list of counters with a description like
``gpu_counter(#)`` instead of their proper names, maybe you had a data loss due
to the trace buffer being full and wrapped.

In order to prevent this loss of data you can tweak the trace config file in
two different ways:

- Increase the size of the buffer in use:

  .. code-block:: javascript

      buffers {
          size_kb: 2048,
          fill_policy: RING_BUFFER,
      }

- Periodically flush the trace buffer into the output file:

  .. code-block:: javascript

      write_into_file: true
      file_write_period_ms: 250


- Discard new traces when the buffer fills:

  .. code-block:: javascript

      buffers {
          size_kb: 2048,
          fill_policy: DISCARD,
      }