Tracing
=======

This subsystem will provide a mechanism to get structured tracing info from
GStreamer applications. This can be used for post-run analysis as well as for
live introspection.

Use cases
---------
* I'd like to get statistics from a running application.

* I'd like to understand which parts of my pipeline use how many resources.

* I'd like to know which parts of the pipeline use how much memory.

* I'd like to know about ref-counts of parts in the pipeline to find ref-count
  issues.

Non use-cases
-------------
* Some element in the pipeline does not play by the rules; find out which
  one. This could be done with generic tests.

Design
------

The system introduces the following new items:
core hooks: probes in the core api that expose internal state when tracing
  is in use
tracers: plugin features that can process data from the hooks and emit a log
tracing front-ends: applications that consume logs from tracers

Like the logging, the tracer hooks can be compiled out; when compiled in, a
local condition checks whether tracing is active.

Certain GStreamer core functions (such as gst_pad_push or gst_element_add_pad)
will call into the tracer subsystem to dispatch into active tracing modules.
Developers will be able to select a list of plugins by setting an environment
variable, such as GST_TRACE="meminfo;dbus". One can also pass parameters to
plugins: GST_TRACE="log(events,buffers);stats(all)".
When the plugins are loaded, we attach them to the hooks they are interested
in.

Right now tracing info is logged as GstStructures to the TRACE level.
Idea: Another env var GST_TRACE_CHANNEL could be used to send the tracing to a
file or a socket. See https://bugzilla.gnome.org/show_bug.cgi?id=733188 for
discussion on these environment variables.

Hook api
--------
We'll wrap interesting api calls with two macros, e.g. gst_pad_push():

GstFlowReturn
gst_pad_push (GstPad * pad, GstBuffer * buffer)
{
  GstFlowReturn res;

  g_return_val_if_fail (GST_IS_PAD (pad), GST_FLOW_ERROR);
  g_return_val_if_fail (GST_PAD_IS_SRC (pad), GST_FLOW_ERROR);
  g_return_val_if_fail (GST_IS_BUFFER (buffer), GST_FLOW_ERROR);

  GST_TRACER_PAD_PUSH_PRE (pad, buffer);
  res = gst_pad_push_data (pad,
      GST_PAD_PROBE_TYPE_BUFFER | GST_PAD_PROBE_TYPE_PUSH, buffer);
  GST_TRACER_PAD_PUSH_POST (pad, res);
  return res;
}
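
For illustration, such a macro could expand to a cheap check of a per-hook
flag before dispatching into the attached tracers (the "local condition"
mentioned above). This is only a sketch; the _priv_tracer_active array and
_priv_tracer_dispatch() helper are made-up names:

/* hypothetical expansion: the flag check keeps the cost near zero */
/* when no tracer is attached to this hook                         */
#define GST_TRACER_PAD_PUSH_PRE(pad, buffer) \
  G_STMT_START { \
    if (G_UNLIKELY (_priv_tracer_active[GST_TRACER_HOOK_PAD_PUSH_PRE])) \
      _priv_tracer_dispatch (GST_TRACER_HOOK_PAD_PUSH_PRE, pad, buffer); \
  } G_STMT_END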

TODO(ensonic): gcc has some magic for wrapping functions
- http://gcc.gnu.org/onlinedocs/gcc/Constructing-Calls.html
- http://www.clifford.at/cfun/gccfeat/#gccfeat05.c

TODO(ensonic): we should evaluate whether we can use something like jump_label
in the kernel
- http://lwn.net/Articles/412072/ + http://lwn.net/Articles/435215/
- http://lxr.free-electrons.com/source/kernel/jump_label.c
- http://lxr.free-electrons.com/source/include/linux/jump_label.h
- http://lxr.free-electrons.com/source/arch/x86/kernel/jump_label.c
TODO(ensonic): liblttng-ust provides such a mechanism for user-space
- but this is mostly about logging traces
- it is Linux-specific :/

In addition to api hooks we should also provide timer hooks. Interval timers
are useful to get e.g. resource usage snapshots. Absolute timers might also
make sense. All this could be implemented with a clock thread, as sketched
below. We can use another env-var GST_TRACE_TIMERS="100ms,75ms" to configure
timers and then refer to them from the tracers, e.g.
GST_TRACE="rusage(timer=100ms);meminfo(timer=75ms)". Maybe we can create them
ad-hoc and avoid the GST_TRACE_TIMERS var.
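
For illustration, the clock thread could look like the sketch below; the
GST_TRACER_HOOK_TIMER id and _priv_tracer_dispatch() helper are made-up names:

static gint shutdown = 0;

// clock thread: fire the interval-timer hook with the current timestamp
static gpointer
tracer_timer_thread (gpointer user_data)
{
  guint interval_ms = GPOINTER_TO_UINT (user_data);

  while (!g_atomic_int_get (&shutdown)) {
    g_usleep (interval_ms * 1000);
    _priv_tracer_dispatch (GST_TRACER_HOOK_TIMER, gst_util_get_timestamp ());
  }
  return NULL;
}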

Hooks (* already implemented)
-----
* gst_bin_add
* gst_bin_remove
* gst_element_add_pad
* gst_element_post_message
* gst_element_query
* gst_element_remove_pad
* gst_element_factory_make
* gst_pad_link
* gst_pad_pull_range
* gst_pad_push
* gst_pad_push_list
* gst_pad_push_event
* gst_pad_unlink

Tracer api
----------
Tracers are plugin features. They have a simple api:

class init
Here the tracers describe the data they will emit.

instance init
Tracers attach handlers to one or more hooks using gst_tracer_register_hook().
In case they are configurable, they can read the options from the 'params'
property. This is the extra detail from the environment var.

hook functions
Hooks marshal the parameters given to a trace hook into varargs and also add
some extra info such as a timestamp. Hooks will be called from various threads.
The trace plugins should only consume (=read) the provided data. Expensive
computation should be avoided so that execution is not affected too much.
Most trace plugins will log data to a trace channel.

instance destruction
Tracers can output results and release data. This would ideally be done at the
end of the application, but gst_deinit() is not mandatory. gst_tracelib was
using a gcc destructor. Ideally tracer modules log data as they produce it and
leave aggregation to a tool that processes the log.
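
Putting the life cycle together, a minimal tracer could look like the sketch
below. The GstDemoTracer type, the "pad-push-pre" detail string and the exact
hook and log-function signatures are illustrative assumptions:

// hook function: called from streaming threads, only reads its arguments
static void
do_push_pre (GstTracer * self, guint64 ts, GstPad * pad, GstBuffer * buffer)
{
  gst_tracer_log_trace (gst_structure_new ("demo",
      "ts", G_TYPE_UINT64, ts,
      "size", G_TYPE_UINT, (guint) gst_buffer_get_size (buffer),
      NULL));
}

static void
gst_demo_tracer_init (GstDemoTracer * self)
{
  // instance init: attach to one or more hooks
  gst_tracer_register_hook (GST_TRACER (self), "pad-push-pre",
      G_CALLBACK (do_push_pre));
}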

tracer event classes
--------------------
Most tracers will log some kind of 'events': a data transfer, an event,
a message, a query or a measurement. Every tracer should describe the data
format. This way tools that process tracer logs can show the data in a
meaningful way without having to know about the tracer plugin.

One way would be to introspect the data from the plugin. This has the
disadvantage that the postprocessing app needs to load the plugins or talk to
the GStreamer registry. An alternative is to also log the format description
into the log. Right now we're logging several nested GstStructures from the
_tracer_class_init() function (except in the log tracer).

// the name is the value name + ".class"
// the content describes a single log record
gst_structure_new ("thread-rusage.class",
    // value in the log record (order does not matter)
    // 'thread-id' is a 'key' to relate the record to something as indicated
    // by 'scope' substructure
    "thread-id", GST_TYPE_STRUCTURE, gst_structure_new ("scope",
        // enum { process, thread, element, pad, context?, ... }
        "related-to", G_TYPE_STRING, "thread",
        NULL),
    // next value in the record
    // 'average-cpuload' is a measurement as indicated by the 'value'
    // substructure
    "average-cpuload", GST_TYPE_STRUCTURE, gst_structure_new ("value",
        // value type
        "type", G_TYPE_GTYPE, G_TYPE_UINT,
        // human readable description, that can be used as a graph label
        "description", G_TYPE_STRING, "average cpu usage per thread",
        // flags that help to use the right graph type
        // flags { aggregated, windowed, cumulative, ... }
        "flags", G_TYPE_STRING, "aggregated",
        // value range
        "min", G_TYPE_UINT, 0, 
        "max", G_TYPE_UINT, 100,
        NULL),
    ...
    NULL);

A few ideas that are not yet in the above spec:
- it would be nice to describe the unit of values
  - putting it into the description is not flexible though, e.g. time would be
    a guint64 but a UI would reformat it to e.g. h:m:s.ms
  - other units are e.g.: percent, per-mille, or kbit/s
- we'd like to have some metadata on scopes
  - e.g. we'd like to log the thread-names, so that a UI can show that instead
    of thread-ids
  - the stats tracer logs 'new-element' and 'new-pad' messages
    - they add a unique 'ix' to each instance, as the memory ptr can be reused
      for new instances; the data is attached to the objects as qdata
    - the latency tracer would like to also reference this metadata
- enums and flags are logged as comma-separated strings to not require the
  front-ends to know the flag values (is that okay?)
- right now we log the classes as structures
  - this is important so that the log is self-contained
  - it would be nice to add them to the registry, so that gst-inspect can show
    them

In addition to the above spec, each log event will contain a "ts" field as a
G_TYPE_UINT64 to specify the time of the event.
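
For the thread-rusage class above, an individual log record would then carry
the declared values plus the "ts" field. A made-up example (the concrete
values and the thread-id type are illustrative):

// one log record as described by "thread-rusage.class"
gst_structure_new ("thread-rusage",
    "ts", G_TYPE_UINT64, G_GUINT64_CONSTANT (125914179),
    "thread-id", G_TYPE_UINT, 1,
    "average-cpuload", G_TYPE_UINT, 42,
    NULL);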

We could also consider adding each value as a READABLE gobject property. The
property has name/description. We could use qdata for scope and flags (or have
some new property flags).
We would also need a new "notify" signal, so that value-change notifications
would include a time-stamp. This way the tracers would not need to be aware of
the logging. The core tracer would register the notify handlers and emit the
log.
Or we just add a gst_tracer_class_install_event() that mimics
g_object_class_install_property().

Frontends can:
- draw an events-over-time histogram
- plot curves of values over time or deltas
- show gauges
- collect statistics (min, max, avg, ...)

Plugins ideas
=============

We can have some under gstreamer/plugins/tracers/

latency
-------
- register to buffer and event flow
- send custom event on buffer flow at source elements
- catch events on event transfer at sink elements

meminfo (not yet implemented)
-------
- register to an interval-timer hook.
- call mallinfo() and log memory usage, e.g. as sketched below
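
The record layout and hook signature in this sketch are illustrative, and it
assumes gst_tracer_log_trace() takes the record structure:

#include <malloc.h>

// timer hook: take a snapshot of the allocator state and log it
static void
do_meminfo_timer (GstTracer * self, guint64 ts)
{
  struct mallinfo mi = mallinfo ();

  gst_tracer_log_trace (gst_structure_new ("meminfo",
      "ts", G_TYPE_UINT64, ts,
      "arena", G_TYPE_UINT, (guint) mi.arena,      // bytes obtained from the system
      "in-use", G_TYPE_UINT, (guint) mi.uordblks,  // currently allocated bytes
      "free", G_TYPE_UINT, (guint) mi.fordblks,    // free bytes within the arena
      NULL));
}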

rusage
------
- register to an interval-timer hook.
- call getrusage() and log resource usage

dbus (not yet implemented)
----
- provide a dbus iface to announce applications that are traced
- tracing UIs can use the dbus iface to find the channels that logging and
  tracing are written to
- one would start the tracing UI first; when an application is then started
  with tracing activated, the dbus plugin announces the new application, upon
  which the tracing UI can start reading from the log channels. This avoids
  missing data.

topology (not yet implemented)
--------
- register to pipeline topology hooks
- tracing UIs can show a live pipeline graph

stats
-----
- register to buffer, event, message and query flow
- tracing apps can do e.g. statistics

refcounts (not yet implemented)
---------
- log ref-counts of objects
- just logging them outside of glib/gobject would still make it hard to detect
  issues though

opengl (not yet implemented)
------
- upload/download times
- there is no hardware-agnostic way to get e.g. memory usage info
  (gl extensions)

User interfaces
===============

gst-debug-viewer
----------------
gst-debug-viewer could be given the trace log in addition to the debug log (or
a combined log). Alternatively, it could show a dialog listing all local apps
(if the dbus plugin is loaded) and read the log streams from the sockets/files
that are configured for the app.

gst-tracer
----------
Counterpart of gst-tracelib-ui.

gst-stats
---------
A terminal app that shows summary/running stats like the summary gst-tracelib
shows at the end of a run. Currently it only shows an aggregated status.

live-graphers
-------------
Maybe we can even feed the log into existing live graphers, with a little driver
* https://github.com/dkogan/feedgnuplot


Problems / Open items
=====================
- should tracers log into the debug.log or into a separate log?
  - separate log
    - use a binary format?
    - worse performance (we're writing two logs at the same time)
    - need to be careful when people set GST_DEBUG_CHANNEL=stderr and
      GST_TRACE_CHANNEL=stderr (use a shared channel, but what about the
      formats?)
  - debug log
    - the tracer subsystem would need to log the GST_TRACE at a level that is
      active
    - should the tracer call gst_debug_category_set_threshold() to ensure things
      work, even though the levels don't make a lot of sense here
  - make logging a tracer (a hook in gst_debug_log_valist, move
    gst_debug_log_default() to the tracer module)
    - log all debug log to the tracer log, some of the current logging
      statements can be replaced by generic logging as shown in the log-tracer
    - add tools/gst-debug to extract a human readable debug log from the trace
      log
  - we could maintain a list of log functions, where gst_tracer_log_trace() is
    the default one. This way e.g. gst-validate could consume the traces
    directly.

- when hooking into a timer, should we just have some predefined intervals?
  - can we add a tracer module that registers the timer hook? then we could do
      GST_TRACE="timer(10ms);rusage"
    right now the tracer hooks are defined as an enum though.

- when connecting to a running app, we can't easily get the 'current' state if
  logging is using a socket, as past events are not explicitly stored. We could
  determine the current topology and emit events with GST_CLOCK_TIME_NONE as ts
  to indicate that the events are synthetic.

- we need stable ids for scopes (elements, pads)
  - the address can be reused
  - we can use gst_util_seqnum_next()
  - something like gst_object_get_path_string() won't work as objects are
    initially without parent

- right now the tracer is enabled/disabled from configure with
  --{enable,disable}-debug; we should probably add --{enable,disable}-tracer,
  although it is confusing with "--disable-trace disable tracing subsystem",
  where we could at least change the doc-string to say "disable memory tracing
  subsystem"

Try it
======
GST_DEBUG="GST_TRACER:7,GST_BUFFER*:7,GST_EVENT:7,GST_MESSAGE:7" GST_TRACE=log gst-launch-1.0 fakesrc num-buffers=10 ! fakesink
- traces for buffer flow in TRACE level

GST_DEBUG="GST_TRACER:7" GST_TRACE="stats;rusage" GST_DEBUG_FILE=trace.log gst-launch-1.0 fakesrc num-buffers=10 sizetype=fixed ! queue ! fakesink
gst-stats-1.0 trace.log
- print some pipeline stats on exit

GST_DEBUG="GST_TRACER:7" GST_TRACE="stats;rusage" GST_DEBUG_FILE=trace.log /usr/bin/gst-play-1.0 --interactive $HOME/Videos/movie.mp4
./scripts/gst-plot-traces.sh --format=png | gnuplot
eog trace.log.*.png
- get ts, average-cpuload, current-cpuload, time and plot

GST_DEBUG="GST_TRACER:7" GST_TRACE=latency gst-launch-1.0 audiotestsrc num-buffers=10 ! audioconvert ! volume volume=0.7 ! autoaudiosink
- print processing latencies

Performance
===========
run ./tests/benchmarks/tracing.sh <tracer(s)> <media>

egrep -c "(proc|thread)-rusage" trace.log
658618
grep -c "gst_tracer_log_trace" trace.log
823351

- we can optimize most of it by using quarks in structures or by eventually
  avoiding structures entirely; a sketch using quarks follows
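
This hypothetical sketch pre-interns the quarks once and reuses them for every
record, avoiding the per-record string-to-quark lookups; it again assumes
gst_tracer_log_trace() takes the record structure:

static GQuark q_name, q_ts;

// build the record from pre-interned quarks instead of string literals
static void
log_rusage_record (guint64 ts)
{
  if (G_UNLIKELY (!q_name)) {
    q_name = g_quark_from_static_string ("thread-rusage");
    q_ts = g_quark_from_static_string ("ts");
  }
  gst_tracer_log_trace (gst_structure_new_id (q_name,
      q_ts, G_TYPE_UINT64, ts, NULL));
}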