# Debugging the Xserver

[[!toc startlevel=2]]

This minihowto attempts to explain how to debug the X server, particularly in the case where the server crashes.  It assumes a basic familiarity with Unix and a willingness to risk deadlocking the machine.

Just as a warning, if you try this with a closed-source driver, the output is not likely to be very useful.


## Prerequisites

You'll really want to have a second machine around.  It's very difficult to debug the X server from within itself; when it stops and returns control to the debugger, you won't be able to send events to the xterm running your debugger.  ssh is your friend here.  If you don't have a second machine, see the [[Debugging with one machine section|Development/Documentation/ServerDebugging#OneMachine]], and good luck.

Your gdb needs to be reasonably recent; 5.3 or later is probably good enough.
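
As a rough sketch, the installed gdb can be checked against that minimum from the shell. The helper name `version_at_least` is made up for this example, and the parsing assumes that the last word on the first line of `gdb --version` is a `MAJOR.MINOR` version string, which is common but not guaranteed.

```sh
#!/bin/sh
# version_at_least HAVE WANT
# Hypothetical helper: succeeds if version HAVE is at least WANT.
# Only the MAJOR and MINOR fields are compared; a trailing suffix
# such as "-6.fc38" is ignored.
version_at_least() {
    have_major=${1%%.*}
    have_minor=${1#*.}
    have_minor=${have_minor%%[!0-9]*}
    want_major=${2%%.*}
    want_minor=${2#*.}
    want_minor=${want_minor%%[!0-9]*}
    [ "$have_major" -gt "$want_major" ] && return 0
    [ "$have_major" -eq "$want_major" ] && [ "${have_minor:-0}" -ge "${want_minor:-0}" ]
}

if command -v gdb >/dev/null 2>&1; then
    # Assumption: the version is the last field of the first output line.
    gdb_version=$(gdb --version | head -n 1 | awk '{ print $NF }')
    if version_at_least "$gdb_version" 5.3; then
        echo "gdb $gdb_version should be recent enough"
    else
        echo "gdb $gdb_version is older than 5.3" >&2
    fi
else
    echo "gdb is not installed" >&2
fi
```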

And of course, you'll need a reproducible way of crashing the X server, but if you've read this far you've probably got that already.  This is your testcase.


### Debug support

If you're debugging with a modern distribution, it probably already has 'debuginfo' packages available.  These packages (usually quite large) include the debugging symbols for the software you have installed, which makes tools like gdb much more useful.  Refer to your distro's documentation for details on how to install these.  You'll probably want at least the debuginfo for the X server itself, and for the video driver you're using.  For example, on a Fedora machine, you'd say:


[[!format txt """
debuginfo-install xorg-x11-server-Xorg xorg-x11-drv-ati
"""]]
On Debian or Ubuntu you'd say:
[[!format txt """
apt-get install xserver-xorg-core-dbg xserver-xorg-video-ati-dbg
"""]]
Otherwise, if you're building X yourself, you'll need to have built X with debugging information.  To pass compiler flags in at build time, say:
[[!format txt """
  CFLAGS='-O0 -g3' ./configure --prefix=...
"""]]
All the normal configure options should work as expected.  You may want to put your debuggable server in a different prefix.  Be careful of `ModulePath` and other such path statements in your `xorg.conf`.

Remember that if you're trying to debug into a driver, you'll want to repeat this step for the driver as well as for the server core.


## The basics

Start the server normally.  Go over to your second machine and ssh into the first one.  `su root`, and type
[[!format txt """
gdb /opt/xorg-debug/Xorg $(pidof Xorg)
"""]]
or
[[!format txt """
gdb /usr/bin/Xorg $(pidof X)
"""]]
depending on your setup.

Note that even when you are working over ssh, X might cripple the console. You can avoid this by passing the following option to the server:
[[!format txt """
  -keeptty         don't detach controlling tty (for debugging only)
"""]]
gdb will attach to the running server and spin for a while reading in symbols from all the drivers.  Eventually you'll reach a `(gdb)` prompt. Notice that the X server has halted; type `cont` at the gdb prompt to continue executing.

Go back to the machine running X, and run your testcase.  This time, instead of the server crashing, it should freeze, and gdb should tell you the server got a signal (usually SIGSEGV), as well as the function and line of code where the problem happened.  An example looks like:


[[!format txt """
  Program received signal SIGSEGV, Segmentation fault.
  0x403245a3 in fbBlt (srcLine=0xc1a1c180, srcStride=59742, srcX=0,
                dstLine=0x4240cb6c, dstStride=1152, dstX=0, width=32960, height=764,
                alu=-1046602744, pm=1111538028, bpp=32, reverse=0, upsidedown=0)
                at fbblt.c:174
  174     *dst++ = FbDoDestInvarientMergeRop(*src++);
"""]]
This by itself is pretty helpful, but there's more info out there.  At the gdb prompt, type `bt f` for a full stack backtrace.  (Warning, this will be long!)  This dumps out the full call chain of functions from `main()` on down, as well as the arguments they were called with and the value of all local variables.  Keep hitting enter until you get back to the gdb prompt.

Get your mouse out, copy all the output from "Program received..." on down, and paste it into a file on your second machine.  Type `detach` at the gdb prompt to detach gdb from the server and let it finish crashing.  Go to [[http://bugs.freedesktop.org/|http://bugs.freedesktop.org/]] and file a new bug describing the testcase.  Attach the gdb output to the bug (please don't just paste it into the comments section).
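
The same capture can be scripted so that nothing has to be copied by hand. The sketch below is a variant of the steps above rather than an established procedure: the function name `write_attach_commands` and the log path `/tmp/xorg-backtrace.txt` are made up, and it relies on gdb's `-batch`, `-x` and `-p` options to run a command file against an already running server.

```sh
#!/bin/sh
# write_attach_commands FILE
# Hypothetical helper: writes to FILE the gdb commands that wait for the
# crash, print a full backtrace and let the server finish crashing.
write_attach_commands() {
    cat > "$1" << 'EOF'
handle SIGPIPE nostop
handle SIGUSR1 nostop
continue
bt full
detach
quit
EOF
}

cmdfile=${TMPDIR:-/tmp}/xorg-attach.$$.gdb
write_attach_commands "$cmdfile"
echo "gdb commands written to $cmdfile"

# On the second machine, as root, with the server already running:
#   gdb -batch -x "$cmdfile" -p "$(pidof Xorg)" > /tmp/xorg-backtrace.txt 2>&1
```

The two `handle` lines keep gdb from stopping on SIGPIPE and SIGUSR1, so that `continue` only returns once the server receives a fatal signal.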


## All the gdb commands you'll ever need

For any gdb command, you can say "help <command>" at the (gdb) prompt to get a (hopefully informative) explanation.

 * `bt` - Prints a stack backtrace.  This shows all the functions that you are currently inside, from `main()` on down to the point of the crash, along with their arguments.  Appending the word `full` (or just the letter `f`) also prints out the value of all the local variables within each function.
 * `list` - Prints the source around the current frame. When invoked multiple times, it will print the next lines, making it useful for quick code inspection. "`list -`" prints the source code backwards (starting from the current frame). This is useful to inspect the lines of code that led to an error.
 * `break` / `clear` - `break` sets a breakpoint.  When execution reaches a breakpoint, the debugger will stop the program and return you to the gdb prompt.  You can set breakpoints on functions, lines of code, or individual instructions; see the help text for details.  `clear`, naturally, clears a breakpoint.
 * `step` / `next` - `step` and `next` allow you to manually advance the program's execution.  `next` runs the program until you reach a different source line; `step` does the same thing, but also descends into called functions.
 * `continue` - continue the program normally until the next breakpoint is hit.
 * `print` - Prints the expression.  You can specify variable names, registers, and absolute addresses, as well as more complex expressions (`help print` for details).  Variable names have to be resolvable, which means they either have to be local variables within the current stack frame or global variables.  Register names start with a `$` sign, like `print $eax`.  Addresses are specified as numbers, like `print 0xdeadbeef`.
                         * Expressions can be fairly complex.  For example, if you have a pointer to a structure named `foo`, `print foo` will print the memory address that foo points to, `print *foo` will print the structure being pointed to, and `print foo->bar` will print the bar member of the foo structure.
 * `handle` - Tells the debugger how to handle various signals.  The defaults are mostly sensible, but there are two you may wish to change.  SIGPIPE is generated when a client dies, which you may not always care about, and SIGUSR1 is generated on VT switch.  By default, the debugger will halt the running process when it receives these signals; to change this, say `handle SIGPIPE nostop` and `handle SIGUSR1 nostop`. (Note: Don't use `handle SIGUSR1 ignore` or you can confuse things quite badly---for example, having multiple X servers simultaneously active on the same VT can be very confusing.)
 * `set environment` - Sets environment variables.  The syntax is `set environment name value`; don't use an = sign like in bash, it won't do what you expect.
 * `run` - Runs the program.  If you only specify a program name on the command line (and not a process ID or a core file), gdb will load the program but not start running it until you say so.  Arguments to `run` are passed verbatim to the child process, e.g. `run :0 -verbose -ac`.
 * `kill` - Kills the program being debugged.  Not always useful, you'd often rather say...
 * `detach` - which detaches the debugger from the running program, which can then shut down gracefully.
 * `disassemble` - Prints the assembly instructions being executed, starting at the current source line.  You can also specify absolute memory references or function names to start disassembly somewhere other than the default.  Only useful if you can read the assembly language of your CPU.
 * `finish` - Continue until exit of current function. Will also print the return value of the function (if applicable).

Note that most commands can be used in an abbreviated version (e.g. `n` instead of `next`). Just try it yourself!


## Things that can go wrong

The biggest thing to watch out for is attempting to print memory contents when that memory is located on the video card.  It won't work, on x86 anyway, for some not-very-interesting reasons.  You'll know when you did it because the machine will deadlock and you'll have to reboot.  See the DebuggingHints file (below) for workarounds.

Some issues with running X under gdb may be resolved by passing the `-dumbSched` option to the X server. This worked for me to resolve crashes of gdb 6.3 and strange loops in gdb 5.3.  You'll know if you need this option because gdb will get very confused by SIGALRM. Even if gdb isn't misbehaving, the `-dumbSched` option can be very helpful to avoid the SIGALRM periodically interrupting your debugging session.

Likewise, some gdb versions crash when starting the X server when attempting to run xkbcomp.  This is, amazingly enough, a bug in the kernel's DRM code for suppressing some signals; it should be fixed in 2.6.28 if not earlier.  You can disable XKB by passing the `-kb` option on the server's command line; obviously if you're trying to debug XKB this may cause you some problems and you're probably better off attaching gdb to a running X instead.  Alternatively, disable DRI, but again, if DRI is the thing you're trying to debug, that won't help.

When you compile with optimization, the values printed by bt can sometimes be confusing.  Some variables can get optimized out of existence, some variables occupy the same position on the stack during different parts of a function's execution, and some functions might not show up on the stack at all.  Also, single-stepping can be confusing because the function might get executed in a different order than listed in the source if the compiler determines that's safe to do.  gcc 4.0 seems to be **much** more aggressive at confusing the debugger than earlier versions, although it does emit more debugging information such that you'll at least know when variables have been optimized away.  As always, lowering the optimization level improves debuggability.


## Further information

There is a DebuggingHints file available [[online|http://cgit.freedesktop.org/xorg/xserver/tree/hw/xfree86/doc/devel/DebuggingHints?id=9508a382f8a9f241dab097d921b6d290c1c3a776]]. It contains a lot of helpful (if very dated) information on how to debug the server, including how to dump PCI memory without deadlocking the machine.  In particular, you'll want to read this if you're trying to debug a server older than 6.9.

<a name="OneMachine"></a>
## Debugging with one machine


### Version 1

The script below allows you to run the server in gdb and catch the gdb output in a file. You cannot control gdb interactively, but in return the X server will not be left stopped inside the debugger with no terminal from which to control it. Store the following script in some file (for example, `/tmp/Xdbg`):
[[!format txt """
#!/bin/sh

#GDB=...
#XSERVER=...

ARGS=$*
PID=$$

test -z "$GDB" && GDB=gdb
test -z "$XSERVER" && XSERVER=/usr/bin/Xorg

cat > /tmp/.dbgfile.$PID << HERE
file $XSERVER
set confirm off
set args $ARGS
handle SIGUSR1 nostop
handle SIGUSR2 nostop
handle SIGPIPE nostop
run
bt full
cont
quit
HERE

# Plain sh has no "&>"; redirect stdout and stderr explicitly.
$GDB --quiet --command=/tmp/.dbgfile.$PID > /tmp/gdb_log.$PID 2>&1

rm -f /tmp/.dbgfile.$PID
echo "Log written to: /tmp/gdb_log.$PID"
"""]]
Then (as root) do:
[[!format txt """
chmod u+x /tmp/Xdbg
mv /usr/X11R6/bin/X /usr/X11R6/bin/X.org
ln -sf /tmp/Xdbg /usr/X11R6/bin/X
"""]]
If you are using a module-aware debugger, you should remove the comment sign `#` from the line starting with `#GDB` and add the full path to your debugging gdb. You can now start your Xserver like normal. Note that if you use `startx` you should do so as root. When the Xserver crashes, the output of the server should have been written to `/tmp/gdb_log.<number>` together with a backtrace. If your Xserver resides at some other place, you can use the `XSERVER` environment variable to specify the path. To restore the previous setup do:
[[!format txt """
mv /usr/X11R6/bin/X.org /usr/X11R6/bin/X
"""]]

### Version 2

If you only have one machine available, you might be able to pry some useful information from the server when it crashes.  The downside is that it will probably halt your machine entirely rather than just crashing X.

Edit your xorg.conf file and find the [[ServerFlags|ServerFlags]] section.  Uncomment the
[[!format txt """
  Option "NoTrapSignals"
"""]]
line (or add it if it doesn't exist).  This will prevent the server from catching fatal signals, which should cause core dumps instead.  (You need to make sure you have core dumps enabled for the server by removing the appropriate ulimit; see the `ulimit` command in the bash man page for details.)
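
As a minimal sketch of lifting that limit, assuming the server is started from the same shell (for example with `startx`): a server launched by a display manager inherits its limits from elsewhere, which this does not cover. The function name `enable_core_dumps` is made up for the example.

```sh
#!/bin/sh
# enable_core_dumps
# Hypothetical helper: tries to remove the core file size limit for this
# shell and for everything started from it afterwards.
enable_core_dumps() {
    if ulimit -c unlimited 2>/dev/null; then
        echo "core file size limit is now: $(ulimit -c)"
    else
        echo "cannot raise the core file size limit (hard limit: $(ulimit -H -c))" >&2
        return 1
    fi
}

enable_core_dumps
```

The limit only applies to processes started from this shell after the call, so run it before starting the server.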

The problem here is the same as mentioned earlier; the core dump will attempt to include mmap()'d sections of card memory, which will make the machine freeze.  Usually the core dump is informative enough to at least give a partial backtrace.

Once you've crashed the machine, find the core file and load it in gdb:
[[!format txt """
  gdb `which Xorg` /path/to/core/file
"""]]
and try to `bt f` like normal.  Fortunately at this point you can't make the machine crash again.
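
If you would rather collect the backtrace in a file than scroll through it, gdb's batch mode can do the same thing non-interactively. This is a sketch only: the function name `core_backtrace` and the output path are made up, and the core file path is whatever your system actually produced.

```sh
#!/bin/sh
# core_backtrace BINARY CORE LOG
# Hypothetical helper: writes a full backtrace from CORE to LOG,
# using BINARY for the symbols.
core_backtrace() {
    if [ ! -r "$1" ] || [ ! -r "$2" ]; then
        echo "usage: core_backtrace BINARY CORE LOG (both files must be readable)" >&2
        return 1
    fi
    gdb -batch -ex 'bt full' "$1" "$2" > "$3" 2>&1
}

# Example (the core path is a placeholder, as above):
#   core_backtrace "$(command -v Xorg)" /path/to/core/file /tmp/xorg-core-bt.txt
```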

<a name="GdbServer"></a>
## Debugging with gdbserver

Run X on the target using gdbserver, listening on (for example) port 2500:
[[!format txt """
  gdbserver :2500 /usr/bin/X
"""]]
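Attach to the running process from gdb, running it from an environment in which you have Xorg installed. In my case, this is a chroot environment. If I try to debug the program from the host environment, without chrooting into my Xorg build environment, gdb cannot find the symbols correctly. The port passed to `target remote` must be the one gdbserver is listening on; the session below was recorded against port 2401 rather than the 2500 used above.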
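
The host side of such a session can also be written down as a gdb command file, so the same steps can be replayed. In this sketch the function name `remote_session_commands` and the file `/tmp/xorg-remote.gdb` are made up, and the symbol file, address and port are simply the values from the example session shown below; substitute your own.

```sh
#!/bin/sh
# remote_session_commands SYMBOL_FILE HOST PORT
# Hypothetical helper: prints the gdb commands for a gdbserver session.
remote_session_commands() {
    printf 'file %s\n' "$1"
    printf 'target remote %s:%s\n' "$2" "$3"
    printf 'continue\n'
    printf 'bt full\n'
}

remote_session_commands programs/Xserver/Xorg 192.168.0.134 2401 > /tmp/xorg-remote.gdb
# Then, from the build environment:
#   gdb -x /tmp/xorg-remote.gdb
```

Typed interactively, the same session looks like this: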


[[!format txt """
root:/usr/src/xc-build# gdb
GNU gdb 6.3
Copyright 2004 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB.  Type "show warranty" for details.
This GDB was configured as "i686-pc-linux-gnu".
(gdb) file programs/Xserver/Xorg
Reading symbols from /usr/src/xc-build/programs/Xserver/Xorg...done.Using host libthread_db library "/lib/libthread_db.so.1".
(gdb) target remote 192.168.0.134:2401
Remote debugging using 192.168.0.134:2401
0xb7fed7b0 in ?? ()
(gdb) c
Continuing.

Program received signal SIGSEGV, Segmentation fault.
0xb7a92524 in GXDisplayVideo (pScrni=0x828bd38, id=0xb7aa9490, offset=0x17,
    width=0x82a, height=0xe730, pitch=0xb7aa946c, x1=0x8289920, y1=0x0,
    x2=0x0, y2=0x0, dstBox=0x82ae680, src_w=0x82a, src_h=0xe794, drw_w=0x828,
    drw_h=0x8638) at amd_gx_video.c:849
849        GFX(set_video_enable(1));
(gdb)
"""]]
Note in this example that I specify the program to be debugged with a gdb command to read the Xorg symbols:
[[!format txt """
  (gdb) file programs/Xserver/Xorg
"""]]
This is simply an alternative to running gdb like this:
[[!format txt """
  gdb programs/Xserver/Xorg
"""]]