
Debugging



I found a bug in DR17. How can I provide you with some useful information?

This is a guide to providing the Enlightenment developers with useful information that we can use to track down a bug you have found.

Recompile EFL with Debug Symbols

The key to getting good debug information from a backtrace, valgrind or any other collection method is to compile the Enlightenment Foundation Libraries with debugging symbols. To debug Enlightenment, the following pieces need to be recompiled:

   * ecore
   * eet
   * evas
   * edje
   * embryo
   * e

Before recompiling any part of EFL, make sure you add -g to your compile flags. You can do this by running the following command:

export CFLAGS=-g

This adds -g to the compile flags (configure picks CFLAGS up from the environment), so everything you build is compiled with debugging symbols. Now, for each of the libraries and for e, run:

make clean distclean
./configure
make
make install

(or whatever variation on this you normally use, but remember to run make clean distclean before re-running configure)

Crashing and Debugging

Now everything should have debugging symbols, so we can simply run Enlightenment. Try to perform the steps needed to reproduce the crash. You will get the "white box of death" that says E segfaulted. Now go over to a text console (ctrl+alt+f1) and log in. You need to attach gdb to E, so first find out the process ID of enlightenment as shown below:

ps auxw | grep enlightenment

Now type:

gdb enlightenment PID

Where PID is the process ID you found. gdb will load and stream along for a bit, then give you a prompt, and you can start debugging. First try gdb's backtrace command:

(gdb) bt
#0  0xb7d539f8 in select () from /lib/tls/libc.so.6
#1  0xb7dff66a in _XEnq () from /usr/X11R6/lib/libX11.so.6
#2  0xb7dffa7e in _XRead () from /usr/X11R6/lib/libX11.so.6
#3  0xb7e01795 in _XReadEvents () from /usr/X11R6/lib/libX11.so.6
#4  0xb7defa88 in XNextEvent () from /usr/X11R6/lib/libX11.so.6
#5  0x0809b698 in e_alert_show (
    text=0x80a34f0 "This is very bad. Enlightenment has segfaulted.\nThis is not meant to happen and is likely a sign of a\nbug in Enlightenment or the libraries it relies on.\n\nYou can gdb attach to this process now to try"...)
    at e_alert.c:136
#6  0x0808f706 in e_sigseg_act (x=11, info=0x80a9fb0, data=0x80aa030)
    at e_signals.c:54
#7  <signal handler called>
#8  0xb7d539f8 in select () from /lib/tls/libc.so.6
#9  0xb7f814ee in _ecore_main_select (timeout=0) 
    at ecore_main.c:338
#10 0xb7f819ba in _ecore_main_loop_iterate_internal (once_only=0)
    at ecore_main.c:575
#11 0xb7f81a2b in ecore_main_loop_begin () at ecore_main.c:79
#12 0x08059bb3 in main (argc=1, argv=0xbffff144) at e_main.c:551

This is the stack trace. Read it from the bottom up: the main() function called ecore_main_loop_begin(), which called _ecore_main_loop_iterate_internal(), which called _ecore_main_select(), which in turn called select(), and so on. Frame #0 is always the innermost, most recent call.
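The same innermost-first ordering can be seen outside E. Below is a minimal C sketch (the function names are hypothetical, not E or ecore code) that builds a three-level call chain and captures the current stack with backtrace() from <execinfo.h>. It assumes glibc, whose backtrace() stores the innermost frame first, just like gdb's frame #0, and it uses the GCC/Clang noinline attribute so each function keeps its own frame.

```c
#include <execinfo.h>  /* backtrace(), backtrace_symbols_fd(): glibc extension */
#include <unistd.h>    /* STDERR_FILENO */

/* Hypothetical call chain: outer_loop() -> iterate() -> wait_for_events().
 * noinline (GCC/Clang) keeps the compiler from merging the frames;
 * build with -g -O0 -rdynamic so the printed frames carry function names. */
__attribute__((noinline)) int wait_for_events(void)
{
    void *frames[32];
    /* frames[0] is this function, the innermost frame (gdb's #0). */
    int n = backtrace(frames, 32);
    /* One line per frame on stderr, innermost first, like gdb's bt. */
    backtrace_symbols_fd(frames, n, STDERR_FILENO);
    return n;
}

__attribute__((noinline)) int iterate(void)
{
    return wait_for_events();
}

__attribute__((noinline)) int outer_loop(void)
{
    return iterate();
}
```

Called from main(), it should list wait_for_events first, then iterate, outer_loop and main, mirroring how bt shows select() above _ecore_main_select() above main().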

The important bit here is that E has its own segfault handler: it traps its own problems and tries to let you recover (that is what the white box of death is). Let's take a look at the handler's frame:

#6  0x0808f706 in e_sigseg_act (x=11, info=0x80a9fb0, data=0x80aa030)

The e_sigseg_act() function is called when the program segfaults. The kernel invokes it directly, interrupting whatever E was doing, and whatever E was doing at that moment is what caused the segfault. So in this example E segfaulted inside the select() function in frame 8 (frame 7 is an intermediate frame through which the signal handler is called).
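For reference, this is roughly how a program installs a segfault handler with the same three-argument shape as e_sigseg_act(x, info, data). It is a minimal sketch using POSIX sigaction() with SA_SIGINFO, not E's actual e_signals.c; the names sigseg_act, install_sigseg_handler and last_caught_signal are hypothetical. The x=11 in the trace is the signal number: SIGSEGV is 11 on Linux.

```c
#define _POSIX_C_SOURCE 200809L  /* expose sigaction() and siginfo_t */
#include <signal.h>
#include <string.h>

/* Written by the handler; sig_atomic_t is safe to write from a handler. */
static volatile sig_atomic_t last_signal = 0;

/* Same shape as e_sigseg_act(x, info, data): with SA_SIGINFO the kernel
 * passes the signal number, a siginfo_t and the interrupted context. */
static void sigseg_act(int sig, siginfo_t *info, void *data)
{
    (void)data;
    /* For a genuine fault, info->si_addr holds the address that was accessed. */
    last_signal = info ? info->si_signo : sig;
    /* A real crash handler must not simply return after a genuine fault,
     * because the faulting instruction would run again. E instead shows its
     * alert box and loops in e_alert_show(), which keeps the process alive
     * so gdb can attach. */
}

/* Install the handler; returns 0 on success, -1 on error (as sigaction does). */
int install_sigseg_handler(void)
{
    struct sigaction sa;
    memset(&sa, 0, sizeof(sa));
    sa.sa_sigaction = sigseg_act;
    sa.sa_flags = SA_SIGINFO;  /* use the three-argument handler form */
    sigemptyset(&sa.sa_mask);
    return sigaction(SIGSEGV, &sa, NULL);
}

int last_caught_signal(void)
{
    return last_signal;
}
```

raise(SIGSEGV) is a safe way to check that such a handler is wired up: it delivers the signal without a real memory fault, so returning from the handler is harmless in that case.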

Next we need to get some more information about this crash, so we go to the stack frame just before the segfault. In this case it's stack frame 8. You want a listing of the code there and some variable values (so we can double-check that the code you have there is the same as what we have). The gdb commands you want are:

   * fr 8 = go to frame 8
   * l = list the source code here
   * p ret = print the value of ret

If you feel adventurous, start dumping variable values for us. In this example I can't debug select() because it's in libc, and it is probably not the cause of the crash anyway. Instead we look at the frame above it, frame 9, to see whether any nasty data was being passed to select():

(gdb) fr 9
#9  0xb7f814ee in _ecore_main_select (timeout=0) at ecore_main.c:338
338        ret = select(max_fd + 1, &rfds, &wfds, &exfds, t);
(gdb) l
333               }
334          }
335     #ifndef WIN32
336        if (_ecore_signal_count_get()) return -1;
337     #endif
338        ret = select(max_fd + 1, &rfds, &wfds, &exfds, t);
339        if (ret < 0)
340          {
341             if (errno == EINTR) return -1;
342          }

The listing shows the variables and function calls involved. Variables such as pointers are often garbage or NULL, and that causes a segv. You can see their values with the print (p) command, as in the example below:

(gdb) p ret
$1 = -4
(gdb) p rfds
$2 = {__fds_bits = {1280, 0 <repeats 31 times>}}
(gdb) p wfds
$3 = {__fds_bits = {0 <repeats 32 times>}}
(gdb) p exfds
$4 = {__fds_bits = {0 <repeats 32 times>}}

If the variable is a pointer, printing it shows the pointer value (the address), not what it points to, and what it points to is the important part. To print that, we suggest:

p *pointer

Example:

(gdb) fr 5
#5  0x0809b698 in e_alert_show (
    text=0x80a34f0 "This is very bad. Enlightenment has segfaulted.\nThis is not meant to happen and is likely a sign of a\nbug in Enlightenment or the libraries it relies on.\n\nYou can gdb attach to this process now to try"...)
    at e_alert.c:136
136             XNextEvent(dd, &ev);
(gdb) l
131        XSync(dd, False);
132        
133        button = 0;
134        for (; button == 0;)
135          {
136             XNextEvent(dd, &ev);
137             switch (ev.type)
138               {
139               case KeyPress:
140                  key = XKeysymToKeycode(dd, XStringToKeysym("F1")); 
(gdb) p dd
$5 = (Display *) 0x80d1018

The type (Display *) tells us it is a pointer: the * means it points to a Display struct. The pointer value looks healthy (it is not 0x0 or a very low number), so we can try to look at the data it points to:

(gdb) p *dd
$6 = <incomplete type>

Never mind: that is Xlib's Display struct. It is private, so gdb does not know what is inside it. However, for the types that E and the EFL define themselves (such as Evas_List), p *pointer will generally work.
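What gdb reacts to is whether the struct's definition is visible in the debug information. A minimal C sketch, with hypothetical type names (Hidden_Display and My_List are not Xlib or Evas types): one type is only forward-declared, the other is fully defined in the file.

```c
#include <stdlib.h>

/* Opaque type, like Xlib's Display as seen from E: only a forward
 * declaration is visible, so code here can hold a pointer to it but
 * cannot see its fields, and neither can gdb if no -g file defines it. */
typedef struct Hidden_Display Hidden_Display;

/* Complete type, like the EFL's own list types: the definition is
 * visible in a file built with -g, so gdb can print every field. */
typedef struct My_List My_List;
struct My_List
{
    void    *data;
    My_List *next;
};

/* Prepend an item; returns the new head, or the old head if malloc fails. */
My_List *my_list_prepend(My_List *list, void *data)
{
    My_List *node = malloc(sizeof(*node));
    if (!node) return list;
    node->data = data;
    node->next = list;
    return node;
}

size_t my_list_count(const My_List *list)
{
    size_t n = 0;
    for (; list; list = list->next) n++;
    return n;
}

void my_list_free(My_List *list)
{
    while (list)
    {
        My_List *next = list->next;
        free(list);
        list = next;
    }
}
```

In a -g build, p *l on a My_List pointer shows its data and next fields, while p *dpy on a Hidden_Display pointer can only report <incomplete type>, because no compiled file contains that struct's definition.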

In general it's a good idea to spend some quality time with gdb doing all this, and then send us all of the gdb output from one of these debugging sessions so we can sift through it. It may not mean a lot to you, but it means a world to us. Sometimes the stack is corrupted and, well, there is nothing you can do. Often this means you need to resort to valgrind to catch the problem before the stack gets corrupted. This gets a bit more involved: you will need to run E under valgrind and let valgrind attach gdb.

Finding memory problems with Valgrind

To debug with valgrind, Enlightenment must be run under valgrind. You can do this by starting valgrind from a console as shown below:

valgrind --tool=memcheck --db-attach=yes enlightenment

You will need an X server running for it to display on, and the console needs to stay usable even if the window manager is broken (for example another machine logged in over ssh, or a text console). Remember that valgrind intercepts all memory operations, so it will make things very slow. But it is thorough and can find a lot of hard-to-find problems. When it detects a problem, valgrind prints a report and then asks whether you want to attach gdb. (The --db-attach option and this prompt only exist in older valgrind releases; it was removed in valgrind 3.11, where you use --vgdb=yes --vgdb-error=0 instead and connect from gdb with target remote | vgdb.)

You will often get one harmless report when you start E, about uninitialised memory being passed to a system call from inside XPutImage. Ignore it. It looks like this:

==7072== Syscall param writev(vector[...]) points to uninitialised byte(s)
==7072==    at 0x1BC255E8: (within /lib/tls/libc-2.3.2.so)
==7072==    by 0x1BAC66D6: (within /usr/X11R6/lib/libX11.so.6.2)
==7072==    by 0x1BAC6986: _X11TransWritev (in /usr/X11R6/lib/libX11.so.6.2)
==7072==    by 0x1BAAB03C: _XSend (in /usr/X11R6/lib/libX11.so.6.2)
==7072==    by 0x1BA9EA6B: (within /usr/X11R6/lib/libX11.so.6.2)
==7072==    by 0x1BA9F1D2: XPutImage (in /usr/X11R6/lib/libX11.so.6.2)
==7072==    by 0x1B957459: evas_software_x11_x_output_buffer_paste (evas_x_buffer.c:173)
==7072==    by 0x1B955FEA: evas_software_x11_outbuf_flush (evas_outbuf.c:327)
==7072==    by 0x1B953EB4: evas_engine_software_x11_output_flush (evas_engine.c:417)
==7072==    by 0x1B93A6A4: evas_render_updates (evas_render.c:298)
==7072==    by 0x1B9A0960: _ecore_evas_x_render (ecore_evas_x.c:173)
==7072==    by 0x1B9A1EF8: _ecore_evas_x_idle_enter (ecore_evas_x.c:825)
==7072==  Address 0x1ED603FC is 596 bytes inside a block of size 38912 alloc'd
==7072==    at 0x1B90459D: malloc (vg_replace_malloc.c:130)
==7072==    by 0x1B957200: evas_software_x11_x_output_buffer_new (evas_x_buffer.c:132)
==7072==    by 0x1B955224: evas_software_x11_outbuf_new_region_for_update (evas_outbuf.c:256)
==7072==    by 0x1B953DDA: evas_engine_software_x11_output_redraws_next_update_get (evas_engine.c:394)
==7072==    by 0x1B93A355: evas_render_updates (evas_render.c:210)
==7072==    by 0x1B9A0960: _ecore_evas_x_render (ecore_evas_x.c:173)
==7072==    by 0x1B9A1EF8: _ecore_evas_x_idle_enter (ecore_evas_x.c:825)
==7072==    by 0x1B9725E3: _ecore_idle_enterer_call (ecore_idle_enterer.c:78)
==7072==    by 0x1B9746AE: _ecore_main_loop_iterate_internal (ecore_main.c:477)
==7072==    by 0x1B974A2A: ecore_main_loop_begin (ecore_main.c:79)
==7072==    by 0x8059BB2: main (e_main.c:551)
==7072== 
==7072== ---- Attach to debugger ? --- [Return/N/n/Y/y/C/c] ---- 

To ignore this report, just answer no (n). You may even get it twice if you are running multihead. Anything else, though, is a likely candidate for a real problem: when valgrind complains, answer yes (y) to attach gdb, and send us both the valgrind output AND the gdb information (debugging in gdb as described above).
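To recognise what that first report means, here is a minimal, hypothetical C function with the same kind of bug: part of a malloc'd buffer is never initialised before the whole buffer is passed to a system call. Run under valgrind --tool=memcheck, the write() should produce a report along the lines of "Syscall param write(buf) points to uninitialised byte(s)", followed by the stack where the block was alloc'd, much like the writev() report from XPutImage above.

```c
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical bug of the same kind memcheck reports for XPutImage:
 * a heap buffer is only partly initialised and then handed to a syscall. */
ssize_t send_partly_initialised(int fd)
{
    size_t size = 64;
    unsigned char *buf = malloc(size);
    if (!buf) return -1;
    memset(buf, 0xAB, 16);                   /* only bytes 0..15 are set */
    ssize_t written = write(fd, buf, size);  /* bytes 16..63 are uninitialised */
    free(buf);
    return written;
}
```

The usual fix for this pattern is to initialise the whole buffer (for example with calloc()) or to write only the bytes that were actually set.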

Valgrind may also complain a lot about problems inside exit() when Enlightenment shuts down. These can be ignored too. They look like this:

==7072== 
==7072== Invalid read of size 4
==7072==    at 0x1BB6B16C: (within /lib/tls/libc-2.3.2.so)
==7072==    by 0x1BB6B58C: (within /lib/tls/libc-2.3.2.so)
==7072==    by 0x1BBE6FF6: (within /lib/tls/libc-2.3.2.so)
==7072==    by 0x1BC61422: (within /lib/tls/libc-2.3.2.so)
==7072==    by 0x1BC61337: (within /lib/tls/libc-2.3.2.so)
==7072==    by 0x1BC616C4: __libc_freeres (in /lib/tls/libc-2.3.2.so)
==7072==    by 0x1B8FEA08: _vgw(float, long double,...)(...)(long double,...)(short) (vg_intercept.c:55)
==7072==    by 0x1BB7F1C5: exit (in /lib/tls/libc-2.3.2.so)
==7072==    by 0x1BB6997D: __libc_start_main (in /lib/tls/libc-2.3.2.so)
==7072==    by 0x8058AE0: ??? (start.S:102)
==7072==  Address 0x1C7BFD98 is 8 bytes inside a block of size 60 free'd
==7072==    at 0x1B904B04: free (vg_replace_malloc.c:152)
==7072==    by 0x1BB6BD37: (within /lib/tls/libc-2.3.2.so)
==7072==    by 0x1BC2A902: (within /lib/tls/libc-2.3.2.so)
==7072==    by 0x1BC2A7A6: tdestroy (in /lib/tls/libc-2.3.2.so)
==7072==    by 0x1BC611C1: (within /lib/tls/libc-2.3.2.so)
==7072==    by 0x1BC616C4: __libc_freeres (in /lib/tls/libc-2.3.2.so)
==7072==    by 0x1B8FEA08: _vgw(float, long double,...)(...)(long double,...)(short) (vg_intercept.c:55)
==7072==    by 0x1BB7F1C5: exit (in /lib/tls/libc-2.3.2.so)
==7072==    by 0x1BB6997D: __libc_start_main (in /lib/tls/libc-2.3.2.so)
==7072==    by 0x8058AE0: ??? (start.S:102)
==7072== 
==7072== ---- Attach to debugger ? --- [Return/N/n/Y/y/C/c] ---- 

If you see these, they are caused by valgrind's own internal hooks (note the __libc_freeres and vg_intercept.c frames), not by Enlightenment.

You may need to run valgrind from a console, and many people ask how they can do that while debugging a window manager. Here is one way; note that you will need root access:

sudo X -ac :1 &
export DISPLAY=:1
valgrind --tool=memcheck --db-attach=yes enlightenment

This will run an empty X server on :1 and flip to it. You can flip back to your console with ctrl+alt+f1, or wherever your console was, and flip back to the new X server with something like ctrl+alt+f8.

Enlightenment will be running (very slowly) under valgrind. Do whatever it is you do to make the bug happen. When e "locks up" and doesn't seem to move (but the mouse does), flip back to the text console where you ran valgrind from and see if it is complaining (as per above).

Reporting bugs to the E17 developers

Use the Bugtracker to report bugs. Don't send bug reports and patches to the Enlightenment mailing list, as it strips most patches, and bug reports get lost if they are only discussed on the mailing list. If more discussion is needed, file a bug report before or after the discussion on the mailing list.

Debugging with Anjuta

The article Debugging with Anjuta explains how to debug Enlightenment with help of the Anjuta IDE.