test: Add test showing blocking problem when updating timers

I've noticed a blocking problem in Wayland's event-loop code when updating timer event sources. The problem occurs if you update the timer at a point after is has expired, but before it has been dispatched, i.e. from an event callback that happens during the same epoll wakeup. When the timer is subsequently dispatched, wl_event_source_timer_dispatch blocks for the duration of the new timeout in its call to read() from the timer fd (which is the expected behaviour according to the man page for timerfd_settime). This isn't too uncommon a scenario - for example, a socket with an associated timeout timer. You'd typically want to update the timer when reading from the socket. This is how I noticed the issue, since I was setting a timeout of 1 minute, and saw my server blocking for this duration! The following patch adds a (currently failing) test case to Wayland's event-loop-test.c. It demonstrates the problem using two timers, which are set to expire at the same time. The first timer to receive its expiry callback updates the other timer with a much larger timeout, which then causes the test to block for this timeout before calling the second timer's callback. As for a fix, I'm not so sure (which is why I thought I'd post the failing test case first to show what I mean). I notice that it doesn't actually do anything with the value read from the timerfd socket, which gives the number of times the timer expired since the last read, or when the timer was last updated (which blocks if the timer hasn't yet expired). I believe this value should always read as 1 anyway, since we don't use periodic timers. A simple fix would be to use the TFD_NONBLOCK option when creating the timerfd, ensuring that the read call won't block. We'd then have to ignore the case when the read returns EAGAIN.
author: Andrew Wedgbury <andrew.wedgbury@realvnc.com> 2014-04-25 11:36:57 +0100
committer: Kristian Høgsberg <krh@bitplanet.net> 2014-04-25 14:08:31 -0700
commit: 74df22befefa7e4ac2bd86ea0472347175b95ad1 (patch)
tree: 802cec49a34f57c11766e5ee8127a6d90cb0cd24
parent: 5535757a40e207c81460c4119097a697a245f4c5 (diff)
1 files changed, 74 insertions, 0 deletions
diff --git a/tests/event-loop-test.c b/tests/event-loop-test.c
index f250ed0..1218163 100644
--- a/tests/event-loop-test.c
+++ b/tests/event-loop-test.c
@@ -25,6 +25,7 @@
 #include <assert.h>
 #include <unistd.h>
 #include <signal.h>
+#include <sys/time.h>
 
 #include "wayland-private.h"
 #include "wayland-server.h"
@@ -199,6 +200,79 @@ TEST(event_loop_timer)
 	wl_event_loop_destroy(loop);
 }
 
+struct timer_update_context {
+	struct wl_event_source *source1, *source2;
+	int count;
+};
+
+static int
+timer_update_callback_1(void *data)
+{
+	struct timer_update_context *context = data;
+
+	context->count++;
+	wl_event_source_timer_update(context->source2, 1000);
+	return 1;
+}
+
+static int
+timer_update_callback_2(void *data)
+{
+	struct timer_update_context *context = data;
+
+	context->count++;
+	wl_event_source_timer_update(context->source1, 1000);
+	return 1;
+}
+
+TEST(event_loop_timer_updates)
+{
+	struct wl_event_loop *loop = wl_event_loop_create();
+	struct wl_event_source *source;
+	struct timer_update_context context;
+	struct timeval start_time, end_time, interval;
+
+	/* Create two timers that should expire at the same time (after 10ms).
+	 * The first timer to receive its expiry callback updates the other timer
+	 * with a much larger timeout (1s). This highlights a bug where
+	 * wl_event_source_timer_dispatch would block for this larger timeout
+	 * when reading from the timer fd, before calling the second timer's
+	 * callback.
+	 */
+
+	context.source1 = wl_event_loop_add_timer(loop, timer_update_callback_1,
+											  &context);
+	assert(context.source1);
+	wl_event_source_timer_update(context.source1, 10);
+
+	context.source2 = wl_event_loop_add_timer(loop, timer_update_callback_2,
+											  &context);
+	assert(context.source2);
+	wl_event_source_timer_update(context.source2, 10);
+
+	context.count = 0;
+
+	gettimeofday(&start_time, NULL);
+	wl_event_loop_dispatch(loop, 20);
+	gettimeofday(&end_time, NULL);
+
+	assert(context.count == 2);
+
+	/* Dispatching the events should not have taken much more than 20ms,
+	 * since this is the timeout passed to wl_event_loop_dispatch. If it
+	 * blocked, then it will have taken over 1s.
+	 * Of course, it could take over 1s anyway on a very slow or heavily
+	 * loaded system, so this test isn't 100% perfect.
+	 */
+
+	timersub(&end_time, &start_time, &interval);
+	assert(interval.tv_sec < 1);
+
+	wl_event_source_remove(context.source1);
+	wl_event_source_remove(context.source2);
+	wl_event_loop_destroy(loop);
+}
+
 struct event_loop_destroy_listener {
 	struct wl_listener listener;
 	int done;
author	Andrew Wedgbury <andrew.wedgbury@realvnc.com>	2014-04-25 11:36:57 +0100
committer	Kristian Høgsberg <krh@bitplanet.net>	2014-04-25 14:08:31 -0700
commit	74df22befefa7e4ac2bd86ea0472347175b95ad1 (patch)
tree	802cec49a34f57c11766e5ee8127a6d90cb0cd24
parent	5535757a40e207c81460c4119097a697a245f4c5 (diff)