author Eric W. Biederman <> 2019-09-14 07:33:58 -0500
committer Ingo Molnar <> 2019-09-25 17:42:29 +0200
commit 0ff7b2cfbae36ebcd216c6a5ad7f8534eebeaee2
tree 473542fd7ff0da6ac1822ce92c54a5ccd16760a0 /kernel
parent 3fbd7ee285b2bbc6eebd15a3c8786d9776a402a8
tasks, sched/core: Ensure tasks are available for a grace period after leaving the runqueue
In the ordinary case today the RCU grace period for a task_struct is triggered when another process waits for its zombie and causes the kernel to call release_task(). As the waiting task has to receive a signal and then act upon it before this happens, typically this will occur after the original task has been removed from the runqueue.

Unfortunately in some cases, such as self reaping tasks, it can be shown that release_task() will be called, starting the grace period for task_struct, long before the task leaves the runqueue.

Therefore use put_task_struct_rcu_user() in finish_task_switch() to guarantee that there is an RCU lifetime after the task leaves the runqueue.

Besides the change in the start of the RCU grace period for the task_struct, this change may delay the calls to perf_event_delayed_put and trace_sched_process_free. The function perf_event_delayed_put boils down to just a WARN_ON for cases that I assume should never happen. So I don't see any problem with delaying it.

The function trace_sched_process_free is a trace point and thus visible to user space. Occasionally userspace has the strangest dependencies, so this has a minuscule chance of causing a regression. This change only changes the timing of when the tracepoint is called. The change in timing arguably gives userspace a more accurate picture of what is going on. So I don't expect there to be a regression.

In the case where a task self reaps we are pretty much guaranteed that the RCU grace period is delayed. So we should get quite a bit of coverage of this worst case for the change in a normal threaded workload. So I expect any issues to turn up quickly or not at all.

I have lightly tested this change and everything appears to work fine.

Inspired-by: Linus Torvalds <>
Inspired-by: Oleg Nesterov <>
Signed-off-by: Eric W. Biederman <>
Signed-off-by: Peter Zijlstra (Intel) <>
Cc: Chris Metcalf <>
Cc: Christoph Lameter <>
Cc: Davidlohr Bueso <>
Cc: Kirill Tkhai <>
Cc: Linus Torvalds <>
Cc: Mike Galbraith <>
Cc: Paul E. McKenney <>
Cc: Peter Zijlstra <>
Cc: Russell King - ARM Linux admin <>
Cc: Thomas Gleixner <>
Link:
Signed-off-by: Ingo Molnar <>
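The two-counter scheme described in the message can be illustrated with a small userspace model. This is only a sketch, not kernel code: the toy_* names are hypothetical, the counters are plain ints rather than refcount_t, and the "grace period" is a flag that the caller ends by hand instead of a real call_rcu(). It also assumes that release_task() is the path that drops the other rcu_users reference, as the comments in the patch suggest.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-in for task_struct: only the two counters from the patch. */
struct toy_task {
	int usage;        /* models tsk->usage: keeps the memory alive */
	int rcu_users;    /* models tsk->rcu_users: gates the deferred put */
	bool rcu_pending; /* a deferred put has been queued */
	bool freed;       /* set instead of actually freeing memory */
};

/* Mirrors the post-patch initial values set in dup_task_struct(). */
static void toy_task_init(struct toy_task *t)
{
	t->rcu_users = 2; /* user space visible state + scheduler */
	t->usage = 1;     /* one shared by all rcu users */
	t->rcu_pending = false;
	t->freed = false;
}

/* Models put_task_struct(): the last reference frees the task. */
static void toy_put_task(struct toy_task *t)
{
	assert(t->usage > 0);
	if (--t->usage == 0)
		t->freed = true;
}

/* Models put_task_struct_rcu_user(): the last rcu user queues the put. */
static void toy_put_task_rcu_user(struct toy_task *t)
{
	assert(t->rcu_users > 0);
	if (--t->rcu_users == 0)
		t->rcu_pending = true; /* stands in for call_rcu() */
}

/* Models the end of a grace period: run the queued put, if any. */
static void toy_grace_period_end(struct toy_task *t)
{
	if (t->rcu_pending) {
		t->rcu_pending = false;
		toy_put_task(t);
	}
}

/*
 * Self-reaping order: the release_task() drop happens first, and the
 * scheduler's drop in finish_task_switch() happens later.
 * Returns 0 if the task is freed only after the scheduler's drop
 * plus one grace period, -1 otherwise.
 */
static int toy_demo_self_reap(void)
{
	struct toy_task t;

	toy_task_init(&t);
	toy_put_task_rcu_user(&t);  /* release_task() */
	toy_grace_period_end(&t);   /* nothing queued yet, so no effect */
	if (t.freed || t.rcu_pending)
		return -1;

	toy_put_task_rcu_user(&t);  /* finish_task_switch() */
	if (t.freed || !t.rcu_pending)
		return -1;

	toy_grace_period_end(&t);
	return t.freed ? 0 : -1;
}
```

In this model the deferred put is queued only by whichever of the two rcu_users drops comes last, so the task cannot be freed until a grace period has elapsed after it left the runqueue, regardless of the order in which the two drops occur.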
Diffstat (limited to 'kernel')
2 files changed, 8 insertions, 5 deletions
diff --git a/kernel/fork.c b/kernel/fork.c
index 7eefe338d7a2..d6e552552e52 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -902,10 +902,13 @@ static struct task_struct *dup_task_struct(struct task_struct *orig, int node)
 	if (orig->cpus_ptr == &orig->cpus_mask)
 		tsk->cpus_ptr = &tsk->cpus_mask;
-	/* One for the user space visible state that goes away when reaped. */
-	refcount_set(&tsk->rcu_users, 1);
-	/* One for the rcu users, and one for the scheduler */
-	refcount_set(&tsk->usage, 2);
+	/*
+	 * One for the user space visible state that goes away when reaped.
+	 * One for the scheduler.
+	 */
+	refcount_set(&tsk->rcu_users, 2);
+	/* One for the rcu users */
+	refcount_set(&tsk->usage, 1);
 	tsk->btrace_seq = 0;
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 06961b997ed6..5e5fefbd4cc7 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -3254,7 +3254,7 @@ static struct rq *finish_task_switch(struct task_struct *prev)
 		/* Task is done with its stack. */
-		put_task_struct(prev);
+		put_task_struct_rcu_user(prev);