[BUG] NFS client kernel panic in rpciod

2014. 4. 24. 15:24

Issue:

The NFS client kernel crashes because an async task is already queued, hitting BUG_ON(RPC_IS_QUEUED(task)); in __rpc_execute.
A second panic is similar in that the rpciod thread panics, but at a different place: a kernel BUG at kernel/workqueue.c, where the statement hit is BUG_ON(get_wq_data(work) != cwq);. Prior to the oops, there are also warnings about list corruption, triggered from a __list_add called from xprt_reserve_xprt. Based on the location in the code, the list corruption is being flagged on the rpc_xprt's 'sending' or 'resend' queue.
See this kbase article for more details.

Environment:

Resolution:

A fix is still being developed. Test kernels are available. Please contact your support representative for more information.

Root Cause:

Because of a race condition or use-after-free, the 'RPC_TASK_QUEUED' bit in rpc_task.tk_runstate can be set incorrectly on an rpc_task.
Ultimately, one of the following kernel crashes results:
1. The rpciod thread crashes with a kernel BUG at net/sunrpc/sched.c, seen in the log with RIP inside __rpc_execute. The specific statement is BUG_ON(RPC_IS_QUEUED(task));
2. The rpciod thread crashes with a kernel BUG at kernel/workqueue.c, seen in the log with RIP inside worker_thread. The specific statement is BUG_ON(get_wq_data(work) != cwq);
3. The kernel crashes because of corruption of the rpc_task.u union. Here both the 'tk_work' and 'tk_wait' members of the union are in use at the same time, which corrupts one of two structures. If 'tk_work' is initialized but 'tk_wait' is accessed, the result is a corrupt rpc_wait_queue, often seen as a corrupt pending, sending, or resend queue on the rpc_xprt. If 'tk_wait' is initialized but 'tk_work' is accessed, the result is a corrupt workqueue_struct, often seen as a corrupt rpciod workqueue_struct.
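
The union aliasing behind case 3 can be illustrated with a small user-space sketch. This is not kernel code: the struct layouts are simplified stand-ins, and every name beginning with model_ or MODEL_ (as well as the bit value and the 0x1234 cookie) is hypothetical and chosen only for illustration. It assumes only what is described above, namely that 'tk_work' and 'tk_wait' share storage in rpc_task.u, and shows that initializing one member while the task is still marked queued destroys the list links held in the other.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Simplified user-space stand-ins; layouts are assumptions, not the kernel's. */
struct list_head { struct list_head *next, *prev; };

struct model_work {             /* stands in for the tk_work member */
    unsigned long data;         /* models the word checked by get_wq_data() */
    struct list_head entry;
};

struct model_wait {             /* stands in for the tk_wait member */
    struct list_head list;      /* links the task into a wait queue */
};

#define MODEL_TASK_QUEUED 1UL   /* hypothetical bit value */

struct model_task {
    unsigned long tk_runstate;
    union {                     /* both members share the same storage */
        struct model_work tk_work;
        struct model_wait tk_wait;
    } u;
};

/* Put the task "to sleep": only u.tk_wait is valid afterwards. */
void model_sleep_on(struct model_task *t)
{
    assert(t != NULL);
    t->u.tk_wait.list.next = &t->u.tk_wait.list;
    t->u.tk_wait.list.prev = &t->u.tk_wait.list;
    t->tk_runstate |= MODEL_TASK_QUEUED;
}

/* Hand the task to a "workqueue": only u.tk_work is valid afterwards. */
void model_queue_work(struct model_task *t, unsigned long cwq_cookie)
{
    assert(t != NULL);
    t->u.tk_work.data = cwq_cookie;
    t->u.tk_work.entry.next = NULL;
    t->u.tk_work.entry.prev = NULL;
}

/* Returns 1 if the wait-list links still point back at the task itself. */
int model_wait_list_intact(const struct model_task *t)
{
    const struct list_head *self = &t->u.tk_wait.list;
    return t->u.tk_wait.list.next == self && t->u.tk_wait.list.prev == self;
}

/* Returns 1 when queueing work on a still-queued task clobbers tk_wait. */
int model_race_corrupts_wait_list(void)
{
    struct model_task t;
    memset(&t, 0, sizeof t);

    model_sleep_on(&t);                 /* task is on a wait queue */
    if (!model_wait_list_intact(&t))
        return 0;

    model_queue_work(&t, 0x1234UL);     /* racing path reuses the union */

    /* The queued bit is still set, but the list links are gone. */
    return (t.tk_runstate & MODEL_TASK_QUEUED) != 0
        && !model_wait_list_intact(&t);
}
```

Calling model_race_corrupts_wait_list() from a test program returns 1 in this model: the task still claims to be queued, yet its wait-list links have been overwritten by the work item fields. The reverse order (initializing the wait links over a live work item) would overwrite the modeled 'data' word in the same way.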

