NFS client kernel panic in rpciod
Issue:
- The NFS client kernel crashes because an async RPC task is already queued, hitting BUG_ON(RPC_IS_QUEUED(task)); in __rpc_execute.
- A second panic is similar in that an rpciod thread panics, but at a different place: kernel BUG at kernel/workqueue.c, which is BUG_ON(get_wq_data(work) != cwq);. Prior to the oops, the log also shows list corruption warnings triggered from __list_add, called from xprt_reserve_xprt. Based on the location in the code, the corruption is being flagged on the rpc_xprt's 'sending' or 'resend' queue.
- See this kbase article for more details.
Environment:
- NFSv4 client and server are RHEL 6.2 (kernel 2.6.32-220.el6.x86_64)
Resolution:
- A fix is still being developed. Test kernels are available. Please contact your support representative for more information.
Root Cause:
- Because of a race condition or use-after-free, the 'RPC_TASK_QUEUED' bit in rpc_task.tk_runstate can be set incorrectly on an rpc_task.
- Ultimately one of the following kernel crashes will result:
  - An rpciod thread crashes with kernel BUG at net/sunrpc/sched.c, seen in the log with RIP inside __rpc_execute. The specific BUG_ON statement is BUG_ON(RPC_IS_QUEUED(task)).
  - An rpciod thread crashes with kernel BUG at kernel/workqueue.c, seen in the log with RIP inside worker_thread. The specific BUG_ON statement is BUG_ON(get_wq_data(work) != cwq).
  - The kernel crashes because the rpc_task.u union is corrupted. The 'tk_work' and 'tk_wait' members of the union share the same storage, so using both at once corrupts one of two structures:
    - a corrupt rpc_wait_queue, when the 'tk_work' member is initialized but the 'tk_wait' member is accessed. This is often seen as a corrupt pending, sending, or resend queue on the rpc_xprt.
    - a corrupt workqueue_struct, when the 'tk_wait' member is initialized but the 'tk_work' member is accessed. This is often seen as a corrupt rpciod workqueue_struct.
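
The union corruption case can be illustrated with a small user-space sketch. This is not kernel code: toy_task, toy_work, toy_wait, list_add_tail_checked, and toy_union_clobber are hypothetical names, and the struct layouts are simplified stand-ins for rpc_task, work_struct, and rpc_wait. It only shows why initializing 'tk_work' while the task is still linked through 'tk_wait' makes a later list sanity check fail, in the spirit of the __list_add warnings described above.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Minimal doubly linked list node, modelled on the kernel's list_head. */
struct list_head { struct list_head *next, *prev; };

/* Hypothetical, simplified stand-in for struct work_struct. */
struct toy_work {
    uintptr_t data;            /* the kernel keeps the owning cwq pointer here */
    struct list_head entry;
};

/* Hypothetical, simplified stand-in for struct rpc_wait. */
struct toy_wait {
    struct list_head list;     /* linkage on a wait queue such as 'sending' */
};

/* Hypothetical stand-in for rpc_task: both members share the same bytes. */
struct toy_task {
    union {
        struct toy_work tk_work;
        struct toy_wait tk_wait;
    } u;
};

static void list_init(struct list_head *h) { h->next = h; h->prev = h; }

/* Tail insert with the same kind of neighbour check a debug list add performs.
 * Returns 0 on success, -1 if the list already looks corrupt. */
static int list_add_tail_checked(struct list_head *new_entry,
                                 struct list_head *head)
{
    struct list_head *prev = head->prev;
    struct list_head *next = head;

    if (next->prev != prev || prev->next != next)
        return -1;             /* the kernel would print a list corruption warning */

    new_entry->next = next;
    new_entry->prev = prev;
    prev->next = new_entry;
    next->prev = new_entry;
    return 0;
}

/* Returns 1 if initializing tk_work corrupted the wait queue linkage,
 * 0 if the queue stayed consistent, -1 if the setup itself failed. */
int toy_union_clobber(void)
{
    struct list_head sending;  /* stands in for an rpc_xprt wait queue */
    struct toy_task racing, other;
    int dummy_cwq = 0;         /* placeholder object for a workqueue pointer */

    memset(&racing, 0, sizeof(racing));
    memset(&other, 0, sizeof(other));
    list_init(&sending);

    /* The task goes to sleep: it is linked on the queue through tk_wait. */
    if (list_add_tail_checked(&racing.u.tk_wait.list, &sending) != 0)
        return -1;

    /* A racing path treats the same task as runnable and sets up tk_work.
     * These stores land on the bytes that hold tk_wait.list. */
    racing.u.tk_work.data = (uintptr_t)&dummy_cwq;
    list_init(&racing.u.tk_work.entry);

    /* The next task queued behind it trips the neighbour check. */
    return list_add_tail_checked(&other.u.tk_wait.list, &sending) != 0;
}
```

Calling toy_union_clobber() walks through the sequence once. The opposite direction (the task is queued as work, then linked through 'tk_wait') overwrites the work item's fields in the same way, which corresponds to the corrupt workqueue_struct case.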