Systems with Intel® Xeon® Processor E5, Intel® Xeon® Processor E5 v2, or Intel® Xeon® Processor E7 v2 and certain versions of Red Hat Enterprise Linux 6 kernels become unresponsive/hung or incur a kernel panic
¶The system becomes unresponsive with processes blocked in the uninterruptible state 'D', or it incurs a kernel panic 'hung_task: blocked tasks'. In very rare circumstances the kernel can also crash due to an attempted divide-by-zero. Please see the Diagnostic Steps section for further details about possible symptoms. The issue occurs if all of the following conditions are met.
- A Red Hat Enterprise Linux 6 kernel that contains the following change from Red Hat private Bug 765720 is warm booted (for example, via the shutdown -r command):
  [sched] x86: Avoid unnecessary overflow in sched_clock
- The kernel is warm booted on a machine with any of the Intel® Xeon® E5, Intel® Xeon® E5 v2, or Intel® Xeon® E7 v2 series processors.
- The kernel is warm booted on a machine that has not been power cycled (hard reset) for a long time (typically more than ~200 days).
¶Note that a kernel is not affected merely because it has more than ~200 days of uptime. It is the warm boot after ~200 days of 'hardware uptime' that triggers the issue. The issue occurs at a random point in time after the warm boot, typically within a few minutes to a few hours.
¶KVM guests (on RHEL KVM hosts or RHEV-H hypervisors) that configure KVM clock as their clock source by default are not affected by the issue. For other virtualization platforms, please contact the platform vendor.
¶Red Hat Enterprise Linux 5 kernels that are based on upstream kernel version 2.6.18 are not affected by the issue.
¶Please see the Environment section for details about the versions of the Red Hat Enterprise Linux 6 kernel that are prone to the issue.
Environment
- Red Hat Enterprise Linux 6.1 (kernel-2.6.32-131.26.1.el6 and newer)
- Red Hat Enterprise Linux 6.2 (kernel-2.6.32-220.4.2.el6 and newer)
- Red Hat Enterprise Linux 6.3 (kernel-2.6.32-279 series)
- Red Hat Enterprise Linux 6.4 (kernel-2.6.32-358 series)
- Any Intel® Xeon® E5, Intel® Xeon® E5 v2, or Intel® Xeon® E7 v2 series processor
- The issue has been observed in the following environments with 64-bit kernels. Note that 32-bit kernels of the above-mentioned versions are also prone to the issue.
RHEL6.2 kernel version     | CPU model
---------------------------|------------------------------------------
2.6.32-220.42.1.el6.x86_64 | Intel(R) Xeon(R) CPU E5-2650 0 @ 2.00GHz

RHEL6.3 kernel version     | CPU model
---------------------------|------------------------------------------
2.6.32-279.19.1.el6.x86_64 | Intel(R) Xeon(R) CPU E5-2440 0 @ 2.40GHz
2.6.32-279.22.1.el6.x86_64 | Intel(R) Xeon(R) CPU E5-2640 0 @ 2.50GHz
2.6.32-279.22.1.el6.x86_64 | Intel(R) Xeon(R) CPU E5-2630 0 @ 2.30GHz

RHEL6.4 kernel version     | CPU model
---------------------------|------------------------------------------
2.6.32-358.el6.x86_64      | Intel(R) Xeon(R) CPU E5-2670 0 @ 2.60GHz
2.6.32-358.0.1.el6.x86_64  | Intel(R) Xeon(R) CPU E5-2640 0 @ 2.50GHz
2.6.32-358.6.1.el6.x86_64  | Intel(R) Xeon(R) CPU E5-2640 0 @ 2.50GHz
2.6.32-358.6.2.el6.x86_64  | Intel(R) Xeon(R) CPU E5-2650L 0 @ 1.80GHz
2.6.32-358.6.2.el6.x86_64  | Intel(R) Xeon(R) CPU E5-2603 0 @ 1.80GHz
2.6.32-358.18.1.el6.x86_64 | Intel(R) Xeon(R) CPU E5-4617 0 @ 2.90GHz
2.6.32-358.18.1.el6.x86_64 | Intel(R) Xeon(R) CPU E5-2690 0 @ 2.90GHz
Resolution
¶This issue is addressed in the following kernel updates:
- RHEL 6.5 - kernel-2.6.32-431.el6. This package is available via Errata RHSA-2013:1645. The related Red Hat Private Bug is 975507.
- RHEL 6.4.z EUS - kernel-2.6.32-358.23.2.el6. This package is available via Errata RHSA-2013:1436. The related Red Hat Private Bug is 1001954.
- RHEL 6.3.z EUS - kernel-2.6.32-279.37.2.el6. This package is available via Errata RHSA-2013:1450. The related Red Hat Private Bug is 1004185.
- RHEL 6.2.z EUS - kernel-2.6.32-220.45.1.el6. This package is available via Errata RHSA-2013:1519. The related Red Hat Private Bug is 1024453.
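¶The version lists in the Environment and Resolution sections can be turned into a quick triage check. The following Python sketch is illustrative only: the function names are hypothetical, the comparison is a plain numeric comparison of the release fields rather than a full RPM version comparison, and it assumes that kernels newer than the RHEL 6.5 kernel (2.6.32-431) also carry the fix, which this article does not state explicitly.

```python
import re

# Fixed releases per z-stream, taken from the Resolution section.
FIXED = {220: (220, 45, 1), 279: (279, 37, 2), 358: (358, 23, 2)}
# First affected releases per stream, taken from the Environment section.
FIRST_AFFECTED = {131: (131, 26, 1), 220: (220, 4, 2), 279: (279,), 358: (358,)}
RHEL65_STREAM = 431  # kernel-2.6.32-431.el6 (RHEL 6.5) contains the fix


def release_tuple(kernel):
    """Extract the release fields, e.g. '2.6.32-358.18.1.el6' -> (358, 18, 1)."""
    m = re.search(r"2\.6\.32-(\d+(?:\.\d+)*)\.el6", kernel)
    if m is None:
        raise ValueError("not a RHEL 6 kernel version string: %r" % kernel)
    return tuple(int(part) for part in m.group(1).split("."))


def classify(kernel):
    """Return 'fixed', 'affected', or 'not listed' for a RHEL 6 kernel string."""
    rel = release_tuple(kernel)
    stream = rel[0]
    if stream >= RHEL65_STREAM:  # assumption: later kernels keep the fix
        return "fixed"
    if stream in FIXED and rel >= FIXED[stream]:
        return "fixed"
    if stream in FIRST_AFFECTED and rel >= FIRST_AFFECTED[stream]:
        return "affected"
    return "not listed"
```

¶For example, classify("2.6.32-358.18.1.el6.x86_64") reports an affected kernel, while classify("2.6.32-358.23.2.el6") reports a fixed one. The string returned by uname -r can be passed in directly.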
Root Cause
¶On Intel® Xeon® Processor E5 Family 6 Model 45 (also known as SandyBridge), the Time Stamp Counter (TSC) is not cleared by a warm reset. This is documented in the Intel® Xeon® Processor E5 Family Specification Update as erratum BT81.
¶On Intel® Xeon® Processor E5 v2 Family 6 Model 62 (also known as IvyBridge), the Time Stamp Counter (TSC) is not cleared by a warm reset. This is documented in the Intel® Xeon® Processor E5 v2 Family Specification Update as erratum CA105.
¶On Intel® Xeon® Processor E7 v2 Family 6 Model 62 (also known as IvyBridge-EX), the Time Stamp Counter (TSC) is not cleared by a warm reset. This is documented in the Intel® Xeon® E7-2800/4800/8800 v2 Product Family Specification Update as erratum CF101.
¶These processor errata can adversely affect all versions of Red Hat Enterprise Linux 6 kernels which contain the following change:
[sched] x86: Avoid unnecessary overflow in sched_clock (...) [765720]
¶This change requires that the TSC is cleared at the time when the system boots. Otherwise the values in the kernel's cyc2ns_offset table that are relevant to scheduling are not initialized correctly on systems that have not been power cycled (hard reset) for a long time, which is typically longer than ~200 days. The incorrect values in this table can cause various symptoms mentioned under Issue and under Diagnostic Steps.
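¶The ~200 day threshold is consistent with a 64-bit overflow in a scaled cycles-to-nanoseconds multiplication. The Python sketch below is a rough illustration and not a copy of the kernel code. It assumes a 10-bit scale factor (CYC2NS_SCALE_FACTOR = 10) and a scale of (10^6 << 10) / cpu_khz, as used by upstream x86 kernels of that era; neither value is stated in this article, and all function names are hypothetical.

```python
CYC2NS_SCALE_FACTOR = 10      # assumption: value used by upstream x86 kernels
U64_MASK = (1 << 64) - 1      # emulate 64-bit unsigned wrap-around


def scale_for(cpu_khz):
    """Fixed-point nanoseconds-per-cycle factor (assumed formula)."""
    return (1_000_000 << CYC2NS_SCALE_FACTOR) // cpu_khz


def wrapped_cycles_to_ns(cycles, cpu_khz):
    """Conversion whose intermediate product is truncated to 64 bits."""
    product = (cycles * scale_for(cpu_khz)) & U64_MASK
    return product >> CYC2NS_SCALE_FACTOR


def exact_cycles_to_ns(cycles, cpu_khz):
    """Same conversion with unlimited precision, for comparison."""
    return (cycles * scale_for(cpu_khz)) >> CYC2NS_SCALE_FACTOR


def days_until_overflow():
    """The product wraps once the result needs more than 64 - 10 = 54 bits."""
    return (1 << (64 - CYC2NS_SCALE_FACTOR)) / 1e9 / 86400


def tsc_after_days(days, cpu_khz):
    """TSC value of a CPU that has counted at cpu_khz for the given days."""
    return days * 86400 * cpu_khz * 1000
```

¶Under these assumptions days_until_overflow() evaluates to roughly 208.5 days. For a 2 GHz TSC, the wrapped and exact conversions agree for a counter that has run for 208 days and diverge for one that has run for 209 days, which matches the situation of a TSC that survives a warm reset.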
¶The following upstream commits have been identified as the resolution to work around these processor errata:
2353b47bffe4e6ab39042f470c55d41bb3ff3846
Round the calculated scale factor in set_cyc2ns_scale()
9993bc635d01a6ee7f6b833b4ee65ce7c06350b1
sched/x86: Fix overflow in cyc2ns_offset
¶KVM guests (on RHEL KVM hosts or RHEV-H hypervisors) that configure KVM clock as their clock source by default are not affected by the issue because they do not depend on the correctness of the values in the kernel's cyc2ns_offset table.
¶On other virtualization platforms the issue may occur or may not occur, depending on the TSC value that the hypervisor emulates/presents to the virtual machine after a warm boot of the guest kernel.
¶Red Hat Enterprise Linux 5 kernels that are based on upstream kernel version 2.6.18 are not affected by the issue because the cyc2ns_offset table does not exist in these kernels.
Diagnostic Steps
- Examine /proc/cpuinfo. Look for a CPU family, model and model name similar to the following examples.
- Example of a SandyBridge processor
...
cpu family : 6
model : 45
model name : Intel(R) Xeon(R) CPU E5-2650L 0 @ 1.80GHz
...
- Example of an IvyBridge processor
...
cpu family : 6
model : 62
model name : Intel(R) Xeon(R) CPU E5-2620 v2 @ 2.10GHz
...
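¶The /proc/cpuinfo check above can be scripted. The Python sketch below is a minimal example with hypothetical function names. It only tests for CPU family 6 with model 45 or 62, the values named in the Root Cause section. The model name is returned so that it can be reviewed manually, because the family and model numbers alone do not prove that the part is a Xeon E5, E5 v2, or E7 v2.

```python
# Family 6 models named in the Root Cause section of this article.
AFFECTED_MODELS = {45: "SandyBridge", 62: "IvyBridge / IvyBridge-EX"}


def parse_cpuinfo(text):
    """Return (family, model, model_name) of the first processor entry."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if not sep:
            continue  # not a 'key : value' line
        key = key.strip()
        if key in ("cpu family", "model", "model name") and key not in fields:
            fields[key] = value.strip()
    return int(fields["cpu family"]), int(fields["model"]), fields.get("model name", "")


def matches_affected_cpu(text):
    """True if the family/model pair matches one of the processors listed."""
    family, model, _name = parse_cpuinfo(text)
    return family == 6 and model in AFFECTED_MODELS
```

¶On a live system the text would typically come from reading /proc/cpuinfo, for example open("/proc/cpuinfo").read().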
¶The following combination of symptoms is known to be typical of this issue.
- A system that is affected by this issue may log a set of messages similar to the following on the console and in /var/log/messages. Notice the do_execve(), sched_exec() and wait_for_completion() functions in the call trace.
INFO: task bash:12543 blocked for more than 120 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
bash D 0000000000000012 0 12543 12542 0x00000084
ffff880c343b3ce8 0000000000000082 ffff880c343b3d98 ffffffffffffffe9
ffff880c343b3c88 ffffffffa00c9129 ffff880c343f4aa0 0000010100000015
ffff880c343f5058 ffff880c343b3fd8 000000000000fb88 ffff880c343f5058
Call Trace:
[<ffffffffa00c9129>] ? ext4_check_acl+0x29/0x90 [ext4]
[<ffffffffa008fbf0>] ? ext4_file_open+0x0/0x130 [ext4]
[<ffffffff8150ea05>] schedule_timeout+0x215/0x2e0
[<ffffffff8117e514>] ? nameidata_to_filp+0x54/0x70
[<ffffffff81277379>] ? cpumask_next_and+0x29/0x50
[<ffffffff8150e683>] wait_for_common+0x123/0x180
[<ffffffff81063310>] ? default_wake_function+0x0/0x20
[<ffffffff8150e79d>] wait_for_completion+0x1d/0x20
[<ffffffff8106513c>] sched_exec+0xdc/0xe0
[<ffffffff8118a0a0>] do_execve+0xe0/0x2c0
[<ffffffff810095ea>] sys_execve+0x4a/0x80
[<ffffffff8100b4ca>] stub_execve+0x6a/0xc0
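¶A log file can be searched for this signature automatically. The Python sketch below is a hedged example with hypothetical names: it collects each 'blocked for more than' report and keeps those whose call trace contains do_execve, sched_exec and wait_for_completion. Frames marked with '?' are skipped on the assumption that they are unreliable stack entries. A match is only a hint and does not confirm the issue by itself.

```python
import re

# Matches call trace frames such as '[<ffffffff8106513c>] sched_exec+0xdc/0xe0'.
FRAME = re.compile(r"\[<[0-9a-f]+>\]\s+(\?\s+)?(\w+)\+0x")
# Functions the article points out in the call trace.
MARKERS = {"do_execve", "sched_exec", "wait_for_completion"}


def matching_hung_tasks(log_text):
    """Return header lines of hung task reports whose trace has all MARKERS."""
    reports = []  # each entry: [header_line, set_of_function_names]
    for line in log_text.splitlines():
        if "INFO: task" in line and "blocked for more than" in line:
            reports.append([line.strip(), set()])
        elif reports:
            m = FRAME.search(line)
            if m and not m.group(1):  # ignore '?' frames
                reports[-1][1].add(m.group(2))
    return [header for header, funcs in reports if MARKERS <= funcs]
```

¶The function takes the log contents as a single string, for example the text of /var/log/messages, and returns the 'INFO: task ...' lines of the matching reports.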
¶If a vmcore (crash dump) has been captured at the time when the system was unresponsive or when it incurred a kernel panic 'hung_task: blocked tasks', use the crash utility to examine the run queues and the kernel's cyc2ns_offset table.
- At least one of the realtime priority run queues will include a migration thread that cannot be scheduled because the run queue is throttled. The 'task ... blocked for more than ... seconds' message shown above is a side effect of this, since the blocked task is waiting for services of the migration thread.
crash> runq
...
CPU 1 RUNQUEUE: ffff88002be36700
CURRENT: PID: 0 TASK: ffff88013d523540 COMMAND: "swapper"
RT PRIO_ARRAY: ffff88002be36888
[ 0] PID: 7 TASK: ffff88013d905500 COMMAND: "migration/1"
[ 0] PID: 10 TASK: ffff88013d522ae0 COMMAND: "watchdog/1"
...
crash> pd ((struct rq *)0xffff88002be36700)->rt.rt_throttled
$1 = 1
- The cyc2ns_offset table entry pertaining to CPU0 is different from the remaining table entries. It contains the value fff in the upper 12 bits, whereas the remaining entries typically contain 003 (10 bits cleared).
crash> px cyc2ns_offset
PER-CPU DATA TYPE:
unsigned long long per_cpu__cyc2ns_offset;
PER-CPU ADDRESSES:
[0]: ffff88002be0cb40
[1]: ffff88002be2cb40
[2]: ffff88002be4cb40
[3]: ffff88002be6cb40
crash> rd -x 0xffff88002be0cb40
ffff88002be0cb40: fffa751c3c9e4b76
crash> rd -x 0xffff88002be2cb40
ffff88002be2cb40: 003a751c3c9e4b76
crash> rd -x 0xffff88002be4cb40
ffff88002be4cb40: 003a751c3c9e4b76
crash> rd -x 0xffff88002be6cb40
ffff88002be6cb40: 003a751c3c9e4b76
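¶The comparison of the per-CPU values can be expressed as a small check. The Python sketch below uses hypothetical names and takes the values read with rd as integers. It flags a CPU whose upper 10 bits are all set while at least one other CPU has them all clear, which is the pattern described above. Treating exactly 10 upper bits as significant is an assumption derived from the '10 bits cleared' remark.

```python
LOW_BITS = 54           # 64 bits minus the 10 upper bits discussed above
UPPER_ALL_SET = 0x3FF   # all 10 upper bits set


def upper_bits(value):
    """Return the upper 10 bits of a 64-bit cyc2ns_offset value."""
    return value >> LOW_BITS


def suspicious_cpus(offsets):
    """offsets maps CPU number -> cyc2ns_offset value read from the vmcore."""
    upper = {cpu: upper_bits(value) for cpu, value in offsets.items()}
    some_clear = any(u == 0 for u in upper.values())
    return sorted(cpu for cpu, u in upper.items()
                  if u == UPPER_ALL_SET and some_clear)


# Values taken from the rd output shown above.
example = {
    0: 0xFFFA751C3C9E4B76,
    1: 0x003A751C3C9E4B76,
    2: 0x003A751C3C9E4B76,
    3: 0x003A751C3C9E4B76,
}
```

¶With the example values, suspicious_cpus(example) returns [0], and the lower 54 bits of all four entries are identical.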
¶In very rare circumstances, a divide-by-zero crash in find_busiest_group() can occur even though RHEL6.3 and RHEL6.4 kernels have a patch from Red Hat private Bug 785959 to prevent most cases of this divide-by-zero.
PID: 0 TASK: ffff881034a45500 CPU: 5 COMMAND: "swapper"
#0 [ffff8800456a38f0] machine_kexec at ffffffff81035d6b
#1 [ffff8800456a3950] crash_kexec at ffffffff810c0d42
#2 [ffff8800456a3a20] oops_end at ffffffff81511870
#3 [ffff8800456a3a50] die at ffffffff8100f19b
#4 [ffff8800456a3a80] do_trap at ffffffff815110d4
#5 [ffff8800456a3ae0] do_divide_error at ffffffff8100cf7f
#6 [ffff8800456a3b80] divide_error at ffffffff8100bdfb
[exception RIP: find_busiest_group+1372]
RIP: ffffffff81059abc RSP: ffff8800456a3c30 RFLAGS: 00010246
RAX: 0000000000000000 RBX: ffff8800456a3e34 RCX: 0000000000000000
RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff880045616700
RBP: ffff8800456a3da0 R8: 0000000000000000 R9: 0000000000000040
R10: 0000000000000000 R11: 0000000000000000 R12: 00000000ffffff01
R13: 0000000000016700 R14: ffffffffffffffff R15: 0000000000000000
ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0018
#7 [ffff8800456a3da8] rebalance_domains at ffffffff81063536
#8 [ffff8800456a3e78] run_rebalance_domains at ffffffff81063a1c
#9 [ffff8800456a3ec8] __do_softirq at ffffffff81076fd1
#10 [ffff8800456a3f38] call_softirq at ffffffff8100c1cc
#11 [ffff8800456a3f50] do_softirq at ffffffff8100de05
#12 [ffff8800456a3f70] irq_exit at ffffffff81076db5
#13 [ffff8800456a3f80] scheduler_ipi at ffffffff8105b3de
#14 [ffff8800456a3fa0] smp_reschedule_interrupt at ffffffff8102df6a
#15 [ffff8800456a3fb0] reschedule_interrupt at ffffffff8100bd73
--- <IRQ stack> ---
...