Systems with Intel® Xeon® Processor E5, Intel® Xeon® Processor E5 v2, or Intel® Xeon® Processor E7 v2 and certain versions of Red Hat Enterprise Linux 6 kernels become unresponsive/hung or incur a kernel panic

Updated Friday at 09:29

Issue

The system becomes unresponsive with processes blocked in the uninterruptible state 'D', or it incurs a kernel panic 'hung_task: blocked tasks'. In very rare circumstances the kernel can also crash due to an attempted divide-by-zero. Please see the Diagnostic Steps section for further details about possible symptoms. The issue occurs if all of the following conditions are met:

  • A Red Hat Enterprise Linux 6 kernel that contains this change from Red Hat private Bug 765720 is warm booted (for example, via the shutdown -r command):
[sched] x86: Avoid unnecessary overflow in sched_clock
  • The kernel is warm booted on a machine with any of the Intel® Xeon® E5, Intel® Xeon® E5 v2, or Intel® Xeon® E7 v2 series processors.

  • The kernel is warm booted on a machine that has not been power cycled (hard reset) for a long time (typically more than ~200 days).

Note that a kernel is not affected merely because it has more than ~200 days of uptime. It is the warm boot after ~200 days of 'hardware uptime' (time since the last power cycle) that actually triggers the issue. The issue occurs at a random point in time after the warm boot, typically within a few minutes to a few hours.

KVM guests (on RHEL KVM hosts or RHEV-H hypervisors) that configure KVM clock as their clock source by default are not affected by the issue. For other virtualization platforms, please contact the platform vendor.

Red Hat Enterprise Linux 5 kernels that are based on upstream kernel version 2.6.18 are not affected by the issue.

Please see the Environment section for details about the versions of the Red Hat Enterprise Linux 6 kernel that are prone to the issue.

Environment

  • Red Hat Enterprise Linux 6.1 (kernel-2.6.32-131.26.1.el6 and newer)
  • Red Hat Enterprise Linux 6.2 (kernel-2.6.32-220.4.2.el6 and newer)
  • Red Hat Enterprise Linux 6.3 (kernel-2.6.32-279 series)
  • Red Hat Enterprise Linux 6.4 (kernel-2.6.32-358 series)
  • Any Intel® Xeon® E5, Intel® Xeon® E5 v2, or Intel® Xeon® E7 v2 series processor
  • The issue has been observed in the following environments with 64-bit kernels. Note that 32-bit kernels of the above-mentioned versions are also prone to the issue.
RHEL6.2 kernel version     | CPU model
---------------------------|------------------------------------------
2.6.32-220.42.1.el6.x86_64 | Intel(R) Xeon(R) CPU E5-2650 0 @ 2.00GHz

RHEL6.3 kernel version     | CPU model
---------------------------|------------------------------------------
2.6.32-279.19.1.el6.x86_64 | Intel(R) Xeon(R) CPU E5-2440 0 @ 2.40GHz
2.6.32-279.22.1.el6.x86_64 | Intel(R) Xeon(R) CPU E5-2640 0 @ 2.50GHz
2.6.32-279.22.1.el6.x86_64 | Intel(R) Xeon(R) CPU E5-2630 0 @ 2.30GHz

RHEL6.4 kernel version     | CPU model
---------------------------|------------------------------------------
2.6.32-358.el6.x86_64      | Intel(R) Xeon(R) CPU E5-2670 0 @ 2.60GHz
2.6.32-358.0.1.el6.x86_64  | Intel(R) Xeon(R) CPU E5-2640 0 @ 2.50GHz
2.6.32-358.6.1.el6.x86_64  | Intel(R) Xeon(R) CPU E5-2640 0 @ 2.50GHz
2.6.32-358.6.2.el6.x86_64  | Intel(R) Xeon(R) CPU E5-2650L 0 @ 1.80GHz
2.6.32-358.6.2.el6.x86_64  | Intel(R) Xeon(R) CPU E5-2603 0 @ 1.80GHz
2.6.32-358.18.1.el6.x86_64 | Intel(R) Xeon(R) CPU E5-4617 0 @ 2.90GHz
2.6.32-358.18.1.el6.x86_64 | Intel(R) Xeon(R) CPU E5-2690 0 @ 2.90GHz

Resolution

This issue is addressed in the following kernel updates:

  • RHEL 6.5 - kernel-2.6.32-431.el6.
    This package is available via Errata RHSA-2013:1645. The related Red Hat Private Bug is 975507.
  • RHEL 6.4.z EUS - kernel-2.6.32-358.23.2.el6.
    This package is available via Errata RHSA-2013:1436. The related Red Hat Private Bug is 1001954.
  • RHEL 6.3.z EUS - kernel-2.6.32-279.37.2.el6.
    This package is available via Errata RHSA-2013:1450. The related Red Hat Private Bug is 1004185.
  • RHEL 6.2.z EUS - kernel-2.6.32-220.45.1.el6.
    This package is available via Errata RHSA-2013:1519. The related Red Hat Private Bug is 1024453.
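
The version boundaries listed under Environment and Resolution can be checked mechanically. The following is a minimal sketch, not an official tool: the helper names `parse_release` and `is_listed_as_affected` are hypothetical, the table is transcribed from this article only, and the comparison is a simplified numeric comparison of the build number rather than RPM's own version-comparison algorithm. A result of False only means the release is not listed as affected here.

```python
import re

# Kernel series -> (first affected build, first fixed build).
# Values are transcribed from the Environment and Resolution sections.
# None means this article lists no fixed build for that series.
AFFECTED_SERIES = {
    131: ((131, 26, 1), None),          # RHEL 6.1
    220: ((220, 4, 2), (220, 45, 1)),   # RHEL 6.2
    279: ((279,), (279, 37, 2)),        # RHEL 6.3
    358: ((358,), (358, 23, 2)),        # RHEL 6.4
}


def parse_release(release):
    """Turn '2.6.32-358.6.2.el6.x86_64' into the tuple (358, 6, 2)."""
    m = re.match(r"^(?:kernel-)?2\.6\.32-(\d+(?:\.\d+)*)\.el6", release)
    if m is None:
        return None
    return tuple(int(part) for part in m.group(1).split("."))


def is_listed_as_affected(release):
    """Return True if the release falls inside a range listed in this article."""
    build = parse_release(release)
    if build is None:
        return False
    entry = AFFECTED_SERIES.get(build[0])
    if entry is None:
        return False
    first_affected, first_fixed = entry
    if build < first_affected:
        return False
    return first_fixed is None or build < first_fixed
```

On a running system the argument would typically be the output of `uname -r` (or `platform.release()` in Python). The CPU model and the time since the last power cycle still have to be checked separately.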

Root Cause

On Intel® Xeon® Processor E5 Family 6 Model 45 (also known as SandyBridge), the Time Stamp Counter (TSC) is not cleared by a warm reset. This is documented in the Intel® Xeon® Processor E5 Family Specification Update as erratum BT81.

On Intel® Xeon® Processor E5 v2 Family 6 Model 62 (also known as IvyBridge), the Time Stamp Counter (TSC) is not cleared by a warm reset. This is documented in the Intel® Xeon® Processor E5 v2 Family Specification Update as erratum CA105.

On Intel® Xeon® Processor E7 v2 Family 6 Model 62 (also known as IvyBridge-EX), the Time Stamp Counter (TSC) is not cleared by a warm reset. This is documented in the Intel® Xeon® E7-2800/4800/8800 v2 Product Family Specification Update as erratum CF101.

These processor errata can adversely affect all versions of Red Hat Enterprise Linux 6 kernels which contain the following change:

[sched] x86: Avoid unnecessary overflow in sched_clock (...) [765720]

This change requires that the TSC is cleared at the time when the system boots. Otherwise the values in the kernel's cyc2ns_offset table that are relevant to scheduling are not initialized correctly on systems that have not been power cycled (hard reset) for a long time, which is typically longer than ~200 days. The incorrect values in this table can cause various symptoms mentioned under Issue and under Diagnostic Steps.

The following upstream commits have been identified as the resolution to work around these processor errata:

2353b47bffe4e6ab39042f470c55d41bb3ff3846
Round the calculated scale factor in set_cyc2ns_scale()

9993bc635d01a6ee7f6b833b4ee65ce7c06350b1
sched/x86: Fix overflow in cyc2ns_offset
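
The ~200 day threshold is consistent with a 64-bit overflow in a cycles-to-nanoseconds conversion. The sketch below is an illustration under assumptions that this article does not state: it assumes the conversion multiplies the TSC value by a scale factor of (10^6 << 10) / cpu_khz and then shifts right by 10 bits, as in the upstream x86 sched_clock code. The function names are hypothetical, and the code is not the kernel's implementation. Under these assumptions the intermediate product wraps once the TSC represents more than 2^54 ns, which is roughly 208 days.

```python
SHIFT = 10                      # assumed scale-factor shift (not stated in this article)
MASK64 = (1 << 64) - 1          # emulate unsigned 64-bit arithmetic
NS_PER_DAY = 86_400 * 10**9


def days_until_overflow(shift=SHIFT):
    """Days of TSC time after which (cycles * scale) no longer fits in 64 bits."""
    return (2**64 >> shift) / NS_PER_DAY


def naive_cyc2ns(cycles, cpu_khz, shift=SHIFT):
    """Single multiply then shift; the product wraps for large cycle counts."""
    scale = (10**6 << shift) // cpu_khz
    return ((cycles * scale) & MASK64) >> shift


def split_cyc2ns(cycles, cpu_khz, shift=SHIFT):
    """Split the multiplication so the intermediate product stays small."""
    scale = (10**6 << shift) // cpu_khz
    quot = cycles >> shift
    rem = cycles & ((1 << shift) - 1)
    return (quot * scale + ((rem * scale) >> shift)) & MASK64


if __name__ == "__main__":
    cpu_khz = 2_000_000                       # hypothetical 2.0 GHz TSC
    for days in (100, 300):
        cycles = days * 86_400 * cpu_khz * 1000
        print(days, naive_cyc2ns(cycles, cpu_khz), split_cyc2ns(cycles, cpu_khz))
```

With a TSC that was not cleared by the warm reset, a boot-time computation of the first kind would start from an already wrapped value, while the second kind stays correct for the same input.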

KVM guests (on RHEL KVM hosts or RHEV-H hypervisors) that configure KVM clock as their clock source by default are not affected by the issue because they do not depend on the correctness of the values in the kernel's cyc2ns_offset table.

On other virtualization platforms the issue may occur or may not occur, depending on the TSC value that the hypervisor emulates/presents to the virtual machine after a warm boot of the guest kernel.

Red Hat Enterprise Linux 5 kernels that are based on upstream kernel version 2.6.18 are not affected by the issue because the cyc2ns_offset table does not exist in these kernels.

Diagnostic Steps

  • Examine /proc/cpuinfo. Look for CPU family, model and model name similar to the following examples.
- example of SandyBridge processor
  ...
  cpu family      : 6
  model           : 45
  model name      : Intel(R) Xeon(R) CPU E5-2650L 0 @ 1.80GHz
  ...

- example of IvyBridge processor
  ...
  cpu family      : 6
  model           : 62
  model name      : Intel(R) Xeon(R) CPU E5-2620 v2 @ 2.10GHz
  ...
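
The /proc/cpuinfo check can be scripted. The following is a minimal sketch with hypothetical helper names (`parse_cpuinfo`, `affected_cpus`); it matches only on the family and model numbers shown in the examples above, so a hit is a hint to compare the model name against the processor series named in this article rather than a final verdict.

```python
# (cpu family, model) pairs taken from the Root Cause section of this article.
AFFECTED_MODELS = {
    (6, 45): "SandyBridge (E5)",
    (6, 62): "IvyBridge (E5 v2 / E7 v2)",
}


def parse_cpuinfo(text):
    """Split /proc/cpuinfo text into one dict per logical CPU."""
    cpus, current = [], {}
    for line in text.splitlines():
        if not line.strip():
            # A blank line separates the records of two logical CPUs.
            if current:
                cpus.append(current)
                current = {}
            continue
        key, sep, value = line.partition(":")
        if sep:
            current[key.strip()] = value.strip()
    if current:
        cpus.append(current)
    return cpus


def affected_cpus(text):
    """Return (model name, microarchitecture) for CPUs matching the listed models."""
    hits = []
    for cpu in parse_cpuinfo(text):
        try:
            key = (int(cpu["cpu family"]), int(cpu["model"]))
        except (KeyError, ValueError):
            continue
        if key in AFFECTED_MODELS:
            hits.append((cpu.get("model name", "unknown"), AFFECTED_MODELS[key]))
    return hits


if __name__ == "__main__":
    with open("/proc/cpuinfo") as f:
        for name, arch in affected_cpus(f.read()):
            print(name, "->", arch)
```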

The following combination of symptoms is known to be typical of this issue.

  • A system that is affected by this issue may log a set of messages similar to the following on the console and in /var/log/messages. Note the do_execve(), sched_exec() and wait_for_completion() functions in the call trace.
INFO: task bash:12543 blocked for more than 120 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
bash          D 0000000000000012     0 12543  12542 0x00000084
ffff880c343b3ce8 0000000000000082 ffff880c343b3d98 ffffffffffffffe9
ffff880c343b3c88 ffffffffa00c9129 ffff880c343f4aa0 0000010100000015
ffff880c343f5058 ffff880c343b3fd8 000000000000fb88 ffff880c343f5058
Call Trace:
[<ffffffffa00c9129>] ? ext4_check_acl+0x29/0x90 [ext4]
[<ffffffffa008fbf0>] ? ext4_file_open+0x0/0x130 [ext4]
[<ffffffff8150ea05>] schedule_timeout+0x215/0x2e0
[<ffffffff8117e514>] ? nameidata_to_filp+0x54/0x70
[<ffffffff81277379>] ? cpumask_next_and+0x29/0x50
[<ffffffff8150e683>] wait_for_common+0x123/0x180
[<ffffffff81063310>] ? default_wake_function+0x0/0x20
[<ffffffff8150e79d>] wait_for_completion+0x1d/0x20
[<ffffffff8106513c>] sched_exec+0xdc/0xe0
[<ffffffff8118a0a0>] do_execve+0xe0/0x2c0
[<ffffffff810095ea>] sys_execve+0x4a/0x80
[<ffffffff8100b4ca>] stub_execve+0x6a/0xc0
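
A log file can be scanned for hung-task reports whose call trace contains the three functions named above. The following is a rough sketch with a hypothetical function name (`find_suspect_hung_tasks`); it assumes the message layout shown in the example and treats every line after an "INFO: task" line as part of that report until the next such line. A match only indicates that the trace resembles this issue.

```python
import re

HUNG_RE = re.compile(r"INFO: task (\S+):(\d+) blocked for more than (\d+) seconds")
# Functions this article names as typical for the call trace.
MARKERS = ("wait_for_completion", "sched_exec", "do_execve")


def find_suspect_hung_tasks(log_text):
    """Return hung-task reports whose trace contains all marker functions."""
    reports = []
    current = None
    for line in log_text.splitlines():
        m = HUNG_RE.search(line)
        if m:
            current = {"comm": m.group(1), "pid": int(m.group(2)), "funcs": set()}
            reports.append(current)
        elif current is not None:
            for name in MARKERS:
                # Match 'name+0x...' as a whole symbol, e.g. not 'sys_execve'.
                if re.search(r"\b" + re.escape(name) + r"\+0x", line):
                    current["funcs"].add(name)
    return [r for r in reports if r["funcs"] == set(MARKERS)]


if __name__ == "__main__":
    with open("/var/log/messages", errors="replace") as f:
        for report in find_suspect_hung_tasks(f.read()):
            print(report["comm"], report["pid"])
```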

If a vmcore (crash dump) has been captured at the time when the system was unresponsive or when it incurred a kernel panic 'hung_task: blocked tasks', use the crash utility to examine the run queues and the kernel's cyc2ns_offset table.

  • At least one of the realtime priority run queues will include a migration thread that cannot be scheduled because the run queue is throttled. The 'task ... blocked for more than ... seconds' message shown above is a side effect of this, since the blocked task is waiting for services of the migration thread.
crash> runq
...
CPU 1 RUNQUEUE: ffff88002be36700
  CURRENT: PID: 0      TASK: ffff88013d523540  COMMAND: "swapper"
  RT PRIO_ARRAY: ffff88002be36888
     [  0] PID: 7      TASK: ffff88013d905500  COMMAND: "migration/1"
     [  0] PID: 10     TASK: ffff88013d522ae0  COMMAND: "watchdog/1"
...
crash> pd ((struct rq *)0xffff88002be36700)->rt.rt_throttled
$1 = 1
  • The cyc2ns_offset table entry pertaining to CPU0 is different from the remaining table entries. Its upper 12 bits contain the value fff, whereas the upper 12 bits of the remaining entries typically contain 003 (the upper 10 bits are cleared).
crash> px cyc2ns_offset
PER-CPU DATA TYPE:
  unsigned long long per_cpu__cyc2ns_offset;
PER-CPU ADDRESSES:
  [0]: ffff88002be0cb40
  [1]: ffff88002be2cb40
  [2]: ffff88002be4cb40
  [3]: ffff88002be6cb40

crash> rd -x 0xffff88002be0cb40
ffff88002be0cb40:  fffa751c3c9e4b76
crash> rd -x 0xffff88002be2cb40
ffff88002be2cb40:  003a751c3c9e4b76
crash> rd -x 0xffff88002be4cb40
ffff88002be4cb40:  003a751c3c9e4b76
crash> rd -x 0xffff88002be6cb40
ffff88002be6cb40:  003a751c3c9e4b76
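
The comparison of the upper 12 bits can be done programmatically once the per-CPU values have been read with crash. The following is a small sketch with hypothetical helper names (`upper_bits`, `suspicious_offsets`); the input values are the ones from the example output above, and the fff pattern is the one described in this article.

```python
def upper_bits(value, bits=12, width=64):
    """Return the top `bits` bits of a `width`-bit value."""
    return (value >> (width - bits)) & ((1 << bits) - 1)


def suspicious_offsets(offsets):
    """Return the per-CPU entries whose upper 12 bits are all set (0xfff)."""
    return {cpu: v for cpu, v in offsets.items() if upper_bits(v) == 0xFFF}


if __name__ == "__main__":
    # Values copied from the 'rd -x' output in the example above.
    offsets = {
        0: 0xFFFA751C3C9E4B76,
        1: 0x003A751C3C9E4B76,
        2: 0x003A751C3C9E4B76,
        3: 0x003A751C3C9E4B76,
    }
    for cpu, value in suspicious_offsets(offsets).items():
        print("CPU %d: %016x" % (cpu, value))
```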

In very rare circumstances, a divide-by-zero crash in find_busiest_group() can occur even though RHEL6.3 and RHEL6.4 kernels have a patch from Red Hat private Bug 785959 to prevent most cases of this divide-by-zero.

PID: 0      TASK: ffff881034a45500  CPU: 5   COMMAND: "swapper"
 #0 [ffff8800456a38f0] machine_kexec at ffffffff81035d6b
 #1 [ffff8800456a3950] crash_kexec at ffffffff810c0d42
 #2 [ffff8800456a3a20] oops_end at ffffffff81511870
 #3 [ffff8800456a3a50] die at ffffffff8100f19b
 #4 [ffff8800456a3a80] do_trap at ffffffff815110d4
 #5 [ffff8800456a3ae0] do_divide_error at ffffffff8100cf7f
 #6 [ffff8800456a3b80] divide_error at ffffffff8100bdfb
    [exception RIP: find_busiest_group+1372]
    RIP: ffffffff81059abc  RSP: ffff8800456a3c30  RFLAGS: 00010246
    RAX: 0000000000000000  RBX: ffff8800456a3e34  RCX: 0000000000000000
    RDX: 0000000000000000  RSI: 0000000000000000  RDI: ffff880045616700
    RBP: ffff8800456a3da0   R8: 0000000000000000   R9: 0000000000000040
    R10: 0000000000000000  R11: 0000000000000000  R12: 00000000ffffff01
    R13: 0000000000016700  R14: ffffffffffffffff  R15: 0000000000000000
    ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
 #7 [ffff8800456a3da8] rebalance_domains at ffffffff81063536
 #8 [ffff8800456a3e78] run_rebalance_domains at ffffffff81063a1c
 #9 [ffff8800456a3ec8] __do_softirq at ffffffff81076fd1
#10 [ffff8800456a3f38] call_softirq at ffffffff8100c1cc
#11 [ffff8800456a3f50] do_softirq at ffffffff8100de05
#12 [ffff8800456a3f70] irq_exit at ffffffff81076db5
#13 [ffff8800456a3f80] scheduler_ipi at ffffffff8105b3de
#14 [ffff8800456a3fa0] smp_reschedule_interrupt at ffffffff8102df6a
#15 [ffff8800456a3fb0] reschedule_interrupt at ffffffff8100bd73
--- <IRQ stack> ---
...
