Failed GFP_ATOMIC allocations (dropped network packets) result in kernel warnings and backtrace

Updated March 3, 2014, 22:13

Issue

  • Failed GFP_ATOMIC allocations by the network stack result in dropped packets, and a message like the following is printed on the console:
    program-name: page allocation failure. order:2, mode:0x4020

Each of these errors is followed by a backtrace similar to the one below, although the affected module (ib_ipoib in this example) usually differs:

Feb 14 18:26:56 <hostname> kernel: ksoftirqd/0: page allocation failure. order:1, mode:0x20
Feb 14 18:26:56 <hostname> kernel: Pid: 4, comm: ksoftirqd/0 Tainted: P           ---------------    2.6.32-358.el6.x86_64 #1
Feb 14 18:26:56 <hostname> kernel: Call Trace:
Feb 14 18:26:56 <hostname> kernel: <IRQ>  [<ffffffff8112c127>] ? __alloc_pages_nodemask+0x757/0x8d0
Feb 14 18:26:56 <hostname> kernel: [<ffffffff811669d2>] ? kmem_getpages+0x62/0x170
Feb 14 18:26:56 <hostname> kernel: [<ffffffff811675ea>] ? fallback_alloc+0x1ba/0x270
Feb 14 18:26:56 <hostname> kernel: [<ffffffff8116703f>] ? cache_grow+0x2cf/0x320
Feb 14 18:26:56 <hostname> kernel: [<ffffffff81167369>] ? ____cache_alloc_node+0x99/0x160
Feb 14 18:26:56 <hostname> kernel: [<ffffffff81168530>] ? kmem_cache_alloc_node_trace+0x90/0x200
Feb 14 18:26:56 <hostname> kernel: [<ffffffff8116874d>] ? __kmalloc_node+0x4d/0x60
Feb 14 18:26:56 <hostname> kernel: [<ffffffff8143d6ad>] ? __alloc_skb+0x6d/0x190
Feb 14 18:26:56 <hostname> kernel: [<ffffffff8143d7ed>] ? dev_alloc_skb+0x1d/0x40
Feb 14 18:26:56 <hostname> kernel: [<ffffffffa0d01174>] ? ipoib_alloc_rx_skb+0x44/0x200 [ib_ipoib]
Feb 14 18:26:56 <hostname> kernel: [<ffffffffa0d013cf>] ? ipoib_ib_handle_rx_wc+0x9f/0x590 [ib_ipoib]
Feb 14 18:26:56 <hostname> kernel: [<ffffffffa0d01977>] ? ipoib_poll+0xb7/0x160 [ib_ipoib]
Feb 14 18:26:56 <hostname> kernel: [<ffffffff8144cd43>] ? net_rx_action+0x103/0x2f0
Feb 14 18:26:56 <hostname> kernel: [<ffffffff81076fb1>] ? __do_softirq+0xc1/0x1e0
Feb 14 18:26:56 <hostname> kernel: [<ffffffff8100c1cc>] ? call_softirq+0x1c/0x30
Feb 14 18:26:56 <hostname> kernel: <EOI>  [<ffffffff8100de05>] ? do_softirq+0x65/0xa0
Feb 14 18:26:56 <hostname> kernel: [<ffffffff81076b10>] ? ksoftirqd+0x80/0x110
Feb 14 18:26:56 <hostname> kernel: [<ffffffff81076a90>] ? ksoftirqd+0x0/0x110
Feb 14 18:26:56 <hostname> kernel: [<ffffffff81096916>] ? kthread+0x96/0xa0
Feb 14 18:26:56 <hostname> kernel: [<ffffffff8100c0ca>] ? child_rip+0xa/0x20
Feb 14 18:26:56 <hostname> kernel: [<ffffffff81096880>] ? kthread+0x0/0xa0
Feb 14 18:26:56 <hostname> kernel: [<ffffffff8100c0c0>] ? child_rip+0x0/0x20
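When many such messages are logged, it can help to pull the task name, order, and mode out of each "page allocation failure" line. The following Python sketch does that with a regular expression built from the message format shown above; the helper names (parse_failure, AllocFailure) are hypothetical, and the 4 KiB page size is an assumption (it holds on x86_64). The order is the base-2 logarithm of the number of physically contiguous pages requested, so order:2 asks for 4 contiguous pages (16 KiB).

```python
import re
from dataclasses import dataclass
from typing import Optional

# Built from the message format shown above; other kernel versions may word it differently.
FAILURE_RE = re.compile(
    r"(?P<comm>\S+): page allocation failure\. "
    r"order:(?P<order>\d+), mode:(?P<mode>0x[0-9a-fA-F]+)"
)

PAGE_SIZE = 4096  # assumption: 4 KiB pages, as on x86_64


@dataclass
class AllocFailure:
    comm: str   # task that was running when the allocation failed
    order: int  # the request was for 2**order physically contiguous pages
    mode: int   # the gfp_mask of the request

    @property
    def pages(self) -> int:
        return 1 << self.order

    @property
    def size_bytes(self) -> int:
        return self.pages * PAGE_SIZE


def parse_failure(line: str) -> Optional[AllocFailure]:
    """Return the parsed failure for a matching log line, or None."""
    m = FAILURE_RE.search(line)
    if m is None:
        return None
    return AllocFailure(m.group("comm"), int(m.group("order")), int(m.group("mode"), 16))


if __name__ == "__main__":
    sample = ("Feb 14 18:26:56 <hostname> kernel: ksoftirqd/0: "
              "page allocation failure. order:1, mode:0x20")
    f = parse_failure(sample)
    print(f.comm, f.order, hex(f.mode), f.size_bytes)  # ksoftirqd/0 1 0x20 8192
```

Lines that do not contain the failure message (for example the "Call Trace:" line) return None, so the function can be applied to every line of a log file.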

Environment

  • Red Hat Enterprise Linux 4
  • Red Hat Enterprise Linux 5
  • Red Hat Enterprise Linux 6
  • Red Hat Enterprise Linux 7

Resolution

In interrupt context, the kernel cannot reclaim memory, or wait for memory to be freed, to satisfy an allocation request. Therefore, if free memory on the system is low at the time of the allocation, the allocation simply fails.

Failed GFP_ATOMIC allocations by the network stack result in dropped packets, which are likely to be received on a subsequent retransmission (for protocols that retransmit, such as TCP).

Note that the mode values 0x4020 and 0x20 reported in the "page allocation failure" messages above both have bit 1<<5 (0x20, __GFP_HIGH) set, which identifies a GFP_ATOMIC allocation.
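A mode value can be decoded by testing its individual bits. The sketch below assumes the flag values from 2.6.32-era include/linux/gfp.h (the RHEL 6 kernel in the backtrace above), where GFP_ATOMIC is defined as __GFP_HIGH (0x20) and 0x4000 is __GFP_COMP; later upstream kernels renumbered these flags, so the table should not be reused for them. The table is only a subset, and decode_mode and is_atomic are hypothetical helpers; is_atomic treats a request as atomic when __GFP_HIGH is set and __GFP_WAIT is clear, meaning the caller may not sleep.

```python
# Subset of gfp flag bits from 2.6.32-era include/linux/gfp.h
# (assumption: later kernels use different values).
GFP_FLAGS = {
    0x01: "__GFP_DMA",
    0x02: "__GFP_HIGHMEM",
    0x04: "__GFP_DMA32",
    0x08: "__GFP_MOVABLE",
    0x10: "__GFP_WAIT",
    0x20: "__GFP_HIGH",
    0x40: "__GFP_IO",
    0x80: "__GFP_FS",
    0x100: "__GFP_COLD",
    0x200: "__GFP_NOWARN",
    0x4000: "__GFP_COMP",
    0x8000: "__GFP_ZERO",
}
GFP_WAIT = 0x10
GFP_HIGH = 0x20  # 1 << 5; GFP_ATOMIC is defined as __GFP_HIGH in 2.6.32


def decode_mode(mode: int) -> list:
    """Names of the known flags set in mode, plus any remaining bits as one hex value."""
    names = [name for bit, name in sorted(GFP_FLAGS.items()) if mode & bit]
    unknown = mode & ~sum(GFP_FLAGS)  # bits not covered by the table above
    if unknown:
        names.append(hex(unknown))
    return names


def is_atomic(mode: int) -> bool:
    """High-priority request whose caller may not sleep (__GFP_WAIT clear)."""
    return bool(mode & GFP_HIGH) and not (mode & GFP_WAIT)


print(decode_mode(0x4020))                 # ['__GFP_HIGH', '__GFP_COMP']
print(is_atomic(0x4020), is_atomic(0x20))  # True True
```

For comparison, 0xd0 (__GFP_WAIT | __GFP_IO | __GFP_FS, which is GFP_KERNEL in that era) is not atomic, because such a caller may sleep and reclaim memory.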

GFP_ATOMIC allocation requests may dip below the per-zone watermark[WMARK_MIN] threshold, whereas normal allocation requests cannot take free memory below this watermark.
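How far below watermark[WMARK_MIN] a request may go can be sketched with a simplified Python model of the zone_watermark_ok() check in 2.6.32-era mm/page_alloc.c. In that kernel's allocator slow path, __GFP_HIGH sets ALLOC_HIGH, which lowers the threshold by half, and a caller that cannot sleep gets ALLOC_HARDER, which lowers it by a further quarter, so a GFP_ATOMIC request may go down to about 3/8 of WMARK_MIN. The model ignores lowmem_reserve and the PF_MEMALLOC case (which skips the watermark check altogether), and the function names are hypothetical. The per-order loop also shows why order:1 or order:2 requests can fail on a fragmented zone even when plenty of order-0 pages are free.

```python
def alloc_flags_for(gfp_mask: int) -> tuple:
    """(alloc_high, alloc_harder) derived from a 2.6.32-era gfp_mask (simplified)."""
    GFP_WAIT, GFP_HIGH = 0x10, 0x20
    return bool(gfp_mask & GFP_HIGH), not (gfp_mask & GFP_WAIT)


def effective_min(wmark_min: int, alloc_high: bool, alloc_harder: bool) -> int:
    """Threshold that free pages must stay above for this request."""
    m = wmark_min
    if alloc_high:
        m -= m // 2   # ALLOC_HIGH: may use half of the reserve
    if alloc_harder:
        m -= m // 4   # ALLOC_HARDER: a further quarter of what remains
    return m


def watermark_ok(free_pages, order, wmark_min, free_area,
                 alloc_high=False, alloc_harder=False):
    """Simplified zone_watermark_ok(); free_area[o] is the number of free order-o blocks."""
    m = effective_min(wmark_min, alloc_high, alloc_harder)
    free = free_pages - (1 << order) + 1
    if free <= m:
        return False
    for o in range(order):
        free -= free_area[o] << o  # order-o blocks cannot satisfy a larger request
        m >>= 1                    # but fewer higher-order pages are required
        if free <= m:
            return False
    return True


high, harder = alloc_flags_for(0x20)  # GFP_ATOMIC
print(effective_min(1000, False, False), effective_min(1000, high, harder))  # 1000 375
```

With WMARK_MIN at 1000 pages and 800 free pages, an order-0 GFP_ATOMIC request passes while a normal one fails; an order-2 GFP_ATOMIC request still fails if all 800 free pages are isolated order-0 blocks, which is the situation memory compaction tries to avoid.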

Kernel 2.6.32-358 includes a patch that helps alleviate this problem by compacting memory as it is freed. If "page allocation failures" continue to occur on this newer kernel, or if a kernel update is not yet feasible in a particular environment, the only other option is to raise watermark[WMARK_MIN] by increasing vm.min_free_kbytes. Once there are no free pages left, nothing else can be done to avoid a "page allocation failure" message.
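To see what raising vm.min_free_kbytes does, the following sketch mirrors how 2.6.32-era setup_per_zone_wmarks() turns it into per-zone watermarks: the value is converted to pages and shared among the lowmem zones in proportion to their size, giving each zone its WMARK_MIN, with WMARK_LOW set 25% and WMARK_HIGH 50% above it. The zone names and sizes are example inputs, 4 KiB pages are assumed, highmem handling is omitted, and later kernels add further tunables (such as vm.watermark_scale_factor) that change the LOW and HIGH gaps.

```python
PAGE_SHIFT = 12  # assumption: 4 KiB pages


def zone_watermarks(min_free_kbytes: int, zone_present_pages: dict) -> dict:
    """Per-zone min/low/high watermarks (in pages) for lowmem zones only."""
    pages_min = min_free_kbytes >> (PAGE_SHIFT - 10)  # KiB -> pages
    lowmem_pages = sum(zone_present_pages.values())
    marks = {}
    for zone, present in zone_present_pages.items():
        tmp = pages_min * present // lowmem_pages  # this zone's share of the reserve
        marks[zone] = {"min": tmp, "low": tmp + (tmp >> 2), "high": tmp + (tmp >> 1)}
    return marks


# Example input: 64 MiB of min_free_kbytes over a hypothetical two-zone node.
for zone, m in zone_watermarks(65536, {"DMA32": 1_000_000, "Normal": 3_000_000}).items():
    print(zone, m)
# DMA32 {'min': 4096, 'low': 5120, 'high': 6144}
# Normal {'min': 12288, 'low': 15360, 'high': 18432}
```

Doubling min_free_kbytes doubles every watermark in every lowmem zone, so the pages kept in reserve are no longer available to user space.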

Note that vm.min_free_kbytes cannot be raised far enough to keep up with sustained periods of 10 Gigabit Ethernet traffic. Setting it too high also takes memory away from user space, effectively increasing the likelihood of OOM (out-of-memory) events.
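A rough calculation illustrates why. The sketch below estimates how long a reserve would last if received data consumed it at full line rate and nothing were freed in the meantime; this is an illustrative worst case, not a measurement, and it ignores per-packet buffer overhead as well as the fact that GFP_ATOMIC requests can use only part of the reserve. The function name and the 64 MiB figure are example choices.

```python
def reserve_drain_seconds(reserve_kbytes: float, link_gbit_per_s: float) -> float:
    """Seconds until reserve_kbytes is exhausted at full line rate with no memory freed
    (illustrative worst case only)."""
    bytes_per_second = link_gbit_per_s * 1e9 / 8
    return reserve_kbytes * 1024 / bytes_per_second


# Example: a 64 MiB reserve (min_free_kbytes = 65536) against 10 Gbit/s Ethernet.
print(f"{reserve_drain_seconds(65536, 10) * 1000:.1f} ms")  # 53.7 ms
```

Even a reserve of that size covers only tens of milliseconds of sustained traffic unless reclaim frees memory as fast as it is consumed.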

The following is an illustration to aid understanding:

         ---  WMARK_HIGH
           |
           | (a)
           |
         ---  WMARK_LOW
           |
           | (b)
           |
         ---  WMARK_MIN
           |
           | (c)
           |
         ---  0 <--- no free pages

When free pages fall to watermark[WMARK_LOW], kswapd is woken to start asynchronous reclaim. This provides a margin of "b" before synchronous (direct) reclaim must start, and kswapd keeps reclaiming through margin "a" until free pages rise above watermark[WMARK_HIGH], after which it goes back to sleep. When watermark[WMARK_MIN] is reached, normal allocators must enter synchronous reclaim, but PF_MEMALLOC tasks and requests with ALLOC_HARDER or ALLOC_HIGH set (i.e., GFP_ATOMIC) get access to varying amounts of the reserve "c".
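The bands in the illustration can be expressed as a small classifier. This is a simplified reading of the paragraph above for order-0 requests that ignores lowmem_reserve; the function name and the example watermark values are hypothetical.

```python
def classify(free_pages: int, wmark_min: int, wmark_low: int, wmark_high: int) -> tuple:
    """Map a zone's free page count onto the bands of the illustration above."""
    if free_pages > wmark_high:
        return "above", "kswapd can sleep; no reclaim needed"
    if free_pages > wmark_low:
        return "a", "kswapd, once woken, keeps reclaiming until free pages exceed WMARK_HIGH"
    if free_pages > wmark_min:
        return "b", "kswapd is woken for asynchronous reclaim; normal allocations still succeed"
    if free_pages > 0:
        return "c", ("normal allocators enter synchronous reclaim; "
                     "GFP_ATOMIC and PF_MEMALLOC may use part of this reserve")
    return "empty", "no free pages: allocations fail, including GFP_ATOMIC"


band, action = classify(18000, wmark_min=16384, wmark_low=20480, wmark_high=24576)
print(band, "-", action)  # b - kswapd is woken for asynchronous reclaim; ...
```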
