Issue
- How to monitor memory usage statistics and tune the memory management subsystem if needed?
- Memory tuning guidelines for Red Hat Enterprise Linux.
Environment
- Red Hat Enterprise Linux 3
- Red Hat Enterprise Linux 4
- Red Hat Enterprise Linux 5
- Red Hat Enterprise Linux 6
Resolution
LowMem Starvation
Memory usage on 32-bit systems can become problematic under some workloads, especially for I/O-intensive applications such as:
- Oracle Database or Application Server
- Java
With the x86 architecture, physical memory from 16MB to 896MB is known as "low memory" (ZONE_NORMAL) and is permanently mapped into kernel space. Many kernel resources must live in the low memory zone; in fact, many kernel operations can only take place in this zone, which makes it the most performance-critical zone. For example, if you run many resource-intensive applications and/or have a large amount of physical memory, "low memory" can run low, since more kernel structures must be allocated in this area. Under heavy I/O workloads the kernel may become starved for LowMem even though there is an abundance of available HighMem. Because the kernel tries to keep as much data in cache as possible, this can lead to oom-kills or complete system hangs.
- On 64-bit systems all memory is allocated in ZONE_NORMAL, so lowmem starvation does not affect them. Moving to 64-bit is a permanent fix for lowmem starvation.
Diagnosing
The amount of LowMem can be checked in /proc/meminfo. If LowFree falls below 50MB it may be cause for concern. However, this does not always indicate a problem, as the kernel will try to use the entire LowMem zone and may be able to reclaim some of the cache.
MemTotal:   502784 kB
MemFree:     29128 kB
HighTotal:  162088 kB
HighFree:    22860 kB
LowTotal:   340696 kB
LowFree:      6268 kB
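As a quick check, a short script can pull LowFree out of /proc/meminfo and compare it against the 50MB threshold. The sketch below feeds the sample values above through a pipe so it is self-contained; on a live 32-bit system you would point awk at /proc/meminfo directly:

```shell
# Warn when LowFree drops below 50MB (51200 kB).
# Live usage: replace the printf pipeline with  awk '...' /proc/meminfo
printf 'LowTotal: 340696 kB\nLowFree: 6268 kB\n' | awk '
/^LowFree:/ {
    if ($2 + 0 < 51200)
        print "WARNING: LowFree is only " $2 " kB"
    else
        print "LowFree OK: " $2 " kB"
}'
```

With the sample values this prints `WARNING: LowFree is only 6268 kB`.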
OOM-KILLER: the kernel should print sysrq-M information to the system log and the console. You may see the Normal zone reporting all_unreclaimable? yes, meaning the kernel could not reclaim any memory in this zone.
kernel: DMA free:12544kB min:16kB low:32kB high:48kB active:0kB inactive:0kB present:16384kB pages_scanned:2814 all_unreclaimable? yes
kernel: Normal free:888kB min:928kB low:1856kB high:2784kB active:4152kB inactive:3724kB present:901120kB pages_scanned:9900 all_unreclaimable? yes
kernel: HighMem free:8731264kB min:512kB low:1024kB high:1536kB active:784164kB inactive:38796kB present:10354684kB pages_scanned:0 all_unreclaimable? no
- In this case we can also see that the largest contiguous block of memory in the LowMem (Normal) zone is 32kB, so if the kernel requires a larger contiguous allocation it may fail.
kernel: DMA: 4*4kB 4*8kB 3*16kB 3*32kB 3*64kB 3*128kB 2*256kB 0*512kB 1*1024kB 1*2048kB 2*4096kB = 12544kB
kernel: Normal: 80*4kB 1*8kB 29*16kB 3*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 888kB
kernel: HighMem: 660*4kB 160*8kB 61*16kB 55*32kB 496*64kB 2799*128kB 2385*256kB 1356*512kB 643*1024kB 261*2048kB 1425*4096kB = 8731264kB
SYSTEM HANGS: A core file captured at the time of the hang can often provide evidence for LowMem starvation:
crash> kmem -i
              PAGES      TOTAL    PERCENTAGE
 TOTAL MEM  1021613     3.9 GB        ----
      FREE   159502   623.1 MB    15% of TOTAL MEM
      USED   862111     3.3 GB    84% of TOTAL MEM
    SHARED   198019   773.5 MB    19% of TOTAL MEM
   BUFFERS    43212   168.8 MB     4% of TOTAL MEM
    CACHED   103623   404.8 MB    10% of TOTAL MEM
      SLAB    63170   246.8 MB     6% of TOTAL MEM

TOTAL HIGH   802802     3.1 GB    78% of TOTAL MEM
 FREE HIGH   155824   608.7 MB    19% of TOTAL HIGH
 TOTAL LOW   218811   854.7 MB    21% of TOTAL MEM
  FREE LOW     3678    14.4 MB     1% of TOTAL LOW

TOTAL SWAP  1048554       4 GB        ----
 SWAP USED       52     208 KB     0% of TOTAL SWAP
 SWAP FREE  1048502       4 GB    99% of TOTAL SWAP
Installing hangwatch can also be useful if the M flag is enabled for sysrq.
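Assuming root access, the sysrq memory dump can be enabled and triggered on demand. The commands below write to /proc, so they require privileges; this is a sketch, not a fixed procedure:

```shell
# Allow all sysrq functions (persist with kernel.sysrq=1 in /etc/sysctl.conf)
echo 1 > /proc/sys/kernel/sysrq
# Trigger a one-off memory report; the zone/watermark dump lands in the kernel log
echo m > /proc/sysrq-trigger
dmesg | tail -n 40
```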
Tuning
Sysctl
RHEL 4
Attempt to protect 100MB of LowMem from userspace allocations (defaults to 0):
vm.lower_zone_protection=100
RHEL 5
The pagecache value represents a percentage of physical RAM. When the size of the filesystem cache exceeds this size, cache pages are added only to the inactive list, so under memory-reclaim conditions the kernel is more likely to reclaim pages from the cache instead of swapping anonymous pages.
vm.pagecache=100
RHEL 6
Takes highmem into account along with lowmem when calculating dirty_ratio and dirty_background_ratio, which makes page reclaim faster.
vm.highmem_is_dirtyable=1
RHEL 5 and 6
zone_reclaim_mode determines the approach used to reclaim memory when a zone runs out of memory. If it is set to zero, no zone reclaim occurs and allocations are satisfied from other zones/nodes in the system. The value is a bitmap:
1 = Zone reclaim on
2 = Zone reclaim writes dirty pages out
4 = Zone reclaim swaps pages
Attempt to protect approximately 1/9 (98MB) of LowMem from userspace allocations (the default is 1/32, or about 27.5MB):
vm.lowmem_reserve_ratio=256 256 9
Note: The above parameter requires special syntax.
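Because the value is a whitespace-separated triple, it needs quoting when set on the command line; a sketch:

```shell
# On the command line the triple must be quoted as one argument:
sysctl -w vm.lowmem_reserve_ratio="256 256 9"
# In /etc/sysctl.conf the bare triple is fine:
#   vm.lowmem_reserve_ratio = 256 256 9
```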
RHEL 4, 5 and 6
Increase the system's tendency to swap out to disk. The value ranges from 0 to 100 (default 60); setting it below 10 is not recommended.
vm.swappiness=80
Try to keep at least 19MB of memory free (the default varies). Adjust this to something higher than what is currently in use.
vm.min_free_kbytes=19000
Decrease the time after which a page is considered old enough to be flushed to disk by the pdflush daemon (default 2999). Expressed in hundredths of a second.
vm.dirty_expire_centisecs=2000
Shorten the interval at which the pdflush daemon wakes up to write dirty data to disk (default 499). Expressed in hundredths of a second.
vm.dirty_writeback_centisecs=400
Decrease the tendency of the kernel to reclaim the memory which is used for caching of directory and inode objects (default 100, do not increase this beyond 100 as it can cause excessive reclaim).
vm.vfs_cache_pressure=50
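The tunables above might be combined into a single /etc/sysctl.conf fragment; the values are the examples from this section, not universal recommendations:

```shell
# /etc/sysctl.conf fragment -- example values from above, tune per workload
vm.swappiness = 80
vm.min_free_kbytes = 19000
vm.dirty_expire_centisecs = 2000
vm.dirty_writeback_centisecs = 400
vm.vfs_cache_pressure = 50
```

Apply with `sysctl -p` (no reboot required).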
Overcommit Memory
- Overcommitting memory allows the kernel to allocate more memory than the system actually has. This is generally safe, and is in fact the default behavior, as the Linux VM handles the management of memory. To tune it, consider the following information from the man proc documentation:
/proc/sys/vm/overcommit_memory
This file contains the kernel virtual memory accounting mode. Values are:
0: heuristic overcommit (this is the default)
1: always overcommit, never check
2: always check, never overcommit
In mode 0, calls of mmap(2) with MAP_NORESERVE are not checked, and the
default check is very weak, leading to the risk of getting a process "OOM-
killed". Under Linux 2.4 any non-zero value implies mode 1. In mode 2
(available since Linux 2.6), the total virtual address space on the system is
limited to (SS + RAM*(r/100)), where SS is the size of the swap space, and RAM
is the size of the physical memory, and r is the contents of the file
/proc/sys/vm/overcommit_ratio.
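To make the mode-2 formula concrete, the virtual address space limit SS + RAM*(r/100) can be worked out in shell arithmetic. The 4GB sizes below are hypothetical example values, not a recommendation:

```shell
# Address-space limit under vm.overcommit_memory=2:
#   limit = swap + ram * overcommit_ratio / 100
swap_kb=4194304     # 4 GB of swap (example value)
ram_kb=4194304      # 4 GB of RAM  (example value)
ratio=50            # default vm.overcommit_ratio
limit_kb=$(( swap_kb + ram_kb * ratio / 100 ))
echo "CommitLimit: ${limit_kb} kB"    # 6291456 kB, i.e. 6 GB
# Compare against the live value: grep CommitLimit /proc/meminfo
```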
HugePages
Enabling an application to use HugePages provides many benefits for the VM as it allows that application to lock data into memory and prevent it from swapping. Some advantages of such a configuration:
Increased performance through increased TLB hits
Pages are locked in memory and are never swapped out which guarantees that shared memory like SGA remains in RAM
Contiguous pages are preallocated and cannot be used for anything else but for System V shared memory (e.g. SGA)
Less bookkeeping work for the kernel for that part of virtual memory due to larger page sizes
HugePages are only useful for applications that are aware of them (i.e., do not recommend them as a fix for all memory issues). They are only used for shared memory allocations, so be sure not to allocate too many pages. By default one HugePage is 2MB. For Oracle systems, allocate enough HugePages to hold the entire SGA in memory.
To enable HugePages use a sysctl setting to define how many pages should be allocated:
RHEL 3
vm.hugetlb_pool=1024
RHEL 4 onwards
vm.nr_hugepages=1024
The application user must also have its memlock limit increased in /etc/security/limits.conf so it can lock that many pages into memory:
oracle - memlock 2097152
On RHEL 4, 5 or 6 this user must be logged out and back in (i.e., the application restarted) before the settings will be applied. On RHEL 3 the system must be rebooted.
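Putting the two settings together, the pool size and memlock limit can be derived from the SGA size. The 2GB SGA below is a hypothetical example, with the default 2MB page size assumed:

```shell
# Derive nr_hugepages and the memlock limit from an SGA size
sga_mb=2048                          # example: 2 GB SGA
page_mb=2                            # default HugePage size on x86
nr_hugepages=$(( sga_mb / page_mb ))
memlock_kb=$(( sga_mb * 1024 ))      # limits.conf memlock is in kB
echo "vm.nr_hugepages=${nr_hugepages}"    # 1024
echo "oracle - memlock ${memlock_kb}"     # 2097152
# After applying, verify the pool: grep Huge /proc/meminfo
```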
Resources
- Tuning and Optimizing RHEL for Oracle 9i and 10g
- Understanding Virtual Memory in Red Hat Enterprise Linux 3
Warning: The following links are to sources that are not authored by Red Hat directly. We cannot verify their accuracy and content.
Diagnostic Steps
Apart from /proc/meminfo, the file /proc/zoneinfo can also be used to monitor memory usage statistics. In that file, note the following fields under each zone. For example, in the Normal zone of Node 0:
Node 0, zone Normal
pages free 1451395
min 4000
low 5000
high 6000
pages_low - When pages_low number of free pages is reached, kswapd is woken up by the buddy allocator to start freeing pages.
pages_min - When pages_min is reached, the allocator will do the kswapd work in a synchronous fashion, sometimes referred to as the direct-reclaim path.
pages_high - Once kswapd has been woken to start freeing pages it will not consider the zone to be “balanced” until pages_high pages are free. Once the watermark has been reached, kswapd will go back to sleep.
By observing the values of pages_low and pages_min over a period of time, we can get a fair estimate of the memory usage pattern in that particular zone of that node.
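A small awk sketch can track those watermarks over time. It is fed the sample zoneinfo lines above so it is self-contained; on a live system you would point it at /proc/zoneinfo, optionally under watch:

```shell
# Print the Normal-zone free page count and its watermarks.
# Live usage: awk '...' /proc/zoneinfo   (or wrap in: watch -n5 ...)
printf 'Node 0, zone Normal\n  pages free 1451395\n    min 4000\n    low 5000\n    high 6000\n' | awk '
/zone *Normal/     { z = 1; next }
z && $1 == "pages" { print "free=" $3 }
z && ($1 == "min" || $1 == "low" || $1 == "high") {
    print $1 "=" $2
    if ($1 == "high") exit    # stop after this zone
}'
```

With the sample input this prints free=1451395, min=4000, low=5000, high=6000, one per line.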