Issue

  • How can I tune my Red Hat Enterprise Linux system for Oracle 10g/11g?
  • High memory consumption on Oracle database system
  • High CPU consumption on Oracle hosts
  • How should I configure swap memory for an Oracle database?
  • Slower than expected IO performance running Oracle on RHEL

Environment

  • Red Hat Enterprise Linux 6
  • Red Hat Enterprise Linux 5
  • Red Hat Enterprise Linux 4
  • Oracle 9i
  • Oracle 10g
  • Oracle 11g

Resolution

  • Please see the subsections below for kernel tuning recommendations.
  • Note: The values in this article are not one-size-fits-all. They have been shown to have a positive effect on database workloads running on Red Hat Enterprise Linux, but the optimal settings depend on the specific environment and may require further adjustment.

Memory settings in /etc/sysctl.conf:

  • Swapping is not ideal for Oracle and should be avoided as much as possible. The following setting makes the kernel swap less aggressively (a consolidated /etc/sysctl.conf sketch covering all of the settings in this section follows the list):
vm.swappiness=10
  • Percentage of total memory at which background writeback of dirty pages begins:

For example, if a system has 1000 pages of memory and dirty_background_ratio is set to 3%, background writeback will begin once 30 pages have been dirtied.

vm.dirty_background_ratio=3
  • Maximum percentage of total memory that can have dirty pages:

If it is set to 15% on a 1000-page system, a process dirtying pages will be made to wait (and to write out dirty pages itself) once the 150th page is dirtied.
This mechanism slows the dirtying of pages while the system catches up.

vm.dirty_ratio=15
  • How long data can remain in the page cache before it is expired (in hundredths of a second):
vm.dirty_expire_centisecs=500
  • How often pdflush is woken to write back dirty pages (in hundredths of a second):
vm.dirty_writeback_centisecs=100
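
These memory settings can be collected in /etc/sysctl.conf and applied without a reboot. A minimal sketch using only the values discussed above; running sysctl -p as root re-reads /etc/sysctl.conf and applies the values:

vm.swappiness=10
vm.dirty_background_ratio=3
vm.dirty_ratio=15
vm.dirty_expire_centisecs=500
vm.dirty_writeback_centisecs=100

# sysctl -p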

HugePages

  • Obtain the Hugepagesize from /proc/meminfo
$ grep Huge /proc/meminfo
HugePages_Total:       0
HugePages_Free:        0
HugePages_Rsvd:        0
Hugepagesize:       2048 kB
  • Using the Hugepagesize and the sizes of your SGA and PGA, calculate the recommended number of HugePages (a shell sketch of this calculation follows the list):
(SGA+PGA+(20KB * # of Oracle processes running)) / 2MB
  • For example:
(20GB SGA + 10GB PGA + (20KB * 1,000 Oracle processes)) / 2MB = 15369
  • /etc/sysctl.conf setting:
vm.nr_hugepages=15369
  • Once the above setting has been tested and shown to provide the performance required, it is recommended to move HugePage allocation to the kernel line in /boot/grub/grub.conf as a boot option. This allows for earlier and cleaner allocation of HugePages:
kernel /vmlinuz-<kernel-version> ro root=/dev/vg01/lv01 hugepages=15369 <other boot options>
  • Change the Oracle database configuration to use HugePages. Contact Oracle Support if assistance is needed.
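
As an illustrative sketch of the calculation above (the SGA, PGA, and process-count figures are the example values from this section, not recommendations), the arithmetic can be done directly in a shell:

# Example values only: 20 GB SGA, 10 GB PGA, 1,000 Oracle processes, 2 MB huge pages
SGA_MB=20480
PGA_MB=10240
PROCS=1000
HPG_SZ_MB=2
# (SGA + PGA + 20 KB per Oracle process) divided by the huge page size
echo $(( (SGA_MB + PGA_MB + PROCS * 20 / 1024) / HPG_SZ_MB ))

This prints 15369, matching the vm.nr_hugepages value used above.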

Limits setting in /etc/security/limits.conf:

  • Note: In order for the Oracle database to use Huge Pages on Red Hat Enterprise Linux 4, 5 or 6, you may also need to increase the ulimit parameter "memlock" for the oracle user in /etc/security/limits.conf. The memlock setting is specified in KB and must cover the total size of the Huge Pages that Oracle should be able to allocate. So, if the Oracle database should be able to use 512 Huge Pages, then memlock must be set to at least 512 * Hugepagesize, which on a default system is 1048576 KB (512 * 2048). A worked example using the HugePages count from the previous section follows the settings below.
oracle     soft     memlock     1048576  
oracle     hard     memlock     1048576
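
Carrying the earlier HugePages example forward (15369 pages of 2048 KB each, both illustrative figures from above), the minimum memlock value in KB can be computed the same way:

# 15369 HugePages * 2048 KB per page = minimum memlock in KB
echo $(( 15369 * 2048 ))
31475712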

Shared Memory

  • In a terminal window, obtain the total memory from the system:
mem=$(free|grep Mem|awk '{print $2}')
  • Convert the value of $mem to bytes:
totmem=$(echo "$mem*1024"|bc)
  • Get the Hugepagesize from /proc/meminfo:
huge=$(grep Hugepagesize /proc/meminfo|awk '{print $2}')
  • Calculate 75% of the total memory on the system for SHMMAX:
max=$(echo "$totmem*75/100"|bc)
  • Divide the SHMMAX value by the Hugepagesize to get SHMALL:
all=$(echo "$max/$huge"|bc)
  • Set the SHMMAX value in the /etc/sysctl.conf file:
echo "kernel.shmmax = $max" >> /etc/sysctl.conf
  • Set the SHMALL value in the /etc/sysctl.conf file:
echo "kernel.shmall = $all" >> /etc/sysctl.conf
  • Set the maximum number of shared memory segments with SHMMNI in /etc/sysctl.conf (a quick verification example follows the setting below):
kernel.shmmni=4096
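
Once these values are in /etc/sysctl.conf, they can be loaded and checked from a terminal. A small sketch; ipcs -lm prints the shared memory limits currently in effect:

# sysctl -p
# ipcs -lm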

Semaphores

  • Recommended minimums for the semaphore parameters in /etc/sysctl.conf:
kernel.sem = 250 32000 100 128
  • The first value, SEMMSL, is the maximum number of semaphores per semaphore set
  • The second value, SEMMNS, defines the total number of semaphores for the system
  • The third value, SEMOPM, defines the maximum number of semaphore operations per semaphore call
  • The last value, SEMMNI, defines the maximum number of semaphore sets for the system
# sysctl -w kernel.sem="250 32000 100 128"
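
The semaphore limits currently in effect can be read back from the proc interface or with ipcs, for example:

# cat /proc/sys/kernel/sem
250     32000   100     128
# ipcs -ls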

Open file descriptors for oracle user

# vi /etc/security/limits.conf
#<domain>      <type>  <item>         <value>
oracle          hard    nofile          10000
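
To confirm that the limit is picked up by new sessions, the hard limit can be checked from a shell running as the oracle user (assuming the oracle account exists on the system):

# su - oracle -c 'ulimit -Hn'
10000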

Disabling transparent hugepages (THP), recommended only for Red Hat Enterprise Linux 6

  • Disable the tuned and ktune services if they are enabled.
    Eg:
# service tuned stop
# chkconfig tuned off
# service ktune stop
# chkconfig ktune off

OR

# tuned-adm off

Note: The tuned-adm off command will revert your settings to what they were before tuned started and prevent the tuning services from running at boot.

  • Append "transparent_hugepage=never" to the kernel command line in the /boot/grub/grub.conf file.
    Eg:
kernel /boot/vmlinuz-2.6.32-358.el6.x86_64 ro root=UUID=a216d1e5-884f-4e5c-859a-6e2e2530d486 rhgb quiet transparent_hugepage=never
  • Reboot the server for the changes to take effect (a quick verification sketch follows this list).
    Eg:
# reboot
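
After the reboot, the THP state can be verified from the kernel command line and the sysfs interface. A small sketch; note that on Red Hat Enterprise Linux 6 the sysfs path is typically /sys/kernel/mm/redhat_transparent_hugepage/enabled, while other kernels use /sys/kernel/mm/transparent_hugepage/enabled:

# cat /proc/cmdline
# cat /sys/kernel/mm/redhat_transparent_hugepage/enabled

The word "never" should appear in square brackets in the second command's output when THP is disabled.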

I/O scheduler
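
The referenced best-practices document covers I/O scheduler selection in more detail. As an illustrative sketch only (sda is a placeholder device name; the deadline elevator is commonly suggested as a starting point for database workloads, but the right choice depends on the storage and workload), the active scheduler can be inspected and changed at runtime, and made persistent with the elevator= boot option:

# cat /sys/block/sda/queue/scheduler
# echo deadline > /sys/block/sda/queue/scheduler

kernel /vmlinuz-<kernel-version> ro root=/dev/vg01/lv01 elevator=deadline <other boot options>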

Reference

2013 - Deploying Oracle RAC 11g R2 Database on Red Hat Enterprise Linux 6 - Best Practices


Comments

Question about the huge pages value calculation: my understanding is that PGA does not use huge pages, so why is that included in the calculation? Elsewhere, for instance on some Oracle sites, I've seen a hugepages_setting.sh script that does its calculations from ipcs.

Hi Anne,

You are right that the PGA does not use huge pages. The goal of setting up huge pages is to provide the correct number of pages to back the shared memory segments used by the SGA. An easy way to calculate the number of huge pages without using the hugepages_settings.sh script created by Oracle is to take the size of your SGA in kilobytes and divide it by the huge page size reported by the OS. In RHEL, the huge page size is 2048 kB, so the calculation is: SGA in kilobytes / 2048 (huge page size). Once you have this number, add a few additional huge pages just to ensure all of the SGA can fit in large pages.

After you have set this up, you also need to set the memlock parameter in a conf file under the /etc/security/limits.d/ directory. Setting memlock allows the oracle user to lock a certain amount of physical RAM so that it is not swapped out. The value is expressed in kilobytes and matters from the Oracle perspective because it gives the oracle user permission to use huge pages.

So the next question might be: how do I know Oracle is using the huge pages? You can check /proc/meminfo and watch HugePages_Total and HugePages_Free; the free count goes down when you start your Oracle database instance. The other place to look is the Oracle alert log. Search the alert log for the word 'Large' and you should see something like:

************************ Large Pages Information *******************
Thu Jan 16 16:55:07 2014
Per process system memlock (soft) limit = UNLIMITED
Thu Jan 16 16:55:07 2014

Total System Global Area in large pages = 19 GB (100%)
Thu Jan 16 16:55:07 2014

Large pages used by this instance: 9665 (19 GB)
Thu Jan 16 16:55:07 2014
Large pages unused system wide = 3 (6144 KB)
Thu Jan 16 16:55:07 2014
Large pages configured system wide = 9668 (19 GB)
Thu Jan 16 16:55:07 2014
Large page size = 2048 KB
Thu Jan 16 16:55:07 2014


In this example the key line is "Total System Global Area in large pages = 19 GB (100%)", as it shows that the entire SGA is in large pages.
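
A small sketch of the calculation described above (the SGA size below is a hypothetical 20 GB, and the cushion of 8 extra pages is just an illustrative "few additional huge pages"; substitute your own values):

# Hypothetical SGA size in kilobytes (20 GB)
SGA_KB=20971520
# Huge page size reported by the kernel, in kB (2048 on RHEL by default)
HPG_SZ=$(awk '/Hugepagesize/ {print $2}' /proc/meminfo)
# SGA divided by the huge page size, plus a small cushion of extra pages
echo $(( SGA_KB / HPG_SZ + 8 ))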

As a shameless plug check out my Deploying Oracle 11gR2 on RHEL6 reference architectures at: https://access.redhat.com/site/articles/395013 and my Deploying Oracle 11gR2 RAC on RHEL6 at: https://access.redhat.com/site/articles/479093

If you have any other questions, feel free to reply to this. :)

Hope that helped and have a great weekend,

Roger

Forgot to mention, the value of memlock should be slightly bigger than the size of the Oracle SGA in kilobytes.

Thank you very much for your responses. A couple of questions: Any strong reason not to set memlock for Oracle to unlimited (as shown in your log example)? Is there a really great risk of Oracle getting out of hand and fixing more memory than its SGA? This would certainly reduce the need to change this setting every time SGA is increased.
And, yes, I have seen at least one of your articles mentioned above, which brings me to my next question. A couple of the settings you recommend differ significantly from those in this article, the most dramatic difference being dirty_ratio, which you recommend raising significantly from the default, while this article recommends lowering it. Did you specifically have RHEL 6 in mind when you made that recommendation, or would you make the same recommendation for RHEL 5? I have seen KB article 348683, which seems to imply that the setting should be done in bytes, not as a ratio, when using huge pages on RHEL 5 - another thing for the huge pages to-do list?

Hi Anne,

I'm not sure why the log shows the memlock value as unlimited, because I definitely have a value set. I don't think there is a great risk of Oracle getting out of hand and fixing more memory than its SGA. Oracle is just looking to see if large pages are available and if it can accommodate all the memory of the SGA into large pages. As for the value set for memlock, I'd set it to slightly larger than what you believe will be your largest SGA at any point in time. This way all your use cases with a smaller SGA are taken care of.

With regard to dirty_ratio, the reason I increased the value from the commonly seen 15 to 80 is based on my understanding of the parameter. dirty_ratio defines the maximum percentage of total memory that can be filled with dirty pages before user processes are forced to write dirty buffers themselves during their time slice instead of being allowed to do more writes. Note that all processes are blocked for writes when this happens, not just the one that filled the write buffers. This can cause what is perceived as unfair behavior, where a single process can hog all the I/O on a system. However, an application that can tolerate having its writes blocked altogether might benefit from decreasing the value.

The biggest thing I want to point out about kernel parameters in general is that there is no perfect answer. Every workload is different and requires performance testing. These values should just be seen as a starting point for your workload. The behavior of this parameter to my knowledge is the same for RHEL5 and RHEL6 so it would apply to both distributions.

The vm.dirty_background_bytes and vm.dirty_bytes parameters are something I'm looking into further but have not tested personally. Because of that, I'm sticking with ratios until I'm able to do some performance testing comparing the ratio and bytes parameters.

Hope that helps and if you have any other questions feel free to ask :)

Take care,

Roger

The grub modification can be done by issuing the following command:
grubby --update-kernel=ALL --args="transparent_hugepage=never"
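
To confirm the argument was added (the first command checks the GRUB configuration, the second shows the running kernel's command line after the next reboot):

# grep transparent_hugepage /boot/grub/grub.conf
# cat /proc/cmdline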

(via eminguez)


