Understanding Linux buffers/cached (part I)

For a while now we've been trying to gain a better understanding of the Linux buffer/page cache. With recent kernel versions we have noticed an increase in kswapd CPU usage on our database servers and have been trying to understand where it comes from.

As a first step we wanted to get a better idea of what kind of disk activity makes use of the buffer/page cache. This utilisation can be seen by running:
# free -m
             total       used       free     shared    buffers     cached
Mem:         32240        123      32116          0          0          3
-/+ buffers/cache:         119      32120
Swap:         9546          21       9524
In this case the server has 32G of RAM and practically no buffer or cache usage (the -/+ buffers/cache line shows used and free after subtracting buffers and cached, i.e. the memory actually used by applications). The kernel is 2.6.32.6.

The kernel /proc documentation (Documentation/filesystems/proc.txt) has the following to say about buffers and cached:
Buffers: Relatively temporary storage for raw disk blocks
         shouldn't get tremendously large (20MB or so)
 Cached: in-memory cache for files read from the disk (the
         pagecache). Doesn't include SwapCached
To verify this we performed various tests, using dd to copy a 16G chunk to/from various sources/destinations:
  • virtual devices (/dev/zero, /dev/null)
  • real devices (/dev/sdb (SATA), /dev/sdc (SAS))
  • files on a filesystem (/mnt/sdc1/file.in (xfs), /mnt/sdb1/file.out (ext3))
For now I'm using 16G chunks because I want to make sure that no cache cleanup is required during the tests (i.e. both input and output can fit in memory concurrently). Before each test I cleared the Linux caches using the following command
# echo 3 > /proc/sys/vm/drop_caches
to ensure that any cache contents during or after the test were in fact put there by the test. I also ran free every 10 seconds during each test and captured the first and last outputs to show how cache usage changed.
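
For reference, each test run boiled down to something like the following sketch (the sampler loop and the dd line are illustrative, not the exact script we used; note that drop_caches only discards clean pages, hence the sync first):

sync                                               # flush dirty pages so they can be dropped
echo 3 > /proc/sys/vm/drop_caches                  # drop pagecache, dentries and inodes
( while sleep 10; do free | grep Mem:; done ) &    # sample cache usage every 10 seconds
SAMPLER=$!
dd if=/dev/zero of=/dev/null bs=1024k count=16384  # the test itself
kill $SAMPLER                                      # stop the sampler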

Here are the test results. For each test, the two Mem: lines are the first and last free samples (total/used/free/shared/buffers/cached, in kB), followed by the dd summary:
1- dd if=/dev/zero of=/dev/null bs=1024k count=16384
Mem: 33014136 129440 32884696 0 296 4044
Mem: 33014136 132016 32882120 0 352 4784
17179869184 bytes (17 GB) copied, 2.33022 s, 7.4 GB/s

2- dd if=/dev/zero of=/dev/sdb2 bs=1024k count=16384
Mem: 33014136 138268 32875868 0 6428 4220
Mem: 33014136 18763244 14250892 0 16777668 4712
17179869184 bytes (17 GB) copied, 174.92 s, 98.2 MB/s

3- dd if=/dev/zero of=/mnt/sdb1/file.out bs=1024k count=16384
Mem: 33014136 126256 32887880 0 104 4308
Mem: 33014136 17297180 15716956 0 17668 4528
17179869184 bytes (17 GB) copied, 201.586 s, 85.2 MB/s

4- dd if=/dev/sdc2 of=/dev/null bs=1024k count=16384
Mem: 33014136 968960 32045176 0 1976 4424
Mem: 33014136 20255872 12758264 0 15836544 814648
17179869184 bytes (17 GB) copied, 93.7654 s, 183 MB/s

5- dd if=/dev/sdc2 of=/dev/sdb2 bs=1024k count=16384
Mem: 33014136 128228 32885908 0 192 4616
Mem: 33014136 32944064 70072 0 28176400 4524
17179869184 bytes (17 GB) copied, 170.546 s, 101 MB/s

6- dd if=/dev/sdc2 of=/mnt/sdb1/file.out bs=1024k count=16384
Mem: 33014136 129748 32884388 0 120 4260
Mem: 33014136 32946356 67780 0 14586200 14619404
17179869184 bytes (17 GB) copied, 207.602 s, 82.8 MB/s

7- dd if=/mnt/sdc1/file.in of=/dev/null bs=1024k count=16384
Mem: 33014136 132580 32881556 0 136 4780
Mem: 33014136 16410388 16603748 0 212 16253484
17179869184 bytes (17 GB) copied, 82.7158 s, 208 MB/s

8- dd if=/mnt/sdc1/file.in of=/dev/sdb2 bs=1024k count=16384
Mem: 33014136 126680 32887456 0 232 4040
Mem: 33014136 32941804 72332 0 15528188 15531504
17179869184 bytes (17 GB) copied, 171.806 s, 100 MB/s

9- dd if=/mnt/sdc1/file.in of=/mnt/sdb1/file.out bs=1024k count=16384
Mem: 33014136 130808 32883328 0 128 4600
Mem: 33014136 32944700 69436 0 18676 32287016
17179869184 bytes (17 GB) copied, 204.688 s, 83.9 MB/s
Here's what I conclude from these results:
  • I/O to/from virtual devices doesn't really impact buffer/page cache;
  • I/O to/from real devices uses the buffer cache;
  • I/O to/from files uses the page cache (maybe buffers as well but they are freed?).
Other observations:
  • Writing to /dev/sdb2: ~100MB/s (expected);
  • Writing to the /mnt/sdb1 ext3 filesystem: 82-85MB/s (expected to be slower than the raw device);
  • Reading from /dev/sdc2 (183MB/s) turned out slower than reading from the /mnt/sdc1 xfs filesystem (208MB/s), which we can't explain yet.
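
A quick way to see which of the two pools a given workload grows is to watch the kernel's counters directly (Buffers, Cached and Dirty are standard /proc/meminfo fields; the 10-second interval matches our sampling):

watch -n 10 'grep -E "^(Buffers|Cached|Dirty):" /proc/meminfo'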
In my next post I'll use 32G chunks to see how the different caches "compete" against each other and how cache cleanup (kswapd) affects performance.

Understanding Linux buffers/cached (part II)

As mentioned in our previous post, we reran the tests with 32G chunks. The results were similar across the board and no significant performance degradation was noticed.

So we decided to concentrate on the test that really matters to us: copying large files from one filesystem to another. In our case specifically this is from a SATA disk to a SAS disk.

We devised a new test in which we copied a 32G file from one filesystem to another. In the first run we cleared the buffer cache before the copy. In the second we "polluted" the buffer cache beforehand with the contents of a different 32G file. We also added a "progress indicator" (based on the size of the destination file; see the sketch below) to check whether the copy speed remained constant.
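
The progress indicator script itself is not shown in the output below, but it amounted to something like this sketch (the 50-column bar width, file path and 10-second interval are assumptions chosen to match the captured output):

TOTAL=$((32 * 1024 * 1024 * 1024))            # expected size of the destination file
while sleep 10; do
    SZ=$(stat -c %s /mnt/sdb1/file.out 2>/dev/null || echo 0)
    N=$((SZ * 50 / TOTAL))                    # filled columns of the 50-column bar
    printf '[%-50s] ' "$(printf '%*s' "$N" '' | tr ' ' '#')"
    free | grep Mem:
done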

Here are the results. The ps output before and after each run shows the CPU time accumulated by the four kswapd threads; each line in between is the progress bar followed by a 10-second free sample (Mem: total/used/free/shared/buffers/cached, in kB):
root 676 2 0 Mar01 ? 00:05:09 [kswapd0]
root 677 2 0 Mar01 ? 00:04:02 [kswapd1]
root 678 2 0 Mar01 ? 00:04:21 [kswapd2]
root 679 2 0 Mar01 ? 00:06:30 [kswapd3]
root 13597 13589 0 08:55 pts/1 00:00:00 grep kswapd

time cp -a /mnt/sdc1/file.in /mnt/sdb1/file.out
[ ] Mem: 33014136 174576 32839560 0 34576 7452
[## ] Mem: 33014136 3740720 29273416 0 36324 3514124
[#### ] Mem: 33014136 5812776 27201360 0 37344 5553932
[##### ] Mem: 33014136 7512568 25501568 0 38184 7226332
[###### ] Mem: 33014136 9180456 23833680 0 39020 8868640
[####### ] Mem: 33014136 10854124 22160012 0 39848 10515264
[######### ] Mem: 33014136 12530548 20483588 0 40680 12164880
[########## ] Mem: 33014136 14217816 18796320 0 41512 13824832
[########### ] Mem: 33014136 15925228 17088908 0 42356 15504192
[############ ] Mem: 33014136 17619376 15394760 0 43204 17170640
[############## ] Mem: 33014136 19333432 13680704 0 44052 18857744
[############### ] Mem: 33014136 21029260 11984876 0 44888 20525884
[################ ] Mem: 33014136 22713904 10300232 0 45724 22183624
[################# ] Mem: 33014136 24412944 8601192 0 46568 23854636
[################### ] Mem: 33014136 26094416 6919720 0 47404 25509428
[#################### ] Mem: 33014136 27780548 5233588 0 48236 27169612
[##################### ] Mem: 33014136 29482868 3531268 0 49084 28843852
[###################### ] Mem: 33014136 31173420 1840716 0 49920 30507344
[####################### ] Mem: 33014136 32863196 150940 0 50760 32169592
[######################### ] Mem: 33014136 32944236 69900 0 20880 32278936
[########################## ] Mem: 33014136 32946780 67356 0 21720 32281000
[########################### ] Mem: 33014136 32945324 68812 0 22572 32278828
[############################ ] Mem: 33014136 32949628 64508 0 23412 32281148
[############################## ] Mem: 33014136 32948952 65184 0 24212 32279692
[############################### ] Mem: 33014136 32948996 65140 0 25044 32279332
[################################ ] Mem: 33014136 32946632 67504 0 25788 32274668
[################################# ] Mem: 33014136 32948208 65928 0 26768 32275636
[################################### ] Mem: 33014136 32943144 70992 0 27592 32269580
[#################################### ] Mem: 33014136 32950200 63936 0 28432 32275752
[##################################### ] Mem: 33014136 32943692 70444 0 29288 32268232
[###################################### ] Mem: 33014136 32947748 66388 0 30108 32272304
[######################################## ] Mem: 33014136 32950288 63848 0 30944 32273396
[######################################### ] Mem: 33014136 32942008 72128 0 31996 32260860
[########################################## ] Mem: 33014136 32949476 64660 0 32812 32267992
[########################################### ] Mem: 33014136 32950176 63960 0 33644 32268792
[############################################# ] Mem: 33014136 32944552 69584 0 34488 32262224
[############################################## ] Mem: 33014136 32949952 64184 0 35316 32266516
[############################################### ] Mem: 33014136 32948532 65604 0 36120 32264488
[################################################ ] Mem: 33014136 32946364 67772 0 36792 32260784
[################################################# ] Mem: 33014136 32945344 68792 0 37616 32258632
root 676 2 0 Mar01 ? 00:05:12 [kswapd0]
root 677 2 0 Mar01 ? 00:04:05 [kswapd1]
root 678 2 0 Mar01 ? 00:04:24 [kswapd2]
root 679 2 0 Mar01 ? 00:06:33 [kswapd3]
root 13822 13589 0 09:02 pts/1 00:00:00 grep kswapd

real 7m28.263s
user 0m0.996s
sys 1m36.458s
Between the two runs we polluted the cache by reading a different 32G file with dd:
32768+0 records in
32768+0 records out
34359738368 bytes (34 GB) copied, 307.01 s, 112 MB/s
root 676 2 0 Mar01 ? 00:05:12 [kswapd0]
root 677 2 0 Mar01 ? 00:04:05 [kswapd1]
root 678 2 0 Mar01 ? 00:04:25 [kswapd2]
root 679 2 0 Mar01 ? 00:06:33 [kswapd3]
root 13833 13589 0 09:08 pts/1 00:00:00 grep kswapd

time cp -a /mnt/sdc1/file.in /mnt/sdb1/file.out
[ ] Mem: 33014136 32944112 70024 0 67444 32663080
[## ] Mem: 33014136 32945600 68536 0 69152 32613868
[### ] Mem: 33014136 32946012 68124 0 69924 32590048
[#### ] Mem: 33014136 32950248 63888 0 70752 32571628
[###### ] Mem: 33014136 32948868 65268 0 71584 32546728
[####### ] Mem: 33014136 32948012 66124 0 72300 32522152
[######## ] Mem: 33014136 32945536 68600 0 73132 32498400
[######### ] Mem: 33014136 32950892 63244 0 73960 32481168
[########### ] Mem: 33014136 32946804 67332 0 74800 32453096
[############ ] Mem: 33014136 32950404 63732 0 75648 32432716
[############# ] Mem: 33014136 32951148 62988 0 76604 32408564
[############## ] Mem: 33014136 32942296 71840 0 77448 32378208
[################ ] Mem: 33014136 32948272 65864 0 78284 32359056
[################# ] Mem: 33014136 32948044 66092 0 79116 32337364
[################## ] Mem: 33014136 32949352 64784 0 71808 32322048
[################### ] Mem: 33014136 32951664 62472 0 72660 32307852
[##################### ] Mem: 33014136 32948476 65660 0 73504 32287968
[###################### ] Mem: 33014136 32944516 69620 0 74332 32265644
[####################### ] Mem: 33014136 32946920 67216 0 74620 32253148
[######################## ] Mem: 33014136 32948564 65572 0 70044 32248776
[######################### ] Mem: 33014136 32945656 68480 0 70864 32234052
[########################### ] Mem: 33014136 32946396 67740 0 71720 32222548
[############################ ] Mem: 33014136 32949044 65092 0 56092 32240792
[############################# ] Mem: 33014136 32945180 68956 0 56928 32235440
[############################## ] Mem: 33014136 32951796 62340 0 57696 32239140
[################################ ] Mem: 33014136 32950372 63764 0 58488 32237256
[################################# ] Mem: 33014136 32949888 64248 0 59288 32235072
[################################## ] Mem: 33014136 32941704 72432 0 60048 32226888
[################################### ] Mem: 33014136 32943220 70916 0 60860 32224908
[##################################### ] Mem: 33014136 32943984 70152 0 61680 32225116
[###################################### ] Mem: 33014136 32943844 70292 0 62492 32224848
[####################################### ] Mem: 33014136 32951580 62556 0 63260 32231156
[######################################## ] Mem: 33014136 32944956 69180 0 64060 32223200
[########################################## ] Mem: 33014136 32946480 67656 0 64880 32223108
[########################################### ] Mem: 33014136 32943476 70660 0 65692 32219428
[############################################ ] Mem: 33014136 32943184 70952 0 66504 32219832
[############################################# ] Mem: 33014136 32944148 69988 0 67308 32218456
[############################################### ] Mem: 33014136 32949336 64800 0 68140 32221992
[################################################ ] Mem: 33014136 32949608 64528 0 68960 32222100
[################################################# ] Mem: 33014136 32948848 65288 0 69784 32219240
root 676 2 0 Mar01 ? 00:05:18 [kswapd0]
root 677 2 0 Mar01 ? 00:04:11 [kswapd1]
root 678 2 0 Mar01 ? 00:04:31 [kswapd2]
root 679 2 0 Mar01 ? 00:06:39 [kswapd3]
root 14041 13589 0 09:15 pts/1 00:00:00 grep kswapd

real 7m28.182s
user 0m0.808s
sys 1m35.530s
In both cases the total time is about equal (roughly 7m28s), but what's interesting is the CPU time spent in kswapd: summing the deltas across the four kswapd threads gives 12 seconds for the first run and 24 seconds for the second. This is precisely the type of activity that we are interested in understanding.

In this case it makes sense: since the cache was already full ("polluted") at the start of the second run, kswapd had to do more work evicting old cache pages to make room for the new ones.
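
Rather than eyeballing the TIME column of ps, the accumulated CPU time of each kswapd thread can also be read straight from /proc (a minimal sketch; utime and stime are fields 14 and 15 of /proc/<pid>/stat per proc(5)):

HZ=$(getconf CLK_TCK)                       # clock ticks per second
for p in $(pgrep kswapd); do
    # field 2 is the thread name; fields 14/15 are utime/stime in ticks
    awk -v hz="$HZ" '{printf "%s %.1fs\n", $2, ($14 + $15) / hz}' "/proc/$p/stat"
done

Running this before and after a copy and subtracting gives the per-run kswapd cost without the one-second granularity of ps.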
