RHEL 6.4 - 6.5: data corruption when multiple NFSv4 clients write to the same NFS file on an NFSv4 server with write delegations
Updated March 13, 2014 at 23:01
Issue
- Data corruption when multiple NFS clients write to the same NFS file on an NFS server with write delegations, introduced in 2.6.32-358.23.1.el6
- Data corruption when appending to an NFS file
Environment
- Red Hat Enterprise Linux 6 (NFS client)
- kernel 2.6.32-358.23.1.el6 or later (RHEL 6.4.x)
- kernel 2.6.32-408.el6 or later (RHEL 6.5)
- NFS Server with write delegations
- Seen on Solaris or NetApp NFS server
- NFSv4
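The kernel ranges above can be checked on a client with a short script. This is only a sketch: the function names are made up for illustration, the comparison relies on GNU `sort -V`, and no upper bound is applied to the RHEL 6.5 range because this article does not name a fixed kernel.

```sh
#!/bin/sh
# Sketch: classify a kernel release string against the affected ranges
# listed in this article. Function names are illustrative only.

# True if version $1 sorts at or after version $2 (relies on GNU `sort -V`).
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# True for RHEL 6 kernels at or after 2.6.32-358.23.1.el6 within the
# 358 (RHEL 6.4) series, or at or after 2.6.32-408.el6 (RHEL 6.5).
# No upper bound is applied for RHEL 6.5: the article names no fixed kernel.
kernel_in_affected_range() {
    case "$1" in
        2.6.32-*.el6*) rel=${1%%.el6*} ;;   # drop ".el6" and any arch suffix
        *) return 1 ;;
    esac
    if version_ge "$rel" "2.6.32-408"; then
        return 0
    fi
    version_ge "$rel" "2.6.32-358.23.1" && ! version_ge "$rel" "2.6.32-359"
}

# Print "affected" or "not-affected" for a release string (default: uname -r).
kernel_status() {
    if kernel_in_affected_range "${1:-$(uname -r)}"; then
        echo affected
    else
        echo not-affected
    fi
}

kernel_status
```

A result of "affected" only means the kernel version matches; the problem additionally requires an NFSv4 mount from a server that grants write delegations.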
Resolution
- A fix is currently in progress, tracked by private Red Hat bugs 1054493 (RHEL6.6 or later) and 1066942 (RHEL6.5 maintenance kernel). Contact your Red Hat Support representative for more information.
- Upstream commit 263b4509ec4d47e0da3e753f85a39ea12d1eff24 (nfs: always make sure page is up-to-date before extending a write to cover the entire page) addresses this problem.
Workaround
- Downgrade to a kernel earlier than 2.6.32-358.23.1.el6 (RHEL 6.4.x) or earlier than 2.6.32-408.el6 (RHEL 6.5).
- Use NFSv3.
- Disable NFSv4 write delegations on the NFS server.
- For a NetApp server, contact NetApp support for official recommendations. Unofficially, the option 'nfs.v4.write_delegation' should show whether write delegations are enabled.
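For the NFSv3 workaround, it helps to know which client mounts currently use NFSv4. The sketch below reads /proc/mounts and prints the matching mount points. The function name is hypothetical, and the server name and export path in the comments are placeholders.

```sh
#!/bin/sh
# Sketch: print the mount point of every NFSv4 mount found in a mounts table
# (default /proc/mounts). The function name is illustrative only.
list_nfsv4_mounts() {
    awk '$3 == "nfs4" || ($3 == "nfs" && $4 ~ /(^|,)(nfs)?vers=4/) { print $2 }' \
        "${1:-/proc/mounts}"
}

if [ -r /proc/mounts ]; then
    list_nfsv4_mounts
fi

# To move one of the listed mounts to NFSv3, unmount it and mount it again
# with the vers=3 option (run as root). Server name and export path are
# placeholders:
#   umount /nfs
#   mount -t nfs -o vers=3 nfsserver.example.com:/export /nfs
```

Any matching entry in /etc/fstab would need the same vers=3 option for the change to persist across reboots.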
Root Cause
- This is a regression caused by commit c7559663e42f4294ffe31fe159da6b6a66b35d61 ([fs] nfs: Allow nfs_updatepage to extend a write under additional circumstances).
- When determining whether to extend a write to cover an entire page in memory, the writer needs to determine whether the page is up-to-date. Commit c7559663e42f4294ffe31fe159da6b6a66b35d61 added logic to skip this check when the writer was holding a write delegation. Not reading the contents of the entire page first could cause data corruption when the page was written out to disk.
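The effect can be illustrated with a toy model that uses local files in place of the server file and the client's cached page. This is not kernel code. It assumes the cached page on the second client holds 10 stale bytes (shown as '?'), and the "fixed" mode reflects one reading of the upstream commit title: when the page is not up to date, the write is not extended and only the appended range is sent. The function name is hypothetical.

```sh
#!/bin/sh
# Toy model of the regression; plain local files stand in for the server
# file and for the second client's cached page.
# Usage: simulate_append buggy|fixed   (prints the resulting "server" file)
simulate_append() {
    mode=$1
    dir=$(mktemp -d) || return 1
    server="$dir/server"
    page="$dir/page"

    printf '123456789\n' > "$server"   # client 1's data, already on the server
    printf '??????????' > "$page"      # client 2's cached page: stale, not up to date

    # Client 2 appends: the new data lands in its cached page at offset 10.
    printf 'abcdefghi\n' | dd of="$page" bs=1 seek=10 conv=notrunc 2>/dev/null

    if [ "$mode" = buggy ]; then
        # Delegation held, up-to-date check skipped: the write is extended
        # to the start of the page (WRITE at 0 for 20).
        off=0; len=20
    else
        # Page is not up to date, so the write is not extended
        # (WRITE at 10 for 10).
        off=10; len=10
    fi
    dd if="$page" of="$server" bs=1 skip="$off" seek="$off" count="$len" \
        conv=notrunc 2>/dev/null

    cat "$server"
    rm -rf "$dir"
}

simulate_append buggy
simulate_append fixed
```

In buggy mode the first 10 bytes of the server file are replaced by the stale page contents, which matches the "at 0 for 20" WRITE seen in the packet capture in the diagnostic steps. In fixed mode the original 10 bytes survive.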
Diagnostic Steps
Reproducer
- On the NFS server, start a packet capture
snoop -o capturefile nfsclient2
- From nfsclient1
echo 123456789 > /nfs/newfile
- From nfsclient2
echo abcdefghi >> /nfs/newfile
- The resulting file is corrupted and looks like this:
$ hexdump -C newfile
00000000  74 63 68 30 33 2e 61 74  6c 61 61 62 63 64 65 66  |tch03.atlaabcdef|
00000010  67 68 69 0a                                       |ghi.|
- snoop -i capturefile -V -x 0 (use -v for more verbosity) shows the following in the write packet:
NFS C 4 () PUTFH FH=BB66 WRITE ST=1DBE:0 at 0 for 20
240: 0002 0000 0014 7463 6830 332e 6174 6c61 ......tch03.atla
256: 6162 6364 6566 6768 690a abcdefghi.
- The second NFS client writes 10 bytes of garbage over the original 123456789[newline] and then writes abcdefghi[newline]. The file therefore ends up with the expected size (20 bytes), but the original 10 bytes of content are overwritten with 10 bytes of junk, followed by the 10 bytes of abcdefghi[newline].
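The outcome of the reproducer can also be checked without reading a hex dump. The sketch below compares a file byte for byte with the content the two echo commands should produce. The function name is hypothetical, and /nfs/newfile is the path used in the reproducer steps.

```sh
#!/bin/sh
# Sketch: compare the reproducer file with the 20 bytes that the two echo
# commands should have produced. The function name is illustrative only.
check_reproducer_file() {
    if [ ! -r "$1" ]; then
        echo missing
    elif printf '123456789\nabcdefghi\n' | cmp -s - "$1"; then
        echo ok
    else
        echo corrupted
    fi
}

# Example, run on either client after both writes have completed:
#   check_reproducer_file /nfs/newfile
```

A result of "corrupted" together with a 20-byte file size is consistent with the overwrite described above.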