RHEL 6.4 - 6.5: data corruption when multiple NFSv4 clients write to the same NFS file on an NFSv4 server with write delegations

Updated March 13, 2014 at 23:01

Issue

  • Data corruption when multiple NFS clients write to the same NFS file on an NFS server with write delegations, introduced in 2.6.32-358.23.1.el6
  • Data corruption when appending to an NFS file

Environment

  • Red Hat Enterprise Linux 6 (NFS client)
    • kernel 2.6.32-358.23.1.el6 or later (RHEL 6.4.x)
    • kernel 2.6.32-408.el6 or later (RHEL 6.5)
  • NFS server with write delegations enabled
    • Seen on Solaris and NetApp NFS servers
  • NFSv4
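
To check whether a client kernel falls inside the ranges listed above, something like the following could be used. This is a minimal sketch: the function names ver_ge and is_listed_affected are hypothetical, it relies on GNU sort -V for version ordering, and it only knows the lower bounds given in this article, so it cannot tell whether a later kernel already carries the fix.

```shell
#!/bin/bash
# Hypothetical helpers (not shipped with RHEL). They only encode the lower
# bounds listed in this article and know nothing about fixed kernels.

# ver_ge A B: true if version A sorts at or after version B (GNU sort -V).
ver_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# is_listed_affected RELEASE: RELEASE is a string such as `uname -r` prints.
is_listed_affected() {
    # Drop the ".el6..." suffix so only the numeric version-release is compared.
    local rel="${1%%.el6*}"
    case "$rel" in
        2.6.32-358.*) ver_ge "$rel" "2.6.32-358.23.1" ;;  # RHEL 6.4.x stream
        2.6.32-*)     ver_ge "$rel" "2.6.32-408" ;;       # RHEL 6.5 stream
        *)            return 1 ;;
    esac
}

if is_listed_affected "$(uname -r)"; then
    echo "kernel is in a range this article lists as affected"
else
    echo "kernel is not in the ranges listed in this article"
fi
```

The result only matters on clients that actually mount the file system with NFSv4 from a server that hands out write delegations.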

Resolution

  • A fix is currently in progress, tracked by private Red Hat bugs 1054493 (RHEL6.6 or later) and 1066942 (RHEL6.5 maintenance kernel). Contact your Red Hat Support representative for more information.
  • Upstream commit 263b4509ec4d47e0da3e753f85a39ea12d1eff24 (nfs: always make sure page is up-to-date before extending a write to cover the entire page) addresses this problem.

Workaround

  • Downgrade to a kernel earlier than 2.6.32-358.23.1.el6 (RHEL 6.4.x) or earlier than 2.6.32-408.el6 (RHEL 6.5).

  • Use NFSv3 instead of NFSv4.

  • Disable NFSv4 write delegations on the NFS server.

    • For a NetApp server, contact NetApp support for official recommendations. Unofficially, the option 'nfs.v4.write_delegation' should show whether write delegations are enabled.
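
For the NFSv3 workaround, it may help to first find out which mounts on a client currently use NFSv4. The sketch below assumes the usual /proc/mounts layout (source, mount point, file system type, options); the function name list_nfs4_mounts is hypothetical, and server:/export and /nfs in the comments are placeholders.

```shell
#!/bin/bash
# list_nfs4_mounts [FILE]: print "mountpoint source" for every NFSv4 mount
# found in FILE (default: /proc/mounts). Hypothetical helper for illustration.
list_nfs4_mounts() {
    awk '$3 == "nfs4" || ($3 == "nfs" && $4 ~ /(^|,)(nfs)?vers=4/) { print $2, $1 }' \
        "${1:-/proc/mounts}"
}

# Typical use on a client:
#   list_nfs4_mounts
#
# To switch one mount to NFSv3 (placeholder names):
#   umount /nfs
#   mount -t nfs -o vers=3 server:/export /nfs
```

Any matching /etc/fstab entry would need the same vers=3 option for the change to survive a reboot.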

Root Cause

  • This is a regression caused by commit c7559663e42f4294ffe31fe159da6b6a66b35d61
    [fs] nfs: Allow nfs_updatepage to extend a write under additional circumstances
  • When determining whether to extend a write to cover an entire page in memory, the writer needs to determine whether the page is up-to-date. Commit c7559663e42f4294ffe31fe159da6b6a66b35d61 added logic to skip this check when the writer was holding a write delegation. Not reading the contents of the entire page first could cause data corruption when the page was written out to disk.
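
The effect described above can be illustrated with a toy simulation that uses only local files. It is not kernel code and involves no NFS traffic: the function name simulate_append, the file names, and the stale bytes XXXXXXXXXX are all made up for the sketch. The "page" stands for the second client's cached copy of the first page of the file.

```shell
#!/bin/bash
# Toy model only: local files stand in for the server copy and the client's
# cached page. Nothing here touches NFS or the kernel.
simulate_append() {
    local mode="$1" dir
    dir=$(mktemp -d)
    printf '123456789\n' > "$dir/server_file"   # data written by client 1
    printf 'XXXXXXXXXX'  > "$dir/page"          # client 2's page, NOT up to date

    if [ "$mode" = "read-first" ]; then
        # Correct behaviour: bring the page up to date before extending.
        cp "$dir/server_file" "$dir/page"
    fi

    # Client 2 appends 10 bytes at offset 10 within its cached page.
    printf 'abcdefghi\n' | dd of="$dir/page" bs=1 seek=10 conv=notrunc 2>/dev/null

    # The write is extended to start at offset 0 of the page, so the whole
    # cached page (20 bytes here) replaces the server copy.
    cp "$dir/page" "$dir/server_file"
    cat "$dir/server_file"
    rm -rf "$dir"
}

simulate_append skip-check   # stale bytes overwrite client 1's data
simulate_append read-first   # client 1's data is preserved
```

In the skip-check case the first 10 bytes of the result are whatever happened to be in the stale page, which mirrors the junk bytes seen in the reproducer.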

Diagnostic Steps

Reproducer

  • On the NFS server, start a packet capture:
snoop -o capturefile nfsclient2
  • From nfsclient1:
echo 123456789 > /nfs/newfile
  • From nfsclient2:
echo abcdefghi >> /nfs/newfile
  • The resulting file is corrupted and looks like this:
$ hexdump -C newfile
00000000  74 63 68 30 33 2e 61 74  6c 61 61 62 63 64 65 66  |tch03.atlaabcdef|
00000010  67 68 69 0a                                       |ghi.|

  • snoop -i capturefile -V -x 0 (use -v for more verbosity) shows the following WRITE packet:

NFS C 4 () PUTFH FH=BB66 WRITE ST=1DBE:0 at 0 for 20

         240: 0002 0000 0014 7463 6830 332e 6174 6c61 ......tch03.atla
         256: 6162 6364 6566 6768 690a                   abcdefghi.
  • The second NFS client writes 10 bytes of garbage over 123456789[newline] and then writes abcdefghi[newline].
    The final file size (20 bytes) is what a correct append would produce, but the original 10 bytes of content are overwritten with 10 bytes of junk, followed by the 10 bytes of abcdefghi[newline].
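
After running the reproducer, the result can also be checked without reading a hex dump. The following is a minimal sketch; check_append_result is a hypothetical helper name, and /nfs/newfile is the path used in the steps above.

```shell
#!/bin/bash
# check_append_result FILE: succeed only if FILE holds exactly the two lines
# the reproducer is expected to leave behind on an unaffected client.
check_append_result() {
    printf '123456789\nabcdefghi\n' | cmp -s - "$1"
}

# Typical use after the reproducer:
#   if check_append_result /nfs/newfile; then
#       echo "file is intact"
#   else
#       echo "file is corrupted"
#   fi
```

Because the corrupted file has the same size as a correct one, comparing content rather than length is what distinguishes the two cases.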

 
