How NFS clients fail over
A fail-over is when the client switches from one replica location to another after it determines that the current server it is communicating with is no longer accessible.
- NFS mount option timeo
- This mount option specifies the time that the TCP/IP layer must wait before it returns with a timeout response.
- NFS mount option retrans
- This mount option specifies the number of times the NFS RPC layer should retry the client's request before returning an RPC timeout error (ETIMEDOUT).
- nfso option nfs_v4_fail_over_timeout
- You can use this nfso option to specify the minimum amount of time the client must wait before failing over to a replica. This option is global to the NFS client and overrides the default per-mount behavior. By default, nfs_v4_fail_over_timeout is not active; its value is 0.
When nfs_v4_fail_over_timeout is not active, the fail-over threshold is set to twice the value of the mount timeo option. When no successful RPC calls have occurred for this duration, the client begins fail-over processing to find another available replica. However, the actual time the client waits is influenced by the retrans option. If retrans is greater than 2, the client will likely wait until it receives an RPC timeout, which occurs after the retrans value times the timeo value (retrans × timeo). The combination of the timeo and retrans options can therefore be adjusted to control fail-over behavior on a per-NFS-mount basis. You can also set these options at a more granular level by using the nfs4cl command.
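The relationship between timeo, retrans, and the default fail-over threshold can be sketched as a small calculation. This is only an illustrative model of the rule described above, not the client's actual implementation; the function name estimated_failover_wait is hypothetical, and the result is expressed in whatever unit the timeo value uses.

```python
def estimated_failover_wait(timeo, retrans):
    """Rough estimate of the wait before fail-over processing starts
    when nfs_v4_fail_over_timeout is 0 (not active).

    Hypothetical helper for illustration only. The result is in the
    same unit as timeo.
    """
    threshold = 2 * timeo          # default fail-over threshold
    rpc_timeout = retrans * timeo  # point at which the RPC timeout is returned
    # With retrans > 2 the RPC timeout arrives after the threshold,
    # so the client likely waits for the later of the two.
    return max(threshold, rpc_timeout)


print(estimated_failover_wait(timeo=10, retrans=2))  # 20
print(estimated_failover_wait(timeo=10, retrans=5))  # 50
```

In this model, lowering retrans to 2 or less leaves the threshold of twice timeo as the governing value, while larger retrans values stretch the wait proportionally.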
When nfso nfs_v4_fail_over_timeout is set to a non-zero value, it represents the number of seconds the client will wait on an unavailable server before considering replica fail-over. If the timeo and retrans options result in RPC timeout behavior beyond the nfso setting, fail-over processing may not start until the RPC timeout is generated.
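The interaction between a non-zero nfs_v4_fail_over_timeout and the RPC timeout can be modeled in the same way. The sketch below assumes that the caller has already converted timeo to seconds so that it can be compared with the nfso value; the function and parameter names are hypothetical, and the result is an estimate of when fail-over processing may begin, not a guarantee.

```python
def estimated_failover_wait_seconds(fail_over_timeout, timeo_seconds, retrans):
    """Rough estimate, in seconds, of the wait before fail-over
    processing starts when nfs_v4_fail_over_timeout is non-zero.

    Hypothetical helper for illustration only. timeo_seconds is the
    mount timeo value already converted to seconds (an assumption of
    this sketch).
    """
    if fail_over_timeout <= 0:
        raise ValueError("this model only covers a non-zero nfso setting")
    rpc_timeout = retrans * timeo_seconds
    # If the RPC timeout lands beyond the nfso setting, fail-over may
    # not start until that RPC timeout is generated.
    return max(fail_over_timeout, rpc_timeout)


print(estimated_failover_wait_seconds(30, timeo_seconds=5, retrans=3))   # 30
print(estimated_failover_wait_seconds(30, timeo_seconds=10, retrans=5))  # 50
```

The second call shows the case described above: the mount options produce a 50-second RPC timeout, which exceeds the 30-second nfso setting and therefore governs the estimate.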
For more information about the retrans, timeo, and nfs_v4_fail_over_timeout options, refer to the NFS-specific options of the mount, nfs4cl, and nfso commands.
In addition to replica fail-over in the event of an unavailable server, there are cases where the client voluntarily switches from one replica location to another. One case is when you use the nfs4cl command to establish a preferred replica. In this case, the client initiates a switch to the preferred server if that is not the server the client is currently using. The client also refetches replica location information from the NFS server at approximately 30-minute intervals when there has been recent activity on the associated data. If the ordering of the locations has changed, the client attempts to switch to the first location, provided that it is different from the server the client is currently using and you have not set a replica preference with the nfs4cl command.
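The voluntary switch rules above can be summarized as a small decision function. This is a simplified model written for illustration, not the client's code; the function name, the parameters, and the server names in the example are all hypothetical.

```python
from typing import Optional, Sequence


def voluntary_switch_target(
    current: str,
    locations: Sequence[str],
    preferred: Optional[str] = None,
) -> Optional[str]:
    """Return the server the client would try to switch to, or None
    if no voluntary switch applies.

    Hypothetical model: `locations` is the ordered replica list most
    recently fetched from the server, and `preferred` is a replica
    preference set with nfs4cl (None if no preference is set).
    """
    if preferred is not None:
        # A replica preference takes priority over location ordering.
        return preferred if preferred != current else None
    if locations and locations[0] != current:
        # No preference set: follow the first location in the list.
        return locations[0]
    return None


print(voluntary_switch_target("srvA", ["srvB", "srvA"]))          # srvB
print(voluntary_switch_target("srvA", ["srvB", "srvA"], "srvA"))  # None
```

In the second call the reordered list is ignored because a preference is set and the client is already using the preferred server.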