How to handle a heartbeat NIC replacement in a VCS HA environment on a Linux server


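LLT, which handles the heartbeat, uses a configuration table called /etc/llttab.
/etc/llttab contains the heartbeat device information,
and because the physical (MAC) address changes when the NIC is replaced, this file has to be updated to match.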
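
Below is a minimal sketch of that edit. It assumes the Linux llttab form in which each link line names the device by its MAC address (for example "link eth1 eth-00:0C:29:AB:CD:EF - ether - -"). If your link lines use the interface name instead, this sketch does not apply. The function name set_llttab_mac is hypothetical. The new NIC's address can be read with "ip link show eth1".

```shell
#!/bin/sh
# set_llttab_mac FILE IFACE NEWMAC  (hypothetical helper)
# Replaces the MAC in the "link IFACE eth-<MAC> ..." line of FILE and keeps
# the previous version as FILE.bak.
# ASSUMPTION: link lines use the Linux "eth-<MAC>" device form.
set_llttab_mac() {
    file=$1 iface=$2 mac=$3
    # Refuse anything that is not six colon-separated hex octets.
    if ! printf '%s\n' "$mac" | grep -Eq '^([0-9A-Fa-f]{2}:){5}[0-9A-Fa-f]{2}$'; then
        echo "invalid MAC address: $mac" >&2
        return 1
    fi
    cp -p "$file" "$file.bak" || return 1
    # Rewrite only the address after "eth-" on the matching link line;
    # the trailing [[:space:]]+ keeps "eth1" from also matching "eth10".
    sed -E "s/^(link[[:space:]]+$iface[[:space:]]+eth-)([0-9A-Fa-f]{2}:){5}[0-9A-Fa-f]{2}/\1$mac/" \
        "$file.bak" > "$file"
}
```

Usage: set_llttab_mac /etc/llttab eth1 00:50:56:11:22:33 on the node whose NIC was replaced, then check the links with lltstat -vvn.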

Usually, once /etc/llttab has been updated, lltstat appears to report normal status again.
When that was not enough, I have also tried the steps below.

hastop -local   (or hastop -local -force)
/etc/init.d/vxfen stop
/etc/init.d/gab stop
/etc/init.d/llt stop
lltconfig -U
hastart
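
The sequence above stops the stack and then runs hastart directly. As far as I know, had needs GAB (and therefore LLT) to be running before it can start, so a full restart on the node would normally bring LLT, GAB and fencing back up before hastart; starting llt again presumably also makes lltconfig re-read the edited /etc/llttab. A minimal sketch, assuming SysV-style scripts under /etc/init.d (systemd-based releases use different unit names); the function name restart_vcs_stack and the DRY_RUN switch are hypothetical. It does not wait for had to exit after hastop, so in practice confirm with gabconfig -a that port h is gone before continuing. If I/O fencing is not in use, the vxfen lines can presumably be dropped.

```shell
#!/bin/sh
# restart_vcs_stack (hypothetical helper): stop VCS, I/O fencing, GAB and LLT
# on this node, then start them again in the reverse order.
# ASSUMPTION: SysV init scripts under /etc/init.d; adjust for systemd.
# With DRY_RUN=1 the commands are only printed, not executed.
restart_vcs_stack() {
    for cmd in \
        "hastop -local" \
        "/etc/init.d/vxfen stop" \
        "/etc/init.d/gab stop" \
        "/etc/init.d/llt stop" \
        "/etc/init.d/llt start" \
        "/etc/init.d/gab start" \
        "/etc/init.d/vxfen start" \
        "hastart"
    do
        if [ "${DRY_RUN:-0}" = 1 ]; then
            echo "$cmd"
        else
            # Unquoted on purpose: each entry is a simple command plus arguments.
            $cmd || { echo "failed: $cmd" >&2; return 1; }
        fi
    done
}
```

Running DRY_RUN=1 restart_vcs_stack first prints the plan without touching the cluster.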

To check in lltstat which link is down:
lltstat -vvn
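
To pick the dead links out of that output automatically, a small awk filter can be used. The layout it expects (node rows that start with an optional "*" and the node ID, followed by link rows of the form "link-name status [MAC]") is an assumption based on typical lltstat -vvn output, so check it against your version. The function name down_links is hypothetical.

```shell
#!/bin/sh
# down_links (hypothetical helper): read "lltstat -vvn" output on stdin and
# print "<node-id> <link>" for every link whose status column is DOWN.
# ASSUMPTION: node rows look like "* 0 node01 OPEN" or "1 node02 OPEN";
# link rows look like "eth2 DOWN" or "eth1 UP 00:0C:29:...".
down_links() {
    awk '
        $1 == "*"         { node = $2; next }   # local node row
        $1 ~ /^[0-9]+$/   { node = $1; next }   # peer node row
        $2 == "DOWN"      { print node, $1 }    # link row reported DOWN
    '
}
```

Usage: lltstat -vvn | down_links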


LLTCONFIG (1M)

Maintenance Commands


NAME

lltconfig - Low Latency Transport (LLT) Protocol configuration utility


SYNOPSIS

lltconfig -h

lltconfig -c [-f file]

lltconfig [-v ] [-C clusterid] [-n systemid] [-i low_high] [-x low_high] [-o]

lltconfig -t devtag -d device [-b link_type -s SAP -m mtu]

lltconfig [-u device_tag]

lltconfig -U

lltconfig -V

lltconfig [-T timer:value]

lltconfig [-F limit:value]

lltconfig -a list | flush | delete | set [system] [device_tag] [address]

lltconfig -K 0 | 1 | 10 | 2 | 20

lltconfig -E 0 | 1 | 3

lltconfig -A 0 | 1


DESCRIPTION

The lltconfig utility initializes and maintains the configuration of the LLT protocol stack. It is responsible for managing the STREAMS plumbing between the LLT protocol driver and the network drivers to which it is connected, as well as many internal protocol parameters.

At system startup, lltconfig reads the /etc/llttab file to determine the local system ID and network devices, links the drivers, checks if there is another node with the same node id and then sets parameter information into LLT and starts the protocol running.

The lltconfig command without options reports the running status of the LLT protocol. The lltconfig command listens for 5 seconds on each link to see if it can detect a duplicate node in the cluster. If it detects a duplicate node, it prints an error message and exits.


OPTIONS

-a list | flush | delete | set [system] [device_tag] [address]

Display or manipulate MAC addresses associated with specific systems on specific network links. The list option displays the address table. flush deletes all automatically learned addresses. delete removes one address, specified by the systemid and the device_tag. set adds one address for the system with the given systemid on the link specified by device_tag. The address should consist of hexadecimal digits separated by colons (:) or dots (.), depending on the link type.
-b link_type
Choose the link type required: "ether."
-C clusterid
Set the clusterid. This option is needed only if more than one cluster is sharing network hardware being used by LLT. In this case, each cluster needs its own clusterid (or, alternately, a unique SAP), so that the clusters do not interfere with each other. Systems with different cluster ids cannot communicate with each other. The default cluster ID is 0.
-c
Configure the LLT protocol from the /etc/llttab file.
-d device
Configure a network interface link below LLT. This link is bound to the LLT SAP and is used to send heartbeats and data to other systems. device is the name of the network device; it may be followed by a colon (:) and an integer specifying which unit or PPA to attach to (for example, -d /dev/qfe:1).
-f file
Specify an alternate configuration file to use instead of /etc/llttab. This option is valid only with the -c option.
-F limit:value
Query or change the values of the flow control limits. Valid values for limit are query, lowwater, highwater, window, ackval, sws, and linkburst. lowwater, highwater, and window are the low water mark, high water mark, and window size, respectively. The value is specified in number of packets, and is not used with the query option. The values should not be changed haphazardly, or the protocol may fail to operate.
-h
Display a help message and exit.
-i low-high
Set a range, low-high, of system ids valid for participation in the cluster. This command alters the limits of system ids that applications may use to prevent them from trying to communicate with non-existent systems. The default is to include 0-nn, where nn is the maximum supported systemid as determined by the kernel configuration.
-l
Used only with the -d option, this option specifies that the network link is to be used only as a last resort for sending data, although it is used to send heartbeats.
-m mtu
Specify the maximum transmission unit to use for packets on the network links. This number must be less than or equal to the lowest MTU of all the network links. The current default is 1500. Packets larger than this value are submitted to LLT as fragments.
-n systemid
Set the systemid. systemid may be an integer in the range of valid systemids. It may also be a symbolic name, which is translated via /etc/llthosts to a systemid, or it may be a filename beginning with a slash (/), in which case the first word from the file is used as a symbolic name and translated via /etc/llthosts to a systemid. Systemids must be unique within a cluster. If LLT detects a configuration in which another system is using the same systemid, it disables the protocol until the system is rebooted.
-o
Override flag. Specify that values such as systemid need to be overwritten. It can also be used to force LLT to configure a link even if a duplicate node is detected.
-s sap
Specify the SAP to bind on the network links using DLPI. The current default is 0xCAFE.
-T timer:value
Query or change the values of the protocol timers. Valid values for timer are query, heartbeat, heartbeatlo, peerinact, peertrouble, oos, retrans, service, and arp. These timers (except for query) are the heartbeat, heartbeat on low-priority links, peer inactivity, link inactivity, out-of-sequence, retransmit, service procedure, and address resolution protocol cache flush timers, respectively. Use lltconfig -T query to display the current timer settings. value is specified in 1/100ths of a second, and is not used with the query option. The values should not be changed haphazardly, or the protocol may fail to operate.
-t device_tag
Used only with the -d option, this option specifies a tag used to identify a particular link in subsequent commands, and is displayed by lltstat(1M).
-u device_tag
Unlink the LLT protocol from the network device indicated by device_tag.
-U
Unlink the LLT protocol from all network devices.
-V
Print the LLT current and maximum supported protocol version information.
-v
Enable verbose output.
-x low-high
Set a range, low-high, of systemids not valid for participation in the cluster. This option alters the limits of systemids that applications may use to prevent attempts to communicate with non-existent systems.
-K 0|1|10|2|20
Set checksum mode.
When set to 1, LLT checksums each packet it sends to the peer to guard against packet corruption on the wire. LLT also offloads checksum calculation to hardware if the underlying NIC supports it. If checksum verification fails on the receiver, LLT drops that packet, causing the sender to retransmit it.

Setting 10 is the same as setting 1, except that LLT strictly computes checksums in software and does NOT offload checksumming to the NIC even if the NIC is capable of it.

When set to 2, LLT also checksums the whole data buffer submitted by the client, to be verified by the peer before it is delivered to the peer client. If checksum verification fails on the receiver, LLT panics the machine. This is done on purpose, to help analyze memory corruption from a crash dump.

Setting 20 is the same as setting 2, except that LLT strictly computes checksums in software and does NOT offload checksumming to the NIC even if the NIC is capable of it.

Level 2 and level 20 checksums should only be used when diagnosing memory corruption on the advice of the support center, since they can panic the machine.

The default is 0 (no checksums), as LLT depends on the NIC hardware to guarantee packet accuracy. Level 1 checksums may be enabled if the private network is suspected of packet corruption on the wire or in the NIC. There may be some performance cost from the CPU cycles needed to compute the checksum in addition to those performed by the NIC hardware.

Currently checksum offloading is only implemented on Linux and only for transmitting packets.

-E 0|1|3
Set trace level.
If set to 1 (the default), LLT traces all events (upcalls, flow control, link and connection state changes) in an internal circular buffer called the trace buffer.

When set to 3, LLT also traces packets that are received or transmitted. This has an overhead and may impact performance, so it should be used only for debugging.

Setting to 0 disables tracing.

-A 0|1
Enable strict source address checking.
If set to 1, LLT will check the source address of incoming packets and drop packets from unknown sources. When set to 0 this check is not performed.

This option is available only when UDP links are configured. For Ethernet links it is a no-op.
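
Because -T values are given in hundredths of a second, it is easy to be off by a factor of 100. A small helper (the name secs_to_llt_ticks is hypothetical) that converts seconds into those units, rounding to the nearest tick:

```shell
#!/bin/sh
# secs_to_llt_ticks SECONDS (hypothetical helper): convert seconds, decimals
# allowed, into the 1/100-second units used by "lltconfig -T timer:value".
secs_to_llt_ticks() {
    # Add 0.5 before %d truncates, so the result rounds to the nearest tick.
    awk -v s="$1" 'BEGIN { printf "%d\n", s * 100 + 0.5 }'
}
```

For example, secs_to_llt_ticks 16 prints 1600, so a 16-second peer inactivity timer would be written as lltconfig -T peerinact:1600.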

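The -m value must not exceed the lowest MTU among the LLT links. On Linux the current MTU of each interface can be read from sysfs. A minimal sketch that reports the minimum for the given heartbeat interfaces; the function name min_link_mtu and the SYSFS_NET override (used only for testing) are hypothetical.

```shell
#!/bin/sh
# min_link_mtu IFACE... (hypothetical helper): print the smallest MTU among
# the given interfaces.
# ASSUMPTION: Linux sysfs layout /sys/class/net/<iface>/mtu; SYSFS_NET
# (hypothetical) may point at another directory with the same layout.
min_link_mtu() {
    dir=${SYSFS_NET:-/sys/class/net}
    min=
    for ifc in "$@"; do
        mtu=$(cat "$dir/$ifc/mtu") || return 1
        if [ -z "$min" ] || [ "$mtu" -lt "$min" ]; then
            min=$mtu
        fi
    done
    # Fails (non-zero) when no interface was given.
    [ -n "$min" ] && echo "$min"
}
```

Usage: min_link_mtu eth1 eth2 prints the upper bound for lltconfig -m on those links.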

ENVIRONMENT VARIABLES

LLT_LINK_TIMEOUT
lltconfig listens for 5 seconds on each link to detect if another node in the cluster has the same node id. To change the default value, set this environment variable to a value in seconds.

DISCLAIMER

When LLT and GAB are running under a cluster manager other than VCS, configure LLT and GAB as per the cluster manager's supplementary documentation on LLT and GAB. The -f option is applicable only in a VCS environment.


FILES

/etc/llttab


SEE ALSO

lltstat(1M), llttab(4)

Last updated: March 2006
Copyright ©2009 Symantec Corporation
All rights reserved.
















NAME

lltstat - report Low Latency Transport (LLT) Protocol statistics

SYNOPSIS

lltstat [-v [-v]] [-n | -c | -l | -z | -p | -C | -N | -H | -t]

DESCRIPTION

The lltstat utility reports the status of the LLT protocol and the values of counters that it maintains. Without any options, lltstat shows statistical counters.

OPTIONS

-v
Verbose option.
-c
Display various configuration parameters.
-p
Display the current status of the ports in use.
-n
Display the current status of the peer systems. The local systemid is indicated by an asterisk (*) among the other systems in the list. This option may be used with the -v or -vv verbose and very-verbose options to display link information and MAC addresses.
-z
Reset the statistical counters to zero.
-l
Display the current status of network links configured in LLT.
-C
Display the currently configured cluster number.
-N
Display the currently configured systemid as a number.
-H
Display the currently configured systemid as a name (from /etc/llthosts).
-t
Display the values of various kernel tunables.

FILES

/etc/llthosts

SEE ALSO

lltconfig(1M)







NAME

gabconfig - Group Membership and Atomic Broadcast (GAB) configuration utility

SYNOPSIS

/sbin/gabconfig [-abBcCejJklRsuUvWx] [-f iofence] [-n count] [-t stable] [-Q type:value] [-V version] [-S param:value]

DESCRIPTION

The gabconfig utility sets up and maintains the configuration of the GAB driver. The GAB driver is dependent on the Low Latency Transport (LLT) protocol, which must be configured prior to running gabconfig.

OPTIONS

-a
Display GAB driver port memberships. If -C is also specified, then for kernel clients, also list the names of the corresponding clients, if registered.
GAB includes the following types of membership:
o A regular membership includes systems that communicate with each other across more than one network channel.
o A jeopardy membership includes systems that have only one private communication link.
o A visible membership includes systems that have GAB running but the GAB client is no longer registered with GAB.
-b
Enable system halt when the process fails to heartbeat. By default, if a process fails to heartbeat in a given interval, GAB makes five attempts to kill the process. With this option set, GAB panics the system without making any attempts to kill the process. This option can be turned off using the -B option.
-B
Disable system halt when the process fails to heartbeat. This option turns off the functionality enabled by the -b option.
-e
Print out the kernel tunables set for GAB. If the value of a tunable is changed, the new value takes effect when the module is reloaded.
-c
Configure the driver for use. Configuring the GAB driver enables client registrations and the joining of an already seeded group.
-C
List the names of GAB (kernel) clients that have registered their names with GAB. Along with -a, this lists the ports along with their clients, if a name is registered.
-j
Enable halt on rejoin. A network failure may cause systems to form independent clusters, or partitions. When the connections are restored, systems will attempt to rejoin into one cluster. By default, GAB kills processes associated with ports on rejoining systems. This option directs GAB to halt the system instead.
-J
Disable halt on rejoin. This option turns off the functionality enabled by the -j option.
-k
Repeat attempts to kill a process that does not die. By default, after five attempts to kill a process, GAB halts the system. This option directs GAB to close the client port and repeatedly and silently attempt to kill the process without halting the system.
-l
Display the GAB driver configuration.
-p
Enable halt system on process death. If had and hashadow are killed using kill -9, the system can potentially lose high availability. If this option is enabled, GAB panics the system on detecting the death of the client process.
-P
Disable halt system on process death. This is the default behavior.
-Q type:value
Specify send and receive queue limits for GAB.
-s
Single network. This flag enables network partition arbitration and should be used only to test configurations. It is required for operating GAB over one network connection.
-S param:value
Change the value of the GAB tunables. Valid param fields are:

isolate_time: Specify a timeout (in milliseconds) for clients to respond to a SIGKILL signal. When GAB clients receive a SIGKILL signal, they must unregister from the GAB driver within isolate_time milliseconds to avoid halting the system. The default value is 120000 ms (120 seconds).

kill_ntries: Specify the maximum number of SIGABRT signals sent to the client when the client stops sending heartbeats to the GAB driver. The default value is 5.

-u
Unconfigure the GAB driver. Close the seed control port (port a) if all client ports are closed.
-U
Unconfigure the GAB driver and reinitialize all configuration states.
-v
Display GAB version information, such as the product version, build time stamp, interface version, and the minimum, maximum, and on-the-wire protocol versions.
-W
Display the supported range of GAB protocol versions and the current version.
-x
Seed control port. This option affords protection from pre-existing network partitions. The control port (port a) propagates the seed to all configured systems. GAB must be seeded to enable the delivery of membership on client ports.
-R
Start the rolling upgrade process. If the cluster is running at a GAB protocol version lower than the version that all the cluster nodes can understand, this option starts the rolling upgrade process. Once the rolling upgrade process is finished, GAB on all the nodes communicates at the maximum supported version. The rolling upgrade process may fail if cluster membership changes during the rolling upgrade. This option must be used only on the lowest nodeid in the cluster.
-V protocol_version
Specify the GAB protocol version. GAB can be configured to operate at any version within the version range returned by the -v option.
-f iofence_timeout
Specify a timeout (in milliseconds) for clients to respond to an IOFENCE message before the system halts. When clients receive an IOFENCE message, they must unregister from the GAB driver within iofence_timeout milliseconds to avoid halting the system. The default is 15000 ms (15 seconds).
-n system_count
Count of systems in the cluster. A non-zero system count auto-seeds the cluster when all systems are present. The default is zero, for no auto-seeding.
-t stable_timeout
Specifies the time GAB waits to reconfigure membership after the last report from LLT of a change in the state of local node connections for a given port. Any change in the state of connections restarts GAB's waiting period. stable_timeout applies during membership transitions. The default value for stable_timeout is five seconds. Note that the message latency for connection state messages, typically less than one second, should be taken into account when calculating the stable_timeout value.
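
While one heartbeat link is down, for example during the NIC replacement described at the top, gabconfig -a is a quick way to see whether the node is in jeopardy membership (only one private link left). A minimal filter, assuming jeopardy is reported on lines of the form "Port <port> gen <generation> jeopardy <nodes>"; the function name gab_jeopardy_ports is hypothetical.

```shell
#!/bin/sh
# gab_jeopardy_ports (hypothetical helper): read "gabconfig -a" output on
# stdin and print each GAB port that reports a jeopardy membership, once.
# ASSUMPTION: lines look like "Port a gen a36e0003 jeopardy ;1".
gab_jeopardy_ports() {
    awk '$1 == "Port" && $0 ~ /[[:space:]]jeopardy([[:space:]]|$)/ && !seen[$2]++ { print $2 }'
}
```

Usage: gabconfig -a | gab_jeopardy_ports prints nothing when every port has only a regular membership.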

DISCLAIMER

When LLT and GAB are running under a cluster manager other than VCS, configure LLT and GAB as per the cluster manager’s supplementary documentation on LLT and GAB.

-k, -b, -B, -f, -j, -J, -p, -P options are applicable only in VCS environment.

COPYRIGHTS

Copyright (c) 2012 Symantec.

All rights reserved.

