Red Hat Enterprise Linux 4
Enterprise Storage Quickstart
By Rod Nayfield
(nayfield at redhat dot com)
A hands-on guide to enterprise storage features of Red Hat Enterprise Linux 4, including Fibre Channel-connected volume management and multipath I/O.
Revised 02/21/06
Introduction
Recent releases of Red Hat Enterprise Linux include new features for advanced volume management and multipath disk I/O. This document provides a brief tour of the basic use of these features; it can help you become familiar with how LVM2 and device-mapper-multipath work, and provide a jumping-off point for your own storage work.
The multipath features in RHEL 4 operate via device-mapper, a lightweight kernel component that enables the creation of block devices to support volume management, such as LVM2. Device-mapper-multipath allows you to multipath across any HBA supported by the kernel, regardless of vendor. It is possible to have eight paths with differing priorities on a single system. With grouping policies you can make effective use of HBAs of differing speeds and of different storage controllers, and create complex environments. The tools are also extensible if you need behaviors that are not included by default.
The file /usr/share/doc/device-mapper-multipath-0.4.5/README contains more information on the supported controllers, which include most active-active SANs as well as active-passive storage such as the CX products. The file multipath.conf.defaults lists the built-in settings for each supported array.
Configuration Used
The configuration described within was tested on a 1U 32-bit x86 server with two FC HBAs. The HBAs used were a QLogic 2100 and a QLogic 2200, in order to test different models. The SAN was a DotHill active/active system which presented three LUNs to the host.
I installed RHEL 4 Update 3, only selecting the web-server package group.
Let's dig in!
Initial Baseline
After the system boots, let's see what we can see:
# more /proc/scsi/scsi
Attached devices:
Host: scsi0 Channel: 00 Id: 00 Lun: 00
  Vendor: SEAGATE  Model: ST318305LC       Rev: 2203
  Type:   Direct-Access                    ANSI SCSI revision: 03
Host: scsi0 Channel: 00 Id: 06 Lun: 00
  Vendor: DELL     Model: 1x3 U2W SCSI BP  Rev: 1.21
  Type:   Processor                        ANSI SCSI revision: 02
Host: scsi2 Channel: 00 Id: 00 Lun: 00
  Vendor: DotHill  Model: SANnet RAID X300 Rev: 0315
  Type:   Direct-Access                    ANSI SCSI revision: 03
Host: scsi2 Channel: 00 Id: 00 Lun: 04
  Vendor: DotHill  Model: SANnet RAID X300 Rev: 0315
  Type:   Direct-Access                    ANSI SCSI revision: 03
Host: scsi2 Channel: 00 Id: 00 Lun: 06
  Vendor: DotHill  Model: SANnet RAID X300 Rev: 0315
  Type:   Direct-Access                    ANSI SCSI revision: 03
Host: scsi4 Channel: 00 Id: 00 Lun: 00
  Vendor: DotHill  Model: SANnet RAID X300 Rev: 0315
  Type:   Direct-Access                    ANSI SCSI revision: 03
Host: scsi4 Channel: 00 Id: 00 Lun: 04
  Vendor: DotHill  Model: SANnet RAID X300 Rev: 0315
  Type:   Direct-Access                    ANSI SCSI revision: 03
Host: scsi4 Channel: 00 Id: 00 Lun: 06
  Vendor: DotHill  Model: SANnet RAID X300 Rev: 0315
  Type:   Direct-Access                    ANSI SCSI revision: 03
This shows our two FC controllers (scsi2 and scsi4), each of which sees the same three LUNs.
If you need to re-do a SCSI scan, you can run
echo "- - -" > /sys/class/scsi_host/host0/scan
where host0 is replaced by the HBA you wish to scan. You can also do a fabric rediscovery like this:
echo "1" > /sys/class/fc_host/host0/issue_lip
echo "- - -" > /sys/class/scsi_host/host0/scan
This will send a LIP (loop initialization primitive) to the fabric. During the initialization, HBA access may be slow and/or experience timeouts.
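If you have several HBAs, you can rescan them all at once with a small shell loop. This is just a convenience sketch of the same echo shown above, run as root against every SCSI host present:

for scan in /sys/class/scsi_host/host*/scan; do
    # "- - -" is a wildcard for channel, target, and LUN
    echo "- - -" > "$scan"
done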
Install and Configure Multipath
The only package not installed by default is device-mapper-multipath. I installed it like so:
[root@clu1 RPMS]# rpm -Uvh device-mapper-multipath-0.4.5-8.0.RHEL4.i386.rpm
warning: device-mapper-multipath-0.4.5-8.0.RHEL4.i386.rpm: V3 DSA signature: NOKEY, key ID 897da07a
Preparing...                ########################################### [100%]
   1:device-mapper-multipath########################################### [100%]
Multipathing is off by default. This prevents unexpected behavior on a default installation where it is not needed.
I edited /etc/multipath.conf and commented out the blacklist-everything stanza:
#devnode_blacklist {
#        devnode "*"
#}
It is possible to blacklist devices in this section; more information is available in multipath.conf.annotated. That file also documents adding new device types and custom actions, as well as grouping and prioritizing paths.
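For example, instead of blacklisting everything, you could blacklist only the non-SAN devices. A sketch along these lines (the devnode patterns are illustrative; adjust them, and the internal disk name sda, to match your own hardware):

devnode_blacklist {
        # skip pseudo-devices and non-disk SCSI device nodes
        devnode "^(ram|raw|loop|fd|md|dm-|sr|scd|st)[0-9]*"
        # skip the internal boot disk
        devnode "^sda$"
}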
Let's start the multipath daemon. This service is responsible for restoring failed paths automatically.
[root@clu1 RPMS]# chkconfig multipathd on
[root@clu1 RPMS]# service multipathd start
Starting multipathd daemon:                                [  OK  ]
Now we need to load the dm_multipath module ...
[root@clu1 RPMS]# modprobe dm_multipath
... and turn on multipathing. Running `multipath` creates and updates the devmaps. The -v2 option prints out extended information but is not required.
[root@clu1 ~]# multipath -v2
create: mpath1 (3600d0230003228bc000339414edb8102)
[size=52 GB][features="0"][hwhandler="0"]
\_ round-robin 0 [prio=1]
  \_ 2:0:0:0 sdb 8:16  [ready]
\_ round-robin 0 [prio=1]
  \_ 3:0:0:0 sde 8:64  [ready]
create: mpath2 (3600d0230003228bc000339414edb8100)
[size=58 GB][features="0"][hwhandler="0"]
\_ round-robin 0 [prio=1]
  \_ 2:0:0:4 sdc 8:32  [ready]
\_ round-robin 0 [prio=1]
  \_ 3:0:0:4 sdf 8:80  [ready]
create: mpath3 (3600d0230003228bc000339414edb8101)
[size=58 GB][features="0"][hwhandler="0"]
\_ round-robin 0 [prio=1]
  \_ 2:0:0:6 sdd 8:48  [ready]
\_ round-robin 0 [prio=1]
  \_ 3:0:0:6 sdg 8:96  [ready]
Out of the box, RHEL multipathing creates friendly names such as /dev/mapper/mpath1. These names are persistently bound to the WWID (shown in parentheses), are stored in /var/lib/multipath/bindings, and persist across reboots. If you comment out the user_friendly_names option, you will see devices like /dev/mapper/3600d0230003228bc000339414edb8102 instead of /dev/mapper/mpath1.
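On this system, the bindings file would contain entries along these lines (an illustrative reconstruction from the WWIDs above; the real file may also contain comment lines):

[root@clu1 ~]# cat /var/lib/multipath/bindings
mpath1 3600d0230003228bc000339414edb8102
mpath2 3600d0230003228bc000339414edb8100
mpath3 3600d0230003228bc000339414edb8101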
The devices sdd and sdg represent individual single-path devices and should not be used directly while the multipath device (mpath3) is in use.
I have decided I want to do multipath on the mpath3 LUN. Note that you can run vgscan if the LUNs already contain LVM information.
Let's look at the multipath information for mpath3.
[root@clu1 ~]# multipath -ll mpath3
mpath3 (3600d0230003228bc000339414edb8101)
[size=58 GB][features="0"][hwhandler="0"]
\_ round-robin 0 [prio=1][active]
  \_ 2:0:0:6 sdd 8:48  [active][ready]
\_ round-robin 0 [prio=1][enabled]
  \_ 3:0:0:6 sdg 8:96  [active][ready]
You can see that the LUN is available via two paths, 2:0:0:6 and 3:0:0:6. There are two priority groups of equal priority, and each group has one device. The system will fail over between these groups. This is due to the default of "path_grouping_policy failover" in multipath.conf.
Placing multiple paths within a group provides round-robin I/O between the members (note that even the single-member groups above are listed as round-robin). You can accomplish this with "path_grouping_policy multibus". Adding this to the defaults section in multipath.conf results in the following output (note: `multipath` was run first to reload the paths):
[root@clu1 ~]# multipath -ll mpath3
mpath3 (3600d0230003228bc000339414edb8101)
[size=58 GB][features="0"][hwhandler="0"]
\_ round-robin 0 [prio=2][enabled]
  \_ 2:0:0:6 sdd 8:48  [active][ready]
  \_ 3:0:0:6 sdg 8:96  [active][ready]
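For reference, the change amounts to a defaults stanza along these lines in /etc/multipath.conf (a minimal sketch showing only this option; any other defaults you have set remain alongside it):

defaults {
        path_grouping_policy multibus
}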
It is possible to create groups by serial number, node name, and priority. See multipath.conf for more information.
LVM2 Configuration
LVM2 has three logical components: the physical volume (PV), the volume group (VG), and the logical volume (LV).
First we will mark the LUN as a physical volume.
[root@clu1 ~]# pvcreate /dev/mapper/mpath3
  Physical volume "/dev/mapper/mpath3" successfully created
Now I create a new volume group, new_volgrp. It contains only the PV we just created; you can add more PVs in the future.
[root@clu1 ~]# vgcreate new_volgrp /dev/mapper/mpath3
  Volume group "new_volgrp" successfully created
I have decided to create a 600M logical volume in the new volume group. The space not used in the group can be used to create additional LVs, or to grow our new logical volume. Adding another PV to the group is trivial and lets you grow for future needs. Again, it is just a single command to create the LV:
[root@clu1 ~]# lvcreate -L 600M -n new_lvol new_volgrp
  Logical volume "new_lvol" created
Now I will make a default ext3 filesystem on the logical volume.
[root@clu1 ~]# mkfs -t ext3 /dev/new_volgrp/new_lvol
mke2fs 1.35 (28-Feb-2004)
Filesystem label=
OS type: Linux
Block size=4096 (log=2)
Fragment size=4096 (log=2)
76800 inodes, 153600 blocks
7680 blocks (5.00%) reserved for the super user
First data block=0
Maximum filesystem blocks=159383552
5 block groups
32768 blocks per group, 32768 fragments per group
15360 inodes per group
Superblock backups stored on blocks:
        32768, 98304

Writing inode tables: done
Creating journal (4096 blocks): done
Writing superblocks and filesystem accounting information: done

This filesystem will be automatically checked every 29 mounts or
180 days, whichever comes first.  Use tune2fs -c or -i to override.
Make a label:
[root@clu1 ~]# e2label /dev/new_volgrp/new_lvol TESTONE
Add an entry to /etc/fstab:
LABEL=TESTONE /mnt/TEST ext3 defaults 1 2
Make the mountpoint and mount it:
[root@clu1 ~]# mkdir /mnt/TEST
[root@clu1 ~]# mount /mnt/TEST/
Our new filesystem has been mounted.
Testing Failures
Earlier, I mentioned that I was using different HBAs. The advantage for a lab environment is that I can simulate a failure by unloading one of the HBA drivers but not the other. Let's look at the multipath information, fail an adapter, and look again:
[root@clu1 ~]# multipath -ll mpath3
mpath3 (3600d0230003228bc000339414edb8101)
[size=58 GB][features="0"][hwhandler="0"]
\_ round-robin 0 [prio=1][active]
  \_ 3:0:0:6 sdg 8:96  [active][ready]
\_ round-robin 0 [prio=1][enabled]
  \_ 4:0:0:6 sdj 8:144 [active][ready]

[root@clu1 ~]# rmmod qla2100

[root@clu1 ~]# multipath -ll mpath3
mpath3 (3600d0230003228bc000339414edb8101)
[size=58 GB][features="0"][hwhandler="0"]
\_ round-robin 0 [enabled]
  \_ #:#:#:# -   8:96  [failed][faulty]
\_ round-robin 0 [prio=1][active]
  \_ 4:0:0:6 sdj 8:144 [active][ready]
Let's bring it back:
[root@clu1 ~]# modprobe qla2100
[root@clu1 ~]# multipath -ll mpath3
mpath3 (3600d0230003228bc000339414edb8101)
[size=58 GB][features="0"][hwhandler="0"]
\_ round-robin 0 [enabled]
  \_ #:#:#:# -   8:96  [failed][faulty]
\_ round-robin 0 [prio=1][active]
  \_ 4:0:0:6 sdj 8:144 [active][ready]

[root@clu1 ~]# multipath -ll mpath3
mpath3 (3600d0230003228bc000339414edb8101)
[size=58 GB][features="0"][hwhandler="0"]
\_ round-robin 0 [prio=1][active]
  \_ 4:0:0:6 sdj 8:144 [active][ready]
\_ round-robin 0 [prio=1][enabled]
  \_ 5:0:0:6 sdd 8:48  [active][ready]
Multipathd brought the controller back into the configuration after a few seconds as the adapter initialized.
Growing a Partition with LVM
Let's use some of that extra space in our volume group to expand our filesystem.
[root@clu1 ~]# vgdisplay new_volgrp
  --- Volume group ---
  VG Name               new_volgrp
  System ID
  Format                lvm2
  Metadata Areas        1
  Metadata Sequence No  4
  VG Access             read/write
  VG Status             resizable
  MAX LV                0
  Cur LV                1
  Open LV               1
  Max PV                0
  Cur PV                1
  Act PV                1
  VG Size               58.59 GB
  PE Size               4.00 MB
  Total PE              14999
  Alloc PE / Size       150 / 600.00 MB
  Free  PE / Size       14849 / 58.00 GB
  VG UUID               YWltLw-v4CV-6gC1-6CZm-WoCw-1A9w-UVppAV

[root@clu1 ~]# df
Filesystem           1K-blocks      Used Available Use% Mounted on
/dev/sda2             16682200   3195684  12639088  21% /
/dev/sda1               147764      8825    131310   7% /boot
none                    127992         0    127992   0% /dev/shm
/dev/sda5               147764      5664    134471   5% /spare
/dev/mapper/new_volgrp-new_lvol
                        604736     16880    557136   3% /mnt/TEST
So we see that the current filesystem has 604736 blocks. Let's add 40G. If we didn't have 40G of free space in our VG, we could add a PV to the VG first (a sketch appears at the end of this section). We need two commands: the first extends the logical volume, and the second resizes the ext3 filesystem while it is online.
[root@clu1 ~]# lvextend -L +40G /dev/new_volgrp/new_lvol
  Extending logical volume new_lvol to 40.59 GB
  Logical volume new_lvol successfully resized

[root@clu1 ~]# ext2online /mnt/TEST/
ext2online v1.1.18 - 2001/03/18 for EXT2FS 0.5b

[root@clu1 ~]# df
Filesystem           1K-blocks      Used Available Use% Mounted on
/dev/sda2             16682200   3195696  12639076  21% /
/dev/sda1               147764      8825    131310   7% /boot
none                    127992         0    127992   0% /dev/shm
/dev/sda5               147764      5664    134471   5% /spare
/dev/mapper/new_volgrp-new_lvol
                      41930648     18116  39787220   1% /mnt/TEST
Now we can see that we have another 40G of space on our filesystem.
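Had the volume group lacked the free space, we could have grown it first by adding another PV. A sketch, assuming the storage presents a new LUN that shows up as /dev/mapper/mpath4 (a hypothetical name; use whatever device your new LUN receives):

[root@clu1 ~]# pvcreate /dev/mapper/mpath4
[root@clu1 ~]# vgextend new_volgrp /dev/mapper/mpath4

After that, the same lvextend/ext2online steps shown above would have the extra space available to draw from.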
Appendix: Useful Commands
Get some information on the multipath device, including the WWID (embedded in the UUID field):
[root@clu1 ~]# dmsetup info mpath3
Name:              mpath3
State:             ACTIVE
Tables present:    LIVE
Open count:        1
Event number:      1
Major, minor:      253, 2
Number of targets: 1
UUID: mpath-3600d0230003228bc000339414edb8101
Show the multipath devices configured.
[root@clu1 ~]# dmsetup ls --target=multipath
mpath2  (253, 1)
mpath1  (253, 0)
mpath3  (253, 2)
Determine which multipath and WWID a /dev/sd* device maps to:
[root@clu1 ~]# multipath -l /dev/sdd
mpath3 (3600d0230003228bc000339414edb8101)
[size=58 GB][features="0"][hwhandler="0"]
\_ round-robin 0 [active]
  \_ 2:0:0:6 sdd 8:48  [active][ready]
\_ round-robin 0 [enabled]
  \_ 3:0:0:6 sdg 8:96  [active][ready]
Get a WWID from a sd* device (without using device mapper):
[root@clu1 ~]# scsi_id -g -s /block/sdd
3600d0230003228bc000339414edb8101
Note that this begins with "3" and is therefore a page 83 type 3 (NAA) identifier. Good news.
NOTE: If your system returns a value with a leading 1, you are looking at a page 83 type 1 identifier, the T10 vendor-ID-based identifier. You will need to check with your vendor to see what they are providing; if it is the serial number of the controller, it will not be unique per LU and will cause issues.
Appendix: Tuning
Setting "no_path_retry queue" in your multipath configuration (see multipath.conf.annotated) will set your devices to queue I/O forever. This keeps transient SAN issues (ones that affect both paths) from causing I/O errors. You probably want to test failover (and multipathd's ability to restore a path!) before implementing this.
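In multipath.conf, that would look something like this (a sketch showing only the relevant line; the option can also be set per-array in a devices stanza):

defaults {
        # queue I/O indefinitely when all paths are down
        no_path_retry queue
}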
You will also want to propagate errors up to the multipath level sooner: instead of retrying a particular HBA for several rounds, you may want to fail it and use the alternate path. [TODO: add info on how to tune this in modprobe.conf]
Links
Device-mapper resource page:
More information on LVM2 and device-mapper:
http://people.redhat.com/agk/talks/
TODO
Tuning failover on the HBA module options (ql2xloginretrycount and the like)
Using udev for persistence without device-mapper like
BUS="scsi", PROGRAM="/sbin/scsi_id", NAME="disk%c%n"
or something like that.
remove and add devices by hand (something like:)
echo "1" > /sys/bus/scsi/devices/0:0:1:1/rescan ...
examples of more complex configs: grouping, more than two paths, etc.
Can I boot off of SAN? (mkinitrd rebuild)
Other stuff:
#echo "1" > /sys/bus/scsi/devices/0:0:1:1/delete
# cat /sys/bus/scsi/devices/0:0:1:1/block:sda/dev
8:0
# ls -l /sys/bus/scsi/devices/0:0:1:1/
This document is provided as-is and represents my own experiences and is not official advice from Red Hat, Inc.