A RAID device is a Redundant Array of Independent Disks. The concept was developed in 1987 at UC Berkeley and involves the creation of a virtual disk from multiple small disks in order to deliver improved performance and reliability. There are many flavors of RAID and lots of variations in how to implement it. We detail here the specific instance we use: software RAID1 on IDE disks in a Dell PowerEdge box running Debian "sarge", booted with grub, managed by mdadm, and using the ext3 journaling file system.
Overview
First, a list of references. None of these use exactly the combination of choices we use, but they provide all the pieces of information that are necessary:
- The basic Software RAID HOWTO at the Linux Documentation Project. This provides general background information about the concepts and tools.
- Philip McMahon's guide to using the bootloader grub, which Debian now uses by default instead of lilo. Philip provides more explicit instructions about how to handle multiple partitions, but he doesn't use mdadm. Instead, he uses the older configuration management tool raidtools.
- A detailed document, /usr/share/doc/mdadm/rootraiddoc.97.html, installed with the mdadm package. These instructions rely mostly on lilo but do have some comments about grub, albeit in the context of using initrd as part of the boot process, which the latest sarge install doesn't do. Nevertheless, these instructions are primarily what we use here.
- This brief comment highlights installing grub on the second disk and makes clear how to generate an mdadm config file in /etc/mdadm.
So here is the process to convert an existing (or new) Debian box to software RAID1:
- Configure the hardware.
- Compile a RAID-savvy kernel and install mdadm and hdparm.
- Set up RAID1 with disk one "missing" and disk two operational. Copy over the disk one partition scheme to disk two.
- Copy the contents of disk one to disk two.
- Configure /etc/fstab and grub on the RAID device.
- Change the partition types on the initial drive to 'fd'.
- Reboot into the RAID device and add disk one into the RAID.
- Test that the RAID can boot from either drive alone.
- Optimize with hdparm.
- What to do when a drive fails.
1. Setup Hardware
The number of hard drives you need depends on the flavor of RAID you want. For RAID1 -- which is a simple mirror -- we need
two drives. These drives don't have to be the same size, though obviously the RAID will be the size of the smaller one. Also,
the drives don't need to be from the same manufacturer, though different drive geometries may result in peculiar problems, and
if you're going to the trouble of setting up RAID for a server, you might as well buy two identical drives.
IDE drives are run by controllers that can handle two drives, one a "master" and another a "slave". However, for a RAID1 setup,
both IDE drives need to be "masters" on their own channel. The problem with putting both drives on the same channel is this: if
the slave drive crashes, it will probably bring down the IDE controller also, which hoses the master drive as well. So if you
have only two IDE channels on your motherboard, you need to get another IDE controller (PCI IDE controllers are only $30-50 these days)
or else scavenge the second channel by disconnecting your CDROM drive. SCSI drives use controllers that function quite differently
and don't encounter this issue.
We'll assume at this point that you've got one drive -- /dev/hda -- with Debian installed and a second drive -- /dev/hdc -- that is
equal to or greater in size than /dev/hda. Each drive is "master" on its own IDE channel.
2. Compile Kernel
The kernel is best compiled with RAID capabilities built in. By using initrd, it's possible to load RAID in as a module, but the
default Debian install now doesn't use initrd, and besides there's no reason not to compile RAID in.
The mechanics of kernel compilation
are quite simple if you use the Debian kernel-package package.
In particular, we need to set several options for multi-device support and (if using IDE drives) options for DMA operation
of the hard drives. This probably means making sure that chipset-specific support for your IDE controllers is enabled.
Here are illustrative settings for a Dell PowerEdge 500SC box with ServerWorks CSB5 IDE Controllers:
# Multi-device support (RAID and LVM)
#
CONFIG_MD=y
CONFIG_BLK_DEV_MD=y
# CONFIG_MD_LINEAR is not set
# CONFIG_MD_RAID0 is not set
CONFIG_MD_RAID1=y
# CONFIG_MD_RAID5 is not set
# CONFIG_MD_MULTIPATH is not set
# CONFIG_BLK_DEV_LVM is not set
# CONFIG_BLK_DEV_DM is not set
# CONFIG_BLK_DEV_DM_MIRROR is not set
#
# IDE chipset support/bugfixes
#
CONFIG_BLK_DEV_CMD640=y
# CONFIG_BLK_DEV_CMD640_ENHANCED is not set
# CONFIG_BLK_DEV_ISAPNP is not set
CONFIG_BLK_DEV_IDEPCI=y
# CONFIG_BLK_DEV_GENERIC is not set
CONFIG_IDEPCI_SHARE_IRQ=y
CONFIG_BLK_DEV_IDEDMA_PCI=y
# CONFIG_BLK_DEV_OFFBOARD is not set
# CONFIG_BLK_DEV_IDEDMA_FORCED is not set
CONFIG_IDEDMA_PCI_AUTO=y
# CONFIG_IDEDMA_ONLYDISK is not set
CONFIG_BLK_DEV_IDEDMA=y
# CONFIG_IDEDMA_PCI_WIP is not set
# CONFIG_BLK_DEV_ADMA100 is not set
# CONFIG_BLK_DEV_AEC62XX is not set
# CONFIG_BLK_DEV_ALI15X3 is not set
# CONFIG_WDC_ALI15X3 is not set
# CONFIG_BLK_DEV_AMD74XX is not set
# CONFIG_AMD74XX_OVERRIDE is not set
# CONFIG_BLK_DEV_ATIIXP is not set
# CONFIG_BLK_DEV_CMD64X is not set
# CONFIG_BLK_DEV_TRIFLEX is not set
# CONFIG_BLK_DEV_CY82C693 is not set
# CONFIG_BLK_DEV_CS5530 is not set
# CONFIG_BLK_DEV_HPT34X is not set
# CONFIG_HPT34X_AUTODMA is not set
# CONFIG_BLK_DEV_HPT366 is not set
CONFIG_BLK_DEV_PIIX=y
# CONFIG_BLK_DEV_NS87415 is not set
# CONFIG_BLK_DEV_OPTI621 is not set
# CONFIG_BLK_DEV_PDC202XX_OLD is not set
# CONFIG_PDC202XX_BURST is not set
# CONFIG_BLK_DEV_PDC202XX_NEW is not set
CONFIG_BLK_DEV_RZ1000=y
# CONFIG_BLK_DEV_SC1200 is not set
CONFIG_BLK_DEV_SVWKS=y
# CONFIG_BLK_DEV_SIIMAGE is not set
# CONFIG_BLK_DEV_SIS5513 is not set
# CONFIG_BLK_DEV_SLC90E66 is not set
# CONFIG_BLK_DEV_TRM290 is not set
# CONFIG_BLK_DEV_VIA82CXXX is not set
# CONFIG_IDE_CHIPSETS is not set
CONFIG_IDEDMA_AUTO=y
# CONFIG_IDEDMA_IVB is not set
# CONFIG_DMA_NONPCI is not set
# CONFIG_BLK_DEV_ATARAID is not set
# CONFIG_BLK_DEV_ATARAID_PDC is not set
# CONFIG_BLK_DEV_ATARAID_HPT is not set
# CONFIG_BLK_DEV_ATARAID_MEDLEY is not set
# CONFIG_BLK_DEV_ATARAID_SII is not set
Once the kernel is compiled, installed, and successfully reboots, you need to confirm that the kernel indeed
is configured for RAID. This is done by checking /proc/mdstat which reports the "personalities" of RAID
the kernel is capable of:
$ cat /proc/mdstat
Personalities : [raid1]
read_ahead 1024 sectors
If you don't see any personalities in /proc/mdstat, then you need to redo the kernel. Similarly, if you see any raid
modules in /etc/modules or via lsmod, then you need to redo things. You want RAID compiled in the kernel!
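The two checks just described can be scripted. What follows is only a sketch: the helper names has_raid1 and raid_modules are made up for illustration, and the module names matched (md, md_mod, raidN) are an assumption about how your kernel names them.

```shell
#!/bin/sh
# has_raid1 [FILE]: succeed if the "Personalities" line lists raid1.
# FILE defaults to /proc/mdstat; passing a saved copy is handy for testing.
has_raid1() {
    grep -q '^Personalities :.*\[raid1]' "${1:-/proc/mdstat}" 2>/dev/null
}

# raid_modules: read lsmod output on stdin and print any md/raid module names.
# A kernel with RAID compiled in should produce no output here.
raid_modules() {
    awk 'NR > 1 && $1 ~ /^(md|md_mod|raid[0-9]+)$/ { print $1 }'
}
```

Typical use would be "has_raid1 || echo 'no raid1 personality'" followed by "lsmod | raid_modules"; any module name printed by the second command means RAID was built as a module rather than compiled in.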
You should also install mdadm and hdparm at this point.
dellboy:~# apt-get install mdadm hdparm
3. Setup RAID
This involves several steps. Here's the concept: for each of our original partitions /dev/hda1 ... /dev/hdan (excluding swap, since
we're not putting swap into the RAID but rather "striping" it -- see below), we'll create RAID1 devices /dev/md0 ... /dev/md(n-1)
with the /dev/hdax partition "missing" and the /dev/hdcx partition present. We'll then copy over the contents of all the /dev/hdax
partitions to /dev/hdcx, boot to the new RAID1 device, add back /dev/hda, and let the RAID1 system rebuild /dev/hda. Thus:
- Copy over the partition schema from the existing drive /dev/hda to the new drive /dev/hdc:
dellboy:~# mount
/dev/hda1 on / type ext3 (rw,errors=remount-ro)
proc on /proc type proc (rw)
devpts on /dev/pts type devpts (rw,gid=5,mode=620)
tmpfs on /dev/shm type tmpfs (rw)
/dev/hda5 on /tmp type ext3 (rw)
/dev/hda6 on /home type ext3 (rw)
/dev/hda7 on /usr type ext3 (rw)
/dev/hda8 on /var type ext3 (rw)
dellboy:~# sfdisk -l /dev/hda
Disk /dev/hda: 9729 cylinders, 255 heads, 63 sectors/track
Units = cylinders of 8225280 bytes, blocks of 1024 bytes, counting from 0
Device Boot Start End #cyls #blocks Id System
/dev/hda1 * 0+ 11 12- 96358+ 83 Linux
/dev/hda2 12 254 243 1951897+ 82 Linux swap / Solaris
/dev/hda3 255 9728 9474 76099905 5 Extended
/dev/hda4 0 - 0 0 0 Empty
/dev/hda5 255+ 497 243- 1951866 83 Linux
/dev/hda6 498+ 2321 1824- 14651248+ 83 Linux
/dev/hda7 2322+ 4145 1824- 14651248+ 83 Linux
/dev/hda8 4146+ 9728 5583- 44845416 83 Linux
dellboy:~# sfdisk -d /dev/hda | sfdisk /dev/hdc
Apparently for some drives, sfdisk doesn't work right, and you may have to do it manually with cfdisk.
- Set up the 'fd' partition signature on the new disk /dev/hdc:
dellboy:~# cfdisk /dev/hdc
For each of the partitions (except swap of course), set the partition type to 'fd' which is the RAID type. Then Write out
the partition table and Quit. We end up with this:
dellboy:~# sfdisk -l /dev/hdc
Disk /dev/hdc: 14593 cylinders, 255 heads, 63 sectors/track
Units = cylinders of 8225280 bytes, blocks of 1024 bytes, counting from 0
Device Boot Start End #cyls #blocks Id System
/dev/hdc1 * 0+ 11 12- 96358+ fd Linux raid autodetect
/dev/hdc2 12 254 243 1951897+ 82 Linux swap / Solaris
/dev/hdc3 255 9728 9474 76099905 5 Extended
/dev/hdc4 0 - 0 0 0 Empty
/dev/hdc5 255+ 497 243- 1951866 fd Linux raid autodetect
/dev/hdc6 498+ 2321 1824- 14651248+ fd Linux raid autodetect
/dev/hdc7 2322+ 4145 1824- 14651248+ fd Linux raid autodetect
/dev/hdc8 4146+ 9728 5583- 44845416 fd Linux raid autodetect
- Initialize the new swap (assuming that /dev/hdc2 is swap):
dellboy:~# mkswap /dev/hdc2
dellboy:~# swapon -a
- Reboot -- to make sure that things still work as well as to initialize the changes to the partitions. NB: we're still booting to
/dev/hda at this point.
- Create and format the new RAID1 devices with mdadm and mkfs. Note how we pass mdadm the two drive arguments -- the first one
"missing" and the second one /dev/hdcx:
dellboy:~# mdadm --create /dev/md0 --level=1 --raid-disks=2 missing /dev/hdc1
dellboy:~# mkfs.ext3 /dev/md0
dellboy:~# mdadm --create /dev/md1 --level=1 --raid-disks=2 missing /dev/hdc5
dellboy:~# mkfs.ext3 /dev/md1
dellboy:~# mdadm --create /dev/md2 --level=1 --raid-disks=2 missing /dev/hdc6
dellboy:~# mkfs.ext3 /dev/md2
dellboy:~# mdadm --create /dev/md3 --level=1 --raid-disks=2 missing /dev/hdc7
dellboy:~# mkfs.ext3 /dev/md3
dellboy:~# mdadm --create /dev/md4 --level=1 --raid-disks=2 missing /dev/hdc8
dellboy:~# mkfs.ext3 /dev/md4
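With more than a few partitions, the create-and-format pairs above get repetitive. Here is a minimal sketch of generating them from a list instead. The function name raid_create_cmds and its "md-number:partition" argument format are invented for this example; it only prints the commands, so nothing touches the disks until you review the output and run it yourself.

```shell
#!/bin/sh
# raid_create_cmds PAIR...: print (not run) the mdadm/mkfs commands for each
# PAIR of the form "md-number:partition", e.g. 0:/dev/hdc1.
raid_create_cmds() {
    for pair in "$@"; do
        md=${pair%%:*}      # text before the first colon: the md number
        part=${pair#*:}     # text after the first colon: the partition
        echo "mdadm --create /dev/md$md --level=1 --raid-disks=2 missing $part"
        echo "mkfs.ext3 /dev/md$md"
    done
}
```

For the layout used here, the call would be "raid_create_cmds 0:/dev/hdc1 1:/dev/hdc5 2:/dev/hdc6 3:/dev/hdc7 4:/dev/hdc8"; piping the output to sh would execute it.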
4. Copy the System
At this point, we mount the new RAID devices /dev/md0...n to mount points and copy over the appropriate stuff from /dev/hda, beginning
with the root partition and including all the others:
dellboy:~# mount /dev/md0 /mnt
dellboy:~# cp -dpRx / /mnt
dellboy:~# mount /dev/md1 /mnt/tmp
dellboy:~# cp -dpRx /tmp/. /mnt/tmp
dellboy:~# mount /dev/md2 /mnt/home
dellboy:~# cp -dpRx /home/. /mnt/home
dellboy:~# mount /dev/md3 /mnt/usr
dellboy:~# cp -dpRx /usr/. /mnt/usr
dellboy:~# mount /dev/md4 /mnt/var
dellboy:~# cp -dpRx /var/. /mnt/var
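One thing to watch when copying onto a mount point: if the target directory already exists, a recursive cp of a directory places that directory inside the target, so copying /tmp to an existing /mnt/tmp yields /mnt/tmp/tmp. Copying "source/." copies the contents instead. The sketch below shows the difference on throwaway directories; the directory names are arbitrary and nothing outside the scratch area is touched.

```shell
#!/bin/sh
# Demonstrate on scratch directories how cp -R treats an existing target.
demo=$(mktemp -d)
mkdir -p "$demo/src" "$demo/nested" "$demo/flat"
echo data > "$demo/src/file"

cp -R "$demo/src" "$demo/nested"     # target exists: creates nested/src/file
cp -R "$demo/src/." "$demo/flat"     # copies the contents: creates flat/file

ls "$demo/nested"   # prints: src
ls "$demo/flat"     # prints: file
# Remove the scratch area with: rm -rf "$demo"
```

After the real copy, a quick "ls /mnt/home" (and likewise for the other mount points) shows whether the files landed at the expected level.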
5. Configure the new /etc/fstab and grub
Edit fstab on the new RAID device with your favorite text editor. NB: we've disconnected our CDROM drive to get the
second IDE channel since we don't need a CDROM on this server. Note also that the swap is left on /dev/hda2 and /dev/hdc2 and is
not put in the RAID; giving both swap partitions the same "pri=1" option "stripes" the swap across the two drives. Then copy the
edited file to the first drive.
dellboy:~# e3em /mnt/etc/fstab
# /etc/fstab: static file system information.
#
#
proc /proc proc defaults 0 0
/dev/md0 / ext3 defaults,errors=remount-ro 0 1
/dev/md1 /tmp ext3 defaults 0 2
/dev/md2 /home ext3 defaults 0 2
/dev/md3 /usr ext3 defaults 0 2
/dev/md4 /var ext3 defaults 0 2
/dev/hda2 none swap sw,pri=1 0 0
/dev/hdc2 none swap sw,pri=1 0 0
#/dev/hdc /media/cdrom0 iso9660 ro,user,noauto 0 0
/dev/fd0 /media/floppy0 auto rw,user,noauto 0 0
dellboy:~# cp -dp /mnt/etc/fstab /etc/fstab
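Once the system is running with this fstab, you can confirm that both swap partitions really share one priority, which is what makes the kernel alternate between them. The helper below is a sketch with a made-up name (swap_priorities); it assumes the usual /proc/swaps layout, with a header line and the priority in the last column.

```shell
#!/bin/sh
# swap_priorities [FILE]: print each distinct swap priority once.
# FILE defaults to /proc/swaps; a single output line means all swap
# areas share the same priority.
swap_priorities() {
    awk 'NR > 1 { print $NF }' "${1:-/proc/swaps}" | sort -u
}
```

With the fstab above, swap_priorities should print just "1". Two or more lines of output mean the swap areas have different priorities and will be used one after the other rather than striped.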
Edit /boot/grub/menu.lst on the new RAID device with your favorite text editor, copy it to the first drive,
and install grub on the second drive.
dellboy:~# e3em /mnt/boot/grub/menu.lst
# add these entries at top of boot list
title Debian GNU/Linux, kernel 2.4.27 RAID
root (hd0,0)
kernel (hd0,0)/vmlinuz ro root=/dev/md0 md=0,/dev/hda1,/dev/hdc1
savedefault
boot
title Debian GNU/Linux, kernel 2.4.27 RAID Mirror Recovery
root (hd1,0)
kernel (hd1,0)/vmlinuz ro root=/dev/md0 md=0,/dev/hdc1
savedefault
boot
dellboy:~# cp -dp /mnt/boot/grub/menu.lst /boot/grub/menu.lst
dellboy:~# grub-install /dev/hdc
dellboy:~# grub
grub> device (hd0) /dev/hdc
grub> root (hd0,0)
grub> setup (hd0)
grub> quit
The "device" line tells grub to treat /dev/hdc as (hd0), the first hard drive the BIOS finds during boot, while it installs
itself there. Both /dev/hda and /dev/hdc then carry a boot loader that expects to be running from (hd0), which means that grub
can boot from either drive when the other is out.
5. Reformat First Drive
We're nearly ready to reboot into the new RAID device, but first we need to change the partition types on the initial hard drive
so that it can be synched with the second drive, which now has a copy of our entire system.
dellboy:~# cfdisk /dev/hda
For each of the partitions (except swap of course), set the partition type to 'fd' which is the RAID type. Then Write out
the partition table and Quit. We end up with this:
dellboy:~# sfdisk -l /dev/hda
Disk /dev/hda: 14593 cylinders, 255 heads, 63 sectors/track
Units = cylinders of 8225280 bytes, blocks of 1024 bytes, counting from 0
Device Boot Start End #cyls #blocks Id System
/dev/hda1 * 0+ 11 12- 96358+ fd Linux raid autodetect
/dev/hda2 12 254 243 1951897+ 82 Linux swap / Solaris
/dev/hda3 255 9728 9474 76099905 5 Extended
/dev/hda4 0 - 0 0 0 Empty
/dev/hda5 255+ 497 243- 1951866 fd Linux raid autodetect
/dev/hda6 498+ 2321 1824- 14651248+ fd Linux raid autodetect
/dev/hda7 2322+ 4145 1824- 14651248+ fd Linux raid autodetect
/dev/hda8 4146+ 9728 5583- 44845416 fd Linux raid autodetect
6. Reboot into the RAID and add the first disk
Reboot the system, which will boot into the RAID device /dev/md0 -- the new root partition. Our first disk will still be
"missing" however, as shown by /proc/mdstat:
dellboy:~# cat /proc/mdstat
Personalities : [raid1]
read_ahead 1024 sectors
md0 : active raid1 hda1[2] hdc1[1]
96256 blocks [2/1] [_U]
md1 : active raid1 hdc5[1]
1951744 blocks [2/1] [_U]
md2 : active raid1 hdc6[1]
14651136 blocks [2/1] [_U]
md3 : active raid1 hdc7[1]
14651136 blocks [2/1] [_U]
md4 : active raid1 hdc8[1]
44845312 blocks [2/1] [_U]
unused devices:
We then use mdadm to add in the other volumes and then monitor /proc/mdstat until everything is synched:
dellboy:~# mdadm --add /dev/md0 /dev/hda1
dellboy:~# mdadm --add /dev/md1 /dev/hda5
dellboy:~# mdadm --add /dev/md2 /dev/hda6
dellboy:~# mdadm --add /dev/md3 /dev/hda7
dellboy:~# mdadm --add /dev/md4 /dev/hda8
dellboy:~# cat /proc/mdstat
Personalities : [raid1]
read_ahead 1024 sectors
md0 : active raid1 hdc1[1] hda1[0]
96256 blocks [2/2] [UU]
md1 : active raid1 hdc5[1] hda5[0]
1951744 blocks [2/2] [UU]
md2 : active raid1 hdc6[1] hda6[0]
14651136 blocks [2/2] [UU]
md3 : active raid1 hdc7[1] hda7[0]
14651136 blocks [2/2] [UU]
md4 : active raid1 hdc8[1] hda8[0]
44845312 blocks [2/2] [UU]
unused devices:
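Rather than re-reading /proc/mdstat by eye, the "is everything synched yet" check can be scripted. This is a sketch under the assumption that the status lines look like the output above (a "blocks" line ending in a field such as [UU] or [_U]); the function name degraded_arrays is invented for the example.

```shell
#!/bin/sh
# degraded_arrays [FILE]: print the name of every md array whose status
# field (e.g. [_U]) still shows a missing member. FILE defaults to
# /proc/mdstat.
degraded_arrays() {
    awk '/^md[0-9]+ :/ { name = $1 }
         /blocks/ && /\[[U_]*_[U_]*\]/ { print name }' "${1:-/proc/mdstat}"
}
```

A simple wait loop would then be: while [ -n "$(degraded_arrays)" ]; do sleep 60; done. An array stays listed until its resync finishes and the field reads [UU].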
7. Test the RAID
We installed grub onto both disks, so we should be able to boot with either disk. To test this, shut down and power off the computer,
unplug the power to one of the hard drives, then restart the computer. The computer should boot from the remaining disk.
It will reboot, that is, unless you have a brain-dead BIOS like the one Dell provides for the PowerEdge 500SC. What happens (even
with the latest BIOS revision A07) is that the BIOS detects the missing hard drive and waits for the user to press F1 to continue
or F2 to enter BIOS setup. There appears no way to step around this, so unattended reboot with a dead drive appears impossible
on this box. Totally lame. However, the good news is that with an adequate UPS and new drives, the likelihood of simultaneous
drive failure and reboot (usually from a power outage) is remote.
Here's some more info.
In any case, once the computer has rebooted, you'll see from /proc/mdstat that the one drive is missing. Shut down and power off
the computer again, then reconnect the drive's power. Reboot and now you can add the missing drive back in with mdadm.
Make certain to allow the drive volumes to re-synch completely before you do anything else. You then can repeat the
process with the other drive.
8. Optimize the RAID
For best performance, the IDE controllers should be using DMA - direct memory access. You can set this up from the command line,
but to set it up across reboots, you need to configure hdparm's defaults.
dellboy:~# hdparm -d1 -c3 /dev/hda /dev/hdc
dellboy:~# e3em /etc/default/hdparm
# To set the same options for a block of harddisks, do so with something
# like the following example options:
# harddisks="/dev/hda /dev/hdb"
# hdparm_opts="-d1 -X66"
# This is run before the configuration in hdparm.conf. Do not use
# this arrangement if you need modules loaded for your hard disks,
# or need udev to create the nodes, or have some other local quirk
# These are better addressed with the options in /etc/hdparm.conf
#
harddisks="/dev/hda /dev/hdc"
hdparm_opts="-d1 -c3"
Here's what we see after correct configuration:
dellboy:~# hdparm /dev/hda
/dev/hda:
multcount = 16 (on)
IO_support = 3 (32-bit w/sync)
unmaskirq = 0 (off)
using_dma = 1 (on)
keepsettings = 0 (off)
readonly = 0 (off)
readahead = 8 (on)
geometry = 9729/255/63, sectors = 156301488, start = 0
dellboy:~# hdparm /dev/hdc
/dev/hdc:
multcount = 16 (on)
IO_support = 3 (32-bit w/sync)
unmaskirq = 0 (off)
using_dma = 1 (on)
keepsettings = 0 (off)
readonly = 0 (off)
readahead = 8 (on)
geometry = 9729/255/63, sectors = 156301488, start = 0
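To check after a reboot that the defaults really took effect, the using_dma line can be tested in a script. A small sketch: dma_enabled is a hypothetical helper that reads hdparm output on standard input and assumes the "using_dma = 1 (on)" wording shown above.

```shell
#!/bin/sh
# dma_enabled: read hdparm output on stdin; succeed only if it reports
# "using_dma = 1 (on)".
dma_enabled() {
    grep -q 'using_dma[[:space:]]*=[[:space:]]*1 (on)'
}
```

For example: for d in /dev/hda /dev/hdc; do hdparm -d "$d" | dma_enabled || echo "$d: DMA is off"; done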
9. What to do when a drive fails
The status of the raid disks is monitored continually by mdadm, and you can set it up to email an alert if one of the drives fails. If that happens, here's what you do. NB: This is based on what I've read in docs; I haven't actually had to test this, so proceed at your own risk. Presume that it is /dev/hda that has failed:
- Remove the faulty disk from the array. This involves removing each of the partitions. Make certain that you're removing the correct disk -- the faulty one! Removing the good disk will result in a very unhappy rest of the day.
mdadm --set-faulty /dev/md0 /dev/hda1
mdadm --remove /dev/md0 /dev/hda1
mdadm --set-faulty /dev/md1 /dev/hda5
mdadm --remove /dev/md1 /dev/hda5
mdadm --set-faulty /dev/md2 /dev/hda6
mdadm --remove /dev/md2 /dev/hda6
mdadm --set-faulty /dev/md3 /dev/hda7
mdadm --remove /dev/md3 /dev/hda7
mdadm --set-faulty /dev/md4 /dev/hda8
mdadm --remove /dev/md4 /dev/hda8
- Shut down and power off the box.
- Physically remove the failed drive.
- Install a new drive.
- Restart the box. It should boot to the raid device -- and the new drive will show up as missing.
- Use mdadm to add in the new drive as before. The resync copies all the data onto the new disk, so the file systems don't need to be formatted by hand. The partitions do have to exist before mdadm can add them, however, so first copy over the good disk's partition scheme as we did before.
- Confirm via cat /proc/mdstat that the raid has rebuilt itself using the new drive.
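The re-add steps for a replacement drive can be collected in one place. Like the procedure itself, this sketch is untested on a real failure. The function replace_disk_cmds and its "md-number:partition-number" arguments are invented for illustration, and it only prints commands assembled from the ones used earlier in this document (sfdisk -d to clone the partition table, mdadm --add to rejoin each partition).

```shell
#!/bin/sh
# replace_disk_cmds GOOD NEW PAIR...: print (not run) the commands that
# bring a blank replacement disk NEW into arrays still running on GOOD.
# Each PAIR is "md-number:partition-number", e.g. 1:5 for /dev/md1 on
# partition 5.
replace_disk_cmds() {
    good=$1
    new=$2
    shift 2
    # Clone the partition table, including the 'fd' partition types.
    echo "sfdisk -d $good | sfdisk $new"
    for pair in "$@"; do
        echo "mdadm --add /dev/md${pair%%:*} $new${pair#*:}"
    done
}
```

For a failed /dev/hda in the layout used here: replace_disk_cmds /dev/hdc /dev/hda 0:1 1:5 2:6 3:7 4:8. The swap partition on the new disk would also need mkswap as in step 3, and since a replacement disk starts with an empty boot sector, the grub installation from step 5 would presumably need repeating for it.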
Comments

i have a problem with hdparm
when i type :
#hdparm -d1 -c3 /dev/sda
its say :
/dev/sda:
setting 32-bit IO_support flag to 3
HDIO_SET_32BIT failed: Invalid argument
setting using_dma to 1 (on)
HDIO_SET_DMA failed: Invalid argument

Probably a problem with SCSI drive
You appear to be trying to do this with a SCSI drive (/dev/sda). This discussion only applies to IDE disks.

Couple of errors and a question
Hey,
what exactly does this do:
dellboy:~# grub
grub> device (hd0) /dev/hdc
grub> root (hd0,0)
grub> setup (hd0)
grub> quit
?
Some info would be nice on what to do when one of the drives fail. Does this thing have to be done again on the missing drive (whatever it does)?
Also, some typos:
dellboy:~# mdadm --add /dev/md0 /dev/hd1
should be:
dellboy:~# mdadm --add /dev/md0 /dev/hda1
Search for:
"with mdadm and msfs"
should be:
"with mdadm and mkfs"
//edit: also newlines don't get converted to hr tags in these comments properly, had to make double newlines or the text would all appear on one line (using FF).
Regards,
Jaka

Thanks
Jaka, thanks for pointing out the typos; they're fixed in the document now. Also I've added a section on dealing with a failed drive. Note that I've not had to test this myself yet.
The function of the grub stuff you ask about is to make grub view either physical disk as a boot disk, as explained in the document. What that means at a lower "exact" level than that is unknown to me, but it doesn't seem important to understand that just to make this work.
This comment system isn't a wiki, so the kinds of formatting you were trying isn't supported. If you edit as html, you can use minimal html tags, but then line returns won't show up; you have to use paragraph tags. If you edit as plain text, then the line returns you type into your post will indeed display correctly.