.TH "LVMRAID" "7" "LVM TOOLS 2.02.168(2) (2016-11-30)" "Red Hat, Inc" "\"" .SH NAME lvmraid \(em LVM RAID .SH DESCRIPTION LVM RAID is a way to create logical volumes (LVs) that use multiple physical devices to improve performance or tolerate device failure. How blocks of data in an LV are placed onto physical devices is determined by the RAID level. RAID levels are commonly referred to by number, e.g. raid1, raid5. Selecting a RAID level involves tradeoffs among physical device requirements, fault tolerance, and performance. A description of the RAID levels can be found at .br www.snia.org/sites/default/files/SNIA_DDF_Technical_Position_v2.0.pdf LVM RAID uses both Device Mapper (DM) and Multiple Device (MD) drivers from the Linux kernel. DM is used to create and manage visible LVM devices, and MD is used to place data on physical devices. .SH Create a RAID LV To create a RAID LV, use lvcreate and specify an LV type. The LV type corresponds to a RAID level. The basic RAID levels that can be used are: .B raid0, raid1, raid4, raid5, raid6, raid10. .B lvcreate \-\-type .I RaidLevel [\fIOPTIONS\fP] .B \-\-name .I Name .B \-\-size .I Size .I VG [\fIPVs\fP] To display the LV type of an existing LV, run: .B lvs -o name,segtype \fIVG\fP/\fILV\fP (The LV type is also referred to as "segment type" or "segtype".) LVs can be created with the following types: .SS raid0 \& Also called striping, raid0 spreads LV data across multiple devices in units of stripe size. This is used to increase performance. LV data will be lost if any of the devices fail. .B lvcreate \-\-type raid0 [\fB\-\-stripes\fP \fINumber\fP \fB\-\-stripesize\fP \fISize\fP] \fIVG\fP [\fIPVs\fP] .HP .B \-\-stripes specifies the number of devices to spread the LV across. .HP .B \-\-stripesize specifies the size of each stripe in kilobytes. This is the amount of data that is written to one device before moving to the next. .P \fIPVs\fP specifies the devices to use. If not specified, lvm will choose \fINumber\fP devices, one for each stripe. .SS raid1 \& Also called mirroring, raid1 uses multiple devices to duplicate LV data. The LV data remains available if all but one of the devices fail. The minimum number of devices required is 2. .B lvcreate \-\-type raid1 [\fB\-\-mirrors\fP \fINumber\fP] \fIVG\fP [\fIPVs\fP] .HP .B \-\-mirrors specifies the number of mirror images in addition to the original LV image, e.g. \-\-mirrors 1 means there are two images of the data, the original and one mirror image. .P \fIPVs\fP specifies the devices to use. If not specified, lvm will choose \fINumber\fP devices, one for each image. .SS raid4 \& raid4 is a form of striping that uses an extra device dedicated to storing parity blocks. The LV data remains available if one device fails. The parity is used to recalculate data that is lost from a single device. The minimum number of devices required is 3. .B lvcreate \-\-type raid4 [\fB\-\-stripes\fP \fINumber\fP \fB\-\-stripesize\fP \fISize\fP] \fIVG\fP [\fIPVs\fP] .HP .B \-\-stripes specifies the number of devices to use for LV data. This does not include the extra device lvm adds for storing parity blocks. \fINumber\fP stripes requires \fINumber\fP+1 devices. \fINumber\fP must be 2 or more. .HP .B \-\-stripesize specifies the size of each stripe in kilobytes. This is the amount of data that is written to one device before moving to the next. .P \fIPVs\fP specifies the devices to use. If not specified, lvm will choose \fINumber\fP+1 separate devices. raid4 is called non-rotating parity because the parity blocks are always stored on the same device. .SS raid5 \& raid5 is a form of striping that uses an extra device for storing parity blocks. LV data and parity blocks are stored on each device. The LV data remains available if one device fails. The parity is used to recalculate data that is lost from a single device. The minimum number of devices required is 3. .B lvcreate \-\-type raid5 [\fB\-\-stripes\fP \fINumber\fP \fB\-\-stripesize\fP \fISize\fP] \fIVG\fP [\fIPVs\fP] .HP .B \-\-stripes specifies the number of devices to use for LV data. This does not include the extra device lvm adds for storing parity blocks. \fINumber\fP stripes requires \fINumber\fP+1 devices. \fINumber\fP must be 2 or more. .HP .B \-\-stripesize specifies the size of each stripe in kilobytes. This is the amount of data that is written to one device before moving to the next. .P \fIPVs\fP specifies the devices to use. If not specified, lvm will choose \fINumber\fP+1 separate devices. raid5 is called rotating parity because the parity blocks are placed on different devices in a round-robin sequence. There are variations of raid5 with different algorithms for placing the parity blocks. The default variant is raid5_ls (raid5 left symmetric, which is a rotating parity 0 with data restart.) See \fBRAID5 variants\fP below. .SS raid6 \& raid6 is a form of striping like raid5, but uses two extra devices for parity blocks. LV data and parity blocks are stored on each device. The LV data remains available if up to two devices fail. The parity is used to recalculate data that is lost from one or two devices. The minimum number of devices required is 5. .B lvcreate \-\-type raid6 [\fB\-\-stripes\fP \fINumber\fP \fB\-\-stripesize\fP \fISize\fP] \fIVG\fP [\fIPVs\fP] .HP .B \-\-stripes specifies the number of devices to use for LV data. This does not include the extra two devices lvm adds for storing parity blocks. \fINumber\fP stripes requires \fINumber\fP+2 devices. \fINumber\fP must be 3 or more. .HP .B \-\-stripesize specifies the size of each stripe in kilobytes. This is the amount of data that is written to one device before moving to the next. .P \fIPVs\fP specifies the devices to use. If not specified, lvm will choose \fINumber\fP+2 separate devices. Like raid5, there are variations of raid6 with different algorithms for placing the parity blocks. The default variant is raid6_zr (raid6 zero restart, aka left symmetric, which is a rotating parity 0 with data restart.) See \fBRAID6 variants\fP below. .SS raid10 \& raid10 is a combination of raid1 and raid0, striping data across mirrored devices. LV data remains available if one or more devices remains in each mirror set. The minimum number of devices required is 4. .B lvcreate \-\-type raid10 .RS [\fB\-\-mirrors\fP \fINumberMirrors\fP] .br [\fB\-\-stripes\fP \fINumberStripes\fP \fB\-\-stripesize\fP \fISize\fP] .br \fIVG\fP [\fIPVs\fP] .RE .HP .B \-\-mirrors specifies the number of mirror images within each stripe. e.g. \-\-mirrors 1 means there are two images of the data, the original and one mirror image. .HP .B \-\-stripes specifies the total number of devices to use in all raid1 images (not the number of raid1 devices to spread the LV across, even though that is the effective result). The number of devices in each raid1 mirror will be NumberStripes/(NumberMirrors+1), e.g. mirrors 1 and stripes 4 will stripe data across two raid1 mirrors, where each mirror is devices. .HP .B \-\-stripesize specifies the size of each stripe in kilobytes. This is the amount of data that is written to one device before moving to the next. .P \fIPVs\fP specifies the devices to use. If not specified, lvm will choose the necessary devices. Devices are used to create mirrors in the order listed, e.g. for mirrors 1, stripes 2, listing PV1 PV2 PV3 PV4 results in mirrors PV1/PV2 and PV3/PV4. RAID10 is not mirroring on top of stripes, which would be RAID01, which is less tolerant of device failures. .SH Synchronization Synchronization makes all the devices in a RAID LV consistent with each other. In a RAID1 LV, all mirror images should have the same data. When a new mirror image is added, or a mirror image is missing data, then images need to be synchronized. Data blocks are copied from an existing image to a new or outdated image to make them match. In a RAID 4/5/6 LV, parity blocks and data blocks should match based on the parity calculation. When the devices in a RAID LV change, the data and parity blocks can become inconsistent and need to be synchronized. Correct blocks are read, parity is calculated, and recalculated blocks are written. The RAID implementation keeps track of which parts of a RAID LV are synchronized. This uses a bitmap saved in the RAID metadata. The bitmap can exclude large parts of the LV from synchronization to reduce the amount of work. Without this, the entire LV would need to be synchronized every time it was activated. When a RAID LV is first created and activated the first synchronization is called initialization. Automatic synchronization happens when a RAID LV is activated, but it is usually partial because the bitmaps reduce the areas that are checked. A full sync may become necessary when devices in the RAID LV are changed. The synchronization status of a RAID LV is reported by the following command, where "image synced" means sync is complete: .B lvs -a -o name,sync_percent .SS Scrubbing Scrubbing is a full scan/synchronization of the RAID LV requested by a user. Scrubbing can find problems that are missed by partial synchronization. Scrubbing assumes that RAID metadata and bitmaps may be inaccurate, so it verifies all RAID metadata, LV data, and parity blocks. Scrubbing can find inconsistencies caused by hardware errors or degradation. These kinds of problems may be undetected by automatic synchronization which excludes areas outside of the RAID write-intent bitmap. The command to scrub a RAID LV can operate in two different modes: .B lvchange \-\-syncaction .BR check | repair .IR VG / LV .HP .B check Check mode is read\-only and only detects inconsistent areas in the RAID LV, it does not correct them. .HP .B repair Repair mode checks and writes corrected blocks to synchronize any inconsistent areas. .P Scrubbing can consume a lot of bandwidth and slow down application I/O on the RAID LV. To control the I/O rate used for scrubbing, use: .HP .B \-\-maxrecoveryrate .BR \fIRate [ b | B | s | S | k | K | m | M | g | G ] .br Sets the maximum recovery rate for a RAID LV. \fIRate\fP is specified as an amount per second for each device in the array. If no suffix is given, then KiB/sec/device is assumed. Setting the recovery rate to \fB0\fP means it will be unbounded. .HP .BR \-\-minrecoveryrate .BR \fIRate [ b | B | s | S | k | K | m | M | g | G ] .br Sets the minimum recovery rate for a RAID LV. \fIRate\fP is specified as an amount per second for each device in the array. If no suffix is given, then KiB/sec/device is assumed. Setting the recovery rate to \fB0\fP means it will be unbounded. .P To display the current scrubbing in progress on an LV, including the syncaction mode and percent complete, run: .B lvs -a -o name,raid_sync_action,sync_percent After scrubbing is complete, to display the number of inconsistent blocks found, run: .B lvs -o name,raid_mismatch_count Also, if mismatches were found, the lvs attr field will display the letter "m" (mismatch) in the 9th position, e.g. .nf # lvs -o name,vgname,segtype,attr vg/lvol0 LV VG Type Attr lvol0 vg raid1 Rwi-a-r-m- .fi .SS Scrubbing Limitations The \fBcheck\fP mode can only report the number of inconsistent blocks, it cannot report which blocks are inconsistent. This makes it impossible to know which device has errors, or if the errors affect file system data, metadata or nothing at all. The \fBrepair\fP mode can make the RAID LV data consistent, but it does not know which data is correct. The result may be consistent but incorrect data. When two different blocks of data must be made consistent, it chooses the block from the device that would be used during RAID intialization. However, if the PV holding corrupt data is known, lvchange \-\-rebuild can be used to reconstruct the data on the bad device. Future developments might include: Allowing a user to choose the correct version of data during repair. Using a majority of devices to determine the correct version of data to use in a three-way RAID1 or RAID6 LV. Using a checksumming device to pin-point when and where an error occurs, allowing it to be rewritten. .SH SubLVs An LV is often a combination of other hidden LVs called SubLVs. The SubLVs either use physical devices, or are built from other SubLVs themselves. SubLVs hold LV data blocks, RAID parity blocks, and RAID metadata. SubLVs are generally hidden, so the lvs \-a option is required display them: .B lvs -a -o name,segtype,devices SubLV names begin with the visible LV name, and have an automatic suffix indicating its role: .IP \(bu 3 SubLVs holding LV data or parity blocks have the suffix _rimage_#. These SubLVs are sometimes referred to as DataLVs. .IP \(bu 3 SubLVs holding RAID metadata have the suffix _rmeta_#. RAID metadata includes superblock information, RAID type, bitmap, and device health information. These SubLVs are sometimes referred to as MetaLVs. .P SubLVs are an internal implementation detail of LVM. The way they are used, constructed and named may change. The following examples show the SubLV arrangement for each of the basic RAID LV types, using the fewest number of devices allowed for each. .SS Examples .B raid0 .br Each rimage SubLV holds a portion of LV data. No parity is used. No RAID metadata is used. .nf lvcreate --type raid0 --stripes 2 --name lvr0 ... lvs -a -o name,segtype,devices lvr0 raid0 lvr0_rimage_0(0),lvr0_rimage_1(0) [lvr0_rimage_0] linear /dev/sda(...) [lvr0_rimage_1] linear /dev/sdb(...) .fi .B raid1 .br Each rimage SubLV holds a complete copy of LV data. No parity is used. Each rmeta SubLV holds RAID metadata. .nf lvcreate --type raid1 --mirrors 1 --name lvr1 ... lvs -a -o name,segtype,devices lvr1 raid1 lvr1_rimage_0(0),lvr1_rimage_1(0) [lvr1_rimage_0] linear /dev/sda(...) [lvr1_rimage_1] linear /dev/sdb(...) [lvr1_rmeta_0] linear /dev/sda(...) [lvr1_rmeta_1] linear /dev/sdb(...) .fi .B raid4 .br Two rimage SubLVs each hold a portion of LV data and one rimage SubLV holds parity. Each rmeta SubLV holds RAID metadata. .nf lvcreate --type raid4 --stripes 2 --name lvr4 ... lvs -a -o name,segtype,devices lvr4 raid4 lvr4_rimage_0(0),\\ lvr4_rimage_1(0),\\ lvr4_rimage_2(0) [lvr4_rimage_0] linear /dev/sda(...) [lvr4_rimage_1] linear /dev/sdb(...) [lvr4_rimage_2] linear /dev/sdc(...) [lvr4_rmeta_0] linear /dev/sda(...) [lvr4_rmeta_1] linear /dev/sdb(...) [lvr4_rmeta_2] linear /dev/sdc(...) .fi .B raid5 .br Three rimage SubLVs each hold a portion of LV data and parity. Each rmeta SubLV holds RAID metadata. .nf lvcreate --type raid5 --stripes 2 --name lvr5 ... lvs -a -o name,segtype,devices lvr5 raid5 lvr5_rimage_0(0),\\ lvr5_rimage_1(0),\\ lvr5_rimage_2(0) [lvr5_rimage_0] linear /dev/sda(...) [lvr5_rimage_1] linear /dev/sdb(...) [lvr5_rimage_2] linear /dev/sdc(...) [lvr5_rmeta_0] linear /dev/sda(...) [lvr5_rmeta_1] linear /dev/sdb(...) [lvr5_rmeta_2] linear /dev/sdc(...) .fi .B raid6 .br Six rimage SubLVs each hold a portion of LV data and parity. Each rmeta SubLV holds RAID metadata. .nf lvcreate --type raid6 --stripes 3 --name lvr6 lvs -a -o name,segtype,devices lvr6 raid6 lvr6_rimage_0(0),\\ lvr6_rimage_1(0),\\ lvr6_rimage_2(0),\\ lvr6_rimage_3(0),\\ lvr6_rimage_4(0),\\ lvr6_rimage_5(0) [lvr6_rimage_0] linear /dev/sda(...) [lvr6_rimage_1] linear /dev/sdb(...) [lvr6_rimage_2] linear /dev/sdc(...) [lvr6_rimage_3] linear /dev/sdd(...) [lvr6_rimage_4] linear /dev/sde(...) [lvr6_rimage_5] linear /dev/sdf(...) [lvr6_rmeta_0] linear /dev/sda(...) [lvr6_rmeta_1] linear /dev/sdb(...) [lvr6_rmeta_2] linear /dev/sdc(...) [lvr6_rmeta_3] linear /dev/sdd(...) [lvr6_rmeta_4] linear /dev/sde(...) [lvr6_rmeta_5] linear /dev/sdf(...) .B raid10 .br Four rimage SubLVs each hold a portion of LV data. No parity is used. Each rmeta SubLV holds RAID metadata. .nf lvcreate --type raid10 --stripes 2 --mirrors 1 --name lvr10 lvs -a -o name,segtype,devices lvr10 raid10 lvr10_rimage_0(0),\\ lvr10_rimage_1(0),\\ lvr10_rimage_2(0),\\ lvr10_rimage_3(0) [lvr10_rimage_0] linear /dev/sda(...) [lvr10_rimage_1] linear /dev/sdb(...) [lvr10_rimage_2] linear /dev/sdc(...) [lvr10_rimage_3] linear /dev/sdd(...) [lvr10_rmeta_0] linear /dev/sda(...) [lvr10_rmeta_1] linear /dev/sdb(...) [lvr10_rmeta_2] linear /dev/sdc(...) [lvr10_rmeta_3] linear /dev/sdd(...) .fi .SH Device Failure Physical devices in a RAID LV can fail or be lost for multiple reasons. A device could be disconnected, permanently failed, or temporarily disconnected. The purpose of RAID LVs (levels 1 and higher) is to continue operating in a degraded mode, without losing LV data, even after a device fails. The number of devices that can fail without the loss of LV data depends on the RAID level: .IP \[bu] 3 RAID0 (striped) LVs cannot tolerate losing any devices. LV data will be lost if any devices fail. .IP \[bu] 3 RAID1 LVs can tolerate losing all but one device without LV data loss. .IP \[bu] 3 RAID4 and RAID5 LVs can tolerate losing one device without LV data loss. .IP \[bu] 3 RAID6 LVs can tolerate losing two devices without LV data loss. .IP \[bu] 3 RAID10 is variable, and depends on which devices are lost. It can tolerate losing all but one device in a single raid1 mirror without LV data loss. .P If a RAID LV is missing devices, or has other device-related problems, lvs reports this in the health_status (and attr) fields: .B lvs -o name,lv_health_status .B partial .br Devices are missing from the LV. This is also indicated by the letter "p" (partial) in the 9th position of the lvs attr field. .B refresh needed .br A device was temporarily missing but has returned. The LV needs to be refreshed to use the device again (which will usually require partial synchronization). This is also indicated by the letter "r" (refresh needed) in the 9th position of the lvs attr field. See \fBRefreshing an LV\fP. This could also indicate a problem with the device, in which case it should be be replaced, see \fBReplacing Devices\fP. .B mismatches exist .br See .BR Scrubbing . Most commands will also print a warning if a device is missing, e.g. .br .nf WARNING: Device for PV uItL3Z-wBME-DQy0-... not found or rejected ... .fi This warning will go away if the device returns or is removed from the VG (see \fBvgreduce \-\-removemissing\fP). .SS Activating an LV with missing devices A RAID LV that is missing devices may be activated or not, depending on the "activation mode" used in lvchange: .B lvchange \-ay \-\-activationmode .RB { complete | degraded | partial } .IR VG / LV .B complete .br The LV is only activated if all devices are present. .B degraded .br The LV is activated with missing devices if the RAID level can tolerate the number of missing devices without LV data loss. .B partial .br The LV is always activated, even if portions of the LV data are missing because of the missing device(s). This should only be used to perform recovery or repair operations. .BR lvm.conf (5) .B activation/activation_mode .br controls the activation mode when not specified by the command. The default value is printed by: .nf lvmconfig --type default activation/activation_mode .fi .SS Replacing Devices Devices in a RAID LV can be replaced with other devices in the VG. When replacing devices that are no longer visible on the system, use lvconvert \-\-repair. When replacing devices that are still visible, use lvconvert \-\-replace. The repair command will attempt to restore the same number of data LVs that were previously in the LV. The replace option can be repeated to replace multiple PVs. Replacement devices can be optionally listed with either option. .B lvconvert \-\-repair .IR VG / LV [\fINewPVs\fP] .B lvconvert \-\-replace \fIOldPV\fP .IR VG / LV [\fINewPV\fP] .B lvconvert .B \-\-replace \fIOldPV1\fP .B \-\-replace \fIOldPV2\fP ... .IR VG / LV [\fINewPVs\fP] New devices require synchronization with existing devices, see .BR Synchronization . .SS Refreshing an LV Refreshing a RAID LV clears any transient device failures (device was temporarily disconnected) and returns the LV to its fully redundant mode. Restoring a device will usually require at least partial synchronization (see \fBSynchronization\fP). Failure to clear a transient failure results in the RAID LV operating in degraded mode until it is reactivated. Use the lvchange command to refresh an LV: .B lvchange \-\-refresh .IR VG / LV .nf # lvs -o name,vgname,segtype,attr,size vg LV VG Type Attr LSize raid1 vg raid1 Rwi-a-r-r- 100.00g # lvchange --refresh vg/raid1 # lvs -o name,vgname,segtype,attr,size vg LV VG Type Attr LSize raid1 vg raid1 Rwi-a-r--- 100.00g .fi .SS Automatic repair If a device in a RAID LV fails, device-mapper in the kernel notifies the .BR dmeventd (8) monitoring process (see \fBMonitoring\fP). dmeventd can be configured to automatically respond using: .BR lvm.conf (5) .B activation/raid_fault_policy Possible settings are: .B warn .br A warning is added to the system log indicating that a device has failed in the RAID LV. It is left to the user to repair the LV, e.g. replace failed devices. .B allocate .br dmeventd automatically attempts to repair the LV using spare devices in the VG. Note that even a transient failure is handled as a permanent failure; a new device is allocated and full synchronization is started. The specific command run by dmeventd to warn or repair is: .br .B lvconvert \-\-repair \-\-use\-policies .IR VG / LV .SS Corrupted Data Data on a device can be corrupted due to hardware errors, without the device ever being disconnected, and without any fault in the software. This should be rare, and can be detected (see \fBScrubbing\fP). .SS Rebuild specific PVs If specific PVs in a RAID LV are known to have corrupt data, the data on those PVs can be reconstructed with: .B lvchange \-\-rebuild PV .IR VG / LV The rebuild option can be repeated with different PVs to replace the data on multiple PVs. .SH Monitoring When a RAID LV is activated the \fBdmeventd\fP(8) process is started to monitor the health of the LV. Various events detected in the kernel can cause a notification to be sent from device-mapper to the monitoring process, including device failures and synchronization completion (e.g. for initialization or scrubbing). The LVM configuration file contains options that affect how the monitoring process will respond to failure events (e.g. raid_fault_policy). It is possible to turn on and off monitoring with lvchange, but it is not recommended to turn this off unless you have a thorough knowledge of the consequences. .SH Configuration Options There are a number of options in the LVM configuration file that affect the behavior of RAID LVs. The tunable options are listed below. A detailed description of each can be found in the LVM configuration file itself. .br mirror_segtype_default .br raid10_segtype_default .br raid_region_size .br raid_fault_policy .br activation_mode .SH RAID1 Tuning A RAID1 LV can be tuned so that certain devices are avoided for reading while all devices are still written to. .B lvchange .BR \-\- [ raid ] writemostly .BR \fIPhysicalVolume [ : { y | n | t }] .IR VG / LV The specified device will be marked as "write mostly", which means that reading from this device will be avoided, and other devices will be preferred for reading (unless no other devices are available.) This minimizes the I/O to the specified device. If the PV name has no suffix, the write mostly attribute is set. If the PV name has the suffix \fB:n\fP, the write mostly attribute is cleared, and the suffix \fB:t\fP toggles the current setting. The write mostly option can be repeated on the command line to change multiple devices at once. To report the current write mostly setting, the lvs attr field will show the letter "w" in the 9th position when write mostly is set: .B lvs -a -o name,attr When a device is marked write mostly, the maximum number of outstanding writes to that device can be configured. Once the maximum is reached, further writes become synchronous. When synchronous, a write to the LV will not complete until writes to all the mirror images are complete. .B lvchange .BR \-\- [ raid ] writebehind .IR IOCount .IR VG / LV To report the current write behind setting, run: .B lvs -o name,raid_write_behind When write behind is not configured, or set to 0, all LV writes are synchronous. .SH RAID Takeover RAID takeover is converting a RAID LV from one RAID level to another, e.g. raid5 to raid6. Changing the RAID level is usually done to increase or decrease resilience to device failures. This is done using lvconvert and specifying the new RAID level as the LV type: .B lvconvert --type .I RaidLevel \fIVG\fP/\fILV\fP [\fIPVs\fP] The most common and recommended RAID takeover conversions are: .HP \fBlinear\fP to \fBraid1\fP .br Linear is a single image of LV data, and converting it to raid1 adds a mirror image which is a direct copy of the original linear image. .HP \fBstriped\fP/\fBraid0\fP to \fBraid4/5/6\fP .br Adding parity devices to a striped volume results in raid4/5/6. .P Unnatural conversions that are not recommended include converting between striped and non-striped types. This is because file systems often optimize I/O patterns based on device striping values. If those values change, it can decrease performance. Converting to a higher RAID level requires allocating new SubLVs to hold RAID metadata, and new SubLVs to hold parity blocks for LV data. Converting to a lower RAID level removes the SubLVs that are no longer needed. Conversion often requires full synchronization of the RAID LV (see \fBSynchronization\fP). Converting to RAID1 requires copying all LV data blocks to a new image on a new device. Converting to a parity RAID level requires reading all LV data blocks, calculating parity, and writing the new parity blocks. Synchronization can take a long time and degrade performance (rate controls also apply to conversion, see \fB\-\-maxrecoveryrate\fP.) .P The following takeover conversions are currently possible: .br .IP \(bu 3 between linear and raid1. .IP \(bu 3 between striped and raid4. .SS Examples 1. Converting an LV from \fBlinear\fP to \fBraid1\fP. .nf # lvs -a -o name,segtype,size vg LV Type LSize lv linear 300.00g # lvconvert --type raid1 --mirrors 1 vg/lv # lvs -a -o name,segtype,size vg LV Type LSize lv raid1 300.00g [lv_rimage_0] linear 300.00g [lv_rimage_1] linear 300.00g [lv_rmeta_0] linear 3.00m [lv_rmeta_1] linear 3.00m .fi 2. Converting an LV from \fBmirror\fP to \fBraid1\fP. .nf # lvs -a -o name,segtype,size vg LV Type LSize lv mirror 100.00g [lv_mimage_0] linear 100.00g [lv_mimage_1] linear 100.00g [lv_mlog] linear 3.00m # lvconvert --type raid1 vg/lv # lvs -a -o name,segtype,size vg LV Type LSize lv raid1 100.00g [lv_rimage_0] linear 100.00g [lv_rimage_1] linear 100.00g [lv_rmeta_0] linear 3.00m [lv_rmeta_1] linear 3.00m .fi 3. Converting an LV from \fBlinear\fP to \fBraid1\fP (with 3 images). .nf Start with a linear LV: # lvcreate -L1G -n my_lv vg Convert the linear LV to raid1 with three images (original linear image plus 2 mirror images): # lvconvert --type raid1 --mirrors 2 vg/my_lv .fi .ig 4. Converting an LV from \fBstriped\fP (with 4 stripes) to \fBraid6_nc\fP. .nf Start with a striped LV: # lvcreate --stripes 4 -L64M -n my_lv vg Convert the striped LV to raid6_nc: # lvconvert --type raid6_nc vg/my_lv # lvs -a -o lv_name,segtype,sync_percent,data_copies LV Type Cpy%Sync #Cpy my_lv raid6_n_6 100.00 3 [my_lv_rimage_0] linear [my_lv_rimage_1] linear [my_lv_rimage_2] linear [my_lv_rimage_3] linear [my_lv_rimage_4] linear [my_lv_rimage_5] linear [my_lv_rmeta_0] linear [my_lv_rmeta_1] linear [my_lv_rmeta_2] linear [my_lv_rmeta_3] linear [my_lv_rmeta_4] linear [my_lv_rmeta_5] linear .fi This convert begins by allocating MetaLVs (rmeta_#) for each of the existing stripe devices. It then creates 2 additional MetaLV/DataLV pairs (rmeta_#/rimage_#) for dedicated raid6 parity. If rotating data/parity is required, such as with raid6_nr, it must be done by reshaping (see below). .. .SH RAID Reshaping RAID reshaping is changing attributes of a RAID LV while keeping the same RAID level, i.e. changes that do not involve changing the number of devices. This includes changing RAID layout, stripe size, or number of stripes. When changing the RAID layout or stripe size, no new SubLVs (MetaLVs or DataLVs) need to be allocated, but DataLVs are extended by a small amount (typically 1 extent). The extra space allows blocks in a stripe to be updated safely, and not corrupted in case of a crash. If a crash occurs, reshaping can just be restarted. (If blocks in a stripe were updated in place, a crash could leave them partially updated and corrupted. Instead, an existing stripe is quiesced, read, changed in layout, and the new stripe written to free space. Once that is done, the new stripe is unquiesced and used.) (The reshaping features are planned for a future release.) .ig .SS Examples 1. Converting raid6_n_6 to raid6_nr with rotating data/parity. This conversion naturally follows a previous conversion from striped to raid6_n_6 (shown above). It completes the transition to a more traditional RAID6. .nf # lvs -o lv_name,segtype,sync_percent,data_copies LV Type Cpy%Sync #Cpy my_lv raid6_n_6 100.00 3 [my_lv_rimage_0] linear [my_lv_rimage_1] linear [my_lv_rimage_2] linear [my_lv_rimage_3] linear [my_lv_rimage_4] linear [my_lv_rimage_5] linear [my_lv_rmeta_0] linear [my_lv_rmeta_1] linear [my_lv_rmeta_2] linear [my_lv_rmeta_3] linear [my_lv_rmeta_4] linear [my_lv_rmeta_5] linear # lvconvert --type raid6_nr vg/my_lv # lvs -a -o lv_name,segtype,sync_percent,data_copies LV Type Cpy%Sync #Cpy my_lv raid6_nr 100.00 3 [my_lv_rimage_0] linear [my_lv_rimage_0] linear [my_lv_rimage_1] linear [my_lv_rimage_1] linear [my_lv_rimage_2] linear [my_lv_rimage_2] linear [my_lv_rimage_3] linear [my_lv_rimage_3] linear [my_lv_rimage_4] linear [my_lv_rimage_5] linear [my_lv_rmeta_0] linear [my_lv_rmeta_1] linear [my_lv_rmeta_2] linear [my_lv_rmeta_3] linear [my_lv_rmeta_4] linear [my_lv_rmeta_5] linear .fi The DataLVs are larger (additional segment in each) which provides space for out-of-place reshaping. The result is: FIXME: did the lv name change from my_lv to r? .br FIXME: should we change device names in the example to sda,sdb,sdc? .br FIXME: include -o devices or seg_pe_ranges above also? .nf # lvs -a -o lv_name,segtype,seg_pe_ranges,dataoffset LV Type PE Ranges data r raid6_nr r_rimage_0:0-32 \\ r_rimage_1:0-32 \\ r_rimage_2:0-32 \\ r_rimage_3:0-32 [r_rimage_0] linear /dev/sda:0-31 2048 [r_rimage_0] linear /dev/sda:33-33 [r_rimage_1] linear /dev/sdaa:0-31 2048 [r_rimage_1] linear /dev/sdaa:33-33 [r_rimage_2] linear /dev/sdab:1-33 2048 [r_rimage_3] linear /dev/sdac:1-33 2048 [r_rmeta_0] linear /dev/sda:32-32 [r_rmeta_1] linear /dev/sdaa:32-32 [r_rmeta_2] linear /dev/sdab:0-0 [r_rmeta_3] linear /dev/sdac:0-0 .fi All segments with PE ranges '33-33' provide the out-of-place reshape space. The dataoffset column shows that the data was moved from initial offset 0 to 2048 sectors on each component DataLV. .. .SH RAID5 Variants raid5_ls .br \[bu] RAID5 left symmetric .br \[bu] Rotating parity N with data restart raid5_la .br \[bu] RAID5 left symmetric .br \[bu] Rotating parity N with data continuation raid5_rs .br \[bu] RAID5 right symmetric .br \[bu] Rotating parity 0 with data restart raid5_ra .br \[bu] RAID5 right asymmetric .br \[bu] Rotating parity 0 with data continuation .ig raid5_n .br \[bu] RAID5 striping .br \[bu] Same layout as raid4 with a dedicated parity N with striped data. .br \[bu] Used for .B RAID Takeover .. .SH RAID6 Variants raid6 .br \[bu] RAID6 zero restart (aka left symmetric) .br \[bu] Rotating parity 0 with data restart .br \[bu] Same as raid6_zr raid6_zr .br \[bu] RAID6 zero restart (aka left symmetric) .br \[bu] Rotating parity 0 with data restart raid6_nr .br \[bu] RAID6 N restart (aka right symmetric) .br \[bu] Rotating parity N with data restart raid6_nc .br \[bu] RAID6 N continue .br \[bu] Rotating parity N with data continuation .ig raid6_n_6 .br \[bu] RAID6 N continue .br \[bu] Fixed P-Syndrome N-1 and Q-Syndrome N with striped data .br \[bu] Used for .B RAID Takeover raid6_ls_6 .br \[bu] RAID6 N continue .br \[bu] Same as raid5_ls for N-1 disks with fixed Q-Syndrome N .br \[bu] Used for .B RAID Takeover raid6_la_6 .br \[bu] RAID6 N continue .br \[bu] Same as raid5_la for N-1 disks with fixed Q-Syndrome N .br \[bu] Used for .B RAID Takeover raid6_rs_6 .br \[bu] RAID6 N continue .br \[bu] Same as raid5_rs for N-1 disks with fixed Q-Syndrome N .br \[bu] Used for .B RAID Takeover raid6_ra_6 .br \[bu] RAID6 N continue .br \[bu] Same as raid5_ra for N-1 disks with fixed Q-Syndrome N .br \[bu] Used for .B RAID Takeover .. .ig .SH RAID Duplication RAID LV conversion (takeover or reshaping) can be done out\-of\-place by copying the LV data onto new devices while changing the RAID properties. Copying avoids modifying the original LV but requires additional devices. Once the LV data has been copied/converted onto the new devices, there are multiple options: 1. The RAID LV can be switched over to run from just the new devices, and the original copy of the data removed. The converted LV then has the new RAID properties, and exists on new devices. The old devices holding the original data can be removed or reused. 2. The new copy of the data can be dropped, leaving the original RAID LV unchanged and using its original devices. 3. The new copy of the data can be separated and used as a new independent LV, leaving the original RAID LV unchanged on its original devices. The command to start duplication is: .B lvconvert \-\-type .I RaidLevel [\fB\-\-stripes\fP \fINumber\fP \fB\-\-stripesize\fP \fISize\fP] .RS .B \-\-duplicate .IR VG / LV [\fIPVs\fP] .RE .HP .B \-\-duplicate .br Specifies that the LV conversion should be done out\-of\-place, copying LV data to new devices while converting. .HP .BR \-\-type , \-\-stripes , \-\-stripesize .br Specifies the RAID properties to use when creating the copy. .P \fIPVs\fP specifies the new devices to use. The steps in the duplication process: .IP \(bu 3 LVM creates a new LV on new devices using the specified RAID properties (type, stripes, etc) and optionally specified devices. .IP \(bu 3 LVM changes the visible RAID LV to type raid1, making the original LV the first raid1 image (SubLV 0), and the new LV the second raid1 image (SubLV 1). .IP \(bu 3 The RAID1 synchronization process copies data from the original LV image (SubLV 0) to the new LV image (SubLV 1). .IP \(bu 3 When synchronization is complete, the original and new LVs are mirror images of each other and can be separated. .P The duplication process retains both the original and new LVs (both SubLVs) until an explicit unduplicate command is run to separate them. The unduplicate command specifies if the original LV should use the old devices (SubLV 0) or the new devices (SubLV 1). To make the RAID LV use the data on the old devices, and drop the copy on the new devices, specify the name of SubLV 0 (suffix _dup_0): .B lvconvert \-\-unduplicate .BI \-\-name .IB LV _dup_0 .IR VG / LV To make the RAID LV use the data copy on the new devices, and drop the old devices, specify the name of SubLV 1 (suffix _dup_1): .B lvconvert \-\-unduplicate .BI \-\-name .IB LV _dup_1 .IR VG / LV FIXME: To make the LV use the data on the original devices, but keep the data copy as a new LV, ... FIXME: include how splitmirrors can be used. .SH RAID1E TODO .. .SH History The 2.6.38-rc1 version of the Linux kernel introduced a device-mapper target to interface with the software RAID (MD) personalities. This provided device-mapper with RAID 4/5/6 capabilities and a larger development community. Later, support for RAID1, RAID10, and RAID1E (RAID 10 variants) were added. Support for these new kernel RAID targets was added to LVM version 2.02.87. The capabilities of the LVM \fBraid1\fP type have surpassed the old \fBmirror\fP type. raid1 is now recommended instead of mirror. raid1 became the default for mirroring in LVM version 2.02.100.