RAID-0 (stripe) on Solaris 10 using Solaris Volume Manager

By bilke on Mar 23, 2007


The motivation for this how-to is partly the implementation of an excellent idea by Nemanja Lukic: it is
several times faster to delete a whole zone by running newfs than to delete all of the zone's files with rm,
so each zone on our testing machine should live on a separate file system. And it's not just about
deleting zones. Speed is a significant factor too, as is the use of other file system tools like ufsdump,
ufsrestore, fssnap etc., which is possible only if your zones are on separate file systems.

So we have 4 zones and 3 hard drives (we actually have 4 hard drives, but we can use only 3 for this
purpose, since the first drive is the system drive). We could of course create 2 slices on one drive and
one slice on each of the remaining 2 drives, but that would be so uncool :). The cool stuff is to use
Solaris Volume Manager to create a RAID-0 (stripe) metadevice out of the 3 hard drives, and then create
4 so-called soft partitions within that metadevice. This way every soft partition has exactly the same
size, and all 3 hard drives are used completely.

The platform is a brand new x4100 with 4 identical hard drives, 72 GB each; the operating system is
Solaris 10 u3. Oh yes, if you are asking why I'm not using ZFS, it's because the software that is meant
to be tested in the zones is not supported (yet) on ZFS :(.

The first step is to prepare the hard drives for RAID 0. We will create one big slice that spans the
whole drive. Run format and select the first drive:

root@jsc-x4100-17:~# format
Searching for disks...done
AVAILABLE DISK SELECTIONS:
0. c2t0d0
/pci@7b,0/pci1022,7458@11/pci1000,3060@2/sd@0,0
1. c2t1d0
/pci@7b,0/pci1022,7458@11/pci1000,3060@2/sd@1,0
2. c2t2d0
/pci@7b,0/pci1022,7458@11/pci1000,3060@2/sd@2,0
3. c2t3d0
/pci@7b,0/pci1022,7458@11/pci1000,3060@2/sd@3,0
Specify disk (enter its number): 1
selecting c2t1d0
[disk formatted]
FORMAT MENU:
disk       - select a disk
type       - select (define) a disk type
partition  - select (define) a partition table
current    - describe the current disk
format     - format and analyze the disk
fdisk      - run the fdisk program
repair     - repair a defective sector
label      - write label to the disk
analyze    - surface analysis
defect     - defect list management
backup     - search for backup labels
verify     - read and display labels
save       - save new disk/partition definitions
inquiry    - show vendor, product and revision
volname    - set 8-character volume name
!<cmd>     - execute <cmd>, then return
quit
format> p
PARTITION MENU:
0      - change `0' partition
1      - change `1' partition
2      - change `2' partition
3      - change `3' partition
4      - change `4' partition
5      - change `5' partition
6      - change `6' partition
7      - change `7' partition
select - select a predefined table
modify - modify a predefined partition table
name   - name the current table
print  - display the current table
label  - write partition map and label to the disk
!<cmd> - execute <cmd>, then return
quit
partition> 0
Part      Tag    Flag     Cylinders        Size            Blocks
  0       home    wm       1 - 8920       68.33GB    (8920/0/0) 143299800
Enter partition id tag[home]:home
Enter partition permission flags[wm]:wm
Enter new starting cyl[1]: 1
Enter partition size[143299800b, 8920c, 8920e, 69970.61mb, 68.33gb]: $
partition> label
Ready to label disk, continue? y
partition> q
FORMAT MENU:
disk       - select a disk
type       - select (define) a disk type
partition  - select (define) a partition table
current    - describe the current disk
format     - format and analyze the disk
fdisk      - run the fdisk program
repair     - repair a defective sector
label      - write label to the disk
analyze    - surface analysis
defect     - defect list management
backup     - search for backup labels
verify     - read and display labels
save       - save new disk/partition definitions
inquiry    - show vendor, product and revision
volname    - set 8-character volume name
!<cmd>     - execute <cmd>, then return
quit
format> q

At this point disk 1 is partitioned with slice 0 spanning from cylinder 1 to the end of the drive ($).
Instead of repeating the same steps for disk 2 and disk 3, we will use Solaris prtvtoc to print disk 1's
partition table and fmthard to apply that table to disks 2 and 3 (all disks are identical).

root@jsc-x4100-17:~# prtvtoc /dev/rdsk/c2t1d0s2 > /var/tmp/prtvtoc.c2t1d0s2
root@jsc-x4100-17:~# fmthard -s /var/tmp/prtvtoc.c2t1d0s2 /dev/rdsk/c2t2d0s2
fmthard: New volume table of contents now in place.
root@jsc-x4100-17:~# fmthard -s /var/tmp/prtvtoc.c2t1d0s2 /dev/rdsk/c2t3d0s2
fmthard: New volume table of contents now in place.
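
With more than two target disks, the prtvtoc/fmthard pair is easy to generate in a loop. Here is a minimal dry-run sketch, assuming bash or ksh; gen_vtoc_copy_cmds is a hypothetical helper name (not a Solaris tool) and it only prints the commands, it does not run them:

```shell
# Dry-run sketch: print the commands that copy one disk's VTOC to other disks.
# Nothing is executed here; review the output before feeding it to a shell.
# gen_vtoc_copy_cmds is a hypothetical helper, not part of Solaris.
gen_vtoc_copy_cmds() {
    src=$1; shift
    vtoc=/var/tmp/prtvtoc.${src}s2          # same temp file name as used above
    echo "prtvtoc /dev/rdsk/${src}s2 > ${vtoc}"
    for dst in "$@"; do
        echo "fmthard -s ${vtoc} /dev/rdsk/${dst}s2"
    done
}

# Source disk first, then every disk that should receive the same table.
gen_vtoc_copy_cmds c2t1d0 c2t2d0 c2t3d0
```

Keep in mind that this only makes sense when the disks are identical, as they are on this machine.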

The next step is to create replicas of the metadevice state database. The metadevice state database
contains the configuration and state of all metadevices and hot spare pools on the system. Since this
information is important, we will create 3 replicas of this database, one on each drive. A replica can be
created on any slice of a hard drive, including a slice that will later become part of a metadevice. It is
also possible to create more than one replica per slice. If one or more replicas fail, the volume manager
compares the remaining replicas and uses a majority consensus algorithm to decide which of them are
valid. The command to create metadevice state database replicas is metadb.

root@jsc-x4100-17:~# metadb -a -f c2t1d0s0 c2t2d0s0 c2t3d0s0

-a adds database replicas, and -f forces the operation (we have to force it because no metadevice state
database replicas exist yet). Use metadb -i to check the state of the replicas. In our case we can see
that the replicas are active (a flag) and up to date (u flag):

root@jsc-x4100-17:~# metadb -i
        flags           first blk       block count
     a        u         16              8192            /dev/dsk/c2t1d0s0
     a        u         16              8192            /dev/dsk/c2t2d0s0
     a        u         16              8192            /dev/dsk/c2t3d0s0
r - replica does not have device relocation information
o - replica active prior to last mddb configuration change
u - replica is up to date
l - locator for this replica was read successfully
c - replica's location was in /etc/lvm/mddb.cf
p - replica's location was patched in kernel
m - replica is master, this is replica selected as input
W - replica has device write errors
a - replica is active, commits are occurring to this replica
M - replica had problem with master blocks
D - replica had problem with data blocks
F - replica had format problems
S - replica is too small to hold current data base
R - replica had device read errors
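
The legend can be turned into a quick health check. This is a rough sketch, assuming bash or ksh; replica_healthy is a hypothetical helper, and the rule it applies (active, up to date, and none of the capital-letter flags W, M, D, F, S, R) is my reading of the legend, not an official definition:

```shell
# Sketch: decide from one replica's flag string (as listed by metadb -i)
# whether the replica looks healthy. replica_healthy is a hypothetical helper.
# Assumption: healthy = 'a' and 'u' present, no capital error flag present.
replica_healthy() {
    case $1 in
        *[WMDFSR]*) return 1 ;;   # some error flag from the legend is set
    esac
    case $1 in
        *a*) ;;                   # replica is active
        *) return 1 ;;
    esac
    case $1 in
        *u*) return 0 ;;          # replica is up to date
        *) return 1 ;;
    esac
}

# The flags of our three replicas are "a" and "u".
replica_healthy "a u" && echo "replica looks healthy"
```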

metadb's -c switch determines how many replicas are created on each slice. Had we issued -c 3 on three
slices, we would have ended up with 9 metadevice state database replicas:

root@jsc-x4100-17:~# metadb -a -f -c 3 c2t1d0s0 c2t2d0s0 c2t3d0s0


root@jsc-x4100-17:~# metadb -i
        flags           first blk       block count
     a        u         16              8192            /dev/dsk/c2t1d0s0
     a        u         8208            8192            /dev/dsk/c2t1d0s0
     a        u         16400           8192            /dev/dsk/c2t1d0s0
     a        u         16              8192            /dev/dsk/c2t2d0s0
     a        u         8208            8192            /dev/dsk/c2t2d0s0
     a        u         16400           8192            /dev/dsk/c2t2d0s0
     a        u         16              8192            /dev/dsk/c2t3d0s0
     a        u         8208            8192            /dev/dsk/c2t3d0s0
     a        u         16400           8192            /dev/dsk/c2t3d0s0
r - replica does not have device relocation information
o - replica active prior to last mddb configuration change
u - replica is up to date
l - locator for this replica was read successfully
c - replica's location was in /etc/lvm/mddb.cf
p - replica's location was patched in kernel
m - replica is master, this is replica selected as input
W - replica has device write errors
a - replica is active, commits are occurring to this replica
M - replica had problem with master blocks
D - replica had problem with data blocks
F - replica had format problems
S - replica is too small to hold current data base
R - replica had device read errors
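
The numbers in this listing follow a simple pattern, which the sketch below reproduces (bash or ksh assumed, for the $(( )) arithmetic). All three helper names are hypothetical. The 16-block offset and the 8192-block replica size are simply read off the listing above, and "majority" is taken to mean more than half of the replicas, which is an assumption about the consensus rule rather than something the output shows:

```shell
# Sketch of the replica arithmetic. Helper names are hypothetical.
replica_total()   { echo $(( $1 * $2 )); }      # replicas per slice * slices
majority_needed() { echo $(( $1 / 2 + 1 )); }   # assumption: strict majority

# Start blocks of n replicas on one slice: first at block 16, each one
# 8192 blocks long (values taken from the metadb -i output above).
replica_first_blocks() {
    n=$1; i=0; out=""
    while [ "$i" -lt "$n" ]; do
        out="$out $(( 16 + i * 8192 ))"
        i=$(( i + 1 ))
    done
    echo $out    # unquoted on purpose, to drop the leading space
}

replica_total 3 3          # the -c 3 case on three slices
majority_needed 9
replica_first_blocks 3
```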

Once we have the metadevice state database replicas, we can proceed with creating the metadevice. The
command is metainit <metadevice name> <number of stripes> <width> <logical name of slice1> <slice2> ....
The number of stripes parameter determines how many stripes the metadevice is built from, and width
specifies the number of slices that make up each stripe. With 1 stripe whose width equals the number of
slices we get a simple stripe; with as many stripes as there are slices, each of width 1 (the width is
then repeated in front of every slice), we get a concatenation. In our case the number of stripes is 1
and the width is 3:

root@jsc-x4100-17:~# metainit d0 1 3 c2t1d0s0 c2t2d0s0 c2t3d0s0
d0: Concat/Stripe is setup
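
To make the difference between the two layouts concrete, here is a small dry-run sketch (bash or ksh assumed) that only prints the metainit command lines. gen_stripe_cmd and gen_concat_cmd are hypothetical helper names; the concatenation form, with a width of 1 in front of every slice, is how I understand the metainit syntax, so check it against the man page before relying on it:

```shell
# Dry-run sketch: build metainit command lines, do not execute them.
# Both helper names are hypothetical.

# One stripe, as wide as the number of slices given.
gen_stripe_cmd() {
    md=$1; shift
    echo "metainit $md 1 $# $*"
}

# Concatenation: one stripe of width 1 per slice.
gen_concat_cmd() {
    md=$1; shift
    cmd="metainit $md $#"
    for s in "$@"; do
        cmd="$cmd 1 $s"
    done
    echo "$cmd"
}

gen_stripe_cmd d0 c2t1d0s0 c2t2d0s0 c2t3d0s0   # the command used above
gen_concat_cmd d0 c2t1d0s0 c2t2d0s0 c2t3d0s0   # the alternative layout
```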

To verify the stripe and get some information about it, we use the metastat command:

root@jsc-x4100-17:~# metastat
d0: Concat/Stripe
    Size: 429835140 blocks (204 GB)
    Stripe 0: (interlace: 32 blocks)
        Device     Start Block  Dbase   Reloc
        c2t1d0s0       16065    Yes     Yes
        c2t2d0s0       16065    Yes     Yes
        c2t3d0s0       16065    Yes     Yes

Device Relocation Information:
Device   Reloc  Device ID
c2t1d0   Yes    id1,sd@SSEAGATE_ST973401LSUN72G_3710ZJ07____________3LB0ZJ07
c2t2d0   Yes    id1,sd@SSEAGATE_ST973401LSUN72G_3710ZGLR____________3LB0ZGLR
c2t3d0   Yes    id1,sd@SSEAGATE_ST973401LSUN72G_3710Z1DG____________3LB0Z1DG
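
metastat reports sizes in blocks, so a quick conversion helps when planning the soft partitions. The sketch below (bash or ksh assumed) uses two hypothetical helpers and assumes 512-byte blocks and that "GB" here means 2^30 bytes rounded down, which is at least consistent with the 204 GB shown above:

```shell
# Sketch: convert metastat block counts into friendlier units.
# Assumptions: 1 block = 512 bytes, 1 GB = 2^30 bytes, results rounded down.
# blocks_to_gb and blocks_to_kb are hypothetical helper names.
blocks_to_gb() { echo $(( $1 / 2097152 )); }     # 2097152 blocks = 1 GB
blocks_to_kb() { echo $(( $1 * 512 / 1024 )); }

blocks_to_gb 429835140   # size of d0
blocks_to_kb 32          # interlace of the stripe
```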

And now the final step is to create 4 soft partitions within metadevice d0. To create soft partitions we
use metainit with the -p switch and specify the size of the soft partition as the last parameter (in our
example 204 GB / 4 = 51 GB):

root@jsc-x4100-17:~# metainit d1 -p d0 51g
d1: Soft Partition is setup
root@jsc-x4100-17:~# metainit d2 -p d0 51g
d2: Soft Partition is setup
root@jsc-x4100-17:~# metainit d3 -p d0 51g
d3: Soft Partition is setup
root@jsc-x4100-17:~# metainit d4 -p d0 51g
d4: Soft Partition is setup
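
The same four commands can be generated from the size of the stripe and the number of zones. A dry-run sketch, assuming bash or ksh; gen_softpart_cmds is a hypothetical helper, it numbers the soft partitions d1, d2, ... like we did above, and it rounds the size down to whole gigabytes:

```shell
# Dry-run sketch: print metainit -p commands for equally sized soft partitions.
# gen_softpart_cmds is a hypothetical helper; nothing is executed.
# Arguments: parent metadevice, its size in GB, number of soft partitions.
gen_softpart_cmds() {
    md=$1; total_gb=$2; count=$3
    size=$(( total_gb / count ))      # integer division, rounds down
    i=1
    while [ "$i" -le "$count" ]; do
        echo "metainit d$i -p $md ${size}g"
        i=$(( i + 1 ))
    done
}

gen_softpart_cmds d0 204 4
```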

You can verify this with metastat:

root@jsc-x4100-17:home# metastat
d4: Soft Partition
    Device: d0
    State: Okay
    Size: 106954752 blocks (51 GB)
        Extent      Start Block      Block count
             0        320864384        106954752

d0: Concat/Stripe
    Size: 429835140 blocks (204 GB)
    Stripe 0: (interlace: 32 blocks)
        Device     Start Block  Dbase   State  Reloc  Hot Spare
        c2t1d0s0       16065    Yes     Okay   Yes
        c2t2d0s0       16065    Yes     Okay   Yes
        c2t3d0s0       16065    Yes     Okay   Yes

d3: Soft Partition
    Device: d0
    State: Okay
    Size: 106954752 blocks (51 GB)
        Extent      Start Block      Block count
             0        213909600        106954752

d2: Soft Partition
    Device: d0
    State: Okay
    Size: 106954752 blocks (51 GB)
        Extent      Start Block      Block count
             0        106954816        106954752

d1: Soft Partition
    Device: d0
    State: Okay
    Size: 106954752 blocks (51 GB)
        Extent      Start Block      Block count
             0               32        106954752

Device Relocation Information:
Device   Reloc  Device ID
c2t1d0   Yes    id1,sd@SSEAGATE_ST973401LSUN72G_3710ZJ07____________3LB0ZJ07
c2t2d0   Yes    id1,sd@SSEAGATE_ST973401LSUN72G_3710ZGLR____________3LB0ZGLR
c2t3d0   Yes    id1,sd@SSEAGATE_ST973401LSUN72G_3710Z1DG____________3LB0Z1DG
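
If you want to double-check that the extents do not overlap, the start blocks and block counts from metastat are enough. A small sketch (bash or ksh assumed); extent_gaps is a hypothetical helper that reads "name start count" lines sorted by start block and prints the gap in front of each extent. The gaps are presumably space that Solaris Volume Manager keeps for its own soft partition bookkeeping, but that interpretation is my assumption:

```shell
# Sketch: print the number of blocks between consecutive soft partition extents.
# Input on stdin: "name start_block block_count", sorted by start block.
# extent_gaps is a hypothetical helper; a negative gap would mean an overlap.
extent_gaps() {
    prev_end=""
    while read name start count; do
        if [ -n "$prev_end" ]; then
            echo "$name gap $(( start - prev_end ))"
        fi
        prev_end=$(( start + count ))
    done
}

# Values copied from the metastat output above.
extent_gaps <<EOF
d1 32 106954752
d2 106954816 106954752
d3 213909600 106954752
d4 320864384 106954752
EOF
```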

After this you can use the soft partitions as you would any other partition: create file systems on
them, mount them, run ufsdump, ufsrestore etc. For example:

root@jsc-x4100-17:~# echo y|newfs /dev/md/rdsk/d1
/dev/md/rdsk/d1: 106954752 sectors in 17408 cylinders of 48 tracks, 128 sectors
52224.0MB in 1088 cyl groups (16 c/g, 48.00MB/g, 5824 i/g)
super-block backups (for fsck -F ufs -o b=#) at:
32, 98464, 196896, 295328, 393760, 492192, 590624, 689056, 787488, 885920,
Initializing cylinder groups:
.....................
super-block backups for last 10 cylinder groups at:
105978656, 106077088, 106175520, 106273952, 106372384, 106470816, 106569248,
106667680, 106766112, 106864544

Repeat this step for /dev/md/rdsk/d2, /dev/md/rdsk/d3 and /dev/md/rdsk/d4. When you're finished you can
happily mount the soft partitions at the locations where the zones will be installed.
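
For the remaining soft partitions the newfs and mount steps can be generated in one go. This is a dry-run sketch (bash or ksh assumed): gen_fs_cmds is a hypothetical helper and /zones is only a placeholder for wherever your zones will live. It prints the commands instead of running them, because newfs destroys whatever is on the device:

```shell
# Dry-run sketch: print newfs/mkdir/mount commands for a list of metadevices.
# gen_fs_cmds is a hypothetical helper; the mount base is a placeholder.
gen_fs_cmds() {
    base=$1; shift
    for md in "$@"; do
        echo "echo y | newfs /dev/md/rdsk/$md"       # raw device for newfs
        echo "mkdir -p $base/$md"
        echo "mount /dev/md/dsk/$md $base/$md"       # block device for mount
    done
}

gen_fs_cmds /zones d2 d3 d4
```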

If you want to remove your metadevice and the metadb replicas, use the same steps in reverse order.
First remove the soft partitions from the metadevice:

root@jsc-x4100-17:~# metaclear -p d0
d4: Soft Partition is cleared
d3: Soft Partition is cleared
d2: Soft Partition is cleared
d1: Soft Partition is cleared

Then remove the metadevice:

root@jsc-x4100-17:~# metaclear d0
d0: Concat/Stripe is cleared

And finally the metadevice state database replicas:

root@jsc-x4100-17:~# metadb -f -d c2t1d0s0 c2t2d0s0 c2t3d0s0
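
The whole teardown can be written down as one ordered list. A dry-run sketch, assuming bash or ksh; gen_teardown_cmds is a hypothetical helper that only prints the three commands used above in the right order. It assumes the file systems on the soft partitions are already unmounted:

```shell
# Dry-run sketch: print the teardown commands in reverse order of creation.
# gen_teardown_cmds is a hypothetical helper; nothing is executed.
# Assumption: file systems on the soft partitions are unmounted beforehand.
# Arguments: metadevice name, then the slices that hold the replicas.
gen_teardown_cmds() {
    md=$1; shift
    echo "metaclear -p $md"      # 1. soft partitions on the metadevice
    echo "metaclear $md"         # 2. the stripe itself
    echo "metadb -f -d $*"       # 3. the state database replicas
}

gen_teardown_cmds d0 c2t1d0s0 c2t2d0s0 c2t3d0s0
```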

I hope this was helpful. Feel free to comment, and see you in my next blog post, which will probably be
about either DTrace (basics) or x86 crash dump analysis.
