Fscking a disk under Solaris Volume Manager control

Some people may have noticed and others may not, but the server went down for a while today. I think the root cause was file system corruption. What led me to that conclusion is this error in /var/adm/messages:

ufs: [ID 879645 kern.notice] NOTICE: /: unexpected free inode 48714, run fsck(1M) -o f
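If you want to see whether the same kind of notice was logged for other file systems, you can filter the log for it. This is only a rough sketch: the function name ufs_fsck_notices is made up for illustration, the match strings are simply lifted from the message above, and I am assuming a POSIX-style shell.

```shell
# Hypothetical helper (not a Solaris tool): print UFS kernel notices
# that ask for an fsck run. The match strings come from the log line above.
ufs_fsck_notices() {
    # $1 = log file to search, for example /var/adm/messages
    grep 'ufs:' "$1" | grep 'run fsck'
}

# Example (not run here):
#   ufs_fsck_notices /var/adm/messages
```

Each matching line names the mount point right after "NOTICE:", which tells you which file systems to check later.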

Well, this is a little hard to fix, especially if you have root under Solaris Volume Manager control. So how do you do it? It is fairly easy, but it assumes you have either a Solaris boot CD or a JumpStart server you can boot from to get the box into single-user mode.

Once you have the box in single-user mode, booted from either the CD or the JumpStart server, you need to mount one side of the root mirror read-only, say /dev/dsk/c1t0d0s0:

mount -o ro /dev/dsk/c1t0d0s0 /mnt

Once that is mounted, you need to copy some files from it to the “temp” root that you are booted from. (If it won’t mount even read-only, the file system is badly damaged and you will have to fsck that side of the mirror first.) But first we need to unload the md driver:

# modinfo | grep md
 25 fffffffffbb04b88  30608  85   1  md (Solaris Volume Manager base mod)
# modunload -i 25
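A plain grep for "md" can also match other lines in the modinfo output, so you may prefer to pick out the ID by the module name column. A small sketch, assuming the column layout shown above (ID first, module name sixth); md_module_id is a name I made up, not a Solaris command.

```shell
# Hypothetical helper: read `modinfo` output on stdin and print the ID
# of the module whose name column (6th field) is exactly "md".
md_module_id() {
    awk '$6 == "md" { print $1 }'
}

# Example (not run here):
#   id=`modinfo | md_module_id`
#   [ -n "$id" ] && modunload -i "$id"
```

For the sample line above this prints 25, which is the number handed to modunload.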

Now that the md driver is unloaded, you need to copy the following files:

cp /mnt/etc/lvm/mddb.cf  /etc/lvm/mddb.cf 
cp /mnt/etc/lvm/md.cf  /etc/lvm/md.cf
cp /mnt/kernel/drv/md.conf /kernel/drv/md.conf

Now unmount /mnt:

umount /mnt

Now we need to reload the md driver:

modload /kernel/drv/md

Now if you run metastat or metadb, you will get a generic error saying that no devices or databases are set up. To fix this, run:

metainit -r

Here is what the metainit man page says about that option:

    -r              Only used in a shell script  at  boot  time.
                     Sets up all metadevices that were configured
                     before the system crashed or was shut  down.
                     The  information about previously configured
                     metadevices  is  stored  in  the  metadevice
                     state database (see metadb(1M)).

You can now run metastat, but all your devices will say they need maintenance. To fix this, run:

metasync -r

This will sync all the mirrors back up. Now we are finally able to run fsck against a mirrored slice, which we could not do while the machine was up in full multi-user mode.
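If you would rather not eyeball the metastat output, you can count how many state lines still mention maintenance. This is a sketch under an assumption: that metastat words the state as "Needs maintenance". The function name needs_maintenance_count is made up for illustration.

```shell
# Hypothetical helper: read `metastat` output on stdin and print how many
# lines contain "Needs maintenance" (assumed wording of the state line).
needs_maintenance_count() {
    grep -c 'Needs maintenance'
}

# Example (not run here):
#   metastat | needs_maintenance_count
```

A count of 0 only means no line matched that wording, so it is still worth reading the full metastat output at least once.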

So now I ran:

fsck -o f /dev/md/rdsk/d30

And I kept running fsck on the device until it came back clean with no errors. Then lather, rinse, repeat for the other slices.
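The "run it until it comes back clean" part can be sketched as a small loop. This is not what I actually typed. It is a hedged sketch that assumes a pass which changed something prints "FILE SYSTEM WAS MODIFIED" and that a clean pass exits with status 0. The function name check_until_clean and the pass limit are made up for illustration.

```shell
# Hypothetical helper: rerun a check command until its output no longer
# reports a modification, or until a pass limit is reached.
# $1 = maximum number of passes; remaining arguments = command to run.
# Assumption: a pass that fixed something prints "FILE SYSTEM WAS MODIFIED".
check_until_clean() {
    max=$1
    shift
    pass=1
    while [ "$pass" -le "$max" ]; do
        out=`"$@" 2>&1`
        rc=$?
        printf '%s\n' "$out"
        case "$out" in
            *"FILE SYSTEM WAS MODIFIED"*)
                # Something was repaired, so go around again.
                pass=`expr "$pass" + 1`
                ;;
            *)
                # Nothing was modified: pass the command's own status back.
                return "$rc"
                ;;
        esac
    done
    # Still being modified after the last allowed pass.
    return 1
}

# Example (not run here; -y answers yes to every prompt, which is a
# different choice from the interactive run described above):
#   check_until_clean 5 fsck -y -o f /dev/md/rdsk/d30
```

Because the output is captured before it is printed, interactive prompts would not be visible, which is why the example uses -y. If you want to answer each question yourself, stick with running fsck by hand.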

Once all the slices are done, make sure to update your boot archive (if on an x86 machine), and then you can restart the machine:

mount /dev/md/dsk/d30 /mnt
bootadm update-archive -R /mnt
umount /mnt
shutdown -g0 -i6 -y

If on a SPARC box, just make sure all the file systems you mounted from the disks are unmounted, and then restart the machine.