Some of you may have noticed that the server went down for a while today. I think the root cause was file system corruption. What led me to that conclusion is this error in /var/adm/messages:
ufs: [ID 879645 kern.notice] NOTICE: /: unexpected free inode 48714, run fsck(1M) -o f
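If you want to check whether other file systems have logged the same complaint, a small filter can help. This is only a sketch: `ufs_fsck_hints` is a name I made up, and it assumes the message has exactly the shape shown above. Other UFS notices are worded differently and will not match.

```shell
# Print "mountpoint inode" for every "unexpected free inode" notice on stdin.
# Hypothetical helper; it only understands the message format shown above.
ufs_fsck_hints() {
    grep 'run fsck(1M)' |
        sed -n 's/.*NOTICE: \(.*\): unexpected free inode \([0-9][0-9]*\),.*/\1 \2/p' |
        sort -u
}

# On the affected host you would feed it the log:
#   ufs_fsck_hints < /var/adm/messages
```

For the line above it prints the mount point and inode number, which tells you which slice needs the fsck.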
This is a little hard to fix, especially if you have root under Solaris Volume Manager control. So how do you do it? It is fairly easy, but it assumes you have either a Solaris boot CD or a JumpStart server you can boot from to get the box into single-user mode.
Once you have the box in single-user mode, booted from either the CD or the JumpStart server, you will need to mount one side of the root mirror read-only, for example /dev/dsk/c1t0d0s0:
mount -o ro /dev/dsk/c1t0d0s0 /mnt
If it won’t mount even in read-only mode, the file system is badly damaged and you will have to fsck that side of the mirror first. Once it is mounted, you need to copy some files from it to the “temp” root that you are booted from. But first we need to unload the md driver:
# modinfo | grep md
 25 fffffffffbb04b88 30608 85 1 md (Solaris Volume Manager base mod)
# modunload -i 25
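Since `grep md` can also match other modules with "md" in the name, it may be safer to pick the id by exact module name. A minimal sketch, assuming the module name is the sixth whitespace-separated column as in the output above (`md_module_id` is a hypothetical helper, not a Solaris command):

```shell
# Print the id of the module whose name is exactly "md".
# Reads modinfo-style output on stdin; assumes the column layout shown above.
md_module_id() {
    awk '$6 == "md" { print $1 }'
}

# On the rescue system this would be used roughly like:
#   id=`modinfo | md_module_id`
#   modunload -i "$id"
```

The id changes from boot to boot, so look it up rather than reusing the 25 from my output.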
Now that the md driver is unloaded, you need to copy the following files:
cp /mnt/etc/lvm/mddb.cf /etc/lvm/mddb.cf
cp /mnt/etc/lvm/md.cf /etc/lvm/md.cf
cp /mnt/kernel/drv/md.conf /kernel/drv/md.conf
Now unmount /mnt:
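If you do this often, the copy step can be wrapped so that it stops as soon as a file is missing instead of leaving you with a half-copied config. This is a sketch under assumptions: `copy_svm_config` is a made-up name, and the third file is written as kernel/drv/md.conf, which is the name the SVM driver configuration normally has. Adjust it if your system differs.

```shell
# Copy the SVM config files from a mounted root (SRC) into another root (DST).
# Hypothetical helper; both arguments are directory prefixes, e.g. /mnt and "".
copy_svm_config() {
    src=$1
    dst=$2
    for f in etc/lvm/mddb.cf etc/lvm/md.cf kernel/drv/md.conf; do
        if [ ! -f "$src/$f" ]; then
            echo "missing: $src/$f" >&2
            return 1
        fi
        cp "$src/$f" "$dst/$f" || return 1
    done
}

# On the rescue system, with the mirror side mounted on /mnt:
#   copy_svm_config /mnt ""
```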
umount /mnt
Now we need to load the md driver again:
modload /kernel/drv/md
Now if you run metastat or metadb, you will get a generic error saying that no devices or databases are set up. To fix this, run:
metainit -r
This does the following:
-r Only used in a shell script at boot time. Sets up all metadevices that were configured before the system crashed or was shut down. The information about previously configured metadevices is stored in the metadevice state database (see metadb(1M)).
You can now run metastat, but all your devices will say they need maintenance. To fix this, run:
metasync -r
This will sync all the mirrors back up. Now we are finally able to run fsck against a mirrored slice, which we couldn’t do while the machine was up in full multi-user mode.
So I ran:
fsck -o f /dev/md/rdsk/d30
And I kept running fsck on the device until it came back clean with no errors. Then lather, rinse, repeat for the other slices.
Once all the slices are done, make sure to update your boot archive (if on an x86 machine), and then you can restart the machine:
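The "keep running it until it is clean" part can be sketched as a small loop. This is not something I ran at the time. `repeat_until_clean` is a hypothetical helper, and it assumes that the command exits with status 0 once the file system is clean, so check fsck(1M) on your release before trusting it.

```shell
# Re-run a check command until it exits 0 or a pass limit is reached.
# Usage: repeat_until_clean MAX_PASSES command [args...]
repeat_until_clean() {
    max=$1
    shift
    pass=1
    while [ "$pass" -le "$max" ]; do
        if "$@"; then
            echo "clean after $pass pass(es)"
            return 0
        fi
        pass=`expr "$pass" + 1`
    done
    echo "still reporting errors after $max passes" >&2
    return 1
}

# For the slice above, this would look like:
#   repeat_until_clean 5 fsck -o f /dev/md/rdsk/d30
```

The pass limit is there so a slice that never comes back clean doesn't loop forever. At that point it is time to look at the disk itself.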
mount /dev/md/dsk/d30 /mnt
bootadm update-archive -R /mnt
umount /mnt
shutdown -g0 -i6 -y
If on a SPARC box, just make sure all the file systems you mounted off of the disks are unmounted and then restart the machine.