ZFS + PCA, goodbye UFS

ZFS has been around for a while now.. I have used it for some data partitions, but when Sun added the ability to use it as the root filesystem, I was a little hesitant to start using it there. Part of it was because, I know if I get a root disk that crashes and it is on UFS, I can get in to it pretty well. ZFS was different and I was never really comfortable about using it for root, until last night. I have been looking for a way to keep a lot of Solaris machines up to date with the Recommended and Security patches and doing it with UFS seemed to be taking for ever. Part of the problem I had with keeping them updated with UFS was the shear downtime it required to install the cluster in single user mode. Multiply that by X number of machines and it is a never ending chore to update them.

This weekend I started looking at the PCA tool, since I have seen a lot of people mention good things about it. So off to my test machine and I installed a new VM with Solaris 10 10/09 ( update 8 ) in it. After the install was finished using a ZFS root, I decided to set up a PCA proxy server on another machine. The purpose of the PCA Proxy server is that it will be the one with access to the Internet to download the patches from sunsolve. It was extremely easy to do this, (in fact I have it running in a zone on my main server.)

  1. Created a new plain zone (can be on anything, but I wanted to keep it seperate).
  2. Configure the apache2 instance on the machine, by copying the /etc/apache2/httpd.conf-example to /etc/apache2/httpd.conf
  3. Edit the httpd.conf and change the line that says “Timeout 300” to be “Timeout 1800”. You need to make it at least 1800, if not more depending on the speed of your Internet connection. At 22Mb/s 1800 was ok for me.
  4. Create a directory /var/apache2/htdocs/patches, make it owned by webservd:webservd and 755 as the permissions.
  5. Download and save a copy of pca in /var/apache2/cgi-bin and call it pca-proxy.cgi. Make it owned by webservd:webservd and 755 as the permissions.
  6. Create a file in /etc called pca-proxy.conf. In it place the following:
    xrefdir=/var/apache2/htdocs/patches
    patchdir=/var/apache2/htdocs/patches
    user=sunsolveusername
    passwd=sunsolvepassword
    
  7. In order to make the proxy run a little faster on the first use, I decided to download and “cache” the latest security and recommended patch cluster. (You don’t need to do this, but if the patches are missing the pca proxy server will download them. Considering my machine needed 156 patches, this was faster…) Once the recommended and security patches were downloaded, I placed them in a temp place and unzipped the cluster. Once the cluster is unzipped, I needed to make zip files of each patch (so that the pca client can download the zip file). To do this, I went in to tmp/10_x86_Recommended/patches and ran the following:
    for i in `cat patch_order`
    do
    zip -r $i $i
    done
    
  8. Once the zipping is done, move all the patch zip files in to the /var/apache2/htdocs/patches directory.
  9. Start up the apache2 service “svcadm enable apache2”
  10. Now it is time to configure the client, copy the pca script to the client machine and place it some place, I used /root.
  11. Next create a config file /etc/pca.conf in it with the following:
    patchurl=http://pca-host/cgi-bin/pca-proxy.cgi
    xrefurl=http://pca-host/cgi-bin/pca-proxy.cgi
    syslog=local7
    safe=1
    

    The first two lines tells pca where to find the patches and the patchdiag.xref file. The syslog line tells it to log all activity to local7 syslog facaility. The last line “safe=1” means: Safe patch installation. Checks all files for local modifications before installing a patch. A patch will not be installed if files with local modifications would be overwritten.

  12. Now that the config file is created, make sure that syslog is set to handle local7 info, I have mine set to local7.info going to /var/adm/local7.log. PCA will log the patch installation stuff to that log (i.e.:
    Apr 11 17:10:50 zfstest2 pca: [ID 702911 local7.notice] Installed patch 124631-36 (SunOS 5.10_x86: System Administration Applications, Network, and C)
    Apr 11 19:07:04 zfstest2 pca: [ID 702911 local7.notice] Failed to install patch 118246-21 (comm_dssetup 6.4-5.05_x86: core patch) rc=15

Now comes the part that makes ZFS worth using… We are going to create a new “boot environment” and then patch that environment”

  1. First we need to create a new BE;
    lucreate -n p20100411

    The p20100411 can be anything, I used today’s date since I patched the machine today.. Makes it easy to remember when the last time the machine was patched.

  2. Now we need to mount it
    lumount p20100411 /.alt.root 
  3. Now we can start patching;
     pca -i -R /.alt.root
  4. Because I cached most of the patches locally on my pca proxy, it should not take too long for it to download, unzip and install the patches in the alt root
  5. Once the patching is done, it will give you a summary line telling you how many patches were downloaded and installed:
    Download Summary: 156 total, 156 successful, 0 skipped, 0 failed
    Install Summary : 156 total, 156 successful, 0 skipped, 0 failed
    
  6. Now we need to unmount the alt root and activate it to boot:
    luumount p20100411
    luactivate p20100411
    
  7. Now just reboot the machine. You MUST use init or shutdown, if you don’t then it won’t boot in to the new boot environment. I use
    shutdown -g0 -i6 -y
  8. Depending on how long it takes for your machine to boot, when it comes back up it should be on the new ZFS file system:
    bash-3.00# df -h
    Filesystem             size   used  avail capacity  Mounted on
    rpool/ROOT/p20100411    49G   6.6G    38G    15%    /
    
  9. Now you can run that new patched system for how ever long it takes to verify your patches didn’t break anything. Once you are sure everything is ok, then you can delete the old install, in my case:
    ludelete s10x_u8wos_08a
    

    This should let you recover a little bit of space. In my case it was about 1.5 gig.

The only thing left is to set up a bunch of scripts to do “pca -l” about once a month to see what patches need installed and to log that. PCA has a lot of other functions than I went over here, in a couple of words, it seems to be kick ass. On top of that it is free! The ability to create new BE’s will definitely hope any one with the right amount of disk space be able to keep their system up to date.

One Tip, make sure you watch the output of the luactivate command. This is what is displayed:

**********************************************************************

The target boot environment has been activated. It will be used when you
reboot. NOTE: You MUST NOT USE the reboot, halt, or uadmin commands. You
MUST USE either the init or the shutdown command when you reboot. If you
do not use either init or shutdown, the system will not boot using the
target BE.

**********************************************************************

In case of a failure while booting to the target BE, the following process
needs to be followed to fallback to the currently working boot environment:

1. Boot from Solaris failsafe or boot in single user mode from the Solaris
Install CD or Network.

2. Mount the Parent boot environment root slice to some directory (like
/mnt). You can use the following command to mount:

     mount -Fzfs /dev/dsk/c1t0d0s0 /mnt

3. Run  utility with out any arguments from the Parent boot
environment root slice, as shown below:

     /mnt/sbin/luactivate

4. luactivate, activates the previous working boot environment and
indicates the result.

5. Exit Single User mode and reboot the machine.

**********************************************************************

Why everyone should use bart (AKA do the Bart Man)

If you are using Solaris 10, and you have not used bart yet, you should stop everything and take a look at it.

For those who don’t know what bart is, it is the Basic Auditing and Reporting Tool that is in Solaris 10.

In a quick synopsis bart will create a report that shows all files/directories on a solaris machine. This report contains the permissions, owners, sizes, modify times and md5 hashes of all files on the system, along with acl’s if you are using ZFS.

So why is bart so important? First, it can be used as a security tool. When you install a new Solaris 10 system, the first thing you should do after you get it installed and patched and before it is placed on the network is run a bart on the system and save the report to a cd. This will be the “baseline” image of the system. Then every week/month you should run a bart against the machine again and then use the compare option to see what files have changed, added or deleted from the system. Where this comes in really handy is if your think that your machine has been hacked or compromised. You can use the comparison to determine which files may have been modified by the hacker.

But there is a non-security use for bart as well that is VERY useful. This use is one that I had not thought of until I needed it the other day. So what is this use? Reseting the permissions on files that were accidentally changed by an in-experienced UNIX person thinking that a “chmod -R 777 *” is the best way to fix their problems.

The first thing that came to my mind when I saw this happen was oh no, the machine had not even been backed up yet, and a day’s worth of work would have been lost. Even if the machine had been backed up, do you realize how long it would take to restore a file system with 40,000+ files, just because the permissions were screwed up. ( Note, the permissions on the various files were very different and even included some setuid, and setgiud files which were wiped out as well.)

So how did bart save the day? Luckly I had taken a bart of the machine before the work had begun on the file system. So after the chmod command was issued, I then took a bart of the file system again. I now could run a bart compare against the control and test manifest and see exactly what all had changed.

Once I had this output, I could then create a script to change the permissions of the files/directories back to the original values. All told after I finished tweaking my script it took about 20 minutes to reset the permissions on all the files and directories.

So here is a quick start to getting your first bart manifest of your system:

1. Create a bart_rules file. If you do not create a rules file, your output will only have Files and not directories listed in it. My simple bart_rules file looks like this:

/
CHECK ALL
/home
IGNORE ALL

I ignore the /home file system as in my case it was nfs mounted. In reality you would want to include all local file systems.

2. Create the bart, I keep the rules file in /root/bart_rules so I would run the command:

bart create -R / -r /root/bart_rules > /tmp/bart.output

This will create a bart manifest and output it to /tmp/bart.output. Looking at the first couple of lines of it looks like this:

unixwiz@sungeek:/home/unixwiz> head -20 /tmp/bart.out
! Version 1.0
! Saturday, May 17, 2008 (21:24:27)
# Format:
#fname D size mode acl dirmtime uid gid
#fname P size mode acl mtime uid gid
#fname S size mode acl mtime uid gid
#fname F size mode acl mtime uid gid contents
#fname L size mode acl lnmtime uid gid dest
#fname B size mode acl mtime uid gid devnode
#fname C size mode acl mtime uid gid devnode
/ D 1024 40755 user::rwx,group::r-x,mask:r-x,other:r-x 481d0e43 0 0
/.ICEauthority F 310 100600 user::rw-,group::---,mask:---,other:--- 44c581c2 0 0 3eb63faf448e8a2b2c1a7b2019a8bde3
/.Xauthority F 99 100600 user::rw-,group::---,mask:---,other:--- 44c560e0 0 0 5ffe2e5f4b6f73e662001f62f7cae4d3
/.bash_history F 649 100600 user::rw-,group::---,mask:---,other:--- 481d1109 0 0 9132e0e798d5d05644cafc90c2aa876a
/.dt D 512 40755 user::rwx,group::r-x,mask:r-x,other:r-x 44c560e0 0 0
/.dt/appmanager D 512 40755 user::rwx,group::r-x,mask:r-x,other:r-x 44c5534d 0 0
/.dt/help D 512 40755 user::rwx,group::r-x,mask:r-x,other:r-x 44c5534d 0 0
/.dt/icons D 512 40755 user::rwx,group::r-x,mask:r-x,other:r-x 44c5534d 0 0
/.dt/sessionlogs D 512 40755 user::rwx,group::r-x,mask:r-x,other:r-x 44c5534c 0 0
/.dt/sessionlogs/sungeek_DISPLAY=:0 F 132 100644 user::rw-,group::r--,mask:r--,other:r-- 44c560e0 0 0 6d4e62fc972046a7a85fdb36a0ce21fd

The first part of the file, the part that begins with #fname is a legend as to how each type of line is formed.
So looking at the first actual line of the contents :
/ D 1024 40755 user::rwx,group::r-x,mask:r-x,other:r-x 481d0e43 0 0
We see that the fnmae is /, it is a directory, with a size of 1024. Its mode is 755, the last modified time is the “481d0e43” and it is owned by uid 0 and gid 0.

Looking at a file in particular we see this:

/httpd/htdocs/index.html F 10 100644 user::rw-,group::r--,mask:r--,other:r-- 463d4f4b 0 0 b7a9369d4cc9f82ed707bce91ced8af8

In the above, we see that the file is 10 bytes, has a permissions of 644 and is owned by root/root.

Now suppose that I for some reason by accident was in the /httpd/htdocs directory and did a chmod -R 777 *. Since I had my control manifest, I would then run another bart and then use the compare option. What I would get is something like this:

#bart compare /tmp/bart.output /tmp/bart.output2
/httpd/htdocs/index.html:
mode control:100644 test:100777
acl control:user::rw-,group::r--,mask:r--,other:r-- test:user::rwx,group::rwx,mask:rwx,other:rwx

Here we can see that the permissions has changed from 644 to 777. But the output is not really easy to parse with a script. So we need to use the “-p” option on the bart compare:

#bart compare -p /tmp/bart.output /tmp/bart.output2
/httpd/htdocs/index.html mode 100644 100777 acl user::rw-,group::r--,mask:r--,other:r-- user::rwx,group::rwx,mask:rwx,other:rwx

In the above, since the only thing that was changed was the mode, that is the only thing that is listed.

here are some other examples:

/var/samba/locks/browse.dat mtime 482f8544 482f8800
/var/samba/locks/unexpected.tdb contents 7c3404e9622749702e3df56caf26fe72 72983947ada3260a236394a51aef0d31

The first line shows that the file browse.dat modify time changed, but nothing else. The second line shows that the unexpected.tdb had it’s contents change. This can been see by the 2 different hashes.

Here is another example of the index.html file above, after it had been edited:

bash-3.00# bart compare /tmp/bart.out /tmp/bart.out3
/httpd/htdocs/index.html:
size control:10 test:26
mode control:100644 test:100777
acl control:user::rw-,group::r--,mask:r--,other:r-- test:user::rwx,group::rwx,mask:rwx,other:rwx
mtime control:463d4f4b test:482f8b89
contents control:b7a9369d4cc9f82ed707bce91ced8af8 test:1567caf683e3859cb5da7335c35438f7

Once again this is in the “human” readable format, the “machine” readable looks like :

bash-3.00# bart compare -p /tmp/bart.out /tmp/bart.out3
/httpd/htdocs/index.html size 10 26 mode 100644 100777 acl user::rw-,group::r--,mask:r--,other:r-- user::rwx,group::rwx,mask:rwx,other:rwx mtime 463d4f4b 482f8b89 contents b7a9369d4cc9f82ed707bce91ced8af8 1567caf683e3859cb5da7335c35438f7

(the above is actually all on one line.)

Once you have the output of the bart after the “oops” you will need to run the bart compare with options to ignore some items. Since I am only interested in the mode, the size, mtime and contents can be ignored. I used the following:

bash-3.00# bart compare -i size,mtime,contents,uid,gid -p /tmp/bart.out /tmp/bart.out2

This only shows files that have had their mode changed:

bash-3.00# bart compare -i size,mtime,contents,uid,gid -p /tmp/bart.out /tmp/bart.out2
/httpd/htdocs/index.html mode 100644 100777 acl user::rw-,group::r--,mask:r--,other:r-- user::rwx,group::rwx,mask:rwx,other:rwx

You should redirect this output to a file, so that it can then be used to generate a script.
With the output in a file I then did this:

cat /tmp/bart.compare | awk '{print "chmod "$3" "$1}' > /tmp/CHANGEPERMS

So basicly I cat the file and print the chmod command allong with the 3rd field (100644) and then the first field (/httpd/htdocs/index.html) and redirect this to a new file. Once I spot check this file, you can then run it and it will “reset” the permissions back.

Now everything I have shown above is based on the machine having a UFS file system. If you run bart against a file system that is ZFS, you will get a manifest that looks something like this:


/home/unixwiz/bin/php F 10587732 100755 owner@::deny,owner@:read_data/write_data/append_data/write_xattr/execute/write_attributes/write_acl/write_owner:allow,group@:write_data/append_data:deny,group@:read_data/execute:allow,everyone@:write_data/append_data/write_xattr/write_attributes/write_acl/write_owner:deny,everyone@:read_data/read_xattr/execute/read_attributes/read_acl/synchronize:allow 4743a7fa 100 14 9b8cfb15ed069bd6e43d7c2ae11a3e23

It shows the ZFS extended acl’s.

So if you haven’t started using bart, you should start as soon as possible.

TSM on Solaris with ZFS

Forgot to mention that I have tried using TSM (Tivoli Storage Manager) client on Solaris againest a ZFS file system. From what I can tell it works fine… I have backed up and restored some files with it. When looking at the TSM Server it lists the file system as “unknown”, but so far everything has been ok with it.

Solaris,ZFS,TSM,Tivoli

Adaptec ASH-1233 and Solaris 10 part deux

So I switch the 4 drives around so the large zfs pool is on the onboard controller and the smaller zfs pool is on the PCI controller. I then reran bonnie and look at the results now:

Old Slow zfs pool:

# ./bonnie
File './Bonnie.581', size: 104857600
Writing with putc()...done
Rewriting...done
Writing intelligently...done
Reading with getc()...done
Reading intelligently...done
Seeker 1...Seeker 2...Seeker 3...start 'em...done...done...done...
              -------Sequential Output-------- ---Sequential Input-- --Random--
              -Per Char- --Block--- -Rewrite-- -Per Char- --Block--- --Seeks---
Machine    MB K/sec %CPU K/sec %CPU K/sec %CPU K/sec %CPU K/sec %CPU  /sec %CPU
          100 15822 51.9 21557 18.7 12929 13.9 25061 97.6 23125 12.0 284.8  3.0

Is now nice and fast (or as fast as my on board controller will handle. The FTP now stays at 10M/s solid which is what I was looking for (this pool is used to serve Video to my Replay TV units).

Now the old fast pool is the new slow pool:

File './Bonnie.585', size: 104857600
Writing with putc()...done
Rewriting...done
Writing intelligently...done
Reading with getc()...done
Reading intelligently...done
Seeker 1...Seeker 2...Seeker 3...start 'em...done...done...done...
              -------Sequential Output-------- ---Sequential Input-- --Random--
              -Per Char- --Block--- -Rewrite-- -Per Char- --Block--- --Seeks---
Machine    MB K/sec %CPU K/sec %CPU K/sec %CPU K/sec %CPU K/sec %CPU  /sec %CPU
          100   975  3.0  1125  1.1  1541  1.5 26689 99.9 209631 99.9  42.6  0.6

as you can see that controller card is awful.. Now I understand why my growfs took so long when I added the drives a while back (it took over 2 hours to run). So I guess now, I need to do one of the following:

  1. Find a new server that has SATA support on the mobo
  2. Find a new controller card that has a BIOS that I can force the UltraDMA to be on
  3. Get some IDE to scsi convertors to attach the drives to the scsi card I have
  4. Get a SATA card and new SATA drives to extend the pool with

All in all, it is going to be a small amount of money for any of the options, but probably getting SATA cards will be the next step I do. I just can’t see getting rid of a perfectly fine Dual PIII when all it is doing is file serving and running a Zone for DVArchive.

Solaris, Adaptec, DVArchive

Adaptec ASH-1233 and Solaris 10

Seems I have a little performance problem with the Adaptec ASH-1233 controller and Solaris 10. For what ever reason, the drives are falling back to MultiwordDMA mode 2 (bad mmmkay!). What is weird, on a fresh boot, I can start copying large files to the ZFS mirror that is on the ASH-1233. It starts out at 10M/s (using FTP from another machine). After a little while, it suddenly drops to about 6M/s, then a little while later it drops to 3M/s, then it drops to about 1M/s and then finally 150K/s and then starts timeing out.. In contrast if I FTP the same file to a zfs mirror that is not on that controller I get a sustained 10+ M/s transfer rate. So I went and downloaded bonnie to run on it intresting results:

ZFS mirror on the ASH-1233:

# ./bonnie 
File './Bonnie.682', size: 104857600
Writing with putc()...done
Rewriting...done
Writing intelligently...done
Reading with getc()...done
Reading intelligently...done
Seeker 1...Seeker 2...Seeker 3...start 'em...done...done...done...
              -------Sequential Output-------- ---Sequential Input-- --Random--
              -Per Char- --Block--- -Rewrite-- -Per Char- --Block--- --Seeks---
Machine    MB K/sec %CPU K/sec %CPU K/sec %CPU K/sec %CPU K/sec %CPU  /sec %CPU
          100 22989 90.5  1360  2.0  1591  2.6  3071 14.9  3905  2.7 124.8  1.3

Here is a zfs mirror using the onboard IDE controller test using bonnie:

# /home/bonnie/bonnie
File './Bonnie.900', size: 104857600
Writing with putc()...done
Rewriting...done
Writing intelligently...done
Reading with getc()...done
Reading intelligently...done
Seeker 1...Seeker 2...Seeker 3...start 'em...done...done...done...
              -------Sequential Output-------- ---Sequential Input-- --Random--
              -Per Char- --Block--- -Rewrite-- -Per Char- --Block--- --Seeks---
Machine    MB K/sec %CPU K/sec %CPU K/sec %CPU K/sec %CPU K/sec %CPU  /sec %CPU
          100 33636 100.0 39261 33.5 115563 100.0 27222 100.0 229894 100.0 10663.1 186.2

See the difference? I sure do. Needless to say my next step is to export the zfs pools and swap the cables and see how that works. Hopefully it will give me some better performance. (No the machine is not really starved for horse power, it is a Dual Pentium III with 1.5 gig of RAM and 1.1 TB of disk space on it.