I last wrote about my VMware home lab back in September, so here is an update. What I have found is that while the HP XW8600 is nice for all of those SAS/SATA connections and the memory, the IO performance is lacking. I currently have 4 SATA drives plugged in to the LSI 1068 RAID controller that is on the motherboard: 2 1TB drives in a RAID 1, and 2 500GB drives in another RAID 1. But ever since moving to it I have been having really slow IO. As an example, last night I was working with a simple MySQL database with one table of 2 columns in it. I went to insert 17,000+ rows and it took almost 20 minutes. (On a different server with just IDE drives, it took less than a minute or two to do it.)
So I have been searching most of the weekend to see what I could find, and there are tidbits of information everywhere on the interwebs. So I thought I would write down what I found and put it in one place for others to find.
It seems that the single biggest problem is the lack of a "write cache". Since the LSI 1068 on-board RAID controller doesn't have a battery, it has to wait for the disk to report back that the data has been successfully written. This is compounded by the fact that I have a RAID 1 set up, so both disks have to report that the data is written before the controller can report back to VMware that the write is complete. In other words, there is no "cache" on this controller, so the speed is limited to about 20Mb/s.
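A rough way to see this penalty for yourself (this dd test is my own sketch, not something from the controller docs): GNU dd's `oflag=dsync` forces every block to be acknowledged by the disk before the next one is written, which is roughly what a controller without a battery-backed cache does on every write.

```shell
# Buffered writes: the OS caches them, so this is fast.
dd if=/dev/zero of=/tmp/cachetest bs=512 count=1000 status=none

# Synchronous writes: each 512-byte block must hit the disk
# before the next starts -- expect this to be far slower on
# real spinning disks.
dd if=/dev/zero of=/tmp/cachetest bs=512 count=1000 oflag=dsync status=none

rm -f /tmp/cachetest
```

Wrap each dd in `time` to compare; on a cache-less RAID 1 the dsync run should be dramatically slower.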
So how can I fix this? Since I want redundancy on the disks, breaking them back into single disks would make things faster but would not provide any security for my data. That could work for a couple of my VM's that are "disposable" test VM's, but the ones I want to keep need to stay on a RAID. So to fix it, I need to find a PCI-E controller that has a cache and a battery on it.
So my hunt begins, I will update once I can find one that works well.
As a continuation of my other home repairs (part 1, part 2), this summer was no different. This year there were 2 major projects: the first was to repair the chimney, and the second was a new roof. First, some pics of the chimney:
The finish of the chimney is a parge look for now; I will probably paint it some time later. The chimney cap is a poured concrete one, as the original had completely deteriorated.
Next up is the roof. As you can see in the picture above (or not), there are black streaks going through it. So before it got bad, I replaced it with 30-year dimensional shingles.
All told it was around another $10,000 in repairs. Totaling over $30,000 in the last 3 years.
About a year ago I purchased a 1U IBM X3550 server to run VMware vSphere 5 on. While it was cool to have a server that had dual quad procs and 8 gig of ram in it, the noise it put off was too much for my family room. (Just think of half a dozen 1-inch fans running at 15,000RPM almost constantly.) Recently I have been spending more time in the family room, and the noise has gotten to a level where it is almost impossible to do anything in the room without hearing it. (Like watch tv, a movie, play a game, etc.) So I started looking at my favorite used hardware site, geeks.com, for a new "server". Well it finally arrived today: an HP XW8600 workstation. It is another dual quad proc, however it has 16GB of ram, 12 SATA ports, a larger case, and best of all, it is almost completely quiet.
So with it installed, I needed to start moving the VM's from the IBM Server to the HP Server. In an enterprise environment, this usually isn't a problem as you usually have a shared storage (SAN) that each of the hosts connect to. Well in my little home lab I don't have shared storage. I did try to use COMSTAR in Solaris 10 to export a "Disk" as an iSCSI target. While this would work, it was going to take forever to transfer 1TB of VM's from one server to a VM running on my Mac and back to the new server.
So a-googling I went, and what I found was a much easier way to copy the VM's over: ovftool, which runs on Windows, Linux, and Mac. It allows you to export and import OVF files to a VMware host. The side benefit is that you can export from one host and import to another all on one line.
So I downloaded the Mac version and started copying. The basic syntax is like this:
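A sketch of the general form (the user and host names here are placeholders of my own, not specific values):

```shell
# Export from one ESXi host and import to another in one step:
# the first vi:// locator is the source VM, the second is the
# destination host.
ovftool [options] vi://user@source-host/vm-name vi://user@dest-host
```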
So if one of my VM's is called mtdew, I had it thin provisioned on the source host and wanted it the same on the destination host, and my datastore is called "vmwareraid", I would run this:
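Something along these lines (the root logins are my assumption; the host, VM, and datastore names are the ones above):

```shell
# -ds picks the destination datastore, -dm=thin keeps the disk
# thin provisioned on the new host. ovftool prompts for the
# password of each host as it connects.
ovftool -ds=vmwareraid -dm=thin vi://root@ibmx3550/mtdew vi://root@hpxw8600
```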
where ibmx3550 is the source server and hpxw8600 is the destination server. If you don't specify the "-dm=thin", then when it is copied over it will become a "thick" disk, aka use the entire space allocated when created. (I.e. a 50GB disk that only has 10GB in use would still use 50GB if -dm=thin is not used.)
There are some gotchas that you will have to look out for:
- Network configs: I had one VM that had multiple internal networks defined. Those were not defined on the new server, so there is a "mapping" that you have to do. I decided I didn't need them on the new server, so I just deleted them before copying it over.
- VM's must be in a powered-off state. I tried one in a "paused" state and it did not want to run right.
- It takes time, depending on the speed of the network, disk, etc, it will take a lot of time to do this, and the VM's have to be down while it happens. So definitely not a way to move "production" vm's unless you have a maintenance window.
- It will show % complete as it goes, which is cool, but the way it does it is weird: it will sit at 11 or 12%, then I turn my head and all of a sudden it says it is completed.
- I did have some issues with one VM, and I am not sure what happened to it, but when I try to copy it I get an error: "Error: vim.fault.FileNotFound"... It may be due to my renaming something on the VM at some point in the past.
Hope this helps some other "home lab user"...
I recently came across a unique problem that didn't "stand" out until I got to thinking about a couple of different situations that I had tested this in. The scenario is that I needed to create a static group of unique members in LDAP (Sun Directory Server Enterprise Edition 6.3.1 and/or Oracle Directory Server Enterprise Edition 22.214.171.124) that has an extremely large number of members in it. So I created the LDIF file with all 60,000+ uids in it and proceeded to run an ldapadd against the server with the file. Well it would immediately come back with:
However, when looking in the LDAP, the group never showed up. Also when you look at the access log on the server you would see something similar to this:
[07/Aug/2012:21:22:39 -0400] conn=3 op=-1 msgId=-1 - closing from 127.0.0.1:48160 - B1 - Client request contains an ASN.1 BER tag that is corrupt or connection aborted
Now sometimes, depending on the versions of the LDAP server and ldapadd programs, I got a "broken pipe" right after the adding output.
As you can see from the output in the access log, it is not very descriptive about what the actual error is. I know I spent about 6 hours looking into it to figure out what the problem actually was. Well, this morning I was poking around the cn=config docs and found this:
What this document shows is the attribute nsslapd-maxbersize, which is:
Defines the maximum size in bytes allowed for an incoming message. This limits the size of LDAP requests that can be handled by Directory Server. Limiting the size of requests prevents some kinds of denial of service attacks.
The limit applies to the total size of the LDAP request. For example, if the request is to add an entry, and the entry in the request is larger than two megabytes, then the add request is denied. Care should be taken when changing this attribute.
So by DEFAULT it is set to 2MB. Well, my LDIF file was over 3.5MB in size, which means it was too big for the addition. To change it, do an ldapmodify with this LDIF:
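A minimal sketch of the modify LDIF (using the 6MB value I mention below):

```
dn: cn=config
changetype: modify
replace: nsslapd-maxbersize
nsslapd-maxbersize: 6291456
```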
I changed mine to 6MB, or 6291456, to hopefully cover any sizable additions in the future. Once done, I restarted the directory server and tested again, and everything was good. According to the docs, the max size you can make this attribute is 2GB, and a size of 0 means it defaults to 2MB, or 2097152. I think Oracle needs to make the error in the access log a little more descriptive, like "hey, your query/add is too big, yo".
Hope this helps someone..