I last wrote about my VMware home lab back in September, so here is an update. While the HP XW8600 is nice for all of its SAS/SATA connections and memory, I have found that its IO performance is lacking. I currently have 4 SATA drives plugged into the LSI 1068 RAID controller on the motherboard: two 1TB drives in a RAID 1, and two 500GB drives in another RAID 1. But ever since moving to it I have had really slow IO. As an example, last night I was working with a simple MySQL database that has one table with 2 columns in it. I went to insert 17,000+ rows and it took almost 20 minutes. (On a different server with just IDE drives, it took less than a minute or two.)
So I have been searching most of the weekend to see what I could find, and there are tidbits of information scattered all over the interwebs. So I thought I would write down what I found and put it in one place for others to find.
It seems that the single biggest problem is the "write cache". Since the on-board LSI 1068 RAID controller doesn't have a battery, it has to wait for the disk to report back that the data has been successfully written. This is complicated by the fact that I have a RAID 1 set up, so both disks have to report the write complete before the controller can report back to VMware that all is ok. In other words, with no "cache" on this controller, write speed is limited to about 20Mb/s.
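Two things helped me confirm this while I hunt for a better controller. On the MySQL side, wrapping a big batch of INSERTs in a single transaction (START TRANSACTION ... COMMIT) means InnoDB only waits on one flush instead of one per row. And from inside a Linux guest you can measure the flush penalty directly with dd. A rough sketch (the paths are just scratch files):

```shell
# Run inside a Linux guest sitting on the datastore in question.
# The second dd forces every 4K write to be flushed to disk (oflag=dsync),
# which is the same penalty a cache-less RAID controller pays on each commit.
dd if=/dev/zero of=/tmp/ddtest bs=4k count=1000 2>/tmp/dd_cached.txt
dd if=/dev/zero of=/tmp/ddtest bs=4k count=1000 oflag=dsync 2>/tmp/dd_dsync.txt
tail -n1 /tmp/dd_cached.txt /tmp/dd_dsync.txt   # compare the throughput lines
rm -f /tmp/ddtest
```

On my setup the dsync numbers are dramatically worse, which matches the 20-minute insert.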
So how can I fix this? Since I want redundancy on the disks, breaking the arrays into single disks might make things faster but would give my data no protection. That could work for a couple of my "disposable" test VM's, but the ones I want to keep need to stay on a RAID. So to fix it, I need to find a PCI-E controller that has a cache and a battery on it.
So my hunt begins, I will update once I can find one that works well.
About a year ago I purchased a 1U IBM X3550 server to run VMware vSphere 5 on. While it was cool to have a server with dual quad procs and 8 gig of ram in it, the noise it put off was too much for my family room. (Just think of half a dozen 1-inch fans running at 15,000RPM almost constantly.) Recently I have been spending more time in the family room, and the noise has gotten to a level where it is almost impossible to do anything in the room without hearing it. (Like watch TV, a movie, play a game, etc.) So I started looking at my favorite used hardware site, geeks.com, for a new "server". Well, it finally arrived today: an HP XW8600 workstation. It is another dual quad proc, however it has 16GB of ram, 12 SATA ports, a larger case, and best of all, it is almost completely quiet.
So with it installed, I needed to start moving the VM's from the IBM Server to the HP Server. In an enterprise environment, this usually isn't a problem as you usually have a shared storage (SAN) that each of the hosts connect to. Well in my little home lab I don't have shared storage. I did try to use COMSTAR in Solaris 10 to export a "Disk" as an iSCSI target. While this would work, it was going to take forever to transfer 1TB of VM's from one server to a VM running on my Mac and back to the new server.
So a-googling I went, and what I found was a far easier way to copy the VM's over: ovftool, which runs on Windows, Linux and Mac. It allows you to export and import OVF files to and from a VMware host. The side benefit is that you can export from one host and import to another all in one command line.
So I downloaded the Mac version and started copying. The basic syntax is like this:
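This is the general shape as best I understand it from the ovftool docs (everything in angle brackets is a placeholder, and I'm assuming root logins on both hosts; it should prompt for each password as it connects):

```shell
ovftool -ds=<destination-datastore> -dm=<thin|thick> \
  "vi://root@<source-host>/<vm-name>" \
  "vi://root@<destination-host>"
```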
So if one of my VM's is called mtdew, and I had it thin provisioned on the source host and wanted it the same on the destination host, and my datastore is called "vmwareraid" I would run this:
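Something along these lines (assuming root as the login on both hosts):

```shell
ovftool -ds=vmwareraid -dm=thin \
  "vi://root@ibmx3550/mtdew" \
  "vi://root@hpxw8600"
```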
where ibmx3550 is the source server and hpxw8600 is the destination server. If you don't specify "-dm=thin", the disk will become "thick" when it is copied over, i.e. it will use the entire space allocated when it was created. (A 50GB disk that only has 10GB in use would still take up 50GB if -dm=thin is not used.)
There are some gotchas that you will have to look out for:
- Network configs: I had one VM with multiple internal networks defined. Those were not defined on the new server, so there is a "mapping" you have to do. I decided I didn't need them on the new server, so I just deleted them before copying it over.
- VM's must be in a powered-off state. I tried one in a "paused" state and it did not want to run right.
- It takes time. Depending on the speed of the network, disks, etc., the copy can take a long while, and the VM's have to be down while it happens. So it is definitely not a way to move "production" VM's unless you have a maintenance window.
- It will show % complete as it goes, which is cool, but the way it does it is weird. It will sit at 11 or 12%, then I turn my head and all of a sudden it says it is completed.
- I did have an issue with one VM. I am not sure what happened to it, but when I try to copy it I get: "Error: vim.fault.FileNotFound"... It may be due to me renaming something on the VM at some point in the past.
Hope this helps some other "home lab user"...
Last week, I went to VMworld 2011 in Las Vegas. The conference was great: 20,000+ people all there and focused on one thing, VMware and every product they offer. This was my first time at VMworld, and hopefully I will get to go again some time in the future. The main reason I went was the recently released vSphere 5, to see what all it offered and what all was changed. Needless to say, there are many cool new features; I am only going to mention a few here, but the full list is available in this PDF.
The first cool feature is Auto Deploy. Simply said (I wish they would have chosen a different name), it is PXE booting of the vSphere image from a TFTP server, so no local disk is required to "run" vSphere. For example, if you have a "shit ton" of blades and don't want to go update and install all of them, just get their MAC addresses, set each host up in DHCP with a couple of DHCP options to tell it where to boot from, and have the blade boot from the network. It will download the image from the TFTP server and run automagically. Once up and running, all config is stored in vCenter 5 (a requirement!). Need to upgrade your hosts? Just reboot them after updating the image. A couple of notes for this: make sure you have logging set up to go to your syslog server, and that you set up the Dump Collector in case of a PSOD.
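For those last two notes, the host-side setup is just a couple of esxcli calls. A sketch, with made-up hostnames and IPs; check the vSphere 5 docs for the exact option names on your build:

```shell
# Point the host's logs at a remote syslog server (hostname is an example).
esxcli system syslog config set --loghost='udp://syslog.example.com:514'
esxcli system syslog reload

# Send kernel dumps to the Dump Collector over vmk0
# (IP and port are examples; 6500 is the collector's usual default).
esxcli system coredump network set --interface-name vmk0 \
    --server-ipv4 192.168.1.10 --server-port 6500
esxcli system coredump network set --enable true
esxcli system coredump network get
```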
Another cool feature: vSphere 5 supports Apple Xserve servers running OS X Server 10.6 (Snow Leopard) as a guest operating system. This is because vSphere now supports UEFI "bios". Now "supposedly" this does not technically require Xserves (since Apple no longer sells them), but it "requires" them because of Apple's EULA for use of Mac OS X.
There are many other features that have been upgraded, or are new. Too bad the conference wasn't a little longer, as the number of sessions I wanted to go to was greater than the amount of time I had available. (I.E. only one instance of a session, and 2 sessions I wanted to see were at the same time.)
The Hands on Lab area was "freaking huge". There were over 800 workstations set up where you could do 1 of 16 labs (you could do more, you just had to stand in line; I was only able to do 1 in the week I was there). Each "lab" station was a Wyse "chubby client" with dual monitors, so you could rdesktop to some Windows XP desktops and servers to do the work. The HOL area sort of reminded me of the CTF area at DefCon: a huge room with nearly no light whatsoever and hundreds upon hundreds of screens.
The most interesting part of the conference is that they have grown so big, that next year they have to go to San Francisco to host the event, as there is no place in Vegas that is big enough to house them. This year it was at the Venetian with some spill over to Wynn. They also had the Sands Expo hall, which is connected to the Venetian. The "dining" room was 1.5 million sq ft alone, you could barely see from one end to the other.
I will have to say that out of the many conferences I have been to by different vendors, VMware's has so far been the best. Some of the things that have made it stand out from the rest:
- Food: while not "the greatest ever", it was far better than what I have had at other places. They gave us breakfast and lunch every day. In addition, the break periods between sessions had different items every day. One day they had fresh hot-made pretzel sticks with cheese and different sauces.
- Hang out area: at most conferences, if there is "downtime" you usually end up either walking around or going back to the hotel. VMware set up a "hang space" where they had a basketball court, a badminton court, huge chess sets, and fake grass to sit on in front of a big screen (like 20+ foot) TV. There was also a Twitter vMeetup place, where you could meet people in person that you had met on twitter.
- Scheduled sessions. While I was skeptical at first on "pre-registering" for the sessions you want to attend, I think in the end it was a good idea, as it "guaranteed" your spot in the session as long as you showed up 3 minutes before it started. (There were gaps between end and start, so you really had no reason not to be there.)
- Group discussions: at some conferences, I have seen "group discussion" be these "huge" groups where it ends up being more of a Q&A session. VMware had group discussions with maybe 30 people max in a room, each one with a clicker; everyone voted on how the session went, and questions were free form. One of the best was the Oracle on VMware vSphere one. I learned a lot from that session.
- P.A.R.T.Y.: by far the best conference / vendor party I have ever been to. First was the food: you name it, they probably had it. I didn't realize this till I had already eaten a couple of slices of pizza. Then I saw a station making fresh cut cheese-steak sandwiches, and another making fresh crab cakes. Like I said, name it and it was probably there. In addition, a huge open bar (not that I drink, but it was there). Past the food, they had at least 4 different acts during the night: two people doing fire tricks, then the opener, Recycled Percussion, who I didn't recognize till I got back to the hotel room that night, but they were on America's Got Talent and previously had a nightly show in Vegas. The headliners were The Killers. They played for an hour and did all the "popular" songs along with some that I hadn't heard before.
This part of the party ended around 9PM. Which was the start time to the "after party" which was at the Venetian pool. I did not go to it, but it sounded like people had a bunch of fun there too.
So if you are still reading by now, you are probably trying to figure out the second part of the title: "... should have stayed in Vegas". Well, it seems that some time on Sunday or early Monday morning I either sprained my left foot or got a stress fracture in it. Needless to say, the 30+ miles of walking I did (my hotel was 2 miles away from the conference hotel, and it is a damn long walk from Planet Hollywood to the Venetian, even if you take the monorail, when your foot is hurting like a mofo) did not help it any. By the time I got home it was still hurting, and I noticed the top of my foot had some swelling and bruising. I iced it on Saturday and Sunday, but as of today it was still hurting and didn't seem to change much, so I ended up going to the doctor to have it X-rayed. They said it didn't show any fractures, but thought it was just a really bad sprain or a damaged ligament. So it is more ice and an ankle air cast for a while. That is the part I wish "should have stayed in Vegas."
In this day and age of computer hacks and security problems, why do companies make it awkward to change usernames and/or passwords? One example of an awkward procedure is changing the password on the VMware vCenter server. If, like any good security-minded person, you have all your passwords set to expire every 28 days or so, then to change the password on the vCenter server you have to do some "command line fu". Heaven forbid you have to change the username as well. So how do you do it? Well, if you are running vCenter on a Windows 2008 server connecting to an Oracle server (which actually holds all the data), there are a couple of things you need to do:
- Shutdown the vCenter server (disable it in the Services Control panel)
- Change the password for your vCenter user in the oracle DB
- Now here is the BIG gotcha: on the Windows side you have to run a CMD prompt as an admin user. Just clicking on it in the Start menu won't do it; you have to right click it and choose "Run as Administrator". If you fail to do this, the next step will fail and just piss you off even more. (The reason is that the username and password are stored in the registry, and running cmd as a normal user apparently doesn't give you the privileges needed to modify that part of the registry.)
- Now go to the location where VMware vCenter is installed and run the vpxd command with either a -p or a -P. If you use the lower case -p it will prompt you for the new database user password. If you use the -P option, right after the P you can put the new password on the command line.
- Now you should be able to start back up the vCenter processes.
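Strung together, the whole dance looks roughly like this. The Oracle user name, new password, and install path below are examples; your install may differ:

```shell
REM -- In an ELEVATED cmd prompt (right click, "Run as Administrator") --

REM Step 2 was done on the Oracle side, e.g. in sqlplus:
REM   ALTER USER vpxadmin IDENTIFIED BY NewS3cret;

REM Then tell vCenter the new password (default install path shown):
cd "C:\Program Files\VMware\Infrastructure\VirtualCenter Server"
vpxd.exe -p
REM -p prompts for the password; -PNewS3cret would pass it inline instead.
```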
Now if you need to change the userid, use Regedit and go to:
HKEY_LOCAL_MACHINE\SOFTWARE\VMware, Inc.\VMware VirtualCenter\DB (32-bit Windows)
HKEY_LOCAL_MACHINE\SOFTWARE\Wow6432Node\VMware, Inc.\VMware VirtualCenter\DB (64-bit Windows)
and change the value named "2" to be the new userid.
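If you'd rather script it than click through Regedit, the same change can be made with reg from that elevated prompt. The 64-bit path is shown, and the new userid is an example:

```shell
REM See what's there now; the value named "2" is the DB userid.
reg query "HKLM\SOFTWARE\Wow6432Node\VMware, Inc.\VMware VirtualCenter\DB"
REM Change it (back up the key first if you are paranoid, and you should be):
reg add "HKLM\SOFTWARE\Wow6432Node\VMware, Inc.\VMware VirtualCenter\DB" /v 2 /d new_userid /f
```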
This is documented in the VMware KB article: Changing the vCenter database userid and password. But if you don't pay attention to the "Run as Administrator" part, you will spend a lot of time trying to figure it out, even if you are logged in as an administrator.
If your password expires in Oracle while vCenter is running, vCenter appears to keep working. But if you reboot the vCenter server or restart the vCenter processes, it will "hang" and never start. They also need to make their error messages a little more detailed as to why it is "failing" to start.
In this day and age everyone is trying to squeeze the last little drop out of every technological advance that they can. One of the "big" technologies right now is Thin Provisioning. In short, thin provisioning is where you tell a computer that it has X GB of disk (usually from a SAN or in VMware) but in reality you back it with less than X GB. This is big right now in SAN and VMware shops because enterprise disk is "expensive". But is it really worth the cost? No!
See, the main reason people (SAN or VMware admins) use thin provisioning is to "save" disk space. Say you have a server that performs one function and does not really use a lot of disk space, say a DNS server (either virtualized or physical, booting from a SAN). Now most admins like to keep all their servers on a standard config. So for the sake of this post, let's say the boot disk for this server is 50GB. Once the OS and app are installed, it may only be using 4GB of that 50GB disk.
Before thin provisioning, that 50GB as far as the SAN admin is concerned is 50GB used. In comes thin provisioning: now the SAN admin says "hey mister computer, here is your 50GB disk", but in reality the SAN only allocates as much space as the server is actually using. So instead of a full 50GB "used", only 4GB is used on the SAN. Sounds awesome in theory, but what happens when you add other servers to that same SAN pool (say the pool is 100GB in size)? The server admin gets another "50GB" disk from the SAN, doesn't realize thin provisioning is in use, and goes on and installs that server. Now we have 8GB in use out of the 100GB pool, while the 2 servers believe they have 100GB between them.
The next part is where the whole process starts to drown. The server admin asks for another disk, this time 200GB, for say a database or code repository server. The SAN administrator says "ok, here is your 200GB disk", but puts it in the same 100GB pool as the other two servers, because "he knows" you won't use all 200GB. We have now over-committed the disk, and the server admin does not know it has happened. Once the third server's OS is installed (another 4GB) everything seems fine, and technically it is, because we are only using 12GB of the 100GB pool. But the servers think they have 300GB of disk between them, and are unaware that there is any space issue.
Where the fun starts is when you load data onto those disks. Let's say the second server is a small database server, so we load Oracle and create some table spaces, ending up using about 40 of the 50GB allotted to it. (Now we are up to 48GB used in the 100GB pool.) Still technically ok, but with only 52GB free we really need to start worrying about the disks and the servers. The fun begins when we load data onto the server with the 200GB disk: once we write more than 52GB to it, we have problems. All the servers will start reporting write errors or other weird issues. The server admin can't figure out what the problem is, because when he looks at the servers he sees plenty of "free" space. When stuff gets really weird is when processes start dying and won't restart (maybe they write to a log file, etc.). So the first thing the server admin will try is rebooting the server. This is where all hell breaks loose...
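To make the arithmetic concrete, here it is as a few lines of shell (the numbers are the ones from the story):

```shell
pool=100                      # GB physically in the SAN pool
allocated=$((50 + 50 + 200))  # GB the three servers think they own
used=$((4 + 40 + 4))          # GB actually written: OS + DB server + OS
free=$((pool - used))
echo "allocated=${allocated}GB  used=${used}GB  pool free=${free}GB"
# -> the pool has only 52GB left, while the servers believe they still
#    have roughly 252GB free between them.
```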
See, when you start rebooting servers, the server can't flush out its pending writes because there is "no" space left to write to, so the file-systems end up corrupted. When the server comes back up, it will try to write more to the disk thinking it has plenty of free space, but again it can't, so stuff starts hanging. So of course a reboot is done again, and again, etc...
So now you start seeing write errors showing up everywhere on the other servers too, and from the looks of it, it may be a SAN issue, like the disk has disappeared. So you call the SAN admin, only to find out that you have been thin provisioned.
This, my friends, is why thin provisioning is bad and should NEVER be used. Yes, it may save you some money on disk, but what you save there will be wasted in the downtime spent rebuilding servers and restoring data.