Cash for Cache

I decided to build a new VMware host for my home “lab” last week to replace the HP workstation I had been using. (The real motive was to turn the HP workstation into a large NAS, since it has 12 SATA ports on it, but more on that later.) So off I went to spec out my new server. What I ended up purchasing was the following (prices as of 3/24/2015, in USD):

The plan was to set this system up with VMware vSphere 6 and then migrate everything from my VMware 5.1 system to it. So I began building it as the parts arrived last Friday night. Everything was going swimmingly until I ran into something I had forgotten: the LSI2308 SAS/SATA RAID controller doesn’t have any cache. What I found was that the two 480GB SSDs in a RAID 1 on that card were fast, extremely fast, as in I could boot a Windows 7 or Windows 2012R2 VM in about 3 seconds. However, the two 2TB SATA drives that I set up as a RAID 1 on the same card were slow as hell. (Same as the issue I was having with the HP XW8600 system.) I originally thought it was just the RAID rebuilding, so I left it in the RAID BIOS overnight to rebuild the array.

Well, after leaving it at 51% complete, going to bed, and waking up 8 hours later to find it only at 63%, I knew that I would never be able to use the SATA drives as a hardware mirror on that controller. So I powered it down, disconnected them from the LSI2308, and moved them over to the Intel SATA ports on the motherboard. This is where things get interesting, as I really wanted a large 2TB mirrored datastore for some of my test VMs that I don’t run 24×7 (the ones I do run 24×7 are on the SSD RAID 1). In order to achieve this, I had to do some virtualization of my storage…

The easiest way I could get the “mirrored” datastore to work was to do the following:

  1. Install a FreeNAS VM on the SSD datastore (pretty simple: a small 8GB disk with 8GB of RAM, which leaves me 24GB of RAM for my other VMs).
  2. On each of the 2TB disks, create a VMware datastore. I called them nas-1 and nas-2, but they can be anything you want.
  3. Next, create a VMDK that takes up nearly the full 2TB (or smaller; in my case I created two 980GB VMDKs on each 2TB disk).
  4. Now present the VMDKs to the FreeNAS VM.
  5. Next, create a new RAID 1 volume in FreeNAS using the 2 disks (or 4 in my case) presented to it.
  6. Create a new iSCSI share of the new RAID 1 volume.
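If you prefer the ESXi shell over the vSphere Client for steps 2 and 3, the disk creation could look something like the minimal sketch below. It is not the exact procedure used in this build: it assumes the datastores already exist as nas-1 and nas-2, and the freenas folder and disk file names are hypothetical. vmkfstools -c SIZE -d thin creates a thin-provisioned VMDK; set DRY_RUN=1 to print the commands instead of running them.

```shell
#!/bin/sh
# Sketch: create thin VMDKs for FreeNAS on the nas-1/nas-2 datastores.
# The "freenas" folder and the disk file names are hypothetical placeholders.
# Set DRY_RUN=1 to print the commands instead of running them.

run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# create_freenas_disks SIZE DATASTORE...
# Creates two thin-provisioned VMDKs of SIZE (e.g. 980G) on each datastore.
create_freenas_disks() {
  size=$1
  shift
  for ds in "$@"; do
    dir="/vmfs/volumes/$ds/freenas"
    run mkdir -p "$dir"
    for n in 1 2; do
      run vmkfstools -c "$size" -d thin "$dir/$ds-disk$n.vmdk"
    done
  done
}
```

For example, create_freenas_disks 980G nas-1 nas-2 would create two 980GB disks on each datastore. When building the mirror in FreeNAS, pair each disk on nas-1 with its counterpart on nas-2; a mirror of two VMDKs that sit on the same physical drive gives you no protection if that drive dies.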

Now comes the part that gets a little funky. Because I didn’t want the iSCSI traffic to affect the physical 1Gb NIC on the motherboard, I created a new vSwitch but didn’t assign any physical adapters to it. I then created a VMkernel port on it and gave the local vSphere host a new IP there, in a different subnet. Finally, I added another Ethernet (E1000) adapter to the FreeNAS VM, placed it on that same vSwitch, and assigned it an IP in the same subnet as the vSphere host’s VMkernel port.
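The same internal-only network can be sketched with esxcli from the ESXi shell. Everything here is a hypothetical placeholder (the vSwitch1/vmk1 names, the iSCSI-vmk and iSCSI-vm port group names, and the 10.10.10.0/24 subnet); the key point is simply that no uplink is ever added to the vSwitch, so the iSCSI traffic stays inside the host. Set DRY_RUN=1 to print the commands instead of running them.

```shell
#!/bin/sh
# Sketch: internal-only vSwitch for iSCSI between ESXi and the FreeNAS VM.
# All names and addresses are hypothetical placeholders.
# Set DRY_RUN=1 to print the esxcli commands instead of running them.

run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# setup_internal_iscsi_net VSWITCH VMK IP NETMASK
setup_internal_iscsi_net() {
  vs=$1 vmk=$2 ip=$3 mask=$4
  # New standard vSwitch; no uplinks are added, so traffic never
  # leaves the host or touches the physical NIC.
  run esxcli network vswitch standard add --vswitch-name="$vs"
  # One port group for the VMkernel port, one for the FreeNAS VM's second NIC.
  run esxcli network vswitch standard portgroup add --portgroup-name=iSCSI-vmk --vswitch-name="$vs"
  run esxcli network vswitch standard portgroup add --portgroup-name=iSCSI-vm --vswitch-name="$vs"
  # VMkernel interface with a static IP in the private subnet.
  run esxcli network ip interface add --interface-name="$vmk" --portgroup-name=iSCSI-vmk
  run esxcli network ip interface ipv4 set --interface-name="$vmk" --ipv4="$ip" --netmask="$mask" --type=static
}
```

For example, setup_internal_iscsi_net vSwitch1 vmk1 10.10.10.1 255.255.255.0, then connect the FreeNAS VM’s extra E1000 adapter to the iSCSI-vm port group and give it an address such as 10.10.10.2.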

With the networking “done”, it is now time to add the iSCSI software adapter:

  1. In the vSphere Client, click on the vSphere host, and then the Configuration tab.
  2. Under Hardware, select Storage Adapters, then click Add in the upper right.
  3. Then select the iSCSI adapter and hit OK. You should now have another adapter called iSCSI Software Adapter; in my case it was called vmhba38.
  4. Click on the new adapter and then click Properties.
  5. Next, click on the Dynamic Discovery tab and click Add.
  6. In the iSCSI Server field, enter the IP address you assigned to the FreeNAS box’s second interface (the one on the “internal” vSwitch).
  7. Click OK (assuming you didn’t change the port from 3260).
  8. Now if you go back and click Rescan All at the top, you should see your iSCSI device.
  9. Now we just need to make a datastore out of it, so click on Storage under the Hardware box.
  10. Then click Add Storage…
  11. Then follow through the wizard, selecting the Disk/LUN and naming the datastore.
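Steps 1 through 8 roughly correspond to three esxcli calls. This is another sketch, assuming ESXi 5.x/6.x esxcli and a hypothetical FreeNAS address of 10.10.10.2; the vmhba number will differ per host, so check it with esxcli iscsi adapter list after enabling software iSCSI. Creating the VMFS datastore itself (steps 9 to 11) is still easiest in the Add Storage wizard, so it is left out here.

```shell
#!/bin/sh
# Sketch: enable the software iSCSI adapter, point dynamic discovery
# (send targets) at the FreeNAS VM, and rescan. Names are placeholders.
# Set DRY_RUN=1 to print the commands instead of running them.

run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# add_freenas_target ADAPTER ADDRESS [PORT]
add_freenas_target() {
  adapter=$1 addr=$2 port=${3:-3260}
  # Same as adding the software iSCSI adapter in the vSphere Client.
  run esxcli iscsi software set --enabled=true
  # Dynamic Discovery tab: the FreeNAS address on the internal vSwitch.
  run esxcli iscsi adapter discovery sendtarget add --adapter="$adapter" --address="$addr:$port"
  # Equivalent of Rescan All.
  run esxcli storage core adapter rescan --all
}
```

For example, add_freenas_target vmhba38 10.10.10.2 uses the default port 3260, matching step 7.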

You should now have a new iSCSI datastore on the 2 disks that could not be “hardware” mirrored. Using HD Tune in a Windows 7 VM on that datastore, I got this:

HD Tune running in Windows 7

As you can see, the left side of the huge spike was actually the write portion of the test, which got drowned out by the read portion. Needless to say, the RAM cache on the FreeNAS VM makes reads extremely fast. As an example, a cold boot of this Windows 7 VM took about 45 seconds to get from power-on to the login screen, while a reboot takes about 15 seconds or less.

Now on the FreeNAS side here is what the CPU utilization looked like during the test:

FreeNAS CPU usage

You can see that it barely touched the CPUs while the test was running. So let’s look at the disks to see how they dealt with it:

FreeNAS disks

It looks like the writes were averaging around 17MB/s, which for a SATA 6Gbps drive is a little slow, but we are also doing software RAID, with caching handled in memory on the FreeNAS side. The reads looked to be about double the writes, which is expected in a RAID 1 config: every write has to go to both disks, while reads can be split across them.

The final graph I have from the FreeNAS is the internal network card:

FreeNAS Network

Here we can see the transfer rates appear to be pretty close to those on the disk side. This is, however, on the E1000 adapter. I have yet to try the VMXNET3 adapter to see whether I get any faster speeds.

While the above may not show very “high” transfer speeds, the real test was transferring the VMs from the HP box to the new one. Before I created the iSCSI datastore, when I was just using the plain LSI2308 RAID 1 on the 2x 2TB disks, the write speed was so bad that it was going to take hours to move a simple 10GB VM. After making the switch, it was down to minutes. In fact, the largest one I moved was 123GB in size and took 138 minutes to copy using the ovftool method.
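As a rough sanity check on that 123GB copy, here is the arithmetic as a tiny helper. It treats 1GB as 1024MB, which is an assumption about how the sizes were reported.

```shell
#!/bin/sh
# avg_mib_per_s SIZE_GB MINUTES
# Average copy throughput in MB/s, treating 1GB as 1024MB (an assumption).
avg_mib_per_s() {
  awk -v gb="$1" -v min="$2" 'BEGIN { printf "%.1f\n", gb * 1024 / (min * 60) }'
}
```

avg_mib_per_s 123 138 works out to about 15MB/s on average, in the same ballpark as the ~17MB/s of disk writes in the FreeNAS graph; at that rate a 10GB VM takes around 11 minutes rather than hours.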

So why did I title this post “Cash for Cache”? Quite simple: if I had more cash to spend on a RAID controller that actually had a lot of cache on it, and a BBU, I wouldn’t have had to go the virtualized FreeNAS route. I should also mention that I would NEVER recommend someone do this in a production environment, as there is a HUGE catch-22. If you only have one vSphere host and no shared storage, when you power off the vSphere side (and consequently the FreeNAS VM) you will lose the iSCSI datastore (which is expected). The problem is that when you power it back on, you have to go and rescan to find the iSCSI datastore(s) after you boot the FreeNAS VM back up. Sure, you could have FreeNAS boot automatically, but I have not yet tested that, or whether vSphere will then automatically rescan iSCSI to find the FreeNAS share.
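One possible workaround for the rescan problem is a small script that waits for the FreeNAS target to answer on port 3260 and then rescans. This is an untested sketch: it assumes nc is available in the ESXi shell, and the 10-second interval and 30-try limit are arbitrary. One place to hook it might be /etc/rc.local.d/local.sh on ESXi 5.1 and later, run in the background so it does not hold up boot; that hook is also an assumption that has not been verified for this setup.

```shell
#!/bin/sh
# Untested sketch: wait until the FreeNAS iSCSI target answers, then rescan.
# Assumes nc is available in the ESXi shell; interval and limits are arbitrary.
# Set DRY_RUN=1 to print the rescan command instead of running it.

run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# wait_and_rescan HOST [PORT] [TRIES]
wait_and_rescan() {
  host=$1 port=${2:-3260} tries=${3:-30}
  i=0
  while [ "$i" -lt "$tries" ]; do
    # nc -z only checks whether the TCP port accepts a connection.
    if nc -z "$host" "$port" >/dev/null 2>&1; then
      run esxcli storage core adapter rescan --all
      return 0
    fi
    i=$((i + 1))
    sleep 10
  done
  echo "iSCSI target $host:$port not reachable, giving up" >&2
  return 1
}
```

For example, wait_and_rescan 10.10.10.2 & (using the hypothetical FreeNAS address from the internal vSwitch). It still depends on the FreeNAS VM being set to start automatically with the host.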

Looking to the future, if SSDs drop in price to where they are about equal to current spindle disks, I will likely replace all the SATA hard drives with SSDs, and then this would be the fastest VMware server ever.

So you want to be an IT Superstar?

Today is one of those days that I have to wonder why I chose a career in Information Technology (IT)… You see, I have been doing IT for almost 20 years now, and it is not like the commercials for ITT Tech or any of those other “tech” trade schools. The commercials make it look like an easy 9-to-5 job where everything is so cool and collected.

What I am going to tell you is that it is the exact opposite. You will work all types of hours, sometimes days on end without sleep when something dies. You will have unrealistic expectations assigned to your projects by people who more than likely have never even touched a computer or know how anything works on it, other than to send an email or do an Excel spreadsheet. You will also probably give up one weekend a month for the famous “patching day”, which can be at any time your management decides it should be. And because they love to do it, it is usually at like 1AM on a Sunday morning, which means you lose the entire weekend, because you are trying to sleep and rest up for that one 8-hour shift outside your normal work hours.

Once you get past all that stuff, unless you are eager to learn on your own time, you can probably kiss any further training goodbye. In these days of tight budgets and very high workloads, your best bet at training is some computer-based training on “what’s new in Windows 7”, or something totally unrelated to your actual job.

So now that we have talked about that, what provoked me to say this stuff? Well, one company: Microsoft. Today was one of those days where I needed to patch some Windows 2008 servers for the monthly release of “security” patches, because Microsoft and other vendors are in this mode of getting shit out as fast as possible and not checking the code. So as normal, I approved the 7 or 8 patches for the July cycle in WSUS; so far so good. The part that blows is that the patches applied and the servers said, hey, I need to reboot. This was no big surprise, because how often have you applied a Windows patch and not had to reboot? So off to reboot the servers, and this is where the shit hit the fan. All of a sudden the server went into a boot loop. On the off chance that you caught the blue screen of death in the fraction of a second it was on the screen, you would see that it mentioned something about an error 0x0000007B and that you may have a virus.

Well, I can guarantee you that the machines don’t have viruses on them. Investigating the error further, it appears that 0x7B (INACCESSIBLE_BOOT_DEVICE) is an error that says the OS can’t find the hard drive. Which is ironic, because it had booted off of it to get that far. This then starts the oh-shit moment. Luckily this was only 1 of 2 Active Directory servers. I spent a while trying to get it to boot by following all these different articles, but to no avail; I could not get it to boot up.

The biggest thing that pissed me off was that Microsoft used to have a boot mode where you could step through each driver as it was loading and say whether to load it or not. Unfortunately, I can’t find that anywhere in the F8 menu or in any of my Google-fu searches. So I tried each of the safe mode options, each of which BSoD’d. I tried Debug Mode: BSoD. I tried to have it log the startup to ntbtlog.txt: nope, it doesn’t even write to it. So now I am extremely pissed, to the point where I just said F@#K it and started a reinstall of Windows 2008R2 (the environment this was in allowed me to do that). But before I did, I tested the other AD server, and yup, it bit the dust too.

Luckily, reinstalling W2K8 doesn’t take terribly long. However, it is a pain in the ass getting an entire environment set back up because one patch blew up your servers. So while I was reinstalling these two servers, I decided to test another, less critical server on a different network. Guess what? It died too, with the same error. So now I am thinking about how bad this could have been if I had been patching some heavily used servers. (Once again, this stuff isn’t shown in the “tech school” commercials.)

So how do you go forward from this? Well, there are 2 different types of “tech” people. Those who go home and start testing every single possibility in their own private lab. Then there are those who don’t give a F and wait for other people to fix their problems, as they don’t have the first clue how to fix stuff if a reboot doesn’t fix it.

Can you guess which type of tech person I am? If you guessed the former, you are correct. The first thing I did when I got home from work was create a new W2K8R2 VM, start the OS install, and try to get it up to the patch level I had the machines at work. But because this is Windows, that takes FOREVER with all the reboots and waiting for it to “see” the patches offered to it.

The group in the latter category (those who don’t care and wait for others to fix it) really makes me mad nowadays. I can say that I spend a lot of my own free time teaching myself practically everything I know about IT, as when I went through school none of this stuff was taught. (Shit, I am a UNIX person, but I bought a Microsoft TechNet subscription just to learn as much as I can about Windows Server, etc.) But some “IT” people seem to get pissed when I suggest that they need to learn this stuff on their own at home. It is almost a “how dare you ask me to do something on my free time to better myself when I can sit here and do nothing” attitude. Well, that is the only way you are going to better yourself, and learn from your mistakes without affecting something at your work that may affect your pay…

As I said at the beginning I have been doing IT for close to 20 years now. In that time I have had my hands on the following:

  • Every version of SunOS/Solaris from 4.1.1 up to the current (11)
  • Every version of Microsoft Windows from 3.11 through Server 2012
  • IBM AIX 3.1.2 through 6
  • VM/ESA
  • OpenVMS
  • SGI IRIX
  • Various distributions of Linux (and this is one of my huge pet peeves, but that is for another post)
  • Every version of MacOS from 7 through the current 10.9
  • Practically every version of VMware, from the original VMware Workstation 1.0 on Linux, to vSphere 5.1, to VMware Fusion 6.
  • BeOS
  • OS/2 Warp
  • Novell NetWare

And that is just operating systems, some of which don’t even exist anymore. The hardware side is so numerous that it is hard to even keep track of, but let’s just say I got into computers when an 8MHz 80286 was considered fast and bleeding edge, not to mention a Commodore 64 and an Atari 800.

So what is the moral of this post? Really think about whether you want to get into IT, and whether you have the thirst for learning and teaching yourself. If you don’t have that and don’t want to spend sometimes hours a night learning how stuff works, or if spending an entire weekend at work on a nice summer day doing patches is not your thing, please don’t take that type of job. IT is almost a dedication and a devotion; if you don’t have the time for it, you probably shouldn’t start it.

VMware vSphere and HP XW8600

I last wrote about my VMware home lab back in September, so here is an update. What I have found is that while the HP XW8600 is nice to have for all those SAS/SATA connections and the memory, the IO performance is lacking. I currently have 4 SATA drives plugged into the LSI 1068 RAID controller on the motherboard: 2 1TB drives in a RAID 1, and 2 500GB drives in a RAID 1. But ever since moving to it, I have been having really slow IO. As an example, last night I was working with a simple MySQL database with one table of 2 columns. I went to insert 17,000+ rows and it took almost 20 minutes. (On a different server with just IDE drives, it took less than a minute or two.)

So I have been searching most of the weekend to see what I could find, and there are tidbits of information everywhere on the interwebs. So I thought I would write down what I found and put it in one place for others to find.

It seems that the single biggest problem is “write cache”. Since the LSI 1068 onboard RAID controller has no battery-backed cache, it has to wait for the disk to report back that the data has been successfully written before acknowledging the write. This is compounded by the fact that I have a RAID 1 set up, so both disks have to report that the data is written before the controller can report back to VMware that it is OK. In other words, there is no “cache” on this controller, so the speed is limited to about 20MB/s.
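To put the MySQL example from above in those terms, here is the per-row arithmetic. It assumes each of the 17,000 INSERTs was committed on its own (autocommit), so every row had to wait for both mirrored disks to acknowledge the write; that is an assumption, not something that was measured.

```shell
#!/bin/sh
# ms_per_row ROWS SECONDS
# Average time per INSERT in milliseconds.
ms_per_row() {
  awk -v rows="$1" -v sec="$2" 'BEGIN { printf "%.1f\n", sec * 1000 / rows }'
}
```

Roughly 20 minutes for 17,000 rows (ms_per_row 17000 1200) comes to about 70ms per row, versus about 3.5ms per row (ms_per_row 17000 60) if the same inserts finish in a minute on the IDE box.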

So how can I fix this? Well, since I want redundancy on the disks, splitting them into single disks would make things faster but would not protect my data. That could work for a couple of my VMs that are “disposable” test VMs, but the ones I want to keep need to stay on a RAID. So to fix it, I need to find a PCIe controller that has a cache and a battery on it.

So my hunt begins; I will update once I find one that works well.

Moving VMs between hosts

About a year ago I purchased a 1U IBM X3550 server to run VMware vSphere 5 on. While it was cool to have a server with dual quad-core processors and 8GB of RAM in it, the noise it put off was too much for my family room. (Just think of half a dozen 1-inch fans running at 15,000RPM almost constantly.) Recently I have been spending more time in the family room, and the noise has gotten to a level where it is almost impossible to do anything in the room without hearing it (like watching TV or a movie, playing a game, etc.). So I started looking at my favorite used hardware site, geeks.com, for a new “server”. Well, it finally arrived today: an HP XW8600 workstation. It is another dual quad-core, but it has 16GB of RAM, 12 SATA ports, and a larger case, and best of all, it is almost completely quiet.

So with it installed, I needed to start moving the VMs from the IBM server to the HP server. In an enterprise environment this usually isn’t a problem, as you usually have shared storage (a SAN) that each of the hosts connects to. Well, in my little home lab I don’t have shared storage. I did try using COMSTAR in Solaris 10 to export a “disk” as an iSCSI target. While this would work, it was going to take forever to transfer 1TB of VMs from one server to a VM running on my Mac and back to the new server.

So a-googling I went, and I found a much easier way to copy the VMs over: ovftool, which runs on Windows, Linux and Mac. It lets you export and import OVF files to and from a VMware host. The side benefit is that you can export from one host and import to another, all in one command.

So I downloaded the Mac version and started copying. The basic syntax is like this:


./ovftool -ds=TargetDataStoreName vi://root@sourcevSphereHost/SourceVM vi://root@destvSphereHost

So if one of my VMs is called mtdew, it is thin provisioned on the source host, I want it the same on the destination host, and my datastore is called “vmwareraid”, I would run this:

./ovftool -ds=vmwareraid -dm=thin vi://root@ibmx3550/mtdew vi://root@hpxw8600

where ibmx3550 is the source server and hpxw8600 is the destination server. If you don’t specify “-dm=thin”, then when the VM is copied over its disk will become “thick”, i.e. it uses the entire allocated space up front. (For example, a 50GB disk that only has 10GB in use would still use 50GB if -dm=thin is not used.)
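If you have a whole list of VMs to move, a small loop around the same command saves some typing. This is a minimal sketch that reuses only the flags shown above (-ds and -dm=thin); move_vms is a hypothetical helper name, and each ovftool run will still prompt for the root passwords. It keeps going if one VM fails (for example with the FileNotFound error mentioned below) and reports the failures at the end. Set DRY_RUN=1 to print the commands instead of running them.

```shell
#!/bin/sh
# Sketch: copy several powered-off VMs with ovftool, one after another.
# Uses only the flags from the single-VM example; move_vms is hypothetical.
# Set DRY_RUN=1 to print the ovftool commands instead of running them.

run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# move_vms SOURCE_HOST DEST_HOST DATASTORE VM...
move_vms() {
  src=$1 dst=$2 ds=$3
  shift 3
  failed=""
  for vm in "$@"; do
    # Thin-provisioned on the destination, same as the single-VM example.
    if ! run ./ovftool -ds="$ds" -dm=thin "vi://root@$src/$vm" "vi://root@$dst"; then
      failed="$failed $vm"
    fi
  done
  if [ -n "$failed" ]; then
    echo "failed:$failed" >&2
    return 1
  fi
}
```

For example, move_vms ibmx3550 hpxw8600 vmwareraid mtdew produces exactly the single-VM command shown above; add more VM names to the end of the line to copy them in sequence. The VMs still need to be powered off first.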

There are some gotchas that you will have to look out for:

  1. Network configs: I had one VM that had multiple internal networks defined. Those were not defined on the new server, so there is a “mapping” that you have to do. I decided I didn’t need them on the new server, so I just deleted them before I copied it over.
  2. VMs must be in a powered-off state. I tried one in a “paused” state and it did not want to run right.
  3. It takes time. Depending on the speed of the network, disks, etc., it can take a lot of time, and the VMs have to be down while it happens. So this is definitely not a way to move “production” VMs unless you have a maintenance window.
  4. It shows % complete as it goes, which is cool, but the way it does it is weird. It will show the % at like 11 or 12, and then I turn my head and all of a sudden it says it is completed.
  5. I did have an issue with one VM that I am not sure about: when I try to copy it, I get an error: “Error: vim.fault.FileNotFound”… It may be due to me renaming something on the VM at some point in the past.

Hope this helps some other “home lab user”…

What happens in Vegas, should have stayed in Vegas

Last week, I went to VMworld 2011 in Las Vegas. The conference was great: 20,000+ people all there and focused on one thing, VMware and every product they offer. This was my first time at VMworld, and hopefully I will get to go again sometime in the future. The main reason I went was the recently released vSphere 5, to see what it offered and what had changed. Needless to say, many cool new features were added. I am only going to mention a few here, but the full list is available in this PDF.

The first cool feature is Auto Deploy. Simply said (I wish they had chosen a different name), it is PXE boot of the vSphere image from a TFTP server, so no local disk is required to “run” vSphere. For example, if you have a “shit ton” of blades and don’t want to have to go update and install all of them, just get their MAC addresses, set up the hosts in DHCP with a couple of DHCP options telling them where to boot from, and have the blades boot from the network. Each will download the image from the TFTP server and run automagically. Once up and running, all config is stored in vCenter 5 (a requirement!). So need to upgrade your hosts? Just reboot them after updating the image. A couple of notes on this: make sure you have logging set up to go to your syslog server, and that you set up the Dump Collector in case of a PSOD.
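For the DHCP side, a host entry in ISC dhcpd might look like the stanza this helper prints. It is a sketch under a few assumptions: ISC dhcpd is the DHCP server, the host name, MAC and IP addresses are placeholders, and undionly.kpxe.vmw-hardwired is the BIOS boot file name as given in VMware’s vSphere 5 Auto Deploy documentation, served from your TFTP server. next-server points the blade at the TFTP server and filename names the file to fetch.

```shell
#!/bin/sh
# Sketch: print an ISC dhcpd host entry that PXE boots an Auto Deploy host.
# Host name, MAC and IP values are placeholders; the boot file name is
# assumed from VMware's vSphere 5 Auto Deploy docs (BIOS-booting hosts).

# dhcp_host_stanza NAME MAC FIXED_IP TFTP_SERVER
dhcp_host_stanza() {
  name=$1 mac=$2 ip=$3 tftp=$4
  printf 'host %s {\n' "$name"
  printf '  hardware ethernet %s;\n' "$mac"
  printf '  fixed-address %s;\n' "$ip"
  # TFTP server holding the Auto Deploy boot files.
  printf '  next-server %s;\n' "$tftp"
  printf '  filename "undionly.kpxe.vmw-hardwired";\n'
  printf '}\n'
}
```

For example, dhcp_host_stanza blade01 00:50:56:aa:bb:cc 192.168.1.50 192.168.1.10, appended to your dhcpd.conf (its location varies by distribution), with one call per blade.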

Another cool feature: vSphere 5 supports Apple Xserve servers running OS X Server 10.6 (Snow Leopard) as a guest operating system. This is possible because vSphere now supports UEFI “BIOS”. Now, “supposedly” this does not technically require Xserves (since Apple no longer sells them), but it “requires” them because of Apple’s EULA for Mac OS X.

There are many other features that have been upgraded or are new. Too bad the conference wasn’t a little longer, as the number of sessions I wanted to attend was greater than the time I had available for them. (For example, there was only one instance of a session, and 2 sessions I wanted to see were at the same time.)

The Hands-on Lab area was “freaking huge”. There were over 800 workstations set up where you could do 1 of 16 labs (you could do more, you just had to stand in line; I was only able to do 1 in the week I was there). Ironically, each “lab” station was a Wyse “chubby client” with dual monitors, so you could RDP into some Windows XP machines and servers to do the work. The HOL area sort of reminded me of the CTF area at DefCon: a huge room with nearly no light whatsoever and hundreds of thousands of screens.

The most interesting part of the conference is that they have grown so big that next year they have to go to San Francisco to host the event, as there is no place in Vegas big enough to house them. This year it was at the Venetian, with some spillover to the Wynn. They also had the Sands Expo hall, which is connected to the Venetian. The “dining” room alone was 1.5 million sq ft; you could barely see from one end to the other.

I will have to say that, out of the many conferences by different vendors I have been to, VMware’s has been the best so far. Some of the things that made it stand out from the rest:

  1. Food: while not “the greatest ever”, it was far better than I have had at other places. They gave us breakfast and lunch every day. In addition, the break periods between sessions had different items every day. One day they had freshly made hot pretzel sticks with cheese and different sauces.
  2. Hang-out area: at most conferences, if there is “downtime”, you usually end up either walking around or going back to the hotel. VMware set up a “hang space” with a basketball court, a badminton court, huge chess sets, and fake grass to sit on in front of a big-screen (20+ feet) TV. There was also a Twitter vMeetup spot, where you could meet people you had met on Twitter.
  3. Scheduled sessions: while I was skeptical at first about “pre-registering” for the sessions you want to attend, I think in the end it was a good idea, as it “guaranteed” your spot in the session as long as you showed up 3 minutes before it started. (There were gaps between the end of one session and the start of the next, so you really had no reason not to be there.)
  4. Group discussions: at some conferences, I have seen “group discussions” be these “huge” groups where it ends up being more of a Q&A session. VMware’s group discussions had maybe 30 people max in a room; each person had a clicker, everyone voted on how the session went, and it was free-form for questions. One of the best was the Oracle on VMware vSphere one. I learned a lot from that session.
  5. P.A.R.T.Y.: by far the best conference / vendor party I have ever been to. First was the food: you name it, they probably had it. I didn’t realize this till I had already eaten a couple of slices of pizza. Then I saw a station where they were making fresh-cut cheesesteak sandwiches, and another doing freshly made crab cakes. Like I said, name it, and it was probably there. In addition, there was a huge open bar (not that I drink, but it was there). So now that we are past the food, they had at least 4 different acts during the night. First, two people doing fire tricks; then the opener was Recycled Percussion. I didn’t realize who they were till I got back to the hotel room that night: they were on America’s Got Talent and previously had a nightly show in Vegas. The headliners were The Killers. They played for an hour and did all the “popular” songs, along with some that I hadn’t heard before.
    This part of the party ended around 9PM, which was the start time of the “after party” at the Venetian pool. I did not go to it, but it sounded like people had a bunch of fun there too.

So if you are still reading by now, you are probably trying to figure out the second part of the title, “… should have stayed in Vegas”. Well, it seems that sometime on Sunday or early Monday morning I either sprained my left foot or got a stress fracture in it. Needless to say, the 30+ miles of walking I did did not help it any (my hotel was 2 miles away from the conference hotel, and it is a damn long walk from Planet Hollywood to the Venetian, even if you take the monorail, when your foot is hurting like a Mofo). By the time I got home it was still hurting, and I noticed that the top of my foot had started to swell and bruise. I just iced it on Saturday and Sunday, but as of today it was still hurting and didn’t seem to have changed much, so I ended up going to the doctor to have it X-rayed. They said it didn’t show any fractures, but thought it was just a really bad sprain or a damaged ligament. So it is more ice, and an ankle air cast for a while. So that is what I wish had stayed in Vegas.