Why Thin Provisioning is bad

In this day and age everyone is trying to squeeze the last little drop out of every technological advance they can. One of the technologies that is “big” right now is Thin Provisioning. In short, thin provisioning is where you tell a computer that it has X GB of disk (usually from a SAN or in VMware) but in reality you have less than X GB of disk backing it. This is big in SAN and VMware because enterprise disk is “expensive”. But is it really worth the cost? No!

See, the main reason people (SAN or VMware admins) use Thin Provisioning is to “save” disk space. Say you have a server that performs one function and does not really use a lot of disk space, say a DNS server (either virtualized or physical, booting from a SAN). Now most admins like to keep all their servers on a standard config. So for the sake of this post, let’s say the boot disk for this server is 50GB. Once the OS and app are installed on it, it may only be using 4GB of that 50GB disk.

Before thin provisioning, that 50GB is 50GB used as far as a SAN admin is concerned. So in comes Thin Provisioning. Now the SAN admin says “hey mister computer, here is your 50GB disk ;-)” but in reality the SAN only allocates as much space as the server is actually using. So now on the SAN, instead of a full 50GB “used”, only 4GB is used. Sounds awesome in theory, but what happens when you add other servers to that same SAN pool (say the pool is 100GB in size)? The server admin gets another “50GB” disk from the SAN, doesn’t realize thin provisioning is in use, and goes on to install that server. Now we have 8GB in use out of the 100GB pool, but as far as the 2 servers are concerned all 100GB has been allocated.

The next part is when the whole process starts to drown. The server admin asks for another disk, this time 200GB for, say, a database or code repository server. The SAN administrator says “ok here is your 200GB disk ;-)” but puts the disk in the same 100GB pool that the other two servers are in, because “he knows” you won’t use all “200GB”. We have now overcommitted the disk, but the server admin does not know this has happened. Once the third server’s OS has been installed (another 4GB) everything seems to be fine, and technically it is, because we are only using 12GB out of the 100GB pool. But the servers believe they have 300GB of disk between them, because they are unaware that the pool behind them only holds 100GB.
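
If you want to see the numbers side by side, here is a quick Python sketch of the bookkeeping for the three servers in this example. The server names are made up for illustration, and this is just arithmetic, not anything a real SAN exposes.

```python
POOL_GB = 100  # physical capacity of the pool in this example

# Hypothetical names. Each entry is (provisioned size, space actually written) in GB.
disks = {
    "dns": (50, 4),
    "server2": (50, 4),
    "server3": (200, 4),
}

provisioned_gb = sum(size for size, _ in disks.values())
used_gb = sum(written for _, written in disks.values())
overcommit_ratio = provisioned_gb / POOL_GB

print(f"promised to servers: {provisioned_gb}GB")   # 300GB
print(f"actually written:    {used_gb}GB")          # 12GB
print(f"overcommit ratio:    {overcommit_ratio:.1f}x")  # 3.0x
```

The pool looks almost empty at 12GB used, yet it has promised three times what it can actually hold.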

Where the fun starts is when you start loading data on to those disks. Let’s say the second server was going to be a small database server, so we load Oracle and create some tablespaces. We end up using about 40GB of the 50GB allotted to it. (So now we are up to 48GB of disk used in the 100GB pool.) Still technically ok, but with only 52GB free we really need to start worrying about the disks and the servers. The fun begins when we start loading data on to the server with the 200GB disk. Once we have written another 52GB to it (on top of the 4GB for the OS), the pool is full and we have some problems. Basically all the servers will start reporting write errors or other weird issues. The server admin can’t figure out what the problem is, because when he looks at the servers he sees plenty of “free” space on them. Where stuff gets really weird is when processes start dying and won’t start when you try to restart them (maybe they write to a log file, etc). So the first thing the server admin will try to do is reboot the server. This is where all hell breaks loose…
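
To make the failure concrete, here is a toy Python model of the scenario above. Everything in it (the `ThinPool` class, `PoolFullError`, the disk names) is hypothetical and invented for this post. It only tracks gigabytes and is not how any real SAN or VMware datastore is implemented, but it shows the mismatch between what the server sees and what the pool can deliver.

```python
class PoolFullError(Exception):
    """Raised when the physical pool has no space left to back a write."""


class ThinPool:
    def __init__(self, capacity_gb):
        self.capacity_gb = capacity_gb
        self.disks = {}  # name -> [provisioned_gb, written_gb]

    def provision(self, name, size_gb):
        # Deliberately no capacity check: the pool can promise more than it has.
        self.disks[name] = [size_gb, 0]

    def used_gb(self):
        return sum(written for _, written in self.disks.values())

    def free_gb(self):
        return self.capacity_gb - self.used_gb()

    def guest_free_gb(self, name):
        # What the server itself believes it still has available.
        size, written = self.disks[name]
        return size - written

    def write(self, name, gb):
        size, written = self.disks[name]
        if written + gb > size:
            raise ValueError(f"{name}: disk is full")
        if gb > self.free_gb():
            raise PoolFullError(f"{name}: pool has no space left")
        self.disks[name][1] = written + gb


pool = ThinPool(100)
pool.provision("dns", 50)
pool.provision("oracle_db", 50)
pool.provision("big_disk", 200)
for name in ("dns", "oracle_db", "big_disk"):
    pool.write(name, 4)          # OS installs, 12GB used in total

pool.write("oracle_db", 36)      # Oracle data, pool now at 48GB
pool.write("big_disk", 52)       # pool now at 100GB, completely full

try:
    pool.write("dns", 1)         # the DNS server tries to write a log file
    dns_write_failed = False
except PoolFullError:
    dns_write_failed = True

print(pool.free_gb())              # 0
print(pool.guest_free_gb("dns"))   # 46
print(dns_write_failed)            # True
```

The DNS server never grew past its 4GB and still sees 46GB “free”, yet its write fails because a different server filled the pool.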

See, when you reboot a server it can’t flush its pending writes to disk because there is “no” space left to write to. So the file systems end up corrupted. When the server comes back up, it will try to write more to the disk, thinking it has plenty of free space, but again it can’t, so stuff starts hanging. So of course a reboot is done again, and again, etc…

So now you start seeing write errors showing up everywhere on the other servers, and from the looks of it, it may be a SAN issue, like the disk has disappeared. So you call the SAN admin, only to find out that you have been thin provisioned.

This, my friends, is why thin provisioning is bad and should NEVER be used. Yes, it may save you some money on disk, but what you save there will be wasted on the downtime spent rebuilding servers and restoring data.