Check out the previous article on how to create a VHD that Azure will accept.
One of the features of Azure, Microsoft’s cloud offering, is the ability to upload your very own disk image in VHD format. This lets you prepare an OS to your taste and use it just like you would on a bare-metal server.
The VHD disk may come in two distinct flavors:
- A fixed disk: a simple, linear disk image. This is what you get when you rip an existing disk. The main drawback shows up when you need to create a new disk from scratch: the VHD will be just as large as the disk you create, whether the space is used or empty. For example, if you need a 100 GB disk with only 1 GB of data on it, you still end up with a 100 GB VHD. It consumes space and, more importantly, takes a lot of time to upload to the cloud.
- A dynamic disk: an extract of the fixed disk with the unused blocks skipped. Special metadata inside the file (a header and a block allocation table) records where each stored block belongs, so the full disk can be recreated from this extract. Some hypervisors do not even need to expand it: they can keep adding data to the dynamic disk directly, updating the block allocation table as they go. For example, your 100 GB disk with only 1 GB of data will now ‘weigh’ roughly 1 GB.
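To tell which flavor a given file is, you can look at its 512-byte footer. The Python sketch below assumes the field layout from Microsoft's published VHD format specification (big-endian integers; the `conectix` cookie at offset 0, the data offset at 16, the virtual disk size at 48, the disk type at 60, where 2 means fixed and 3 means dynamic). The function names are illustrative, not part of any Azure tool.

```python
import os
import struct

FOOTER_SIZE = 512
DISK_TYPES = {2: "fixed", 3: "dynamic", 4: "differencing"}


def parse_footer(footer: bytes) -> dict:
    """Decode the footer fields needed to tell a fixed VHD from a dynamic one."""
    if len(footer) != FOOTER_SIZE:
        raise ValueError("a VHD footer is exactly 512 bytes")
    if footer[:8] != b"conectix":
        raise ValueError("missing 'conectix' cookie: not a VHD footer")
    # All multi-byte fields in the VHD format are big-endian.
    (data_offset,) = struct.unpack_from(">Q", footer, 16)
    (current_size,) = struct.unpack_from(">Q", footer, 48)
    (disk_type,) = struct.unpack_from(">I", footer, 60)
    return {
        "data_offset": data_offset,    # where the dynamic header lives; all ones for fixed disks
        "current_size": current_size,  # size of the virtual disk in bytes
        "disk_type": DISK_TYPES.get(disk_type, f"unknown({disk_type})"),
    }


def read_footer(path: str) -> dict:
    """Read the footer stored in the last 512 bytes of a VHD file."""
    with open(path, "rb") as f:
        f.seek(-FOOTER_SIZE, os.SEEK_END)
        return parse_footer(f.read(FOOTER_SIZE))
```

For a fixed disk the file is exactly `current_size + 512` bytes, which is why the 100 GB example costs a full 100 GB upload.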
The Problem
So, the dynamic VHD is great for both storing the image and uploading it to the cloud? Well, almost – with Microsoft nothing is what it initially looks like. The official Microsoft Azure documentation says that… only fixed images are supported. So, if you want a 100 GB disk with 1 GB of data on it, you still have to waste time and bandwidth uploading all 100 GB. That’s rather silly for 2015, isn’t it?
The Remedy
Yet it seems the Microsoft guys are not that stupid after all. If you go against their advice and try to upload a dynamic image using their CLI, you’ll be surprised to find that it works! The image is expanded on the fly: you upload only your actual data (1 GB in our example), but on the server side you end up with a 100 GB disk.
For most users this will be sufficient; however, if you want a multi-account provisioning system for Azure, the CLI is a no-go: you need either the REST API or some wrapper around it (like the Node.js SDK). Looking into both of them reveals an unpleasant surprise: there is no method to upload a dynamic image, and if you use the standard method to upload a file, your dynamic VHD won’t get expanded and, consequently, won’t work. So, the conclusion is simple: Microsoft have intentionally crippled their REST API and SDK by not providing a direct means to upload a dynamic VHD.
Still, since the CLI can do it, there must be a way. The CLI is written in Node.js, so reverse-engineering it is not that hard. With some additional info from the official docs, it’s not very difficult to reach the following conclusions:
- The weird page blob (what a name, eh!) with a page size of a mere 512 bytes obviously exists exactly for our case (a standard disk sector is, you know, 512 bytes).
- A disk image must be uploaded as a page blob. Using the more obvious block blob (which allows for segments of up to 4 MB) results in an error when you try to use the uploaded file as a VM disk.
- A page blob of almost any size (as long as it is a multiple of 512 bytes) can be created on the server side with a simple command.
- You can write to any page of the page blob, at any time and in any order.
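These conclusions map onto two Azure Storage REST operations: Put Blob with `x-ms-blob-type: PageBlob` creates the empty blob, and Put Page with `x-ms-page-write: update` fills a byte range. The sketch below only builds the request headers and enforces the alignment rules; the helper names are hypothetical, and the authentication headers (`Authorization`, `x-ms-date`, `x-ms-version`) and the HTTP calls themselves are deliberately left out.

```python
PAGE_SIZE = 512
MAX_PUT_PAGE = 4 * 1024 * 1024  # largest range a single Put Page request may write


def create_page_blob_headers(blob_size: int) -> dict:
    """Headers for a Put Blob request that creates an empty page blob of blob_size bytes."""
    if blob_size < 0 or blob_size % PAGE_SIZE:
        raise ValueError("page blob size must be a multiple of 512 bytes")
    return {
        "x-ms-blob-type": "PageBlob",
        "x-ms-blob-content-length": str(blob_size),
    }


def put_page_headers(offset: int, length: int) -> dict:
    """Headers for a Put Page request writing `length` bytes starting at byte `offset`."""
    if offset % PAGE_SIZE or length % PAGE_SIZE or length <= 0:
        raise ValueError("page writes must start and end on 512-byte boundaries")
    if length > MAX_PUT_PAGE:
        raise ValueError("a single Put Page request may write at most 4 MiB")
    return {
        "x-ms-page-write": "update",
        # The range is inclusive: one 512-byte page at offset 0 is bytes=0-511.
        "x-ms-range": f"bytes={offset}-{offset + length - 1}",
    }
```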
From here we can derive a proper strategy for expanding the dynamic image while uploading it:
- Create on the server a blob of the desired size (you can read the value from the copy of the footer at the very beginning of the dynamic VHD). Add one extra 512-byte sector at the end for the footer, since the disk size does not include it.
- Create a 512-byte VHD footer and upload it as the last page of the page blob. The footer is surprisingly easy to create: it stores only a handful of simple fields (among others, the disk size, the disk type, fixed or dynamic, and a checksum).
- Read the dynamic VHD and determine where in the expanded disk each of its sectors should go. This is also easy, since the block allocation table of the dynamic VHD is a handy map. Then upload each sector to its page in the page blob.
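The three steps can be sketched in Python as follows. This assumes the layout from the VHD format specification: a dynamic VHD starts with a copy of its footer, whose data offset points to a 1024-byte dynamic header (cookie `cxsparse`, table offset at 16, number of table entries at 28, block size at 32); each block allocation table entry is a 32-bit sector offset, with `0xFFFFFFFF` meaning "not allocated", and every allocated block begins with a sector bitmap. For simplicity the sketch ignores those per-block bitmaps and copies whole allocated blocks, assuming unused sectors inside them are zero; it does not handle differencing disks. All function names are hypothetical.

```python
import struct
from typing import BinaryIO, Iterator, List, Tuple

SECTOR = 512
UNALLOCATED = 0xFFFFFFFF  # BAT entry of a block that holds no data


def vhd_checksum(footer: bytes) -> int:
    """One's complement of the byte sum, skipping the checksum field at offset 64."""
    return ~(sum(footer[:64]) + sum(footer[68:])) & 0xFFFFFFFF


def make_fixed_footer(dynamic_footer: bytes) -> bytes:
    """Step 2: derive a fixed-disk footer from the dynamic disk's footer."""
    footer = bytearray(dynamic_footer)
    struct.pack_into(">Q", footer, 16, 0xFFFFFFFFFFFFFFFF)  # fixed disks have no data offset
    struct.pack_into(">I", footer, 60, 2)  # disk type 2 = fixed
    struct.pack_into(">I", footer, 64, vhd_checksum(footer))
    return bytes(footer)


def read_layout(f: BinaryIO) -> Tuple[bytes, int, int, List[int]]:
    """Return (footer, virtual disk size, block size, BAT) of a dynamic VHD."""
    f.seek(0)
    footer = f.read(SECTOR)  # a dynamic VHD starts with a copy of its footer
    (header_offset,) = struct.unpack_from(">Q", footer, 16)
    (disk_size,) = struct.unpack_from(">Q", footer, 48)
    (disk_type,) = struct.unpack_from(">I", footer, 60)
    if footer[:8] != b"conectix" or disk_type != 3:
        raise ValueError("not a dynamic VHD")
    f.seek(header_offset)
    header = f.read(1024)
    if header[:8] != b"cxsparse":
        raise ValueError("dynamic disk header not found")
    (table_offset,) = struct.unpack_from(">Q", header, 16)
    entries, block_size = struct.unpack_from(">II", header, 28)
    f.seek(table_offset)
    bat = list(struct.unpack(f">{entries}I", f.read(4 * entries)))
    return footer, disk_size, block_size, bat


def iter_sectors(f: BinaryIO, disk_size: int, block_size: int,
                 bat: List[int]) -> Iterator[Tuple[int, bytes]]:
    """Step 3: yield (offset in the expanded disk, 512-byte sector) for allocated blocks."""
    sectors_per_block = block_size // SECTOR
    # Every block starts with a sector bitmap, padded to a whole number of sectors.
    bitmap_len = ((sectors_per_block + 7) // 8 + SECTOR - 1) // SECTOR * SECTOR
    for index, entry in enumerate(bat):
        if entry == UNALLOCATED:
            continue  # never written: these pages simply stay zero in the blob
        f.seek(entry * SECTOR + bitmap_len)
        data = f.read(block_size)
        base = index * block_size
        for pos in range(0, len(data), SECTOR):
            if base + pos >= disk_size:
                break  # the last block may extend past the end of the disk
            yield base + pos, data[pos:pos + SECTOR]
```

Put together: create a page blob of `disk_size + 512` bytes (step 1), write `make_fixed_footer(footer)` at offset `disk_size` (step 2), and send every pair produced by `iter_sectors` to its offset (step 3).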
What will happen to the pages you haven’t written to? Nothing special: they will remain zeroed. This is perfectly fine for a VHD.
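Because unwritten pages read as zeros, there is no need to send sectors that contain nothing but zeros. The generator below drops them and, as an optional variation not described above, merges adjacent sectors into runs of at most 4 MiB (the Put Page limit) to cut down the number of requests. It assumes its input is a stream of `(offset, bytes)` pairs sorted by offset; the function name is hypothetical.

```python
from typing import Iterable, Iterator, Optional, Tuple

SECTOR = 512
MAX_RUN = 4 * 1024 * 1024  # largest range a single Put Page request accepts


def nonzero_runs(sectors: Iterable[Tuple[int, bytes]]) -> Iterator[Tuple[int, bytes]]:
    """Skip all-zero sectors and merge adjacent ones into runs of at most 4 MiB."""
    zero = bytes(SECTOR)
    start: Optional[int] = None
    buf = bytearray()
    for offset, data in sectors:
        if data == zero:
            continue  # unwritten pages of a page blob already read as zeros
        contiguous = start is not None and offset == start + len(buf)
        if contiguous and len(buf) + len(data) <= MAX_RUN:
            buf += data  # extend the current run
            continue
        if start is not None:
            yield start, bytes(buf)  # flush the previous run
        start, buf = offset, bytearray(data)
    if start is not None:
        yield start, bytes(buf)
```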
Functions to create a page blob on the server and to write to a specific page inside it are readily available in both the REST API and the SDK. Follow the three steps above and voilà! You have expanded a dynamic VHD on the fly, creating a fixed disk on the server, and you can now use it as usual.
Bonus Track
Since the actual data is uploaded in small chunks of 512 bytes at a time, use multiple concurrent connections to speed up the upload. Microsoft’s CLI uses up to 128 at a time.
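A bounded pool of worker threads is one way to keep many requests in flight without queuing the whole disk in memory. In this Python sketch, `put_page` is a hypothetical callable you would supply yourself (for instance, one that issues a Put Page request for the given offset); the default of 128 workers simply mirrors the number quoted above for Microsoft's CLI.

```python
from concurrent.futures import FIRST_COMPLETED, Future, ThreadPoolExecutor, wait
from typing import Callable, Iterable, Set, Tuple


def upload_pages(pages: Iterable[Tuple[int, bytes]],
                 put_page: Callable[[int, bytes], None],
                 max_workers: int = 128) -> int:
    """Upload (offset, data) pairs concurrently; return how many requests completed."""
    in_flight: Set[Future] = set()
    completed = 0
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        for offset, data in pages:
            # Never queue more than max_workers requests, so a large disk is
            # streamed instead of being loaded into memory all at once.
            if len(in_flight) >= max_workers:
                done, in_flight = wait(in_flight, return_when=FIRST_COMPLETED)
                for future in done:
                    future.result()  # re-raises the exception of a failed request
                    completed += 1
            in_flight.add(pool.submit(put_page, offset, data))
        for future in in_flight:
            future.result()
            completed += 1
    return completed
```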
The only unanswered question that remains: why didn’t Microsoft bother to add a simple wrapper function to its API and SDK that does what I have just described? But, on the other hand, we cherish the fun of reverse engineering.