So one phrase I keep seeing come up in social media posts and articles is “Vendor Lock In”. You can see some of the ones I found via a quick search of the interwebs below, each with its own theme and position.
To me a lot of this may as well be a map from the 1500s saying “here be dragons” (or, to be accurate, “HC SVNT DRACONES”, as it was in Latin). There is something to be scared of when making business decisions, but it isn’t dragons…I mean vendor lock in; it is not properly reviewing our business requirements before making a decision.
So how real is Vendor Lock In?
Very real. I can guarantee that any purchase you make will have some level of lock in to a technology, a process or something else that will be difficult to change.
From my perspective I’m looking at the technology side, but of course we are as free as we want to believe we are.
“A puppet is free as long as he loves his strings.” – Sam Harris, Free Will
Is there a way out?!
Docker and Cloud Foundry are two organisations talking about the ability to remove vendor lock in. This is great for giving you portability between public and private clouds; however, it brings up that little issue…you are locked into their solution.
Even if there is little or no cost, you have to put a considerable amount of time and effort into this solution to get it up and running. Then, if a feature or an entire product is impacted somehow, you have to be able to migrate away from it.
The only solution….choose wisely!
In short, we have to make our decisions wisely and weigh up all the options, without emotion, based on what is best for our business. You could build on Azure PaaS, for example, and have a close working relationship with Microsoft, giving you access to updates, new features, automatic scale, hyper-scale etc. (shameless plug). Or you could look at using Docker to switch between public cloud providers like Azure, Google and AWS, depending on where the wind is blowing price-wise and on your geographic requirements.
Whatever you do, make sure you properly consider what you are trying to achieve and relate this to your design decision. And, this is a big one, when you do make your decision, take responsibility for it. Also make sure you have prepared at least a high level exit plan: if vendor X decided to deprecate a product you are using, have an idea of what it would take to migrate off it.
And the ultimate vendor lock in?
Grant Orchard (https://twitter.com/grantorchard) from VMware made a comment about the ultimate vendor lock in, and nailed it! It is…..drumroll……kids! Hopefully that is one kind of vendor lock in that you do like (I know I do).
So we’ve talked about the underlying hardware solution that StorSimple provides in my first deep dive here. Then we moved on to storage efficiencies, the life of a block and cloud integration here. In my third, and for now final, deep dive post I’m going to touch on how StorSimple provides the mechanism to efficiently back up, restore and even provide a DR solution without the need for secondary or tertiary sites and data centres.
Fingerprints, chunks and SnapShots…we know where your blocks live
StorSimple fingerprints, chunks and tracks all blocks that are written to the appliance. This allows it to take very efficient local snapshots that take up no space and have no performance impact. Unlike a traditional backup, it doesn’t have to go out and read through all the metadata to work out which blocks or files have changed. Reading through file information is one of the worst enemies of backing up unstructured data. If you have millions of files (which is common), it can take hours just to read through the data to work out which files have changed before you back up a single file. StorSimple, by contrast, can efficiently give you local points in time that are near instant to back up and restore from.
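The difference between tracking blocks and scanning files can be sketched in a few lines. This is a simplified model, not StorSimple’s actual implementation. Under this model, a volume is a hypothetical map from logical block address to block fingerprint. A snapshot copies only that map of pointers. A set of “dirty” addresses records what changed, so no file scan is needed to find the changes.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Volume:
    """Hypothetical model: logical block address -> fingerprint of the stored block."""

    block_map: dict[int, str] = field(default_factory=dict)
    dirty: set[int] = field(default_factory=set)  # addresses written since the last snapshot

    def write(self, lba: int, fingerprint: str) -> None:
        self.block_map[lba] = fingerprint
        self.dirty.add(lba)  # tracked at write time, so nothing has to be scanned later

    def snapshot(self) -> tuple[dict[int, str], set[int]]:
        # Copy only the pointers, never the block data, and hand back the
        # set of addresses that changed since the previous snapshot.
        snap = dict(self.block_map)
        changed = set(self.dirty)
        self.dirty.clear()
        return snap, changed
```

After writing blocks 0 and 1, the first snapshot reports both as changed. After rewriting only block 1, the next snapshot reports just `{1}`, and the earlier snapshot still points at the old fingerprint.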
I disagree! A snapshot doesn’t count!
However, it is my opinion that a snapshot is not a backup…so why the heck is my blog title about backup, restore and DR?! It is because I also believe that a snapshot is a backup if it is replicated. StorSimple provides another option for snapshots called “Cloud Snapshots”. This takes a copy of all the data on a single volume, or multiple volumes, up to Windows Azure, including all the metadata. Obviously the first cloud snapshot is the whole data set. We make this easier because all the data is deduplicated, compressed and protected with AES 256-bit encryption. After this first baseline, only unique block changes are taken up to Windows Azure, and these are optimised with dedupe and compression and then encrypted. These cloud snapshots are policy based and can be kept for hours, days, weeks, months or years as required.
Data is offsite, multiple points in time are available and backup windows are generally reduced. Once data gets into Azure we further protect your information. Azure storage, by default, is configured with geo-replication turned on. This means that three copies of every block of data are kept in the primary data centre and three more copies are kept in the partner data centre. Even if you turn geo-replication off, you still have three copies of your data in the primary data centre. So there are at least three, and generally six, copies of all data in Azure.
So we have simple, efficient and policy driven snapshots and all snapshot data replicated six times, across different geographies…I think I can safely call this a backup and probably with more resiliency than most legacy tape or local disk based backup systems customers are using now.
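The “only unique block changes after the baseline” behaviour of cloud snapshots can be sketched as a set difference on fingerprints. This is a minimal sketch that assumes SHA-256 as the fingerprint function, which is my choice for illustration since the post does not say which hash is used. It also leaves out the compression and encryption steps.

```python
from __future__ import annotations

import hashlib


def fingerprint(block: bytes) -> str:
    # SHA-256 is an assumed stand-in for the appliance's fingerprinting.
    return hashlib.sha256(block).hexdigest()


def blocks_to_upload(volume_blocks: list[bytes], in_cloud: set[str]) -> dict[str, bytes]:
    """Return the unique blocks a cloud snapshot still needs to send, keyed by fingerprint."""
    pending: dict[str, bytes] = {}
    for block in volume_blocks:
        fp = fingerprint(block)
        if fp not in in_cloud and fp not in pending:  # skip blocks already sent or queued
            pending[fp] = block
    return pending
```

The first (baseline) snapshot of `[b"A", b"B", b"A"]` uploads two unique blocks. Once those are in the cloud, a later snapshot of `[b"A", b"B", b"C"]` uploads only `b"C"`.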
And now how do I restore my data?
So the scenario is that someone needs some files back from months ago, or even a year ago. It is maybe a few GB at most, but we still want to get it back quickly and easily, and the user also wants to search the directory structure for some relevant information.
StorSimple offers the ability to connect to any of the cloud snapshots, create a clone and mount it to a server. This clone will not pull down any data apart from metadata that is not already on the StorSimple solution, so it is extremely efficient. All data will nonetheless appear local, and you can browse the directory structure and copy back only the files that are required. All the blocks that make up these files are deduplicated and compressed, AND only the blocks which are unique and not already on the StorSimple solution need to be copied back.
The process is as simple as going to your cloud snapshots in the management console, selecting the point in time you wish to recover and selecting “clone”. You will then be prompted for a mount point or drive letter and within seconds the drive is mounted up. Couldn’t be simpler!
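The restore described above can be modelled as follows. The clone is only metadata (file path to block fingerprints), so mounting it costs nothing. Copying a file back downloads only the blocks the appliance does not already hold. This is a hypothetical model for illustration; the function name and data shapes are mine, not a StorSimple API.

```python
from __future__ import annotations


def restore_plan(clone_files: dict[str, list[str]], wanted: list[str], local_blocks: set[str]) -> set[str]:
    """Fingerprints that must be downloaded to copy the `wanted` files back from a clone.

    clone_files: metadata of the mounted clone, file path -> block fingerprints.
    local_blocks: fingerprints already present on the appliance.
    """
    needed: set[str] = set()
    for path in wanted:
        needed.update(clone_files[path])  # shared blocks are counted once
    return needed - local_blocks          # skip anything already held locally
```

For example, restoring two spreadsheets that share block `f3`, when `f1` and `f3` are still on the appliance, only needs `f2` and `f4` from the cloud. Files the user never copies are never downloaded.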
How does this provide a DR solution?
Cloud Snapshots can be set up with an RPO as low as 15 minutes (rate of change and bandwidth dependent). In a disaster where your primary DataCentre is a smoking hole in the ground, or washed away in a cataclysmic tidal wave/tsunami, another StorSimple appliance can connect to any one of those cloud snapshots and mount it. All it needs is an internet connection, the Azure storage blob credentials and the encryption key that was used to encrypt the data.
The StorSimple solution then pulls down only the metadata, which is a very small subset and very quick to download, and bingo, all your data and files can be presented and appear to be local. Then, as users start opening pictures of their cats and other images to create YOLO memes, the optimised blocks are downloaded and cached locally on the StorSimple appliance. In this fashion the StorSimple appliance re-caches all the hot data that is requested and doesn’t have to pull down cold data as well.
My personal opinion is that we will only see enhancements to this solution. Imagine being able to run this DR scenario entirely out of Windows Azure; suddenly having a physical DR site and hardware no longer matters…now that would be cool!
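The DR mount behaves like a read-through cache, which this sketch models. The listing comes from metadata alone, and each block is fetched from the cloud the first time it is read and served locally after that. `DrMount` and the `fetch` callback are hypothetical names. `fetch` stands in for a download from cloud storage and is not a real StorSimple or Azure call.

```python
from __future__ import annotations

from typing import Callable


class DrMount:
    """Hypothetical model of mounting a cloud snapshot on a replacement appliance."""

    def __init__(self, metadata: dict[str, list[str]], fetch: Callable[[str], bytes]):
        self.metadata = metadata  # file path -> ordered block fingerprints (small, pulled up front)
        self.fetch = fetch        # stand-in for downloading one block from the cloud
        self.cache: dict[str, bytes] = {}
        self.downloads = 0

    def list_files(self) -> list[str]:
        # Browsing needs metadata only; no block data is transferred.
        return sorted(self.metadata)

    def read(self, path: str) -> bytes:
        parts = []
        for fp in self.metadata[path]:
            if fp not in self.cache:  # cold block: download it once and keep it
                self.cache[fp] = self.fetch(fp)
                self.downloads += 1
            parts.append(self.cache[fp])
        return b"".join(parts)
```

Files that nobody opens cost nothing beyond their metadata. A block shared by two files is downloaded only once.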
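Whether a 15-minute RPO is achievable comes down to the rate of change and the bandwidth, as noted above. Here is a rough back-of-the-envelope check. It makes several simplifying assumptions of my own: decimal units, a single combined dedupe and compression ratio, no protocol overhead, and the requirement that each snapshot upload finishes before the next one starts. The input figures in the usage example are illustrative, not measurements.

```python
def upload_seconds(changed_gb: float, reduction_ratio: float, bandwidth_mbps: float) -> float:
    """Seconds needed to upload one cloud snapshot's worth of changes.

    changed_gb: data changed since the last snapshot, in decimal GB.
    reduction_ratio: combined dedupe + compression factor, e.g. 3.0 for 3x.
    bandwidth_mbps: usable upload bandwidth in megabits per second.
    """
    megabits = changed_gb * 8_000 / reduction_ratio  # 1 GB = 8,000 megabits (decimal units)
    return megabits / bandwidth_mbps


def meets_rpo(changed_gb: float, reduction_ratio: float, bandwidth_mbps: float, rpo_minutes: float = 15) -> bool:
    # The upload for one interval must fit inside that interval.
    return upload_seconds(changed_gb, reduction_ratio, bandwidth_mbps) <= rpo_minutes * 60
```

With 6 GB changing every 15 minutes and a 3x reduction, a 20 Mbps link needs 800 seconds per snapshot and keeps up. A 10 Mbps link needs 1,600 seconds and does not.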
Extra benefit…faster tiering to the cloud!
In the previous deep dive here I talked about how StorSimple tiers data to the cloud based on its Weighted Storage Layout algorithm, while trying to keep as much data as possible locally to provide optimal performance for hot and warm data. Suppose you want to copy a large amount of data to a StorSimple appliance, more than the space it has left. If you have been taking cloud snapshots, you won’t have to wait for that data to be uploaded to the cloud as it is tiered.
Where a copy of a block of data is already in the cloud from a cloud snapshot and that block has to be tiered, only the metadata changes, to point to the block in the cloud. No data has to be uploaded, letting you have your cake and eat it too. You get the efficiency of tiering cold data to the cloud, but you can still copy large amounts of data to the appliance without large data transfers immediately following the process.
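The shortcut described above can be sketched as a tiering step that uploads only when the cloud does not already hold the block. Either way, it then just repoints the local metadata. The function and its dictionaries are a hypothetical model, not the appliance’s actual code.

```python
from __future__ import annotations


def tier_block(fp: str, data: bytes, cloud: dict[str, bytes], location: dict[str, str]) -> int:
    """Move a block's home to the cloud and return the number of bytes uploaded."""
    uploaded = 0
    if fp not in cloud:      # no earlier cloud snapshot holds this block
        cloud[fp] = data
        uploaded = len(data)
    location[fp] = "cloud"   # in every case, only the local pointer changes
    return uploaded
```

Tiering a block that a previous cloud snapshot already uploaded transfers zero bytes. A brand-new block still has to be sent.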
Have your say
Agree or disagree with me about a snapshot being a backup? Don’t like me using stupid sayings? Give your opinion below.
Since I set up my MSDN account a few weeks ago I’ve been using it as my test lab to try out some things. These include re-acquainting myself with Windows Server clustering, but with 2012, which is brand spanking new to me. I thought I’d do a quick post to show just how simple it is to create a virtual machine within Azure with the newly available IaaS functionality. It is an amazingly simple process, and you can go from logging in to Azure to connecting to your new VM in under 5 minutes.
Set up your Azure account. You can do this with a credit card, and get a 60 day free trial and then pay as you go, or purchase an MSDN subscription and get a significant amount of Azure credits to utilise each month.
Once you have created your Azure account, log in to the Portal, which you can see below. It is pretty straightforward to use, and you can see some of the services available on the left hand side. In this case select “Virtual Machines”. You can see that a Windows VM I created earlier is running right now.
A whole range of default VMs are available, as you can see below. These include Windows and Linux VMs, and many of them come preconfigured with application services, even SharePoint Server 2013. For my new VM I’m selecting Ubuntu Server 13.04, as I haven’t touched Ubuntu in a few years and wanted to see what has changed.
Now I select my settings: version release date, host name, certificate, local credentials, size of VM (memory/CPU), storage pool to use (including DataCentre region), public DNS record and whether I want to make this part of an availability set.
This is it. Now it kicks off the provisioning of the virtual machine.
Once the VM is created you can go in at any stage to change the performance configuration and attached disks, view performance metrics and shut down/start up the VM.
My VM is created now how do I connect and start using it?!
For my Ubuntu VM I use PuTTY to connect to the public DNS record created by Azure, then simply log in and get working. If this were a Windows server I would use RDP and connect to the DNS name and RDP port specified by Azure. Could not be easier!
Sometimes a visual helps to understand just how things work, and this Windows Azure poster is no exception! If you are curious about the different services and examples of architecture you can get within Azure, this is a great place to start before diving deeper. The screen grab below shows part of the poster, but you can download the full .pdf file at http://www.microsoft.com/en-us/download/details.aspx?id=35473
It’s been an exciting few weeks since my first deep dive post into StorSimple last month, and a very busy end to the Microsoft year! It’s been great seeing the interest from customers when we start discussing how they can leverage the cloud without having to change their application architecture.
In my first StorSimple deep dive, found here, I talked about StorSimple from a hardware and platform perspective. In this post I want to talk about the efficiencies that StorSimple offers.
What do you mean by efficiency?
When you talk about efficiencies it can mean many things to different people. In this case I’m talking about the way StorSimple optimises data for performance and capacity. It also moves data between tiers using a smart algorithmic approach that views data at a block (sub-LUN/file) level. That is really good…but of course other companies do this as well, so what differentiates StorSimple? The biggest differentiator here is that the lowest tier isn’t SATA/NL-SAS but the cloud. And when we are talking about low cost, highly available storage, it is hard to beat the economies of scale that the cloud can offer. Of course it goes without saying <AzurePlug> that Windows Azure is the best example of this, with our price structure the same for any DC in the world and the fact we geo-replicate all data by default </AzurePlug>.
How much of my data is really hot?
When data is created it is hot and will be referenced quite often. However, after some time, this data generally grows cold very quickly. You only want to look at those old photos of your cat on rare occasions, when you have a new meme to create. Generally upwards of 85% of the data can be cold. Keeping this data unoptimised on local disk is obviously not the most effective use of your technology budget.
Life of a Block with StorSimple
So into the deep dive bit! I’m going to talk about the life of a block and how we treat this block, from a StorSimple perspective. I’ve done a series of whiteboards below (or rather drawn on my screen with <MS_Plug>my touch enabled Windows 8 laptop</MS_Plug>) to explain how we treat a block. Block sizes on StorSimple are variable, and we generally select the block size based on the kind of workload running on a specific volume (LUN). StorSimple deduplicates, compresses and encrypts data before it is tiered off to the cloud. This generally provides between 3x and 20x space savings, depending on workload.
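The dedupe-then-compress part of that pipeline can be sketched with the Python standard library. This is a simplification under stated assumptions. It uses a fixed block size per volume, whereas StorSimple picks variable block sizes by workload. SHA-256 is an assumed fingerprint and zlib stands in for the compressor. Encryption is left out because the standard library has no AES.

```python
from __future__ import annotations

import hashlib
import zlib


def reduce_volume(data: bytes, block_size: int) -> tuple[dict[str, bytes], list[str]]:
    """Split data into blocks and keep one compressed copy of each unique block.

    Returns (store, recipe): store maps fingerprint -> compressed block, and
    recipe lists the fingerprints in order so the data can be rebuilt.
    """
    store: dict[str, bytes] = {}
    recipe: list[str] = []
    for i in range(0, len(data), block_size):
        block = data[i:i + block_size]
        fp = hashlib.sha256(block).hexdigest()
        if fp not in store:                   # deduplicate: store each unique block once
            store[fp] = zlib.compress(block)  # compress only the unique blocks
        recipe.append(fp)
    return store, recipe


def rebuild(store: dict[str, bytes], recipe: list[str]) -> bytes:
    return b"".join(zlib.decompress(store[fp]) for fp in recipe)
```

On a deliberately repetitive toy input (two distinct 4 KB blocks repeated 50 times), 100 logical blocks collapse to 2 stored ones and the data rebuilds exactly. Real-world savings depend on the workload, which is why the range above is so wide.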
A block is first written into NVRAM (battery backed DRAM that is replicated between controllers).
The blocks are then written down to a Linear tier, which is eMLC SSD drives, so low latency and high IO.
Blocks of data generally don’t stay on the linear tier for long unless they are subject to continuous IO requests. Blocks are moved, near immediately, down to the dedupe tier. This data remains on SSD, with the low latency and performance you expect, but it is deduplicated inline at the block level before arriving here, providing significant space savings.
As the blocks start to cool they are taken down to a SAS tier and compressed inline on the way. This all happens at a block (sub-file/LUN) level. So if a VM system file, for example, was located on StorSimple, the parts of that file which are hot would remain on SSD, while the majority of the capacity, which is infrequently accessed, would be taken to SAS.
As the StorSimple appliance starts to use its local capacity, it encrypts the coldest blocks of data and tiers them off to the cloud using RESTful APIs. With Windows Azure this means three copies of the data are kept in the primary data centre and three copies in the partner data centre by default. Suddenly you can use the cost and availability efficiencies of the cloud without having to change your application, your operating system or, most importantly, the way you view your files. The data is encrypted with AES 256-bit encryption, and the private key is specified by the customer.
Then, when a block is called back from the cloud, it is a seamless process. The metadata, which is always stored locally on the appliance, knows exactly which block of data is required. StorSimple makes a RESTful API call over HTTPS to bring the data back to the appliance very efficiently, as the data is compressed and deduplicated and there is no need to search for the location of the block. Not only is the block you require recalled, but other related blocks are also pre-fetched based on the read pattern, for further performance optimisation. In the example below I’ve shown block “F” being recalled to the local appliance. This data is stored on the deduped SSD tier, as it is now hot data once more. This back end process is totally transparent. The only thing you will notice is slightly higher latency on blocks which have been tiered to the cloud.
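The life of a block described in the steps above can be summarised as a small state machine. A cooling block moves one tier down at a time, and a recalled block lands back on the dedupe SSD tier. The “prefetch the next few blocks in read order” rule is a hypothetical stand-in for StorSimple’s read-pattern-based prefetch, whose exact logic this post does not describe.

```python
from __future__ import annotations

from enum import Enum


class Tier(Enum):
    NVRAM = "nvram"
    LINEAR_SSD = "linear ssd"
    DEDUPE_SSD = "dedupe ssd"
    SAS = "compressed sas"
    CLOUD = "encrypted cloud"


# Order in which a cooling block moves down, following the steps above.
COOLING_PATH = [Tier.NVRAM, Tier.LINEAR_SSD, Tier.DEDUPE_SSD, Tier.SAS, Tier.CLOUD]


def cool(tier: Tier) -> Tier:
    """Next tier down for a cooling block; the cloud is the bottom."""
    i = COOLING_PATH.index(tier)
    return COOLING_PATH[min(i + 1, len(COOLING_PATH) - 1)]


def recall(blocks: dict[str, Tier], wanted: str, read_order: list[str], prefetch: int = 2) -> list[str]:
    """Bring `wanted` back from the cloud, plus the next `prefetch` blocks in read order."""
    start = read_order.index(wanted)
    recalled = []
    for name in read_order[start:start + 1 + prefetch]:
        if blocks[name] is Tier.CLOUD:
            blocks[name] = Tier.DEDUPE_SSD  # recalled data is hot again
            recalled.append(name)
    return recalled
```

Recalling block “F” from a run E, F, G, H in the cloud also pulls G and H down to the dedupe SSD tier. Any block that is already local is left where it is.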
How do we know who is insane…I mean cold…and how?
Easy! Whoever has this stamp is insane…
Now that my obsession with the Simpsons is addressed, how do we decide which blocks are cold and when they are tiered? And what happens, from a performance perspective, when blocks of data need to be accessed and they are located on a cloud provider?
StorSimple uses a Weighted Storage Layout (WSL), which works as follows:
BlockRank – All volume data is dynamically broken into “chunks”, analyzed and weighted based on frequency of use, age, and other factors
Frequently used data remains on SSD for fast access
Less frequently used data is compressed and stored on SAS
As the appliance starts to fill up, optimised data is encrypted and tiered to the cloud
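A BlockRank-style placement can be sketched as “score each chunk, then fill SSD, then SAS, then the cloud”. The real BlockRank factors and weights are not published in this post. The scoring function, its weights and the slot-based capacities below are hypothetical, chosen only to show the shape of the idea.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Chunk:
    name: str
    reads_per_day: float  # frequency of use
    age_days: float       # time since last access


def weight(c: Chunk, freq_w: float = 1.0, age_w: float = 0.1) -> float:
    # Hypothetical score: more use raises the weight, more age lowers it.
    return freq_w * c.reads_per_day - age_w * c.age_days


def place(chunks: list[Chunk], ssd_slots: int, sas_slots: int) -> dict[str, str]:
    """Hottest chunks go to SSD, the next to SAS, and the rest to the cloud once local space runs out."""
    ranked = sorted(chunks, key=weight, reverse=True)
    placement: dict[str, str] = {}
    for i, c in enumerate(ranked):
        if i < ssd_slots:
            placement[c.name] = "ssd"
        elif i < ssd_slots + sas_slots:
            placement[c.name] = "sas"
        else:
            placement[c.name] = "cloud"
    return placement
```

With one SSD slot and one SAS slot, a busy index chunk stays on SSD, a lightly used report goes to SAS and a years-old cat photo ends up in the cloud.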
But what about what I think?!
Automation is great, but what if I want to manually influence the priority with which data sets are tiered to the cloud? StorSimple offers a solution for this as well. You can manually set a volume to “local preferred” so it is the last data set to be tiered to the cloud. It is only tiered off once all other data sets have been tiered off and the appliance is running out of local capacity.
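The effect of the “local preferred” setting on tiering order can be sketched as a two-part sort key: first whether the block’s volume is local preferred, then how cold the block is. The volume and block names are hypothetical, and the sort key is my illustration of the behaviour described, not the appliance’s actual code.

```python
from __future__ import annotations


def eviction_order(blocks: list[tuple[str, str, float]], local_preferred: set[str]) -> list[str]:
    """Order in which blocks would be tiered to the cloud.

    Each block is (block_id, volume, weight), where a lower weight means colder.
    Blocks on local-preferred volumes sort after all other blocks (False < True),
    so they are tiered only once everything else has gone.
    """
    ranked = sorted(blocks, key=lambda b: (b[1] in local_preferred, b[2]))
    return [block_id for block_id, _, _ in ranked]
```

If volume `vol-a` is marked local preferred, even its coldest block is tiered only after every block on `vol-b`, hot or cold.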
Examples of data sets you might set to prefer local are:
“Rest “Azure-d” we’re coming Down Under! We’re excited to announce that we are expanding Windows Azure with a new region for Australia. This will allow us to deliver all Windows Azure services from within Australia, to make the cloud work even better for you! The new Windows Azure Australia Region will be made up of two sub-regions in New South Wales and Victoria. These two locations will be geo-redundant, letting you store and back up data across the two sites. We know you’ve been asking for it, and we look forward to making this available to all Windows Azure users.”
With the launch of Windows Azure regions in Australia today, customers looking at true cloud services have another option where data can reside in Australia. This matters particularly if they have any concerns about where data resides and about latency.
In my opinion this is a good thing for Australian consumers, because when there is competition the consumer wins through price and innovation. It also helps me when talking about Cloud Integrated Storage (CIS) with customers.
With StorSimple we obfuscate and encrypt all data before it goes into Azure, and the customer has sole knowledge and ownership of the private keys to decrypt the data. Even so, there are still many instances of customers insisting that data remain on Australian soil. There are different driving factors behind this, some of which I believe to be valid and some of which I don’t. Either way, it is still a hurdle we face. With the local regions now available, this challenge goes away.
Network Cost and Performance
Data sovereignty is not the only thing that local DCs help with. They also help with your network performance (latency) and potentially cost. Suddenly it is possible to get below 100ms latency from anywhere in the country, and far less if you are located in Sydney or Melbourne. Local DCs also bring about the possibility of peering agreements with network service providers, including Telstra, AARNet, Pipe and others. I’m not saying when this will happen, or even that it will happen, as I don’t know, but it is a real possibility with DCs that are onshore.
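Why onshore DCs help latency can be seen from propagation delay alone. Light in optical fibre covers roughly 200 km per millisecond, about two thirds of its speed in a vacuum. That sets a floor on round-trip time that no amount of bandwidth removes. The route lengths in the usage note are illustrative assumptions, not measured cable paths. Real latency also adds routing, queuing and processing delay on top of this floor.

```python
FIBRE_KM_PER_MS = 200.0  # light in fibre covers roughly 200 km per millisecond (about 2/3 of c)


def min_rtt_ms(route_km: float) -> float:
    """Lower bound on round-trip time from propagation delay alone."""
    return 2 * route_km / FIBRE_KM_PER_MS
```

An assumed 4,000 km fibre route across the country gives a floor of 40 ms. An assumed 12,000 km route to an overseas DC gives 120 ms before any other delay is counted.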
Azure vs competition*
Since starting with Microsoft a little over two months ago I’ve been more impressed with every new feature and functionality that we can provide with Azure. When I look at what we provide with Azure for Australian hosted customers I see some clear value propositions:
1) Cost – other cloud providers with Australian DCs will charge different usage rates depending on which Geo you use (at least when I checked last week). With Azure we charge the same rate around the world, in US dollars, no matter which Geo you select.
2) Protection and redundancy – Azure Regions are always released in pairs…and for a very simple reason. Everything in Windows Azure Storage is replicated twice (to form a total of three copies) on separate hardware within the local Region. Windows Azure then takes care of DataCentre redundancy, taking fault domains into consideration. This means that a copy of your data is always in another geo-location. So if you use the datacentres in Australia, you will have three copies of your data in Sydney and three copies in Melbourne by default.
3) Breadth – The range of services we can provide with Azure is huge…from PaaS (which supports Python, .NET, Java and many other development languages) to Websites, IaaS, Mobile Services, Media, Storage and even Big Data as a Service.
Some of my colleagues have also blogged about this and you can read their blogs here:
* When talking about cloud service providers I work on the assumption that a cloud service provider can provide PaaS, IaaS, Storage and other services which are manageable and accessible via a self service portal AND RESTful API stack.
In my last post I did an introduction to Cloud Integrated Storage, the value proposition and how we address this at Microsoft. I figured for the next few posts I’d get into a bit of detail about StorSimple, which is the way we provide Cloud Integrated Storage to customers. Microsoft acquired the company in October last year, and much of the core team is based out of Mountain View, CA. I was lucky enough to get over there in my second week in the role and meet the team. It was great to hear how the solution came about and some of the success they have already had. This included a soothing ale with Marc Farley, who is currently writing a book on Cloud Integrated Storage.
StorSimple is a highly available, enterprise class iSCSI SAN. It is Flash optimised with redundant controllers, disks configured at RAID10 and hot spares….things you’d expect from your enterprise SAN.
We currently provide between 2TB and 20TB of usable capacity on premises, which is flash optimised, deduplicated and compressed…meaning you will be able to realise much more capacity both on premises and within Azure, generally anything from 3x to 20x space savings. There are soft limits on the amount of Azure capacity we can address, ranging from 100TB on the 2TB appliance all the way to 500TB on the 20TB appliance.
I think it is great that Microsoft is embracing open standards with StorSimple. It is certified for and supports both Windows and VMware (using VSS and vStorage APIs for Data Protection respectively). It is also still an open system that can connect not only to Azure but also to Atmos, OpenStack, HP, AWS and other cloud providers. This is a big part of StorSimple’s DNA. Prior to acquisition, it was not only the go-to Azure solution for Cloud Integrated Storage, it was also the go-to AWS solution. That said, why would you want to use anything but Azure…but more on that in a future post.
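The 3x to 20x space savings quoted above translate into an effective local capacity range through simple multiplication. This sketch only does that arithmetic. The default ratios come from the figures in this post, and actual savings depend on the workload.

```python
def effective_range_tb(usable_tb: float, low: float = 3.0, high: float = 20.0) -> tuple[float, float]:
    """Effective capacity range for a given usable capacity and space-saving ratios."""
    return usable_tb * low, usable_tb * high
```

The 2TB appliance works out to roughly 6TB to 40TB of effective local capacity, and the 20TB appliance to roughly 60TB to 400TB.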