So one of the phrases I keep seeing come up in different social media posts and articles is “Vendor Lock In”. You can see some of the ones I’ve found via a quick search of the interwebs below, each with its own theme and position.
To me a lot of this may as well be a map from the 1500s saying “here be dragons” (or, to be accurate, “HC SVNT DRACONES” as it was in Latin). There is something out there to be scared of when making business decisions, but it isn’t dragons…I mean vendor lock in; it is not properly reviewing our business requirements before making a decision.
So how real is Vendor Lock In?
Very real, I can guarantee that any purchase you make will have some level of lock in to a technology or a process or something that will be difficult to change:
From my perspective I’m looking at the technology side; but we are of course as free as we want to believe we are.
“A puppet is free as long as he loves his strings.” – Sam Harris, Free Will
Is there a way out?!
Docker and Cloud Foundry are two organisations talking about the ability to remove vendor lock in. This is great for giving you portability between public and private clouds; however, it brings up that little issue…you are locked into their solution.
Even if there is little or no cost, there is a considerable amount of time and effort you have to put into getting this solution up and running and then, if a feature or an entire product is impacted somehow, you have to be able to migrate away from it.
The only solution….choose wisely!
In short we have to make our decisions wisely and weigh up all the options, without emotion, based on what is best for our business. You could build on Azure PaaS for example, and have a close working relationship with Microsoft giving you access to updates, new features, automatic scale, hyper-scale etc (shameless plug), or you could look at utilising Docker to enable you to switch between public cloud providers like Azure, Google and AWS depending on where the wind is blowing, price wise, and on geography requirements.
Whatever you do, just make sure you properly consider what you are trying to achieve and relate this to your design decision and, this is a big one, when you do make your decision take responsibility for it. Also make sure you have prepared at least a high level exit plan: if vendor X decides to deprecate a product you are using, have an idea of what it would take to migrate off it.
And the ultimate vendor lock in?
Grant Orchard (https://twitter.com/grantorchard) from VMware made a comment about the ultimate vendor lock in, and nailed it! It is…..drumroll……kids! Hopefully that is one kind of vendor lock in that you do like (I know I do)
StorSimple, since acquisition by Microsoft at the end of 2012, has been a great success so far with over a thousand customers now adopting the technology world-wide to reduce their financial challenges around data storage, data management and data growth.
It is with a lot of excitement that we have today launched our next version, Microsoft Azure StorSimple. This sees a range of advances in the platform, management and DR capabilities. The most recent IDC/EMC Digital Universe study shows data growth sitting at 40% year over year, on average. This ever expanding data growth means hardware upgrades that never stop and the threat of going over a capacity cliff, as well as increasing software licensing costs, administration effort and facilities costs.
The Microsoft Azure StorSimple Solution
StorSimple continues to be an on premise storage array that integrates seamlessly with existing physical and virtual machines. It can be put into production in a matter of hours and, with no application modification, allows you to start leveraging the cloud for storing cold data, backing up data via cloud snapshots and as a location to retrieve data from in the event of a DR.
Microsoft Azure StorSimple also provides intelligence around how it treats data. As per my previous blog posts on StorSimple, Microsoft Azure StorSimple starts data off in SSD and then intelligently tiers data, at a block level, between SSD, SAS and the cloud; but it also provides inline deduplication, compression and (prior to moving data to the cloud) encryption.
In summary the StorSimple solution provides the below, without the need for any application modification or additional software.
Highly available primary storage array
Optimisation of data via in line deduplication and compression
Tiering (and encryption) of cold data to the Cloud
Backing up data to the cloud via cloud snapshots
Ability to recover data from the cloud, for DR, from anywhere
The new Microsoft Azure StorSimple platform, labelled the 8000 series, introduces three new models and changes to the management. So what do these new releases bring us?
10GbE interfaces – this is a feature which has been requested numerous times by our customers
Unified management of multiple appliances, via the Microsoft Azure StorSimple manager
Increased performance – 2.5 times increase in internet bandwidth capabilities
Higher capacity hybrid storage arrays
The 8100 comes with 15TB of useable capacity (before dedupe or compression, as well as flash optimisation) and can address up to 200TB of Cloud storage
The 8600 comes with 40TB of useable capacity (before dedupe or compression, as well as flash optimisation) and can address up to 500TB of Cloud storage
A virtual appliance available as a service in Azure that can access data that has been uploaded by an 8000 series array
With a new platform comes expanded use cases. Previously the main use cases were file, archive and SharePoint (and other document management products).
With the 8000 series we now include support for SQL Server and virtual machine use cases. Also, thanks to the virtual appliance, we can now start running some Azure specific use cases as well, using your data from on premise. This includes DR (and DR testing), cloud applications and dev/test workloads.
Disaster Recovery and IT agility
So now you have a copy of your data in Azure via cloud snapshot…what can you do next? This is one of the best parts around the new release, the ability to access your data within Azure. How can this be used?
Having a dedicated DR site, especially for small and mid size organisations, is a huge financial strain. The ability to access and present your data inside Azure means that customers no longer need a secondary site and storage array. They can present the data up to the virtual appliance and very quickly access their data from their last cloud snapshot.
Development and Testing
Data has mass and moving it to compute can be slow. With the StorSimple solution a copy of your data already resides in Azure, so you have the ability to quickly spin up VMs in Azure IaaS and for them to access your data for testing and development. You could potentially have a full test/dev environment up and running, in the cloud, in a matter of minutes or hours, rather than days or weeks.
On Demand Infrastructure
No need to provision infrastructure in advance: with a full copy of all your enterprise data available in Azure you can spin up VMs for projects and special requirements quickly and easily…
Some other great blogs, articles and information can be found here
So we’ve talked about the underlying hardware solution that StorSimple provides in my first deep dive here, then we moved onto the storage efficiencies, life of a block and cloud integration here. So in my third, and for now final, deep dive post I’m going to touch on how StorSimple provides the mechanism to efficiently back up, restore and even provide a DR solution without the need for secondary or tertiary sites and data centres.
Fingerprints, chunks and SnapShots…we know where your blocks live
StorSimple fingerprints, chunks and tracks all blocks that are written to the appliance. This allows it to take very efficient local snapshots that take up no space and have no performance impact. It doesn’t have to go out and read through all the metadata to work out which blocks or files have changed, like a traditional backup. Reading through file information is one of the worst enemies of backing up unstructured data: if you have millions of files (which is common) it can take hours just to work out which files have changed before you back up a single file. So StorSimple can efficiently give you local points in time which are near instant to back up and restore from.
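To make that idea concrete, here is a toy sketch (not StorSimple’s actual implementation; its chunk sizes are variable and its hashing scheme isn’t public) of how fingerprinting fixed-size chunks lets you find changed blocks without walking millions of files:

```python
import hashlib

CHUNK_SIZE = 64 * 1024  # illustrative only; real chunk sizes vary by workload

def fingerprint_chunks(data: bytes, chunk_size: int = CHUNK_SIZE) -> list[str]:
    """Split a volume's data into chunks and fingerprint each one."""
    return [
        hashlib.sha256(data[i:i + chunk_size]).hexdigest()
        for i in range(0, len(data), chunk_size)
    ]

def changed_chunks(old: list[str], new: list[str]) -> list[int]:
    """Indices of chunks whose fingerprints differ - no file-tree walk needed."""
    return [i for i, fp in enumerate(new) if i >= len(old) or old[i] != fp]

# Comparing two snapshots costs O(number of chunks), independent of file count.
before = fingerprint_chunks(b"A" * 200_000)
after = fingerprint_chunks(b"A" * 100_000 + b"B" * 100_000)
print(changed_chunks(before, after))  # → [1, 2, 3]
```

The point of the sketch is the shape of the bookkeeping: a snapshot only has to record fingerprints, so “what changed?” never requires reading file metadata.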
I disagree! A snapshot doesn’t count!
However it is my opinion that a snapshot is not a backup…so why the heck is my blog title about backup, restore and DR?! It is because I also believe that a snapshot is a backup if it is replicated. StorSimple provides another option for snapshots called “Cloud Snapshots”. These take a copy of all the data on a single volume, or multiple volumes, up to Windows Azure, including all the metadata. Obviously the first cloud snapshot is the whole data set; we make this easier as all the data is deduplicated, compressed and protected with AES 256 bit encryption. After this first baseline only unique block changes, optimised with dedupe and compression and then encrypted, are taken up to Windows Azure. These cloud snapshots are policy based and can be kept for hours, days, weeks, months or years as required.
Data is offsite, multiple points in time are available and backup windows are generally reduced. Once data gets into Azure we further protect your information. Azure storage, by default, is configured with geo-replication turned on. This means three copies of any block of data are kept in the primary data centre and another three copies in the partner data centre; even if you turn geo-replication off you still have three copies of your data sitting in the primary data centre. This means at least three, but generally six, copies of all data reside in Azure.
So we have simple, efficient and policy driven snapshots and all snapshot data replicated six times, across different geographies…I think I can safely call this a backup and probably with more resiliency than most legacy tape or local disk based backup systems customers are using now.
And now how do I restore my data?
So the scenario is that someone requires some files back from months ago, or even a year ago. It is maybe a few GB at most, but we still want to get it back quickly and easily, and the user also wants to search the directory structure for some relevant information.
StorSimple offers the ability to connect to any of the cloud snapshots, create a clone and mount it to a server. This clone will not pull down any data, apart from any metadata which is not already on the StorSimple solution, so it is extremely efficient. All data however will appear local and you can browse the directory structure and only copy back the files that are required…and all the blocks that constitute these files are deduplicated and compressed, AND only the blocks which are unique and not already located on the StorSimple solution need to be copied back.
The process is as simple as going to your cloud snapshots in the management console, selecting the point in time you wish to recover and selecting “clone”. You will then be prompted for a mount point or drive letter and within seconds the drive is mounted up. Couldn’t be simpler!
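As a rough mental model (the class and names here are mine, not the product’s), a mounted clone behaves like a lazy view over the snapshot’s metadata: everything appears local, but a unique chunk only comes down from the cloud on first read:

```python
class CloudSnapshotClone:
    """Toy model of mounting a cloud snapshot clone: only metadata is present
    up front; chunk payloads are fetched (and cached) on first read."""

    def __init__(self, manifest: dict, cloud_store: dict, local_cache: dict):
        self.manifest = manifest   # offset -> chunk id (this is the metadata)
        self.cloud = cloud_store   # dict standing in for Azure blob storage
        self.cache = local_cache   # blocks already on the appliance
        self.downloaded = 0

    def read_chunk(self, offset: int) -> bytes:
        chunk_id = self.manifest[offset]
        if chunk_id not in self.cache:          # only unique, missing blocks move
            self.cache[chunk_id] = self.cloud[chunk_id]
            self.downloaded += 1
        return self.cache[chunk_id]

# Two offsets share one chunk (dedupe), and one chunk is already local.
manifest = {0: "c1", 1: "c2", 2: "c1"}
clone = CloudSnapshotClone(manifest, {"c1": b"x", "c2": b"y"}, {"c2": b"y"})
data = [clone.read_chunk(o) for o in (0, 1, 2)]
print(clone.downloaded)  # → 1
```

Only one transfer happens for three reads: one chunk was deduplicated across offsets and one was already on the appliance, which is the efficiency being described above.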
How does this provide a DR solution?
Cloud Snapshots can be set up with an RPO as low as 15 minutes (rate of change and bandwidth dependent). In the event of a DR where your primary data centre is a smoking hole in the ground, or washed away in a cataclysmic tidal wave/tsunami, another StorSimple appliance can connect up to any one of those cloud snapshots and mount it. All it needs is an internet connection, the Azure storage blob credentials and the encryption key that was used to encrypt the data.
The StorSimple solution then only pulls down the metadata, which is a very small subset and very quick to download, and bingo: all your data and files can be presented and appear to be local. Then as users start opening their cat photos and other images to create memes, the optimised blocks are downloaded and cached locally on the StorSimple appliance. In this fashion the StorSimple appliance re-caches the hot data which is requested and doesn’t have to pull down data which is cold as well.
My personal opinion is that we will only see enhancements to this solution; imagine being able to do this DR scenario entirely from within Windows Azure. Suddenly having a physical DR site and hardware no longer matters…now that would be cool.
Extra benefit…faster tiering to the cloud!
In the previous deep dive here I talk about how StorSimple tiers data to the cloud based on its weighted storage layout algorithm, but tries to keep as much data as possible locally so it provides optimal performance for hot and warm data. In the event that you want to copy a large amount of data to a StorSimple appliance, more than the available space left on the appliance, you won’t have to wait for data to be tiered to the cloud if you have been taking cloud snapshots.
Where there is already a copy of a block of data in the cloud, from a cloud snapshot, and that block has to be tiered up, only the metadata will change, to point to the block in the cloud, and no data will have to be uploaded, letting you have your cake and eat it too. You get the efficiencies of tiering cold data to the cloud but still the ability to copy large amounts of data to the appliance without large data transfers immediately following the process.
Have your say
Agree or disagree with me about a snapshot being a backup? Don’t like me using stupid sayings? Give your opinion below.
It’s been an exciting few weeks since my first deep dive post into StorSimple last month and a very busy end to the Microsoft year! It’s been great seeing the interest from customers when we start discussions about being able to leverage the cloud without having to change their application architecture.
In my first StorSimple deep dive, found here, I talked about StorSimple from a hardware and platform perspective. In this post I want to talk about the efficiencies that StorSimple offers.
What do you mean by efficiency?
When you talk about efficiencies it can mean many things to many different people. In this case I’m talking about the way StorSimple optimises data for performance and capacity and moves data between tiers based on a smart algorithmic approach where it views data at a block (sub LUN/file) level. That is really good…but of course other companies do this as well, so what differentiates StorSimple? The biggest differentiator here is that the lowest tier isn’t SATA/NL-SAS but the cloud. And when we are talking about low cost, highly available storage, it is hard to beat the economies of scale that the cloud can offer, and of course it goes without saying <AzurePlug> that Windows Azure is the best example of this, with our price structure the same for any DC in the world and the fact we geo replicate all data by default </AzurePlug>.
How much of my data is really hot?
When data is created it is hot and will be referenced quite often; however, after some time, this data generally grows cold very quickly. You only want to look at those old photos of your cat on rare occasion, when you have a new meme to create. Generally anything north of 85% of the data can be cold. Keeping this data unoptimised and on local disk is obviously not the most effective use of your technology budget.
Life of a Block with StorSimple
So into the deep dive bit! I’m going to talk about the life of a block and how we treat this block, from a StorSimple perspective. I’ve done a series of whiteboards…or drawn on my screen with <MS_Plug>my touch enabled Windows 8 laptop</MS_Plug> below to explain how we treat a block. Block sizes on StorSimple are variable, and we generally select the block size based on the kind of workload running on a specific volume (LUN). StorSimple deduplicates, compresses and encrypts data before it is tiered off to the cloud; this generally provides between 3x and 20x space savings, dependent on workload.
A block is first written into NVRAM (battery backed DRAM that is replicated between controllers).
The blocks are then written down on to a Linear tier, which is eMLC SSD drives; so low latency and high IO.
Blocks of data generally don’t stay on the linear tier for long, unless they are subject to continuous IO requests. Blocks are taken, near immediately, down to the dedupe tier. This data remains on SSD, with the low latency and performance you expect, but the data is deduplicated in line before arriving here on a block level, providing significant space savings.
As the blocks start to cool they are then taken to a SAS tier and compressed in line on the way down. This all happens at a block (sub file/LUN) level, so if a VM system file, for example, was located on StorSimple, the parts of that installable which are hot remain on SSD while the majority of the capacity that is infrequently accessed will be taken to SAS.
As the StorSimple appliance starts to use up its local capacity it will then encrypt the coldest blocks of data and tier them off to the Cloud using RESTful APIs. When this is Windows Azure, this means three copies of the data will be kept in the primary data centre and three copies in the partner data centre by default. Suddenly you can use the cost and availability efficiencies of the cloud without having to change your application, operating system or, most importantly, the way you view your files. The data is encrypted with AES 256 bit encryption and the private key is specified by the customer.
Then in the event that a block is called back from the cloud it will be a seamless process. The metadata, which is always stored locally on the appliance, knows exactly which block of data is required. StorSimple will make a RESTful API call over HTTPS to bring the data back to the appliance in a very efficient manner, as the data is compressed and deduplicated and there is no need to search for the location of the block. Not only will the block you require be recalled, but other corresponding blocks will be pre-fetched for further performance optimisation based on the read pattern. For the below example I’ve shown block “F” being recalled to the local appliance. This data will be stored on the deduped SSD tier as it is now hot data once more. This back end process is totally transparent and the only thing that will occur is slightly higher latency on the blocks which have been tiered to the cloud.
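A hedged sketch of the dedupe and compression steps a block goes through on its way down the tiers might look like the following (real block sizes, hashes and tier mechanics differ, and the encryption step is only noted in a comment since it needs a third-party crypto library):

```python
import hashlib
import zlib

def tier_down(block: bytes, dedupe_index: dict) -> dict:
    """Follow one block down the tiers: dedupe on SSD, compress for SAS.
    Returns a summary of what each tier would store."""
    fp = hashlib.sha256(block).hexdigest()
    if fp in dedupe_index:                 # dedupe tier: one copy per fingerprint
        return {"dedupe_ssd": "ref:" + fp[:8]}
    dedupe_index[fp] = block
    compressed = zlib.compress(block)      # SAS tier: compressed in line
    # The cloud tier would AES-256 encrypt `compressed` with the customer's key
    # (e.g. via the third-party `cryptography` library) before the REST upload.
    return {"dedupe_ssd": fp[:8], "sas_bytes": len(compressed), "raw_bytes": len(block)}

index = {}
first = tier_down(b"log line\n" * 1000, index)
dup = tier_down(b"log line\n" * 1000, index)
print(first["sas_bytes"] < first["raw_bytes"])  # → True, compression saved space
print(dup)  # the duplicate block is stored only as a reference
```

The duplicate write never reaches the SAS or cloud tiers at all, which is where the 3x to 20x savings mentioned above come from on repetitive workloads.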
How do we know who is insane…I mean, which blocks are cold…and how?
Easy! Whoever has this stamp is insane…
Now that my obsession with the Simpsons is addressed: how do we decide which blocks are cold, when they are tiered, and what happens, from a performance perspective, when blocks of data need to be accessed and they are located on a Cloud provider?
StorSimple uses a Weighted Storage Layout (WSL). This then goes through the below process:
BlockRank – All volume data is dynamically broken into “chunks”, analyzed and weighted based on frequency of use, age, and other factors
Frequently used data remains on SSD for fast access
Less frequently used data compressed and stored on SAS
As appliance starts to fill up optimised data is encrypted and tiered to the cloud
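The actual BlockRank weighting isn’t public, but to illustrate the shape of the idea, here is a hypothetical score that combines frequency of use and age, then picks the coldest blocks as tiering candidates (everything here is an assumption, not the real algorithm):

```python
def block_rank(last_access: float, hits: int, now: float) -> float:
    """Hypothetical score in the spirit of BlockRank: frequency of use pushes
    a block up, age pushes it down. The real factors are not public."""
    age_hours = (now - last_access) / 3600
    return hits / (1.0 + age_hours)

def pick_cold_blocks(blocks: list, now: float, fraction: float = 0.2) -> list:
    """Return the coldest fraction of blocks - the candidates for cloud tiering."""
    ranked = sorted(blocks, key=lambda b: block_rank(b["last_access"], b["hits"], now))
    return ranked[:max(1, int(len(blocks) * fraction))]

now = 1_000_000.0
blocks = [
    {"id": "hot",  "last_access": now - 60,             "hits": 500},
    {"id": "warm", "last_access": now - 3600 * 24,      "hits": 40},
    {"id": "cold", "last_access": now - 3600 * 24 * 90, "hits": 2},
]
print([b["id"] for b in pick_cold_blocks(blocks, now)])  # → ['cold']
```

Only when the appliance is filling up does the bottom of this ranking get encrypted and tiered off, which matches the flow in the list above.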
But what about what I think?!
Automation is great, but what if I want to manually influence the priority in which data sets are tiered to the cloud? StorSimple offers a solution for this as well. You can manually set a volume to “local preferred” so it is the last data set that will be tiered to the cloud, and only tiered off if all other data sets have been tiered and the local capacity of the appliance is nearly exhausted.
Examples of data sets you might set to prefer local are:
In my last post I did an introduction to Cloud Integrated Storage, the value proposition and how we address this at Microsoft. I figured for the next few posts I’d get into a bit of detail about StorSimple, which is the way we provide Cloud Integrated Storage to customers. This is a company that Microsoft acquired in October last year, with much of the core team based out of Mountain View, CA. I was lucky enough to get over there in my second week in the role and meet the team; it was great to get in there and hear how the solution came about and some of the success they have already had. This included a soothing ale with Marc Farley, who is currently writing a book on Cloud Integrated Storage.
StorSimple is a highly available, enterprise class iSCSI SAN. It is Flash optimised with redundant controllers, disks configured at RAID10 and hot spares….things you’d expect from your enterprise SAN.
We currently provide between 2TB and 20TB of useable capacity on premise, at this stage, which is flash optimised, deduplicated and compressed…meaning you will be able to realise much more capacity both on premise and within Azure; generally anything from 3x to 20x space savings. There are soft limits on the amount of Azure capacity we can address, ranging from 100TB on the 2TB appliance all the way to 500TB on the 20TB appliance.
I think it is great that Microsoft is embracing open standards with StorSimple. It is certified for and supports both Windows and VMware (VSS and vStorage APIs for Data Protection respectively are utilised) and it is still an open system that not only can connect to Azure, but also supports connectivity to Atmos, OpenStack, HP, AWS and other cloud providers. This is a big part of StorSimple’s DNA and, prior to acquisition, it was not only the Azure go-to solution for Cloud Integrated Storage, it was the AWS go-to solution. That said, why would you want to use anything but Azure…but more on that in a future post.
Chris Mellor from The Register (http://theregister.co.uk) recently wrote an article about cloud storage starting to savage the market share of traditional/legacy storage vendors. Link to the article is here.
It predicts that, due to economies of scale and custom built hardware made out of commodity parts, cloud providers will be able to provide far cheaper storage than any traditional array vendor. An estimated market value graph was included, with cloud intersecting legacy around 2024.
Cloud Storage v Legacy Storage
So how do businesses start to embrace and utilise this cost effective storage, which is protected with multiple copies and redundancies, better than any legacy storage array located in a standard data centre? All this while making sure data is obfuscated for compliance and security reasons.
One option is to re-architect your applications and use RESTful APIs to send, fetch and modify data within a cloud provider…and this works extremely well for a lot of businesses (Netflix are a great example). But is this approach for everyone? I’d say probably not at this stage; not many companies can put the development and DevOps effort into an application to ensure its robustness like Netflix can.
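For illustration, this is roughly the shape of “using RESTful APIs” to store an object. The account, container and blob names are made up, and a real Azure Blob request would also need an authentication header (Shared Key or SAS), which is exactly the kind of plumbing a re-architected application has to own:

```python
from urllib.request import Request

def put_object_request(account: str, container: str, name: str, data: bytes) -> Request:
    """Build (but don't send) an HTTPS PUT that would store `data` as a blob.
    Names here are hypothetical; authentication is deliberately omitted."""
    url = f"https://{account}.blob.core.windows.net/{container}/{name}"
    return Request(
        url,
        data=data,
        method="PUT",
        headers={
            "x-ms-blob-type": "BlockBlob",       # Azure Blob required header
            "Content-Length": str(len(data)),
        },
    )

req = put_object_request("contosodata", "media", "cat.jpg", b"\xff\xd8...")
print(req.get_method(), req.full_url)
# → PUT https://contosodata.blob.core.windows.net/media/cat.jpg
```

Every read, write and retry in the application has to be rebuilt around calls like this, which is why the re-architecture route suits a Netflix far better than most enterprises.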
The other option, which is one of the things I’m focussed on with Microsoft, is a storage array that resides on premise and can integrate with the cloud to provide the best of both worlds: provide local, enterprise class storage to your existing servers and applications, and be able to utilise the efficiencies of the cloud without having to re-write your applications. The local storage array is StorSimple, which was acquired by Microsoft in October 2012; more info can be found here http://www.storsimple.com/.
Suddenly you have enterprise class arrays on premise with the below features:
flash optimised (as a tier)
dedupe of production data
compression of production data
snapshots without performance impacts
Then, as blocks of data grow cold, the array encrypts this data and migrates it, at a block level, to Azure. This means you could have constantly accessed hot VM blocks on SSD, warm blocks on SAS and the cold blocks residing in Azure. With more than 80% of data generally being cold, this suddenly makes a lot of sense.
Not only does this remove a lot of cost around storage and complexity around archiving, it also opens another door in how you can manage your backup. Once data resides on StorSimple it is possible to take a “cloud snapshot”. A full copy of the data (encrypted and optimised) is taken to Azure on the first snap; after this only changed data, at an optimised block level, is copied to Azure for the next snapshot. Suddenly you can keep hourly, daily, monthly and even yearly backups/archives within Azure and significantly reduce your backup window, operational expenditure and effort around backup, as well as increase your ability to restore data quickly and efficiently.
Finally, from a compliance perspective, all this data that sits within Azure is encrypted (AES 256 bit encryption), three copies of the data are made in the primary data centre and, by default, three further copies are made in the Azure partner data centre. This is the case for the cloud snapshots and for any data tiered to Azure.
I think it is pretty cool tech and will be writing more on StorSimple and Azure in the future.
So two weeks ago I started a new and exciting step in my career and took a role with Microsoft as a Technical Solutions Professional for Cloud Integrated Storage (more to come in coming blog posts on what the heck this is) covering Australia and New Zealand. For those of you who don’t work for or deeply with Microsoft, this is a pre-sales role focussed on a specified technology set; in my case Cloud and Storage and, you know, stuff.
I have to admit, if you had asked me a year ago, or even six months ago, if I pictured myself working for Microsoft I would not have imagined it happening. However, upon seeing what this role was about and the strategic value it would play with Microsoft customers and within Microsoft itself, my mind was very, very quickly changed.
I started two weeks ago in the Sydney office and, after three days in the role, I hopped over to the Mountain View office for a nice leisurely three days in the US, to really start drinking from the fire hose (for those wondering, the koolaid flavour is blue). Now that I’m back it feels great to be diving into the work, head first, and adding some value. I’ve found the people and the resources available to me simply astounding since starting, and I can see why Microsoft have been a truly great company over the last few decades.
Obviously there are some things I’m going to have to try and get used to, such as using IE as my primary browser, Bing as my search engine and even swapping out my fruity phone and tablet for a Windows 8 phone and a Surface…maybe some more blog posts to come on those experiences too. I’m sure it will be a good learning curve for me and a chance to get to know and appreciate technologies outside my current comfort zone.