Designing for failure does not necessarily mean multi-cloud

This content is 7 years old. I don't routinely update old blog posts as they are only intended to represent a view at a particular point in time. Please be warned that the information here may be out of date.

Earlier this week, Amazon Web Services’ S3 storage service suffered an outage that affected many websites (including popular sites to check if a website is down for everyone or just you!).

Unsurprisingly, this led to a lot of discussion about designing for failure – or not, it would seem in many cases, including the architecture behind Amazon’s own status pages:

The Amazon and Azure models are slightly different but in the past we’ve seen outages to the Azure identity system (for example) impact on other Microsoft services (Office 365). When that happened, Microsoft’s Office 365 status page didn’t update because of a caching/CDN issue. It seems Amazon didn’t learn from Microsoft’s mistakes!

Randy Bias (@RandyBias) is a former Director at OpenStack and a respected expert on many cloud concepts. Randy and I exchanged many tweets on the topic of the AWS outage but, after multiple replies, I thought a blog post might be more appropriate. You see, I hold the view that not all systems need to be highly available. Sometimes, failure is OK. It all comes down to requirements:

And, as my colleague Tim Siddle highlighted:

I agree. 100%.

So, what does that architecture look like? Well, it will vary according to the provider:

So, if we want to make sure our application can survive a region failure, there are ways to design around this. Just be ready for the solution we sold to the business based on using commodity cloud services to start to look rather expensive. Whereas on-premises we typically have two datacentres with resilient connections, then we’ll want to do the same in the cloud. But, just as not all systems are in all datacentres on-premises, that might also be the case in the cloud. If it’s a service for which some downtime can be tolerated, then we might not need to worry about a multi-region architecture. In cases where we’re not at all concerned about downtime we might not even use an availability set

Other times – i.e. if the application is a web service for which an outage would cause reputational or financial damage – we may have a requirement for higher availability.  That’s where so many of the services impacted by Tuesday’s AWS outage went wrong:

Of course, we might spread resources around regions for other reasons too – like placing them closer to users – but that comes back to my point about requirements. If there’s a requirement for fast, low-latency access then we need to design in the dedicated links (e.g. AWS Direct Connect or Azure ExpressRoute) and we’ll probably have more than one of them too, each terminating in a different region, with load balancers and all sorts of other considerations.

Because a cloud provider could be one of those single points of failure, many people are advocating multi-cloud architectures. But, if you think multi-region is expensive, get ready for some seriously complex architecture and associated costs in a multi-cloud environment. Just as in the on-premises world, many enterprises use a single managed services provider (albeit with multiple datacentres), in the cloud many of us will continue to use a single cloud provider.  Designing for failure does not necessarily mean multi-cloud.

Of course, a single-cloud solution has its risks. Randy is absolutely spot on in his reply below:

It could be argued that one man’s “lock-in” is another’s “making the most of our existing technology investments”. If I have a Microsoft Enterprise Agreement, I want to make sure that I use the software and services that I’m paying for. And running a parallel infrastructure on another cloud is probably not doing that. Not unless I can justify to the CFO why I’m running redundant systems just in case one goes down for a few hours.

That doesn’t mean we can avoid designing with the future in mind. We must always have an exit strategy and, where possible, think about designing systems with a level of abstraction to make them cloud-agnostic.

Ultimately though it all comes back to requirements – and the ability to pay. We might like an Aston Martin but if the budget is more BMW then we’ll need to make some compromises – with an associated risk, signed off by senior management, of course.

[Updated 2 March 2017 16:15 to include the Mark Twomey tweet that I missed out in the original edit]

Azure Connect – the missing link between on-premise and cloud

This content is 13 years old. I don't routinely update old blog posts as they are only intended to represent a view at a particular point in time. Please be warned that the information here may be out of date.

Azure Connect offers a way to connect on-premise infrastructure with Windows Azure but it’s lacking functionality that may hinder adoption.

While Microsoft is one of the most dominant players in client-server computing, until recently, its position in the cloud seemed uncertain.  More recently, we’ve seen Microsoft lay out its stall with both Software as a Service (SaaS) products including Office 365 and Platform as a Service (PaaS) offerings such as Windows Azure joining their traditional portfolio of on-premise products for consumers, small businesses and enterprise customers alike.

Whereas Amazon’s Elastic Compute Cloud (EC2) and Simple Storage Service (S3) offer virtualised Infrastructure as a Service (IaaS) and is about consumption of Software as a Service (SaaS), Windows Azure fits somewhere in between. Azure offers compute and storage services, so that an organisation can take an existing application, wrap a service model around it and specify how many instances to run, how to persist data, etc.

Microsoft also provides middleware to support claims based authentication and an application fabric that allows simplified connectivity between web services endpoints, negotiating firewalls using outbound connections and standard Internet protocols. In addition, there is a relational database component (SQL Azure), which exposes relational database services for cloud consumption, in addition to the standard Azure table storage.

It all sounds great – but so far everything I’ve discussed runs on a public cloud service and not all applications can be moved in their entirety to the cloud.

Sometimes makes it makes sense to move compute operations to the cloud and keep the data on-premise (more on that in a moment). Sometimes, it’s appropriate to build a data hub with multiple business partners connecting to a data source in cloud but with applications components in a variety of locations.

For European CIOs, information security, in particular data residency, is a real issue. I should highlight that I’m not a legal expert, but CIO Magazine recently reported how the Patriot Act potentially gives the United States authorities access to data hosted with US-based service providers – and selecting a European data centre won’t help.  That might make CIOs nervous about placing certain types of data in the cloud although they might consider a hybrid cloud solution.

Azure already provides federated security, application layer connectivity (via AppFabric) and some options for SQL Azure data synchronisation (currently limited to synchronisation between Microsoft data centres, expanding later this year to include synchronisation with on-premise SQL Server) but the missing component has been the ability to connect Windows Azure with on-premise infrastructure and applications. Windows Azure Connect provides this missing piece of the jigsaw.

Azure Connect is a new component for Windows Azure that provides secure network communications between compute instances in Azure and servers on premise (ie behind the corporate firewall). Using standard IP protocols (both TCP and UDP) it’s possible to take a web front end to the cloud and leave the SQL Server data on site, communicating over a virtual private network, secured with IPSec. In another scenario, a compute instance can be joined to an on-premise Active Directory  domain so a cloud-based application can take advantage of single sign-on functionality. IT departments can also use Azure Connect for remote administration and troubleshooting of cloud-based computing instances.

Currently in pre-release form, Microsoft is planning to make Azure Connect available during the first half of 2011. Whilst setup is relatively simple and requires no coding, Azure Connect is reliant on an agent running on the connected infrastructure (ie on each server that connects to Azure resources) in order to establish IPSec connectivity (a future version of Azure Connect will be able to take advantage of other VPN solutions). Once the agent is installed, the server automatically registers itself with the Azure Connect relay in the cloud and network policies are defined to manage connectivity. All that an administrator has to do is to enable Windows Azure roles for external connectivity via the service model; enable local computers to initiate an IPSec connection by installing the Azure Connect agent; define network policies and, in some circumstances, define appropriate outbound firewall rules on servers.

The emphasis on simplicity is definitely an advantage as many Azure operations seem to require developer knowledge and this is definitely targeted at Windows Administrators. Along with automatic IPSec provisioning (so no need for certificate servers) Azure Connect makes use of DNS so that there is no requirement to change application code (the same server names can be used when roles move between the on premise infrastructure and Azure).

For some organisations though, the presence of the Azure Connect agent may be seen as a security issue – after all, how many database servers are even Internet-connected? That’s not insurmountable but it’s not the only issue with Azure Connect.

For example, connected servers need to run Windows Vista, 7, Server 2008, or Server 2008 R2 [a previous version of this story erroneously suggested that only Windows Server 2008 R2 was supported] and many organisations will be running their applications on older operating system releases. This means that there may be server upgrade costs to consider when integrating with the cloud – and it certainly rules out any heterogeneous environments.

There’s an issue with storage. Windows Azure’s basic compute and storage services can make use of table-based storage. Whilst SQL Azure is available for applications that require a relational database, not all applications have this requirement – and SQL Azure presents additional licensing costs as well as imposing additional architectural complexity.  A significant number of cloud-based applications make use of table storage or combination of table storage and SQL Server – for them, the creation of a hybrid model for customers that rely on on-premise data storage may not be possible.

For many enterprises, Azure Connect will be a useful tool in moving applications (or parts of applications) to the cloud. If Microsoft can overcome the product’s limitations, it could represent a huge step forward for Microsoft’s cloud services in that it provides a real option for development of hybrid cloud solutions on the Microsoft stack, but there still some way to go.

[This post was originally written as an article for Cloud Pro.]