A couple of months ago, Facebook released a whole load of information about its servers and datacentres in a programme it calls the Open Compute Project. At around the same time, I was sitting in a presentation at Microsoft, where I was introduced to some of the concepts behind their datacentres. These are not small operations – Facebook’s platform currently serves around 600 million users and Microsoft’s various cloud properties account for a good chunk of the Internet, with the Windows Azure appliance concept under development for partners including Dell, HP, Fujitsu and eBay.
It’s been a few years since I was involved in any datacentre operations and it’s interesting to hear how times have changed. Whereas I knew about redundant uninterruptible power sources and rack-optimised servers, the model is now about containers of redundant servers and the unit of scale has shifted. An appliance used to be a 1U (pizza box) server with a dedicated purpose but these days it’s a shipping container full of equipment!
There’s also been a shift from keeping the lights on at all costs, towards efficiency. Hardly surprising, given that the IT industry now accounts for around 3% of the world’s carbon emissions and we need to reduce the environmental impact. Google’s datacentre design best practices are all concerned with efficiency: measuring power usage effectiveness; managing airflow; running warmer datacentres; using “free” cooling; and optimising power distribution.
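Power usage effectiveness (PUE) is the ratio of the total power drawn by a facility to the power that actually reaches the IT equipment, so a value closer to 1.0 means less overhead for cooling and power distribution. As a minimal sketch (the function name and the meter readings are made up for illustration, not taken from any of the providers mentioned):

```python
def power_usage_effectiveness(total_facility_kw: float, it_equipment_kw: float) -> float:
    """Return PUE: total facility power divided by IT equipment power."""
    if it_equipment_kw <= 0:
        raise ValueError("IT equipment power must be positive")
    if total_facility_kw < it_equipment_kw:
        raise ValueError("total facility power cannot be less than IT equipment power")
    return total_facility_kw / it_equipment_kw


# Hypothetical readings, in kilowatts, for two imaginary facilities.
legacy_pue = power_usage_effectiveness(total_facility_kw=2000.0, it_equipment_kw=1000.0)
efficient_pue = power_usage_effectiveness(total_facility_kw=1200.0, it_equipment_kw=1000.0)

print(f"legacy: {legacy_pue:.2f}, efficient: {efficient_pue:.2f}")
```

In the first case, every kilowatt of IT load costs another kilowatt of overhead; in the second, the overhead is only a fifth of that.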
So how do Microsoft (and, presumably, others like Amazon too) design their datacentres? And how can we learn from them when developing our own private cloud operations?
Some of the fundamental principles include:
- Perception of infinite capacity.
- Perception of continuous availability.
- Drive predictability.
- Taking a service provider approach to delivering infrastructure.
- Resilience over redundancy mindset.
- Minimising human involvement.
- Optimising resource usage.
- Incentivising the desired resource consumption behaviour.
In addition, the following concepts need to be adopted to support the fundamental principles:
- Cost transparency.
- Homogenisation of physical infrastructure (aggressive standardisation).
- Pooling compute resource.
- Fabric management.
- Consumption-based pricing.
- Virtualised infrastructure.
- Service classification.
- Holistic approach to availability.
- Compute resource decay.
- Elastic infrastructure.
- Partitioning of shared services.
In short, provisioning the private cloud is about taking the same architectural patterns that Microsoft, Amazon, et al. use for the public cloud and implementing them inside your own datacentre(s). It means thinking service, not server, to develop an internal infrastructure as a service (IaaS) proposition.
I won’t expand on all of the concepts here (many are self-explanatory), but some of the key ones are:
- Create a fabric with resource pools of compute, storage and network, aggregated into logical building blocks.
- Introduce predictability by defining units of scale and planning activity based on predictable actions (e.g. certain rates of growth).
- Design across fault domains – understand what tends to fail first (e.g. the power in a rack) and make sure that services span these fault domains.
- Plan upgrade domains (think about how to upgrade services and move between versions so service levels can be maintained as new infrastructure is rolled out).
- Consider resource decay – what happens when things break? Think about component failure in terms of service delivery and design for that. In the same way that a hard disk has a number of spare sectors that are used when others are marked bad (and eventually too many fail, so the disk is replaced), take a unit of infrastructure and leave faulty components in place (but disabled) until a threshold is crossed, after which the unit is considered faulty and is replaced or refurbished.
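The resource decay idea in the last point can be sketched in a few lines of Python. This is only an illustration under assumed names: `ScaleUnit`, `max_failed_servers` and the container size are hypothetical, not anything Microsoft has published.

```python
from dataclasses import dataclass, field
from typing import Set


@dataclass
class ScaleUnit:
    """A unit of infrastructure (e.g. a rack or container) that decays in place."""
    name: str
    total_servers: int
    max_failed_servers: int  # assumed threshold before the whole unit is swapped out
    failed: Set[int] = field(default_factory=set)

    def mark_failed(self, server_id: int) -> None:
        """Disable a faulty server but leave it physically in place."""
        if not 0 <= server_id < self.total_servers:
            raise ValueError(f"unknown server id {server_id}")
        self.failed.add(server_id)

    @property
    def healthy_capacity(self) -> int:
        """Servers still available to the resource pool."""
        return self.total_servers - len(self.failed)

    @property
    def needs_replacement(self) -> bool:
        """True once more servers have failed than the threshold allows."""
        return len(self.failed) > self.max_failed_servers


# Hypothetical container of 100 servers that tolerates up to 10 failures.
unit = ScaleUnit(name="container-01", total_servers=100, max_failed_servers=10)
for server_id in range(10):
    unit.mark_failed(server_id)
print(unit.healthy_capacity, unit.needs_replacement)

unit.mark_failed(10)  # the eleventh failure crosses the threshold
print(unit.healthy_capacity, unit.needs_replacement)
```

Nobody is sent to fix the individual failures; the only decision the operator makes is when to replace or refurbish the whole unit, which fits the principle of minimising human involvement.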
A smaller company, with a small datacentre, may still think in terms of server components – larger organisations may be dealing with shipping containers. Regardless of the size of the operation, the key to success is thinking in terms of services, not servers, and designing public cloud principles into private cloud implementations.