Some design principles for Microsoft Exchange

In a previous role, I managed a team that was responsible for Microsoft Exchange design. Working with Microsoft, we established a set of design principles, some template designs and a rule-book for any changes made to those designs. Soon afterwards, Microsoft published their Preferred Architecture for Exchange and the similarity was striking (to be honest, I’d have been concerned if it was different)!

As that Preferred Architecture is publicly available, I think it’s fine to talk about some of the principles we applied. They seem particularly pertinent now as I was recently working with a customer whose Exchange design doesn’t follow these… and they were experiencing a few difficulties that were affecting the stability of their systems…

Physical, not virtual

Virtualisation has many advantages in many circumstances but it is not a “silver bullet”. It also brings complexity and operational challenges into Exchange design, with few (if any) advantages that would not be already provided by Exchange out of the box. Exchange is designed to make full use of the available hardware and Microsoft is able to provide large, low cost mailboxes within Office 365 (Exchange Online), without a requirement to virtualise their Exchange 2013 platform. In addition to the operational and supportability complexities that virtualisation brings, virtualising the Exchange deployment requires more Exchange design effort.

Deploy multi-role Exchange servers

Microsoft’s current recommended practice is to deploy multi-role Exchange 2013 servers (i.e. client access and mailbox roles on the same server) for the following reasons:

  1. Reduced hardware. Multi-role servers make best use of processor capacity given the more powerful server specifications which are now available.
  2. Reduced operational and capital expenditure. Fewer servers to deploy and manage.
  3. Building block design which is simple to deploy and scale. Automated deployment of standard server builds.

The mailbox server role must be designed not to exceed the maximum processor capacity guidance for multi-role servers; this provides confidence that the hardware deployed can co-host all roles on a single server. This is where the Exchange 2013 Server Role Requirements Calculator comes in…

Use direct attached storage – not a SAN

Microsoft designed Exchange 2013 to run on commodity hardware and believes this is the most cost effective way to provide storage for the Exchange mailbox databases.  Changes to the Exchange 2013 storage engine have yielded a significant reduction in I/O over previous versions of Exchange, allowing customers to take advantage of larger, cheaper disks and reduce the overall solution costs. In general, Direct Attached Storage (DAS) should be used in a Just a Bunch of Discs (JBoD) configuration although there are some circumstances where a Redundant Array of Inexpensive Devices (RAID) configuration may be used.

Microsoft uses a commoditised email platform with DAS and JBoD architecture to provide and support large, low cost mail mailboxes within Office 365. There are many more solution elements to consider with a SAN (Host Bus Adapters (HBAs), fibre channel switches and SAN I/O modules) as well as additional software for managing the infrastructure and firmware to keep up-to-date. Consequently, there is an increased likelihood of technical integration issues using a SAN and, once installed, a SAN infrastructure has to be carefully monitored and managed with appropriately skilled staff. In stark contrast, the costs of direct-attached JBoD solutions is falling as larger disks become available.

Native resilience

Database availability groups (DAGs) were introduced in Exchange Server 2010 to replicate databases between up to 16 servers. A DAG with multiple mailbox database copies, can provide automatic recovery from a variety of server, storage, network and other hardware failures. Auto-reseed functionality in Exchange Server 2013 allows for automatically bringing spare disks on line in the event of failure and creating new database copies.

If four highly available copies of each database are deployed, Exchange native resilience can be used without the requirement for third party backup solutions. Only specific requirements (i.e. ability to recover to an offline datacentre; recovery of deleted mailbox outside the deleted mailbox recovery retention time; protection against operational immaturity; protection against security breaches etc.) drive a requirement for adoption of a third party backup solution

Exchange Online uses the Exchange native resilience to protect against database failures, without resorting to the use of third party backup solutions.

Whilst a DAG can support 16 servers, it may be prudent to artificially limit the number of DAG members (e.g. to 12) in order to provide flexibility in upgrade scenarios.

Site resilience

DAGs can be extended between sites and copies of databases replicated across sites to provide additional redundancy. Each member of the DAG must have a round trip latency no greater than 500ms to contact the other members, regardless of their physical location. In general, Exchange DAGs should span at least two physical sites and Microsoft also recommends that separate Active Directory sites are used.

Mailbox distribution

With multiple sites in use, the next consideration is whether both are active (i.e. providing live service) or whether one is a secondary, passive, datacentre (i.e. invoked for disaster recovery purposes).  If all active mailboxes are hosted in a single site, and all passive copies of the mailboxes reside in a secondary site, the user distribution model is referred to as active/passive. If there are active mailboxes in both primary and secondary datacentres then the user distribution model is known to be active/active.

This should not be confused with the databases within the DAG, where only one copy of each database is active at any time.

Exchange 2013 simplified the client access model and with all clients connecting using HTTPS an active/active architecture is simple and spreads the client load across all Client Access servers, making best use of the deployed hardware.

This also facilitates a simplified SMTP namespace and allows automatic site failure (assuming the File Share Witness is located in a tertiary datacentre).

Archiving

With today’s storage capabilities, large mailboxes are becoming normal.  The use of a native Exchange 2013 archive or a third party archiving solution is only required where there is a defined need for a user experience that warrants the management of email data (the ‘personal archive’ user experience, e.g. auto archive functionality) or by legal/policy requirements regarding the retention and discovery of email data (‘the regulatory archive’).

There is a common misconception that using a third party archive solution will provide a cost effective, single instance storage solution by differentiating ‘hot’ and ‘cold’ data and providing the ability to store ‘colder’ data on cheaper, slower disks. In fact, introducing a secondary system increases costs and complexity (in design and management) as well as reducing the flexibility of the solution.

Many organisations are electing to leave behind their archives with browser-only access as they migrate to larger online mailboxes in the cloud, e.g. using Exchange Online.

Conclusion

Whilst Exchange is supported in a virtualised environment, with SAN-attached storage, third party backup and making use of email archive solutions, deviating from the Preferred Architecture is a huge risk. The points in this blog post, combined with Microsoft’s advice linked above highlight the reasons to keep your Exchange design as simple as possible. Whilst a more complex design will probably work, identifying issues when it doesn’t will be a much bigger challenge.

 

Leave a Reply