A few months back, when Microsoft announced that many of its forthcoming server products would be 64-bit only, I confessed that I don’t know a lot about 64-bit computing. That all changed last week when I attended two events (the first sponsored by Intel and the second by AMD) where the two main 64-bit platforms were discussed and Microsoft’s decision to follow the 64-bit path suddenly makes a lot of sense.
For some time now, memory has been a constraint on scaling-up server capacity. The 32-bit x86 architecture that we’ve been using for so long now is limited to a maximum of 4GB of addressable memory (232 = 4,294,967,296) – once a huge amount of memory but not so any more. Once you consider that on a Windows system, half of this is reserved for the system address space (1GB if the /3GB switch is used in boot.ini) the remaining private address space for each process starts to look a little tight (even when virtual memory is considered). Even though Windows Server 2003 brought improvements over Windows 2000 in its support for the /3GB switch, not all applications recognise the extra private address space that it provides and the corresponding reduction in the system address space can be problematic for the operating system. Some technologies, like physical address extension (PAE) and address windowing extensions (AWE) can increase the addressable space to 64GB, but for those of us who remember trying to squeeze applications into high memory back in the days of MS-DOS, that all sounds a little bit like emm386.exe!
The answer to this problem of memory addressing is 64-bit computing, which allows much more memory to be addressed (as shown in the table below) but unfortunately means that new servers and a switch to a 64-bit operating systems are required. For Windows users that means Windows XP Professional x64 Edition on the desktop and either Windows Server 2003 Standard, Enterprise or Datacenter x64 or Itanium Edition at the back end (Windows Server 2003 Standard Edition is not compatible with Intel’s Itanium processor family and Windows Server 2003 Web Edition is 32-bit only).
It’s not all doom and gloom – for those who have recently invested in new servers, it’s worth noting that certain recent models of Intel’s Xeon and Pentium processors have included a technology called Intel Extended Memory 64 Technology (EM64T), so pretty much any server purchased from a tier 1 vendor (basically HP, IBM or Dell) over the last 12 months will have included 64-bit capabilities (with the exception of blade servers, which are generally still 32-bit for power consumption reasons – at least in the Intel space).
EM64T is effectively Intel’s port of the AMD x64 technology (AMD64, as used in the Opteron CPU line) and probably represents the simplest step towards 64-bit computing as Intel’s Itanium processor family (IPF) line uses a totally separate architecture called explicitly parallel instruction computing (EPIC).
In terms of application compatibility, 32-bit applications can run on a 64-bit Windows platform using a system called Windows on Windows 64 (WOW64). Although this is effectively emulating a 32-bit environment, the increased performance allowed with a 64-bit architecture means that performance is maintained and meanwhile, 64-bit applications on the same server can run natively. Itanium-based servers are an exception – they require an execution layer to convert the x64 emulation to EPIC, which can result in some degradation in 32-bit performance (which needs to be offset against Itanium 2’s high performance capabilities, e.g. when used natively as a database server). There are some caveats – Windows requires 64-bit device drivers (for many, hardware support is the main problem for 64-bit adoption) and there is no support for 16-bit applications (that means no MS-DOS support).
There is much discussion about which 64-bit model is “best”. There are many who think AMD are ahead of Intel (and AMD are happy to cite an Information Week article which suggests they are gaining market share), but then there are situations where Itanium 2’s processor architecture allows for high performance computing (cf. the main x64/EM64T advantage of increased memory addressability). Comparing Intel’s Xeon processors with EM64T and Itanium 2 models reveals that the Itanium supports 1024TB of external memory (cf. 1TB for the EM64T x64 implementation) as well as additional cache and general purpose registers but the main advantages with Itanium relate to the support for parallel processing. With a traditional architecture, source code is compiler to create sequential machine code and any parallel processing is reliant upon the best efforts of the operating system. EPIC compilers for the Itanium architecture produce machine code that is already optimised for parallel processing.
Whatever the processor model, both Intel and AMD’s marketing slides agree that CPU speed is no longer the bottleneck in improving server performance. A server is a complex system and performance is constrained by the component with the lowest performance. Identifying the bottleneck is an issue of memory bandwidth (how fast data can be moved to and from memory), memory latency (how fast data can be accessed), memory addressability (how much data can be accessed) and I/O performance (how much data must be moved). That’s why processor and server manufacturers also provide supporting technologies designed to increase performance – areas such as demand based power management, systems management, network throughput, memory management (the more memory that is added, the less reliable it is – hence the available of RAID and hot-swap memory technologies), expansion (e.g. PCI Express) and virtualisation.
Where AMD’s x64 processors shine is that their architecture is more efficient. Traditional front-side bus (FSB) computers rely on a component known as the northbridge. I/O and memory access share the same front side bus affecting memory bandwidth. Memory access is delayed because it must pass through the northbridge (memory latency). I/O performance is restricted due to bandwidth bottlenecks accessing attached devices. Additionally, in a multiprocessor system, each core competes for access to the same FSB, making the memory bandwidth issue worse.
Non-uniform memory access (NUMA) computers place the memory with the processor cores, avoiding the need to access memory via a northbridge. AMD’s Opteron processor allows a total address space of up to 256TB, allowing each CPU to access up to 6.4Gbps of dedicated memory bandwidth with a local memory latency of ~65ns (similar to L3 cache performance) and three HyperTransport connections, each allowing 8Gbps of I/O bandwidth.
Scaling out to multiple processors is straightforward with direct connections between processors using the 8Gbps HyperTransport (some OEMs, e.g. Fujitsu-Siemens, use this to allow two dual-CPU blade servers to be connected making a 4 CPU blade system).
Adding a second processor core results in higher performance and better throughput (higher density, lower latency and higher overall performance) but with equal power consumption. The power consumption issue is an important one, with data centre placement increasingly being related to sufficient power capacity and high density rack-based and blade servers leading to cooling issues.
AMD’s figures suggest that by switching from a dual-core 4-way Intel Xeon processor system (8 cores in total) to a similar AMD Opteron system results in a drop from 740W to 380W of power [source: AMD]. With 116,439 4-way servers shipped in Western Europe alone during 2005 [source: IDC], that represents a saving of 41.9MW per year.
In summary, 64-bit hardware has been shipping for a while now. Both Intel and AMD admit that there is more to processor performance than clock speed, and AMD’s direct connect architecture removes some of the bottlenecks associated with the traditional x86 architecture as well as reducing power consumption (and hence heat production). The transition to 64-bit software has also begun, with x64 servers (Opteron and EM64T) providing flexibility in the migration from a 32-bit operating system and associated applications.