However you look at it, there’s little doubt that creating an operating system release the size of Windows Server 2008 is a huge undertaking. A few months back, I was privileged to hear Alex Hinrichs, Group Program Manager for Windows Server, speak about the process of building Windows Server.
First of all, any major project needs strong leadership and the Windows Server Division management team includes a huge volume of collective experience from guys like Bob Muglia, Bill Laing and Iain McDonald.
Then, there’s the consistent vision – at the heart of the Windows Server 2008 product there have been a few major themes:
- Roles (customers don’t think "my Windows server"; they think "my domain controllers", "my web servers", etc.).
- Only install what the system needs.
- Make it secure, reliable, manageable… and fast.
- Quality should be determined by real world deployments (i.e. only ship when we know the product is ready).
It’s also worth remembering that Windows Server 2003 has been an excellent operating system release. So, as they began to plan for the next release, Microsoft took a look at what worked well when developing Windows Server 2003:
- Deployments are key to quality (internal deployments – "dogfooding", early adoption customers).
- Include a long customer feedback cycle.
- Install fewer components and services by default.
- Lock down the feature set early (good ideas are all very well, but it requires discipline to say "no, we’ll add that later") and focus on quality during the last year of development rather than adding new things.
- Protect the daily build (the daily version of Windows Server needs to be solid).
- Focus on the basics (reliability, security, performance).
Even though the feature set was locked down some time ago, Microsoft did respond to customer feedback on the early builds of Windows Server codenamed Longhorn – that’s how we got features like IIS as a role in server core, Windows PowerShell as a feature, more granular group policy objects and read-only domain controllers.
The important point is that in the final year, major changes were limited. Customer requests were still the primary driver but there were no more changes for people who thought "it would be really cool if…" and changes were only implemented to unblock customer deployments.
For me (as I’m not a software engineer), the fascinating part of Alex’s presentation was the daily process.
Program Managers are responsible for delivering on a single component of the operating system, with each virtual build lab (VBL) being made up of a number of feature build labs (FBL) – for example:
- Core (platform, kernel, setup, etc.).
- Networking (TCP/IP stack, DHCP, RRAS, etc.).
- Server roles (IIS, AD, etc.).
- Security (logon, licensing, etc.).
- Client (shell, Internet Explorer, etc.).
- File (including backup, storage server, etc.).
The daily ship-room (engineering) meeting examines test results to see which product groups are ready to bring in code to the build based on:
- Distributed responsibility:
  - FBLs need to get code ready in order to move up through the build process via the VBLs.
  - Around 10,000 people have contributed code to Windows Server at some point in the process (about 1,000 developers are working on it every day).
- Daily tests (checks and balances) including:
  - Build verification tests (BVTs) – a few thousand simple tests, run on all versions (32/64-bit, Itanium, etc.), that check whether the operating system will complete setup, whether it can upgrade, whether it can share files, etc.
  - Feature verification tests (FVTs) – tests at feature/role level, e.g. AD, IIS, etc.
  - Stress – what does it take to break the system? Over 1,000 machines run stress tests every day until something breaks, then a debugger is attached to see what broke it.
  - Reliability – how long can the system run – for every server role.
- Bugs and bug bar (what bugs are there to fix… and when by).
Hinrichs explained that the componentisation effort used for Windows Server 2008 has certainly helped to make the process easier. It has taken many years but dependencies have been broken as senior, strong architects have run Windows code through architectural layer tools to ensure that architectural policy is adhered to.
Using a component-based model helps to identify conflicts before they hit the main build. For example, if the Shell and Internet Explorer teams are working on the same binary, the code is checked in at the client layer and the developers can work together to resolve a conflict before the code is incorporated into the main tree.
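As a rough illustration of that idea (not Microsoft's tooling, which the talk didn't go into), a layer could spot overlapping work simply by grouping pending changes by the binary they touch. The function name, team names and file names below are all hypothetical:

```python
from collections import defaultdict
from typing import Dict, List, Set


def find_conflicts(pending: Dict[str, Set[str]]) -> Dict[str, List[str]]:
    """Map each binary touched by more than one team to the teams involved.

    `pending` maps a (hypothetical) team name to the set of binaries
    that team wants to check in at a shared layer.
    """
    teams_by_binary: Dict[str, Set[str]] = defaultdict(set)
    for team, binaries in pending.items():
        for binary in binaries:
            teams_by_binary[binary].add(team)
    # Only binaries with two or more teams are potential conflicts.
    return {
        binary: sorted(teams)
        for binary, teams in teams_by_binary.items()
        if len(teams) > 1
    }
```

Calling it with two teams that both list a made-up `shared.dll` would flag that one file and both team names, which is the point at which the developers would sit down together.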
Before code is accepted, there are a number of quality gates to be negotiated, consisting of a battery of tests to run at check-in stages, for example:
- Static code analysis (buffer overflows, other security problems, managed code problems, etc.).
- Architectural layer (check for implied dependencies and relate back to roles – a developer may think that they are working in their own universe but this is not necessarily so).
- Code coverage (automated tests all over the world constantly exercise the system to hit as many code paths as possible; the aim is to automate as much as possible but, realistically, 70-75% is automated with the rest tested manually).
- Policy check tools (looking for globalisation or localisation issues, political issues, etc.).
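For the non-developers among us, the quality gates above can be pictured as a list of named checks that every check-in has to pass. This is only a sketch: the `CheckIn` fields, the gate functions and the 70% coverage threshold are my own assumptions for illustration, not the real tools or limits.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple


@dataclass
class CheckIn:
    """A hypothetical check-in; every field here is illustrative."""
    author: str
    static_analysis_warnings: int = 0
    undeclared_dependencies: List[str] = field(default_factory=list)
    automated_coverage: float = 0.0  # fraction of code paths hit by automation
    policy_issues: List[str] = field(default_factory=list)


# Each gate is a name plus a function returning True when the check-in passes.
Gate = Tuple[str, Callable[[CheckIn], bool]]

GATES: List[Gate] = [
    ("static code analysis", lambda c: c.static_analysis_warnings == 0),
    ("architectural layer", lambda c: not c.undeclared_dependencies),
    ("code coverage", lambda c: c.automated_coverage >= 0.70),  # assumed bar
    ("policy check", lambda c: not c.policy_issues),
]


def failed_gates(check_in: CheckIn) -> List[str]:
    """Return the names of the gates this check-in fails, in gate order."""
    return [name for name, passes in GATES if not passes(check_in)]
```

An empty list from `failed_gates` would mean the code can move on; anything else tells the developer which gate to go back and fix.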
On a normal day in the Windows Server Division:
- FBL developers run checks and once the code is ready it is pushed to a VBL team for building.
- The VBL team pulls down the main build and merges it with new code from the FBLs, looking for conflicts/problems. Once everything is ready, code from all teams is brought together into the main build.
- Because there are checks and balances at all levels (and reverse integration), things tend to be pretty clean at the main build level.
- It’s not just about pushing code up – it is pulled down as well so that all teams pick up changes from each other.
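The daily flow described above can be modelled as a small tree of labs, where code is pushed up only once its checks pass and every lab pulls down what its parent already has. Again, this is a minimal sketch under my own assumptions: the `Lab` class, its method names and the use of simple change labels are hypothetical and say nothing about how the real build system is implemented.

```python
from dataclasses import dataclass, field
from typing import Optional, Set


@dataclass
class Lab:
    """A build lab (FBL, VBL or main) holding a set of change labels."""
    name: str
    parent: Optional["Lab"] = None
    changes: Set[str] = field(default_factory=set)

    def pull_down(self) -> None:
        """Pick up everything the parent lab already contains."""
        if self.parent is not None:
            self.changes |= self.parent.changes

    def push_up(self, checks_pass: bool) -> bool:
        """Promote this lab's changes to its parent if its checks pass.

        The lab first merges with the parent's current state, so any
        conflicts would surface here rather than in the parent.
        """
        if self.parent is None or not checks_pass:
            return False
        self.pull_down()
        self.parent.changes |= self.changes
        return True
```

With a `main` lab, a VBL whose parent is `main`, and two FBLs whose parent is that VBL, an FBL that fails its checks never reaches the VBL, while a sibling FBL that calls `pull_down()` picks up whatever the other teams have already promoted.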
I’m pleased to see Windows Server 2008 ship before its official release date. For some time now Microsoft would only commit to "shipping during the first quarter of 2008" but were adamant that quality was the primary goal and that the product would only ship when it was ready. Based on my experiences, it seems remarkably solid and I have no reservations about pushing my organisation as hard as I can to deploy Windows Server 2008 for our customers. And I’ll wrap this post up with one final comment – Kevin Lane, the lead for the technology adoption program (TAP) customers, has been on call 24×7 to ensure that major issues affecting customers are resolved quickly. In the last 6 months he has only had one call that’s been important enough to disturb his sleep…