A few weeks ago, I rebuilt a recently decommissioned server to run as an infrastructure test and development rig at home. I installed Windows Server 2008 R2, enabled the Hyper-V role and all was good until I started to configure my networks, at which point I experienced a “blue screen of death” (BSOD) – never a good thing on a virtualisation host, especially when it does the same thing again on reboot:
“Oh dear, my freshly built Windows Server 2008 R2 machine has just thrown 3 BSODs in a row… after running normally for an hour or so :-(“
The server is a Dell PowerEdge 840 (a small workgroup server that I bought a couple of years ago) with 8GB RAM and a quad-core Xeon CPU. The hardware is nothing special – but fine for my infrastructure testing – and it had been running Windows Server 2008 Hyper-V since new (with no issues), but this was the first time I’d tried R2.
I have 3 network adapters in the server: a built-in Broadcom NetXtreme Gigabit card (which I’ve reserved for remote access) and 2 Intel PRO/100s (for VM workloads). Ideally I’d use Gigabit Ethernet cards for the VM workload too, but this is only my home network and they were what I had available!
Trying to find out the cause of the problem, I ran WhoCrashed, which gave me the following information:
This was likely caused by the following module: efe5b32e.sys
Bugcheck code: 0xD1 (0x0, 0x2, 0x0, 0xFFFFF88002C4A3F1)
Dump file: C:\Windows\Minidump\020410-15397-01.dmp
file path: C:\Windows\system32\drivers\efe5b32e.sys
product: Intel(R) PRO/100 Adapter
company: Intel Corporation
description: Intel(R) PRO/100 Adapter NDIS 5.1 driver
That confirmed that the issue was with the Intel NIC driver, which sounded right: after enabling the Hyper-V role, I connected an Ethernet cable to one of the Intel NICs and got a BSOD each time the server came up. If I disconnected the cable, no BSOD. Back to the twitters:
“Does anyone know of any problems with Intel NICs and Hyper-V R2 (that might cause a BSOD)?”
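For anyone without WhoCrashed, the same driver identification can be done with the Debugging Tools for Windows against the minidump – a sketch, assuming the debugging tools are installed in their default location (the paths and dump filename here are from my machine and will vary):

```shell
:: Open the minidump in the command line kernel debugger (kd)
cd "C:\Program Files\Debugging Tools for Windows (x64)"
kd -z C:\Windows\Minidump\020410-15397-01.dmp

:: Then, at the kd prompt:
::   .symfix       - point the debugger at the Microsoft public symbol server
::   .reload       - reload symbols for the loaded modules
::   !analyze -v   - verbose crash analysis; names the faulting module
```

The `!analyze -v` output includes the bugcheck code and the probable faulting module, which is where WhoCrashed gets its information from too.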
I switched the in-box (Microsoft) drivers for some (older) Intel ones. That didn’t fix things, so I switched back to the latest drivers. Eventually I found that the issue was caused by the “Allow management operating system to share this network adapter” checkbox and that, if the NIC was live and I selected this, I could reproduce the error:
“Found the source of yesterday’s WS08R2 Hyper-V crash… any idea why enabling this option http://twitpic.com/11b64y would trip a BSOD?”
Even though I could work around the issue (I don’t want to share a NIC between the parent partition and the children anyway – I have the Broadcom NIC for remote access), it seemed strange that this behaviour should occur. There was no NIC teaming involved and the server was still a straightforward UK installation (aside from enabling Hyper-V and setting up virtual networks).
Based on suggestions from other Virtual Machine MVPs I also:
- Flashed the NICs to the latest release of the Intel Boot Agent (these cards don’t have a BIOS).
- Updated the Broadcom NIC to the latest drivers too.
- Attempted to turn off jumbo frames, but the option was not available in the properties, so I could rule that out.
Thankfully, @stufox (from Microsoft in New Zealand) saw my tweets and was kind enough to step in and offer assistance. It took us a few days, thanks to timezone differences and my work schedule, but we got there in the end.
First up, I sent Stu a minidump from the crash, which he worked on with one of the Windows Server kernel developers. They suggested running Driver Verifier (verifier.exe) against the various physical network adapter drivers (and against vmswitch.sys). More details of this tool can be found in Microsoft knowledge base article 244617, but the response to the verifier /query command was as follows:
Name: efe5b32e.sys, loads: 1, unloads: 0
Name: ndis.sys, loads: 1, unloads: 0
Name: b57nd60a.sys, loads: 1, unloads: 0
Name: vmswitch.sys, loads: 1, unloads: 0
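For the record, the Driver Verifier configuration that produced the output above can be set up from an elevated command prompt along these lines – a sketch, with the driver names being the ones from my particular server:

```shell
:: Enable standard verification for the suspect drivers
:: (a reboot is needed before verification takes effect)
verifier /standard /driver efe5b32e.sys ndis.sys b57nd60a.sys vmswitch.sys

:: After the reboot, list the drivers currently under verification
verifier /query

:: When finished, clear all verifier settings (another reboot required)
verifier /reset
```

Driver Verifier stresses the selected drivers with extra checks, so a flaky driver tends to crash sooner and with a more informative bugcheck – which is exactly why the kernel developers asked for it.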
To be honest, I haven’t a clue what half of that means but the guys at Microsoft did – and they also asked me for a kernel dump (Dirk A D Smith has written an article at Network World that gives a good description of the various types of memory dump: minidump; kernel; and full). Transmitting this file caused some issues (at 256MB it was too big for e-mail) but it compressed well, and 7-Zip allowed me to split it into chunks to get under the 50MB file size limit on Windows Live SkyDrive. Using this, Stu and his kernel developer colleagues were able to see that there was a bug in the Intel driver I was using – but it turned out there was another workaround too: turning off Large Send Offload in the network adapter properties. Since I did this, the server has run without a hiccup (as I would have expected).
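The 7-Zip step is simple enough to be worth showing. From the command line it looks something like this – a sketch, with the paths being from my machine and the 48MB volume size chosen to stay safely under the per-file upload limit:

```shell
:: Compress the kernel dump and split the archive into 48MB volumes
:: (produces dump.7z.001, dump.7z.002, ...)
7z a -v48m dump.7z C:\Windows\MEMORY.DMP

:: The recipient extracts from the first volume; 7-Zip
:: picks up the remaining volumes automatically
7z x dump.7z.001
```

Each volume can then be uploaded (or e-mailed) separately, which is all SkyDrive needed.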
“Thanks to @stufox for helping me fix the BSOD on my Hyper-V R2 server. Turned out to be an Intel device driver issue – I will blog details”
It’s good to know that Hyper-V was not at fault here: sure, it shows that a rogue device driver can bring down a Windows system, but that’s hardly breaking news – the good thing about the Hyper-V architecture is that I can easily update network device drivers. And, let’s face it, I was running enterprise-class software on a workgroup server with some old, unsupported hardware – you could say I was asking for trouble…