I’ve been working on an intriguing (and frustrating) issue for a few weeks now and a couple of days back we finally resolved the issue.
My client has an MS-DOS (FoxPro 2.6a database) application running within an NTVDM on Windows XP. Every now and then, the application will hang – seemingly randomly. Windows XP did have service pack 2 applied, but the issue also occurs on service pack 1 PCs (I didn’t try the RTM version). Only the application hangs – it is possible to terminate the NTVDM process and carry on working as normally.
Normal actions for troubleshooting MS-DOS applications in Windows XP were not helping to resolve the issue, but the software vendor managed to narrow the issue down to FoxPro waiting for input. Occasionally, the input does not timeout and return control to the calling program – it seems that this is the root cause of the NTVDM hang. Identifying this allowed them to construct a test program which polled for input, timing out every few seconds and would reliably hang an NTVDM at a seemingly random time, but always within an hour.
Using their test program on a variety of PCs, the software vendor found that the problem was related to the Intel hyper-threading technology (my client has standardised on a version of the IBM ThinkCentre M50 which includes a single 3GHz Intel Pentium 4 processor with HT technology). Whilst disabling hyper-threading is unlikely to result in any significant performance degradation (hyper-threading only provides an average 10-20% performance gain as most applications do not fit completely with the hyper-threading model), it was still considered by IBM, Microsoft, my client and myself as a tactical workaround, rather than a strategic fix.
After seeking advice from Microsoft, I ran the test program on a Compaq ProLiant DL380 G2 server with two Pentium III 1.26GHz processors and found that the issue is not limited to Windows XP and hyper-threading, but to both Windows XP and Windows Server 2003 when running with an ACPI Multiprocessor PC HAL. Turning off hyper-threading on the PCs was no longer good enough as we can expect to see multiple processor cores constructed on a single die in the near future, leading to a rise in the use of multi-threaded applications (the logical processor provided by the hyper-threading technology in the Intel Pentium 4 processor is simply a precursor to this).
So why does an MS-DOS application running within an NTVDM on a 32-bit version of Windows use multiple processors? The answer it seems is that although the MS-DOS application is not multi-threaded, modern versions of Windows are, and can allocate parts of the NTVDM process to any available processor. With that in mind we re-ran the test program with processor affinity set to use only CPU0 in Task Manager. The results were the same as disabling hyper-threading – no NTVDM hang! Obviously, setting processor affinity manually is not sustainable outside the test environment, and short of running the application on Windows Server 2003 Enterprise Edition (with the Windows System Resource Monitor to control processor affinity) we needed to find an alternative solution.
That solution came in the form of the
imagecfg.exe tool provided with the Microsoft Windows 2000 Resource Kit (supplement one). This can be used to edit an executable file and permanently set the processor affinity for a given application:
imagecfg -a 0x1 c:\windows\system32\ntvdm.exe command did the trick, although Windows File Protection/System File Checker quickly restored the original ntvdm.exe file so I needed to perform this on a copy of ntvdm.exe in a temporary folder, and then overwrite both c:\windows\system32\ntvdm.exe and c:\windows\system32\dllcache\ntvdm.exe.
Once updated, the NTVDM process runs on CPU0. Of course, this limits all programs under the control of the NTVDM subsystem but it is far more preferable to disabling logical or physical processors in the BIOS; however, as this is a change to an operating system file, it must be considered alongside the implementation of any service packs and/or hotfixes from now on. Reversing the change is simply a case of restoring the original ntvdm.exe file.