I just wasted 2 days (one of which was on my weekend), and a lot of sleep, trying to work out why I couldn’t upgrade the Windows 2000 server which looks after my domain, DHCP, RIS, SUS and a whole load of other bits at home.
Every time I tried to run Windows Server 2003 setup it seemed to hang – and everything else was pretty slow too. I had to launch control panel applets using their .cpl filenames (e.g. appwiz.cpl for the Add or Remove Programs applet) and services would not stop cleanly.
I decided that my system was badly broken and quickly built a virtual machine on another piece of hardware, promoting that to a domain controller to provide a live backup of Active Directory. As in-place upgrades weren’t working, I resigned myself to the fact that I was going to have to migrate everything to the virtual server and then rebuild the original box. First, though, I wanted to cleanly remove the original domain controller from the directory.
Every time I ran the Active Directory installation wizard (dcpromo.exe) it failed – usually with the following error.
Active Directory Installation Failed
The operation failed because:
Failed to prepare for or remove the sysvol replication “The file replication service cannot be stopped.”
(This was even though logged events with IDs 13502 and 13503 suggested that the FRS had indeed stopped.)
Microsoft knowledge base article 332199 led me to try the dcpromo /forceremoval command but that failed in exactly the same way. I ran dcdiag /s:localhost on each server to look for any issues, checked that each server could ping the other one, that net view \\servername returned a list of shares, and that all required DNS entries were present. I checked the DNS settings (to make sure that each server was using itself as the primary DNS server and the other domain controller as a secondary) and restarted just to be sure but all to no avail.
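For anyone wanting to repeat those checks without typing them out each time, here is a rough Python sketch that builds the three command lines mentioned above (dcdiag, ping and net view) and optionally runs them. It assumes Python is available on the server, which was not part of my setup; the function names and the server names in the usage note are made up for illustration. It does not cover the DNS record check, which I did by hand.

```python
import subprocess


def connectivity_checks(local_server, other_server):
    """Build the argument lists for the basic checks described in the post."""
    return [
        ["dcdiag", "/s:" + local_server],      # directory diagnostics against this server
        ["ping", "-n", "1", other_server],     # one echo request (Windows ping syntax)
        ["net", "view", "\\\\" + other_server],  # list the shares on the other server
    ]


def run_checks(commands):
    """Run each command and collect its exit code and output.

    Exit codes are only a rough signal here; read the output as well.
    """
    results = {}
    for argv in commands:
        completed = subprocess.run(argv, capture_output=True, text=True)
        results[" ".join(argv)] = (completed.returncode, completed.stdout)
    return results
```

Calling run_checks(connectivity_checks("localhost", "dc2")) on each domain controller, with dc2 swapped for the real name of the other server, would run all three in one go.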
To cut a long story short, I found the answer purely by fluke. I couldn’t get the DHCP server service to stop cleanly (to let me migrate the database to my virtual machine) so I did a Google search for “windows services hang on stop”. This turned up a TechRepublic thread titled “APC Java issues cause services to hang”. I realised that I did have an APC UPS attached to the server, and that I was using a version of PowerChute Business Edition (PBE) that had been sitting there happily for a couple of years (v6.2.2). I hadn’t upgraded to 7.x as recommended by APC knowledge base article 7202 because APC had never e-mailed me to notify me of a problem, and services that aren’t broken (and that don’t have an inbuilt patching mechanism) generally get left well alone on my systems!
Lo and behold, the APC services had hung on startup and there were various events logged with ID 7022 (the APC PBE Agent service hung on starting). I disabled both the APC PBE client and server services by using the registry (as the services console was inoperable) to locate HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\servicename\ and set Start to 0x00000004 for disabled (0x00000002 is automatic and 0x00000003 is manual). I then restarted the server and had the fastest boot sequence in days! My Windows installation was responsive again and I was able to remove the offending applications in a few short clicks.
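I made the registry change by hand in the registry editor, but the same edit could be scripted. The following is only a sketch, assuming Python and its standard winreg module are available on the machine and that it is run with administrative rights; the function names are mine, and the service name in the usage note is a placeholder rather than the real APC service name.

```python
import sys

# Subkey under HKEY_LOCAL_MACHINE that holds one key per service.
SERVICES_KEY = r"SYSTEM\CurrentControlSet\Services"

# Start values as described in the post.
START_TYPES = {"automatic": 2, "manual": 3, "disabled": 4}


def service_key_path(service_name):
    """Return the registry subkey for a service, relative to HKEY_LOCAL_MACHINE."""
    return SERVICES_KEY + "\\" + service_name


def set_start_type(service_name, start_type):
    """Write the Start value for a service; takes effect at the next restart."""
    value = START_TYPES[start_type]  # raises KeyError for an unknown start type
    if sys.platform != "win32":
        raise OSError("the registry is only available on Windows")
    import winreg  # imported here so the helpers above still load elsewhere

    with winreg.OpenKey(
        winreg.HKEY_LOCAL_MACHINE,
        service_key_path(service_name),
        0,
        winreg.KEY_SET_VALUE,
    ) as key:
        winreg.SetValueEx(key, "Start", 0, winreg.REG_DWORD, value)
    return value
```

Something like set_start_type("ExampleService", "disabled") would then set Start to 4 for that (hypothetical) service, which is the change I made for each of the two APC services.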
My problems were nothing to do with Active Directory, DNS or even Windows – they all boiled down to an expired Sun Java Runtime Environment (JRE) certificate and sloppy coding from APC which meant that if their services hung, then so did all subsequent ones. I’ve never been a fan of Java applications on Windows – generally they are slow and have a poor user interface – and this experience has done nothing to change my mind.
Once the APC PBE agent, client and server had been removed, I was able to successfully (and cleanly) demote the original domain controller, avoiding the steps in Microsoft knowledge base article 216498 for removing data left in the directory after an unsuccessful demotion. Having migrated all the services to my virtual machine, though, I decided to go ahead and perform a clean installation of Windows on the original hardware anyway. I’m currently mid-way through patching the rebuilt server but I’m so glad that P McGrath from Rocky Mount, VA posted his experience on TechRepublic and Google did its thing.
Remind me again – how did we ever manage to find things out before we had the web?