I was working for a government agency that was running a mission-critical application on a fault-tolerant VAX system. It was apparently one of only two fault-tolerant VAX systems in Canada. I was not a VAX/VMS expert by any means, but I could handle the basic functions, such as booting and minor configuration changes. I had scheduled some down-time for this system with the user community in order to apply a Y2K patch. Every detail was planned out, including anticipated down-time.
At the scheduled date/time (very late one weekend evening) I connected remotely to the VMS system and applied the Y2K patch. I then typed in the commands to bring the system down and back up. Everything shut down fine; however, upon reboot I could not re-establish a remote connection. I rushed into the server room, and to my horror I saw dozens of error messages on the console screen. Having little experience with VMS, I did not know where to begin in terms of troubleshooting.
I reviewed my one book on VMS administration and figured out that my boot files were completely messed up, and would have to be rebuilt. I decided to call DEC technical support in Canada, and was passed to their United States support line, since the Canadian support had little experience with fault-tolerant VAX systems. The U.S. support line told me exactly what I needed to modify in order to get the system to boot properly. Unfortunately, the console keyboard was not working: I could not type ANYTHING at the console. I asked the support person what could possibly be going on. They had no clue and assumed it might be a hardware problem, which would have to be dealt with through a different support line, and any required parts would take more than a day to ship from a U.S. depot.
I struggled to troubleshoot the hardware problem by following the instructions of the tech support person. Nothing seemed to resolve the issue of the inoperative keyboard. After a few hours the user community was getting nervous, and I was getting tired (since I thought this would only be a 20-minute process, and I had now been awake for almost 24 hours solid).
Things get a bit fuzzy after that, but I do recall having a total fury meltdown in the server room, cursing up a storm because this damn server would not boot and the stupid keyboard would not operate. I sat on the floor in front of the servers and stared at the componentry inside the case. I then noticed a tiny switch which I think was labelled with the word “keyboard”. At this stage I was trying anything and everything (I figured I couldn’t break it any worse than it already was!) so I decided to throw the switch and see what happened. To my amazement the server’s keyboard suddenly operated normally.
This made me even angrier, since the tech support people never mentioned the existence of this tiny “hidden” switch. I called them, and fortunately they stayed on the line as I ranted and raved about their lack of knowledge of this switch. They then proceeded to step me through the boot file rebuild process, and I eventually got the system up and running again.
After sleeping for a few hours, I phoned some people within my organization asking them what could have happened to cause the boot file corruption. One of my contacts casually mentioned that they had copied some start-up files from a different VAX server to the server I was working on, and must have forgotten to move them back when they were finished their tasks. I was in too much shock to say anything, and they quickly hung up.
Rolf M. Gitt, Toronto