When something goes wrong, don’t panic. That’s the wrong reaction. We had a new and unusual failure mode in our datacenter UPS. It lost 40% of its capacity and was bouncing back and forth between bypass and battery. For the first few minutes, I panicked. I don’t think I’d ever had anything happen that had the potential to take out the entire datacenter. I’ve had situations that could have taken the whole network or a critical system down, but that’s different. That’s terribly inconvenient. This time, I envisioned everything getting fried, or maybe just turning off and corrupting all the disks and arrays. Silly worst-case-scenario thinking. If I had taken a deep breath and calmly assessed the situation, I would have realized that the system should just go to bypass, and that we’d have to hope there were no major blips from the grid while the problem was assessed.
After recovering my wits, I put the system into bypass and worked with our systems guys to start shutting down anything we could spare. I did some debugging with support, shut down the part that was causing the problems, and brought the system back online in a degraded but stable state.
I was very displeased with myself for allowing panic to set in. All it does is slow down your reaction to the situation at hand and prevent you from thinking clearly. It’s also somewhat embarrassing and less than professional. So keep calm, take a deep breath, and think through the problem.
FIN