A UPS Story

Maintenance Fail

Several years ago we allowed the maintenance on our data center UPS to lapse. We recently decided to bring it back under maintenance coverage, which first required a re-certification visit. Yes, it never should have been out of maintenance, but that’s a different story. Anyway, when the technician came out to re-certify our system, they found a variety of problems.

The simple item was a controller hardware upgrade for the UPS itself, which is just a pair of hot-swap modules. The PDU also needed a hardware upgrade, and that one was going to cause a 2 hour downtime for the whole data center. That was an unpopular revelation. The item that surprised me, and that I found most interesting, was the recommendation to replace all the batteries.

This UPS is highly modular and allows you to add and remove power modules and battery modules with an easy plug-and-play hot-swap operation. Unfortunately, as nifty as this sounds, there are caveats to having this level of flexibility.

Resistance of Lead

In any UPS, the batteries need to be of a similar age. As batteries age, their internal resistance increases. This is not a problem if all the batteries are of a similar age. However, if the batteries are several years apart in age, the newest ones will have the least resistance. This means that when the UPS goes to battery power, the newest batteries will handle more than their fair share of the load. This causes several issues: it reduces the life of the newest batteries, causes them to run hotter than normal, and causes the run time estimate to overstate the actual available run time.
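To put rough numbers on the "more than their fair share" point, here is a minimal sketch of how current divides among battery strings wired in parallel. It assumes every string has the same open-circuit voltage, so each string's current is proportional to the inverse of its internal resistance. The resistance values and the install-year labels are made up for illustration and are not measurements from our UPS. Real batteries also differ in state of charge and temperature, which this ignores.

```python
def share_of_load(resistances_ohm):
    """Return the fraction of total current carried by each parallel string.

    Assumes equal open-circuit voltage on every string, so current
    divides in proportion to conductance (1 / internal resistance).
    """
    conductances = [1.0 / r for r in resistances_ohm]
    total = sum(conductances)
    return [g / total for g in conductances]


# Hypothetical internal resistances (ohms), keyed by install year.
# Older strings are given higher resistance; the numbers are illustrative only.
strings = {"2006": 0.040, "2008": 0.030, "2010": 0.020}

shares = share_of_load(list(strings.values()))
fair_share = 1.0 / len(strings)

for year, share in zip(strings, shares):
    print(f"{year} string: {share:.0%} of the load (fair share is {fair_share:.0%})")
```

With these made-up values the newest string ends up carrying noticeably more than a third of the load while the oldest carries noticeably less, which is the imbalance the technician was warning about.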

When we originally installed this UPS in 2006, it shipped with four rows of modular batteries and an empty modular battery cabinet. That gave us about 45 minutes of run time. As our environment grew, we found it necessary to add batteries. We added a few battery strings about every two years to keep our run time in the 30 minute range. While I suspect that there may be some fine print somewhere in the UPS documentation that says this is not a good idea, none of us ever saw it. The system was marketed as one that would grow with your needs. While this is true, you do need to be careful with the batteries, and that was never explained to us.

I should note that we do have a generator, so this level of run time is either underkill or overkill, depending on how one looks at it. If the generator kicks in, we only need about 15 seconds of run time. If the generator does not kick in, 30 minutes probably isn’t enough time for us to do anything about it unless there is a very simple fix. We are not staffed 24/7, so if the power goes out in the middle of the night and the generator does not work, it’s definitely not getting fixed in under 30 minutes.

So this put the total to re-certify the system at around $25k (mostly for new batteries) and a 2 hour downtime. Then we’d still have to pay a few grand more for the maintenance agreement.

A Better Idea

We recently added 30 fairly hefty servers to our data center, and that put the UPS at 80% utilization. Given the costs to maintain the existing UPS and the fact that it had hit 80% utilization, we asked to have funds in next year’s budget to upgrade it. This made a lot more sense, especially since our fiscal year runs with the calendar year and we were at the beginning of the budget process for 2011. This was all back in September. In October, management decided we should look into doing the upgrade with this year’s remaining budget. No problem, how hard could it be?

You’ll find out.

FIN