Maintenance is crucial to keep a mission-critical data center reliable, but any work on live equipment might put the facility at risk. You can entrust certain hardware to in-house staff, but more complex systems that could cause damage should go to third-party service providers to avoid expensive errors or unexpected downtime.
Professional data center maintenance personnel know how to avoid unnecessary equipment shutdowns and bring hardware back online without incident. In-house personnel may be just as careful, but these tasks or components can be outside their regular routine or expertise.
Every data center is different, and you may not always get the same technicians. To ensure that you receive the same level of maintenance each time, have written checklists for technicians to follow and audit any service provider documentation. You can also double-check any certification requirements before you bring in third-party maintenance.
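One way to audit a written checklist after a visit is to record each item with a sign-off and list whatever was left incomplete. The following is a minimal sketch, not a prescribed format: the `ChecklistItem` fields, the `missing_items` helper and the sample tasks are all hypothetical names chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class ChecklistItem:
    """One line of a maintenance checklist (hypothetical structure)."""
    task: str
    done: bool = False
    technician: str = ""  # name of whoever signed off the task


def missing_items(checklist):
    """Return tasks that are not marked done or have no technician sign-off."""
    return [
        item.task
        for item in checklist
        if not (item.done and item.technician)
    ]


# Illustrative visit record; the tasks and names are made up.
visit = [
    ChecklistItem("Replace CRAC air filters", done=True, technician="A. Tech"),
    ChecklistItem("Check belt tension", done=True, technician=""),
    ChecklistItem("Record refrigerant level", done=False),
]

print(missing_items(visit))
```

Running the same audit after every visit gives a consistent record even when the technicians change.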
Keeping the data center cool
Air conditioners are mechanical devices, so they are prone to failure and require regular maintenance. In-house staff should routinely change filters, and facilities staff can maintain the chiller plant, cooling towers, economizers or dry coolers.
The computer room air conditioning units (CRACs) and computer room air handlers (CRAHs) on your computing floor are classified as precision air conditioners. Facilities staff can check belts and refrigerant levels, but the manufacturer's certified service personnel go beyond these components as part of their data center maintenance routines.
Modern units have sensors whose readouts a technician can analyze to detect impending failures. A small shaft vibration can be a critical indicator, but it is difficult for the untrained eye to pick up.
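To illustrate how a small change in a sensor readout can be picked out of routine data, here is a minimal sketch that compares later vibration readings against an early baseline. It is only an assumption about how such a check might look, not how any manufacturer's tooling works: the function name `vibration_alerts`, the baseline size, the three-sigma threshold and the sample values are all hypothetical.

```python
from statistics import mean, stdev


def vibration_alerts(readings, baseline_size=5, threshold_sigma=3.0):
    """Return indices of readings well above the baseline.

    The first `baseline_size` readings define normal behavior. A later
    reading is flagged when it exceeds the baseline mean by more than
    `threshold_sigma` sample standard deviations.
    """
    baseline = readings[:baseline_size]
    limit = mean(baseline) + threshold_sigma * stdev(baseline)
    return [
        index
        for index, value in enumerate(readings[baseline_size:], start=baseline_size)
        if value > limit
    ]


# Made-up shaft vibration samples in arbitrary units.
samples = [1.0, 1.1, 0.9, 1.0, 1.0, 1.05, 1.1, 1.6, 1.9]

print(vibration_alerts(samples))  # prints [7, 8]
```

A fixed baseline keeps the sketch short; a flagged reading here is a prompt to call a certified technician, not a diagnosis.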
Cooling systems are increasingly complex and now include direct drive fans and compressors with variable frequency drives that automatically adjust capacity. Plus, CRACs and CRAHs may be located around the data center's perimeter, packaged as in-row or overhead coolers, or mixed into cabinet rows.
Liquid cooling is becoming more common, particularly with rear door heat exchangers and even direct liquid-cooled processors. Integrated controls can tie all of these components together and require technicians to work next to computing systems, which brings increased risk during data center maintenance tasks.
Addressing power supplies and electrical systems
To detect overheating, a professional should perform an infrared scan of power systems annually. This requires opening and working on live equipment, so technicians must suit up to protect against potential arc flash. Working in these suits without causing a disruption takes experience.
Most uninterruptible power supplies (UPSes) use batteries, which are most likely to die when power fails and they're suddenly put under load. Maintaining and replacing any battery is a job for certified professionals, as open terminals are a serious hazard.
There are three main UPS battery types. Valve-regulated lead-acid batteries are the most common but generally last only three to five years. Lithium-ion batteries are the newest and are supposed to last considerably longer, but there are unknowns around material properties and lifespan, and some jurisdictions have outlawed them inside buildings. Flooded lead-acid batteries can last 25 years but need special room settings and regular acid level checks.
Battery monitoring is recommended. Some UPSes have built-in meters, and add-on hardware options are available.
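Alongside monitoring hardware, a simple age check against the lifespans quoted above can flag batteries that are due for a professional review. This is a rough sketch under stated assumptions: the `SERVICE_LIFE_YEARS` table, the `battery_status` function and the "review" and "replace" labels are hypothetical, and lithium-ion is deliberately left out because its lifespan is described as uncertain.

```python
from datetime import date

# (review_after_years, replace_after_years), based on the figures in the
# article: VRLA lasts roughly three to five years, flooded lead-acid can
# last 25 years. Treating these as thresholds is an assumption.
SERVICE_LIFE_YEARS = {
    "vrla": (3, 5),
    "flooded": (25, 25),
}


def battery_status(battery_type, installed, today):
    """Classify a battery by age alone: 'ok', 'review', 'replace' or 'unknown'."""
    if battery_type not in SERVICE_LIFE_YEARS:
        return "unknown"  # e.g. lithium-ion, where no lifespan is assumed
    age_years = (today - installed).days / 365.25
    review_after, replace_after = SERVICE_LIFE_YEARS[battery_type]
    if age_years >= replace_after:
        return "replace"
    if age_years >= review_after:
        return "review"
    return "ok"


today = date(2024, 1, 1)
print(battery_status("vrla", date(2020, 1, 1), today))
print(battery_status("vrla", date(2018, 1, 1), today))
print(battery_status("flooded", date(2010, 1, 1), today))
print(battery_status("lithium-ion", date(2022, 1, 1), today))
```

Age is a much cruder signal than a built-in meter or add-on monitor, so a check like this complements the hardware rather than replacing it.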
Flywheel UPSes are very reliable, but they must be dismantled at around 10 years for bearing replacements. That's a professional job. Motor generator sets and backup generators are mechanical and absolutely require routine maintenance.
You should test UPSes under actual load conditions, even if you are hesitant to pull the plug and see if the UPS works. Professionals bring load banks to simulate a full data center load so you can see if your UPS can support real-time loads or must be replaced.
Automatic transfer switches in generators require maintenance, but they are rarely addressed because they lack bypass switches. Without a bypass switch, you can't test an automatic transfer switch without transferring a live load, so it could be the most vulnerable piece in your power chain. This component should at least be part of your data center maintenance infrared scan.
Maintaining fire protection
Fire detection and suppression systems are risky to service, so any maintenance should be done by certified professionals. Whether water- or gas-based, they must be disabled during data center maintenance to prevent a false alarm or activation that shuts down the entire data center.
Gas-based systems must be checked for levels and pressures. Control system operation for both gas and pre-action systems must be verified. Detectors of all types must be tested to ensure they still operate correctly. Plus, the tiny holes in early smoke detection systems' aspirating tubes must be cleared of any dirt and obstructions. These particles alone could activate the highly sensitive systems.
Your emergency power off switch -- that dreaded "big red button" firefighters can use to instantly crash your entire data center -- must be disconnected before control function testing. Professionals must conduct this shut off and testing, as they have the certifications and know-how to safely check each fire protection system.
The only way to fully test a fire system is to actually activate it, which is not ideal or realistic. Testing and maintenance must be as close as possible to activation without actually starting the system -- which is a specialized skill set.
Keeping facilities clean
Without proper cooling, data center hardware shuts down for self-preservation. All too often, the cause is simply dirt and particle accumulation on the small filters and on internal heat sinks.
No matter how zealous you are about foot wipes, closed doors and dust control for facility work, particulate matter can still enter the facility on your clothing. Simply damp-mopping the floor does not keep a data center clean.
Professional data center cleaning services know how to clean under a raised floor without disrupting cooling and how to clean server filters without causing shutdowns, and they have specialized equipment to remove particles from all parts of the facility.
Most critical facilities contract for professional cleaning on a yearly basis as part of data center maintenance. Even if your organization doesn't have scheduled cleaning, you should at least have an initial cleaning for your data center to decide how often you require professional services.