Keeping IT systems and components operational at all times, with as little failure as possible, is vital for any business. When evaluating how to increase availability and reduce downtime, solutions are commonly categorized as either a Fault Tolerant solution or a Clustering (High Availability) solution.
High availability infrastructure should not just be an option, but a necessity. In our previous post, we discussed why it’s important to have a high availability infrastructure. For this post, we’ll discuss the two solutions for availability and which one actually provides higher availability.
What is Fault Tolerance?
Fault Tolerance relies on specialized hardware or software to detect a hardware fault and instantaneously switch to a redundant hardware component—whether the failed component is a processor, memory board, power supply, I/O subsystem, or storage subsystem. (Source)
It traditionally consists of a pair of tightly coupled systems that provide redundancy. The two systems behave like separate machines mirrored in an active-active configuration. When the main system has a hardware failure, the second system takes over automatically, without users or systems administrators noticing the switch, so there is zero downtime.
What is Clustering (High Availability)?
Clustering (High Availability) views availability not as a series of replicated physical components, but rather as a set of system-wide, shared resources that cooperate to guarantee essential services. (Source)
Clustering combines software with industry-standard hardware to minimize downtime. It does this by quickly restoring vital services whenever a system, component, or application fails. Recovery is not instantaneous, but services are restored rapidly, often in less than a minute.
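To see why recovery in a cluster takes seconds rather than being instantaneous, it helps to split the outage into two parts: the time the standby needs to notice the failure, and the time it needs to restart services. The sketch below is only an illustration under assumed values. The names `ClusterConfig` and `worst_case_outage_s`, the heartbeat mechanism, and all of the timings are hypothetical and do not describe any particular clustering product.

```python
from dataclasses import dataclass


@dataclass
class ClusterConfig:
    """Hypothetical failover settings, chosen for illustration only."""
    heartbeat_interval_s: float  # how often the standby checks the primary
    missed_beats_allowed: int    # consecutive misses before declaring a failure
    service_restart_s: float     # time to bring services up on the standby


def worst_case_outage_s(cfg: ClusterConfig) -> float:
    """Approximate worst-case outage: detection delay plus restart time.

    Assumes the primary fails right after a successful heartbeat, so the
    standby must wait for every allowed miss before it reacts.
    """
    detection_s = cfg.heartbeat_interval_s * cfg.missed_beats_allowed
    return detection_s + cfg.service_restart_s


# Assumed values: 2 s heartbeat, 3 missed beats, 30 s service restart.
example = ClusterConfig(heartbeat_interval_s=2.0,
                        missed_beats_allowed=3,
                        service_restart_s=30.0)
print(f"Worst-case outage: {worst_case_outage_s(example):.0f} s")
```

With these assumed numbers the outage stays under a minute, which matches the recovery window described above. Shortening the heartbeat interval reduces detection time, but usually at the cost of more false alarms.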
Fault Tolerance vs Clustering (High Availability)
‘NO’ downtime is better than ‘SOME’ downtime, which means Fault Tolerance is the solution that provides higher availability. Fault Tolerance offers a guaranteed 99.9999% availability, compared with around 99% availability for Clustering.
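Availability percentages are easier to compare when converted into downtime per year. The short calculation below does that for the two figures quoted above. It assumes a 365-day year and treats availability as the fraction of the year the service is up. The function name `downtime_seconds_per_year` is just an illustrative helper.

```python
SECONDS_PER_YEAR = 365 * 24 * 60 * 60  # assumes a non-leap year


def downtime_seconds_per_year(availability_percent: float) -> float:
    """Expected unavailable seconds in one year for a given availability."""
    return (1 - availability_percent / 100) * SECONDS_PER_YEAR


for label, pct in [("Fault Tolerance", 99.9999), ("Clustering", 99.0)]:
    seconds = downtime_seconds_per_year(pct)
    print(f"{label}: {pct}% -> {seconds:,.1f} s/year "
          f"({seconds / 3600:.2f} hours)")
```

Under these assumptions, 99.9999% works out to roughly 31.5 seconds of downtime per year, while 99% allows about 87.6 hours, or a little over three and a half days.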
In terms of cost-effectiveness, Fault Tolerance is still the winner. According to reports, 67% of best-in-class organizations use fault-tolerant servers and software fault-tolerant solutions to provide high availability. While the purchase cost of this type of hardware is considerably high, the complexity of implementation and the level of human intervention required after a failure are very low. As a result, operational and management costs are still considered low on average.