What is Failover? How It Works and Why It Matters

24 June 2025 By BOBcloud Team

What is Failover?

Failover is the process of switching from a failed or impaired primary system to a standby system so that operations can continue with minimal interruption. The standby takes over the primary's workload and keeps the service available while the primary is down.

Failover can be automatic — triggered by monitoring software detecting a failure and switching without human intervention — or manual, requiring an administrator to initiate the switch.

Why Failover Matters

Modern business operations depend on continuous system availability. Applications, databases, and network services that were once considered non-critical have become integral to day-to-day operations. When these systems fail, the cost is immediate and measurable: lost transactions, idle staff, missed SLAs, and damaged client relationships.

Failover is one of the primary mechanisms for achieving high availability — the design goal of ensuring systems remain operational and accessible even when individual components fail.

Types of Failover

Automatic Failover

Automatic failover uses monitoring and health-check mechanisms to detect when a primary system has become unavailable, and switches traffic or workload to the standby system without requiring manual intervention. This minimises downtime — in well-designed systems, failover can complete in seconds.

Automatic failover is used in high-stakes environments where even brief outages are unacceptable: database clusters, load-balanced web applications, and network infrastructure.
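
The detection logic behind automatic failover can be sketched in a few lines of Python. This is a minimal illustration, not any particular product's implementation: the `FailoverMonitor` class, the node names, and the threshold of three consecutive failed health checks are all assumptions chosen for the example. Requiring several failures in a row is a common way to avoid switching on a single dropped check.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class FailoverMonitor:
    """Tracks health-check results and decides when to switch to the standby."""
    failure_threshold: int = 3     # hypothetical: consecutive failures before switching
    active: str = "primary"
    consecutive_failures: int = 0

    def record(self, healthy: bool) -> str:
        """Record one health-check result and return the node now serving traffic."""
        if self.active != "primary":
            return self.active     # already failed over; failback is a separate step
        if healthy:
            self.consecutive_failures = 0
        else:
            self.consecutive_failures += 1
            if self.consecutive_failures >= self.failure_threshold:
                self.active = "standby"
        return self.active


def run_checks(monitor: FailoverMonitor, results: Iterable[bool]) -> list[str]:
    """Feed a sequence of health-check results through the monitor."""
    return [monitor.record(result) for result in results]


monitor = FailoverMonitor(failure_threshold=3)
history = run_checks(monitor, [True, False, False, True, False, False, False, True])
print(history)
```

In this run the two isolated failures are ignored, and the switch happens only on the third consecutive failure. The monitor then stays on the standby even when the primary reports healthy again, because returning to the primary is a deliberate failback step rather than an automatic one in this sketch.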

Manual Failover

Manual failover requires an administrator to detect the failure and initiate the switch. This introduces delay: the outage lasts at least as long as it takes a person to notice the problem, assess it, and execute the procedure. Manual failover is appropriate where the consequences of an automatic switch could be worse than a brief outage, or where the standby environment requires preparation before it can take load.

Planned Failover

A planned failover is a scheduled switch to the standby system, typically for maintenance, upgrades, or testing. Unlike unplanned failover (responding to an actual failure), planned failover is executed in controlled conditions with time to prepare and verify.

Regular planned failover testing is one of the most important practices in high availability environments — it verifies that the failover mechanism works, that staff know the procedure, and that the standby system can handle production load.

Failover vs Failback

Failover is the switch from primary to standby. Failback is the return to the primary once it has been repaired or restored. Failback is often overlooked in disaster recovery planning, but it is equally important: running on a standby system indefinitely introduces its own risks, and the process of returning to primary needs to be as carefully planned as the initial failover.

Failover in Different Contexts

Database Failover

Database clusters use replication to keep a standby database in sync with the primary. If the primary database server fails, the standby is promoted to primary. Applications connect to a virtual IP or DNS name that points to whichever node is currently primary, so the switch requires no application reconfiguration, although connections open at the moment of failure are dropped and must be re-established.

Technologies: SQL Server Always On Availability Groups, MySQL Group Replication, PostgreSQL with Patroni.
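
The idea of a stable endpoint that follows the primary can be shown with a toy model. The `DatabaseCluster` class, the endpoint name `db.example.internal`, and the IP addresses below are hypothetical, and the code does not reflect how any of the technologies listed above work internally. It only shows why the application's connection target does not change when the standby is promoted.

```python
class DatabaseCluster:
    """Toy cluster: one stable endpoint name mapped to whichever node is primary."""

    def __init__(self, endpoint: str, primary: str, standby: str) -> None:
        self.endpoint = endpoint
        self.primary = primary
        self.standby = standby

    def resolve(self, name: str) -> str:
        """Return the address the endpoint points at (a stand-in for DNS or a virtual IP)."""
        if name != self.endpoint:
            raise KeyError(name)
        return self.primary

    def promote_standby(self) -> None:
        """Swap roles: the standby becomes primary, the old primary becomes the standby."""
        self.primary, self.standby = self.standby, self.primary


# Hypothetical endpoint name and addresses, for illustration only.
cluster = DatabaseCluster("db.example.internal", primary="10.0.0.11", standby="10.0.0.12")
before = cluster.resolve("db.example.internal")
cluster.promote_standby()
after = cluster.resolve("db.example.internal")
print(before, "->", after)
```

The application keeps asking for the same name throughout; only the address behind it changes.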

Server and VM Failover

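In virtualised environments, failover typically involves restarting a virtual machine on a different host if the original host fails. Platforms such as VMware vSphere and Microsoft Hyper-V include built-in capabilities for this: vSphere HA restarts VMs on the surviving hosts in a cluster, Hyper-V does the same when its hosts are part of a failover cluster, and Hyper-V Replica maintains a replicated copy of a VM on another host that an administrator can fail over to.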

Network Failover

Network failover involves switching to a backup internet connection, router, or network path if the primary fails. This can be automatic (using BGP routing or link aggregation) or manual.
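
Preference-based path selection, the principle behind automatic network failover, can be sketched as follows. This is a simplified illustration and not an implementation of BGP or link aggregation: the `Link` type, the link names, and the convention that a lower priority number is preferred are assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Link:
    name: str
    priority: int  # lower number = more preferred (hypothetical convention)
    up: bool       # result of the latest link health check


def select_link(links: list[Link]) -> Optional[Link]:
    """Return the most preferred link that is currently up, or None if all are down."""
    healthy = [link for link in links if link.up]
    return min(healthy, key=lambda link: link.priority, default=None)


# The preferred fibre link is down, so traffic moves to the backup link.
links = [Link("fibre", 1, up=False), Link("4g-backup", 2, up=True)]
chosen = select_link(links)
print(chosen.name if chosen else "no path available")
```

Because the selection is re-evaluated from the current link states, traffic returns to the preferred link once it is healthy again.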

Cloud Failover

Cloud environments support failover across availability zones and regions. Well-architected cloud applications are designed to tolerate the loss of an entire availability zone without service interruption.
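
Tolerating the loss of a zone has a capacity cost that is easy to work out. The sketch below assumes load is spread evenly across zones and measured in identical instances; the function name and the figures are illustrative only.

```python
import math


def instances_per_zone(peak_instances: int, zones: int, zones_lost: int = 1) -> int:
    """Instances each zone must run so the surviving zones can still carry peak load."""
    surviving = zones - zones_lost
    if surviving < 1:
        raise ValueError("at least one zone must survive")
    return math.ceil(peak_instances / surviving)


# Peak load needs 12 instances in total.
three_zone = instances_per_zone(12, zones=3)  # 6 per zone, 18 in total
two_zone = instances_per_zone(12, zones=2)    # 12 per zone, 24 in total
print(three_zone * 3, two_zone * 2)
```

With three zones the deployment carries 50% spare capacity to survive one zone failing; with two zones it needs 100%. Spreading across more zones lowers the overhead of zone-level failover.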

Failover and Backup: Different Things

Failover and backup are frequently confused, but they serve different purposes:

Failover maintains availability — it keeps systems running when the primary fails. It does not protect against data loss caused by software errors, accidental deletion, or ransomware. A standby system that replicates in real time from a primary will faithfully replicate a ransomware infection.

Backup protects data integrity — it provides the ability to recover from data loss or corruption at a specific point in time. It does not provide continuous availability; restore takes time.

A complete resilience strategy needs both: failover for availability, backup for data protection.
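
The difference can be made concrete with a toy model. The `Site` class and file names below are hypothetical; the point is only that a replica mirrors every write, including a destructive one, while a point-in-time backup taken earlier is unaffected.

```python
import copy


class Site:
    """Toy model of a primary with a real-time replica and point-in-time backups."""

    def __init__(self, files: dict[str, str]) -> None:
        self.primary = dict(files)
        self.replica = dict(files)
        self.snapshots: list[dict[str, str]] = []

    def write(self, name: str, content: str) -> None:
        self.primary[name] = content
        self.replica[name] = content  # replication mirrors every change, good or bad

    def take_backup(self) -> None:
        self.snapshots.append(copy.deepcopy(self.primary))

    def restore(self, index: int) -> dict[str, str]:
        return copy.deepcopy(self.snapshots[index])


site = Site({"invoices.db": "good data"})
site.take_backup()
site.write("invoices.db", "ENCRYPTED")  # simulated ransomware overwrites the file

print(site.replica["invoices.db"])     # the standby holds the damaged copy too
print(site.restore(0)["invoices.db"])  # the earlier backup still holds the good copy
```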

For MSPs: Failover in Client Environments

MSPs managing client infrastructure need to assess failover requirements as part of service design. Key questions:

  • What systems are critical enough to require automatic failover?
  • What is the client's recovery time objective (RTO), that is, how long can they afford to be down?
  • Is the standby system sized to handle full production load?
  • Has failover been tested recently?

For most SMB clients, full automatic failover for every system is neither affordable nor necessary. Prioritising the most critical systems — primary database, email, line-of-business application — and designing failover for those, while accepting slightly longer recovery times for less critical systems, is the pragmatic approach.
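
One way to make this prioritisation explicit is to map each workload's RTO to a recovery approach. The sketch below is an assumption-laden illustration: the `Workload` type, the 15-minute and 240-minute cut-offs, and the example RTO figures are hypothetical and would need to be agreed with each client.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Workload:
    name: str
    rto_minutes: int  # how long the client can tolerate this system being down


def recovery_approach(workload: Workload,
                      automatic_below: int = 15,
                      manual_below: int = 240) -> str:
    """Map an RTO to a recovery approach using hypothetical cut-offs."""
    if workload.rto_minutes < automatic_below:
        return "automatic failover"
    if workload.rto_minutes < manual_below:
        return "manual failover"
    return "restore from backup"


# Example RTO values are illustrative, not recommendations.
workloads = [
    Workload("primary database", 5),
    Workload("email", 60),
    Workload("file archive", 1440),
]
plan = {w.name: recovery_approach(w) for w in workloads}
print(plan)
```

Writing the tiers down this way also gives a simple record to revisit when a client's tolerance for downtime changes.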

BOBcloud supports backup and cloud-based disaster recovery for MSPs managing critical client workloads. Find out more about our MSP backup platform.