Failover

Failover is what happens when a primary system fails and traffic automatically switches to a backup. For game servers, failover is the difference between a provider outage causing extended downtime and players noticing nothing at all.

Why failover matters at launch

Launch day is the worst time to discover your infrastructure has a single point of failure. Player counts spike unpredictably, and the load itself can expose weaknesses in hardware, networking, or third-party providers. A studio that depends on a single cloud provider has no fallback if that provider has a regional outage — which all of them do, periodically.

Manual vs. automatic failover

Manual failover requires an engineer to detect the problem, decide on a course of action, and execute it — typically taking 15 to 60 minutes even with a practiced runbook. Automatic failover is handled by the infrastructure layer itself: the system detects the failure and reroutes within seconds, before most players have noticed a problem.

For live multiplayer games, automatic failover is the only viable approach. Human response time, measured in minutes, is too slow to prevent significant player impact.
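The detection step behind automatic failover can be sketched as a probe counter. This is a minimal illustration, not any provider's actual implementation: the class name, the thresholds, and the probe interval are all assumptions chosen for the example. It shows why detection takes seconds rather than minutes: a target is declared failed after a few consecutive missed health probes.

```python
from dataclasses import dataclass


@dataclass
class FailureDetector:
    """Marks a target unhealthy after N consecutive failed probes.

    Hypothetical policy: both thresholds are assumed values for illustration.
    """
    failure_threshold: int = 3   # consecutive failures before failing over
    recovery_threshold: int = 2  # consecutive successes before trusting it again
    healthy: bool = True
    _failures: int = 0
    _successes: int = 0

    def record(self, probe_ok: bool) -> bool:
        """Feed one probe result and return the current health verdict."""
        if probe_ok:
            self._successes += 1
            self._failures = 0
            if not self.healthy and self._successes >= self.recovery_threshold:
                self.healthy = True
        else:
            self._failures += 1
            self._successes = 0
            if self.healthy and self._failures >= self.failure_threshold:
                self.healthy = False
        return self.healthy


def detection_time_seconds(probe_interval: float, failure_threshold: int) -> float:
    """Approximate worst-case time from failure to detection (ignores probe timeouts)."""
    return probe_interval * failure_threshold
```

With an assumed 2-second probe interval and a threshold of 3, detection_time_seconds(2.0, 3) gives 6.0 seconds. Requiring several consecutive failures, and several consecutive successes before recovery, keeps a single dropped probe from triggering a reroute.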

How multi-provider infrastructure enables failover

The most robust approach to failover is running infrastructure across multiple providers simultaneously. If one provider has issues in a region, the orchestrator can immediately place new sessions on an alternative provider in the same or a nearby region. Existing sessions continue unaffected on their original host; only new session requests are rerouted.
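The placement rule described above can be sketched as follows. This is an illustration under stated assumptions, not Gameye's orchestrator: the Pool type, the provider and region labels, and the nearby-region map are all hypothetical. The function only chooses where a new session goes, which mirrors the point that existing sessions stay on their original host.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Pool:
    provider: str    # hypothetical provider label
    region: str      # hypothetical region label
    healthy: bool    # latest verdict from health checks
    free_slots: int  # remaining session capacity


def place_session(pools: list[Pool], region: str,
                  nearby: dict[str, list[str]]) -> Optional[Pool]:
    """Pick a pool for a NEW session; running sessions are never moved."""
    # Try the requested region first, then its neighbours in the given order.
    for candidate_region in [region, *nearby.get(region, [])]:
        for pool in pools:  # list order expresses provider preference
            if (pool.region == candidate_region
                    and pool.healthy and pool.free_slots > 0):
                return pool
    return None  # no healthy capacity anywhere: caller must queue or reject
```

If the preferred provider in a region is marked unhealthy, the loop falls through to the next provider in the same region, and only then to a nearby region.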

Gameye runs multi-provider infrastructure in every region — bare metal from Gcore and OVHCloud, with cloud and edge capacity available for burst. If one provider reports problems, session placement shifts to backup providers automatically. This is the architecture behind Gameye’s 99.99% uptime SLA.
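To put the 99.99% figure in context, the arithmetic below converts an uptime percentage into a downtime budget. The measurement window is an assumption, since the text does not state the period over which the SLA is measured; both a year and a 30-day month are shown.

```python
MINUTES_PER_YEAR = 365 * 24 * 60     # 525,600, ignoring leap years
MINUTES_PER_30_DAYS = 30 * 24 * 60   # 43,200


def downtime_budget_minutes(uptime_percent: float, window_minutes: int) -> float:
    """Minutes of downtime allowed in the window at the given uptime level."""
    return window_minutes * (1 - uptime_percent / 100)


yearly = downtime_budget_minutes(99.99, MINUTES_PER_YEAR)      # about 52.56 minutes
monthly = downtime_budget_minutes(99.99, MINUTES_PER_30_DAYS)  # about 4.32 minutes
```

Under these assumptions, a single manual failover at the 15 to 60 minute range quoted earlier would exceed the monthly budget and could consume the entire yearly budget, which is why this level of uptime depends on automatic rerouting.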

See also: Downtime · Auto-scaling · Bare metal servers


Frequently asked questions

What is failover in gaming? Failover is when a primary game server or provider fails and the infrastructure automatically reroutes new sessions to backup servers in the same or a nearby region. In-progress sessions on hosts that are still running continue unaffected on their original host; players joining after the failure connect to the backup with no visible outage.

What is the difference between manual and automatic failover? Manual failover requires an engineer to detect the problem, decide on a response, and execute it — typically 15–60 minutes even with a practiced runbook. Automatic failover is handled by the infrastructure layer itself and reroutes within seconds, before most players have noticed a problem. For live multiplayer games, automatic failover is the only viable option.

How do game server providers implement failover? The most reliable approach is running infrastructure across multiple providers simultaneously. When one provider has issues in a region, the orchestrator places new sessions on an alternative provider in the same or a nearby region. Gameye does this across bare-metal providers (Gcore, OVHCloud) in every deployment region: if one has problems, session placement shifts to backup providers automatically.

What is the difference between failover and load balancing? Load balancing distributes session requests across servers for efficiency under normal conditions. Failover is triggered by failure — when a server or provider goes down, traffic is rerouted to backup capacity. They are complementary: load balancing optimises healthy-state distribution; failover handles degraded-state recovery.
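The two roles can be seen side by side in a small sketch. The function and server names are hypothetical, and round-robin is used only as one simple load-balancing policy among many. Filtering out unhealthy servers is the failover step; rotating across whatever remains is the load-balancing step.

```python
from itertools import count
from typing import Callable, Optional


def make_picker(servers: list[str]) -> Callable[[set[str]], Optional[str]]:
    """Return a picker that rotates over the servers currently reported healthy."""
    ticket = count()  # monotonically increasing request counter

    def pick(healthy: set[str]) -> Optional[str]:
        # Failover: servers reported as failed are dropped from consideration.
        candidates = [s for s in servers if s in healthy]
        if not candidates:
            return None  # nothing healthy left to route to
        # Load balancing: spread requests evenly over the healthy candidates.
        return candidates[next(ticket) % len(candidates)]

    return pick
```

In the healthy state every server takes its turn; when one drops out of the healthy set, requests continue to rotate over the rest without any change to the calling code.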
