Database load balancing

With database load balancing, read-only queries can be distributed across multiple PostgreSQL nodes to increase performance.

This documentation provides a technical overview on how database load balancing is implemented in GitLab Rails and Sidekiq.

Nomenclature

Host: Each database host. It could be a primary or a replica.
Primary: Primary PostgreSQL host that is used for write-only and read-and-write operations.
Replica: Secondary PostgreSQL hosts that are used for read-only operations.
Workload: a Rails request or a Sidekiq job that requires database connections.

Components

A few Ruby classes are involved in the load balancing process. All of them are in the namespace Gitlab::Database::LoadBalancing:

Host
LoadBalancer
ConnectionProxy
Session

Each workload begins with a new instance of Gitlab::Database::LoadBalancing::Session. The Session keeps track of the database operations that have been performed. It then determines if the workload requires a connection to either the primary host or a replica host.

When the workload requires a database connection through ActiveRecord, ConnectionProxy first redirects the connection request to LoadBalancer. ConnectionProxy requests either a read or read_write connection from the LoadBalancer depending on a few criteria:

Whether the query is a read-only or it requires write.
Whether the Session has recorded a write operation previously.
Whether any special blocks have been used to prefer primary or replica, such as:
- use_primary
- ignore_writes
- use_replicas_for_read_queries
- fallback_to_replicas_for_ambiguous_queries

LoadBalancer then yields the requested connection from the respective database connection pool. It yields either:

A read_write connection from the primary's connection pool.
A read connection from the replicas' connection pools.

When responding to a request for a read connection, LoadBalancer would first attempt to load balance the connection across the replica hosts. It looks for the next online replica host and yields a connection from the host's connection pool. A replica host is considered online if it is up-to-date with the primary, based on either the replication lag size or time. The thresholds for these requirements are configurable.