Consensus algorithms are at the core of distributed systems. How do you keep data consistent across multiple servers or nodes?
The Raft consensus algorithm is a protocol for keeping replicated state consistent across a distributed system, and it’s widely used in practice (Kubernetes relies on it through etcd). It offers the same fault tolerance and consistency guarantees as Paxos, an older algorithm that is often considered harder to understand.
Here’s a simplified view of the algorithm:
Overall design: the servers elect a leader, which manages log replication and makes sure every follower ends up with the same data. If the leader fails, the remaining servers elect a new one.
Some of the steps (simplified):
Initialization:
- All nodes start as followers (one of three roles a node can hold, sketched below)
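To make the roles concrete, here’s a minimal sketch in Go (the language of etcd’s raft implementation). This and the later sketches live in a hypothetical `raft` package of our own; none of it is the API of etcd or any real library.

```go
package raft

// The three roles a node can hold. Every node boots as a Follower and
// only changes role through elections and heartbeats.
type State int

const (
	Follower State = iota
	Candidate
	Leader
)
```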
Elect a Leader
- If a follower does not receive a heartbeat from the leader within its election timeout (randomized, so that nodes rarely time out simultaneously), it becomes a candidate for leadership.
- The candidate votes for itself and asks all other nodes for their votes
- A candidate becomes the leader if it wins votes from a majority of the nodes (see the sketch after this list)
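Here is a rough sketch of the candidate side, as a second file in the same hypothetical package. The `requestVote` callback stands in for Raft’s RequestVote RPC; this version runs sequentially and skips the term comparisons and log up-to-date checks a real implementation needs:

```go
package raft

import (
	"math/rand"
	"time"
)

// Node carries the minimum state an election needs; the field names
// follow the Raft paper, but the struct itself is our own sketch.
type Node struct {
	id          int
	state       State
	currentTerm int
	votedFor    int
	log         []LogEntry // used in the replication sketch below
}

// electionTimeout is randomized so that followers rarely time out, and
// stand for election, at exactly the same moment (avoiding split votes).
func electionTimeout() time.Duration {
	return 150*time.Millisecond + time.Duration(rand.Intn(150))*time.Millisecond
}

// startElection: bump the term, vote for yourself, ask every peer for a
// vote, and become leader on a majority of the whole cluster.
func (n *Node) startElection(peers []int, requestVote func(peer, term int) bool) {
	n.state = Candidate
	n.currentTerm++
	n.votedFor = n.id
	votes := 1 // a candidate always votes for itself

	for _, peer := range peers {
		if requestVote(peer, n.currentTerm) {
			votes++
		}
	}

	// Majority of the full cluster: peers plus this node.
	if votes > (len(peers)+1)/2 {
		n.state = Leader
	}
}
```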
Replicate Logs
- The leader accepts commands from clients and appends them to its log
- It sends the new log entries to all followers
- Once a majority of nodes acknowledge an entry, the leader commits it: it applies the entry to its own state machine and informs the client (sketched below).
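And the leader side of replication, again in the same hypothetical package. Here `appendEntries` stands in for Raft’s AppendEntries RPC and `apply` for the state machine; a real leader replicates asynchronously, retries failed followers, and tracks per-follower progress, all of which this omits:

```go
package raft

// LogEntry pairs a client command with the term in which the leader
// received it, as in the Raft paper.
type LogEntry struct {
	Term    int
	Command string
}

// propose: append the command to the leader's own log, ship it to every
// follower, and commit it once a majority of the cluster holds it. Only
// then is it applied to the state machine and the client answered.
func (n *Node) propose(command string, peers []int,
	appendEntries func(peer int, entry LogEntry) bool,
	apply func(command string)) bool {

	entry := LogEntry{Term: n.currentTerm, Command: command}
	n.log = append(n.log, entry)
	acks := 1 // the leader's own copy counts toward the majority

	for _, peer := range peers {
		if appendEntries(peer, entry) {
			acks++
		}
	}

	if acks > (len(peers)+1)/2 {
		apply(entry.Command) // committed: safe to apply and reply
		return true
	}
	return false
}
```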
Each step has many more nuances, but this is a very high-level description of the algorithm. The Raft paper is the best place for the full details, and the etcd raft implementation is a good starting point if you’re more comfortable reading code.
Some other systems that use Raft:
- CockroachDB
- ClickHouse
- MongoDB
- etcd