Interface RaftNode
-
public interface RaftNode
A Raft node runs the Raft consensus algorithm as a member of a Raft group.Operations and queries passed to Raft nodes must be deterministic, i.e., they must produce the same result independent of when or on which Raft node it is being executed.
Raft nodes are identified by [group id, node id] pairs. Multiple Raft groups can run in the same environment, distributed or even in a single JVM process, and they can be discriminated from each other with unique group ids. A single JVM process can run multiple Raft nodes that belong to different Raft groups or even the same Raft group.
Before a new Raft group is created, its initial member list must be decided. Then, a Raft node is created for each one of its members. When a new member is added to an existing Raft group later on, its Raft node must be initialized with the same initial member list as well.
Status of a Raft node is
RaftNodeStatus.INITIAL
on creation, and it moves toRaftNodeStatus.ACTIVE
and starts executing the Raft consensus algorithm whenstart()
is called.No further operations can be triggered on a Raft node after it is terminated or leaves its Raft group, i.e., removed from the Raft group member list.
Raft nodes execute the Raft consensus algorithm with the Actor model. You can read about the Actor Model at the following link: https://en.wikipedia.org/wiki/Actor_model In this model, each Raft node runs in a single-threaded manner. It uses a
RaftNodeExecutor
to sequentially handle API calls andRaftMessage
objects created viaRaftModelFactory
.The communication between Raft nodes are implemented with the message-passing approach and abstracted away with the
Transport
interface.Raft nodes use
StateMachine
to execute queries and committed operations.Last, Raft nodes use
RaftStore
to persist internal Raft state to stable storage to be able to recover from crashes.
-
-
Nested Class Summary
Nested Classes Modifier and Type Interface Description static interface
RaftNode.RaftNodeBuilder
The builder interface for configuring and creating Raft node instances.
-
Method Summary
All Methods Static Methods Instance Methods Abstract Methods Modifier and Type Method Description CompletableFuture<Ordered<RaftGroupMembers>>
changeMembership(RaftEndpoint endpoint, MembershipChangeMode mode, long expectedGroupMembersCommitIndex)
Replicates and commits the given membership change to the Raft group, if the given group members commit index is equal to the current group members commit index in the local Raft state.RaftGroupMembers
getCommittedMembers()
Returns the last committed member list of the Raft group this Raft node belongs to.RaftConfig
getConfig()
Returns the config object this Raft node is initialized with.RaftGroupMembers
getEffectiveMembers()
Returns the currently effective member list of the Raft group this Raft node belongs to.Object
getGroupId()
Returns the unique id of the Raft group which this Raft node belongs to.RaftGroupMembers
getInitialMembers()
Returns the initial member list of the Raft group.RaftEndpoint
getLocalEndpoint()
Returns the local endpoint of this Raft node.CompletableFuture<Ordered<RaftNodeReport>>
getReport()
Returns a report object that contains information about this Raft node's local state related to the execution of the Raft consensus algorithm.RaftNodeStatus
getStatus()
Returns the current status of this Raft node.RaftTerm
getTerm()
Returns the locally known term information.void
handle(RaftMessage message)
Handles the given Raft message which can be either a Raft RPC request or a response.static RaftNode.RaftNodeBuilder
newBuilder()
Returns a new builder to configure RaftNode that is going to be created.<T> CompletableFuture<Ordered<T>>
query(Object operation, QueryPolicy queryPolicy, long minCommitIndex)
Executes the given query operation based on the given query policy.<T> CompletableFuture<Ordered<T>>
replicate(Object operation)
Replicates, commits, and executes the given operation via this Raft node.CompletableFuture<Ordered<Object>>
start()
Triggers this Raft node to start executing the Raft consensus algorithm.CompletableFuture<Ordered<RaftNodeReport>>
takeSnapshot()
Takes a new snapshot at the local RaftNode at the current commit index.CompletableFuture<Ordered<Object>>
terminate()
Forcefully sets the status of this Raft node toRaftNodeStatus.TERMINATED
and makes the Raft node stops executing the Raft consensus algorithm.CompletableFuture<Ordered<Object>>
transferLeadership(RaftEndpoint endpoint)
Transfers the leadership role to the given endpoint, if this Raft node is the current Raft group leader with theRaftNodeStatus.ACTIVE
status and the given endpoint is in the committed member list of the Raft group.
-
-
-
Method Detail
-
newBuilder
static RaftNode.RaftNodeBuilder newBuilder()
Returns a new builder to configure RaftNode that is going to be created.- Returns:
- a new builder to configure RaftNode that is going to be created
-
getGroupId
@Nonnull Object getGroupId()
Returns the unique id of the Raft group which this Raft node belongs to.- Returns:
- the unique id of the Raft group which this Raft node belongs to
-
getLocalEndpoint
@Nonnull RaftEndpoint getLocalEndpoint()
Returns the local endpoint of this Raft node.- Returns:
- the local endpoint of this Raft node
-
getConfig
@Nonnull RaftConfig getConfig()
Returns the config object this Raft node is initialized with.- Returns:
- the config object this Raft node is initialized with
-
getTerm
@Nonnull RaftTerm getTerm()
Returns the locally known term information.Please note that the other Raft nodes in the Raft group may have already switched to a higher term.
- Returns:
- the locally known term information
-
getStatus
@Nonnull RaftNodeStatus getStatus()
Returns the current status of this Raft node.- Returns:
- the current status of this Raft node
-
getInitialMembers
@Nonnull RaftGroupMembers getInitialMembers()
Returns the initial member list of the Raft group.- Returns:
- the initial member list of the Raft group
-
getCommittedMembers
@Nonnull RaftGroupMembers getCommittedMembers()
Returns the last committed member list of the Raft group this Raft node belongs to.Please note that the returned member list is read from the local state and can be different from the currently effective applied member list, if this Raft node is part of the minority and there is an ongoing (appended but not-yet-committed) membership change in the majority of the Raft group. Similarly, it can be different from the current committed member list of the Raft group, also if a new membership change is committed by the majority Raft nodes but not learnt by this Raft node yet.
- Returns:
- the last committed member list of the Raft group
-
getEffectiveMembers
@Nonnull RaftGroupMembers getEffectiveMembers()
Returns the currently effective member list of the Raft group this Raft node belongs to.Please note that the returned member list is read from the local state and can be different from the committed member list, if there is an ongoing (appended but not-yet committed) membership change in the Raft group.
- Returns:
- the currently effective member list of the Raft group
-
start
@Nonnull CompletableFuture<Ordered<Object>> start()
Triggers this Raft node to start executing the Raft consensus algorithm.The returned future is completed with
IllegalStateException
if this Raft node has already started.- Returns:
- the future object to be notified after this Raft node starts
-
terminate
@Nonnull CompletableFuture<Ordered<Object>> terminate()
Forcefully sets the status of this Raft node toRaftNodeStatus.TERMINATED
and makes the Raft node stops executing the Raft consensus algorithm.- Returns:
- the future object to be notified after this Raft node terminates
-
handle
void handle(@Nonnull RaftMessage message)
Handles the given Raft message which can be either a Raft RPC request or a response.Silently ignores the given Raft message if this Raft node has already terminated or left the Raft group.
- Parameters:
message
- the object sent by another Raft node of this Raft group- Throws:
IllegalArgumentException
- if an unknown message is received.- See Also:
RaftMessage
-
replicate
@Nonnull <T> CompletableFuture<Ordered<T>> replicate(@Nonnull Object operation)
Replicates, commits, and executes the given operation via this Raft node. The given operation is executed once it is committed in the Raft group, and the returned future object is notified with its execution result.Please note that the given operation must be deterministic.
The returned future is notified with an
Ordered
object that contains the log index on which the given operation is committed and executed.The returned future be can notified with
NotLeaderException
,CannotReplicateException
orIndeterminateStateException
. Please see individual exception classes for more information.- Type Parameters:
T
- type of the result of the operation execution- Parameters:
operation
- the operation to be replicated on the Raft group- Returns:
- the future to be notified with the result of the operation execution, or the exception if the replication fails
- See Also:
NotLeaderException
,CannotReplicateException
,IndeterminateStateException
-
query
@Nonnull <T> CompletableFuture<Ordered<T>> query(@Nonnull Object operation, @Nonnull QueryPolicy queryPolicy, long minCommitIndex)
Executes the given query operation based on the given query policy.The returned future is notified with an
Ordered
object that contains the commit index on which the given query is executed.If the caller is providing a query policy which is weaker than
QueryPolicy.LINEARIZABLE
, it can also provide a minimum commit index. Then, the local Raft node executes the given query only if its local commit index is greater than or equal to the required commit index. If the local commit index is smaller than the required commit index, then the returned future is notified withLaggingCommitIndexException
so that the caller could retry its query on the same Raft node after some time or forward it to another Raft node. This mechanism enables callers to execute queries on Raft nodes without hitting the log replication quorum and preserve monotonicity of query results. Please see the Section: 6.4 Processing read-only queries more efficiently of the Raft dissertation for more details.The returned future can be notified with
NotLeaderException
,CannotReplicateException
orLaggingCommitIndexException
. Please see individual exception classes for more information.- Type Parameters:
T
- type of the result of the query execution- Parameters:
operation
- the query operation to be executedqueryPolicy
- the query policy to decide how to execute the given queryminCommitIndex
- the minimum commit index that this Raft node has to have in order to execute the given query.- Returns:
- the future to be notified with the result of the query execution, or the exception if the query cannot be executed
- See Also:
QueryPolicy
,NotLeaderException
,CannotReplicateException
,LaggingCommitIndexException
-
changeMembership
@Nonnull CompletableFuture<Ordered<RaftGroupMembers>> changeMembership(@Nonnull RaftEndpoint endpoint, @Nonnull MembershipChangeMode mode, long expectedGroupMembersCommitIndex)
Replicates and commits the given membership change to the Raft group, if the given group members commit index is equal to the current group members commit index in the local Raft state.The initial group members commit index is 0. The current group members commit index can be accessed via
getCommittedMembers()
.When the membership change process is completed successfully, the returned future is notified with an
Ordered
object that contains the new member list of the Raft group and the log index at which the given membership change is committed.The majority quorum size of the Raft group can increase or decrease by 1 if the membership change updates the number of the voting members. Relatedly, since adding a new voting member increases the majority quorum size, if there are already failed Raft endpoints in the Raft group, it is recommended to remove them first before adding a new voting member in order to avoid availability issues.
The current leader Raft node can be removed from the Raft group member list safely without requiring a leadership transfer. The leader Raft node commits the membership change without its own vote, and the followers trigger a new leader election round on the next periodic heartbeat tick. Please note that in this scenario there is a short availability gap which is equal to the leader heartbeat period duration.
A new Raft endpoint can be added to the Raft group as a
RaftRole.LEARNER
viaMembershipChangeMode.ADD_LEARNER
. In this case, the quorum size of the Raft group does not change. In addition, a new Raft endpoint can be added to the Raft group as aRaftRole.FOLLOWER
, or an existingRaftRole.LEARNER
Raft endpoint can be promoted toRaftRole.FOLLOWER
withMembershipChangeMode.ADD_OR_PROMOTE_TO_FOLLOWER
. In this case, the quorum size of the Raft group is re-calculated based on the new number of voting members. The leader Raft node does not check if the given Raft endpoint's local Raft log is sufficiently up to date before this re-calculation. Hence, it is the caller's responsibility to check the promoted Raft endpoint's local Raft log in order to prevent availability gaps if the quorum size will increase after the membership change. This check can be done by gettingRaftNodeReport
from the leader Raft node and comparing last log indices of the leader and theRaftRole.LEARNER
Raft endpoint.If the given group members commit index is different than the current group members commit index in the local Raft state, then the returned future is notified with
MismatchingRaftGroupMembersCommitIndexException
.If the Raft group contains a single voting member and that member is attempted to be removed, then the returned future is notified with
IllegalStateException
.If the given Raft endpoint is already in the committed Raft group member list, or it is being added as a
RaftRole.LEARNER
while there areRaftGroupMembers.MAX_LEARNER_COUNT
RaftRole.LEARNER
s in the Raft group member list, then the returned future is notified withIllegalArgumentException
.The returned future can be notified with
NotLeaderException
,CannotReplicateException
orIndeterminateStateException
. Please see individual exception classes for more information.- Parameters:
endpoint
- the endpoint to add to or remove from the Raft groupmode
- the type of the membership changeexpectedGroupMembersCommitIndex
- the expected members commit index- Returns:
- the future to be notified with the new member list of the Raft group if the membership change is successful, or the exception if the membership change failed
- See Also:
replicate(Object)
,MismatchingRaftGroupMembersCommitIndexException
,NotLeaderException
,CannotReplicateException
,IndeterminateStateException
,IllegalArgumentException
,IllegalStateException
-
transferLeadership
@Nonnull CompletableFuture<Ordered<Object>> transferLeadership(@Nonnull RaftEndpoint endpoint)
Transfers the leadership role to the given endpoint, if this Raft node is the current Raft group leader with theRaftNodeStatus.ACTIVE
status and the given endpoint is in the committed member list of the Raft group.The leadership transfer process is considered to be completed when this Raft node moves to the follower role. There is no strict guarantee that the given endpoint will be the new leader in the new term. However, it is very unlikely that another endpoint will become the new leader.
The returned future is notified with an
Ordered
object that contains the commit index on which this Raft node turns into a follower.This Raft node does not replicate any new operation until the leadership transfer process is completed and new
replicate(Object)
calls fail withCannotReplicateException
.The returned future can be notified with
NotLeaderException
if this Raft node is not leader,IllegalStateException
if the Raft node status is notRaftNodeStatus.ACTIVE
,IllegalArgumentException
if the given endpoint is not a voting member in the committed Raft group member list, andTimeoutException
if the leadership transfer process has timed out w.r.tRaftConfig.getLeaderHeartbeatTimeoutSecs()
.- Parameters:
endpoint
- the Raft endpoint to which the leadership will be transferred- Returns:
- the future to be notified when the leadership transfer is done, or with the execution if the leader transfer could not be done.
- See Also:
CannotReplicateException
,NotLeaderException
-
getReport
@Nonnull CompletableFuture<Ordered<RaftNodeReport>> getReport()
Returns a report object that contains information about this Raft node's local state related to the execution of the Raft consensus algorithm.- Returns:
- a report object that contains information about this Raft node's local state
-
takeSnapshot
@Nonnull CompletableFuture<Ordered<RaftNodeReport>> takeSnapshot()
Takes a new snapshot at the local RaftNode at the current commit index. If a snapshot is already taken at the current commit index, calling this method is a no-op.- Returns:
- a report object that contains information about this Raft node's local state if a new snapshot is taken, otherwise null
-
-