Interface RaftNode
Operations and queries passed to Raft nodes must be deterministic, i.e., they must produce the same result independent of when or on which Raft node it is being executed.
Raft nodes are identified by [group id, node id] pairs. Multiple Raft groups can run in the same environment, distributed or even in a single JVM process, and they can be discriminated from each other with unique group ids. A single JVM process can run multiple Raft nodes that belong to different Raft groups or even the same Raft group.
Before a new Raft group is created, its initial member list must be decided. Then, a Raft node is created for each one of its members. When a new member is added to an existing Raft group later on, its Raft node must be initialized with the same initial member list as well.
Status of a Raft node is RaftNodeStatus.INITIAL
on creation, and it
moves to RaftNodeStatus.ACTIVE
and starts executing the Raft
consensus algorithm when start()
is called.
No further operations can be triggered on a Raft node after it is terminated or leaves its Raft group, i.e., removed from the Raft group member list.
Raft nodes execute the Raft consensus algorithm with the Actor model. You can
read about the Actor Model at the following link:
https://en.wikipedia.org/wiki/Actor_model In this model, each Raft node runs
in a single-threaded manner. It uses a RaftNodeExecutor
to
sequentially handle API calls and RaftMessage
objects created via
RaftModelFactory
.
The communication between Raft nodes are implemented with the message-passing
approach and abstracted away with the Transport
interface.
Raft nodes use StateMachine
to execute queries and committed
operations.
Last, Raft nodes use RaftStore
to persist internal Raft state to
stable storage to be able to recover from crashes.
-
Nested Class Summary
Modifier and TypeInterfaceDescriptionstatic interface
The builder interface for configuring and creating Raft node instances. -
Method Summary
Modifier and TypeMethodDescriptionchangeMembership
(RaftEndpoint endpoint, MembershipChangeMode mode, long expectedGroupMembersCommitIndex) Replicates and commits the given membership change to the Raft group, if the given group members commit index is equal to the current group members commit index in the local Raft state.Returns the last committed member list of the Raft group this Raft node belongs to.Returns the config object this Raft node is initialized with.Returns the currently effective member list of the Raft group this Raft node belongs to.Returns the unique id of the Raft group which this Raft node belongs to.Returns the initial member list of the Raft group.Returns the local endpoint of this Raft node.Returns a report object that contains information about this Raft node's local state related to the execution of the Raft consensus algorithm.Returns the current status of this Raft node.getTerm()
Returns the locally known term information.void
handle
(io.microraft.model.message.RaftMessage message) Asynchronously handles the given Raft message which can be either a Raft RPC request or a response, or silently ignores the given Raft message if this Raft node has already terminated or left the Raft group.static RaftNode.RaftNodeBuilder
Returns a new builder to configure RaftNode that is going to be created.<T> CompletableFuture<Ordered<T>>
query
(Object operation, QueryPolicy queryPolicy, Optional<Long> minCommitIndex, Optional<Duration> timeout) Executes the given query with the given query policy.<T> CompletableFuture<Ordered<T>>
Replicates, commits, and executes the given operation via this Raft node.start()
Triggers this Raft node to start executing the Raft consensus algorithm.Takes a new snapshot at the local RaftNode at the current commit index.Forcefully sets the status of this Raft node toRaftNodeStatus.TERMINATED
and makes the Raft node stops executing the Raft consensus algorithm.transferLeadership
(RaftEndpoint endpoint) Transfers the leadership role to the given endpoint, if this Raft node is the current Raft group leader with theRaftNodeStatus.ACTIVE
status and the given endpoint is in the committed member list of the Raft group.The returned future is completed when the Raft node's last applied log index becomes greater than or equal to the given commit index.
-
Method Details
-
newBuilder
Returns a new builder to configure RaftNode that is going to be created.- Returns:
- a new builder to configure RaftNode that is going to be created
-
getGroupId
Returns the unique id of the Raft group which this Raft node belongs to.- Returns:
- the unique id of the Raft group which this Raft node belongs to
-
getLocalEndpoint
Returns the local endpoint of this Raft node.- Returns:
- the local endpoint of this Raft node
-
getConfig
Returns the config object this Raft node is initialized with.- Returns:
- the config object this Raft node is initialized with
-
getTerm
Returns the locally known term information.Please note that the other Raft nodes in the Raft group may have already switched to a higher term.
- Returns:
- the locally known term information
-
getStatus
Returns the current status of this Raft node.- Returns:
- the current status of this Raft node
-
getInitialMembers
Returns the initial member list of the Raft group.- Returns:
- the initial member list of the Raft group
-
getCommittedMembers
Returns the last committed member list of the Raft group this Raft node belongs to.Please note that the returned member list is read from the local state and can be different from the currently effective applied member list, if this Raft node is part of the minority and there is an ongoing (appended but not-yet-committed) membership change in the majority of the Raft group. Similarly, it can be different from the current committed member list of the Raft group, also if a new membership change is committed by the majority Raft nodes but not learnt by this Raft node yet.
- Returns:
- the last committed member list of the Raft group
-
getEffectiveMembers
Returns the currently effective member list of the Raft group this Raft node belongs to.Please note that the returned member list is read from the local state and can be different from the committed member list, if there is an ongoing (appended but not-yet committed) membership change in the Raft group.
- Returns:
- the currently effective member list of the Raft group
-
start
Triggers this Raft node to start executing the Raft consensus algorithm.The returned future is completed with
IllegalStateException
if this Raft node has already started.- Returns:
- the future object to be completed after this Raft node starts its execution
-
terminate
Forcefully sets the status of this Raft node toRaftNodeStatus.TERMINATED
and makes the Raft node stops executing the Raft consensus algorithm.- Returns:
- the future object to be completed after this Raft node terminates
-
handle
void handle(@Nonnull io.microraft.model.message.RaftMessage message) Asynchronously handles the given Raft message which can be either a Raft RPC request or a response, or silently ignores the given Raft message if this Raft node has already terminated or left the Raft group.- Parameters:
message
- the object sent by another Raft node of this Raft group- Throws:
IllegalArgumentException
- if an unknown message is received.- See Also:
-
RaftMessage
-
replicate
Replicates, commits, and executes the given operation via this Raft node. The given operation is executed once it is committed in the Raft group, and the returned future object is completed with its execution result, along with the commit index.The given operation must be deterministic, otherwise state machines maintained by each Raft node in the Raft group can diverge.
The returned future be can completed with
NotLeaderException
,CannotReplicateException
orIndeterminateStateException
. Please see individual exception classes for more information.- Type Parameters:
T
- type of the result of the operation execution- Parameters:
operation
- the operation to be replicated on the Raft group- Returns:
- the future to be completed with the result of the operation execution, or the exception if the replication fails
- See Also:
-
query
@Nonnull <T> CompletableFuture<Ordered<T>> query(@Nonnull Object operation, @Nonnull QueryPolicy queryPolicy, Optional<Long> minCommitIndex, Optional<Duration> timeout) Executes the given query with the given query policy.The returned future is completed with an
Ordered
object that contains the commit index on which the given query is executed and the result of the query.Callers can also provide a minimum commit index and a timeout duration. If no timeout is passed, the local Raft node executes the given query only if its local commit index is greater than or equal to the given commit index at the time of the call, or completes the returned future with
LaggingCommitIndexException
. If a timeout is passed and the local commit index is smaller than the given commit index, the local Raft node waits for the commit index to become greater than or equal to the given commit index, or completes the returned feature withLaggingCommitIndexException
if the timeout occurs before that. Callers could retry its query on the same Raft node, or forward it to another Raft node. This mechanism enables callers to execute queries on Raft nodes without hitting the log replication quorum and still achieve monotonic reads. Please see the Section: 6.4 Processing read-only queries more efficiently of the Raft dissertation for more details.When the method is called on the leader Raft node, the returned future is completed with
LaggingCommitIndexException
if the given commit index is greater than the current commit index of the Raft node and the query policy isQueryPolicy.LEADER_LEASE
orQueryPolicy.LINEARIZABLE
.The returned future can be completed with
NotLeaderException
,CannotReplicateException
orLaggingCommitIndexException
. Please see individual exception classes for more information.- Type Parameters:
T
- type of the result of the query execution- Parameters:
operation
- the query operation to be executedqueryPolicy
- the query policy to decide how to execute the given queryminCommitIndex
- (optional) minimum commit index that this Raft node's local commit index needs to advancetimeout
- (optional) duration to wait before timing out the query if the local commit index cannot advance until the given commit index have in order to execute the given query.- Returns:
- the future to be completed with the result of the query execution, or the exception if the query cannot be executed
- See Also:
-
waitFor
The returned future is completed when the Raft node's last applied log index becomes greater than or equal to the given commit index. It means all operations up to that log index are committed and applied to the state machine. The returned future can be completed with a commit index that is greater than the given commit index if the Raft node advances its local commit index beyond the given commit index with a single AppendEntries RPC. If the Raft node cannot advance its local commit index up to the given commit index for the given duration, the returned future is completed withLaggingCommitIndexException
.This is a barrier which can be used for achieving read your writes. After a client can learn commit index of an operation it replicates through the leader, then it can wait until a particular follower's local state machine applies that operation.
Basically, this method is a shorthand for RaftNode.query(no-op, QueryPolicy.EVENTUAL_CONSISTENCY, minCommitIndex, timeout).
- Parameters:
minCommitIndex
- (optional) minimum commit index that this Raft node's local commit index needs to advancetimeout
- (optional) duration to wait before timing out the return future
-
changeMembership
@Nonnull CompletableFuture<Ordered<RaftGroupMembers>> changeMembership(@Nonnull RaftEndpoint endpoint, @Nonnull MembershipChangeMode mode, long expectedGroupMembersCommitIndex) Replicates and commits the given membership change to the Raft group, if the given group members commit index is equal to the current group members commit index in the local Raft state.The initial group members commit index is 0. The current group members commit index can be accessed via
getCommittedMembers()
.When the membership change process is completed successfully, the returned future is completed with an
Ordered
object that contains the new member list of the Raft group and the log index at which the given membership change is committed.The majority quorum size of the Raft group can increase or decrease by 1 if the membership change updates the number of the voting members. Relatedly, since adding a new voting member increases the majority quorum size, if there are already failed Raft endpoints in the Raft group, it is recommended to remove them first before adding a new voting member in order to avoid availability issues.
The current leader Raft node can be removed from the Raft group member list safely without requiring a leadership transfer. The leader Raft node commits the membership change without its own vote, and the followers trigger a new leader election round on the next periodic heartbeat tick. Please note that in this scenario there is a short availability gap which is equal to the leader heartbeat period duration.
A new Raft endpoint can be added to the Raft group as a
RaftRole.LEARNER
viaMembershipChangeMode.ADD_LEARNER
. In this case, the quorum size of the Raft group does not change. In addition, a new Raft endpoint can be added to the Raft group as aRaftRole.FOLLOWER
, or an existingRaftRole.LEARNER
Raft endpoint can be promoted toRaftRole.FOLLOWER
withMembershipChangeMode.ADD_OR_PROMOTE_TO_FOLLOWER
. In this case, the quorum size of the Raft group is re-calculated based on the new number of voting members. The leader Raft node does not check if the given Raft endpoint's local Raft log is sufficiently up to date before this re-calculation. Hence, it is callers' responsibility to check the promoted Raft endpoint's local Raft log in order to prevent availability gaps if the quorum size will increase after the membership change. This check can be done by gettingRaftNodeReport
from the leader Raft node and comparing last log indices of the leader and theRaftRole.LEARNER
Raft endpoint.If the given group members commit index is different than the current group members commit index in the local Raft state, then the returned future is completed with
MismatchingRaftGroupMembersCommitIndexException
.If the Raft group contains a single voting member and that member is attempted to be removed, then the returned future is completed with
IllegalStateException
.If the given Raft endpoint is already in the committed Raft group member list, or it is being added as a
RaftRole.LEARNER
while there areRaftGroupMembers.MAX_LEARNER_COUNT
RaftRole.LEARNER
s in the Raft group member list, then the returned future is completed withIllegalArgumentException
.The returned future can be completed with
NotLeaderException
,CannotReplicateException
orIndeterminateStateException
. Please see individual exception classes for more information.- Parameters:
endpoint
- the endpoint to add to or remove from the Raft groupmode
- the type of the membership changeexpectedGroupMembersCommitIndex
- the expected members commit index- Returns:
- the future to be completed with the new member list of the Raft group if the membership change is successful, or the exception if the membership change failed
- See Also:
-
transferLeadership
Transfers the leadership role to the given endpoint, if this Raft node is the current Raft group leader with theRaftNodeStatus.ACTIVE
status and the given endpoint is in the committed member list of the Raft group.The leadership transfer process is considered to be completed when this Raft node moves to the follower role. There is no strict guarantee that the given endpoint will be the new leader in the new term. However, it is very unlikely that another endpoint will become the new leader.
The returned future is completed with an
Ordered
object that contains the commit index on which this Raft node turns into a follower.This Raft node does not replicate any new operation until the leadership transfer process is completed and new
replicate(Object)
calls fail withCannotReplicateException
.The returned future can be completed with
NotLeaderException
if this Raft node is not leader,IllegalStateException
if the Raft node status is notRaftNodeStatus.ACTIVE
,IllegalArgumentException
if the given endpoint is not a voting member in the committed Raft group member list, andTimeoutException
if the leadership transfer process has timed out w.r.tRaftConfig.getLeaderHeartbeatTimeoutSecs()
.- Parameters:
endpoint
- the Raft endpoint to which the leadership will be transferred- Returns:
- the future to be completed when the leadership transfer is done, or with the execution if the leader transfer could not be done.
- See Also:
-
getReport
Returns a report object that contains information about this Raft node's local state related to the execution of the Raft consensus algorithm.- Returns:
- a report object that contains information about this Raft node's local state
-
takeSnapshot
Takes a new snapshot at the local RaftNode at the current commit index. If a snapshot is already taken at the current commit index, calling this method is a no-op.- Returns:
- a report object that contains information about this Raft node's local state if a new snapshot is taken, otherwise null
-