Package io.microraft

Interface RaftNode


public interface RaftNode
A Raft node runs the Raft consensus algorithm as a member of a Raft group.

Operations and queries passed to Raft nodes must be deterministic, i.e., they must produce the same result regardless of when or on which Raft node they are executed.
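
To illustrate the determinism requirement, the following is a minimal sketch, not MicroRaft code. DeterminismSketch, applyAll and sampleLog are hypothetical names; the "state machine" is just a counter, and two replicas are modelled by replaying the same log twice.

```java
import java.util.List;
import java.util.function.UnaryOperator;

// Hypothetical stand-in for a replicated counter; not a MicroRaft class.
class DeterminismSketch {

    // Applies the committed operations in log order to a fresh replica state.
    static long applyAll(List<UnaryOperator<Long>> log) {
        long state = 0L;
        for (UnaryOperator<Long> operation : log) {
            state = operation.apply(state);
        }
        return state;
    }

    // Deterministic operations: each result depends only on the previous state.
    static List<UnaryOperator<Long>> sampleLog() {
        return List.of(state -> state + 5, state -> state * 2);
    }

    public static void main(String[] args) {
        long replicaA = applyAll(sampleLog());
        long replicaB = applyAll(sampleLog());
        System.out.println(replicaA + " " + replicaB); // prints "10 10"

        // An operation such as state -> state + System.nanoTime() would break
        // this guarantee, because its result depends on when and where it runs.
    }
}
```

Operations that read the clock, random numbers, or node-local data are the usual sources of divergence; such values can be computed by the caller and passed in as part of the operation instead.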

Raft nodes are identified by [group id, node id] pairs. Multiple Raft groups can run in the same environment, distributed or even in a single JVM process, and they can be discriminated from each other with unique group ids. A single JVM process can run multiple Raft nodes that belong to different Raft groups or even the same Raft group.

Before a new Raft group is created, its initial member list must be decided. Then, a Raft node is created for each one of its members. When a new member is added to an existing Raft group later on, its Raft node must be initialized with the same initial member list as well.

Status of a Raft node is RaftNodeStatus.INITIAL on creation, and it moves to RaftNodeStatus.ACTIVE and starts executing the Raft consensus algorithm when start() is called.

No further operations can be triggered on a Raft node after it is terminated or leaves its Raft group, i.e., after it is removed from the Raft group member list.
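
The lifecycle described above can be sketched as a small state holder. This is an illustration under assumptions, not the MicroRaft implementation: LifecycleSketch and its Status enum are hypothetical and model only the three statuses named in this page, and start() and terminate() return already completed futures instead of running on a node executor.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical stand-in for the status handling of a Raft node.
class LifecycleSketch {
    enum Status { INITIAL, ACTIVE, TERMINATED }

    private final AtomicReference<Status> status = new AtomicReference<>(Status.INITIAL);

    Status getStatus() {
        return status.get();
    }

    // Moves INITIAL -> ACTIVE; any other current status fails the future.
    CompletableFuture<Void> start() {
        if (status.compareAndSet(Status.INITIAL, Status.ACTIVE)) {
            return CompletableFuture.completedFuture(null);
        }
        return CompletableFuture.failedFuture(new IllegalStateException("already started"));
    }

    // Forcefully moves to TERMINATED from any status.
    CompletableFuture<Void> terminate() {
        status.set(Status.TERMINATED);
        return CompletableFuture.completedFuture(null);
    }

    public static void main(String[] args) {
        LifecycleSketch node = new LifecycleSketch();
        System.out.println(node.getStatus());                        // INITIAL
        node.start();
        System.out.println(node.getStatus());                        // ACTIVE
        System.out.println(node.start().isCompletedExceptionally()); // true
        node.terminate();
        System.out.println(node.getStatus());                        // TERMINATED
    }
}
```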

Raft nodes execute the Raft consensus algorithm with the Actor model (see https://en.wikipedia.org/wiki/Actor_model). In this model, each Raft node runs in a single-threaded manner. It uses a RaftNodeExecutor to sequentially handle API calls and RaftMessage objects created via RaftModelFactory.
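
The single-threaded execution model can be sketched with a plain JDK executor. ActorSketch is a hypothetical class used only for illustration; it assumes a java.util.concurrent single-thread executor in place of RaftNodeExecutor and uses strings in place of API calls and RaftMessage objects.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Hypothetical actor-style node: all state is touched by one thread only.
class ActorSketch {
    // Accessed only from tasks running on the executor thread, so no locks.
    private final List<String> handled = new ArrayList<>();
    private final ExecutorService executor = Executors.newSingleThreadExecutor();

    // API calls and incoming messages both become tasks on the same queue.
    void submit(String event) {
        executor.execute(() -> handled.add(event));
    }

    // Reads the state on the executor thread as well, then hands back a copy.
    List<String> snapshot() {
        Callable<List<String>> read = () -> List.copyOf(handled);
        try {
            return executor.submit(read).get();
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException(e);
        }
    }

    void stop() {
        executor.shutdown();
    }

    public static void main(String[] args) {
        ActorSketch node = new ActorSketch();
        node.submit("api:start");
        node.submit("message:vote-request");
        node.submit("api:replicate");
        System.out.println(node.snapshot()); // [api:start, message:vote-request, api:replicate]
        node.stop();
    }
}
```

Because every task runs on the same thread in submission order, the node's state needs no synchronization, which is the main appeal of this design.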

Communication between Raft nodes is implemented with the message-passing approach and abstracted away with the Transport interface.

Raft nodes use StateMachine to execute queries and committed operations.

Last, Raft nodes use RaftStore to persist internal Raft state to stable storage to be able to recover from crashes.

  • Method Details

    • newBuilder

      static RaftNode.RaftNodeBuilder newBuilder()
      Returns a new builder to configure the RaftNode to be created.
      Returns:
      a new builder to configure the RaftNode to be created
    • getGroupId

      @Nonnull Object getGroupId()
      Returns the unique id of the Raft group which this Raft node belongs to.
      Returns:
      the unique id of the Raft group which this Raft node belongs to
    • getLocalEndpoint

      @Nonnull RaftEndpoint getLocalEndpoint()
      Returns the local endpoint of this Raft node.
      Returns:
      the local endpoint of this Raft node
    • getConfig

      @Nonnull RaftConfig getConfig()
      Returns the config object this Raft node is initialized with.
      Returns:
      the config object this Raft node is initialized with
    • getTerm

      @Nonnull RaftTerm getTerm()
      Returns the locally known term information.

      Please note that the other Raft nodes in the Raft group may have already switched to a higher term.

      Returns:
      the locally known term information
    • getStatus

      @Nonnull RaftNodeStatus getStatus()
      Returns the current status of this Raft node.
      Returns:
      the current status of this Raft node
    • getInitialMembers

      @Nonnull RaftGroupMembers getInitialMembers()
      Returns the initial member list of the Raft group.
      Returns:
      the initial member list of the Raft group
    • getCommittedMembers

      @Nonnull RaftGroupMembers getCommittedMembers()
      Returns the last committed member list of the Raft group this Raft node belongs to.

      Please note that the returned member list is read from the local state. It can differ from the currently effective member list if this Raft node is part of the minority and there is an ongoing (appended but not-yet-committed) membership change in the majority of the Raft group. Similarly, it can differ from the current committed member list of the Raft group if a new membership change has been committed by the majority of the Raft nodes but not yet learnt by this Raft node.

      Returns:
      the last committed member list of the Raft group
    • getEffectiveMembers

      @Nonnull RaftGroupMembers getEffectiveMembers()
      Returns the currently effective member list of the Raft group this Raft node belongs to.

      Please note that the returned member list is read from the local state and can be different from the committed member list if there is an ongoing (appended but not-yet-committed) membership change in the Raft group.
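
      The difference between the committed and the effective member lists can be sketched as follows. MemberListsSketch, appendChange and commitChange are hypothetical names, and member lists are modelled as lists of strings rather than RaftGroupMembers objects.

```java
import java.util.List;

// Hypothetical model of the two member lists kept in a node's local state.
class MemberListsSketch {
    private List<String> committed;
    private List<String> effective;

    MemberListsSketch(List<String> initialMembers) {
        committed = List.copyOf(initialMembers);
        effective = committed;
    }

    // An appended membership change becomes the effective list right away.
    void appendChange(List<String> newMembers) {
        effective = List.copyOf(newMembers);
    }

    // Once the change is committed, the committed list catches up.
    void commitChange() {
        committed = effective;
    }

    List<String> getCommitted() {
        return committed;
    }

    List<String> getEffective() {
        return effective;
    }

    public static void main(String[] args) {
        MemberListsSketch node = new MemberListsSketch(List.of("A", "B", "C"));
        node.appendChange(List.of("A", "B", "C", "D"));
        // prints "[A, B, C] vs [A, B, C, D]"
        System.out.println(node.getCommitted() + " vs " + node.getEffective());
        node.commitChange();
        // prints "[A, B, C, D] vs [A, B, C, D]"
        System.out.println(node.getCommitted() + " vs " + node.getEffective());
    }
}
```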

      Returns:
      the currently effective member list of the Raft group
    • start

      @Nonnull CompletableFuture<Ordered<Object>> start()
      Triggers this Raft node to start executing the Raft consensus algorithm.

      The returned future is completed with IllegalStateException if this Raft node has already started.

      Returns:
      the future object to be completed after this Raft node starts its execution
    • terminate

      @Nonnull CompletableFuture<Ordered<Object>> terminate()
      Forcefully sets the status of this Raft node to RaftNodeStatus.TERMINATED and makes the Raft node stop executing the Raft consensus algorithm.
      Returns:
      the future object to be completed after this Raft node terminates
    • handle

      void handle(@Nonnull io.microraft.model.message.RaftMessage message)
      Asynchronously handles the given Raft message which can be either a Raft RPC request or a response, or silently ignores the given Raft message if this Raft node has already terminated or left the Raft group.
      Parameters:
      message - the object sent by another Raft node of this Raft group
      Throws:
      IllegalArgumentException - if an unknown message is received.
      See Also:
      • RaftMessage
    • replicate

      @Nonnull <T> CompletableFuture<Ordered<T>> replicate(@Nonnull Object operation)
      Replicates, commits, and executes the given operation via this Raft node. The given operation is executed once it is committed in the Raft group, and the returned future object is completed with its execution result, along with the commit index.

      The given operation must be deterministic, otherwise state machines maintained by each Raft node in the Raft group can diverge.

      The returned future can be completed with NotLeaderException, CannotReplicateException or IndeterminateStateException. Please see individual exception classes for more information.

      Type Parameters:
      T - type of the result of the operation execution
      Parameters:
      operation - the operation to be replicated on the Raft group
      Returns:
      the future to be completed with the result of the operation execution, or the exception if the replication fails
    • query

      @Nonnull <T> CompletableFuture<Ordered<T>> query(@Nonnull Object operation, @Nonnull QueryPolicy queryPolicy, Optional<Long> minCommitIndex, Optional<Duration> timeout)
      Executes the given query with the given query policy.

      The returned future is completed with an Ordered object that contains the commit index on which the given query is executed and the result of the query.

      Callers can also provide a minimum commit index and a timeout duration. If no timeout is passed, the local Raft node executes the given query only if its local commit index is greater than or equal to the given commit index at the time of the call, or completes the returned future with LaggingCommitIndexException. If a timeout is passed and the local commit index is smaller than the given commit index, the local Raft node waits for the commit index to become greater than or equal to the given commit index, or completes the returned future with LaggingCommitIndexException if the timeout occurs before that. Callers can retry their query on the same Raft node, or forward it to another Raft node. This mechanism enables callers to execute queries on Raft nodes without hitting the log replication quorum and still achieve monotonic reads. Please see Section 6.4, "Processing read-only queries more efficiently", of the Raft dissertation for more details.

      When the method is called on the leader Raft node, the returned future is completed with LaggingCommitIndexException if the given commit index is greater than the current commit index of the Raft node and the query policy is QueryPolicy.LEADER_LEASE or QueryPolicy.LINEARIZABLE.

      The returned future can be completed with NotLeaderException, CannotReplicateException or LaggingCommitIndexException. Please see individual exception classes for more information.
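
      The way a minimum commit index yields monotonic reads can be sketched from the caller's side. Everything in this block is a hypothetical stand-in: Replica models one node's local state, Session models a client that remembers the highest commit index it has read at, and LaggingException stands in for LaggingCommitIndexException. Only the no-timeout branch is modelled, and the rejection is thrown synchronously instead of being delivered through a future.

```java
// Hypothetical client-side sketch of monotonic reads via a minimum commit index.
class MonotonicReadSketch {

    // Stand-in for LaggingCommitIndexException.
    static class LaggingException extends RuntimeException {
        LaggingException(String message) {
            super(message);
        }
    }

    // Stand-in for the locally applied state of one Raft node.
    static class Replica {
        final long commitIndex;
        final String value;

        Replica(long commitIndex, String value) {
            this.commitIndex = commitIndex;
            this.value = value;
        }

        // No-timeout branch: answer only if caught up to minCommitIndex.
        String read(long minCommitIndex) {
            if (commitIndex < minCommitIndex) {
                throw new LaggingException("local " + commitIndex + " < required " + minCommitIndex);
            }
            return value;
        }
    }

    // Remembers the highest commit index at which this client has read.
    static class Session {
        private long lastSeenCommitIndex;

        String read(Replica replica) {
            String value = replica.read(lastSeenCommitIndex);
            lastSeenCommitIndex = Math.max(lastSeenCommitIndex, replica.commitIndex);
            return value;
        }
    }

    public static void main(String[] args) {
        Replica upToDate = new Replica(7, "v2");
        Replica lagging = new Replica(4, "v1");
        Session session = new Session();

        System.out.println(session.read(upToDate)); // v2
        try {
            session.read(lagging); // would go back in time, so it is rejected
        } catch (LaggingException e) {
            System.out.println("rejected: " + e.getMessage()); // rejected: local 4 < required 7
            System.out.println(session.read(upToDate));        // v2
        }
    }
}
```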

      Type Parameters:
      T - type of the result of the query execution
      Parameters:
      operation - the query operation to be executed
      queryPolicy - the query policy to decide how to execute the given query
      minCommitIndex - (optional) minimum commit index that this Raft node's local commit index needs to reach
      timeout - (optional) duration to wait for the local commit index to reach the given commit index before timing out the query
      Returns:
      the future to be completed with the result of the query execution, or the exception if the query cannot be executed
    • waitFor

      @Nonnull CompletableFuture<Ordered<Object>> waitFor(long minCommitIndex, Duration timeout)
      The returned future is completed when the Raft node's last applied log index becomes greater than or equal to the given commit index. It means all operations up to that log index are committed and applied to the state machine. The returned future can be completed with a commit index that is greater than the given commit index if the Raft node advances its local commit index beyond the given commit index with a single AppendEntries RPC. If the Raft node cannot advance its local commit index up to the given commit index for the given duration, the returned future is completed with LaggingCommitIndexException.

      This is a barrier which can be used for achieving read-your-writes consistency. Once a client learns the commit index of an operation it replicated through the leader, it can wait until a particular follower's local state machine applies that operation.

      Basically, this method is a shorthand for RaftNode.query(no-op, QueryPolicy.EVENTUAL_CONSISTENCY, minCommitIndex, timeout).
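
      The barrier behaviour can be sketched with plain CompletableFuture objects. CommitBarrierSketch, advanceTo and LaggingException are hypothetical names; the sketch uses synchronized methods and the JDK's delayed executor where a real node would rely on its own executor, and it says nothing about how MicroRaft implements the method.

```java
import java.time.Duration;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CompletionException;
import java.util.concurrent.TimeUnit;

// Hypothetical commit-index barrier on a follower.
class CommitBarrierSketch {
    // Stand-in for LaggingCommitIndexException.
    static class LaggingException extends RuntimeException { }

    private static final class Waiter {
        final long minCommitIndex;
        final CompletableFuture<Long> future;

        Waiter(long minCommitIndex, CompletableFuture<Long> future) {
            this.minCommitIndex = minCommitIndex;
            this.future = future;
        }
    }

    private long lastApplied;
    private final List<Waiter> waiters = new ArrayList<>();

    synchronized CompletableFuture<Long> waitFor(long minCommitIndex, Duration timeout) {
        if (lastApplied >= minCommitIndex) {
            return CompletableFuture.completedFuture(lastApplied);
        }
        CompletableFuture<Long> future = new CompletableFuture<>();
        waiters.add(new Waiter(minCommitIndex, future));
        // Fails the future after the timeout; a no-op if it completed earlier.
        CompletableFuture.delayedExecutor(timeout.toMillis(), TimeUnit.MILLISECONDS)
                .execute(() -> future.completeExceptionally(new LaggingException()));
        return future;
    }

    // Called when newly committed entries have been applied locally.
    synchronized void advanceTo(long newIndex) {
        lastApplied = Math.max(lastApplied, newIndex);
        Iterator<Waiter> it = waiters.iterator();
        while (it.hasNext()) {
            Waiter waiter = it.next();
            if (waiter.future.isDone()) {
                it.remove(); // already timed out
            } else if (lastApplied >= waiter.minCommitIndex) {
                waiter.future.complete(lastApplied);
                it.remove();
            }
        }
    }

    public static void main(String[] args) {
        CommitBarrierSketch follower = new CommitBarrierSketch();
        CompletableFuture<Long> barrier = follower.waitFor(5, Duration.ofSeconds(1));
        follower.advanceTo(7); // one batch moves the index past the requested value
        System.out.println(barrier.join()); // 7

        try {
            follower.waitFor(100, Duration.ofMillis(50)).join();
        } catch (CompletionException e) {
            System.out.println(e.getCause().getClass().getSimpleName()); // LaggingException
        }
    }
}
```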

      Parameters:
      minCommitIndex - (optional) minimum commit index that this Raft node's local commit index needs to reach
      timeout - (optional) duration to wait before timing out the returned future
    • changeMembership

      @Nonnull CompletableFuture<Ordered<RaftGroupMembers>> changeMembership(@Nonnull RaftEndpoint endpoint, @Nonnull MembershipChangeMode mode, long expectedGroupMembersCommitIndex)
      Replicates and commits the given membership change to the Raft group, if the given group members commit index is equal to the current group members commit index in the local Raft state.

      The initial group members commit index is 0. The current group members commit index can be accessed via getCommittedMembers().

      When the membership change process is completed successfully, the returned future is completed with an Ordered object that contains the new member list of the Raft group and the log index at which the given membership change is committed.

      The majority quorum size of the Raft group can increase or decrease by 1 if the membership change updates the number of the voting members. Relatedly, since adding a new voting member increases the majority quorum size, if there are already failed Raft endpoints in the Raft group, it is recommended to remove them first before adding a new voting member in order to avoid availability issues.
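
      The recommendation to remove failed endpoints first can be checked with a little arithmetic. The sketch below assumes the usual majority formula, floor(n / 2) + 1, for n voting members; QuorumSketch and availableWithoutNewMember are hypothetical helpers, not part of the API.

```java
// Hypothetical helper for reasoning about majority quorum sizes.
class QuorumSketch {

    // Majority quorum size for the given number of voting members.
    static int majority(int votingMembers) {
        return votingMembers / 2 + 1;
    }

    // True if healthy members that were already in the group form a majority
    // on their own, i.e. commits need not wait for the newly added member.
    static boolean availableWithoutNewMember(int votingMembers, int failed, int newlyAdded) {
        int establishedHealthy = votingMembers - failed - newlyAdded;
        return establishedHealthy >= majority(votingMembers);
    }

    public static void main(String[] args) {
        // Starting point: 5 voting members, 2 of them failed.
        System.out.println(majority(5)); // 3

        // Add a voting member first: 6 voters, 2 failed, 1 new.
        System.out.println(majority(6) + " " + availableWithoutNewMember(6, 2, 1)); // 4 false

        // Remove both failed members first, then add: 4 voters, 0 failed, 1 new.
        System.out.println(majority(4) + " " + availableWithoutNewMember(4, 0, 1)); // 3 true
    }
}
```

      In the first ordering the group depends on the new member catching up before anything can be committed; in the second, the three established members already form a majority.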

      The current leader Raft node can be removed from the Raft group member list safely without requiring a leadership transfer. The leader Raft node commits the membership change without its own vote, and the followers trigger a new leader election round on the next periodic heartbeat tick. Please note that in this scenario there is a short availability gap which is equal to the leader heartbeat period duration.

      A new Raft endpoint can be added to the Raft group as a RaftRole.LEARNER via MembershipChangeMode.ADD_LEARNER. In this case, the quorum size of the Raft group does not change. In addition, a new Raft endpoint can be added to the Raft group as a RaftRole.FOLLOWER, or an existing RaftRole.LEARNER Raft endpoint can be promoted to RaftRole.FOLLOWER with MembershipChangeMode.ADD_OR_PROMOTE_TO_FOLLOWER. In this case, the quorum size of the Raft group is re-calculated based on the new number of voting members. The leader Raft node does not check if the given Raft endpoint's local Raft log is sufficiently up to date before this re-calculation. Hence, it is the caller's responsibility to check the promoted Raft endpoint's local Raft log in order to prevent availability gaps if the quorum size will increase after the membership change. This check can be done by getting RaftNodeReport from the leader Raft node and comparing the last log indices of the leader and the RaftRole.LEARNER Raft endpoint.
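
      The comparison of last log indices before a promotion might look like the sketch below. PromotionCheckSketch and the maxLagEntries tolerance are assumptions introduced for illustration; how the two indices are read from the report is not shown, and the acceptable lag is a choice left to the caller.

```java
// Hypothetical pre-promotion check comparing last log indices.
class PromotionCheckSketch {

    // maxLagEntries is a caller-chosen tolerance, not a MicroRaft setting.
    static boolean caughtUp(long leaderLastLogIndex, long learnerLastLogIndex, long maxLagEntries) {
        return leaderLastLogIndex - learnerLastLogIndex <= maxLagEntries;
    }

    public static void main(String[] args) {
        System.out.println(caughtUp(1_000, 200, 10)); // false: still far behind
        System.out.println(caughtUp(1_000, 995, 10)); // true: close enough to promote
    }
}
```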

      If the given group members commit index is different from the current group members commit index in the local Raft state, then the returned future is completed with MismatchingRaftGroupMembersCommitIndexException.

      If the Raft group contains a single voting member and an attempt is made to remove that member, then the returned future is completed with IllegalStateException.

      If the given Raft endpoint is already in the committed Raft group member list, or it is being added as a RaftRole.LEARNER while there are RaftGroupMembers.MAX_LEARNER_COUNT RaftRole.LEARNERs in the Raft group member list, then the returned future is completed with IllegalArgumentException.

      The returned future can be completed with NotLeaderException, CannotReplicateException or IndeterminateStateException. Please see individual exception classes for more information.
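
      The expected group members commit index works like a compare-and-set guard against concurrent membership changes. The sketch below models that idea only: MembershipCasSketch, add and MismatchException are hypothetical stand-ins, failures are thrown synchronously instead of completing a future, and the order of the checks is an assumption.

```java
import java.util.LinkedHashSet;
import java.util.Set;

// Hypothetical model of the optimistic check on the members commit index.
class MembershipCasSketch {
    // Stand-in for MismatchingRaftGroupMembersCommitIndexException.
    static class MismatchException extends RuntimeException { }

    private final Set<String> members;
    private long membersCommitIndex = 0L; // the initial members commit index is 0
    private long lastLogIndex = 0L;

    MembershipCasSketch(Set<String> initialMembers) {
        members = new LinkedHashSet<>(initialMembers);
    }

    long getMembersCommitIndex() {
        return membersCommitIndex;
    }

    // Applies the change only if the caller saw the latest member list.
    long add(String endpoint, long expectedMembersCommitIndex) {
        if (expectedMembersCommitIndex != membersCommitIndex) {
            throw new MismatchException();
        }
        if (!members.add(endpoint)) {
            throw new IllegalArgumentException(endpoint + " is already a member");
        }
        membersCommitIndex = ++lastLogIndex;
        return membersCommitIndex;
    }

    public static void main(String[] args) {
        MembershipCasSketch group = new MembershipCasSketch(Set.of("A", "B", "C"));
        long seenByBothCallers = group.getMembersCommitIndex(); // 0

        System.out.println(group.add("D", seenByBothCallers)); // 1
        try {
            group.add("E", seenByBothCallers); // stale expectation
        } catch (MismatchException e) {
            // Re-read the current index and retry.
            System.out.println(group.add("E", group.getMembersCommitIndex())); // 2
        }
    }
}
```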

      Parameters:
      endpoint - the endpoint to add to or remove from the Raft group
      mode - the type of the membership change
      expectedGroupMembersCommitIndex - the expected members commit index
      Returns:
      the future to be completed with the new member list of the Raft group if the membership change is successful, or the exception if the membership change failed
    • transferLeadership

      @Nonnull CompletableFuture<Ordered<Object>> transferLeadership(@Nonnull RaftEndpoint endpoint)
      Transfers the leadership role to the given endpoint, if this Raft node is the current Raft group leader with the RaftNodeStatus.ACTIVE status and the given endpoint is in the committed member list of the Raft group.

      The leadership transfer process is considered to be completed when this Raft node moves to the follower role. There is no strict guarantee that the given endpoint will be the new leader in the new term. However, it is very unlikely that another endpoint will become the new leader.

      The returned future is completed with an Ordered object that contains the commit index on which this Raft node turns into a follower.

      This Raft node does not replicate any new operation until the leadership transfer process is completed and new replicate(Object) calls fail with CannotReplicateException.

      The returned future can be completed with NotLeaderException if this Raft node is not the leader, IllegalStateException if the Raft node status is not RaftNodeStatus.ACTIVE, IllegalArgumentException if the given endpoint is not a voting member in the committed Raft group member list, and TimeoutException if the leadership transfer process has timed out with respect to RaftConfig.getLeaderHeartbeatTimeoutSecs().

      Parameters:
      endpoint - the Raft endpoint to which the leadership will be transferred
      Returns:
      the future to be completed when the leadership transfer is done, or with the exception if the leadership transfer could not be done
    • getReport

      @Nonnull CompletableFuture<Ordered<RaftNodeReport>> getReport()
      Returns a report object that contains information about this Raft node's local state related to the execution of the Raft consensus algorithm.
      Returns:
      a report object that contains information about this Raft node's local state
    • takeSnapshot

      @Nonnull CompletableFuture<Ordered<RaftNodeReport>> takeSnapshot()
      Takes a new snapshot at the local RaftNode at the current commit index. If a snapshot is already taken at the current commit index, calling this method is a no-op.
      Returns:
      a report object that contains information about this Raft node's local state if a new snapshot is taken, otherwise null