Package io.microraft

Interface RaftNode


  • public interface RaftNode
    A Raft node runs the Raft consensus algorithm as a member of a Raft group.

    Operations and queries passed to Raft nodes must be deterministic, i.e., they must produce the same result regardless of when or on which Raft node they are executed.
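
    As a minimal sketch of the determinism requirement, the hypothetical operations below (plain Java, not MicroRaft types) contrast an operation whose result depends only on the replicated state and its argument with one that reads the local clock and could therefore diverge across Raft nodes.

```java
import java.util.function.LongUnaryOperator;

// Hypothetical operations, for illustration only; these are not MicroRaft types.
final class DeterminismSketch {

    // Deterministic: the result depends only on the current state and the argument.
    static LongUnaryOperator add(long delta) {
        return state -> state + delta;
    }

    // Non-deterministic: each Raft node would read its own local clock.
    static LongUnaryOperator addCurrentTime() {
        return state -> state + System.nanoTime();
    }

    // Applies the same operation on two simulated replicas and compares the results.
    static boolean replicasAgree(LongUnaryOperator operation, long initialState) {
        long replica1 = operation.applyAsLong(initialState);
        long replica2 = operation.applyAsLong(initialState);
        return replica1 == replica2;
    }
}
```

    An operation such as add(5) always satisfies replicasAgree, whereas addCurrentTime() gives no such guarantee. If a timestamp is needed, one option is to compute it once on the caller side and pass it in as part of the operation.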

    Raft nodes are identified by [group id, node id] pairs. Multiple Raft groups can run in the same environment, distributed or even in a single JVM process, and they can be discriminated from each other with unique group ids. A single JVM process can run multiple Raft nodes that belong to different Raft groups or even the same Raft group.

    Before a new Raft group is created, its initial member list must be decided. Then, a Raft node is created for each one of its members. When a new member is added to an existing Raft group later on, its Raft node must be initialized with the same initial member list as well.

    Status of a Raft node is RaftNodeStatus.INITIAL on creation, and it moves to RaftNodeStatus.ACTIVE and starts executing the Raft consensus algorithm when start() is called.

    No further operations can be triggered on a Raft node after it is terminated or leaves its Raft group, i.e., after it is removed from the Raft group member list.

    Raft nodes execute the Raft consensus algorithm with the Actor model (see https://en.wikipedia.org/wiki/Actor_model). In this model, each Raft node runs in a single-threaded manner. It uses a RaftNodeExecutor to sequentially handle API calls and RaftMessage objects created via RaftModelFactory.
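
    The single-threaded execution model can be sketched as follows. This is an illustration under stated assumptions, not MicroRaft's implementation: SingleThreadedNodeSketch and its methods are hypothetical, and a JDK single-thread executor stands in for RaftNodeExecutor.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Hypothetical stand-in for a Raft node; this is not MicroRaft's implementation.
final class SingleThreadedNodeSketch {

    // One daemon thread plays the role of the node's executor.
    private final ExecutorService executor = Executors.newSingleThreadExecutor(runnable -> {
        Thread thread = new Thread(runnable, "raft-node-sketch");
        thread.setDaemon(true);
        return thread;
    });

    // Accessed only from the executor thread, so it needs no synchronization.
    private final List<String> handled = new ArrayList<>();

    // Enqueues a message; tasks run one at a time, in submission order.
    CompletableFuture<Integer> submit(String message) {
        return CompletableFuture.supplyAsync(() -> {
            handled.add(message);
            return handled.size();
        }, executor);
    }

    void shutdown() {
        executor.shutdown();
    }
}
```

    Because every task runs on the same thread, the node state is never touched concurrently, and callers observe results only through the returned futures.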

    The communication between Raft nodes is implemented with the message-passing approach and abstracted away with the Transport interface.

    Raft nodes use StateMachine to execute queries and committed operations.

    Last, Raft nodes use RaftStore to persist internal Raft state to stable storage to be able to recover from crashes.

    See Also:
    RaftNode.RaftNodeBuilder, RaftEndpoint, RaftRole, RaftNodeStatus, RaftNodeExecutor, Transport, StateMachine, RaftStore, RaftNodeReportListener, RaftNodeLifecycleAware
    • Method Detail

      • newBuilder

        static RaftNode.RaftNodeBuilder newBuilder()
        Returns a new builder to configure the RaftNode that is going to be created.
        Returns:
        a new builder to configure the RaftNode that is going to be created
      • getGroupId

        @Nonnull
        Object getGroupId()
        Returns the unique id of the Raft group which this Raft node belongs to.
        Returns:
        the unique id of the Raft group which this Raft node belongs to
      • getLocalEndpoint

        @Nonnull
        RaftEndpoint getLocalEndpoint()
        Returns the local endpoint of this Raft node.
        Returns:
        the local endpoint of this Raft node
      • getConfig

        @Nonnull
        RaftConfig getConfig()
        Returns the config object this Raft node is initialized with.
        Returns:
        the config object this Raft node is initialized with
      • getTerm

        @Nonnull
        RaftTerm getTerm()
        Returns the locally known term information.

        Please note that the other Raft nodes in the Raft group may have already switched to a higher term.

        Returns:
        the locally known term information
      • getStatus

        @Nonnull
        RaftNodeStatus getStatus()
        Returns the current status of this Raft node.
        Returns:
        the current status of this Raft node
      • getInitialMembers

        @Nonnull
        RaftGroupMembers getInitialMembers()
        Returns the initial member list of the Raft group.
        Returns:
        the initial member list of the Raft group
      • getCommittedMembers

        @Nonnull
        RaftGroupMembers getCommittedMembers()
        Returns the last committed member list of the Raft group this Raft node belongs to.

        Please note that the returned member list is read from the local state and can differ from the currently effective member list if this Raft node is part of the minority and there is an ongoing (appended but not-yet-committed) membership change in the majority of the Raft group. Similarly, it can differ from the current committed member list of the Raft group if a new membership change has been committed by the majority of the Raft nodes but not yet learnt by this Raft node.

        Returns:
        the last committed member list of the Raft group
      • getEffectiveMembers

        @Nonnull
        RaftGroupMembers getEffectiveMembers()
        Returns the currently effective member list of the Raft group this Raft node belongs to.

        Please note that the returned member list is read from the local state and can be different from the committed member list if there is an ongoing (appended but not-yet-committed) membership change in the Raft group.

        Returns:
        the currently effective member list of the Raft group
      • start

        @Nonnull
        CompletableFuture<Ordered<Object>> start()
        Triggers this Raft node to start executing the Raft consensus algorithm.

        The returned future is completed with IllegalStateException if this Raft node has already started.

        Returns:
        the future object to be notified after this Raft node starts
      • terminate

        @Nonnull
        CompletableFuture<Ordered<Object>> terminate()
        Forcefully sets the status of this Raft node to RaftNodeStatus.TERMINATED and makes the Raft node stop executing the Raft consensus algorithm.
        Returns:
        the future object to be notified after this Raft node terminates
      • handle

        void handle​(@Nonnull
                    RaftMessage message)
        Handles the given Raft message which can be either a Raft RPC request or a response.

        Silently ignores the given Raft message if this Raft node has already terminated or left the Raft group.

        Parameters:
        message - the object sent by another Raft node of this Raft group
        Throws:
        IllegalArgumentException - if an unknown message is received.
        See Also:
        RaftMessage
      • replicate

        @Nonnull
        <T> CompletableFuture<Ordered<T>> replicate​(@Nonnull
                                                    Object operation)
        Replicates, commits, and executes the given operation via this Raft node. The given operation is executed once it is committed in the Raft group, and the returned future object is notified with its execution result.

        Please note that the given operation must be deterministic.

        The returned future is notified with an Ordered object that contains the log index on which the given operation is committed and executed.

        The returned future can be notified with NotLeaderException, CannotReplicateException or IndeterminateStateException. Please see individual exception classes for more information.
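
        As a hedged sketch of how a caller might react to a failed future, the example below uses only the JDK's CompletableFuture. ReplicateFailureSketch, describeOutcome and StandInNotLeaderException are hypothetical names, and the stand-in exception merely plays the role of NotLeaderException. The returned strings describe one possible policy, not behavior prescribed by this interface.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CompletionException;

// Hypothetical helper, for illustration only.
final class ReplicateFailureSketch {

    // Stand-in for NotLeaderException; not the real MicroRaft class.
    static final class StandInNotLeaderException extends RuntimeException {
    }

    // Maps the outcome of a future to a short description of a possible next step.
    static String describeOutcome(CompletableFuture<String> future) {
        return future.handle((result, failure) -> {
            if (failure == null) {
                return "committed: " + result;
            }
            // Dependent stages wrap the original failure in CompletionException.
            Throwable cause = failure instanceof CompletionException && failure.getCause() != null
                    ? failure.getCause() : failure;
            if (cause instanceof StandInNotLeaderException) {
                return "retry on the current leader";
            }
            return "failed: " + cause.getClass().getSimpleName();
        }).join();
    }
}
```

        Unwrapping CompletionException first keeps the check correct whether the handler is attached to the original future or to a stage derived from it.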

        Type Parameters:
        T - type of the result of the operation execution
        Parameters:
        operation - the operation to be replicated on the Raft group
        Returns:
        the future to be notified with the result of the operation execution, or the exception if the replication fails
        See Also:
        NotLeaderException, CannotReplicateException, IndeterminateStateException
      • query

        @Nonnull
        <T> CompletableFuture<Ordered<T>> query​(@Nonnull
                                                Object operation,
                                                @Nonnull
                                                QueryPolicy queryPolicy,
                                                long minCommitIndex)
        Executes the given query operation based on the given query policy.

        The returned future is notified with an Ordered object that contains the commit index on which the given query is executed.

        If the caller provides a query policy that is weaker than QueryPolicy.LINEARIZABLE, it can also provide a minimum commit index. Then, the local Raft node executes the given query only if its local commit index is greater than or equal to the required commit index. If the local commit index is smaller than the required commit index, then the returned future is notified with LaggingCommitIndexException so that the caller can retry its query on the same Raft node after some time or forward it to another Raft node. This mechanism enables callers to execute queries on Raft nodes without hitting the log replication quorum and preserve monotonicity of query results. Please see Section 6.4, "Processing read-only queries more efficiently", of the Raft dissertation for more details.

        The returned future can be notified with NotLeaderException, CannotReplicateException or LaggingCommitIndexException. Please see individual exception classes for more information.
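
        The minimum commit index check described above can be sketched as follows. MinCommitIndexSketch, queryLocally and StandInLaggingCommitIndexException are hypothetical names used for illustration. In the real API the failure is delivered through the returned future rather than thrown directly.

```java
import java.util.function.Supplier;

// Hypothetical model of the check; not MicroRaft's implementation.
final class MinCommitIndexSketch {

    // Stand-in for LaggingCommitIndexException; not the real MicroRaft class.
    static final class StandInLaggingCommitIndexException extends RuntimeException {
        StandInLaggingCommitIndexException(long localCommitIndex, long minCommitIndex) {
            super("local commit index " + localCommitIndex + " < required " + minCommitIndex);
        }
    }

    // Runs the query only if the local commit index has reached minCommitIndex.
    static <T> T queryLocally(long localCommitIndex, long minCommitIndex, Supplier<T> query) {
        if (localCommitIndex < minCommitIndex) {
            throw new StandInLaggingCommitIndexException(localCommitIndex, minCommitIndex);
        }
        return query.get();
    }
}
```

        A caller that passes the commit index from its previous query result as the next minCommitIndex never observes an older state than one it has already seen, even when it switches between Raft nodes.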

        Type Parameters:
        T - type of the result of the query execution
        Parameters:
        operation - the query operation to be executed
        queryPolicy - the query policy to decide how to execute the given query
        minCommitIndex - the minimum commit index that this Raft node has to have in order to execute the given query.
        Returns:
        the future to be notified with the result of the query execution, or the exception if the query cannot be executed
        See Also:
        QueryPolicy, NotLeaderException, CannotReplicateException, LaggingCommitIndexException
      • changeMembership

        @Nonnull
        CompletableFuture<Ordered<RaftGroupMembers>> changeMembership​(@Nonnull
                                                                      RaftEndpoint endpoint,
                                                                      @Nonnull
                                                                      MembershipChangeMode mode,
                                                                      long expectedGroupMembersCommitIndex)
        Replicates and commits the given membership change to the Raft group, if the given group members commit index is equal to the current group members commit index in the local Raft state.

        The initial group members commit index is 0. The current group members commit index can be accessed via getCommittedMembers().

        When the membership change process is completed successfully, the returned future is notified with an Ordered object that contains the new member list of the Raft group and the log index at which the given membership change is committed.

        The majority quorum size of the Raft group can increase or decrease by 1 if the membership change updates the number of the voting members. Relatedly, since adding a new voting member increases the majority quorum size, if there are already failed Raft endpoints in the Raft group, it is recommended to remove them first before adding a new voting member in order to avoid availability issues.
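
        The recommendation above follows from simple majority arithmetic. The sketch below assumes the standard majority formula (more than half of the voting members). QuorumSketch and its methods are hypothetical helpers, not part of MicroRaft.

```java
// Hypothetical helper illustrating majority quorum arithmetic.
final class QuorumSketch {

    // Majority of the voting members: more than half of them.
    static int majority(int votingMemberCount) {
        if (votingMemberCount < 1) {
            throw new IllegalArgumentException("at least one voting member is required");
        }
        return votingMemberCount / 2 + 1;
    }

    // Number of voting members that can fail while a majority stays reachable.
    static int tolerableFailures(int votingMemberCount) {
        return votingMemberCount - majority(votingMemberCount);
    }
}
```

        For example, a group of 3 voting members has a majority of 2 and tolerates 1 failure. If one member has already failed and a 4th voting member is added without removing it, the majority becomes 3 and the group still tolerates only 1 failure, which the failed member has already consumed. Removing the failed member first (2 members, majority 2) and then adding the new one (3 members, majority 2) restores the ability to tolerate a failure.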

        The current leader Raft node can be removed from the Raft group member list safely without requiring a leadership transfer. The leader Raft node commits the membership change without its own vote, and the followers trigger a new leader election round on the next periodic heartbeat tick. Please note that in this scenario there is a short availability gap which is equal to the leader heartbeat period duration.

        A new Raft endpoint can be added to the Raft group as a RaftRole.LEARNER via MembershipChangeMode.ADD_LEARNER. In this case, the quorum size of the Raft group does not change. In addition, a new Raft endpoint can be added to the Raft group as a RaftRole.FOLLOWER, or an existing RaftRole.LEARNER Raft endpoint can be promoted to RaftRole.FOLLOWER with MembershipChangeMode.ADD_OR_PROMOTE_TO_FOLLOWER. In this case, the quorum size of the Raft group is re-calculated based on the new number of voting members. The leader Raft node does not check if the given Raft endpoint's local Raft log is sufficiently up to date before this re-calculation. Hence, it is the caller's responsibility to check the promoted Raft endpoint's local Raft log in order to prevent availability gaps if the quorum size will increase after the membership change. This check can be done by getting RaftNodeReport from the leader Raft node and comparing last log indices of the leader and the RaftRole.LEARNER Raft endpoint.

        If the given group members commit index is different from the current group members commit index in the local Raft state, then the returned future is notified with MismatchingRaftGroupMembersCommitIndexException.
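
        The expected commit index acts as an optimistic concurrency check. The toy model below illustrates the idea under stated assumptions: MembershipChangeGuardSketch and StandInMismatchException are hypothetical, the log is reduced to a counter, and the failure is thrown directly instead of being delivered through a future.

```java
// Hypothetical model of the optimistic check; not MicroRaft's implementation.
final class MembershipChangeGuardSketch {

    // Stand-in for MismatchingRaftGroupMembersCommitIndexException.
    static final class StandInMismatchException extends RuntimeException {
        StandInMismatchException(long expected, long actual) {
            super("expected members commit index " + expected + " but was " + actual);
        }
    }

    // Commit index of the current member list; 0 for the initial member list.
    private long groupMembersCommitIndex;

    // Highest log index used so far in this toy log.
    private long lastLogIndex;

    // Commits a membership change only if the caller has seen the latest member list.
    long changeMembership(long expectedGroupMembersCommitIndex) {
        if (expectedGroupMembersCommitIndex != groupMembersCommitIndex) {
            throw new StandInMismatchException(expectedGroupMembersCommitIndex, groupMembersCommitIndex);
        }
        groupMembersCommitIndex = ++lastLogIndex;
        return groupMembersCommitIndex;
    }
}
```

        If two callers both read the member list at commit index 0 and each submits a change, only the first one succeeds. The second one fails and has to re-read the committed member list before retrying, so it cannot act on a stale view of the group.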

        If the Raft group contains a single voting member and an attempt is made to remove that member, then the returned future is notified with IllegalStateException.

        If the given Raft endpoint is already in the committed Raft group member list, or it is being added as a RaftRole.LEARNER while the Raft group member list already contains RaftGroupMembers.MAX_LEARNER_COUNT members with the RaftRole.LEARNER role, then the returned future is notified with IllegalArgumentException.

        The returned future can be notified with NotLeaderException, CannotReplicateException or IndeterminateStateException. Please see individual exception classes for more information.

        Parameters:
        endpoint - the endpoint to add to or remove from the Raft group
        mode - the type of the membership change
        expectedGroupMembersCommitIndex - the expected members commit index
        Returns:
        the future to be notified with the new member list of the Raft group if the membership change is successful, or the exception if the membership change failed
        See Also:
        replicate(Object), MismatchingRaftGroupMembersCommitIndexException, NotLeaderException, CannotReplicateException, IndeterminateStateException, IllegalArgumentException, IllegalStateException
      • transferLeadership

        @Nonnull
        CompletableFuture<Ordered<Object>> transferLeadership​(@Nonnull
                                                              RaftEndpoint endpoint)
        Transfers the leadership role to the given endpoint, if this Raft node is the current Raft group leader with the RaftNodeStatus.ACTIVE status and the given endpoint is in the committed member list of the Raft group.

        The leadership transfer process is considered to be completed when this Raft node moves to the follower role. There is no strict guarantee that the given endpoint will be the new leader in the new term. However, it is very unlikely that another endpoint will become the new leader.

        The returned future is notified with an Ordered object that contains the commit index on which this Raft node turns into a follower.

        This Raft node does not replicate any new operation until the leadership transfer process is completed, and new replicate(Object) calls fail with CannotReplicateException in the meantime.

        The returned future can be notified with NotLeaderException if this Raft node is not the leader, IllegalStateException if the Raft node status is not RaftNodeStatus.ACTIVE, IllegalArgumentException if the given endpoint is not a voting member in the committed Raft group member list, and TimeoutException if the leadership transfer process has timed out with respect to RaftConfig.getLeaderHeartbeatTimeoutSecs().

        Parameters:
        endpoint - the Raft endpoint to which the leadership will be transferred
        Returns:
        the future to be notified when the leadership transfer is done, or with the exception if the leadership transfer could not be done
        See Also:
        CannotReplicateException, NotLeaderException
      • getReport

        @Nonnull
        CompletableFuture<Ordered<RaftNodeReport>> getReport()
        Returns a report object that contains information about this Raft node's local state related to the execution of the Raft consensus algorithm.
        Returns:
        a report object that contains information about this Raft node's local state
      • takeSnapshot

        @Nonnull
        CompletableFuture<Ordered<RaftNodeReport>> takeSnapshot()
        Takes a new snapshot at the local RaftNode at the current commit index. If a snapshot is already taken at the current commit index, calling this method is a no-op.
        Returns:
        the future to be notified with a report object that contains information about this Raft node's local state if a new snapshot is taken, otherwise null