This command, which can only be sent to a Redis Cluster slave node, forces the slave to start a manual failover of its master instance.

A manual failover is a special kind of failover that is usually executed when there are no actual failures, but we wish to swap the current master with one of its slaves (the node we send the command to) in a safe way, without any window for data loss. It works in the following way:

  1. The slave tells the master to stop processing queries from clients.
  2. The master replies to the slave with the current replication offset.
  3. The slave waits until the replication offset matches on its side, to make sure it has processed all the data from the master before it continues.
  4. The slave starts a failover, obtains a new configuration epoch from the majority of the masters, and broadcasts the new configuration.
  5. The old master receives the configuration update: it unblocks its clients and starts replying with redirection messages so that they continue the conversation with the new master.
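As a rough illustration, the five steps can be traced on a toy in-memory model. The `Node` class, its fields, and `manual_failover` are hypothetical names invented for this sketch; it is not how Redis implements the handshake, and the majority vote that grants the new epoch is reduced to a `granted_epoch` argument.

```python
from dataclasses import dataclass, field
from typing import Optional, Set


@dataclass
class Node:
    """Hypothetical, simplified model of a cluster node (not Redis internals)."""
    name: str
    role: str                           # "master" or "slave"
    repl_offset: int = 0                # replication stream processed so far
    config_epoch: int = 0
    slots: Set[int] = field(default_factory=set)
    paused: bool = False                # True while client queries are blocked
    redirect_to: Optional[str] = None   # node that clients get redirected to


def manual_failover(master: Node, slave: Node, granted_epoch: int) -> None:
    """Apply the five manual-failover steps to two Node objects.

    granted_epoch stands in for the epoch the majority of masters would
    authorize; vote collection itself is not modelled here.
    """
    # 1. The slave tells the master to stop processing client queries.
    master.paused = True
    # 2. The master replies with its current replication offset.
    target_offset = master.repl_offset
    # 3. The slave does not continue until it has processed up to that offset.
    while slave.repl_offset < target_offset:
        slave.repl_offset += 1  # stand-in for applying replicated data
    # 4. The slave fails over with the new epoch and takes the master's slots
    #    (broadcasting the configuration is modelled as a direct assignment).
    slave.role = "master"
    slave.config_epoch = granted_epoch
    slave.slots |= master.slots
    # 5. The old master applies the update, unblocks clients, and redirects them.
    #    This model also turns the old master into a slave of the new one.
    master.role = "slave"
    master.slots = set()
    master.paused = False
    master.redirect_to = slave.name
```

For example, with a master at offset 100 and a slave at offset 97, the slave catches up to 100 before it claims the master's slots, and the old master ends up unpaused with `redirect_to` pointing at the slave. Clients are never served by both nodes at once because the master stays paused until step 5.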

This way, clients are moved from the old master to the new master atomically, and only after the slave that is turning into the new master has processed the entire replication stream from the old master.

FORCE option: manual failover when the master is down

The command behavior can be modified by two options: FORCE and TAKEOVER.

If the FORCE option is given, the slave does not perform any handshake with the master, which may not be reachable, but instead starts a failover as soon as possible, starting from point 4. This is useful when we want to start a manual failover while the master is no longer reachable.

However, even with FORCE, a majority of masters must still be available in order to authorize the failover and generate a new configuration epoch for the slave that is going to become master.
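The majority requirement can be stated as a one-line check. This is a sketch with hypothetical names: `majority` and `force_failover_authorized` are not Redis functions, and `num_masters` is assumed to count every master in the cluster, including the unreachable one.

```python
def majority(num_masters: int) -> int:
    """Smallest number of votes that is a strict majority of num_masters."""
    return num_masters // 2 + 1


def force_failover_authorized(votes: int, num_masters: int) -> bool:
    """FORCE skips the handshake with the master, not the vote.

    num_masters is assumed to include the unreachable master whose
    slave is being promoted.
    """
    return votes >= majority(num_masters)
```

In a cluster of five masters where the failed one is unreachable, three votes still authorize a FORCE failover; if two more masters are partitioned away, the two remaining votes do not.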

TAKEOVER option: manual failover without cluster consensus

There are situations where this is not enough, and we want a slave to fail over without any agreement with the rest of the cluster. A real-world use case is mass-promoting slaves in a different data center to masters in order to perform a data center switch, while all the masters are down or partitioned away.

The TAKEOVER option implies everything FORCE implies, but it also does not use any cluster authorization in order to fail over. A slave receiving CLUSTER FAILOVER TAKEOVER will instead:

  1. Generate a new configEpoch unilaterally, taking the current greatest epoch available and incrementing it if its local configuration epoch is not already the greatest.
  2. Assign itself all the hash slots of its master, and propagate the new configuration to every reachable node as soon as possible, and eventually to every other node.
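A minimal sketch of the two TAKEOVER steps, assuming epochs are plain integers and slots are a set of slot numbers. `takeover_epoch` and `takeover` are hypothetical helper names; nothing here asks other nodes for anything, which is exactly why the result may collide with another node's epoch.

```python
from typing import Iterable, Set, Tuple


def takeover_epoch(local_epoch: int, known_epochs: Iterable[int]) -> int:
    """Pick a configEpoch without consulting the cluster (TAKEOVER rule)."""
    greatest = max([local_epoch, *known_epochs])
    if local_epoch == greatest:
        return local_epoch      # already the greatest: keep it
    return greatest + 1         # otherwise take the greatest and increment it


def takeover(local_epoch: int, known_epochs: Iterable[int],
             master_slots: Set[int], own_slots: Set[int]) -> Tuple[int, Set[int]]:
    """Return the new epoch and the slot set the slave claims and propagates."""
    return takeover_epoch(local_epoch, known_epochs), own_slots | master_slots
```

If a slave with epoch 3 only knows of epochs 5 and 2, it claims epoch 6; a slave already at 7 keeps 7. Either value is chosen locally, so a node in a minority partition may pick an epoch that another partition has also used.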

Note that TAKEOVER violates the last-failover-wins principle of Redis Cluster, since the configuration epoch generated by the slave violates the normal generation of configuration epochs in several ways:

  1. There is no guarantee that it is actually the highest configuration epoch since, for example, we can use the TAKEOVER option within a minority, and no message exchange is performed to generate the new configuration epoch.
  2. If we generate a configuration epoch that happens to collide with another instance's, eventually either our configuration epoch or the other instance's will be moved away by the configuration epoch collision resolution algorithm.
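The Redis Cluster specification describes the collision resolution rule roughly as follows: when a master sees another master advertising the same configEpoch, the node with the lexicographically smaller Node ID increments its currentEpoch and adopts it as its new configEpoch. The sketch below models only that rule; the function name and arguments are hypothetical.

```python
from typing import Tuple


def resolve_epoch_collision(my_id: str, my_config_epoch: int, current_epoch: int,
                            other_id: str, other_config_epoch: int) -> Tuple[int, int]:
    """Return this master's (configEpoch, currentEpoch) after seeing another master.

    Only the node with the lexicographically smaller Node ID moves away,
    so two colliding nodes never both bump their epoch.
    """
    if my_config_epoch != other_config_epoch or my_id >= other_id:
        return my_config_epoch, current_epoch  # no collision, or the other node moves
    current_epoch += 1
    return current_epoch, current_epoch
```

Because both nodes apply the same deterministic comparison, exactly one of them changes epoch, and repeated application eventually leaves every master with a distinct configEpoch.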

Because of this, the TAKEOVER option should be used with care.

Implementation details and notes

Unless the TAKEOVER option is specified, CLUSTER FAILOVER does not execute a failover synchronously; it only schedules a manual failover, bypassing the failure detection stage. To check whether the failover actually happened, use CLUSTER NODES or other means to verify that the state of the cluster changes some time after the command was sent.
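A minimal polling sketch: it parses the CLUSTER NODES reply (one line per node, with the comma-separated flags in the third field) and reports whether the line flagged `myself` also carries `master`. `fetch_nodes` is a hypothetical callable you supply that sends CLUSTER NODES to the slave that received CLUSTER FAILOVER and returns the raw text; the timeout and interval defaults are arbitrary.

```python
import time
from typing import Callable


def myself_is_master(cluster_nodes_output: str) -> bool:
    """True if the line flagged 'myself' also carries the 'master' flag."""
    for line in cluster_nodes_output.splitlines():
        fields = line.split()
        if len(fields) < 3:
            continue  # skip blank or malformed lines
        flags = fields[2].split(",")
        if "myself" in flags:
            return "master" in flags
    return False


def wait_for_promotion(fetch_nodes: Callable[[], str],
                       timeout: float = 10.0, interval: float = 0.5) -> bool:
    """Poll CLUSTER NODES until this node reports itself as master, or time out."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if myself_is_master(fetch_nodes()):
            return True
        time.sleep(interval)
    return False
```

A `False` result only means the promotion was not observed within the timeout; since the command merely schedules the failover, the caller decides whether to keep waiting or investigate.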

@return

@simple-string-reply: OK if the command was accepted and a manual failover is going to be attempted. An error if the operation cannot be executed, for example if we are talking with a node which is already a master.