Issue #192: Update docs to the new peerset handling

Also, start removing mentions of the `PeerAdd` operation, as things should
happen with `--bootstrap` (Join). PeerAdd should not be part of the user
workflow and might disappear or be hidden in the future.

License: MIT
Signed-off-by: Hector Sanjuan <hector@protocol.ai>
Hector Sanjuan 2017-11-14 20:57:18 +01:00
parent 398865f381
commit 15bb953afd
4 changed files with 98 additions and 99 deletions


@ -79,7 +79,6 @@ This section gives an overview of how some things work in Cluster. Doubts? Some
### Startup
* Initialize the P2P host.
* Initialize the PeerManager: needs to keep track of cluster peers.
* Initialize Consensus: start looking for a leader asap.
* Set up RPC in all components: allow them to communicate with different parts of the cluster.
* Bootstrap: if we are bootstrapping from another node, do the dance (contact, receive cluster peers, join consensus)
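For illustration, a minimal sketch of the two startup paths (the commands and the `--bootstrap` flag are the ones documented in the `ipfs-cluster-service` guide; the multiaddress is an example value):
```
# Start normally: the peerset comes from the existing configuration/consensus data.
$ ipfs-cluster-service

# Or bootstrap: contact an existing peer, receive the cluster peers and join the consensus.
$ ipfs-cluster-service --bootstrap /ip4/192.168.1.2/tcp/9096/ipfs/<peer-id>
```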
@ -125,21 +124,13 @@ Notes:
* If it's via an API request, it involves an RPC request to the cluster main component.
* `Cluster.PeerAdd()`
* Figure out the real multiaddress for that peer (the one we see).
* Let the `PeerManager` component know about the new peer. This adds it to the Libp2p host and notifies any component which needs to know about peers. This also means the ability to perform RPC requests to that peer.
* Broadcast the address to every cluster peer and add it to the host's libp2p peerstore. This gives each member of the cluster the ability to perform RPC requests to that peer.
* Figure out our multiaddress in regard to the new peer (the one it sees to connect to us). This is done with an RPC request and it also ensures that the peer is up and reachable.
* Trigger a consensus `LogOp` indicating that there is a new peer.
* Send the new peer the list of cluster peers. This is an RPC request to the new peer's `PeerManager` which allows it to keep tabs on the current cluster peers.
* Add the new peer to the Consensus component. This operation gets forwarded to the leader. Internally, Raft commits a configuration change to the log which contains a new peerset. Every peer gets the new peerset, including the new peer.
* Send the list of peer multiaddresses to the new peer so its host knows how to reach them. This is a remote RPC request to the new peer's `PeerManager.ImportAddresses`.
The consensus part has its own complexity:
* As usual, the "add peer" operation is forwarded to the Raft leader.
* The consensus component logs such an operation and also uses Raft's `AddPeer` method or equivalent.
* This results in two log entries: one internal to Raft, which updates the internal Raft peerstore in all peers, and one from Cluster, which is received by the `Apply` method. This `Apply` operation does not modify the shared `State` (as pinning does), but rather only notifies the `PeerManager` about the new peer so the nodes can be set up to talk to it.
As such, we are effectively using the consensus log to broadcast a PeerAdd operation. That is because this step is critical and should either succeed everywhere or fail. If we performed a regular "for loop broadcast" and some peers failed, we would end up with an uncertain state and a mess that needs to be cleaned up. By using the consensus log, those peers which did not receive the operation have the opportunity to receive it later (on restart, if they were down). There are still pitfalls to this (restoring from a snapshot might result in peers with missing cluster peers), but the errors are more obvious and isolated than in other approaches.
We use the Consensus' (or rather Raft's) internal peerset as the source of truth for the current cluster peers. This is just a list of peer IDs. The associated multiaddresses for those peers are broadcast to every member. This implies that peer additions are expected to happen on a healthy cluster where all peers can learn about the new peer's multiaddresses.
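From the user side, this whole workflow is normally triggered by bootstrapping the new peer to a running one, rather than by calling `PeerAdd` directly. A sketch (the multiaddress is an example value):
```
# Join an existing, healthy cluster: the peerset arrives via the Raft
# configuration change and the multiaddresses are broadcast to every member.
new-node $ ipfs-cluster-service --bootstrap /ip4/192.168.1.2/tcp/9096/ipfs/<peer-id>

# Afterwards, every member should report the same peerset.
any-node $ ipfs-cluster-ctl peers ls
```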
Notes:


@ -4,105 +4,114 @@
This step creates a single-node IPFS Cluster.
First initialize the configuration (see [the **Initialization** section of the
`ipfs-cluster-service` README](../ipfs-cluster-service/dist/README.md#initialization)
for more info on entering a cluster secret):
First, create a secret that we will use for all cluster peers:
```
node0 $ export CLUSTER_SECRET=$(od -vN 32 -An -tx1 /dev/urandom | tr -d ' \n')
node0 $ echo $CLUSTER_SECRET
9a420ec947512b8836d8eb46e1c56fdb746ab8a78015b9821e6b46b38344038f
```
Second, initialize the configuration (see [the **Initialization** section of the
`ipfs-cluster-service` README](../ipfs-cluster-service/dist/README.md#initialization) for more details).
```
node0 $ ipfs-cluster-service init
Enter cluster secret (to automatically generate, rerun with --gen-secret): <enter-your-secret>
ipfs-cluster-service configuration written to /home/user/.ipfs-cluster/service.json
INFO config: Saving configuration config.go:311
ipfs-cluster-service configuration written to /home/hector/.ipfs-cluster/service.json
```
Then run cluster:
```
node0> ipfs-cluster-service
13:33:34.044 INFO cluster: IPFS Cluster v0.0.1 listening on: cluster.go:59
13:33:34.044 INFO cluster: /ip4/127.0.0.1/tcp/9096/ipfs/QmQHKLBXfS7hf8o2acj7FGADoJDLat3UazucbHrgxqisim cluster.go:61
13:33:34.044 INFO cluster: /ip4/192.168.1.103/tcp/9096/ipfs/QmQHKLBXfS7hf8o2acj7FGADoJDLat3UazucbHrgxqisim cluster.go:61
13:33:34.044 INFO cluster: starting Consensus and waiting for a leader... consensus.go:163
13:33:34.047 INFO cluster: PinTracker ready map_pin_tracker.go:71
13:33:34.047 INFO cluster: waiting for leader raft.go:118
13:33:34.047 INFO cluster: REST API: /ip4/127.0.0.1/tcp/9094 rest_api.go:309
13:33:34.047 INFO cluster: IPFS Proxy: /ip4/127.0.0.1/tcp/9095 -> /ip4/127.0.0.1/tcp/5001 ipfs_http_connector.go:168
13:33:35.420 INFO cluster: Raft Leader elected: QmQHKLBXfS7hf8o2acj7FGADoJDLat3UazucbHrgxqisim raft.go:145
13:33:35.921 INFO cluster: Consensus state is up to date consensus.go:214
13:33:35.921 INFO cluster: IPFS Cluster is ready cluster.go:191
13:33:35.921 INFO cluster: Cluster Peers (not including ourselves): cluster.go:192
13:33:35.921 INFO cluster: - No other peers cluster.go:195
INFO cluster: IPFS Cluster v0.3.0 listening on: cluster.go:91
INFO cluster: /ip4/127.0.0.1/tcp/9096/ipfs/QmZjSoXUQgJ9tutP1rXjjNYwTrRM9QPhmD9GHVjbtgWxEn cluster.go:93
INFO cluster: /ip4/192.168.1.2/tcp/9096/ipfs/QmZjSoXUQgJ9tutP1rXjjNYwTrRM9QPhmD9GHVjbtgWxEn cluster.go:93
INFO consensus: starting Consensus and waiting for a leader... consensus.go:61
INFO consensus: bootstrapping raft cluster raft.go:108
INFO restapi: REST API: /ip4/127.0.0.1/tcp/9094 restapi.go:266
INFO ipfshttp: IPFS Proxy: /ip4/127.0.0.1/tcp/9095 -> /ip4/127.0.0.1/tcp/5001 ipfshttp.go:159
INFO consensus: Raft Leader elected: QmZjSoXUQgJ9tutP1rXjjNYwTrRM9QPhmD9GHVjbtgWxEn raft.go:261
INFO consensus: Raft state is catching up raft.go:273
INFO consensus: Consensus state is up to date consensus.go:116
INFO cluster: Cluster Peers (without including ourselves): cluster.go:450
INFO cluster: - No other peers cluster.go:452
INFO cluster: IPFS Cluster is ready cluster.go:461
```
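Optionally, before adding more peers, you can check that this single peer responds (using the `ipfs-cluster-ctl` commands shown later in this guide):
```
node0 $ ipfs-cluster-ctl id        # cluster peer and ipfs daemon information
node0 $ ipfs-cluster-ctl peers ls  # should only list this peer for now
```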
### Step 1: Add new members to the cluster
Initialize and run cluster on a different node (or nodes):
Initialize and run cluster on a different node (or nodes), bootstrapping them to the first node:
```
node1> export CLUSTER_SECRET=<copy from node0>
node1> ipfs-cluster-service init
ipfs-cluster-service configuration written to /home/user/.ipfs-cluster/service.json
node1> ipfs-cluster-service
13:36:19.313 INFO cluster: IPFS Cluster v0.0.1 listening on: cluster.go:59
13:36:19.313 INFO cluster: /ip4/127.0.0.1/tcp/7096/ipfs/QmU7JJftGsP1zM8H37XwvfA7dwdepsB7xVnJA6sKpozc85 cluster.go:61
13:36:19.313 INFO cluster: /ip4/192.168.1.103/tcp/7096/ipfs/QmU7JJftGsP1zM8H37XwvfA7dwdepsB7xVnJA6sKpozc85 cluster.go:61
13:36:19.313 INFO cluster: starting Consensus and waiting for a leader... consensus.go:163
13:36:19.316 INFO cluster: REST API: /ip4/127.0.0.1/tcp/7094 rest_api.go:309
13:36:19.316 INFO cluster: IPFS Proxy: /ip4/127.0.0.1/tcp/7095 -> /ip4/127.0.0.1/tcp/5001 ipfs_http_connector.go:168
13:36:19.316 INFO cluster: waiting for leader raft.go:118
13:36:19.316 INFO cluster: PinTracker ready map_pin_tracker.go:71
13:36:20.834 INFO cluster: Raft Leader elected: QmU7JJftGsP1zM8H37XwvfA7dwdepsB7xVnJA6sKpozc85 raft.go:145
13:36:21.334 INFO cluster: Consensus state is up to date consensus.go:214
13:36:21.334 INFO cluster: IPFS Cluster is ready cluster.go:191
13:36:21.334 INFO cluster: Cluster Peers (not including ourselves): cluster.go:192
13:36:21.334 INFO cluster: - No other peers cluster.go:195
```
Add them to the original cluster with `ipfs-cluster-ctl peers add <multiaddr>`. The command will return the ID information of the newly added member:
```
node0> ipfs-cluster-ctl peers add /ip4/192.168.1.103/tcp/7096/ipfs/QmU7JJftGsP1zM8H37XwvfA7dwdepsB7xVnJA6sKpozc85
{
"id": "QmU7JJftGsP1zM8H37XwvfA7dwdepsB7xVnJA6sKpozc85",
"public_key": "CAASpgIwggEiMA0GCSqGSIb3DQEBAQUAA4IBDwAwggEKAoIBAQDtjpvI+XKVGT5toXTimtWceONYsf/1bbRMxLt/fCSYJoSeJqj0HUtttCD3dcBv1M2rElIMXDhyLUpkET+AN6otr9lQnbgi0ZaKrtzphR0w6g/0EQZZaxI2scxF4NcwkwUfe5ceEmPFwax1+C00nd2BF+YEEp+VHNyWgXhCxncOGO74p0YdXBrvXkyfTiy/567L3PPX9F9x+HiutBL39CWhx9INmtvdPB2HwshodF6QbfeljdAYCekgIrCQC8mXOVeePmlWgTwoge9yQbuViZwPiKwwo1AplANXFmSv8gagfjKL7Kc0YOqcHwxBsoUskbJjfheDZJzl19iDs9EvUGk5AgMBAAE=",
"addresses": [
"/ip4/127.0.0.1/tcp/7096/ipfs/QmU7JJftGsP1zM8H37XwvfA7dwdepsB7xVnJA6sKpozc85",
"/ip4/192.168.1.103/tcp/7096/ipfs/QmU7JJftGsP1zM8H37XwvfA7dwdepsB7xVnJA6sKpozc85"
],
"cluster_peers": [
"/ip4/192.168.123.103/tcp/7096/ipfs/QmU7JJftGsP1zM8H37XwvfA7dwdepsB7xVnJA6sKpozc85"
],
"version": "0.0.1",
"commit": "83baa5c859b9b17b2deec4f782d1210590025c80",
"rpc_protocol_version": "/ipfscluster/0.0.1/rpc",
"ipfs": {
"id": "QmXZrtE5jQwXNqCJMfHUTQkvhQ4ZAnqMnmzFMJfLewur2n",
"addresses": [
"/ip4/127.0.0.1/tcp/4001/ipfs/QmXZrtE5jQwXNqCJMfHUTQkvhQ4ZAnqMnmzFMJfLewur2n",
"/ip4/192.168.1.103/tcp/4001/ipfs/QmXZrtE5jQwXNqCJMfHUTQkvhQ4ZAnqMnmzFMJfLewur2n"
]
}
}
INFO config: Saving configuration config.go:311
ipfs-cluster-service configuration written to /home/hector/.ipfs-cluster/service.json
node1> ipfs-cluster-service --bootstrap /ip4/192.168.1.2/tcp/9096/ipfs/QmZjSoXUQgJ9tutP1rXjjNYwTrRM9QPhmD9GHVjbtgWxEn
INFO cluster: IPFS Cluster v0.3.0 listening on: cluster.go:91
INFO cluster: /ip4/127.0.0.1/tcp/10096/ipfs/QmYFYwnFUkjFhJcSJJGN72wwedZnpQQ4aNpAtPZt8g5fCd cluster.go:93
INFO cluster: /ip4/192.168.1.3/tcp/10096/ipfs/QmYFYwnFUkjFhJcSJJGN72wwedZnpQQ4aNpAtPZt8g5fCd cluster.go:93
INFO consensus: starting Consensus and waiting for a leader... consensus.go:61
INFO consensus: bootstrapping raft cluster raft.go:108
INFO cluster: Bootstrapping to /ip4/127.0.0.1/tcp/9096/ipfs/QmZjSoXUQgJ9tutP1rXjjNYwTrRM9QPhmD9GHVjbtgWxEn cluster.go:471
INFO restapi: REST API: /ip4/127.0.0.1/tcp/10094 restapi.go:266
INFO ipfshttp: IPFS Proxy: /ip4/127.0.0.1/tcp/10095 -> /ip4/127.0.0.1/tcp/5001 ipfshttp.go:159
INFO consensus: Raft Leader elected: QmZjSoXUQgJ9tutP1rXjjNYwTrRM9QPhmD9GHVjbtgWxEn raft.go:261
INFO consensus: Raft state is catching up raft.go:273
INFO consensus: Consensus state is up to date consensus.go:116
INFO consensus: Raft Leader elected: QmZjSoXUQgJ9tutP1rXjjNYwTrRM9QPhmD9GHVjbtgWxEn raft.go:261
INFO consensus: Raft state is catching up raft.go:273
INFO cluster: QmYFYwnFUkjFhJcSJJGN72wwedZnpQQ4aNpAtPZt8g5fCd: joined QmZjSoXUQgJ9tutP1rXjjNYwTrRM9QPhmD9GHVjbtgWxEn's cluster cluster.go:777
INFO cluster: Cluster Peers (without including ourselves): cluster.go:450
INFO cluster: - QmZjSoXUQgJ9tutP1rXjjNYwTrRM9QPhmD9GHVjbtgWxEn cluster.go:456
INFO cluster: IPFS Cluster is ready cluster.go:461
INFO config: Saving configuration config.go:311
```
You can repeat the process with any other nodes.
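For example, a third node would follow the same steps (a sketch reusing the commands above; the bootstrap multiaddress is the one printed by node0):
```
node2 $ export CLUSTER_SECRET=<copy from node0>
node2 $ ipfs-cluster-service init
node2 $ ipfs-cluster-service --bootstrap /ip4/192.168.1.2/tcp/9096/ipfs/QmZjSoXUQgJ9tutP1rXjjNYwTrRM9QPhmD9GHVjbtgWxEn
```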
You can check the current list of cluster peers and see it shows 2 peers:
```
node1 > ipfs-cluster-ctl peers ls
QmYFYwnFUkjFhJcSJJGN72wwedZnpQQ4aNpAtPZt8g5fCd | Sees 1 other peers
> Addresses:
- /ip4/127.0.0.1/tcp/10096/ipfs/QmYFYwnFUkjFhJcSJJGN72wwedZnpQQ4aNpAtPZt8g5fCd
- /ip4/192.168.1.3/tcp/10096/ipfs/QmYFYwnFUkjFhJcSJJGN72wwedZnpQQ4aNpAtPZt8g5fCd
> IPFS: Qmaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
- /ip4/127.0.0.1/tcp/4001/ipfs/Qmaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
- /ip4/192.168.1.3/tcp/4001/ipfs/Qmaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
QmZjSoXUQgJ9tutP1rXjjNYwTrRM9QPhmD9GHVjbtgWxEn | Sees 1 other peers
> Addresses:
- /ip4/127.0.0.1/tcp/9096/ipfs/QmZjSoXUQgJ9tutP1rXjjNYwTrRM9QPhmD9GHVjbtgWxEn
- /ip4/192.168.1.2/tcp/9096/ipfs/QmZjSoXUQgJ9tutP1rXjjNYwTrRM9QPhmD9GHVjbtgWxEn
> IPFS: Qmbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbb
- /ip4/127.0.0.1/tcp/4001/ipfs/Qmbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbb
- /ip4/192.168.1.2/tcp/4001/ipfs/Qmbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbb
```
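If you need machine-readable output (for example when scripting checks), the JSON encoding can be requested, as mentioned in the `ipfs-cluster-service` documentation:
```
node1 $ ipfs-cluster-ctl --enc=json peers ls
```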
### Step 2: Remove no longer needed nodes
You can use `ipfs-cluster-ctl peers rm <multiaddr>` to remove and disconnect any nodes from your cluster. The nodes will be automatically
You can use `ipfs-cluster-ctl peers rm <peer_id>` to remove and disconnect any nodes from your cluster. The nodes will be automatically
shut down. They can be restarted manually and re-added to the Cluster at any time:
```
node0> ipfs-cluster-ctl peers rm QmbGFbZVTF3UAEPK9pBVdwHGdDAYkHYufQwSh4k1i8bbbb
Request succeeded
node0> ipfs-cluster-ctl peers rm QmYFYwnFUkjFhJcSJJGN72wwedZnpQQ4aNpAtPZt8g5fCd
```
`node1` is then disconnected and shuts down:
`node1` is then disconnected and shuts down, as its logs show:
```
13:42:50.828 WARNI cluster: this peer has been removed from the Cluster and will shutdown itself in 5 seconds peer_manager.go:48
13:42:51.828 INFO cluster: stopping Consensus component consensus.go:257
13:42:55.836 INFO cluster: shutting down IPFS Cluster cluster.go:235
13:42:55.836 INFO cluster: Saving configuration config.go:283
13:42:55.837 INFO cluster: stopping Cluster API rest_api.go:327
13:42:55.837 INFO cluster: stopping IPFS Proxy ipfs_http_connector.go:332
13:42:55.837 INFO cluster: stopping MapPinTracker map_pin_tracker.go:87
INFO config: Saving configuration config.go:311
INFO cluster: this peer has been removed and will shutdown cluster.go:386
INFO cluster: shutting down Cluster cluster.go:498
INFO consensus: stopping Consensus component consensus.go:159
INFO consensus: consensus data cleaned consensus.go:400
INFO monitor: stopping Monitor peer_monitor.go:161
INFO restapi: stopping Cluster API restapi.go:284
INFO ipfshttp: stopping IPFS Proxy ipfshttp.go:534
INFO pintracker: stopping MapPinTracker maptracker.go:116
INFO config: Saving configuration config.go:311
```
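`node1` can be re-added at any time. Since the removed peer keeps the last known peers as `bootstrap` addresses (see the `ipfs-cluster-service` documentation), re-launching it should normally be enough; bootstrapping explicitly also works (a sketch, assuming a clean state):
```
node1 $ ipfs-cluster-service
# or, explicitly:
node1 $ ipfs-cluster-service --bootstrap /ip4/192.168.1.2/tcp/9096/ipfs/QmZjSoXUQgJ9tutP1rXjjNYwTrRM9QPhmD9GHVjbtgWxEn
```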


@ -146,18 +146,18 @@ Each section of the configuration file and the options in it depend on their ass
When filling in `peers` with some other peers' listening multiaddresses (e.g. `/ip4/192.168.1.103/tcp/9096/ipfs/QmQHKLBXfS7hf8o2acj7FGADoJDLat3UazucbHrgxqisim`), the peer will expect to be part of a cluster with the given `peers`. On boot, it will wait to learn who is the leader (see the `raft.wait_for_leader_timeout` option) and sync its internal state up to the last known state before becoming `ready`.
If you are using the `peers` configuration value, then **it is very important that the `peers` configuration value in all cluster members is the same for all peers**, that is, it should contain the multiaddresses for the other peers in the cluster. It may contain this peer's own multiaddress too (but it will be removed on the next shutdown). If `peers` is not correct for all peer members, your node might not start or might misbehave in non-obvious ways.
If you are using the `peers` configuration value, then **it is very important that the `peers` configuration value in all cluster members is the same for all peers**: it should contain the multiaddresses for the other peers in the cluster. It may contain a peer's own multiaddress too (but it will be removed automatically). If `peers` is not correct for all peer members, your node might not start or might misbehave in non-obvious ways.
You are expected to start the majority of the nodes at the same time when using this method. If half of them are not started, they will fail to elect a cluster leader before `raft.wait_for_leader_timeout`. Then they will shut themselves down. If there are peers missing, the cluster will not be in a healthy state (error messages will be displayed). The cluster will operate as long as a majority of peers is up.
Alternatively, you can use the `bootstrap` variable to provide one or several bootstrap peers. In short, bootstrapping will use the given peer to request the list of cluster peers and fill in the `peers` variable automatically. The bootstrapped peer will be, in turn, added to the cluster and made known to every other existing (and connected) peer. You can also launch several peers at once, as long as they are bootstrapping from the same already-running peer. The `--bootstrap` flag allows you to provide a bootstrapping peer directly when calling `ipfs-cluster-service`.
Use the `bootstrap` method only when the rest of the cluster is healthy and all current participating peers are running. If you need to, remove any unhealthy peers with `ipfs-cluster-ctl peers rm <pid>`. Bootstrapping peers should be in a `clean` state, that is, with no previous raft-data loaded.
Use the `bootstrap` method only when the rest of the cluster is healthy and all current participating peers are running. If you need to, remove any unhealthy peers with `ipfs-cluster-ctl peers rm <pid>`. Bootstrapping peers should be in a `clean` state, that is, with no previous raft-data loaded. If they are not, remove or rename the `~/.ipfs-cluster/ipfs-cluster-data` folder.
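For instance, a sketch of resetting a peer with stale raft data before bootstrapping it again (keeping the old data as a backup; the destination name is just an example):
```
$ mv ~/.ipfs-cluster/ipfs-cluster-data ~/.ipfs-cluster/ipfs-cluster-data.old
$ ipfs-cluster-service --bootstrap <running-peer-multiaddress>
```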
Once a cluster is up, peers are expected to run continuously. You may need to stop a peer, or it may die due to external reasons. The restart behaviour will depend on whether the peer has left the consensus:
* The *default* case - peer has not been removed and `cluster.leave_on_shutdown` is `false`: in this case the peer has not left the consensus peerset, and you may start the peer again normally. Do not manually update `cluster.peers`, even if other peers have joined/left the cluster.
* The *left the cluster* case - peer has been manually removed or `cluster.leave_on_shutdown` is `true`: in this case, unless the peer died, it has probably been removed from the consensus (you can check if it's missing from `ipfs-cluster-ctl peers ls` on a running peer). This will mean that the state of the peer has been cleaned up, and the last known `cluster.peers` have been moved to `cluster.bootstrap`. When the peer is restarted, it will attempt to rejoin the cluster from which it was removed by using the addresses in `cluster.bootstrap`.
* The *default* case - peer has not been removed and `cluster.leave_on_shutdown` is `false`: in this case the peer has not left the consensus peerset, and you may start the peer again normally. Do not manually update `cluster.peers`, even if other peers have been removed from the cluster.
* The *left the cluster* case - peer has been manually removed or `cluster.leave_on_shutdown` is `true`: in this case, unless the peer died, it has probably been removed from the consensus (you can check if it's missing from `ipfs-cluster-ctl peers ls` on a running peer). This will mean that the state of the peer has been cleaned up (see the Dynamic Cluster Membership considerations below), and the last known `cluster.peers` have been moved to `cluster.bootstrap`. When the peer is restarted, it will attempt to rejoin the cluster from which it was removed by using the addresses in `cluster.bootstrap`.
Remember that a clean peer bootstrapped to an existing cluster will always fetch the latest state. A peer which was shut down but did not leave the cluster will also catch up with the rest of the peers after restarting. See the next section for more information about the consensus algorithm used by ipfs-cluster.
@ -174,11 +174,11 @@ In order to work, Raft elects a cluster "Leader", which is the only peer allowed
For example, a commit operation to the log is triggered with `ipfs-cluster-ctl pin add <cid>`. This will use the peer's API to send a Pin request. The peer will in turn forward the request to the cluster's Leader, which will perform the commit of the operation. This is explained in more detail in the "Pinning an item" section.
The "peer add" and "peer remove" operations also trigger log entries and fully depend on a healthy consensus status. Modifying the cluster peers is a tricky operation because it requires informing every peer of the new peer set. If a peer is down during this operation, it is likely that it will not learn about it when it comes up again. Thus, it is recommended to bootstrap the peer again when the cluster has changed in the meantime.
The "peer add" and "peer remove" operations also trigger log entries (internal to Raft) and depend too on a healthy consensus status. Modifying the cluster peers is a tricky operation because it requires informing every peer of the new peer's multiaddresses. If a peer is down during this operation, the operation will fail, as otherwise that peer will not know how to contact the new member. Thus, it is recommended remove and bootstrap any peer that is offline before making changes to the peerset.
By default, the consensus log data is backed in the `ipfs-cluster-data` subfolder, next to the main configuration file. This folder stores two types of information: the [boltDB] database storing the Raft log, and the state snapshots. Snapshots from the log are performed regularly when the log grows too big (see the `raft` configuration section for options). When a peer is far behind in catching up with the log, Raft may opt to send a snapshot directly, rather than to send every log entry that makes up the state individually. This data is initialized on the first start of a cluster peer and maintained throughout its life. Removing the `ipfs-cluster-data` folder effectively resets the peer to a clean state. Only peers with a clean state should bootstrap to already running clusters.
By default, the consensus log data is backed in the `ipfs-cluster-data` subfolder, next to the main configuration file. This folder stores two types of information: the **boltDB** database storing the Raft log, and the state snapshots. Snapshots from the log are performed regularly when the log grows too big (see the `raft` configuration section for options). When a peer is far behind in catching up with the log, Raft may opt to send a snapshot directly, rather than to send every log entry that makes up the state individually. This data is initialized on the first start of a cluster peer and maintained throughout its life. Removing or renaming the `ipfs-cluster-data` folder effectively resets the peer to a clean state. Only peers with a clean state should bootstrap to already running clusters.
When running a cluster peer, **it is very important that the consensus data folder does not contain any data from a different cluster setup**, or data from diverging logs. What this essentially means is that different Raft logs should not be mixed. Removing the `ipfs-cluster-data` folder will destroy all consensus data from the peer but, as long as the rest of the cluster is running, it will recover the last state upon start by fetching it from a different cluster peer.
When running a cluster peer, **it is very important that the consensus data folder does not contain any data from a different cluster setup**, or data from diverging logs. What this essentially means is that different Raft logs should not be mixed. Removing or renaming the `ipfs-cluster-data` folder will clean all consensus data from the peer but, as long as the rest of the cluster is running, it will recover the last state upon start by fetching it from a different cluster peer.
On clean shutdowns, ipfs-cluster peers will save a human-readable state snapshot in `~/.ipfs-cluster/backups`, which can be used to inspect the last known state for that peer. We are working on making those snapshots restorable.
@ -215,13 +215,13 @@ Static clusters expect every member peer to be up and responding. Otherwise, the
## Dynamic cluster membership considerations
We call a dynamic cluster one in which the set of `cluster.peers` changes: nodes are bootstrapped to existing cluster peers (`cluster.bootstrap` option), the "peer add" and "peer rm" operations are used and/or the `cluster.leave_on_shutdown` configuration option is enabled. This option allows a node to abandon the consensus membership when shutting down, thus reducing the cluster size by one.
We call a dynamic cluster one in which the set of `cluster.peers` changes: nodes are bootstrapped to existing cluster peers (`cluster.bootstrap` option), the "peer rm" operation is used and/or the `cluster.leave_on_shutdown` configuration option is enabled. This option allows a node to abandon the consensus membership when shutting down, thus reducing the cluster size by one.
Dynamic clusters allow greater flexibility at the cost of stability. Join and leave operations are tricky as they change the consensus membership, and they are likely to create bad situations in unhealthy clusters. Also, bear in mind that removing a peer from the cluster will trigger a re-allocation of the pins that were associated to it. If the replication factor was 1, it is recommended to keep the ipfs daemon running so the content can actually be copied out to a daemon managed by a different peer.
Dynamic clusters allow greater flexibility at the cost of stability. Leave and, especially, join operations are tricky as they change the consensus membership. They are likely to fail in unhealthy clusters. All operations modifying the peerset require an elected and working leader. Also, bear in mind that removing a peer from the cluster will trigger a re-allocation of the pins that were associated to it. If the replication factor was 1, it is recommended to keep the ipfs daemon running so the content can actually be copied out to a daemon managed by a different peer.
Peers joining an existing cluster should have a non-divergent state. Peers leaving a cluster are not expected to re-join it with stale consensus data. For this reason, **the consensus data folder is removed** when a peer leaves the current cluster. Conveniently, a backup of the state is created before doing so and placed in `~/.ipfs-cluster/backups`.
Peers joining an existing cluster should not have any consensus state (contents in `~/.ipfs-cluster/ipfs-cluster-data`). Peers leaving a cluster are not expected to re-join it with stale consensus data. For this reason, **the consensus data folder is renamed** when a peer leaves the current cluster. For example, `ipfs-cluster-data` becomes `ipfs-cluster-data.old.0` and so on. Currently, up to 5 copies of the cluster data will be left around, with `old.0` being the most recent, and `old.4` the oldest.
When a peer leaves or is removed, any existing peers will be saved as `bootstrap` peers, so that it is easier to re-join the cluster. Since the state has been wiped, the peer will be able to re-join and fetch the latest state cleanly. See "The consensus algorithm" and the "Starting your cluster peers" sections above for more information.
When a peer leaves or is removed, any existing peers will be saved as `bootstrap` peers, so that it is easier to re-join the cluster by simply re-launching it. Since the state has been cleaned, the peer will be able to re-join and fetch the latest state cleanly. See "The consensus algorithm" and the "Starting your cluster peers" sections above for more information.
This does not mean that it is impossible to somehow end up with a broken cluster membership. The best way to diagnose and fix it is to:
@ -233,7 +233,7 @@ This does not mean that there are not possibilities of somehow getting a broken
* `ipfs-cluster-ctl --enc=json peers ls` provides additional useful information, like the list of peers for every responding peer.
* In cases where leadership has been lost beyond solution (meaning faulty peers cannot be removed), it is best to stop all peers and restore the state from the backup (currently, a manual operation).
Remember: if you have a problematic cluster peer trying to join an otherwise working cluster, the safest way is to remove the `ipfs-cluster-data` folder and to set the correct `bootstrap`. The consensus algorithm will then resend the state from scratch.
Remember: if you have a problematic cluster peer trying to join an otherwise working cluster, the safest way is to rename the `ipfs-cluster-data` folder (keeping it as backup) and to set the correct `bootstrap`. The consensus algorithm will then resend the state from scratch.
ipfs-cluster will fail to start if `cluster.peers` does not match the current Raft peerset. If the current Raft peerset is correct, you can manually update `cluster.peers`. Otherwise, it is easier to clean and bootstrap.


@ -19,7 +19,6 @@ You can also obtain command-specific help with `ipfs-cluster-ctl help [cmd]`. Th
```
$ ipfs-cluster-ctl id # show cluster peer and ipfs daemon information
$ ipfs-cluster-ctl peers ls # list cluster peers
$ ipfs-cluster-ctl peers add /ip4/1.2.3.4/tcp/1234/<peerid> # add a new cluster peer
$ ipfs-cluster-ctl peers rm <peerid> # remove a cluster peer
$ ipfs-cluster-ctl pin add Qma4Lid2T1F68E3Xa3CpE6vVJDLwxXLD8RfiB9g1Tmqp58 # pins a CID in the cluster
$ ipfs-cluster-ctl pin rm Qma4Lid2T1F68E3Xa3CpE6vVJDLwxXLD8RfiB9g1Tmqp58 # unpins a CID from the cluster