Issue #192: Update docs to the new peerset handling

Also, start removing mentions of the `PeerAdd` operation, as things should
happen with `--bootstrap` (Join). PeerAdd should not be part of the user
workflow and might disappear or be hidden in the future.

License: MIT
Signed-off-by: Hector Sanjuan <hector@protocol.ai>
Hector Sanjuan 2017-11-14 20:57:18 +01:00
parent 398865f381
commit 15bb953afd
4 changed files with 98 additions and 99 deletions


@ -79,7 +79,6 @@ This section gives an overview of how some things work in Cluster. Doubts? Some
### Startup
* Initialize the P2P host.
* Initialize the PeerManager: needs to keep track of cluster peers.
* Initialize Consensus: start looking for a leader asap.
* Set up RPC in all components: allow them to communicate with different parts of the cluster.
* Bootstrap: if we are bootstrapping from another node, do the dance (contact, receive cluster peers, join consensus)
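For illustration, a minimal sketch of the two startup paths (the commands and the `--bootstrap` flag are the ones documented in the `ipfs-cluster-service` guide; the multiaddress is an example value):
```
# Start normally: the peerset comes from the existing configuration/consensus data.
$ ipfs-cluster-service

# Or bootstrap: contact an existing peer, receive the cluster peers and join the consensus.
$ ipfs-cluster-service --bootstrap /ip4/192.168.1.2/tcp/9096/ipfs/<peer-id>
```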
@ -125,21 +124,13 @@ Notes:
* If it's via an API request, it involves an RPC request to the cluster main component.
* `Cluster.PeerAdd()`
* Figure out the real multiaddress for that peer (the one we see).
* Let the `PeerManager` component know about the new peer. This adds it to the Libp2p host and notifies any component which needs to know about peers. This also means the ability to perform RPC requests to that peer.
* Broadcast the address to every cluster peer and add it to the host's libp2p peerstore. This gives each member of the cluster the ability to perform RPC requests to that peer.
* Figure out our multiaddress in regard to the new peer (the one it sees to connect to us). This is done with an RPC request and it also ensures that the peer is up and reachable.
* Trigger a consensus `LogOp` indicating that there is a new peer.
* Send the new peer the list of cluster peers. This is an RPC request to the new peer's `PeerManager` which allows it to keep tabs on the current cluster peers.
* Add the new peer to the Consensus component. This operation gets forwarded to the leader. Internally, Raft commits a configuration change to the log which contains a new peerset. Every peer gets the new peerset, including the new peer.
* Send the list of peer multiaddresses to the new peer so its host knows how to reach them. This is a remote RPC request to the new peer's `PeerManager.ImportAddresses`.
The consensus part has its own complexity:
* As usual, the "add peer" operation is forwarded to the Raft leader.
* The consensus component logs such an operation and also uses Raft's `AddPeer` method or equivalent.
* This results in two log entries: one internal to Raft, which updates the internal Raft peerstore in all peers, and one from Cluster, which is received by the `Apply` method. This `Apply` operation does not modify the shared `State` (as pinning does), but rather only notifies the `PeerManager` about the new peer so the nodes can be set up to talk to it.
As such, we are effectively using the consensus log to broadcast a PeerAdd operation. That is because this step is critical and should either succeed everywhere or fail. If we performed a regular "for loop broadcast" and some peers failed, we would end up with an uncertain state and a mess that needs to be cleaned up. By using the consensus log, those peers which did not receive the operation have the opportunity to receive it later (on restart, if they were down). There are still pitfalls to this (restoring from a snapshot might result in peers with missing cluster peers), but the errors are more obvious and isolated than in other approaches.
We use the Consensus' (or rather Raft's) internal peerset as the source of truth for the current cluster peers. This is just a list of peer IDs. The associated multiaddresses for those peers are broadcast to every member. This implies that peer additions are expected to happen on a healthy cluster where all peers can learn about the new peer's multiaddresses.
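From the user side, this whole workflow is normally triggered by bootstrapping the new peer to a running one, rather than by calling `PeerAdd` directly. A sketch (the multiaddress is an example value):
```
# Join an existing, healthy cluster: the peerset arrives via the Raft
# configuration change and the multiaddresses are broadcast to every member.
new-node $ ipfs-cluster-service --bootstrap /ip4/192.168.1.2/tcp/9096/ipfs/<peer-id>

# Afterwards, every member should report the same peerset.
any-node $ ipfs-cluster-ctl peers ls
```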
Notes:


@ -4,105 +4,114 @@
This step creates a single-node IPFS Cluster.
First initialize the configuration (see [the **Initialization** section of the
`ipfs-cluster-service` README](../ipfs-cluster-service/dist/README.md#initialization)
for more info on entering a cluster secret):
First, create a secret that we will use for all cluster peers:
```
node0 $ export CLUSTER_SECRET=$(od -vN 32 -An -tx1 /dev/urandom | tr -d ' \n')
node0 $ echo $CLUSTER_SECRET
9a420ec947512b8836d8eb46e1c56fdb746ab8a78015b9821e6b46b38344038f
```
Second, initialize the configuration (see [the **Initialization** section of the
`ipfs-cluster-service` README](../ipfs-cluster-service/dist/README.md#initialization) for more details).
```
node0 $ ipfs-cluster-service init
Enter cluster secret (to automatically generate, rerun with --gen-secret): <enter-your-secret>
ipfs-cluster-service configuration written to /home/user/.ipfs-cluster/service.json
INFO config: Saving configuration config.go:311
ipfs-cluster-service configuration written to /home/hector/.ipfs-cluster/service.json
```
Then run cluster:
```
node0> ipfs-cluster-service
13:33:34.044 INFO cluster: IPFS Cluster v0.0.1 listening on: cluster.go:59
13:33:34.044 INFO cluster: /ip4/127.0.0.1/tcp/9096/ipfs/QmQHKLBXfS7hf8o2acj7FGADoJDLat3UazucbHrgxqisim cluster.go:61
13:33:34.044 INFO cluster: /ip4/192.168.1.103/tcp/9096/ipfs/QmQHKLBXfS7hf8o2acj7FGADoJDLat3UazucbHrgxqisim cluster.go:61
13:33:34.044 INFO cluster: starting Consensus and waiting for a leader... consensus.go:163
13:33:34.047 INFO cluster: PinTracker ready map_pin_tracker.go:71
13:33:34.047 INFO cluster: waiting for leader raft.go:118
13:33:34.047 INFO cluster: REST API: /ip4/127.0.0.1/tcp/9094 rest_api.go:309
13:33:34.047 INFO cluster: IPFS Proxy: /ip4/127.0.0.1/tcp/9095 -> /ip4/127.0.0.1/tcp/5001 ipfs_http_connector.go:168
13:33:35.420 INFO cluster: Raft Leader elected: QmQHKLBXfS7hf8o2acj7FGADoJDLat3UazucbHrgxqisim raft.go:145
13:33:35.921 INFO cluster: Consensus state is up to date consensus.go:214
13:33:35.921 INFO cluster: IPFS Cluster is ready cluster.go:191
13:33:35.921 INFO cluster: Cluster Peers (not including ourselves): cluster.go:192
13:33:35.921 INFO cluster: - No other peers cluster.go:195
INFO cluster: IPFS Cluster v0.3.0 listening on: cluster.go:91
INFO cluster: /ip4/127.0.0.1/tcp/9096/ipfs/QmZjSoXUQgJ9tutP1rXjjNYwTrRM9QPhmD9GHVjbtgWxEn cluster.go:93
INFO cluster: /ip4/192.168.1.2/tcp/9096/ipfs/QmZjSoXUQgJ9tutP1rXjjNYwTrRM9QPhmD9GHVjbtgWxEn cluster.go:93
INFO consensus: starting Consensus and waiting for a leader... consensus.go:61
INFO consensus: bootstrapping raft cluster raft.go:108
INFO restapi: REST API: /ip4/127.0.0.1/tcp/9094 restapi.go:266
INFO ipfshttp: IPFS Proxy: /ip4/127.0.0.1/tcp/9095 -> /ip4/127.0.0.1/tcp/5001 ipfshttp.go:159
INFO consensus: Raft Leader elected: QmZjSoXUQgJ9tutP1rXjjNYwTrRM9QPhmD9GHVjbtgWxEn raft.go:261
INFO consensus: Raft state is catching up raft.go:273
INFO consensus: Consensus state is up to date consensus.go:116
INFO cluster: Cluster Peers (without including ourselves): cluster.go:450
INFO cluster: - No other peers cluster.go:452
INFO cluster: IPFS Cluster is ready cluster.go:461
```
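Optionally, before adding more peers, you can check that this single peer responds (using the `ipfs-cluster-ctl` commands shown later in this guide):
```
node0 $ ipfs-cluster-ctl id        # cluster peer and ipfs daemon information
node0 $ ipfs-cluster-ctl peers ls  # should only list this peer for now
```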
### Step 1: Add new members to the cluster
Initialize and run cluster on a different node (or nodes):
Initialize and run cluster on a different node (or nodes), bootstrapping them to the first node:
```
node1> export CLUSTER_SECRET=<copy from node0>
node1> ipfs-cluster-service init
ipfs-cluster-service configuration written to /home/user/.ipfs-cluster/service.json
node1> ipfs-cluster-service
13:36:19.313 INFO cluster: IPFS Cluster v0.0.1 listening on: cluster.go:59
13:36:19.313 INFO cluster: /ip4/127.0.0.1/tcp/7096/ipfs/QmU7JJftGsP1zM8H37XwvfA7dwdepsB7xVnJA6sKpozc85 cluster.go:61
13:36:19.313 INFO cluster: /ip4/192.168.1.103/tcp/7096/ipfs/QmU7JJftGsP1zM8H37XwvfA7dwdepsB7xVnJA6sKpozc85 cluster.go:61
13:36:19.313 INFO cluster: starting Consensus and waiting for a leader... consensus.go:163
13:36:19.316 INFO cluster: REST API: /ip4/127.0.0.1/tcp/7094 rest_api.go:309
13:36:19.316 INFO cluster: IPFS Proxy: /ip4/127.0.0.1/tcp/7095 -> /ip4/127.0.0.1/tcp/5001 ipfs_http_connector.go:168
13:36:19.316 INFO cluster: waiting for leader raft.go:118
13:36:19.316 INFO cluster: PinTracker ready map_pin_tracker.go:71
13:36:20.834 INFO cluster: Raft Leader elected: QmU7JJftGsP1zM8H37XwvfA7dwdepsB7xVnJA6sKpozc85 raft.go:145
13:36:21.334 INFO cluster: Consensus state is up to date consensus.go:214
13:36:21.334 INFO cluster: IPFS Cluster is ready cluster.go:191
13:36:21.334 INFO cluster: Cluster Peers (not including ourselves): cluster.go:192
13:36:21.334 INFO cluster: - No other peers cluster.go:195
```
Add them to the original cluster with `ipfs-cluster-ctl peers add <multiaddr>`. The command will return the ID information of the newly added member:
```
node0> ipfs-cluster-ctl peers add /ip4/192.168.1.103/tcp/7096/ipfs/QmU7JJftGsP1zM8H37XwvfA7dwdepsB7xVnJA6sKpozc85
{
"id": "QmU7JJftGsP1zM8H37XwvfA7dwdepsB7xVnJA6sKpozc85",
"public_key": "CAASpgIwggEiMA0GCSqGSIb3DQEBAQUAA4IBDwAwggEKAoIBAQDtjpvI+XKVGT5toXTimtWceONYsf/1bbRMxLt/fCSYJoSeJqj0HUtttCD3dcBv1M2rElIMXDhyLUpkET+AN6otr9lQnbgi0ZaKrtzphR0w6g/0EQZZaxI2scxF4NcwkwUfe5ceEmPFwax1+C00nd2BF+YEEp+VHNyWgXhCxncOGO74p0YdXBrvXkyfTiy/567L3PPX9F9x+HiutBL39CWhx9INmtvdPB2HwshodF6QbfeljdAYCekgIrCQC8mXOVeePmlWgTwoge9yQbuViZwPiKwwo1AplANXFmSv8gagfjKL7Kc0YOqcHwxBsoUskbJjfheDZJzl19iDs9EvUGk5AgMBAAE=",
"addresses": [
"/ip4/127.0.0.1/tcp/7096/ipfs/QmU7JJftGsP1zM8H37XwvfA7dwdepsB7xVnJA6sKpozc85",
"/ip4/192.168.1.103/tcp/7096/ipfs/QmU7JJftGsP1zM8H37XwvfA7dwdepsB7xVnJA6sKpozc85"
],
"cluster_peers": [
"/ip4/192.168.123.103/tcp/7096/ipfs/QmU7JJftGsP1zM8H37XwvfA7dwdepsB7xVnJA6sKpozc85"
],
"version": "0.0.1",
"commit": "83baa5c859b9b17b2deec4f782d1210590025c80",
"rpc_protocol_version": "/ipfscluster/0.0.1/rpc",
"ipfs": {
"id": "QmXZrtE5jQwXNqCJMfHUTQkvhQ4ZAnqMnmzFMJfLewur2n",
"addresses": [
"/ip4/127.0.0.1/tcp/4001/ipfs/QmXZrtE5jQwXNqCJMfHUTQkvhQ4ZAnqMnmzFMJfLewur2n",
"/ip4/192.168.1.103/tcp/4001/ipfs/QmXZrtE5jQwXNqCJMfHUTQkvhQ4ZAnqMnmzFMJfLewur2n"
]
}
}
INFO config: Saving configuration config.go:311
ipfs-cluster-service configuration written to /home/hector/.ipfs-cluster/service.json
node1> ipfs-cluster-service --bootstrap /ip4/192.168.1.2/tcp/9096/ipfs/QmZjSoXUQgJ9tutP1rXjjNYwTrRM9QPhmD9GHVjbtgWxEn
INFO cluster: IPFS Cluster v0.3.0 listening on: cluster.go:91
INFO cluster: /ip4/127.0.0.1/tcp/10096/ipfs/QmYFYwnFUkjFhJcSJJGN72wwedZnpQQ4aNpAtPZt8g5fCd cluster.go:93
INFO cluster: /ip4/192.168.1.3/tcp/10096/ipfs/QmYFYwnFUkjFhJcSJJGN72wwedZnpQQ4aNpAtPZt8g5fCd cluster.go:93
INFO consensus: starting Consensus and waiting for a leader... consensus.go:61
INFO consensus: bootstrapping raft cluster raft.go:108
INFO cluster: Bootstrapping to /ip4/127.0.0.1/tcp/9096/ipfs/QmZjSoXUQgJ9tutP1rXjjNYwTrRM9QPhmD9GHVjbtgWxEn cluster.go:471
INFO restapi: REST API: /ip4/127.0.0.1/tcp/10094 restapi.go:266
INFO ipfshttp: IPFS Proxy: /ip4/127.0.0.1/tcp/10095 -> /ip4/127.0.0.1/tcp/5001 ipfshttp.go:159
INFO consensus: Raft Leader elected: QmZjSoXUQgJ9tutP1rXjjNYwTrRM9QPhmD9GHVjbtgWxEn raft.go:261
INFO consensus: Raft state is catching up raft.go:273
INFO consensus: Consensus state is up to date consensus.go:116
INFO consensus: Raft Leader elected: QmZjSoXUQgJ9tutP1rXjjNYwTrRM9QPhmD9GHVjbtgWxEn raft.go:261
INFO consensus: Raft state is catching up raft.go:273
INFO cluster: QmYFYwnFUkjFhJcSJJGN72wwedZnpQQ4aNpAtPZt8g5fCd: joined QmZjSoXUQgJ9tutP1rXjjNYwTrRM9QPhmD9GHVjbtgWxEn's cluster cluster.go:777
INFO cluster: Cluster Peers (without including ourselves): cluster.go:450
INFO cluster: - QmZjSoXUQgJ9tutP1rXjjNYwTrRM9QPhmD9GHVjbtgWxEn cluster.go:456
INFO cluster: IPFS Cluster is ready cluster.go:461
INFO config: Saving configuration config.go:311
```
You can repeat the process with any other nodes.
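For example, a third node would follow the same steps (a sketch reusing the commands above; the bootstrap multiaddress is the one printed by node0):
```
node2 $ export CLUSTER_SECRET=<copy from node0>
node2 $ ipfs-cluster-service init
node2 $ ipfs-cluster-service --bootstrap /ip4/192.168.1.2/tcp/9096/ipfs/QmZjSoXUQgJ9tutP1rXjjNYwTrRM9QPhmD9GHVjbtgWxEn
```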
You can check the current list of cluster peers and see it shows 2 peers:
```
node1 > ipfs-cluster-ctl peers ls
QmYFYwnFUkjFhJcSJJGN72wwedZnpQQ4aNpAtPZt8g5fCd | Sees 1 other peers
> Addresses:
- /ip4/127.0.0.1/tcp/10096/ipfs/QmYFYwnFUkjFhJcSJJGN72wwedZnpQQ4aNpAtPZt8g5fCd
- /ip4/192.168.1.3/tcp/10096/ipfs/QmYFYwnFUkjFhJcSJJGN72wwedZnpQQ4aNpAtPZt8g5fCd
> IPFS: Qmaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
- /ip4/127.0.0.1/tcp/4001/ipfs/Qmaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
- /ip4/192.168.1.3/tcp/4001/ipfs/Qmaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
QmZjSoXUQgJ9tutP1rXjjNYwTrRM9QPhmD9GHVjbtgWxEn | Sees 1 other peers
> Addresses:
- /ip4/127.0.0.1/tcp/9096/ipfs/QmZjSoXUQgJ9tutP1rXjjNYwTrRM9QPhmD9GHVjbtgWxEn
- /ip4/192.168.1.2/tcp/9096/ipfs/QmZjSoXUQgJ9tutP1rXjjNYwTrRM9QPhmD9GHVjbtgWxEn
> IPFS: Qmbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbb
- /ip4/127.0.0.1/tcp/4001/ipfs/Qmbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbb
- /ip4/192.168.1.2/tcp/4001/ipfs/Qmbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbb
```
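If you need machine-readable output (for example when scripting checks), the JSON encoding can be requested, as mentioned in the `ipfs-cluster-service` documentation:
```
node1 $ ipfs-cluster-ctl --enc=json peers ls
```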
### Step 2: Remove no longer needed nodes
You can use `ipfs-cluster-ctl peers rm <multiaddr>` to remove and disconnect any nodes from your cluster. The nodes will be automatically
You can use `ipfs-cluster-ctl peers rm <peer_id>` to remove and disconnect any nodes from your cluster. The nodes will be automatically
shut down. They can be restarted manually and re-added to the Cluster at any time:
```
node0> ipfs-cluster-ctl peers rm QmbGFbZVTF3UAEPK9pBVdwHGdDAYkHYufQwSh4k1i8bbbb
Request succeeded
node0> ipfs-cluster-ctl peers rm QmYFYwnFUkjFhJcSJJGN72wwedZnpQQ4aNpAtPZt8g5fCd
```
`node1` is then disconnected and shuts down:
`node1` is then disconnected and shuts down, as its logs show:
```
13:42:50.828 WARNI cluster: this peer has been removed from the Cluster and will shutdown itself in 5 seconds peer_manager.go:48
13:42:51.828 INFO cluster: stopping Consensus component consensus.go:257
13:42:55.836 INFO cluster: shutting down IPFS Cluster cluster.go:235
13:42:55.836 INFO cluster: Saving configuration config.go:283
13:42:55.837 INFO cluster: stopping Cluster API rest_api.go:327
13:42:55.837 INFO cluster: stopping IPFS Proxy ipfs_http_connector.go:332
13:42:55.837 INFO cluster: stopping MapPinTracker map_pin_tracker.go:87
INFO config: Saving configuration config.go:311
INFO cluster: this peer has been removed and will shutdown cluster.go:386
INFO cluster: shutting down Cluster cluster.go:498
INFO consensus: stopping Consensus component consensus.go:159
INFO consensus: consensus data cleaned consensus.go:400
INFO monitor: stopping Monitor peer_monitor.go:161
INFO restapi: stopping Cluster API restapi.go:284
INFO ipfshttp: stopping IPFS Proxy ipfshttp.go:534
INFO pintracker: stopping MapPinTracker maptracker.go:116
INFO config: Saving configuration config.go:311
```
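`node1` can be re-added at any time. Since the removed peer keeps the last known peers as `bootstrap` addresses (see the `ipfs-cluster-service` documentation), re-launching it should normally be enough; bootstrapping explicitly also works (a sketch, assuming a clean state):
```
node1 $ ipfs-cluster-service
# or, explicitly:
node1 $ ipfs-cluster-service --bootstrap /ip4/192.168.1.2/tcp/9096/ipfs/QmZjSoXUQgJ9tutP1rXjjNYwTrRM9QPhmD9GHVjbtgWxEn
```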


@ -146,18 +146,18 @@ Each section of the configuration file and the options in it depend on their ass
When filling in `peers` with some other peers' listening multiaddresses (e.g. `/ip4/192.168.1.103/tcp/9096/ipfs/QmQHKLBXfS7hf8o2acj7FGADoJDLat3UazucbHrgxqisim`), the peer will expect to be part of a cluster with the given `peers`. On boot, it will wait to learn who is the leader (see the `raft.wait_for_leader_timeout` option) and sync its internal state up to the last known state before becoming `ready`.
If you are using the `peers` configuration value, then **it is very important that the `peers` configuration value in all cluster members is the same for all peers**, that is, it should contain the multiaddresses for the other peers in the cluster. It may contain this peer's own multiaddress too (but it will be removed on the next shutdown). If `peers` is not correct for all peer members, your node might not start or might misbehave in non-obvious ways.
If you are using the `peers` configuration value, then **it is very important that the `peers` configuration value in all cluster members is the same for all peers**: it should contain the multiaddresses for the other peers in the cluster. It may contain a peer's own multiaddress too (but it will be removed automatically). If `peers` is not correct for all peer members, your node might not start or might misbehave in non-obvious ways.
You are expected to start the majority of the nodes at the same time when using this method. If half of them are not started, they will fail to elect a cluster leader before `raft.wait_for_leader_timeout`. Then they will shut themselves down. If there are peers missing, the cluster will not be in a healthy state (error messages will be displayed). The cluster will operate as long as a majority of peers is up.
Alternatively, you can use the `bootstrap` variable to provide one or several bootstrap peers. In short, bootstrapping will use the given peer to request the list of cluster peers and fill in the `peers` variable automatically. The bootstrapped peer will be, in turn, added to the cluster and made known to every other existing (and connected) peer. You can also launch several peers at once, as long as they are bootstrapping from the same already-running peer. The `--bootstrap` flag allows you to provide a bootstrapping peer directly when calling `ipfs-cluster-service`.
Use the `bootstrap` method only when the rest of the cluster is healthy and all current participating peers are running. If you need to, remove any unhealthy peers with `ipfs-cluster-ctl peers rm <pid>`. Bootstrapping peers should be in a `clean` state, that is, with no previous raft-data loaded.
Use the `bootstrap` method only when the rest of the cluster is healthy and all current participating peers are running. If you need to, remove any unhealthy peers with `ipfs-cluster-ctl peers rm <pid>`. Bootstrapping peers should be in a `clean` state, that is, with no previous raft-data loaded. If they are not, remove or rename the `~/.ipfs-cluster/ipfs-cluster-data` folder.
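For instance, a sketch of resetting a peer with stale raft data before bootstrapping it again (keeping the old data as a backup; the destination name is just an example):
```
$ mv ~/.ipfs-cluster/ipfs-cluster-data ~/.ipfs-cluster/ipfs-cluster-data.old
$ ipfs-cluster-service --bootstrap <running-peer-multiaddress>
```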
Once a cluster is up, peers are expected to run continuously. You may need to stop a peer, or it may die due to external reasons. The restart behaviour will depend on whether the peer has left the consensus:
* The *default* case - peer has not been removed and `cluster.leave_on_shutdown` is `false`: in this case the peer has not left the consensus peerset, and you may start the peer again normally. Do not manually update `cluster.peers`, even if other peers have joined/left the cluster.
* The *left the cluster* case - peer has been manually removed or `cluster.leave_on_shutdown` is `true`: in this case, unless the peer died, it has probably been removed from the consensus (you can check if it's missing from `ipfs-cluster-ctl peers ls` on a running peer). This will mean that the state of the peer has been cleaned up, and the last known `cluster.peers` have been moved to `cluster.bootstrap`. When the peer is restarted, it will attempt to rejoin the cluster from which it was removed by using the addresses in `cluster.bootstrap`.
* The *default* case - peer has not been removed and `cluster.leave_on_shutdown` is `false`: in this case the peer has not left the consensus peerset, and you may start the peer again normally. Do not manually update `cluster.peers`, even if other peers have been removed from the cluster.
* The *left the cluster* case - peer has been manually removed or `cluster.leave_on_shutdown` is `true`: in this case, unless the peer died, it has probably been removed from the consensus (you can check if it's missing from `ipfs-cluster-ctl peers ls` on a running peer). This will mean that the state of the peer has been cleaned up (see the Dynamic Cluster Membership considerations below), and the last known `cluster.peers` have been moved to `cluster.bootstrap`. When the peer is restarted, it will attempt to rejoin the cluster from which it was removed by using the addresses in `cluster.bootstrap`.
Remember that a clean peer bootstrapped to an existing cluster will always fetch the latest state. A peer which was shut down but did not leave the cluster will also catch up with the rest of the peers after restarting. See the next section for more information about the consensus algorithm used by ipfs-cluster.
@ -174,11 +174,11 @@ In order to work, Raft elects a cluster "Leader", which is the only peer allowed
For example, a commit operation to the log is triggered with `ipfs-cluster-ctl pin add <cid>`. This will use the peer's API to send a Pin request. The peer will in turn forward the request to the cluster's Leader, which will perform the commit of the operation. This is explained in more detail in the "Pinning an item" section.
The "peer add" and "peer remove" operations also trigger log entries and fully depend on a healthy consensus status. Modifying the cluster peers is a tricky operation because it requires informing every peer of the new peer set. If a peer is down during this operation, it is likely that it will not learn about it when it comes up again. Thus, it is recommended to bootstrap the peer again when the cluster has changed in the meantime.
The "peer add" and "peer remove" operations also trigger log entries (internal to Raft) and depend too on a healthy consensus status. Modifying the cluster peers is a tricky operation because it requires informing every peer of the new peer's multiaddresses. If a peer is down during this operation, the operation will fail, as otherwise that peer will not know how to contact the new member. Thus, it is recommended remove and bootstrap any peer that is offline before making changes to the peerset.
By default, the consensus log data is backed in the `ipfs-cluster-data` subfolder, next to the main configuration file. This folder stores two types of information: the [boltDB] database storing the Raft log, and the state snapshots. Snapshots from the log are performed regularly when the log grows too big (see the `raft` configuration section for options). When a peer is far behind in catching up with the log, Raft may opt to send a snapshot directly, rather than to send every log entry that makes up the state individually. This data is initialized on the first start of a cluster peer and maintained throughout its life. Removing the `ipfs-cluster-data` folder effectively resets the peer to a clean state. Only peers with a clean state should bootstrap to already running clusters.
By default, the consensus log data is backed in the `ipfs-cluster-data` subfolder, next to the main configuration file. This folder stores two types of information: the **boltDB** database storing the Raft log, and the state snapshots. Snapshots from the log are performed regularly when the log grows too big (see the `raft` configuration section for options). When a peer is far behind in catching up with the log, Raft may opt to send a snapshot directly, rather than to send every log entry that makes up the state individually. This data is initialized on the first start of a cluster peer and maintained throughout its life. Removing or renaming the `ipfs-cluster-data` folder effectively resets the peer to a clean state. Only peers with a clean state should bootstrap to already running clusters.
When running a cluster peer, **it is very important that the consensus data folder does not contain any data from a different cluster setup**, or data from diverging logs. What this essentially means is that different Raft logs should not be mixed. Removing the `ipfs-cluster-data` folder will destroy all consensus data from the peer but, as long as the rest of the cluster is running, it will recover the last state upon start by fetching it from a different cluster peer.
When running a cluster peer, **it is very important that the consensus data folder does not contain any data from a different cluster setup**, or data from diverging logs. What this essentially means is that different Raft logs should not be mixed. Removing or renaming the `ipfs-cluster-data` folder will clean all consensus data from the peer but, as long as the rest of the cluster is running, it will recover the last state upon start by fetching it from a different cluster peer.
On clean shutdowns, ipfs-cluster peers will save a human-readable state snapshot in `~/.ipfs-cluster/backups`, which can be used to inspect the last known state for that peer. We are working on making those snapshots restorable.
@ -215,13 +215,13 @@ Static clusters expect every member peer to be up and responding. Otherwise, the
## Dynamic cluster membership considerations
We call a dynamic cluster one in which the set of `cluster.peers` changes: nodes are bootstrapped to existing cluster peers (`cluster.bootstrap` option), the "peer add" and "peer rm" operations are used and/or the `cluster.leave_on_shutdown` configuration option is enabled. This option allows a node to abandon the consensus membership when shutting down, thus reducing the cluster size by one.
We call a dynamic cluster one in which the set of `cluster.peers` changes: nodes are bootstrapped to existing cluster peers (`cluster.bootstrap` option), the "peer rm" operation is used and/or the `cluster.leave_on_shutdown` configuration option is enabled. This option allows a node to abandon the consensus membership when shutting down, thus reducing the cluster size by one.
Dynamic clusters allow greater flexibility at the cost of stability. Join and leave operations are tricky as they change the consensus membership, and they are likely to create bad situations in unhealthy clusters. Also, bear in mind that removing a peer from the cluster will trigger a re-allocation of the pins that were associated to it. If the replication factor was 1, it is recommended to keep the ipfs daemon running so the content can actually be copied out to a daemon managed by a different peer.
Dynamic clusters allow greater flexibility at the cost of stability. Leave and, especially, join operations are tricky as they change the consensus membership. They are likely to fail in unhealthy clusters. All operations modifying the peerset require an elected and working leader. Also, bear in mind that removing a peer from the cluster will trigger a re-allocation of the pins that were associated to it. If the replication factor was 1, it is recommended to keep the ipfs daemon running so the content can actually be copied out to a daemon managed by a different peer.
Peers joining an existing cluster should have a non-divergent state. Peers leaving a cluster are not expected to re-join it with stale consensus data. For this reason, **the consensus data folder is removed** when a peer leaves the current cluster. Conveniently, a backup of the state is created before doing so and placed in `~/.ipfs-cluster/backups`.
Peers joining an existing cluster should not have any consensus state (contents in `~/.ipfs-cluster/ipfs-cluster-data`). Peers leaving a cluster are not expected to re-join it with stale consensus data. For this reason, **the consensus data folder is renamed** when a peer leaves the current cluster. For example, `ipfs-cluster-data` becomes `ipfs-cluster-data.old.0` and so on. Currently, up to 5 copies of the cluster data will be left around, with `old.0` being the most recent, and `old.4` the oldest.
When a peer leaves or is removed, any existing peers will be saved as `bootstrap` peers, so that it is easier to re-join the cluster. Since the state has been wiped, the peer will be able to re-join and fetch the latest state cleanly. See "The consensus algorithm" and the "Starting your cluster peers" sections above for more information.
When a peer leaves or is removed, any existing peers will be saved as `bootstrap` peers, so that it is easier to re-join the cluster by simply re-launching it. Since the state has been cleaned, the peer will be able to re-join and fetch the latest state cleanly. See "The consensus algorithm" and the "Starting your cluster peers" sections above for more information.
This does not mean that it is impossible to somehow end up with a broken cluster membership. The best way to diagnose and fix it is to:
@ -233,7 +233,7 @@ This does not mean that there are not possibilities of somehow getting a broken
* `ipfs-cluster-ctl --enc=json peers ls` provides additional useful information, like the list of peers for every responding peer.
* In cases where leadership has been lost beyond solution (meaning faulty peers cannot be removed), it is best to stop all peers and restore the state from the backup (currently, a manual operation).
Remember: if you have a problematic cluster peer trying to join an otherwise working cluster, the safest way is to remove the `ipfs-cluster-data` folder and to set the correct `bootstrap`. The consensus algorithm will then resend the state from scratch.
Remember: if you have a problematic cluster peer trying to join an otherwise working cluster, the safest way is to rename the `ipfs-cluster-data` folder (keeping it as backup) and to set the correct `bootstrap`. The consensus algorithm will then resend the state from scratch.
ipfs-cluster will fail to start if `cluster.peers` does not match the current Raft peerset. If the current Raft peerset is correct, you can manually update `cluster.peers`. Otherwise, it is easier to clean and bootstrap.


@ -19,7 +19,6 @@ You can also obtain command-specific help with `ipfs-cluster-ctl help [cmd]`. Th
```
$ ipfs-cluster-ctl id # show cluster peer and ipfs daemon information
$ ipfs-cluster-ctl peers ls # list cluster peers
$ ipfs-cluster-ctl peers add /ip4/1.2.3.4/tcp/1234/<peerid> # add a new cluster peer
$ ipfs-cluster-ctl peers rm <peerid> # remove a cluster peer
$ ipfs-cluster-ctl pin add Qma4Lid2T1F68E3Xa3CpE6vVJDLwxXLD8RfiB9g1Tmqp58 # pins a CID in the cluster
$ ipfs-cluster-ctl pin rm Qma4Lid2T1F68E3Xa3CpE6vVJDLwxXLD8RfiB9g1Tmqp58 # unpins a CID from the cluster