Documentation: bring in line to 0.3.0

Review documentation to be in line with latest updates to Raft and
any other features introduced since 0.0.12.

License: MIT
Signed-off-by: Hector Sanjuan <hector@protocol.ai>
Hector Sanjuan 2017-11-02 20:19:17 +01:00
parent 5a7c1fc847
commit 2c3085586c
4 changed files with 127 additions and 88 deletions


@ -56,7 +56,7 @@ You can download pre-compiled binaries for your platform from the [dist.ipfs.io]
* [Builds for `ipfs-cluster-service`](https://dist.ipfs.io/#ipfs-cluster-service)
* [Builds for `ipfs-cluster-ctl`](https://dist.ipfs.io/#ipfs-cluster-ctl)
Note that since IPFS Cluster is evolving fast, the these builds may not contain the latest features/bugfixes as they are updated only bi-weekly.
Note that since IPFS Cluster is evolving fast, these builds may not contain the latest features/bugfixes as they are updated on a best-effort basis.
### Docker
@ -98,6 +98,8 @@ This will install `ipfs-cluster-service` and `ipfs-cluster-ctl` in your `$GOPATH
### Quickstart
**Remember: Start your ipfs daemon before running ipfs-cluster**
**`ipfs-cluster-service`** runs an ipfs-cluster peer:
- Initialize with `ipfs-cluster-service init`
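
A minimal single-machine quickstart might look like the following sketch (assuming ipfs and the cluster tools are already installed; adapt to your setup):

```sh
# Start the ipfs daemon first (in another terminal or in the background)
ipfs daemon &

# Create the default configuration in ~/.ipfs-cluster
ipfs-cluster-service init

# Launch the cluster peer
ipfs-cluster-service &

# Verify that the peer is running and see its ID and addresses
ipfs-cluster-ctl id
```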


@ -12,6 +12,7 @@ These definitions are still evolving and may change:
* RPC API: the internal API that cluster peers and components use.
* Go API: the public interface offered by the Cluster object in Go.
* Component: an ipfs-cluster module which performs specific functions and uses the RPC API to communicate with other parts of the system. Implements the Component interface.
* Consensus: The Consensus component, specifically Raft.
## General overview
@ -29,13 +30,13 @@ These definitions are still evolving and may change:
* The definitions of components and their interfaces and related types (`ipfscluster.go`)
* The **Cluster** main-component which binds together the whole system and offers the Go API (`cluster.go`). This component takes arbitrary implementations of:
* **API**: a component which offers a public facing API. Default: `RESTAPI`
* **IPFSConnector**: a component which talks to the IPFS daemon and provides a proxy to it. Default: `IPFSHTTPConnector`
* **State**: a component which keeps a list of Pins (maintained by the Consensus component)
* **PinTracker**: a component which tracks the pin set, makes sure that it is persisted by IPFS daemon as intended. Default: `MapPinTracker`
* **PeerMonitor**: a component to log metrics and detect peer failures. Default: `StdPeerMonitor`
* **PinAllocator**: a component to decide which peers should pin a CID given some metrics. Default: `NumPinAllocator`
* **Informer**: a component to collect metrics which are used by the `PinAllocator` and the `PeerMonitor`. Default: `NumPin`
* The **Consensus** component. This component is separate but internal to Cluster in the sense that it cannot be provided arbitrarily during initialization. The consensus component uses `go-libp2p-raft` via `go-libp2p-consensus`. While it is attempted to be agnostic from the underlying consensus implementation, it is not possible in all places. These places are however well marked.
* **IPFSConnector**: a component which talks to the IPFS daemon and provides a proxy to it. Default: `ipfshttp`
* **State**: a component which keeps a list of Pins (maintained by the Consensus component). Default: `mapstate`
* **PinTracker**: a component which tracks the pin set and makes sure that it is persisted by the IPFS daemon as intended. Default: `maptracker`
* **PeerMonitor**: a component to log metrics and detect peer failures. Default: `basic`
* **PinAllocator**: a component to decide which peers should pin a CID given some metrics. Default: `descendalloc`
* **Informer**: a component to collect metrics which are used by the `PinAllocator` and the `PeerMonitor`. Default: `disk`
* The **Consensus** component. This component is separate but internal to Cluster in the sense that it cannot be provided arbitrarily during initialization. The consensus component uses `go-libp2p-raft` via `go-libp2p-consensus`. While we attempt to keep it agnostic of the underlying consensus implementation, this is not possible in all places. These places are, however, well marked (everything that calls `Leader()`).
Components perform a number of functions and need to be able to communicate with each other, i.e.:
@ -49,7 +50,7 @@ Communication between components happens through the RPC API: a set of functions
The RPC API uses `go-libp2p-gorpc`. The main Cluster component runs an RPC server. RPC Clients are provided to all components for their use. The main feature of this setup is that **Components can use `go-libp2p-gorpc` to perform operations in the local cluster and in any remote cluster node using the same API**.
This makes broadcasting operations and contacting the Cluster leader really easy. It also allows to think of a future where components may be completely arbitrary and run from different applications. Local RPC calls, on their side, do not suffer any penalty as the execution is short-cut directly to the server component of the Cluster, without network intervention.
This makes broadcasting operations and contacting the Cluster leader really easy. It also allows us to think of a future where components may be completely arbitrary and run from different applications. Local RPC calls, for their part, do not suffer any penalty as the execution is short-cut directly to the corresponding component of the Cluster, without network intervention.
On the down-side, the RPC API involves "reflect" magic and it is not easy to verify that a call happens to a method registered on the RPC server. Every RPC-based functionality should be tested. Bad operations will result in errors so they are easy to catch on tests.
@ -57,6 +58,10 @@ On the down-side, the RPC API involves "reflect" magic and it is not easy to ver
Components are organized in different submodules (i.e. `pintracker/maptracker` represents component `PinTracker` and implementation `MapPinTracker`). Interfaces for all components are on the base module. Executables (`ipfs-cluster-service` and `ipfs-cluster-ctl`) are also submodules of the base module.
### Configuration
A `config` module provides support for a central configuration file with a section for each component. Each component provides a configuration object implementing the `ComponentConfig` interface.
## Applications
### `ipfs-cluster-service`
@ -117,7 +122,7 @@ Notes:
### Adding a peer
* If it's an API requests, it involves an RPC request to the cluster main component.
* If it's via an API request, it involves an RPC request to the cluster main component.
* `Cluster.PeerAdd()`
* Figure out the real multiaddress for that peer (the one we see).
* Let the `PeerManager` component know about the new peer. This adds it to the Libp2p host and notifies any component which needs to know
@ -146,4 +151,3 @@ Notes:
See: https://ipfs.io/ipfs/QmWhbV7KX9toZbi4ycj6J9GVbTVvzGx5ERffc6ymYLT5HS
They need to be updated but they are mostly accurate.


@ -1,20 +1,20 @@
# A guide to running ipfs-cluster
Revised for version `0.0.12`.
Revised for version `0.3.0`.
## Index
0. Introduction, definitions and other useful documentation
1. Installation and deployment
2. The configuration file
3. Starting your cluster
3. Starting your cluster peers
4. The consensus algorithm
5. The shared state, the local state and the ipfs state
6. Static cluster membership considerations
7. Dynamic cluster membership considerations
8. Pinning an item
9. Unpinning an item
10. Cluster monitoring and failover
10. Cluster monitoring and pin failover
11. Using the IPFS-proxy
12. Composite clusters
13. Security
@ -29,7 +29,7 @@ This guide aims to collect useful considerations and information for running an
* [The main README](../README.md)
* [The `ipfs-cluster-service` README](../ipfs-cluster-service/dist/README.md)
* [The `ipfs-cluster-ctl` README](../ipfs-cluster-ctl/dist/README.md)
* [The How to Build and Update Guide](../HOWTO_build_and_update_a_cluster.md)
* [The "How to Build and Update a Cluster" guide](../HOWTO_build_and_update_a_cluster.md)
* [ipfs-cluster architecture overview](../architecture.md)
* [Godoc documentation for ipfs-cluster](https://godoc.org/github.com/ipfs/ipfs-cluster)
@ -47,44 +47,49 @@ This guide aims to collect useful considerations and information for running an
There are several ways to install ipfs-cluster. They are described in the main README and summarized here:
* Download the repository and run `make install`.
* Run the [docker ipfs/ipfs-cluster container](https://hub.docker.com/r/ipfs/ipfs-cluster/). The container includes and runs ipfs.
* Download pre-built binaries for your platform at [dist.ipfs.io](https://dist.ipfs.io). Note that we test on Linux and ARM. We're happy to hear if other platforms are working or not. These builds may be slightly outdated compared to Docker.
* Install from the [snapcraft.io](https://snapcraft.io) store: `sudo snap install ipfs-cluster --edge`. Note that there is no stable snap yet.
1. Download the repository and run `make install`.
2. Run the [docker ipfs/ipfs-cluster container](https://hub.docker.com/r/ipfs/ipfs-cluster/). The container includes and runs ipfs.
3. Download pre-built binaries for your platform at [dist.ipfs.io](https://dist.ipfs.io). Note that we test on Linux and ARM. We're happy to hear if other platforms are working or not.
4. Install from the [snapcraft.io](https://snapcraft.io) store: `sudo snap install ipfs-cluster --edge`. Note that there is no stable snap yet.
You can deploy ipfs-cluster in the way which fits you best, as both ipfs and ipfs-cluster have no dependencies. There are some [Ansible roles](https://github.com/hsanjuan/ansible-ipfs-cluster) available to help you.
We try to keep `master` stable. It will always have the latest bugfixes but it may also incorporate behaviour changes without warning. The pre-built binaries are only provided for tagged versions. They are stable builds to the best of our ability, but they may still contain serious bugs which have only been fixed in `master`. It is recommended to check the [`CHANGELOG`](../CHANGELOG.md) to get an overview of past changes, and any open [release-tagged issues](https://github.com/ipfs/ipfs-cluster/issues?utf8=%E2%9C%93&q=is%3Aissue%20label%3Arelease) to get an overview of what's coming in the next release.
## The configuration file
The ipfs-cluster configuration file is usually found at `~/.ipfs-cluster/service.json`. It holds all the configurable options for cluster and its different components.
The ipfs-cluster configuration file is usually found at `~/.ipfs-cluster/service.json`. It holds all the configurable options for cluster and its different components. The configuration file is divided into sections. Each section represents a component. Each item inside the section represents an implementation of that component and contains its specific options. For more information on components, check the [ipfs-cluster architecture overview](../architecture.md).
A default configuration file can be generated with `ipfs-cluster-service init`. It is recommended that you re-create the configuration file after an upgrade, to make sure that you are up to date with any new options. The `-c` option to specify a different configuration folder path allows to create a default configuration in a temporary folder, which you can then compare with the existing one.
A default configuration file can be generated with `ipfs-cluster-service init`. It is recommended that you re-create the configuration file after an upgrade, to make sure that you are up to date with any new options. The `-c` option can be used to specify a different configuration folder path, which allows you to create a default configuration in a temporary folder and compare it with the existing one.
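
For example, a fresh default configuration can be generated in a temporary folder and compared against the current one (the temporary path below is just an example):

```sh
# Generate a default configuration somewhere else using the -c flag
ipfs-cluster-service -c /tmp/cluster-default init

# Compare it with the configuration currently in use to spot new options
diff /tmp/cluster-default/service.json ~/.ipfs-cluster/service.json
```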
The `cluster` section of the configuration stores a `secret`: a 32 byte (hex-encoded) key which **must be shared by all cluster peers**. Using an empty key has security implications (see the Security section). Using different keys will prevent different peers from talking to each other.
The `cluster` section of the configuration stores a `secret`: a 32 byte (hex-encoded) key which **must be shared by all cluster peers**. Using an empty key has security implications (see the Security section below). Using different keys will prevent different peers from talking to each other.
Each section of the configuration file and the options in it depend on their associated component. We offer here a quick reference of the configuration format:
```json
```
{
"cluster": { // main cluster component configuration
"cluster": { // main cluster component configuration
"id": "QmZyXksFG3vmLdAnmkXreMVZvxc4sNi1u21VxbRdNa2S1b", // peer ID
"private_key": "<base64 representation of the key>",
"secret": "<32-bit hex encoded secret>",
"peers": [], // List of peers' multiaddresses
"bootstrap": [], // List of bootstrap peers' multiaddresses
"leave_on_shutdown": false, // Abandon consensus on exit
"listen_multiaddress": "/ip4/0.0.0.0/tcp/9096", // Cluster RPC listen
"state_sync_interval": "1m0s", // Time between state syncs
"ipfs_sync_interval": "2m10s", // Time between ipfs-state syncs
"replication_factor": -1, // Replication factor. -1 == all
"monitor_ping_interval": "15s" // Time between alive-pings
"peers": [], // List of peers' multiaddresses
"bootstrap": [], // List of bootstrap peers' multiaddresses
"leave_on_shutdown": false, // Abandon cluster on shutdown
"listen_multiaddress": "/ip4/0.0.0.0/tcp/9096", // Cluster RPC listen
"state_sync_interval": "1m0s", // Time between state syncs
"ipfs_sync_interval": "2m10s", // Time between ipfs-state syncs
"replication_factor": -1, // Replication factor. -1 == all
"monitor_ping_interval": "15s" // Time between alive-pings. See cluster monitoring section
},
"consensus": {
"raft": { // Raft options, see hashicorp/raft.Configuration
"data_folder": "<custom consensus data folder>"
"heartbeat_timeout": "1s",
"election_timeout": "1s",
"raft": {
"wait_for_leader_timeout": "15s", // How long to wait for a leader when there is none
"network_timeout": "10s", // How long to wait before timing out a network operation
"commit_retries": 1, // How many retries should we make before giving up on a commit failure
"commit_retry_delay": "200ms", // How long to wait between commit retries
"heartbeat_timeout": "1s", // Here and below: Raft options.
"election_timeout": "1s", // See https://godoc.org/github.com/hashicorp/raft#Config
"commit_timeout": "50ms",
"max_append_entries": 64,
"trailing_logs": 10240,
@ -95,22 +100,24 @@ Each section of the configuration file and the options in it depend on their ass
},
"api": {
"restapi": {
"listen_multiaddress": "/ip4/127.0.0.1/tcp/9094", // API listen
"read_timeout": "30s",
"listen_multiaddress": "/ip4/127.0.0.1/tcp/9094", // API listen
"ssl_cert_file": "path_to_certificate", // Path to SSL public certificate. Unless absolute, relative to config folder
"ssl_key_file": "path_to_key", // Path to SSL private key. Unless absolute, relative to config folder
"read_timeout": "30s", // Here and below, timeoutes for network operations
"read_header_timeout": "5s",
"write_timeout": "1m0s",
"idle_timeout": "2m0s",
"basic_auth_credentials": [ // leave null for no-basic-auth
"basic_auth_credentials": [ // Leave null for no-basic-auth
"user": "pass"
]
}
},
"ipfs_connector": {
"ipfshttp": {
"proxy_listen_multiaddress": "/ip4/127.0.0.1/tcp/9095", // ipfs-proxy
"node_multiaddress": "/ip4/127.0.0.1/tcp/5001", // ipfs-node location
"connect_swarms_delay": "7s",
"proxy_read_timeout": "10m0s",
"proxy_listen_multiaddress": "/ip4/127.0.0.1/tcp/9095", // ipfs-proxy listen address
"node_multiaddress": "/ip4/127.0.0.1/tcp/5001", // ipfs-node api location
"connect_swarms_delay": "7s", // after boot, how long to wait before trying to connect ipfs peers
"proxy_read_timeout": "10m0s", // Here and below, timeouts for network operations
"proxy_read_header_timeout": "5s",
"proxy_write_timeout": "10m0s",
"proxy_idle_timeout": "1m0s"
@ -118,34 +125,43 @@ Each section of the configuration file and the options in it depend on their ass
},
"monitor": {
"monbasic": {
"check_interval": "15s" // how often to check for expired metrics/alert
"check_interval": "15s" // How often to check for expired alerts. See cluster monitoring section
}
},
"informer": {
"disk": {
"metric_ttl": "30s", // determines how often metric is updated
"metric_type": "freespace" // or "reposize"
"disk": { // Used when using the disk informer (default)
"metric_ttl": "30s", // Amount of time this metric is valid. Will be polled at TTL/2.
"metric_type": "freespace" // or "reposize": type of metric
},
"numpin": {
"metric_ttl": "10s"
"numpin": { // Used when using the numpin informer
"metric_ttl": "10s" // Amount of time this metric is valid. Will be polled at TTL/2.
}
}
}
```
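
As a quick sanity check after `init`, the generated values can be inspected directly from the file, for example with `jq` (assuming it is installed):

```sh
# Peer ID and secret of this peer (the secret must match on every peer)
jq -r .cluster.id ~/.ipfs-cluster/service.json
jq -r .cluster.secret ~/.ipfs-cluster/service.json

# Current list of known cluster peers
jq .cluster.peers ~/.ipfs-cluster/service.json
```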
## Starting your cluster
## Starting your cluster peers
`ipfs-cluster-service` will launch your cluster peer. If you have not configured any `cluster.peers` in the configuration, nor any `cluster.bootstrap` addresses, a single-peer cluster will be launched.
When filling in `peers` with some other peers' listening multiaddresses (i.e. `/ip4/192.168.1.103/tcp/9096/ipfs/QmQHKLBXfS7hf8o2acj7FGADoJDLat3UazucbHrgxqisim`), the initialization process will consist in joining the current cluster's consensus, waiting for a leader (either to learn or to elect one), and syncing up to the last known state.
When filling in `peers` with some other peers' listening multiaddresses (i.e. `/ip4/192.168.1.103/tcp/9096/ipfs/QmQHKLBXfS7hf8o2acj7FGADoJDLat3UazucbHrgxqisim`), the peer will expect to be part of a cluster with the given `peers`. On boot, it will wait to learn who the leader is (see the `raft.wait_for_leader_timeout` option) and sync its internal state up to the last known state before becoming `ready`.
If you are using the `peers` configuration value, then **it is very important that the `peers` configuration value in all cluster members has the same value.** If not, your node will misbehave in not obvious ways. You can start all nodes at once when using this method. If they are not started at once, a node will be missing from the cluster, and since other peers expect it to be online, the cluster will not be in a healthy state (although it will operate if at least half of the peers are running).
If you are using the `peers` configuration value, then **it is very important that the `peers` configuration value is the same for all peers**, that is, that it contains the multiaddresses for the other peers in the cluster. It may contain this peer's own multiaddress too (but it will be removed on the next shutdown). If `peers` is not correct for all peer members, your node might not start or might misbehave in non-obvious ways.
Alternatively, you can use the `bootstrap` variable to provide one or several bootstrap peers. Bootstrapping will use the given peer to request the list of cluster peers and fill-in the `peers` variable automatically. The bootstrapped peer will be, in turn, added to the cluster and made known to every other existing (and connected peer).
You are expected to start the nodes at the same time when using this method. If half of them are not started, they will fail to elect a cluster leader. If there are peers missing, the cluster will not be in a healthy state (error messages will be displayed). The cluster will operate as long as a majority of peers is up.
Use the `bootstrap` method only when the rest of the cluster is healthy and all peers are running. You can also launch several peers at once, as long as they are bootstrapping from the same already-running-peer. The `--bootstrap` flag allows to provide a bootsrapping peer directly when calling `ipfs-cluster-service`.
Alternatively, you can use the `bootstrap` variable to provide one or several bootstrap peers. In short, bootstrapping will use the given peer to request the list of cluster peers and fill in the `peers` variable automatically. The bootstrapped peer will be, in turn, added to the cluster and made known to every other existing (and connected) peer. You can also launch several peers at once, as long as they are bootstrapping from the same already-running peer. The `--bootstrap` flag allows providing a bootstrapping peer directly when calling `ipfs-cluster-service`.
If the startup initialization fails, `ipfs-cluster-service` will exit automatically (after 30 seconds). Pay attention to the INFO messages during startup. When ipfs-cluster is ready, a message will indicate it along with a list of peers.
Use the `bootstrap` method only when the rest of the cluster is healthy and all peers are running. Bootstrapping peers should be in a `clean` state, that is, with no previous raft-data loaded.
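
A bootstrap start might look like this sketch (the multiaddress is a placeholder for one of your running peers):

```sh
# Bootstrap a clean peer to an existing, healthy cluster
ipfs-cluster-service --bootstrap /ip4/192.168.1.2/tcp/9096/ipfs/<peer ID of a running peer>
```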
Once a cluster is up, peers are expected to run continuously. You may need to stop a peer, or it may die due to external reasons. The restart behaviour will depend on whether the peer has left the consensus:
* The *default* case - peer has not been removed and `cluster.leave_on_shutdown` is `false`: in this case the peer has not left the consensus peerset, and you may start the peer again normally. Do not manually update `cluster.peers`, even if other peers have joined/left the cluster.
* The *left the cluster* case - peer has been manually removed or `cluster.leave_on_shutdown` is `true`: in this case, unless the peer died, it has probably been removed from the consensus (you can check whether it is missing from `ipfs-cluster-ctl peers ls` on a running peer). This means that the state of the peer has been cleaned up, and the last known `cluster.peers` have been moved to `cluster.bootstrap`. When the peer is restarted, it will attempt to rejoin the cluster from which it was removed by using the addresses in `cluster.bootstrap`.
Remember that a clean peer bootstrapped to an existing cluster will always fetch the latest state. A shut-down peer which did not leave the cluster will also catch up with the rest of the peers after restarting. See the next section for more information about the consensus algorithm used by ipfs-cluster.
If the startup initialization fails, `ipfs-cluster-service` will exit automatically after a few seconds. Pay attention to the INFO and ERROR messages during startup. When ipfs-cluster is ready, a message will indicate it along with a list of peers.
## The consensus algorithm
@ -160,11 +176,11 @@ For example, a commit operation to the log is triggered with `ipfs-cluster-ctl
The "peer add" and "peer remove" operations also trigger log entries and fully depend on a healthy consensus status. Modifying the cluster peers is a tricky operation because it requires informing every peer of the new peer set. If a peer is down during this operation, it is likely that it will not learn about it when it comes up again. Thus, it is recommended to bootstrap the peer again when the cluster has changed in the meantime.
By default the consensus log data is backed in the `ipfs-cluster-data` subfolder, next to the main configuration file. This folder stores two types of information: the [boltDB] database storing the Raft log, and the state snapshots. Snapshots from the log are performed regularly when the log grows too big (or on shutdown). When a peer is far behind in catching up with the log, Raft may opt to send a snapshot directly, rather than to send every log entry that make up the state individually.
By default, the consensus log data is kept in the `ipfs-cluster-data` subfolder, next to the main configuration file. This folder stores two types of information: the [boltDB] database storing the Raft log, and the state snapshots. Snapshots of the log are performed regularly when the log grows too big (see the `raft` configuration section for options). When a peer is far behind in catching up with the log, Raft may opt to send a snapshot directly, rather than sending every log entry that makes up the state individually. This data is initialized on the first start of a cluster peer and maintained throughout its life. Removing the `ipfs-cluster-data` folder effectively resets the peer to a clean state. Only peers with a clean state should bootstrap to already running clusters.
When running a cluster peer, **it is very important that the consensus data folder does not contain any data from a different cluster setup**, or data from diverging logs. What this essentially means is that different Raft logs should not be mixed. Removing the `ipfs-cluster-data` folder will destroy all consensus data from the peer but, as long as the rest of the cluster is running, it will recover the last state upon start by fetching it from a different cluster peer.
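
A sketch of resetting a peer to a clean state (default paths assumed; keep a copy of the folder if in doubt):

```sh
# Stop the peer first, then remove (or move away) the consensus data
mv ~/.ipfs-cluster/ipfs-cluster-data ~/ipfs-cluster-data.bak

# On the next start, bootstrap to a healthy peer so the state is fetched again
ipfs-cluster-service --bootstrap <multiaddress of a healthy peer>
```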
On clean shutdowns, ipfs-cluster peers will save a human-readable state snapshot in the `~/.ipfs-cluster/backups` folder, which can be used to inspect the last known state for that peer.
On clean shutdowns, ipfs-cluster peers will save a human-readable state snapshot in `~/.ipfs-cluster/backups`, which can be used to inspect the last known state for that peer. We are working on making those snapshots restorable.
## The shared state, the local state and the ipfs state
@ -182,39 +198,46 @@ In normal operation, all three states are in sync, as updates to the *shared sta
* `ipfs-cluster-ctl status` shows information about the *local state* in every cluster peer. It does so by aggregating local state information received from every cluster member.
`ipfs-cluster-ctl sync` makes sure that the *local state* matches the *ipfs state*. In other words, it makes sure that what cluster expects to be pinned is actually pinned in ipfs. As mentioned, this also happens automatically. Every sync operations triggers an `ipfs pin ls --type=recursive` call to the local node. Depending on the size of your pinset, you may adjust the interval between sync operations using the `ipfs_sync_seconds` configuration variable.
`ipfs-cluster-ctl sync` makes sure that the *local state* matches the *ipfs state*. In other words, it makes sure that what cluster expects to be pinned is actually pinned in ipfs. As mentioned, this also happens automatically. Every sync operation triggers an `ipfs pin ls --type=recursive` call to the local node.
Depending on the size of your pinset, you may adjust the interval between the different sync operations using the `cluster.state_sync_interval` and `cluster.ipfs_sync_interval` configuration options.
As a final note, the *local state* may show items in *error*. This happens when an item took too long to pin/unpin, or the ipfs daemon became unavailable. `ipfs-cluster-ctl recover <cid>` can be used to rescue these items. See the "Pinning an item" section below for more information.
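
The commands above can be combined into a quick inspection routine (`<cid>` is a placeholder):

```sh
ipfs-cluster-ctl status <cid>    # local state for the CID, as reported by every peer
ipfs-cluster-ctl sync <cid>      # check the local state against what ipfs really has pinned
ipfs-cluster-ctl recover <cid>   # retry operations for items stuck in an error state
```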
## Static cluster membership considerations
We call a static cluster one in which the set of `cluster.peers` is fixed, where "peer add"/"peer rm"/bootstrapping operations don't happen (at least normally) and where every cluster member is expected to be running all the time.
Static clusters are a way to run ipfs-cluster in a stable fashion, since the membership of the consensus remains unchanged, they don't suffer the dangers of dynamic peer sets, where it is important that operations modifying the peer set suceed for every cluster member.
Static clusters are a way to run ipfs-cluster in a stable fashion: since the membership of the consensus remains unchanged, they don't suffer the dangers of dynamic peer sets, where it is important that operations modifying the peer set succeed for every cluster member.
Static clusters expect every member peer to be up and responding. Otherwise, the Leader will detect missing heartbeats start logging errors. When a peer is not responding, ipfs-cluster will detect that a peer is down and re-allocate any content pinned by that peer to other peers. ipfs-cluster will still work as long as there is a Leader (half of the peers are still running). In the case of a network split, or if a majority of nodes is down, cluster will be unable to commit any operations the the log and thus, it's functionality will be limited to read operations.
Static clusters expect every member peer to be up and responding. Otherwise, the Leader will detect missing heartbeats and start logging errors. When a peer is not responding, ipfs-cluster will detect that the peer is down and re-allocate any content pinned by that peer to other peers. ipfs-cluster will still work as long as there is a Leader (which requires that more than half of the peers are still running). In the case of a network split, or if a majority of nodes is down, cluster will be unable to commit any operations to the log and thus its functionality will be limited to read operations.
## Dynamic cluster membership considerations
We call a dynamic cluster, that in which the set of `cluster.peers` changes. Nodes are bootstrapped to existing cluster peers, the "peer add" and "peer rm" operations are used and/or the `cluster.leave_on_shutdown` configuration option is enabled. This option allows a node to abandon the consensus membership when shutting down. Thus reducing the cluster size by one.
We call a dynamic cluster one in which the set of `cluster.peers` changes: nodes are bootstrapped to existing cluster peers (`cluster.bootstrap` option), the "peer add" and "peer rm" operations are used, and/or the `cluster.leave_on_shutdown` configuration option is enabled. This option allows a node to abandon the consensus membership when shutting down, thus reducing the cluster size by one.
Dynamic clusters allow greater flexibility at the cost of stability. Join and leave operations are tricky as they change the consensus membership and they are likely to create bad situations in unhealthy clusters. Also, bear in mind that removing a peer from the cluster will trigger a re-allocation of the pins that were associated to it. If the replication factor was 1, it is recommended to keep the ipfs daemon running so the content can actually be copied out to a daemon managed by a different peer.
Peers joining an existing cluster should have a non-divergent state. That means: their consensus `ipfs-cluster-data` folder should be empty or, if not, it should contain data belonging to the existing cluster. A joining peer should have not been running on its own separately from the running cluster. When a peer leaves or is removed, any existing peers will be saved as `bootstrap` peers, so that it is not easy to start the departing peer in "single-peer-mode" by mistake: that would essentially create a diverging state, preventing it to re-join its previous cluster cleanly.
Peers joining an existing cluster should have a non-divergent state. Peers leaving a cluster are not expected to re-join it with stale consensus data. For this reason, **the consensus data folder is removed** when a peer leaves the current cluster. Conveniently, a backup of the state is created before doing so and placed in `~/.ipfs-cluster/backups`.
The best way to diagnose and fix a broken cluster membership issue is to:
When a peer leaves or is removed, any existing peers will be saved as `bootstrap` peers, so that it is easier to re-join the cluster. Since the state has been wiped, the peer will be able to re-join and fetch the latest state cleanly. See "The consensus algorithm" and the "Starting your cluster peers" sections above for more information.
This does not mean that it is impossible to somehow end up with a broken cluster membership. The best way to diagnose and fix it is to follow these steps (a command sketch follows the list):
* Select a healthy node
* Run `ipfs-cluster-ctl peers ls`
* Examine carefully the results and any errors
* Run `ipfs-cluster-ctl peers rm <peer in error>` for every peer not responding
* If the peer count is different in the peers responding, identify peers with
wrong peer count, stop them, fix `cluster_peers` manually and restart them.
* If the peer count is different depending on the peers responding, remove those peers too. Once stopped, remove the consensus data folder and bootstrap them to a healthy cluster peer. Always make sure to keep 1/2+1 peers alive and healthy.
* `ipfs-cluster-ctl --enc=json peers ls` provides additional useful information, like the list of peers for every responding peer.
* In cases where no Leader can be elected, a manual stop and editing of `cluster.peers` is necessary.
* In cases where leadership has been lost beyond repair (meaning faulty peers cannot be removed), it is best to stop all peers and restore the state from the backup (currently, a manual operation).
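
As a sketch, a diagnosis session on a healthy peer might look like this (peer IDs are placeholders):

```sh
# Inspect the cluster membership and look for errors
ipfs-cluster-ctl peers ls

# The JSON output also includes the peer list that each responding peer reports
ipfs-cluster-ctl --enc=json peers ls

# Remove peers which are not responding or report a wrong peer count
ipfs-cluster-ctl peers rm <peer ID in error>
```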
If you have a problematic cluster peer trying to join an otherwise working cluster, the safest way is to remove the `ipfs-cluster-data` folder and to set the correct `bootstrap`. The consensus algorithm will then resend the state from scratch.
Remember: if you have a problematic cluster peer trying to join an otherwise working cluster, the safest way is to remove the `ipfs-cluster-data` folder and to set the correct `bootstrap`. The consensus algorithm will then resend the state from scratch.
Note that when adding a peer to an existing cluster, **the new peer must be configured with the same `cluster.secret` as the rest of the cluster**.
ipfs-cluster will fail to start if `cluster.peers` does not match the current Raft peerset. If the current Raft peerset is correct, you can manually update `cluster.peers`. Otherwise, it is easier to clean and bootstrap.
Finally, note that when bootstrapping a peer to an existing cluster, **the new peer must be configured with the same `cluster.secret` as the rest of the cluster**.
@ -249,16 +272,20 @@ The reason pins (and unpin) requests are queued is because ipfs only performs on
The process is very similar to the "Pinning an item" process described above. Removed pins are wiped from the shared and local states. When requesting the local `status` for a given CID, it will show as `UNPINNED`. Errors will be reflected as `UNPIN_ERROR` in the pin's local status.
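
Put together, a typical pin/unpin cycle looks like this (`<cid>` is a placeholder):

```sh
ipfs-cluster-ctl pin add <cid>   # commit the pin to the shared state; allocations are decided now
ipfs-cluster-ctl status <cid>    # should move from PINNING to PINNED (or show PIN_ERROR)
ipfs-cluster-ctl pin rm <cid>    # remove the pin; status should eventually show UNPINNED
```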
## Cluster monitoring and failover
## Cluster monitoring and pin failover
ipfs-cluster includes a basic monitoring component which gathers metrics and triggers alerts when a metric is no longer renewed. There are currently two types of metrics:
* `informer` metrics are used to decide on allocations when a pin request arrives. Different "informers" can be configured. The default is the disk informer using the `freespace` metric.
* a `ping` metric is used to monitor peers.
* a `ping` metric is used to signal that a peer is alive.
Every metric carries a Time-To-Live associated with it. ipfs-cluster peers pushes metrics from every peer to the cluster Leader in TTL/2 intervals. When a metric for an existing cluster peer stops arriving and previous metrics have outlived their Time-To-Live, the monitoring component triggers an alert for that metric.
Every metric carries a Time-To-Live associated with it. This TTL can be configured in the `informer` configuration section. The `ping` metric TTL is determined by `cluster.monitor_ping_interval` and equals 2x its value.
ipfs-cluster will react to `ping` metrics alerts by searching for pins allocated to the alerting peer and triggering re-pinning requests for them. If the node is down, informer metrics are likely to be also invalid, causing the content to be given allocations from different peers.
Every ipfs-cluster peer pushes metrics to the cluster Leader regularly. This happens at TTL/2 intervals for the `informer` metrics and every `cluster.monitor_ping_interval` for the `ping` metrics.
When a metric for an existing cluster peer stops arriving and previous metrics have outlived their Time-To-Live, the monitoring component triggers an alert for that metric. `monbasic.check_interval` determines how often the monitoring component checks for expired TTLs and sends these alerts. If you wish to detect expired metrics more quickly, decrease this interval. Otherwise, increase it.
ipfs-cluster will react to `ping` metrics alerts by searching for pins allocated to the alerting peer and triggering re-pinning requests for them.
The monitoring and failover system in cluster is very basic and requires improvements. Failover is likely to not work properly when several nodes go offline at once (especially if the current Leader is affected). Manual re-pinning can be triggered with `ipfs-cluster-ctl pin add <cid>`. `ipfs-cluster-ctl pin ls <CID>` can be used to find out the current list of peers allocated to a CID.
@ -292,18 +319,17 @@ This means that the top cluster will think that it is performing requests to an
This allows scaling ipfs-cluster deployments and provides a method for building ipfs-cluster topologies that may be better adapted to certain needs.
Note that **this feature has not been extensively tested**.
Note that **this feature has not been extensively tested**, but we aim to introduce improvements and fully support it in the mid-term.
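
For illustration, this is roughly how the proxy endpoint is used as if it were the ipfs daemon API (default proxy address assumed; `<cid>` is a placeholder). Hijacked endpoints such as `pin/add` become cluster operations, while other requests should be forwarded to the real ipfs daemon:

```sh
# A pin request against the proxy triggers a cluster-wide pin
curl "http://127.0.0.1:9095/api/v0/pin/add?arg=<cid>"

# Other API calls are passed through to the ipfs daemon behind it
curl "http://127.0.0.1:9095/api/v0/id"
```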
## Security
ipfs-cluster peers communicate with each other using libp2p-encrypted streams (`secio`), talk to the ipfs daemon over plain HTTP, provide an HTTP API themselves (used by `ipfs-cluster-ctl`) and offer an IPFS Proxy. This means that there are four endpoints to be wary of when thinking about security:
* `cluster_multiaddress`, defaults to `/ip4/0.0.0.0/tcp/9096` and is the listening address to communicate with other peers (via Remote RPC calls mostly). These endpoints are protected by the `cluster.secret` value specified in the configuration. Only peers holding the same secret can communicate between
each other. If the secret is empty, then **nothing prevents anyone from sending RPC commands to the cluster RPC endpoint** and thus, controlling the cluster and the ipfs daemon (at least when it comes to pin/unpin/pin ls and swarm connect operations. ipfs-cluster administrators should therefore be careful keep this endpoint unaccessible to third-parties when no `cluster.secret` is set.
* `api_listen_multiaddress`, defaults to `/ip4/127.0.0.1/tcp/9094` and is the listening address for the HTTP API that is used by `ipfs-cluster-ctl`. The considerations for `api_listen_multiaddress` are the same as for `cluster_multiaddress`, as access to this endpoint allows to control ipfs-cluster and the ipfs daemon to a extent. By default, this endpoint listens on locahost which means it can only be used by `ipfs-cluster-ctl` running in the same host.
* `ipfs_proxy_listen_multiaddress` defaults to `/ip4/127.0.0.1/tcp/9095`. As explained before, this endpoint offers control of ipfs-cluster pin/unpin operations and a full access to the underlying ipfs daemon. This endpoint should be treated with the same precautions as the ipfs HTTP API.
* `ipfs_node_multiaddress` defaults to `/ip4/127.0.0.1/tcp/5001` and contains the address of the ipfs daemon HTTP API. The recommendation is running IPFS on the same host as ipfs-cluster. This way it is not necessary to make ipfs API listen on other than localhost.
* `cluster.listen_multiaddress`, defaults to `/ip4/0.0.0.0/tcp/9096` and is the listening address to communicate with other peers (via Remote RPC calls mostly). These endpoints are protected by the `cluster.secret` value specified in the configuration. Only peers holding the same secret can communicate with each other. If the secret is empty, then **nothing prevents anyone from sending RPC commands to the cluster RPC endpoint** and thus controlling the cluster and the ipfs daemon (at least when it comes to pin/unpin/pin ls and swarm connect operations). ipfs-cluster administrators should therefore be careful to keep this endpoint inaccessible to third parties when no `cluster.secret` is set.
* `restapi.listen_multiaddress`, defaults to `/ip4/127.0.0.1/tcp/9094` and is the listening address for the HTTP API that is used by `ipfs-cluster-ctl`. The considerations for `restapi.listen_multiaddress` are the same as for `cluster.listen_multiaddress`, as access to this endpoint allows controlling ipfs-cluster and the ipfs daemon to an extent. By default, this endpoint listens on localhost, which means it can only be used by `ipfs-cluster-ctl` running on the same host. The REST API component provides HTTPS support for this endpoint, along with Basic Authentication. These can be used to protect an exposed API endpoint (see the sketch after this list).
* `ipfshttp.proxy_listen_multiaddress` defaults to `/ip4/127.0.0.1/tcp/9095`. As explained before, this endpoint offers control of ipfs-cluster pin/unpin operations and access to the underlying ipfs daemon. This endpoint should be treated with at least the same precautions as the ipfs HTTP API.
* `ipfshttp.node_multiaddress` defaults to `/ip4/127.0.0.1/tcp/5001` and contains the address of the ipfs daemon HTTP API. The recommendation is to run IPFS on the same host as ipfs-cluster. This way it is not necessary to make the ipfs API listen on anything other than localhost.
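
As a sketch, an exposed REST API protected with basic authentication (and, ideally, SSL) can be checked like this (credentials, hostname and certificate setup are examples):

```sh
# Local access with the default configuration
curl http://127.0.0.1:9094/id

# Access to an exposed endpoint protected with basic auth over HTTPS
curl -u user:pass https://cluster.example.com:9094/id
```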
## Upgrading
@ -312,7 +338,9 @@ The safest way to upgrade ipfs-cluster is to stop all cluster peers, update and
As long as the *shared state* format has not changed, there is nothing preventing you from stopping cluster peers separately, updating them and launching them again.
When the shared state format has changed, a state migration will be required. ipfs-cluster will refuse to start and inform the user in this case, although this feature is yet to be implemented next time the state format changes. It will also mean a minor version bump (where normally only the patch version increases).
When the shared state format has changed, a state migration will be required. ipfs-cluster will refuse to start and inform the user in this case, although this feature is yet to be implemented and will arrive by the next time the state format changes.
The upgrade procedure is something which is actively being worked on and will improve over time.
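
A possible upgrade flow, assuming the peers were installed from source with `make install`, is sketched below; adapt it to your installation method and service manager:

```sh
# 1. Stop every cluster peer cleanly (Ctrl-C or your init system).
# 2. On each node, update and reinstall the binaries:
cd "$GOPATH/src/github.com/ipfs/ipfs-cluster"
git pull
make install

# 3. Optionally re-generate a default configuration elsewhere and diff it
#    against the existing one to pick up new options.
# 4. Start the peers again:
ipfs-cluster-service
```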
## Troubleshooting and getting help
@ -321,7 +349,7 @@ When the shared state format has changed, a state migration will be required. ip
Open an issue on [ipfs-cluster](https://github.com/ipfs/ipfs-cluster) or ask on [discuss.ipfs.io](https://discuss.ipfs.io).
If you want to collaborate in ipfs-cluster, open an issue and ask about what you can do.
If you want to collaborate in ipfs-cluster, look at the list of open issues. Many are conveniently marked with [HELP WANTED](https://github.com/ipfs/ipfs-cluster/issues?q=is%3Aissue+is%3Aopen+label%3A%22help+wanted%22) and organized by difficulty. If in doubt, just ask!
### Debugging
@ -347,19 +375,20 @@ When your peer is not starting:
* Check the logs and look for errors (a sketch for increasing log verbosity follows this list)
* Are all the listen addresses free or are they used by a different process?
* Are other peers of the cluster reachable?
* Is the `cluster_secret` the same for all peers?
* Double-check that the addresses in `cluster_peers` are correct.
* Is the `cluster.secret` the same for all peers?
* Double-check that the addresses in `cluster.peers` and `cluster.bootstrap` are correct.
* Double-check that the rest of the cluster is in a healthy state.
* In some cases, it may help to delete everything in the `consensus_data_folder`. Assuming that the cluster is healthy, this will allow the non-starting peer to pull a clean state from the cluster Leader. Make a backup first, just in case.
* In some cases, it may help to delete everything in the consensus data folder (especially if the reason for not starting is a mismatch between the Raft state and the cluster peers). Assuming that the cluster is healthy, this will allow the non-starting peer to pull a clean state from the cluster Leader when bootstrapping.
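
If the checklist above does not reveal the problem, increasing the log verbosity usually helps (the flags below are assumed to be available in your version):

```sh
ipfs-cluster-service --debug            # very verbose output (debug level for cluster and its libraries)
ipfs-cluster-service --loglevel info    # adjust only ipfs-cluster's own logging level
ipfs-cluster-ctl --debug peers ls       # debug the requests made by ipfs-cluster-ctl
```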
### Peer stopped unexpectedly
When a peer stops unexpectedly:
* Make sure you simply haven't removed the peer from the cluster or triggered a shutdown
* Check the logs for any clues that the process died because of an internal fault
* Check your system logs to find if anything external killed the process
* Report any application panics, as they should not happen, along with the logs.
* Report any application panics, as they should not happen, along with the logs
### `ipfs-cluster-ctl status <cid>` does not report CID information for all peers


@ -1,6 +1,6 @@
# `ipfs-cluster-service`
> IPFS cluster peer launcher
> IPFS cluster peer daemon
`ipfs-cluster-service` runs a full IPFS Cluster peer.
@ -24,7 +24,7 @@ $ ipfs-cluster-service init
`init` will randomly generate a `cluster_secret` (unless it is specified via the `CLUSTER_SECRET` environment variable, or you run with `--custom-secret`, which will prompt for it interactively).
All peers in a cluster **must share the same cluster secret**. Using an empty secret may compromise the security of your cluster (see the documentation for more information).
**All peers in a cluster must share the same cluster secret**. Using an empty secret may compromise the security of your cluster (see the documentation for more information).
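
For example, a shared secret can be generated once and exported before running `init` on every peer, so all of them end up with the same value (a sketch assuming a Unix-like system):

```sh
# Generate a 32-byte hex-encoded secret and share it with all peers
export CLUSTER_SECRET=$(od -vN 32 -An -tx1 /dev/urandom | tr -d ' \n')
echo $CLUSTER_SECRET

# init picks up the CLUSTER_SECRET environment variable
ipfs-cluster-service init
```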
### Configuration
@ -39,11 +39,15 @@ The configuration file should probably be identical among all cluster peers, exc
The `peers` configuration variable holds a list of current cluster members. If you know the members of the cluster in advance, or you want to start a cluster fully in parallel, set `peers` in all configurations so that every peer knows the rest upon boot. Leave `bootstrap` empty. A cluster peer address looks like: `/ip4/1.2.3.4/tcp/9096/ipfs/<peer ID>`.
The list of `cluster.peers` is maintained automatically and saved by `ipfs-cluster-service` when it changes.
#### Clusters using `cluster.bootstrap`
When the `peers` variable is empty, the multiaddresses in `bootstrap` can be used to have a peer join an existing cluster. The peer will contact those addresses (in order) until one of them succeeds in joining it to the cluster. When the peer is shut down, it will save the current cluster peers in the `peers` configuration variable for future use (unless `leave_on_shutdown` is true, in which case it will save them in `bootstrap`)
When the `peers` variable is empty, the multiaddresses in `bootstrap` (or the `--bootstrap` parameter to `ipfs-cluster-service`) can be used to have a peer join an existing cluster. The peer will contact those addresses (in order) until one of them succeeds in joining it to the cluster. When the peer is shut down, it will save the current cluster peers in the `peers` configuration variable for future use (unless `leave_on_shutdown` is true, in which case it will save them in `bootstrap`).
Bootstrap is a convenient method, but more prone to errors than having a fixed set of peers. It can be used as well with `ipfs-cluster-service --bootstrap <multiaddress>`. Note that bootstrapping nodes with an old state (or diverging state) from the one running in the cluster may lead to problems with the consensus, so usually you would want to bootstrap clean nodes.
Bootstrap is a convenient method to sequentially start the peers of a cluster. **Only bootstrap clean nodes** which have not been part of a cluster before (or clean the `ipfs-cluster-data` folder). Bootstrapping nodes with an old state (or diverging state) from the one running in the cluster will fail or lead to problems with the consensus layer.
When setting the `leave_on_shutdown` option, or calling `ipfs-cluster-service` with the `--leave` flag, the node will attempt to leave the cluster in an orderly fashion when shut down. The node will be cleaned up when this happens and can be safely bootstrapped again.
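
As a sketch, a peer which should join via bootstrap and leave cleanly on shutdown could be started like this (the multiaddress is a placeholder):

```sh
ipfs-cluster-service --bootstrap /ip4/10.0.0.5/tcp/9096/ipfs/<peer ID> --leave
```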
### Debugging