Merge pull request #393 from ipfs/docs/move-to-website

Docs/move to website
Hector Sanjuan 2018-05-01 11:14:17 +02:00 committed by GitHub
commit a0a0898719
12 changed files with 32 additions and 1442 deletions


CAPTAIN.LOG.md (deleted)
@@ -1,204 +0,0 @@
# IPFS Cluster - Captain's log
## 20180329 | @hsanjuan
In the last two months many things have happened in the ipfs-cluster project.
First, we have welcomed a new team member: @lanzafame has already started contributing
and has resolved several issues included in the last release.
Secondly, we have been working very hard on implementing the "sharding RFC" that I mentioned in my last update. @zenground0 has made very significant progress on this front. Sharding will be a unique feature of ipfs-cluster and will help to drive the adoption of ipfs by making it possible to support huge datasets distributed among different nodes. We hope that the first "sharding" prototype will be ready in the upcoming weeks.
Thirdly, we have made 3 releases (the latest being `0.3.5`) which bring a diverse set of features and some bugfixes. Some of the major ones are these:
* `ipfs-cluster-ctl health graph` generates a `.dot` file which provides a quick overview of connectivity among the peers in the cluster (see the example right after this list).
* The `refs` pinning method allows downloading DAGs in parallel and pinning only when the content is already on disk.
* The REST API now exposes its HTTP endpoints through libp2p. By using a libp2p host to communicate with it, users get an encrypted connection without having to set up SSL certificates.
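For example, the graph can be rendered with graphviz; this assumes the subcommand writes the `.dot` output to stdout (check `ipfs-cluster-ctl --help` on your version):
```
ipfs-cluster-ctl health graph > cluster.dot   # connectivity graph in dot format
dot -Tpng cluster.dot -o cluster.png          # render it with graphviz
```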
We have also started working on the ipfs-cluster website, which we will use to provide a central and well organized place for documentation, roadmaps and other information related to the project.
Happy pinning!
## 20180125 | @hsanjuan
We are about to tag the `0.3.2` release and it comes with two nice features.
On one side, @zenground0 has been focused on implementing state offline export and import capabilities, a complement to the state upgrades added in the last release. They allow taking the shared state from an offline cluster (in a human-readable format) and restoring it elsewhere, or in the same place. This feature might save the day in situations where the quorum of a cluster is completely lost and peers cannot be started anymore due to the lack of a master.
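As an illustration only, the offline workflow could look roughly like this; the exact `ipfs-cluster-service state` subcommands and flags are an assumption here, so check `--help` on your version:
```
# with the cluster peers stopped
ipfs-cluster-service state export > backup.json   # dump the shared state (human-readable)
ipfs-cluster-service state import backup.json     # restore it on this (or another) peer
```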
Additionally, I have been putting some time into a new approach to replication factors. Instead of forcing cluster to store a pin an exact number of times, we now support lower and upper bounds on the replication factor (in the form of `replication_factor_min` and `replication_factor_max`). This feature (and great idea) was originally proposed by @segator.
Having this margin means that cluster will try its best when pinning an item (reaching the max factor), but it won't error if it cannot find enough available peers, as long as it finds at least the minimum replication factor.
In the same way, a peer going offline will no longer trigger a re-allocation of its CIDs, as it did before, if the replication factor is still within the margin. This allows, for example, taking a peer offline for maintenance without having cluster vacate all the pins associated with it (and then coming back up empty).
Of course, the previous behaviour can still be obtained by setting both the max and the min to the same value.
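For example, the bounds might appear in `service.json` like this (the exact placement within the configuration is an assumption):
```json
{
  "cluster": {
    "replication_factor_min": 2,
    "replication_factor_max": 3
  }
}
```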
Finally, it is very important to remark that we recently finished the [Sharding RFC draft](https://github.com/ipfs/ipfs-cluster/blob/master/docs/dag-sharding-rfc.md). This document outlines how we are going to approach the implementation of one of the most difficult but important upcoming features in cluster: the ability to distribute a single CID (tree) among several nodes. This will allow using cluster to store files or archives too big for a single ipfs node. Input from the community on this draft can be provided at https://github.com/ipfs/notes/issues/278.
## 20171211 | @hsanjuan
During the last weeks we've been working hard on making the first "live" deployment of ipfs-cluster. I am happy to announce that a 10-peer cluster runs on ipfs-gateway nodes, maintaining a >2000-length pinset.
The nodes are distributed and run a vanilla ipfs-cluster docker container mounting a volume with a customized [cluster configuration](https://github.com/ipfs/infrastructure/blob/master/ipfs-cluster/service.json.tpl) which uses higher-than-default timeouts and intervals. The injection of the pinset took a while, but eventually every pin in every node became PINNED. On one occasion, a single IPFS node hung while pinning. After restarting the IPFS node in question, all pins in the queue became PIN_ERRORs, but they could easily be fixed with a `recover` operation.
Additionally, the [IPFS IRC Pinbot](https://github.com/ipfs/pinbot-irc) now supports cluster-pinning, by using the ipfs-cluster proxy to ipfs, which intercepts pin requests and performs them in cluster. This allowed us to re-use the `go-ipfs-api` library to interact with cluster.
Nevertheless, the first live setup has shown that some things were missing. For example, we added `--local` flags to the Sync, Status and Recover operations (and allowed a local RecoverAll). They are handy when a single node is at fault and you want to fix the pins on that specific node. We will also work on a `go-ipfs-cluster-api` library providing a REST API client which makes it easier to interact with cluster programmatically.
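For instance, checking and repairing pins on a single misbehaving peer could look like this; the flag placement is illustrative, so check `ipfs-cluster-ctl --help` for the exact syntax:
```
ipfs-cluster-ctl status --local          # pin status as reported by this peer only
ipfs-cluster-ctl recover --local <cid>   # retry a failed pin on this peer only
```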
Parallel to all this, @zenground0 has been working on state migrations. The cluster's consensus state is stored on disk via snapshots in a certain format. This format might evolve in the future and we need a way to migrate between versions without losing all the state data. In the new approach, we are able to extract the state from Raft snapshots, migrate it, and create a new snapshot with the new format so that the next time cluster starts everything works. This has been a complex feature but a very important step towards providing a production-grade release of ipfs-cluster.
Last but not least, the next release will include useful things like pin names (a string associated to every pin) and peer names. This will make it easy to identify pins and peers by something other than their multihash. They have been contributed by @te0d, who is working on https://github.com/te0d/js-ipfs-cluster-api, a JS client for our REST API, and https://github.com/te0d/bunker, a web interface to manage ipfs-cluster.
## 20171115 | @hsanjuan
This update comes as our `0.3.0` release is about to be published. This release includes quite a few bug fixes, but the main change is the upgrade of the underlying Raft libraries to a recently published version.
Raft 1.0.0 hardens the management of peersets and makes it more difficult to arrive at situations in which cluster peers have different, inconsistent states. These issues are usually very confusing for new users, as they manifest themselves with lots of error messages with apparently cryptic meanings coming from Raft and LibP2P. We have embraced the new safeguards and made documentation and code changes to stress the workflows that should be followed when altering the cluster peerset. These can be summarized as:
* `--bootstrap` is the method to add a peer to a running cluster as it ensures that no diverging state exists during first boot.
* The `ipfs-cluster-data` folder is renamed whenever a peer leaves the cluster, resulting in a clean state for the next start. Peers with a dirty state will not be able to join a cluster.
* Whenever `ipfs-cluster-data` has been initialized, `cluster.peers` should match the internal peerset from the data, or the node will not start.
In the documentation, we have stressed the importance of the consensus data and described the workflows for starting peers and leaving the cluster in more detail.
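For reference, bootstrapping a clean peer into a running cluster looks like this (the multiaddress is the one announced by an existing peer, as shown in the guide further down this page):
```
# on the new, clean peer (same CLUSTER_SECRET as the rest of the cluster)
ipfs-cluster-service init
ipfs-cluster-service --bootstrap /ip4/192.168.1.2/tcp/9096/ipfs/<existing-peer-id>
```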
I'm also happy to announce that we now build and publish "snaps". [Snaps](https://snapcraft.io/) are "universal Linux packages designed to be secure, sandboxed, containerised applications isolated from the underlying system and from other applications". We are still testing them. For the moment we publish a new snap on every master build.
You are welcome to check the [changelog](CHANGELOG.md) for a detailed list of other new features and bugfixes.
Our upcoming work will be focused on setting up a live ipfs-cluster and running it in a "production" fashion, as well as adding more capabilities to manage the internal cluster state while offline (migrate, export, import, etc.).
## 20171023 | @hsanjuan
We have now started the final quarter of 2017 with renewed energy and plans for ipfs-cluster. The team has grown and come up with a set of priorities for the next weeks and months. The gist of these is:
* To make cluster stable and run it on production ourselves
* To start looking into the handling of "big datasets", including IPLD integration
* To provide users with a delightful experience with a focus on documentation and support
The `v0.2.0` release marks the start of this cycle. Check the [changelog](CHANGELOG.md) for a list of features and bugfixes. Among them, the new configuration options in the consensus component will allow our users to experiment in environments with larger latencies than usual.
Finally, coming up in the pipeline we have:
* the upgrade of Raft library to v1.0.0, which is likely to provide a much better experience with dynamic-membership clusters.
* Swagger documentation for the REST API.
* Work on connectivity graphs, making it easy to spot any connectivity problem among cluster peers.
## 20170726 | @hsanjuan
Unfortunately, I have not thought of updating the Captain's log for some months. The Coinlist effort has had me very busy, which means that my time and mind were not as fully focused on cluster as before. That said, there has been significant progress during this period. Much of that progress has happened thanks to @Zenground0 and @dgrisham, who have been working on cluster for most of Q2, making valuable contributions (many of them on the testing front).
As a summary, since my last update, we have:
* [A guide to running IPFS Cluster](docs/ipfs-cluster-guide.md), with detailed information on how cluster works, what behaviours to expect and why. It should answer many questions which are not covered by the getting-started-quickly guides.
* Added sharness tests, which make sure that `ipfs-cluster-ctl` and `ipfs-cluster-service` are tested and at least not broken in obvious ways, complementing our testing pipeline.
* Pushed the [kubernetes-ipfs](https://github.com/ipfs/kubernetes-ipfs) project forward significantly, adding a lot of features to its DSL and a bunch of highly advanced ipfs-cluster tests. The goal is to be able to test deployment layouts which are closer to reality, including scalability tests.
* The extra tests uncovered and allowed us to fix a number of nasty bugs, usually around the ipfs-cluster behaviour when peers go down or stop responding.
* Added CID re-allocation on peer removal.
* Added "Private Networks" support to ipfs-cluster. Private Networks is a libp2p feature which allows to secure a libp2p connection with a key. This means that inter-peer communication is now protected and isolated with a `cluster_secret`. This brings a significant reduction on the security pitfalls of running ipfs-cluster: default setup does not allow anymore remote control of a cluster peer. More information on security can be read on the [guide](docs/ipfs-cluster-guide.md).
* Added HTTPS support for the REST API endpoint. This facilitates exposing the API endpoints directly and is a necessary preamble to supporting basic authentication (in the works).
All the above changes are about to crystallize in the `v0.1.0` release, which we'll publish in the coming days.
## 20170328 | @hsanjuan
The last weeks were spent on improving go-ipfs/libp2p/multiformats documentation as part of the [documentation sprint](https://github.com/ipfs/pm/issues/357) mentioned earlier.
That said, a few changes have made it to ipfs-cluster:
* All components have now been converted into submodules. This clarifies
the project layout and actually makes the component borders explicit.
* Increased pin performance. By using `type=recursive` in IPFS API queries,
they return way faster.
* Connect different ipfs nodes in the cluster: we now trigger `swarm connect` operations for each ipfs node associated to a cluster peer, both at startup and
upon operations like `peer add`. This should ensure that ipfs nodes in the
cluster know each other.
* Add `disk` informer. The default allocation strategy now is based on how
big the IPFS repository is. Pins will be allocated to peers with lower
repository sizes.
I will be releasing new builds/releases for ipfs-cluster in the following days.
## 20170310 | @hsanjuan
This week has been mostly spent on making IPFS Cluster easy to install, writing end-to-end tests as part of the Test Lab Sprint and bugfixing:
* There is now an ipfs-cluster docker container, which should be part of the ipfs docker hub very soon
* IPFS Cluster builds are about to appear in dist.ipfs.io
* I shall be publishing some Ansible roles to deploy ipfs+ipfs-cluster
* There are now some tests using [kubernetes-ipfs](https://github.com/ipfs/kubernetes-ipfs) with the new docker container. These tests are the first automated tests that are truly end-to-end, using a real IPFS daemon under the hood.
* I have added replication-factor-per-pin support, which means that for every pinned item it can be specified what its replication factor should be, and this factor can be updated. This allows overriding the global configuration option for the replication factor.
* Bugfixes: one affecting re-pinning+replication and some others in ipfs-cluster-ctl output.
Next week will probably focus on the [Delightful documentation sprint](https://github.com/ipfs/pm/issues/357). I'll try to throw in some more tests for `ipfs-cluster-ctl` and will send the call for early testers that I was talking about in the last update, now that we have new multiple install options.
## 20170302 | @hsanjuan
IPFS cluster now has basic peer monitoring and re-pinning support when a cluster peer goes down.
This is done by broadcasting a "ping" from each peer to the monitor component. When it detects that no pings are arriving from a current cluster member, it raises an alert, which makes cluster trigger re-pins for all the CIDs associated with that peer.
The next days will be spent fixing small things and figuring out how to get better tests as part of the [Test Lab Sprint](https://github.com/ipfs/pm/issues/354). I also plan to make a call for early testers, to see if we can get some people on board to try IPFS Cluster out.
## 20170215 | @hsanjuan
A global replication factor is now supported! A new configuration file option `replication_factor` allows specifying how many peers should be allocated to pin a CID. `-1` means "Pin everywhere", and maintains compatibility with the previous behaviour. A pin request with a replication factor >= 1 is subject to a number of requirements:
* The CID must not be allocated already. If it is, the pin request will return with an error saying so.
* It needs to find enough peers to pin.
How peers are allocated content has been most of the work in this feature. We have three components for doing so:
* An `Informer` component. Informer is used to fetch some metric (agnostic to Cluster). The metric has a Time-to-Live and it is pushed in TTL/2 intervals to the Cluster leader.
* An `Allocator` component. The allocator provides an `Allocate()` method which, given current allocations, candidate peers and the last valid metrics pushed from the `Informers`, decides which peers should perform the pinning. For example, a metric could be the used disk space in a cluster peer, and the allocation algorithm would sort candidate peers according to that metric. The first peers in the list are the ones with the least disk used, and will be chosen to perform the pin. An `Allocator` could also work by receiving a location metric and making sure that the most preferential location is different from the already existing ones, etc.
* A `PeerMonitor` component, which is in charge of logging metrics and providing the last valid ones. It will be extended in the future to detect peer failures and trigger alerts.
The current allocation strategy is a simple one called `numpin`, which just distributes the pins according to the number of CIDs peers are already pinning. More useful strategies should come in the future (help wanted!).
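The real interfaces live in the codebase; the following Go sketch only illustrates the shapes described above, with assumed names and signatures:
```go
// Illustrative sketch only; names and signatures are assumptions, not the
// actual ipfs-cluster interfaces.
package sketch

import "time"

// Metric is a named value with an expiration, pushed to the leader at TTL/2 intervals.
type Metric struct {
	Name   string
	Peer   string // ID of the peer that produced the metric
	Value  string
	Expire time.Time
}

// Informer fetches a metric from the local environment (e.g. number of pins, free disk).
type Informer interface {
	Name() string
	GetMetric() Metric
}

// Allocator decides which candidate peers should pin a CID, given the current
// allocations and the last valid metrics.
type Allocator interface {
	Allocate(cid string, current, candidates map[string]Metric) ([]string, error)
}
```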
The next steps in Cluster will be wrapping up this milestone with failure detection and re-balancing.
## 20170208 | @hsanjuan
So much for commitments... I missed last Friday's log entry. The reason is that I was busy with the implementation of [dynamic membership for IPFS Cluster](https://github.com/ipfs/ipfs-cluster/milestone/2).
What seemed a rather simple task turned into a not-so-simple endeavour because modifying the peer set of Raft has a lot of pitfalls, especially if it happens during boot (in order to bootstrap). A `peer add` operation implies making everyone aware of a new peer. In Raft this is achieved by committing a special log entry. However, there is no way to be notified of such an event on the receiving side, and such an entry only has the peer ID, not the full multiaddress of the new peer (needed so that other nodes can talk to it).
Therefore whoever adds the node must additionally broadcast the new node and also send back the full list of cluster peers to it. After three implementation attempts (all working, but each improving on top of the previous), we perform this broadcasting by logging our own `PeerAdd` operation in Raft, with the multiaddress. This proved nicer and simpler than broadcasting to all the nodes (mostly when dealing with failures and errors - what to do when a node has missed the operation). If the operation makes it to the log then everyone should get it, and if not, failure does not involve un-doing the operation in every node with another broadcast. The whole thing is still tricky when joining peers which have disjoint Raft states, so it is best to use it with clean, just-started peers.
In addition to `peer add`, there is a `join` operation which facilitates bootstrapping a node and having it directly join a cluster. On shutdown, each node will save the current cluster peers in the configuration for future use. A `join` operation can be triggered with the `--bootstrap` flag in `ipfs-cluster-service` or with the `bootstrap` option in the configuration, and works best with clean nodes.
The next days will be spent on implementing [replication factors](https://github.com/ipfs/ipfs-cluster/milestone/3), which implies the addition of new components to the mix.
## 20170127 | @hsanjuan
Friday is from now on the Captain's Log entry day.
Last week was the first of three weeks in the current ipfs-cluster *sprintino* (https://github.com/ipfs/pm/issues/353). The work has focused on addressing ["rough edges"](https://github.com/ipfs/ipfs-cluster/milestone/1), most of which came from @jbenet's feedback (#14). The result has been significant changes and improvements to ipfs-cluster:
* I finally nailed down the use of multicodecs in `go-libp2p-raft` and `go-libp2p-gorpc` and the whole dependency tree is now Gx'ed.
* It seems for the moment we have settled for `ipfs-cluster-service` and `ipfs-cluster-ctl` as names for the cluster tools.
* Configuration file has been overhauled. It now has explicit configuration key names and a stronger parser which is more specific
about the causes of errors.
* CLI tools have been rewritten to use `urfave/cli`, which means better help, clearer commands and more consistency.
* The `Sync()` operations, which update the Cluster pin states from the IPFS state, have been rewritten. `Recover()` has been promoted
to its own endpoint.
* I have added an `ID()` endpoint which provides information about the Cluster peer (ID, Addresses) and about the IPFS daemon it's connected to.
The `Peers()` endpoint retrieves this information from all Peers so it is easy to have a general overview of the Cluster.
* The IPFS proxy is now intercepting `pin add`, `pin rm` and `pin ls` and doing Cluster pinning operations instead. This not only allows
replacing an IPFS daemon by a Cluster peer, but also enables composing cluster peers with other clusters (pointing `ipfs_node_multiaddress` to a different Cluster proxy endpoint).
The changes above include a large number of API renamings, re-writings and re-organization of the code, but ipfs-cluster has grown more solid as a result.
Next week, the work will focus on making it easy to [add and remove peers from a running cluster](https://github.com/ipfs/ipfs-cluster/milestone/2).
## 20170123 | @hsanjuan
I have just merged the initial cluster version into master. There are many rough edges to address, and significant changes to namings/APIs will happen during the next few days and weeks.
The rest of the quarter will be focused on 4 main issues:
* Simplify the process of adding and removing cluster peers
* Implement a replication-factor-based pinning strategy
* Generate real end to end tests
* Make ipfs-cluster stable
These endeavours will be reflected in the [ROADMAP](ROADMAP.md) file.

CONTRIBUTING.md (new file, 3 lines)

@@ -0,0 +1,3 @@
# Guidelines for contributing
Please see https://cluster.ipfs.io/developer/contribute .

README.md (157 changes)

@@ -1,4 +1,4 @@
-# ipfs-cluster
+# IPFS Cluster
 [![Made by](https://img.shields.io/badge/made%20by-Protocol%20Labs-blue.svg?style=flat-square)](https://protocol.ai)
@@ -10,162 +10,53 @@
 [![Build Status](https://travis-ci.org/ipfs/ipfs-cluster.svg?branch=master)](https://travis-ci.org/ipfs/ipfs-cluster)
 [![Coverage Status](https://coveralls.io/repos/github/ipfs/ipfs-cluster/badge.svg?branch=master)](https://coveralls.io/github/ipfs/ipfs-cluster?branch=master)
-> Collective pinning and composition for IPFS.
-**THIS SOFTWARE IS ALPHA**
-`ipfs-cluster` allows to replicate content (by pinning) in multiple IPFS nodes:
-* Works on top of the IPFS daemon by running one cluster peer per IPFS node (`ipfs-cluster-service`)
-* A `replication_factor` controls how many times a CID is pinned in the cluster
-* Re-pins stuff in a different place when a peer goes down
-* Provides an HTTP API and a command-line wrapper (`ipfs-cluster-ctl`)
-* Provides an IPFS daemon API Proxy which intercepts any "pin"/"unpin" requests and does cluster pinning instead
-* The IPFS Proxy allows to build cluster composition, with a cluster peer acting as an IPFS daemon for another higher-level cluster.
-* Peers share the state using Raft-based consensus. Uses the LibP2P stack (`go-libp2p-raft`, `go-libp2p-rpc`...)
+> Pinset orchestration for IPFS.
+<p align="center">
+<img src="https://cluster.ipfs.io/cluster/png/IPFS_Cluster_color_no_text.png" alt="logo" width="300" height="300" />
+</p>
+IPFS Cluster allows to allocate, replicate and track Pins across a cluster of IPFS daemons.
+It provides:
+* A cluster peer application: `ipfs-cluster-service`, to be run along with `go-ipfs`.
+* A client CLI application: `ipfs-cluster-ctl`, which allows easily interacting with the peer's HTTP API.
 ## Table of Contents
-- [Maintainers and Roadmap](#maintainers-and-roadmap)
+- [Documentation](#documentation)
+- [News & Roadmap](#news--roadmap)
 - [Install](#install)
-  - [Pre-compiled binaries](#pre-compiled-binaries)
-  - [Docker](#docker)
-  - [Install from sources](#install-from-sources)
 - [Usage](#usage)
-  - [`Quickstart`](#quickstart)
-  - [`Go`](#go)
-  - [`Additional docs`](#additional-docs)
-- [API](#api)
-- [Architecture](#api)
 - [Contribute](#contribute)
 - [License](#license)
-## Maintainers and Roadmap
-This project is captained by [@hsanjuan](https://github.com/hsanjuan). See the [captain's log](CAPTAIN.LOG.md) for a written summary of current status and upcoming features. You can also check out the project's [Roadmap](ROADMAP.md) for a high level overview of what's coming and the project's [Waffle Board](https://waffle.io/ipfs/ipfs-cluster) to see what issues are being worked on at the moment.
+## Documentation
+Please visit https://cluster.ipfs.io/documentation/ to access user documentation, guides and any other resources, including detailed **download** and **usage** instructions.
+## News & Roadmap
+We regularly post project updates to https://cluster.ipfs.io/news/ .
+The most up-to-date *Roadmap* is available at https://cluster.ipfs.io/roadmap/ .
 ## Install
-### Pre-compiled binaries
-You can download pre-compiled binaries for your platform from the [dist.ipfs.io](https://dist.ipfs.io) website:
-* [Builds for `ipfs-cluster-service`](https://dist.ipfs.io/#ipfs-cluster-service)
-* [Builds for `ipfs-cluster-ctl`](https://dist.ipfs.io/#ipfs-cluster-ctl)
-Note that since IPFS Cluster is evolving fast, these builds may not contain the latest features/bugfixes. Builds are updated monthly on a best-effort basis.
-### Docker
-You can build or download an automated build of the ipfs-cluster docker container. This container runs `ipfs-cluster-service` and includes `ipfs-cluster-ctl`. To launch the latest published version on Docker run:
-`$ docker run ipfs/ipfs-cluster`
-To build the container manually you can:
-`$ docker build . -t ipfs-cluster`
-You can mount your local ipfs-cluster configuration and data folder by passing `-v /data/ipfs-cluster your-local-ipfs-cluster-folder` to Docker. Otherwise, a new configuration will be generated. In that case, you can point it to the right IPFS location by setting `IPFS_API` like `--env IPFS_API="/ip4/1.2.3.4/tcp/5001"`.
-### Install from the snap store
-In any of the [supported Linux distros](https://snapcraft.io/docs/core/install):
-```bash
-sudo snap install ipfs-cluster --edge
-```
-(Note that this is an experimental and unstable release, at the moment)
-### Install from sources
-Installing from `master` is the best way to have the latest features and bugfixes. In order to install the `ipfs-cluster-service` the `ipfs-cluster-ctl` tools you will need `Go1.9+` installed in your system and the run the following commands:
-```
-$ go get -u -d github.com/ipfs/ipfs-cluster
-$ cd $GOPATH/src/github.com/ipfs/ipfs-cluster
-$ make install
-```
-This will install `ipfs-cluster-service` and `ipfs-cluster-ctl` in your `$GOPATH/bin` folder. See the usage below.
+Instructions for different installation methods (including from source) are available at https://cluster.ipfs.io/documentation/download .
 ## Usage
-![ipfs-cluster-usage](https://ipfs.io/ipfs/QmVMKD39fYJG9QGyyFkGN3QuZRg3EfuuxqkG1scCo9ZUHp/cluster-mgmt.gif)
-### Quickstart
-** Remember: Start your ipfs daemon before running ipfs-cluster **
-**`ipfs-cluster-service`** runs an ipfs-cluster peer:
-- Initialize with `ipfs-cluster-service init`
-- This will randomly generate a secret which should be shared among all peers.
-- Run with `ipfs-cluster-service`. Check `--help` for options
-For more information about `ipfs-cluster-service` see the [`ipfs-cluster-service` README](ipfs-cluster-service/dist/README.md). Also, read [A guide to running IPFS Cluster](docs/ipfs-cluster-guide.md) for full a full overview of how cluster works.
-**`ipfs-cluster-ctl`** is used to interface with the ipfs-cluster peer:
-```
-ipfs-cluster-ctl id # see peer information
-ipfs-cluster-ctl pin add <cid> # Pin a CID in ipfs-cluster
-ipfs-cluster-ctl pin rm <cid> # Upin a CID
-ipfs-cluster-ctl ls # See current pins and allocations
-ipfs-cluster-ctl status <cid> # See information from every allocation for a CID.
-```
-For information on how to manage and perform operations on an IPFS Cluster peer see the [`ipfs-cluster-ctl` README](ipfs-cluster-ctl/dist/README.md).
-### Go
-IPFS Cluster nodes can be launched directly from Go. The `Cluster` object provides methods to interact with the cluster and perform actions.
-Documentation and examples on how to use IPFS Cluster from Go can be found in [godoc.org/github.com/ipfs/ipfs-cluster](https://godoc.org/github.com/ipfs/ipfs-cluster).
-### Additional docs
-You can find more information and detailed guides:
-* [A guide to running IPFS Cluster](docs/ipfs-cluster-guide.md)
-* [Building and updating an IPFS Cluster](docs/HOWTO_build_and_update_a_cluster.md)
-Note: please contribute to improve and add more documentation!
-## API
-TODO: Swagger
-This is a quick summary of API endpoints offered by the Rest API component (these may change before 1.0):
-|Method|Endpoint            |Comment|
-|------|--------------------|-------|
-|GET   |/id                 |Cluster peer information|
-|GET   |/version            |Cluster version|
-|GET   |/peers              |Cluster peers|
-|POST  |/peers              |Add new peer|
-|DELETE|/peers/{peerID}     |Remove a peer|
-|GET   |/allocations        |List of pins and their allocations (consensus-shared state)|
-|GET   |/allocations/{cid}  |Show a single pin and its allocations (from the consensus-shared state)|
-|GET   |/pins               |Status of all tracked CIDs|
-|POST  |/pins/sync          |Sync all|
-|GET   |/pins/{cid}         |Status of single CID|
-|POST  |/pins/{cid}         |Pin CID|
-|DELETE|/pins/{cid}         |Unpin CID|
-|POST  |/pins/{cid}/sync    |Sync CID|
-|POST  |/pins/{cid}/recover |Recover CID|
-## Architecture
-The best place to get an overview of how cluster works, what components exist etc. is the [architecture.md](architecture.md) doc.
 ## Contribute
-PRs accepted.
+PRs accepted. As part of the IPFS project, we have some [contribution guidelines](https://cluster.ipfs.io/developer/contribute).
 Small note: If editing the README, please conform to the [standard-readme](https://github.com/RichardLitt/standard-readme) specification.


ROADMAP.md (deleted)
@@ -1,58 +0,0 @@
# IPFS Cluster - Roadmap
## Q4 2017
The Q4 roadmap has been crystallized in the [ipfs-cluster OKRs](https://docs.google.com/spreadsheets/d/1rLxvRfdYohv-dhzVAo1xfeJOUa7uGfq6N_yyKw0iRw0/edit?usp=sharing). They can be summarized as:
* Making cluster production grade
* Run a live deployment of ipfs-cluster
* Integrate it with pin-bot
* Run kubernetes tests in a real kubernetes cluster
* Start looking into big datasets
* Design a strategy to handle "big files"
* Look into ipld/ipfs-pack
* Support users
* Give prompt feedback
* Fix all bugs
* Improve documentation
* Make releases
## Q3 2017
Since Q1 ipfs-cluster has made some progress, with a strong effort in reaching a minimal feature set that allows using it in production settings, along with lots of focus on testing. The summary of what has been achieved can be seen in the [Captain's Log](CAPTAIN.LOG.md).
For Q3, this is the tentative roadmap of things ipfs-cluster will focus on:
* Making all the ipfs-kubernetes tests fit
* Working on low-hanging fruit: easy features with significant impact
* Adopting ipfs-cluster as a replacement for the ipfs pin-bot (or deploying a production-maintained ipfs-cluster)
* Outlining a Q4 roadmap where development efforts come back in full to ipfs-cluster
## Q1 2017
This quarter is going to be focused on bringing ipfs-cluster to life as a usable product in the IPFS ecosystem. That means:
* It should be hard to crash
* It shouldn't lose track of content
* It should play well with go-ipfs
* It should support a replication-factor
* It should be very well tested
* It should be very easy to setup and manage
* It should have stable APIs
Along these lines, there are several endeavours which stand out on their own and are officially part of the general IPFS Roadmaps:
* Dynamically add and remove cluster peers in an easy fashion (https://github.com/ipfs/pm/issues/353)
This involves easily adding a peer (or removing) from a running cluster. `ipfs-cluster-service peer add <maddress>` should work and should update the peer set of all components of all peers, along with their configurations.
* Replication-factor-based pinning strategy (https://github.com/ipfs/pm/issues/353)
This involves being able to pin an item in, say, 2 nodes only. Cluster should re-pin whenever an item is found to be underpinned, which means that monitoring of pinsets must exist and be automated.
* Tests (https://github.com/ipfs/pm/issues/360)
In the context of the Interplanetary Test Lab, there should be end-to-end tests in which cluster is tested and benchmarked along with IPFS.


architecture.md (deleted)
@@ -1,144 +0,0 @@
# IPFS Cluster architecture
## Definitions
These definitions are still evolving and may change:
* Peer: an ipfs cluster service node which is a member of the consensus. Alternatively: node, member.
* ipfs-cluster: the IPFS cluster software
* ipfs-cluster-ctl: the IPFS cluster client command line application
* ipfs-cluster-service: the IPFS cluster node application
* API: the REST-ish API implemented by the RESTAPI component and used by the clients.
* RPC API: the internal API that cluster peers and components use.
* Go API: the public interface offered by the Cluster object in Go.
* Component: an ipfs-cluster module which performs specific functions and uses the RPC API to communicate with other parts of the system. Implements the Component interface.
* Consensus: The Consensus component, specifically Raft.
## General overview
### Modularity
`ipfs-cluster` architecture attempts to be as modular as possible, with interfaces between its modules (components) clearly defined, in a way that they can:
* Be swapped for alternative or improved implementations separately, without affecting the rest of the system
* Be easily tested in isolation
### Components
`ipfs-cluster` consists of:
* The definitions of components and their interfaces and related types (`ipfscluster.go`)
* The **Cluster** main-component which binds together the whole system and offers the Go API (`cluster.go`). This component takes an arbitrary:
* **API**: a component which offers a public facing API. Default: `RESTAPI`
* **IPFSConnector**: a component which talks to the IPFS daemon and provides a proxy to it. Default: `ipfshttp`
* **State**: a component which keeps a list of Pins (maintained by the Consensus component). Default: `mapstate`
* **PinTracker**: a component which tracks the pin set, makes sure that it is persisted by IPFS daemon as intended. Default: `maptracker`
* **PeerMonitor**: a component to log metrics and detect peer failures. Default: `basic`
* **PinAllocator**: a component to decide which peers should pin a CID given some metrics. Default: `descendalloc`
* **Informer**: a component to collect metrics which are used by the `PinAllocator` and the `PeerMonitor`. Default: `disk`
* The **Consensus** component. This component is separate but internal to Cluster in the sense that it cannot be provided arbitrarily during initialization. The consensus component uses `go-libp2p-raft` via `go-libp2p-consensus`. While we attempt to stay agnostic of the underlying consensus implementation, this is not possible everywhere. Those places are however well marked (everything that calls `Leader()`).
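The composition just listed can be pictured with a short Go sketch; the interface and struct below are simplified assumptions for illustration, not the exact types from the codebase:
```go
// Simplified sketch of the component wiring described above.
package sketch

import rpc "github.com/libp2p/go-libp2p-gorpc"

// Component is anything pluggable into Cluster: it receives an RPC client to
// talk to the rest of the system and knows how to shut itself down.
type Component interface {
	SetClient(c *rpc.Client)
	Shutdown() error
}

// Cluster binds the pluggable pieces together (only some of them shown here).
type Cluster struct {
	api      Component // e.g. the RESTAPI component
	ipfs     Component // e.g. the ipfshttp IPFSConnector
	tracker  Component // e.g. the maptracker PinTracker
	monitor  Component // e.g. the basic PeerMonitor
	informer Component // e.g. the disk Informer
}
```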
Components perform a number of functions and need to be able to communicate with each other, i.e.:
* the API needs to use functionality provided by the main component
* the PinTracker needs to use functionality provided by the IPFSConnector
* the main component needs to use functionality provided by the main component of different peers
### RPC API
Communication between components happens through the RPC API: a set of functions which establishes which functions are available to components (`rpc_api.go`).
The RPC API uses `go-libp2p-gorpc`. The main Cluster component runs an RPC server. RPC Clients are provided to all components for their use. The main feature of this setup is that **Components can use `go-libp2p-gorpc` to perform operations in the local cluster and in any remote cluster node using the same API**.
This makes broadcasting operations and contacting the Cluster leader really easy. It also allows thinking of a future where components may be completely arbitrary and run from different applications. Local RPC calls, on their side, do not suffer any penalty, as execution is short-cut directly to the corresponding component of the Cluster, without network intervention.
On the down-side, the RPC API involves "reflect" magic and it is not easy to verify that a call happens to a method registered on the RPC server. Every RPC-based functionality should be tested. Bad operations will result in errors, so they are easy to catch in tests.
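For a feel of what this looks like in practice, a component holding a `go-libp2p-gorpc` client can invoke a method on any peer (or on itself) roughly as follows; the service and method names here are made up for the example:
```go
package sketch

import (
	peer "github.com/libp2p/go-libp2p-core/peer"
	rpc "github.com/libp2p/go-libp2p-gorpc"
)

// trackOnPeer asks the given peer to start tracking a CID. Passing our own
// peer ID short-cuts to the local component without touching the network.
func trackOnPeer(c *rpc.Client, dest peer.ID, cid string) error {
	var reply struct{}
	return c.Call(dest, "Cluster", "Track", cid, &reply)
}
```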
### Code layout
Components are organized in different submodules (i.e. `pintracker/maptracker` represents component `PinTracker` and implementation `MapPinTracker`). Interfaces for all components are in the base module. Executables (`ipfs-cluster-service` and `ipfs-cluster-ctl`) are also submodules of the base module.
### Configuration
A `config` module provides support for a central configuration file with sections defined by each component, via configuration objects which implement a `ComponentConfig` interface.
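Conceptually, the resulting `service.json` is a JSON object with one section per component; the skeleton below only illustrates that layout, and the actual key names may differ:
```json
{
  "cluster":        { "id": "...", "secret": "...", "...": "main Cluster options" },
  "consensus":      { "raft": { "...": "..." } },
  "api":            { "restapi": { "...": "..." } },
  "ipfs_connector": { "ipfshttp": { "...": "..." } },
  "pin_tracker":    { "maptracker": { "...": "..." } },
  "monitor":        { "basic": { "...": "..." } },
  "informer":       { "disk": { "...": "..." } }
}
```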
## Applications
### `ipfs-cluster-service`
This is the service application of IPFS Cluster. It brings up a cluster, connects to other peers, gets the latest consensus state and participates in the cluster. It handles clean shutdowns when receiving interrupts.
### `ipfs-cluster-ctl`
This is the client/control application of IPFS Cluster. It is a command line interface which uses the REST API to communicate with Cluster.
## How does it work?
This section gives an overview of how some things work in Cluster. Doubts? Something missing? Open an issue and ask!
### Startup
* Initialize the P2P host.
* Initialize Consensus: start looking for a leader asap.
* Setup RPC in all components: allow them to communicate with different parts of the cluster.
* Bootstrap: if we are bootstrapping from another node, do the dance (contact, receive cluster peers, join consensus)
* Cluster is up, but it is only `Ready()` when consensus is ready (which in this case means it has found a leader).
* All components are doing their thing.
Consensus startup deserves a note:
* Raft is setup and wrapped with `go-libp2p-consensus`.
* Waits until there is a leader
* Waits until the state is fully synced (compare raft last log index with current)
* Triggers a `SyncState` to the `PinTracker`, which means that we have recovered the full state of the system and the pin tracker should
keep tabs on the Pins in that state (we don't wait for completion of this operation).
* Consensus is then ready.
If the above procedures don't work, Cluster might just shut itself down (i.e. if a leader is not found after a while, we automatically shut down).
### Pinning
* If it's an API request, it involves an RPC request to the cluster main component.
* `Cluster.Pin()`
* We find some peers to pin (`Allocate`) using the metrics from the PeerMonitor and the `PinAllocator`.
* We have a CID and some allocations: time to make a log entry in the consensus log.
* The consensus component will forward this to the Raft leader, as only the leader can make log entries.
* The Raft leader makes a log entry with a `LogOp` that says "pin this CID with these allocations".
* Every peer receives the operation. Per `ConsensusOpLog`, the `Apply(state)` method is triggered. Every peer modifies the `State` according to this operation. The state keeps the full map of CID/Allocations and is only updated via log operations.
* The Apply method triggers a `Track` RPC request to the `PinTracker` in each peer.
* The pin tracker now knows that it needs to track a certain CID.
* If the CID is allocated to the peer, the `PinTracker` will trigger an RPC request to the `IPFSConnector` asking it to pin the CID.
* If the CID is allocated to different peers, the `PinTracker` will mark it as `remote`.
* Now the content is officially pinned.
Notes:
* the log operation should never fail. It has no real places to fail. The calls to the `PinTracker` are asynchronous and a side effect of the state modification.
* the `PinTracker` also should not fail to track anything, BUT
* the `IPFSConnector` might fail to pin something. In this case pins are marked as `pin_error` in the pin tracker.
* the `MapPinTracker` (current implementation of `PinTracker`) queues pin requests so they happen one by one to IPFS. In the meantime things are in `pinning` state.
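Put together, the apply step described above can be sketched like this; the types and method names are simplified assumptions, not the real code:
```go
// Conceptual sketch of applying a pin log operation; heavily simplified.
package sketch

// LogOp is the consensus log entry saying "pin this CID with these allocations".
type LogOp struct {
	Cid         string
	Allocations []string // peer IDs chosen by the allocator
}

// State is the shared map of CID -> allocations, only mutated via log operations.
type State map[string][]string

// Apply runs on every peer once the operation is committed to the Raft log.
// It updates the shared state and asynchronously asks the local PinTracker to
// track the CID, which either pins on IPFS or marks the pin as "remote".
func (op LogOp) Apply(s State, track func(cid string)) {
	s[op.Cid] = op.Allocations
	go track(op.Cid) // side effect; the log operation itself never fails here
}
```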
### Adding a peer
* If it's via an API request, it involves an RPC request to the cluster main component.
* `Cluster.PeerAdd()`
* Figure out the real multiaddress for that peer (the one we see).
* Broadcast the address to every cluster peer and add it to the host's libp2p peerstore. This gives each member of the cluster the ability to perform RPC requests to that peer.
* Figure out our multiaddress in regard to the new peer (the one it sees to connect to us). This is done with an RPC request and it also
ensures that the peer is up and reachable.
* Add the new peer to the Consensus component. This operation gets forwarded to the leader. Internally, Raft commits a configuration change to the log which contains a new peerset. Every peer gets the new peerset, including the new peer.
* Send the list of peer multiaddresses to the new peer so its host knows how to reach them. This is a remote RPC request to the new peer's `PeerManager.ImportAddresses`
We use the Consensus' (or rather Raft's) internal peerset as the source of truth for the current cluster peers. This is just a list of peer IDs. The associated multiaddresses for those peers are broadcasted to every member. This implies that peer additions are expected to happen on a healthy cluster where all peers can learn about the new peer's multiaddresses.
Notes:
* The `Join()` (used for bootstrapping) is just a `PeerAdd()` operation triggered remotely by an RPC request from the joining peer.
## Legacy illustrations
See: https://ipfs.io/ipfs/QmWhbV7KX9toZbi4ycj6J9GVbTVvzGx5ERffc6ymYLT5HS
They need to be updated but they are mostly accurate.


@@ -1,56 +0,0 @@
# Contribute
This document gathers a few contributing guidelines for ipfs-cluster. We attempt to get to the point and invite readers eager for more details
to make themselves familiar with:
* The [go-ipfs contributing guidelines](https://github.com/ipfs/go-ipfs/blob/master/contribute.md), which build upon:
* The [IPFS Community Code of Conduct](https://github.com/ipfs/community/blob/master/code-of-conduct.md)
* The [IPFS community contributing notes](https://github.com/ipfs/community/blob/master/contributing.md)
* The [Go contribution guidelines](https://github.com/ipfs/community/blob/master/go-contribution-guidelines.md)
## General Guidelines
To check what's going on in the project, check:
- the [changelog](CHANGELOG.md)
- the [Captain's Log](CAPTAIN.LOG.md)
- the [Waffle board](https://waffle.io/ipfs/ipfs-cluster)
- the [Roadmap](ROADMAP.md)
- the [upcoming release issues](https://github.com/ipfs/ipfs-cluster/issues?q=label%3Arelease)
If you need help:
- open an issue
- ask on the `#ipfs-cluster` IRC channel (Freenode)
## Code contribution guidelines
* ipfs-cluster uses the MIT license.
* All contributions are via Pull Request, which needs a Code Review approval from one of the project collaborators.
* Tests must pass
* Code coverage must be stable or increase
* We prefer meaningful branch names: `feat/`, `fix/`...
* We prefer commit messages which reference an issue `fix #999: ...`
* The commit message should end with the following trailers:
```
License: MIT
Signed-off-by: User Name <email@address>
```
where "User Name" is the author's real (legal) name and
email@address is one of the author's valid email addresses.
These trailers mean that the author agrees with the
[developer certificate of origin](docs/developer-certificate-of-origin)
and with licensing the work under the [MIT license](docs/LICENSE).
To help you automatically add these trailers, you can run the
[setup_commit_msg_hook.sh](https://raw.githubusercontent.com/ipfs/community/master/dev/hooks/setup_commit_msg_hook.sh)
script which will set up a Git commit-msg hook that will add the above
trailers to all the commit messages you write.
These are just guidelines. We are friendly people and are happy to help :)


docs/HOWTO_build_and_update_a_cluster.md (deleted)
@@ -1,117 +0,0 @@
# Building and updating an IPFS Cluster
### Step 0: Run your first cluster node
This step creates a single-node IPFS Cluster.
First, create a secret that we will use for all cluster peers:
```
node0 $ export CLUSTER_SECRET=$(od -vN 32 -An -tx1 /dev/urandom | tr -d ' \n')
node0 $ echo $CLUSTER_SECRET
9a420ec947512b8836d8eb46e1c56fdb746ab8a78015b9821e6b46b38344038f
```
Second, initialize the configuration (see [the **Initialization** section of the
`ipfs-cluster-service` README](../ipfs-cluster-service/dist/README.md#initialization) for more details).
```
node0 $ ipfs-cluster-service init
INFO config: Saving configuration config.go:311
ipfs-cluster-service configuration written to /home/hector/.ipfs-cluster/service.json
```
Then run cluster:
```
node0> ipfs-cluster-service
INFO cluster: IPFS Cluster v0.3.0 listening on: cluster.go:91
INFO cluster: /ip4/127.0.0.1/tcp/9096/ipfs/QmZjSoXUQgJ9tutP1rXjjNYwTrRM9QPhmD9GHVjbtgWxEn cluster.go:93
INFO cluster: /ip4/192.168.1.2/tcp/9096/ipfs/QmZjSoXUQgJ9tutP1rXjjNYwTrRM9QPhmD9GHVjbtgWxEn cluster.go:93
INFO consensus: starting Consensus and waiting for a leader... consensus.go:61
INFO consensus: bootstrapping raft cluster raft.go:108
INFO restapi: REST API: /ip4/127.0.0.1/tcp/9094 restapi.go:266
INFO ipfshttp: IPFS Proxy: /ip4/127.0.0.1/tcp/9095 -> /ip4/127.0.0.1/tcp/5001 ipfshttp.go:159
INFO consensus: Raft Leader elected: QmZjSoXUQgJ9tutP1rXjjNYwTrRM9QPhmD9GHVjbtgWxEn raft.go:261
INFO consensus: Raft state is catching up raft.go:273
INFO consensus: Consensus state is up to date consensus.go:116
INFO cluster: Cluster Peers (without including ourselves): cluster.go:450
INFO cluster: - No other peers cluster.go:452
INFO cluster: IPFS Cluster is ready cluster.go:461
```
### Step 1: Add new members to the cluster
Initialize and run cluster on different node(s), bootstrapping them to the first node:
```
node1> export CLUSTER_SECRET=<copy from node0>
node1> ipfs-cluster-service init
INFO config: Saving configuration config.go:311
ipfs-cluster-service configuration written to /home/hector/.ipfs-cluster/service.json
node1> ipfs-cluster-service --bootstrap /ip4/192.168.1.2/tcp/9096/ipfs/QmZjSoXUQgJ9tutP1rXjjNYwTrRM9QPhmD9GHVjbtgWxEn
INFO cluster: IPFS Cluster v0.3.0 listening on: cluster.go:91
INFO cluster: /ip4/127.0.0.1/tcp/10096/ipfs/QmYFYwnFUkjFhJcSJJGN72wwedZnpQQ4aNpAtPZt8g5fCd cluster.go:93
INFO cluster: /ip4/192.168.1.3/tcp/10096/ipfs/QmYFYwnFUkjFhJcSJJGN72wwedZnpQQ4aNpAtPZt8g5fCd cluster.go:93
INFO consensus: starting Consensus and waiting for a leader... consensus.go:61
INFO consensus: bootstrapping raft cluster raft.go:108
INFO cluster: Bootstrapping to /ip4/127.0.0.1/tcp/9096/ipfs/QmZjSoXUQgJ9tutP1rXjjNYwTrRM9QPhmD9GHVjbtgWxEn cluster.go:471
INFO restapi: REST API: /ip4/127.0.0.1/tcp/10094 restapi.go:266
INFO ipfshttp: IPFS Proxy: /ip4/127.0.0.1/tcp/10095 -> /ip4/127.0.0.1/tcp/5001 ipfshttp.go:159
INFO consensus: Raft Leader elected: QmZjSoXUQgJ9tutP1rXjjNYwTrRM9QPhmD9GHVjbtgWxEn raft.go:261
INFO consensus: Raft state is catching up raft.go:273
INFO consensus: Consensus state is up to date consensus.go:116
INFO consensus: Raft Leader elected: QmZjSoXUQgJ9tutP1rXjjNYwTrRM9QPhmD9GHVjbtgWxEn raft.go:261
INFO consensus: Raft state is catching up raft.go:273
INFO cluster: QmYFYwnFUkjFhJcSJJGN72wwedZnpQQ4aNpAtPZt8g5fCd: joined QmZjSoXUQgJ9tutP1rXjjNYwTrRM9QPhmD9GHVjbtgWxEn's cluster cluster.go:777
INFO cluster: Cluster Peers (without including ourselves): cluster.go:450
INFO cluster: - QmZjSoXUQgJ9tutP1rXjjNYwTrRM9QPhmD9GHVjbtgWxEn cluster.go:456
INFO cluster: IPFS Cluster is ready cluster.go:461
INFO config: Saving configuration config.go:311
```
You can repeat the process with any other nodes.
You can check the current list of cluster peers and see it shows 2 peers:
```
node1 > ipfs-cluster-ctl peers ls
QmYFYwnFUkjFhJcSJJGN72wwedZnpQQ4aNpAtPZt8g5fCd | Sees 1 other peers
> Addresses:
- /ip4/127.0.0.1/tcp/10096/ipfs/QmYFYwnFUkjFhJcSJJGN72wwedZnpQQ4aNpAtPZt8g5fCd
- /ip4/192.168.1.3/tcp/10096/ipfs/QmYFYwnFUkjFhJcSJJGN72wwedZnpQQ4aNpAtPZt8g5fCd
> IPFS: Qmaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
- /ip4/127.0.0.1/tcp/4001/ipfs/Qmaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
- /ip4/192.168.1.3/tcp/4001/ipfs/Qmaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
QmZjSoXUQgJ9tutP1rXjjNYwTrRM9QPhmD9GHVjbtgWxEn | Sees 1 other peers
> Addresses:
- /ip4/127.0.0.1/tcp/9096/ipfs/QmZjSoXUQgJ9tutP1rXjjNYwTrRM9QPhmD9GHVjbtgWxEn
- /ip4/192.168.1.2/tcp/9096/ipfs/QmZjSoXUQgJ9tutP1rXjjNYwTrRM9QPhmD9GHVjbtgWxEn
> IPFS: Qmbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbb
- /ip4/127.0.0.1/tcp/4001/ipfs/Qmbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbb
- /ip4/192.168.1.2/tcp/4001/ipfs/Qmbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbb
```
### Step 2: Remove no longer needed nodes
You can use `ipfs-cluster-ctl peers rm <peer_id>` to remove and disconnect any nodes from your cluster. The nodes will be automatically
shut down. They can be restarted manually and re-added to the Cluster any time:
```
node0> ipfs-cluster-ctl peers rm QmYFYwnFUkjFhJcSJJGN72wwedZnpQQ4aNpAtPZt8g5fCd
```
`node1` is then disconnected and shuts down, as its logs show:
```
INFO config: Saving configuration config.go:311
INFO cluster: this peer has been removed and will shutdown cluster.go:386
INFO cluster: shutting down Cluster cluster.go:498
INFO consensus: stopping Consensus component consensus.go:159
INFO consensus: consensus data cleaned consensus.go:400
INFO monitor: stopping Monitor peer_monitor.go:161
INFO restapi: stopping Cluster API restapi.go:284
INFO ipfshttp: stopping IPFS Proxy ipfshttp.go:534
INFO pintracker: stopping MapPinTracker maptracker.go:116
INFO config: Saving configuration config.go:311
```


docs/dag-sharding-rfc.md (deleted)
@@ -1,182 +0,0 @@
# Introduction
This document is a Request For Comments outlining a proposal for supporting sharded DAGs in ipfs-cluster. It outlines motivations for sharding DAGs across the nodes of an ipfs-cluster and proposes a development path to achieve the motivated goals.
# Motivation
There are two primary motivations for adding data sharding to ipfs-cluster. See the WIP use cases documents in PR #215, issue #212, and posts on the discuss.ipfs forum (ex: https://discuss.ipfs.io/t/ubuntu-archive-on-top-of-ipfs/1579 and https://discuss.ipfs.io/t/segmentation-file-in-ipfs-cluster/1465/2 and https://discuss.ipfs.io/t/when-file-upload-to-node-that-can-ipfs-segmentation-file-to-many-nodes/1451) for some more context.
## Motivation 1: store files too big for a single node
This is one of the most requested features for ipfs-cluster. ipfs-pack (https://github.com/ipfs/ipfs-pack) exists to fill a similar need by allowing data stored in a POSIX filesystem to be referenced from an ipfs node's datastore without actually copying all of the data into the node's repo. However certain use cases require functionality beyond ipfs-pack. For example, what if a cluster of nodes needs to download a large file bigger than any of its nodes' individual machines' disk size? In this case using ipfs-pack is not enough; such massive files would not fit on a single node's storage device and coordination between nodes to store the file becomes essential.
Storing large files is a feature that would serve many use cases and the more general ability to store large ipld DAGs of arbitrary structure has the potential to support others. As an example consider splitting a massive cryptocurrency ledger or git tree among the nodes of an ipfs cluster.
## Motivation 2: store files with different replication strategies for fault tolerance
This feature is also frequently requested, including in the original thread leading to the creation of ipfs-cluster (see issue #1), and in @raptortech-js's thorough discussion in issue #9. The idea is to use sharding to incorporate space-efficient replication techniques to provide better fault tolerance of storage. This would open up ipfs-cluster to new usage patterns, especially storing archives with infrequent updates and long storage periods.
Again, while file storage will benefit from efficient, fault-tolerant encodings, these properties are potentially also quite useful for storing arbitrary merkle DAGs.
# Background
To add a file ipfs must "import" it. ipfs chunks the file's raw bytes and builds a merkle DAG out of these chunks to organize the data for storage in the ipfs repo. The format of the merkle DAG nodes, the way the chunks are produced and the layout of the DAG depend on configurable strategies. Usually ipfs represents the data added as a tree which mimics a unix-style hierarchy so that the data can be directly translated into a unix-like filesystem representation.
Regardless of the chunking or layout strategy, importing a file can be viewed abstractly as a process that takes in a stream of data and outputs a stream of DAG nodes, or more precisely "blocks", which is the term for the representation of DAG nodes stored on disk. As blocks represent DAG nodes they contain links to other blocks. Together these blocks and their links determine the structure of the DAG built by the importing process. Furthermore these blocks are content-addressed by ipfs, which resolves blocks by these addresses upon user request. Although the current importing process adds all blocks corresponding to a file to a single ipfs node, this is not important for preserving the DAG structure. ipfs nodes advertise the address of every block added to their storage repo. If a DAG's blocks exist across multiple ipfs peers the individual blocks can readily be discovered by any peer and the information the blocks carry can be put back together. This location flexibility makes partitioning a file's data at the block level an attractive mechanism for implementing sharding in ipfs-cluster.
Therefore we propose that sharding in ipfs-cluster amounts to allocating a stream of blocks among different ipfs nodes. Cluster should aim to do so:
1. Efficiently: best utilizing the resources available in a cluster as provided by the underlying ipfs nodes
2. Lightly: the cluster state should not carry more information than relevant to cluster, e.g. no more than is relevant for coordinating allocation of collections of blocks
3. Cleverly: the allocation of blocks might benefit from the information provided by the DAG layout or ipld-format of the DAG nodes. For example cluster should eventually support grouping together logical subgraphs of the DAG.
On a first approach we aim for a simple layout and format-agnostic strategy which simply groups blocks output from the importing process into a size-based unit, from here on a "shard", that fits within a single ipfs node's storage repo.
## Implementation
### Overview
We propose that cluster organizes sharding by pinning a management "Cluster DAG" that defines links to groups of sharded data. We also propose a sharding pipeline for ingesting data, constructing a DAG, and sharding it across cluster nodes. To allow for ingestion of datasets too large for any one ipfs node's repo we propose a pipeline that acts on a stream of data, forwarding blocks across the cluster while reading from the input stream to avoid overloading local resources. First the data passes through the HTTP API of the adding cluster peer. Here it is dispatched through an importing process to transform raw data into the blocks of a dag. Finally these blocks are used to create the Cluster DAG and the data of both graphs is distributed among the nodes of the cluster. Note that our goal of supporting large files implies that we cannot simply hand off data to the local ipfs daemon for importing; in general the data will not fit in the repo and AFAIK streaming blocks out of an ipfs daemon endpoint during importing is not something ipfs readily supports. The Cluster DAG's structure and the ingestion pipeline steps are explained in further detail in the following sections.
### Cluster DAG
We propose to pin two dags in ipfs-cluster for every sharded file. The first is the ipfs unixfs dag encoded in the blocks output by the importing process. Additionally, ipfs-cluster builds the Cluster DAG, which is specifically built to track and pin the blocks of the file dag in a way that groups data into shards. Each shard is pinned by (at least) one cluster peer. The graph of shards is organized in a 3-level tree. The first level is the cluster-dag root, the second represents the shards into which data is divided, with at least one shard per cluster peer. The third level lists all the cids of blocks of the file dag included in a shard:
```
The root would be like:
{
"shards" : [
{"/": <shard1>},
{"/": <shard2>},
...
]
}
where each shard looks like:
{
"blocks" : [
{"/": <block1>},
{"/": <block2>},
...
]
}
```
Each cluster node recursively pins a shard-node of the cluster-dag, ensuring that the blocks of file data referenced underneath the shard are pinned by that node. The index of shards, i.e. the root, can be (non-recursively) pinned on any one/all cluster nodes.
With this implementation the cluster can provide the entire original ipfs file dag on request. For example if an ipfs peer queries for the entire file they first resolve the root node of the dag which must be pinned on some shard. If the root has child links then ipfs peers in the cluster are guaranteed to resolve them with content, as all blocks are pinned somewhere in the cluster as children of a shard-node in the cluster-dag. Note that this implementation conforms well to the goal of coordinating "lightly" above because the cluster state need only keep track of 1 cid per shard. This prevents the state size from growing too large or more complex than necessary for the purpose of allocation.
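To make the structure concrete, here is a minimal Go sketch of how the two node kinds could be represented in memory before being encoded into an ipld format. The package, struct and field names are illustrative assumptions, not a final on-disk format:

```
// Illustrative sketch only: possible in-memory shapes for the cluster-dag
// nodes described above, before they are encoded into an ipld format.
package clusterdag

import cid "github.com/ipfs/go-cid"

// ShardNode groups the cids of the file-dag blocks assigned to one shard.
// Recursively pinning a ShardNode pins every referenced block.
type ShardNode struct {
	Blocks []cid.Cid // links to the blocks of the imported file dag
}

// RootNode indexes all shards of a single sharded file.
// It is pinned non-recursively, so it does not drag in the file data itself.
type RootNode struct {
	Shards []cid.Cid // links to the ShardNodes
}
```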
The state format would change slightly to account for linking together the root cid of an ipfs file dag and the cluster-dag pinning its leaves:
```
"<cid>": {
"name": <pinname>,
"cid": <cid>,
"clusterdag": <cidcluster>,
"allocations" : []
}
"<shard1-cid>": {
"name": "<pinname>-shard1",
"cid": <shard1-cid>,
"clusterdag": nil,
"allocations": [cluster peers]
}
....
```
Under this implementation replication of shards proceeds like replication of an ordinary pinned cid. As a consequence shards can be pinned with a replication factor.
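For illustration, a rough Go sketch of what such a state entry could look like follows; the field names simply mirror the JSON above and are assumptions, not the final state schema:

```
// Rough sketch of a shared-state entry carrying the extra cluster-dag
// reference. Field names mirror the JSON above and are assumptions.
package state

import cid "github.com/ipfs/go-cid"

// Pin is one entry of the shared state.
type Pin struct {
	Name        string
	Cid         cid.Cid
	ClusterDAG  cid.Cid  // root of the cluster-dag for sharded content; unset (cid.Undef) otherwise
	Allocations []string // peer IDs; empty for the file-dag root entry, set for shard entries
}
```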
### User APIs
#### Importing files for sharding
Data sharding must begin with a user facing endpoint. Ideally cluster could stream files, or other data to be imported, across its HTTP API. This way `ipfs-cluster-ctl` could launch the command for adding and sharding data in the usual way, and users also have the option of hitting the HTTP endpoint over the network. Multiple files are typically packaged in an HTTP request as the parts of a `Content-Type: multipart/form-data` request, and HTTP achieves streaming with `Transfer-Encoding: chunked`. Because golang's net/http and mime/multipart [support chunked (a.k.a. streamed) transfer of multipart messages](https://gist.github.com/ZenGround0/49e4a1aa126736f966a1dfdcb84abdae), building the user endpoint into the HTTP API should be relatively straightforward.
The golang multipart reader exposes a reader for each part of the body in succession. When a part's body has been read completely, a reader for the next part's body can be obtained from the multipart reader. The part metadata (e.g. this part is a directory, that part is a file) and the data read from the part must then be passed to the importer stage of the pipeline. Our proposal is to call an external library, with a stream of data and metadata as input, from within the HTTP API to move to this next stage.
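A minimal sketch of this streaming endpoint using only Go's standard library follows. The handler and the `importPart` helper are hypothetical names; the point is that each part's body is an `io.Reader` that can be consumed without buffering the whole upload in memory:

```
// Sketch of streaming multipart handling with Go's standard library.
// addHandler and importPart are hypothetical names.
package api

import (
	"io"
	"net/http"
)

// importPart stands in for handing a part's metadata and data stream
// to the importer stage of the pipeline.
func importPart(name string, r io.Reader) error {
	_, err := io.Copy(io.Discard, r) // stand-in for real importing work
	return err
}

func addHandler(w http.ResponseWriter, r *http.Request) {
	mr, err := r.MultipartReader() // streams parts as they arrive
	if err != nil {
		http.Error(w, err.Error(), http.StatusBadRequest)
		return
	}
	for {
		part, err := mr.NextPart() // next part's body becomes readable
		if err == io.EOF {
			break
		}
		if err != nil {
			http.Error(w, err.Error(), http.StatusBadRequest)
			return
		}
		if err := importPart(part.FileName(), part); err != nil {
			http.Error(w, err.Error(), http.StatusInternalServerError)
			return
		}
	}
}
```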
As this endpoint will be used for streaming large amounts of data, it has the potential to be difficult to use: system limits may be hit and errors may occur only after a long time has already been invested. Some questions that may come up as usability becomes a bigger concern:
* Is it possible that remote users pushing data to the endpoint could overload cluster's resources by pushing at a rate faster than processing can occur in cluster? If this is an issue how can the endpoint maintain connections in a resource-aware non-degrading way? One potential solution we are considering is to provide the HTTP API over libp2p, which is built for situations like this.
* What features can cluster provide to make the process easier? For example if cluster could report an estimate aggregate of all of the space in its nodes' repos then users could check that a huge file will fit on the cluster before importing it.
* How should we handle the case where the sharding process fails part way through leaving behind pinned blocks in nodes across the cluster? Could we have a built in mechanism for attempting cleanup after a failure is detected? Could we provide a command for launching such a cleanup?
* After implementing an HTTP API endpoint should we build out other methods of ingesting data such as pulling from a filesystem device or location address over the network?
#### Sharding existing DAGs
An endpoint should also exist for sharding an existing DAG among an ipfs-cluster. A user will run something like `ipfs-cluster-ctl pin add --shard <cid>` and pin shards of the DAG rooted at `cid` across different nodes. The importing step of the pipeline can be skipped in this case. The naive implementation is to interact with the local daemon's DAG API to retrieve a stream of blocks as input to the sharding component. While this implementation would still be quite useful for distributing shards of a DAG from a node with a large amount of repo storage across a cluster, we currently suspect that it will not work well for large DAGs that cannot fit in the node's repo. The "Future Work" subsection "Sharding large existing DAGs" discusses some possibilities for scaling this to large DAGs on nodes of arbitrary repo size.
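A sketch of the naive approach, written against local stand-in interfaces rather than the real go-ipfs or cluster APIs, could look like this:

```
// Naive approach sketched with stand-in interfaces: walk the refs of an
// existing DAG on the local daemon and feed each block to the sharder.
package sharding

import "context"

// dagReader abstracts whatever mechanism ends up providing a stream of a
// DAG's block cids and their raw contents from the local ipfs daemon.
type dagReader interface {
	Refs(ctx context.Context, root string) (<-chan string, error)
	BlockGet(ctx context.Context, c string) ([]byte, error)
}

// sharder abstracts the cluster sharding component.
type sharder interface {
	AddBlock(c string, data []byte) error
}

func shardExisting(ctx context.Context, dr dagReader, s sharder, root string) error {
	refs, err := dr.Refs(ctx, root)
	if err != nil {
		return err
	}
	for c := range refs {
		data, err := dr.BlockGet(ctx, c)
		if err != nil {
			return err
		}
		if err := s.AddBlock(c, data); err != nil {
			return err
		}
	}
	return nil
}
```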
### Importer module
As they are currently implemented in ipfs, importers run in three main phases. The importer begins by traversing a collection of data and metadata that can be interpreted as files and directories. For example the tar importer reads data from a tar archive and the file importer used by `ipfs add` iterates over a collection of file objects. Files are then broken into chunks with a chunker. These chunks are then used as the data for the dag-nodes constructed by the dag layout building process. The dag-nodes are serialized as blocks and saved to the repo.
Importing is currently a stream-friendly process. The objects traversed in the first stage are typically all wrappers around a reader of some kind, and so the data and metadata (AFAIK) can be streamed in. Existing chunkers read from their reader as needed when `NextBytes()` is called to get the next chunk. The layout process creates in-memory DAG nodes derived from the leaf nodes, themselves thin wrappers around chunk data. Nodes are flushed to the dagservice's `Batch()` process as soon as their parent links are established, and care is taken to avoid leaving references in memory so the golang garbage collector can reclaim the memory. The balanced layout builder makes a balanced tree and so the memory high-water mark is log N for a DAG of N nodes (I will have to investigate trickle's potential for streaming but my guess right now is it's good too). Furthermore the dagservice batching collects a group of nodes only up to a threshold before forwarding them to their destination in the dagservice's blockstore. Our least intrusive option is to create a "streaming blockstore" for the dagservice used in the importing process's dagbuilder helper, forwarding the imported blocks to the final stage of the pipeline.
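The streaming shape of the chunking stage can be sketched as follows. The `Splitter` interface mirrors the `NextBytes()` contract described above but is defined locally here as an assumption; the real ipfs chunker packages may differ in detail:

```
// Sketch of the streaming chunking loop. Splitter is a local stand-in that
// mirrors the NextBytes() contract described above.
package importer

import "io"

// Splitter returns successive chunks of an underlying reader.
type Splitter interface {
	NextBytes() ([]byte, error) // returns io.EOF when the input is exhausted
}

// sizeSplitter cuts the input into fixed-size chunks.
type sizeSplitter struct {
	r    io.Reader
	size int
}

func (s *sizeSplitter) NextBytes() ([]byte, error) {
	buf := make([]byte, s.size)
	n, err := io.ReadFull(s.r, buf)
	if err == io.ErrUnexpectedEOF {
		return buf[:n], nil // last, short chunk
	}
	if err != nil {
		return nil, err // includes io.EOF when nothing is left
	}
	return buf, nil
}

// consume drains a splitter, handing each chunk to the layout builder
// (represented here by a callback).
func consume(s Splitter, emit func([]byte) error) error {
	for {
		chunk, err := s.NextBytes()
		if err == io.EOF {
			return nil
		}
		if err != nil {
			return err
		}
		if err := emit(chunk); err != nil {
			return err
		}
	}
}
```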
As it stands it is not clear that the existing unix files version of the first stage (go-ipfs/core/coreunix/add.go) is particularly streaming-friendly: at a glance its traversal structure implies that all of the files of a directory exist while the directory is being processed. Additionally there are a fair number of non-importer-specific ipfs dependencies used in the current implementation (most strikingly mfs) and we should evaluate how many of these we can live without. We are currently expecting to move this stage of the pipeline to an importers github repo, allowing the modified importing process to be called as a library function from the HTTP API. Perhaps this will be a step forward for the "dex" project to boot. The tar format is a good example moving forward of how we might rewrite the files importer, as its input is a logical stream of readers bearing many similarities to the multipart reader interface that will serve as input to the cluster importer.
The format we use to encode file metadata in HTTP multipart requests can be mimicked from ipfs and should be standardized as much as possible for intuitive use of the API endpoint.
Designing a streaming blockstore for outputting block nodes raises the question of how the blocks will be transferred to the cluster sharding component. Multiple methods exist for streaming blocks out and we will need to investigate our desired assumptions and performance goals further to make a decision here. Note that in addition to transferring data over rpc or directly over a libp2p stream muxer we have another option: pass data to the sharding cluster component by instantiating the sharding functionality as a component of the blockstore used in the importer.
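As an illustration of the blockstore idea, the following sketch forwards every `Put` block over a channel to the sharding component; the `Block` type and interface shape are local stand-ins, not the real go-ipfs blockstore API:

```
// Sketch of the "streaming blockstore" idea: instead of persisting blocks
// locally, Put forwards them to the sharding component.
package importer

// Block is a content-addressed chunk of serialized DAG node data.
type Block struct {
	Cid  string
	Data []byte
}

// streamingBlockstore forwards every stored block on a channel that the
// sharding component reads from.
type streamingBlockstore struct {
	out chan<- Block
}

func (s *streamingBlockstore) Put(b Block) error {
	s.out <- b // blocks if the sharder is not keeping up, applying backpressure
	return nil
}

func (s *streamingBlockstore) PutMany(bs []Block) error {
	for _, b := range bs {
		if err := s.Put(b); err != nil {
			return err
		}
	}
	return nil
}
```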
### Sharding cluster component
At a high level the sharding component takes in a stream of dag-node blocks, maintains the necessary data to build a shard node of the Cluster DAG, and forwards the stream of blocks to a peer for pinning. When blocks arrive the sharding process records the cids which are needed for constructing links between the shard node and the blocks of the DAG being sharded. This is all the sharding component must do before forwarding blocks to other peers. The peer receiving the shard is calculated once by querying the allocator. The sharding component must be aware of how much space peers have for storing shards and will maintain state for determining when a shard is complete and the receiving peer needs to be recalculated. The actual forwarding can be accomplished by periodically calling an RPC that runs DAG API `put` commands with processed blocks on the target peer. After all the blocks of a shard are transmitted, the relevant sub-graph of the Cluster DAG is constructed and sent as a final transmission to the allocated peer. Upon receiving this final transmission the allocated peer recursively pins the node of the Cluster DAG that it receives, ensuring that all blocks of the shard stay in its ipfs repo. When all shards are constructed and allocated, the final root node pointing to all shard nodes is pinned (non-recursively) throughout the cluster.
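The accumulation and flushing logic could be sketched as below. Allocation, RPC forwarding and cluster-dag construction are represented by hypothetical helpers; only the shape of the control flow is intended, and the caller is responsible for a final flush when the input stream ends:

```
// Sketch of the sharding component's accumulation logic. The helpers at the
// bottom are hypothetical placeholders so the sketch compiles.
package sharding

type block struct {
	cid  string
	data []byte
}

type shardState struct {
	maxSize   uint64   // target shard size (fits in one ipfs repo)
	curSize   uint64   // bytes accumulated in the current shard
	blocks    []string // cids that will be linked from the shard node
	allocPeer string   // peer currently receiving this shard's blocks
}

// addBlock records the block's cid for the shard node, forwards the block to
// the allocated peer, and closes the shard when it is full.
func (s *shardState) addBlock(b block) error {
	if s.allocPeer == "" {
		s.allocPeer = allocateNextPeer() // hypothetical: ask the allocator
	}
	if s.curSize+uint64(len(b.data)) > s.maxSize && len(s.blocks) > 0 {
		if err := s.flush(); err != nil { // build and pin the completed shard
			return err
		}
		s.allocPeer = allocateNextPeer()
		s.curSize = 0
		s.blocks = nil
	}
	s.blocks = append(s.blocks, b.cid)
	s.curSize += uint64(len(b.data))
	return forwardBlock(s.allocPeer, b) // hypothetical RPC: dag put on the target peer
}

// flush builds the shard node linking s.blocks, sends it to allocPeer and
// asks that peer to recursively pin it.
func (s *shardState) flush() error {
	shardCid, err := buildShardNode(s.blocks) // hypothetical
	if err != nil {
		return err
	}
	return pinShardOn(s.allocPeer, shardCid) // hypothetical RPC
}

// Hypothetical helpers, declared so the sketch compiles.
func allocateNextPeer() string                     { return "" }
func forwardBlock(peer string, b block) error      { return nil }
func buildShardNode(cids []string) (string, error) { return "", nil }
func pinShardOn(peer string, c string) error       { return nil }
```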
One remaining challenge to work out is preventing garbage collection in the receiving node from removing the dag blocks streamed and added between the first and final transmission. We will either need to find a way to disable gc over the ipfs API (AFAIK this is not currently supported), or work with the potentially expensive workaround of pinning every block as it arrives, recursively pinning the shard node at the end of transmission, and then unpinning the individual blocks.
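The pin-then-unpin workaround could be sketched like this, compressed into a single function and written against a stand-in `pinner` interface rather than the real ipfs API:

```
// Sketch of the expensive GC workaround mentioned above.
package sharding

type pinner interface {
	Pin(cid string, recursive bool) error
	Unpin(cid string) error
}

// protectShard pins each block directly so a concurrent ipfs GC cannot
// remove it, then swaps those pins for a single recursive pin on the shard
// node once it exists.
func protectShard(p pinner, blockCids []string, shardCid string) error {
	for _, c := range blockCids {
		if err := p.Pin(c, false); err != nil { // direct pin per block
			return err
		}
	}
	if err := p.Pin(shardCid, true); err != nil { // recursive pin of the shard node
		return err
	}
	for _, c := range blockCids {
		if err := p.Unpin(c); err != nil { // drop the temporary direct pins
			return err
		}
	}
	return nil
}
```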
# Future work
## Fault tolerance
Disclaimer: this needs more work and that work will go into its own RFC. This section provides a basis upon which we can build. It is included to demonstrate that the current sharding model works well for implementing this important extension. We will bring this effort back into focus once the prerequisite basic sharding discussed above is implemented.
### Background reading
This [report on RS coding for fault tolerance in RAID-like Systems by Plank](https://web.eecs.utk.edu/~plank/plank/papers/CS-96-332.pdf) is a very straightforward and practical guide. It has been helpful in planning how to approach fault tolerance in cluster and will be very helpful when we actually implement it. It has an excellent description of how to implement the algorithm that calculates the code from data, and how to implement recovery. Furthermore one of his example use case studies includes algorithms for initialization of the checksum values that will be useful when replicating very large files.
### Proposed implementation
#### Overview
The general idea of fault tolerant storage is to store your data in such a way that if several machines go down all of your data can still be reconstructed. Of course you could simply store all of your data on every machine in your cluster, but many clever approaches use data sharding and techniques like erasure codes to achieve the same result with fewer bits stored. A standard approach is to store shards of data across multiple data devices and then store some kind of checksum information in separate, checksum devices. It is a simple matter to extend the basic sharding implementation above to work well in this paradigm. When storing a file in a fault tolerant configuration ipfs-cluster, as in basic sharding, will store the ipfs files DAG without its leaves and a cluster-DAG. However now the cluster-DAG has additional shards, referencing not the leaves of the ipfs files DAG but rather checksum data computed over all the file's data. For an m out of n encoding:
```
The root would be like:
{
"shards" : [
{"/": <shard1>},
{"/": <shard2>},
...
{"/": <shardN>},
],
"checksums" : [
{"/": <chksum1>},
{"/": <chksum2>},
...
{"/": <chksumM>},
]
}
```
While there are several RAID modes using different configurations of erasure codes and data-to-checksum-device ratios, my opinion is that we can probably ignore most of these, as (m,n) RS coding is superior in terms of space efficiency for the fault tolerance gained. Different RAID modes do have different time efficiency properties in their original setting, but it is unclear whether implementing something (maybe) more time efficient yet less space efficient and less fault tolerant than RS has much value in ipfs-cluster. I lean towards no but I should investigate further. (TODO -- answer these questions: any counter-example use cases? any big gains from using other RAID modes that help these use cases?) On another note, in Feb 2019 the tornado code patent is set to expire (\o/) and we could check back then and look into the feasibility of using (perhaps implementing, if no OSS exists yet?!?) tornado codes, which are faster. There are others whose legal/implementation situation we'll want to check (biff codes), so pluggability is important.
Overall this is pretty cool for users because the original DAG (recall how basic sharding works) and the original data exist within the cluster. Users can query any cid from the original DAG and the cluster ipfs nodes will seamlessly provide it, while the cluster silently, and with very efficient overhead, protects the data from a potentially large number of peer faults.
We have some options for allowing users to specify this mode. It could be a cluster specific flag to the "add" endpoint or a config option setting a cluster wide default.
#### Importing with checksums
If memory/repo size constraints are not a limiting factor it should be straightforward for the cluster-DAG importer running on the adding node to keep a running tally of the checksum values and then allocate them to nodes after getting every data shard pinned. Note this claim is specific to RS, as its coding calculations are simple linear combinations and everything commutes. I wouldn't be surprised if potential future codes also had this property, but it is something we'll need to check once we get serious about pluggability.
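For reference, the linear relation that makes the running tally possible (following Plank's description of RS coding; the notation here is an assumption) is:

```
% Each checksum word c_j is a linear combination of the data words d_1..d_n
% over GF(2^w), with coefficients f_{i,j} from a Vandermonde-derived matrix:
c_j = \sum_{i=1}^{n} f_{i,j}\, d_i , \qquad j = 1,\dots,m
% so updating one data word d_i to d_i' only changes c_j by f_{i,j}(d_i' - d_i),
% which is what allows checksums to be maintained incrementally as shards stream past.
```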
If we are in a situation where shards approach the size of the ipfs repo or of working memory, then we can draw inspiration from the report by Plank, specifically the section "Implementation and Performance Details: Checkpointing Systems". In this section Plank outlines two algorithms for setting checksum values after the data shards are already stored, by sending messages over the network. From my first read-through the broadcast algorithm looks the most promising. This algorithm would allow cluster to send shards one at a time to the peer holding a zeroed-out checksum shard and then perform successive updates to calculate the checksums, rather than requiring that the cluster-DAG importer hold the one shard being filled up for pinning alongside the m checksum shards being calculated.
## Clever sharding
Future work on sharding should aim to keep parts of the DAG that are typically fetched together within the same shard. This feature should be able to take advantage of particular chunking and layout algorithms, for example grouping together subDAGs representing packages when importing a file as a linux package archive. It would also be nice to have some techniques, possibly pluggable, available for intelligently grouping blocks of an arbitrary DAG based on the DAG structure.
## Sharding large existing DAGs
AFAIK this would require new features in go-ipfs. Two approaches are apparent.
If ipfs-cluster is to stream DAG nodes from its local daemon to create a Cluster DAG, the ipfs api would need to include an endpoint that provides a stream of a DAG's blocks without over-committing the daemon's resources (i.e. not downloading the whole DAG first if the repo is too small). One relevant effort is the [CAR (Certified ARchives) project](https://github.com/ipld/specs/pull/51) which aims to define a format for storing DAGs. One of its goals as stated in a recent proposal is to explicitly allow users to "Traverse a large DAG streamed over HTTP without downloading the entire thing." Cluster could shard DAG nodes extracted from a streamed CAR file, either kept on disk or provided at a location on the network. CAR integration with go-ipfs could potentially allow resource aware streaming of DAG nodes over the ipfs daemon's api so that the sharding cluster peer need only know the DAG's root hash.
Another relevant effort is that of ipld selectors. ipld selectors are interesting because they are an orthogonal approach to the Cluster DAG for succinctly organizing large pinsets. We did not end up including them in the main proposal because adding large datasets necessitates importing data to a DAG anyway. In this case, from our perspective, building a management DAG for pinning and directly transferring blocks is simpler than adding a portion of the DAG to the local ipfs node, executing a remote selector pin, blocking on ipfs to do the transfer, and then clearing resources from the local node. However in the case that the dag is already on ipfs, pinning selectors for shards is a potential implementation strategy. The selectors would essentially act as the shard-nodes of the Cluster DAG, selecting shards of the DAG that could fit on individual nodes and getting pinned by the assigned nodes. At the very least this seems to require that ipld selectors can select based on sub-tree size. It is currently unclear how well matched ipld selectors would be for this use case.
# Work plan
There is a fair amount that needs to be built, updated and serviced to make all of this work. Here I am focusing on basic sharding, allowing support for files that wouldn't fit in an ipfs repo. Support for fault tolerance, more clever sharding strategies and sharding large existing DAGs will come after this is nailed down.
* Build up a version of the `importers` (aka `dagger` aka `dex`) project that suits cluster's needs (if possible keep it general enough for general use). In short we want a library that imports data to ipfs DAGs from arbitrary formats (tar, deb, unix files, etc) with arbitrary chunking and layout algorithms. For our use case it is important that the interfaces support taking in a stream of data and returning a stream of blocks or DAG nodes.
* Implement a refactored importers library (e.g. proto `dex`) to be imported and called by the HTTP API. It will need to provide an entry call that supports streaming and an output mechanism that can connect to the sharding component.
* Create a cluster add endpoint on the REST API that reads in file data on HTTP multipart requests and calls the cluster importer library entrypoint function. Listen on an output stream and put the output blocks into the local ipfs daemon and pin the root with cluster. This will require nailing down the APIs of these components and development of a garbage collection management strategy.
* Build out the sharding component, implementing creation of the Cluster DAG and pin blocks by way of pinning shards. This is still all confined to one cluster peer. This stage may require defining a cluster ipld-format (if we go that route).
* Update the state format so that pinned cids reference Cluster DAGs and so that the state differentiates between cids and shards.
* Build support for allocation of different Cluster DAG shard-nodes to different peers. This includes implementing RPC transmission of all chunks of a given shard to the allocated peer by the sharding component.
* Work to make the state format scale to large numbers of cids
* Add in usability features: don't make sharding the default on the add endpoint but trigger it with `--shard`; maybe add configuration options that set sharding or importer defaults; allow add to specify different chunking and importing algorithms like ipfs does; let the ipfs add proxy endpoint also take `--shard` and call the same functionality as the add endpoint; add tools useful for managing big downloads, like reporting the total storage capacity of the cluster or the expected download time.
* Testing should happen throughout, and we should have a plan in place early (maybe before we start) regarding which tests we run. Eventually we will want a huge cluster in the cloud with a few TB of storage for ingesting huge files.
* Move on beyond basic sharding and start design process again to support clever techniques for certain DAGs/usecases, sharding large existing DAGs, and fault tolerance

View File

@ -1,37 +0,0 @@
Developer Certificate of Origin
Version 1.1
Copyright (C) 2004, 2006 The Linux Foundation and its contributors.
1 Letterman Drive
Suite D4700
San Francisco, CA, 94129
Everyone is permitted to copy and distribute verbatim copies of this
license document, but changing it is not allowed.
Developer's Certificate of Origin 1.1
By making a contribution to this project, I certify that:
(a) The contribution was created in whole or in part by me and I
have the right to submit it under the open source license
indicated in the file; or
(b) The contribution is based upon previous work that, to the best
of my knowledge, is covered under an appropriate open source
license and I have the right under that license to submit that
work with modifications, whether created in whole or in part
by me, under the same open source license (unless I am
permitted to submit under a different license), as indicated
in the file; or
(c) The contribution was provided directly to me by some other
person who certified (a), (b) or (c) and I have not modified
it.
(d) I understand and agree that this project and the contribution
are public and that a record of the contribution (including all
personal information I submit with it, including my sign-off) is
maintained indefinitely and may be redistributed consistent with
this project or the open source license(s) involved.

View File

@ -1,440 +0,0 @@
# A guide to running ipfs-cluster
Revised for version `0.3.1`.
## Index
0. Introduction, definitions and other useful documentation
1. Installation and deployment
2. The configuration file
3. Starting your cluster peers
4. The consensus algorithm
5. The shared state, the local state and the ipfs state
6. Static cluster membership considerations
7. Dynamic cluster membership considerations
8. Pinning an item
9. Unpinning an item
10. Cluster monitoring and pin failover
11. Using the IPFS-proxy
12. Composite clusters
13. Security
14. Upgrading
15. Troubleshooting and getting help
## Introduction, definitions and other useful documentation
This guide aims to collect useful considerations and information for running an ipfs-cluster deployment. It provides extended information and expands on that provided by other docs, which are worth checking too:
* [The main README](../README.md)
* [The `ipfs-cluster-service` README](../ipfs-cluster-service/dist/README.md)
* [The `ipfs-cluster-ctl` README](../ipfs-cluster-ctl/dist/README.md)
* [The "How to Build and Update a Cluster" guide](HOWTO_build_and_update_a_cluster.md)
* [ipfs-cluster architecture overview](../architecture.md)
* [Godoc documentation for ipfs-cluster](https://godoc.org/github.com/ipfs/ipfs-cluster)
### Definitions
* ipfs-cluster: the software as a whole, usually referring to the peer which is run by `ipfs-cluster-service`.
* ipfs: the ipfs daemon, usually `go-ipfs`.
* peer: a member of the cluster.
* CID: Content IDentifier. The hash that identifies an ipfs object and which is pinned in cluster.
* Pin: A CID which is tracked by cluster and "pinned" in the underlying ipfs daemons.
* healthy state: An ipfs-cluster is in a healthy state when all peers are up.
## Installation and deployment
There are several ways to install ipfs-cluster. They are described in the main README and summarized here:
* 1. Download the repository and run `make install`.
* 2. Run the [docker ipfs/ipfs-cluster container](https://hub.docker.com/r/ipfs/ipfs-cluster/). The container runs `ipfs-cluster-service`.
* 3. Download pre-built binaries for your platform at [dist.ipfs.io](https://dist.ipfs.io). Note that we test on Linux and ARM. We're happy to hear if other platforms are working or not.
* 4. Install from the [snapcraft.io](https://snapcraft.io) store: `sudo snap install ipfs-cluster --edge`. Note that there is no stable snap yet.
You can deploy cluster in the way which fits you best, as both ipfs and ipfs-cluster have no dependencies. There are some [Ansible roles](https://github.com/hsanjuan/ansible-ipfs-cluster) available to help you.
We try to keep `master` stable. It will always have the latest bugfixes but it may also incorporate behaviour changes without warning. The pre-built binaries are only provided for tagged versions. They are stable builds to the best of our ability, but they may still contain serious bugs which have only been fixed in `master`. It is recommended to check the [`CHANGELOG`](../CHANGELOG.md) to get an overview of past changes, and any open [release-tagged issues](https://github.com/ipfs/ipfs-cluster/issues?utf8=%E2%9C%93&q=is%3Aissue%20label%3Arelease) to get an overview of what's coming in the next release.
## The configuration file
The ipfs-cluster configuration file is usually found at `~/.ipfs-cluster/service.json`. It holds all the configurable options for cluster and its different components. The configuration file is divided in sections. Each section represents a component. Each item inside the section represents an implementation of that component and contains specific options. For more information on components, check the [ipfs-cluster architecture overview](../architecture.md).
A default configuration file can be generated with `ipfs-cluster-service init`. It is recommended that you re-create the configuration file after an upgrade, to make sure that you are up to date with any new options. The `-c` option can be used to specify a different configuration folder path, and allows creating a default configuration in a temporary folder, which you can then compare with the existing one.
The `cluster` section of the configuration stores a `secret`: a 32 byte (hex-encoded) key which **must be shared by all cluster peers**. Using an empty key has security implications (see the Security section below). Using different keys will prevent different peers from talking to each other.
Each section of the configuration file and the options in it depend on their associated component. We offer here a quick reference of the configuration format:
```
{
"cluster": { // main cluster component configuration
"id": "QmZyXksFG3vmLdAnmkXreMVZvxc4sNi1u21VxbRdNa2S1b", // peer ID
"private_key": "<base64 representation of the key>",
"secret": "<32-bit hex encoded secret>",
"peers": [], // List of peers' multiaddresses
"bootstrap": [], // List of bootstrap peers' multiaddresses
"leave_on_shutdown": false, // Abandon cluster on shutdown
"listen_multiaddress": "/ip4/0.0.0.0/tcp/9096", // Cluster RPC listen
"state_sync_interval": "1m0s", // Time between state syncs
"ipfs_sync_interval": "2m10s", // Time between ipfs-state syncs
"replication_factor_min": -1, // Replication factor minimum threshold. -1 == all
"replication_factor_max": -1, // Replication factor maximum threshold. -1 == all
"monitor_ping_interval": "15s" // Time between alive-pings. See cluster monitoring section
},
"consensus": {
"raft": {
"wait_for_leader_timeout": "15s", // How long to wait for a leader when there is none
"network_timeout": "10s", // How long to wait before timing out a network operation
"commit_retries": 1, // How many retries should we make before giving up on a commit failure
"commit_retry_delay": "200ms", // How long to wait between commit retries
"heartbeat_timeout": "1s", // Here and below: Raft options.
"election_timeout": "1s", // See https://godoc.org/github.com/hashicorp/raft#Config
"commit_timeout": "50ms",
"max_append_entries": 64,
"trailing_logs": 10240,
"snapshot_interval": "2m0s",
"snapshot_threshold": 8192,
"leader_lease_timeout": "500ms"
}
},
"api": {
"restapi": {
"listen_multiaddress": "/ip4/127.0.0.1/tcp/9094", // API listen
"ssl_cert_file": "path_to_certificate", // Path to SSL public certificate. Unless absolute, relative to config folder
"ssl_key_file": "path_to_key", // Path to SSL private key. Unless absolute, relative to config folder
"read_timeout": "30s", // Here and below, timeoutes for network operations
"read_header_timeout": "5s",
"write_timeout": "1m0s",
"idle_timeout": "2m0s",
"basic_auth_credentials": { // Leave null for no-basic-auth
"user": "pass"
}
}
},
"ipfs_connector": {
"ipfshttp": {
"proxy_listen_multiaddress": "/ip4/127.0.0.1/tcp/9095", // ipfs-proxy listen address
"node_multiaddress": "/ip4/127.0.0.1/tcp/5001", // ipfs-node api location
"connect_swarms_delay": "7s", // after boot, how long to wait before trying to connect ipfs peers
"proxy_read_timeout": "10m0s", // Here and below, timeouts for network operations
"proxy_read_header_timeout": "5s",
"proxy_write_timeout": "10m0s",
"proxy_idle_timeout": "1m0s",
"pin_method": "pin" // Supports "pin" and "refs". "refs" will fetch all deps before pinning.
// Use refs when GC is disabled on ipfs.
// Increase maptracker.concurrent_pins to take advantage of concurrency.
}
},
"pin_tracker": {
"maptracker": {
"pinning_timeout": "1h0m0s", // How long before we transition a pinning CID to error state
"unpinning_timeout": "5m0s", // How long before we transition an unpinning CID to error state
"max_pin_queue_size": 4096, // How many pins to hold in the pinning queue
"concurrent_pins": 1 // How many concurrent pin requests we can perform.
// Only useful with ipfshttp.pin_method set to "refs"
}
},
"monitor": {
"monbasic": {
"check_interval": "15s" // How often to check for expired alerts. See cluster monitoring section
}
},
"informer": {
"disk": { // Used when using the disk informer (default)
"metric_ttl": "30s", // Amount of time this metric is valid. Will be polled at TTL/2.
"metric_type": "freespace" // or "reposize": type of metric
},
"numpin": { // Used when using the numpin informer
"metric_ttl": "10s" // Amount of time this metric is valid. Will be polled at TTL/2.
}
}
}
```
## Starting your cluster peers
`ipfs-cluster-service` will launch your cluster peer. If you have not configured any `cluster.peers` in the configuration, nor any `cluster.bootstrap` addresses, a single-peer cluster will be launched.
When filling in `peers` with some other peers' listening multiaddresses (i.e. `/ip4/192.168.1.103/tcp/9096/ipfs/QmQHKLBXfS7hf8o2acj7FGADoJDLat3UazucbHrgxqisim`), the peer will expect to be part of a cluster with the given `peers`. On boot, it will wait to learn who the leader is (see the `raft.wait_for_leader_timeout` option) and sync its internal state up to the last known state before becoming `ready`.
If you are using the `peers` configuration value, then **it is very important that the `peers` configuration value is the same in all cluster members**: it should contain the multiaddresses for the other peers in the cluster. It may contain a peer's own multiaddress too (but it will be removed automatically). If `peers` is not correct for all peer members, your node might not start or might misbehave in non-obvious ways.
You are expected to start the majority of the nodes at the same time when using this method. If half of them are not started, they will fail to elect a cluster leader before `raft.wait_for_leader_timeout`. Then they will shut themselves down. If there are peers missing, the cluster will not be in a healthy state (error messages will be displayed). The cluster will operate, as long as a majority of peers is up.
Alternatively, you can use the `bootstrap` variable to provide one or several bootstrap peers. In short, bootstrapping will use the given peer to request the list of cluster peers and fill in the `peers` variable automatically. The bootstrapped peer will be, in turn, added to the cluster and made known to every other existing (and connected) peer. You can also launch several peers at once, as long as they are bootstrapping from the same already-running peer. The `--bootstrap` flag allows providing a bootstrapping peer directly when calling `ipfs-cluster-service`.
Use the `bootstrap` method only when the rest of the cluster is healthy and all current participating peers are running. If you need to, remove any unhealthy peers with `ipfs-cluster-ctl peers rm <pid>`. Bootstrapping peers should be in a `clean` state, that is, with no previous raft-data loaded. If they are not, remove or rename the `~/.ipfs-cluster/ipfs-cluster-data` folder.
Once a cluster is up, peers are expected to run continuously. You may need to stop a peer, or it may die due to external reasons. The restart-behaviour will depend on whether the peer has left the consensus:
* The *default* case - peer has not been removed and `cluster.leave_on_shutdown` is `false`: in this case the peer has not left the consensus peerset, and you may start the peer again normally. Do not manually update `cluster.peers`, even if other peers have been removed from the cluster.
* The *left the cluster* case - peer has been manually removed or `cluster.leave_on_shutdown` is `true`: in this case, unless the peer died, it has probably been removed from the consensus (you can check if it's missing from `ipfs-cluster-ctl peers ls` on a running peer). This will mean that the state of the peer has been cleaned up (see the Dynamic Cluster Membership considerations below), and the last known `cluster.peers` have been moved to `cluster.bootstrap`. When the peer is restarted, it will attempt to rejoin the cluster from which it was removed by using the addresses in `cluster.bootstrap`.
Remember that a clean peer bootstrapped to an existing cluster will always fetch the latest state. A shutdown-peer which did not leave the cluster will also catch up with the rest of peers after re-starting. See the next section for more information about the consensus algorithm used by ipfs-cluster.
If the startup initialization fails, `ipfs-cluster-service` will exit automatically after a few seconds. Pay attention to the INFO and ERROR messages during startup. When ipfs-cluster is ready, a message will indicate it along with a list of peers.
## The consensus algorithm
ipfs-cluster peers coordinate their state (the list of CIDs which are pinned, their peer allocations and replication factor) using a consensus algorithm called Raft.
Raft is used to commit log entries to a "distributed log" which every peer follows. Every "Pin" and "Unpin" request is a log entry in that log. When a peer receives a log "Pin" operation, it updates its local copy of the shared state to indicate that the CID is now pinned.
In order to work, Raft elects a cluster "Leader", which is the only peer allowed to commit entries to the log. Thus, a Leader election can only succeed if a majority of the nodes are online. Log entries, and other parts of ipfs-cluster functionality (initialization, monitoring), can only happen when a Leader exists.
For example, a commit operation to the log is triggered with `ipfs-cluster-ctl pin add <cid>`. This will use the peer's API to send a Pin request. The peer will in turn forward the request to the cluster's Leader, which will perform the commit of the operation. This is explained in more detail in the "Pinning an item" section.
The "peer add" and "peer remove" operations also trigger log entries (internal to Raft) and depend too on a healthy consensus status. Modifying the cluster peers is a tricky operation because it requires informing every peer of the new peer's multiaddresses. If a peer is down during this operation, the operation will fail, as otherwise that peer will not know how to contact the new member. Thus, it is recommended remove and bootstrap any peer that is offline before making changes to the peerset.
By default, the consensus log data is backed in the `ipfs-cluster-data` subfolder, next to the main configuration file. This folder stores two types of information: the **boltDB** database storing the Raft log, and the state snapshots. Snapshots from the log are performed regularly when the log grows too big (see the `raft` configuration section for options). When a peer is far behind in catching up with the log, Raft may opt to send a snapshot directly, rather than to send every log entry that makes up the state individually. This data is initialized on the first start of a cluster peer and maintained throughout its life. Removing or renaming the `ipfs-cluster-data` folder effectively resets the peer to a clean state. Only peers with a clean state should bootstrap to already running clusters.
When running a cluster peer, **it is very important that the consensus data folder does not contain any data from a different cluster setup**, or data from diverging logs. What this essentially means is that different Raft logs should not be mixed. Removing or renaming the `ipfs-cluster-data` folder will clean all consensus data from the peer, but, as long as the rest of the cluster is running, the peer will recover the last state upon start by fetching it from a different cluster peer.
On clean shutdowns, ipfs-cluster peers will save a human-readable state snapshot in `~/.ipfs-cluster/backups`, which can be used to inspect the last known state for that peer. We are working on making those snapshots restorable.
## The shared state, the local state and the ipfs state
It is important to understand that ipfs-cluster deals with three types of states:
* The **shared state** is maintained by the consensus algorithm and a copy is kept in every cluster peer. The shared state stores the list of CIDs which are tracked by ipfs-cluster, their allocations (peers which are pinning them) and their replication factor.
* The **local state** is maintained separately by every peer and represents the state of CIDs tracked by cluster for that specific peer: status in ipfs (pinned or not), modification time etc.
* The **ipfs state** is the actual state in ipfs (`ipfs pin ls`) which is maintained by the ipfs daemon.
In normal operation, all three states are in sync, as updates to the *shared state* cascade to the local and the ipfs states. Additionally, syncing operations are regularly triggered by ipfs-cluster. Unpinning cluster-pinned items directly from ipfs will, for example, cause a mismatch between the local and the ipfs state. Luckily, there are ways to inspect every state:
* `ipfs-cluster-ctl pin ls` shows information about the *shared state*. The result of this command is produced locally, directly from the state copy stored in the peer.
* `ipfs-cluster-ctl status` shows information about the *local state* in every cluster peer. It does so by aggregating local state information received from every cluster member.
`ipfs-cluster-ctl sync` makes sure that the *local state* matches the *ipfs state*. In other words, it makes sure that what cluster expects to be pinned is actually pinned in ipfs. As mentioned, this also happens automatically. Every sync operation triggers an `ipfs pin ls --type=recursive` call to the local node.
Depending on the size of your pinset, you may adjust the interval between the different sync operations using the `cluster.state_sync_interval` and `cluster.ipfs_sync_interval` configuration options.
As a final note, the *local state* may show items in *error*. This happens when an item took too long to pin/unpin, or the ipfs daemon became unavailable. `ipfs-cluster-ctl recover <cid>` can be used to rescue these items. See the "Pinning an item" section below for more information.
## Static cluster membership considerations
We call a cluster static when the set of `cluster.peers` is fixed, when "peer add"/"peer rm"/bootstrapping operations don't happen (at least normally) and when every cluster member is expected to be running all the time.
Static clusters are a way to run ipfs-cluster in a stable fashion: since the membership of the consensus remains unchanged, they don't suffer the dangers of dynamic peer sets, where it is important that operations modifying the peer set succeed for every cluster member.
Static clusters expect every member peer to be up and responding. Otherwise, the Leader will detect missing heartbeats and start logging errors. When a peer is not responding, ipfs-cluster will detect that the peer is down and re-allocate any content pinned by it to other peers. ipfs-cluster will keep working as long as there is a Leader (a majority of the peers are still running). In the case of a network split, or if a majority of nodes is down, cluster will be unable to commit any operations to the log and thus its functionality will be limited to read operations.
## Dynamic cluster membership considerations
We call a cluster dynamic when the set of `cluster.peers` changes: nodes are bootstrapped to existing cluster peers (`cluster.bootstrap` option), the "peer rm" operation is used and/or the `cluster.leave_on_shutdown` configuration option is enabled. This option allows a node to abandon the consensus membership when shutting down, thus reducing the cluster size by one.
Dynamic clusters allow greater flexibility at the cost of stability. Leave and, especially, join operations are tricky as they change the consensus membership. They are likely to fail in unhealthy clusters. All operations modifying the peerset require an elected and working leader. Note that peerset modifications may also trigger pin re-allocations if any of the pins from the departing peer cross below the `replication_factor_min` threshold.
Peers joining an existing cluster should not have any consensus state (contents in `./ipfs-cluster/ipfs-cluster-data`). Peers leaving a cluster are not expected to re-join it with stale consensus data. For this reason, **the consensus data folder is renamed** when a peer leaves the current cluster. For example, `ipfs-cluster-data` becomes `ipfs-cluster-data.old.0` and so on. Currently, up to 5 copies of the cluster data will be left around, with `old.0` being the most recent, and `old.4` the oldest.
When a peer leaves or is removed, any existing peers will be saved as `bootstrap` peers, so that it is easier to re-join the cluster by simply re-launching it. Since the state has been cleaned, the peer will be able to re-join and fetch the latest state cleanly. See "The consensus algorithm" and the "Starting your cluster peers" sections above for more information.
This does not mean that it is impossible to somehow end up with a broken cluster membership. The best way to diagnose and fix this is to:
* Select a healthy node
* Run `ipfs-cluster-ctl peers ls`
* Examine carefully the results and any errors
* Run `ipfs-cluster-ctl peers rm <peer in error>` for every peer not responding
* If the peer count is different depending on the peers responding, remove those peers too. Once stopped, remove the consensus data folder and bootstrap them to a healthy cluster peer. Always make sure to keep 1/2+1 peers alive and healthy.
* `ipfs-cluster-ctl --enc=json peers ls` provides additional useful information, like the list of peers for every responding peer.
* In cases where leadership has been lost beyond repair (meaning faulty peers cannot be removed), it is best to stop all peers and restore the state from the backup (currently, a manual operation).
Remember: if you have a problematic cluster peer trying to join an otherwise working cluster, the safest way is to rename the `ipfs-cluster-data` folder (keeping it as backup) and to set the correct `bootstrap`. The consensus algorithm will then resend the state from scratch.
ipfs-cluster will fail to start if `cluster.peers` do not match the current Raft peerset. If the current Raft peerset is correct, you can manually update `cluster.peers`. Otherwise, it is easier to clean and bootstrap.
Finally, note that when bootstrapping a peer to an existing cluster, **the new peer must be configured with the same `cluster.secret` as the rest of the cluster**.
## Pinning an item
`ipfs-cluster-ctl pin add <cid>` will tell ipfs-cluster to pin (or re-pin) a CID.
This involves:
* Deciding which peers will be allocated the CID (that is, which cluster peers will ask ipfs to pin the CID). This depends on the replication factor (min and max) and the allocation strategy (more details below).
* Forwarding the pin request to the Raft Leader.
* Committing the pin entry to the log.
* *At this point, a success/failure is returned to the user, but ipfs-cluster has more things to do.*
* Receiving the log update and modifying the *shared state* accordingly.
* Updating the local state.
* If the peer has been allocated the content, then:
* Queueing the pin request and setting the pin status to `PINNING`.
* Triggering a pin operation
* Waiting until it completes and setting the pin status to `PINNED`.
Errors in the first part of the process (before the entry is committed) will be returned to the user and the whole operation is aborted. Errors in the second part of the process will result in pins with a status of `PIN_ERROR`.
Deciding where a CID will be pinned (which IPFS daemon will store it - receive the allocation) is a complex process. In order to decide, all available peers (those reporting valid/non-expired metrics) are sorted by the `allocator` component, depending on the value of their metrics. These values are provided by the configured `informer`. If a CID is already allocated to some peers (in the case of a re-pinning operation), those allocations are kept.
New allocations are only provided when the allocation factor (healthy peers holding the CID) is below the `replication_factor_min` threshold. In those cases, the new allocations (along with the existing valid ones) will attempt to total as much as `replication_factor_max`. When the allocation factor of a CID is within the margins indicated by the replication factors, no action is taken. The value "-1" in `replication_factor_min` and `replication_factor_max` indicates a "replicate everywhere" mode, where every peer will pin the CID.
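A rough Go sketch of this decision (not the actual cluster code; names are assumptions) could look like this:

```
// Rough sketch of the allocation decision described above: keep valid
// allocations, act only when the allocation factor falls below the
// configured minimum, and top up towards the maximum.
package allocation

import "errors"

// decideAllocations returns the peers a CID should be allocated to, given
// the currently valid allocations and candidate peers sorted by the
// allocator (best first). A min/max of -1 ("replicate everywhere") is not
// handled in this sketch.
func decideAllocations(valid, candidates []string, min, max int) ([]string, error) {
	if len(valid) >= min {
		return valid, nil // within margins: no action taken
	}
	needed := max - len(valid)
	for _, p := range candidates {
		if needed == 0 {
			break
		}
		if contains(valid, p) {
			continue
		}
		valid = append(valid, p)
		needed--
	}
	if len(valid) < min {
		return nil, errors.New("not enough peers to reach replication_factor_min")
	}
	return valid, nil
}

func contains(xs []string, x string) bool {
	for _, v := range xs {
		if v == x {
			return true
		}
	}
	return false
}
```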
Default replication factors are specified in the configuration, but every Pin object carries them associated with its own entry in the *shared state*. Changing the replication factor of existing pins requires re-pinning them (it does not suffice to change the configuration). You can always check the details of a pin, including its replication factors, using `ipfs-cluster-ctl pin ls <cid>`. You can use `ipfs-cluster-ctl pin add <cid>` to re-pin at any time with different replication factors. But note that the new pin will only be committed if it differs from the existing one in the way specified in the paragraph above.
In order to check the status of a pin, use `ipfs-cluster-ctl status <cid>`. Retries for pins in error state can be triggered with `ipfs-cluster-ctl recover <cid>`.
The reason pin (and unpin) requests are queued is that ipfs only performs one pin at a time, while any other requests hang in the meantime. All in all, pinning items which are unavailable in the network may create significant bottlenecks (this is a problem that comes from ipfs), as the pin request takes very long to time out. Working around this problem involves restarting the ipfs node.
## Unpinning an item
`ipfs-cluster-ctl pin rm <cid>` will tell ipfs-cluster to unpin a CID.
The process is very similar to the "Pinning an item" described above. Removed pins are wiped from the shared and local states. When requesting the local `status` for a given CID, it will show as `UNPINNED`. Errors will be reflected as `UNPIN_ERROR` in the pin local status.
## Cluster monitoring and pin failover
ipfs-cluster includes a basic monitoring component which gathers metrics and triggers alerts when a metric is no longer renewed. There are currently two types of metrics:
* `informer` metrics are used to decide on allocations when a pin request arrives. Different "informers" can be configured. The default is the disk informer using the `freespace` metric.
* a `ping` metric is used to signal that a peer is alive.
Every metric carries a Time-To-Live associated with it. This TTL can be configured in the `informers` configuration section. The `ping` metric TTL is determined by `cluster.monitor_ping_interval`, and is equal to 2x its value.
Every ipfs-cluster peer pushes metrics to the cluster Leader regularly. This happens at TTL/2 intervals for the `informer` metrics and at every `cluster.monitor_ping_interval` for the `ping` metric.
When a metric for an existing cluster peer stops arriving and previous metrics have outlived their Time-To-Live, the monitoring component triggers an alert for that metric. `monbasic.check_interval` determines how often the monitoring component checks for expired TTLs and sends these alerts. If you wish to detect expired metrics more quickly, decrease this interval. Otherwise, increase it.
ipfs-cluster will react to `ping` metric alerts by searching for pins allocated to the alerting peer and triggering re-pinning requests for them. These re-pinning requests may result in re-allocations if the CID's allocation factor crosses the `replication_factor_min` boundary. Otherwise, the current allocations are maintained.
The monitoring and failover system in cluster is very basic and requires improvements. Failover is likely to not work properly when several nodes go offline at once (especially if the current Leader is affected). Manual re-pinning can be triggered with `ipfs-cluster-ctl pin add <cid>`. `ipfs-cluster-ctl pin ls <cid>` can be used to find out the current list of peers allocated to a CID.
## Using the IPFS-proxy
ipfs-cluster provides a proxy to ipfs (which by default listens on `/ip4/127.0.0.1/tcp/9095`). This allows ipfs-cluster to behave as if it were an ipfs node. It achieves this by intercepting the following requests:
* `/add`: the proxy adds the content to the local ipfs daemon and pins the resulting hash[es] in ipfs-cluster.
* `/pin/add`: the proxy pins the given CID in ipfs-cluster.
* `/pin/rm`: the proxy unpins the given CID from ipfs-cluster.
* `/pin/ls`: the proxy lists the pinned items in ipfs-cluster.
Responses from the proxy mimic the ipfs daemon responses. This allows using ipfs-cluster with the `ipfs` CLI, as the following examples show:
* `ipfs --api /ip4/127.0.0.1/tcp/9095 pin add <cid>`
* `ipfs --api /ip4/127.0.0.1/tcp/9095 add myfile.txt`
* `ipfs --api /ip4/127.0.0.1/tcp/9095 pin rm <cid>`
* `ipfs --api /ip4/127.0.0.1/tcp/9095 pin ls`
Any other requests are directly forwarded to the ipfs daemon and their responses are sent back from it.
Intercepted endpoints aim to mimic the format and response code from ipfs, but they may lack headers. If you encounter a problem where something works with ipfs but not with cluster, open an issue.
## Composite clusters
Since ipfs-cluster provides an IPFS Proxy (an endpoint that acts like an IPFS daemon), it is also possible to use an ipfs-cluster proxy endpoint as the `ipfs_node_multiaddress` for a different cluster.
This means that the top cluster will think that it is performing requests to an IPFS daemon, but it is instead using an ipfs-cluster peer which belongs to a sub-cluster.
This allows scaling ipfs-cluster deployments and provides a method for building ipfs-cluster topologies that may be better adapted to certain needs.
Note that **this feature has not been extensively tested**, but we aim to introduce improvements and fully support it in the mid-term.
## Security
ipfs-cluster peers communicate with each other using libp2p-encrypted streams (`secio`), communicate with the ipfs daemon over plain HTTP, and themselves provide an HTTP API (used by `ipfs-cluster-ctl`) and an IPFS Proxy. This means that there are four endpoints to be wary about when thinking of security:
* `cluster.listen_multiaddress`, defaults to `/ip4/0.0.0.0/tcp/9096` and is the listening address used to communicate with other peers (via Remote RPC calls mostly). These endpoints are protected by the `cluster.secret` value specified in the configuration. Only peers holding the same secret can communicate with each other. If the secret is empty, then **nothing prevents anyone from sending RPC commands to the cluster RPC endpoint** and thus controlling the cluster and the ipfs daemon (at least when it comes to pin/unpin/pin ls and swarm connect operations). ipfs-cluster administrators should therefore be careful to keep this endpoint inaccessible to third parties when no `cluster.secret` is set.
* `restapi.listen_multiaddress`, defaults to `/ip4/127.0.0.1/tcp/9094` and is the listening address for the HTTP API that is used by `ipfs-cluster-ctl`. The considerations for `restapi.listen_multiaddress` are the same as for `cluster.listen_multiaddress`, as access to this endpoint allows controlling ipfs-cluster and the ipfs daemon to an extent. By default, this endpoint listens on localhost, which means it can only be used by `ipfs-cluster-ctl` running on the same host. The REST API component provides HTTPS support for this endpoint, along with Basic Authentication. These can be used to protect an exposed API endpoint.
* `ipfshttp.proxy_listen_multiaddress` defaults to `/ip4/127.0.0.1/tcp/9095`. As explained before, this endpoint offers control of ipfs-cluster pin/unpin operations and access to the underlying ipfs daemon. This endpoint should be treated with at least the same precautions as the ipfs HTTP API.
* `ipfshttp.node_multiaddress` defaults to `/ip4/127.0.0.1/tcp/5001` and contains the address of the ipfs daemon HTTP API. The recommendation is to run IPFS on the same host as ipfs-cluster. This way it is not necessary to make the ipfs API listen on an address other than localhost.
## Upgrading
ipfs-cluster persists the shared state to disk. Therefore, any upgrade must make sure that the old format on disk is compatible so that it can be parsed correctly. If not, a message will be printed and instructions on how to upgrade will be displayed. We offer here a few more details.
### The state format has not changed
In this case, upgrading cluster requires stopping all cluster peers, updating the `ipfs-cluster-service` binary and restarting them.
When the version numbers change, peers running different versions will not be able to communicate as the libp2p protocol that they use is tagged with the version. If you are running untagged releases (like directly from master), then you should be able to run peers built from different commits as long as they share the same `x.x.x` version number. Version numbers are only updated when an official release happens.
### The state format has changed
In this case, we need to perform a state upgrade. `ipfs-cluster-service` should refuse to start if the state format is incompatible with the new release. This procedure is a bit experimental so we recommend saving your pinset (`ipfs-cluster-ctl --enc=json pin ls`) before attempting it.
In order to perform the upgrade, you need to stop all peers. You can either remove/rename the `ipfs-cluster-data` folder in all peers except one and perform the upgrade procedure on that one, or perform the upgrade procedure on all of them.
To update the state format, run `ipfs-cluster-service state upgrade`. This:
* Reads the last Raft snapshot
* Migrates to the new format
* Backs up the `ipfs-cluster-data` folder and creates a new snapshot in the new format.
On the next run, `ipfs-cluster-service` should start normally. Any peers with a blank state should pick it up from the migrated ones as the Raft Leader sends the new snapshot to them.
## Troubleshooting and getting help
### Have a question, idea or suggestion?
Open an issue on [ipfs-cluster](https://github.com/ipfs/ipfs-cluster) or ask on [discuss.ipfs.io](https://discuss.ipfs.io).
If you want to collaborate in ipfs-cluster, look at the list of open issues. Many are conveniently marked with [HELP WANTED](https://github.com/ipfs/ipfs-cluster/issues?q=is%3Aissue+is%3Aopen+label%3A%22help+wanted%22) and organized by difficulty. If in doubt, just ask!
### Debugging
By default, `ipfs-cluster-service` prints only `INFO`, `WARNING` and `ERROR` messages. Sometimes, it is useful to increase verbosity with the `--loglevel debug` flag. This will make ipfs-cluster and its components much more verbose. The `--debug` flag will make ipfs-cluster, its components and its most prominent dependencies (raft, libp2p-raft, libp2p-gorpc) verbose.
`ipfs-cluster-ctl` offers a `--debug` flag which prints information about the API endpoints used by the tool. `--enc json` allows printing raw JSON responses from the API.
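For example, using the flags described above:

```
$ ipfs-cluster-service --loglevel debug   # verbose output from ipfs-cluster and its components
$ ipfs-cluster-service --debug            # also enables logs from raft, libp2p-raft and libp2p-gorpc
$ ipfs-cluster-ctl --debug --enc json id  # print the API endpoints used and the raw JSON response
```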
Interpreting debug information can be tricky. For example:
```
18:21:50.343 ERROR ipfshttp: error getting:Get http://127.0.0.1:5001/api/v0/repo/stat: dial tcp 127.0.0.1:5001: getsockopt: connection refused ipfshttp.go:695
```
The above line shows a message of `ERROR` severity coming from the `ipfshttp` facility. This facility corresponds to the `ipfshttp` module, which implements the IPFS Connector component, and helps narrow down where the error comes from. The error message indicates that the component failed to perform a GET request to the ipfs HTTP API. The log entry also contains the file and line number at which the error was logged.
When you run into a problem, it will probably be useful to include the relevant logs when asking for help.
### Peer is not starting
When your peer is not starting:
* Check the logs and look for errors
* Are all the listen addresses free or are they used by a different process?
* Are other peers of the cluster reachable?
* Is the `cluster.secret` the same for all peers?
* Double-check that the addresses in `cluster.peers` and `cluster.bootstrap` are correct.
* Double-check that the rest of the cluster is in a healthy state.
* In some cases, it may help to delete everything in the consensus data folder (especially if the reason for not starting is a mismatch between the Raft state and the cluster peers). Assuming that the cluster is healthy, this will allow the non-starting peer to pull a clean state from the cluster leader when bootstrapping. A sketch follows this list.
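A hedged sketch of that last resort, assuming the default `~/.ipfs-cluster` location and a healthy rest of the cluster:

```
$ mv ~/.ipfs-cluster/ipfs-cluster-data ~/.ipfs-cluster/ipfs-cluster-data.bak  # keep a backup rather than deleting
$ ipfs-cluster-service   # the peer starts with a clean state and fetches the current one from the leader
```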
### Peer stopped unexpectedly
When a peer stops unexpectedly:
* Make sure you simply haven't removed the peer from the cluster or triggered a shutdown
* Check the logs for any clues that the process died because of an internal fault
* Check your system logs to find if anything external killed the process
* Report any application panics, as they should not happen, along with the logs
### `ipfs-cluster-ctl status <cid>` does not report CID information for all peers
This is usually the result of a desync between the *shared state* and the *local state*, or between the *local state* and the ipfs state. If the problem does not correct itself after a couple of minutes (thanks to auto-syncing), try running `ipfs-cluster-ctl sync [cid]` for the problematic item. You can also restart your node.
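For example, using an example CID:

```
$ ipfs-cluster-ctl status Qma4Lid2T1F68E3Xa3CpE6vVJDLwxXLD8RfiB9g1Tmqp58   # see what every peer reports
$ ipfs-cluster-ctl sync Qma4Lid2T1F68E3Xa3CpE6vVJDLwxXLD8RfiB9g1Tmqp58     # re-check the item against ipfs
$ ipfs-cluster-ctl recover Qma4Lid2T1F68E3Xa3CpE6vVJDLwxXLD8RfiB9g1Tmqp58  # re-pin/unpin if it is in error state
```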
### libp2p errors
Since cluster is built on top of libp2p, many errors that new users face come from libp2p and have confusing messages which are not obvious at first sight. This list compiles some of them:
* `dial attempt failed: misdial to <peer.ID XXXXXX> through ....`: this means that the multiaddress you are contacting has a different peer in it than expected.
* `dial attempt failed: connection refused`: the peer is not running or not listening on the expected address/protocol/port.
* `dial attempt failed: context deadline exceeded`: this means that the address is not reachable or that the wrong secret is being used.
* `dial backoff`: same as above.
* `dial attempt failed: incoming message was too large`: this probably means that your cluster peers are not sharing the same secret.
* `version not supported`: this means that your nodes are running different versions of raft/cluster.
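Several of these errors boil down to a secret mismatch. A quick hedged check, assuming the default configuration path, is to compare the secret across peers:

```
$ grep '"secret"' ~/.ipfs-cluster/service.json   # run on every peer; the value must be identical everywhere
```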

View File

@ -4,7 +4,6 @@
`ipfs-cluster-ctl` is the client application to manage the cluster nodes and perform actions. `ipfs-cluster-ctl` uses the HTTP API provided by the nodes and it is completely separate from the cluster service.
### Usage
Usage information can be obtained by running:
@ -13,29 +12,6 @@ Usage information can be obtained by running:
```
$ ipfs-cluster-ctl --help
```
You can also obtain command-specific help with `ipfs-cluster-ctl help [cmd]`. The `--host` flag can be used to talk to any remote cluster peer (`localhost` is used by default).
For more information, please check the [Documentation](https://cluster.ipfs.io/documentation), in particular the [`ipfs-cluster-ctl` section](https://cluster.ipfs.io/documentation/ipfs-cluster-ctl). In summary, it works as follows:
```
$ ipfs-cluster-ctl id # show cluster peer and ipfs daemon information
$ ipfs-cluster-ctl peers ls # list cluster peers
$ ipfs-cluster-ctl peers rm <peerid> # remove a cluster peer
$ ipfs-cluster-ctl pin add Qma4Lid2T1F68E3Xa3CpE6vVJDLwxXLD8RfiB9g1Tmqp58 # pins a CID in the cluster
$ ipfs-cluster-ctl pin rm Qma4Lid2T1F68E3Xa3CpE6vVJDLwxXLD8RfiB9g1Tmqp58 # unpins a CID from the cluster
$ ipfs-cluster-ctl pin ls [CID] # list tracked CIDs (shared state)
$ ipfs-cluster-ctl status [CID] # list current status of tracked CIDs (local state)
$ ipfs-cluster-ctl sync Qma4Lid2T1F68E3Xa3CpE6vVJDLwxXLD8RfiB9g1Tmqp58 # re-sync seen status against status reported by the IPFS daemon
$ ipfs-cluster-ctl recover Qma4Lid2T1F68E3Xa3CpE6vVJDLwxXLD8RfiB9g1Tmqp58 # attempt to re-pin/unpin CIDs in error state
```
#### Exit codes
`ipfs-cluster-ctl` will exit with:
* `0`: the request/operation succeeded. The output contains the response data.
* `1`: argument error, network error or any other error which prevented the application from performing a request and obtaining a response from the ipfs-cluster API. In such cases, the output contains the contents of the error and the HTTP code `0`.
* `2`: ipfs-cluster error. The request was performed correctly but the response is an error (HTTP status 4xx or 5xx). In such cases, the output contains the contents of the error and the HTTP code associated with it.
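A hedged example of using the exit code in a script:

```
$ ipfs-cluster-ctl id > /dev/null; echo "exit code: $?"   # 0 on success, 1 on request errors, 2 on API errors
```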
### Debugging
`ipfs-cluster-ctl` takes a `--debug` flag which allows inspecting request paths and raw response bodies.

View File

@ -1,57 +1,15 @@
# `ipfs-cluster-service`
> The IPFS cluster peer daemon
`ipfs-cluster-service` runs a full IPFS Cluster peer.
![ipfs-cluster-service example](https://ipfs.io/ipfs/QmWf2asBu54nEaCzfJtdyP1KQjf4pWXmqeHYHZJm86eHAT)
### Usage
Usage information can be obtained with:
```
$ ipfs-cluster-service --help
```
For more information, please check the [Documentation](https://cluster.ipfs.io/documentation), in particular the [`ipfs-cluster-service` section](https://cluster.ipfs.io/documentation/ipfs-cluster-service).
### Initialization
Before running `ipfs-cluster-service` for the first time, initialize a configuration file with:
```
$ ipfs-cluster-service init
```
`init` will randomly generate a `cluster_secret` (unless it is specified via the `CLUSTER_SECRET` environment variable, or you run with `--custom-secret`, which will prompt for it interactively).
**All peers in a cluster must share the same cluster secret**. Using an empty secret may compromise the security of your cluster (see the documentation for more information).
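For example, a common way to provide a pre-generated secret (a 32-byte hex string) is via the environment variable. This is only a sketch; any tool producing 64 hex characters works:

```
$ export CLUSTER_SECRET=$(od -vN 32 -An -tx1 /dev/urandom | tr -d ' \n')
$ ipfs-cluster-service init   # the generated configuration will use $CLUSTER_SECRET
```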
### Configuration
After initialization, the configuration will be placed in `~/.ipfs-cluster/service.json` by default.
You can add the multiaddresses for the other cluster peers to the `cluster.peers` or `cluster.bootstrap` variables (see below). A configuration example with explanations is provided in [A guide to running IPFS Cluster](https://github.com/ipfs/ipfs-cluster/blob/master/docs/ipfs-cluster-guide.md).
The configuration file should probably be identical among all cluster peers, except for the `id` and `private_key` fields. Once every cluster peer has the configuration in place, you can run `ipfs-cluster-service` to start the cluster.
#### Clusters using `cluster.peers`
The `peers` configuration variable holds a list of current cluster members. If you know the members of the cluster in advance, or you want to start a cluster fully in parallel, set `peers` in all configurations so that every peer knows the rest upon boot. Leave `bootstrap` empty. A cluster peer address looks like: `/ip4/1.2.3.4/tcp/9096/ipfs/<id>`.
The list of `cluster.peers` is maintained automatically and saved by `ipfs-cluster-service` when it changes.
#### Clusters using `cluster.bootstrap`
When the `peers` variable is empty, the multiaddresses in `bootstrap` (or the `--bootstrap` parameter to `ipfs-cluster-service`) can be used to have a peer join an existing cluster. The peer will contact those addresses (in order) until one of them succeeds in joining it to the cluster. When the peer is shut down, it will save the current cluster peers in the `peers` configuration variable for future use (unless `leave_on_shutdown` is true, in which case it will save them in `bootstrap`).
Bootstrap is a convenient method to sequentially start the peers of a cluster. **Only bootstrap clean nodes** which have not been part of a cluster before (or clean the `ipfs-cluster-data` folder). Bootstrapping nodes with an old state (or diverging state) from the one running in the cluster will fail or lead to problems with the consensus layer.
When setting the `leave_on_shutdown` option, or calling `ipfs-cluster-service` with the `--leave` flag, the node will attempt to leave the cluster in an orderly fashion when shut down. The node will be cleaned up when this happens and can safely be bootstrapped again.
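For example, a hypothetical clean node joining an existing cluster and leaving it cleanly on shutdown (the address and peer ID below are placeholders):

```
$ ipfs-cluster-service --bootstrap /ip4/192.168.1.2/tcp/9096/ipfs/QmExamplePeerID --leave
```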
### Debugging
`ipfs-cluster-service` offers two debugging options:
* `--debug` enables debug logging from the `ipfs-cluster`, `go-libp2p-raft` and `go-libp2p-gorpc` layers. This produces very verbose output, but at the same time it is the most informative.
* `--loglevel` sets the log level (`[error, warning, info, debug]`) for `ipfs-cluster` only, allowing you to get an overview of what cluster is doing. The default log level is `info`.