Commit Graph

479 Commits

Author SHA1 Message Date
Hector Sanjuan
a656e45375 cluster: safeguard consensus not set when calling ID
SwarmConnect on the ipfs connector calls rpc Peers(), which
requests IDs for every peer member. If that peer member
is booting, it might get the request after RPC is set up
but before consensus is initialized, in which case
a panic happens. The probability of this is small, but not zero.

Also increase the connect swarms delay to 30 seconds, which
should be a bit longer than the default wait_for_leader timeout;
otherwise we might connect swarms before there is even a leader.

License: MIT
Signed-off-by: Hector Sanjuan <hector@protocol.ai>
2017-11-15 16:38:21 +01:00
Hector Sanjuan
a1f1ef15d8
Merge pull request #236 from ipfs/fix/minor-fixes
Fix/minor fixes
2017-11-15 12:09:30 +01:00
Hector Sanjuan
76fe62fae4 Fix error message when not enough candidates for pinning exist
License: MIT
Signed-off-by: Hector Sanjuan <hector@protocol.ai>
2017-11-15 03:17:08 +01:00
Hector Sanjuan
145dced3e8 Cluster: Fix libp2p host getting shutdown in the middle of peer removal
This is likely what was causing PeerRemove tests to fail randomly
but very often. We cancelled the Cluster context before shutting down
the Consensus component. This killed networking and aborted
the peer remove operations when the leader was removing itself.

As a result, it would error with "leadership lost", which would
trigger a retry. The retry would then set the final error to "context cancelled",
because the shutdown of the consensus component proceeds during the
retry and cancels the consensus context.

This does not only affect tests; it might have affected operations when
running cluster.

License: MIT
Signed-off-by: Hector Sanjuan <hector@protocol.ai>
2017-11-15 02:33:46 +01:00
Hector Sanjuan
41b6a114c1
Merge pull request #223 from ipfs/192-docs
Documentation: bring in line with 0.3.0
2017-11-14 23:58:33 +01:00
Hector Sanjuan
cc81ffe96b cluster: peerAdd: try to return an up-to-date new peer ID
Sometimes tests fail because the returned api.ID for a new peer
does not include the current cluster peers. This is because the
new peerset has not yet been committed in the new peer at the time
of the request.

This commit retries the request until the correct peerset
comes in, or gives up after two seconds of retrying.

Beyond fixing the failing tests, note that the returned ID is very
user-facing and should contain the current cluster peers after adding,
not the former peerset, at least while the peerAdd operation is allowed.

License: MIT
Signed-off-by: Hector Sanjuan <hector@protocol.ai>
2017-11-14 23:54:23 +01:00
Hector Sanjuan
417f30c9ea Avoid shutting down consensus in the middle of a commit
I think this will prevent some random test failures
that happen when we realize that we are no longer in the peerset
and trigger a shutdown before Raft has fully finished
committing the operation. That triggers an error
and a retry, but the contexts are cancelled in the retry,
so it won't find a leader and will finally error
with that message.

License: MIT
Signed-off-by: Hector Sanjuan <hector@protocol.ai>
2017-11-14 23:29:56 +01:00
Hector Sanjuan
15bb953afd Issue #192: Update docs to the new peerset handling
Also, start removing mentions of the `PeerAdd` operation, as things should
happen with `--bootstrap` (Join). PeerAdd should not be part of the user
workflow and might disappear or be hidden in the future.

License: MIT
Signed-off-by: Hector Sanjuan <hector@protocol.ai>
2017-11-14 22:06:59 +01:00
Hector Sanjuan
0525403f8c cluster-service: make sure we wait for configuration to be saved on init.
License: MIT
Signed-off-by: Hector Sanjuan <hector@protocol.ai>
2017-11-14 21:06:10 +01:00
Hector Sanjuan
5e465c3f62 PeerAdd: send cluster multiaddresses to new peer before adding it to raft
It seems more logical that the new peer should know how to contact
everyone before it needs to start doing it, but it probably does not
matter much. Still, more logical.

License: MIT
Signed-off-by: Hector Sanjuan <hector@protocol.ai>
2017-11-14 21:04:33 +01:00
Hector Sanjuan
398865f381 Issue #192: Fix typos and address feedback
License: MIT
Signed-off-by: Hector Sanjuan <hector@protocol.ai>
2017-11-14 19:26:49 +01:00
Hector Sanjuan
7baa4c9c6d Guide/Troubleshooting: add the meanings for some libp2p errors
License: MIT
Signed-off-by: Hector Sanjuan <hector@protocol.ai>
2017-11-14 19:26:49 +01:00
Hector Sanjuan
2c3085586c Documentation: bring in line to 0.3.0
Review documentation to be in line with latest updates to Raft and
any other feature introduced since 0.12.0.

License: MIT
Signed-off-by: Hector Sanjuan <hector@protocol.ai>
2017-11-14 19:26:49 +01:00
Hector Sanjuan
5a7c1fc847 Re-enable snapcraft build on travis
License: MIT
Signed-off-by: Hector Sanjuan <hector@protocol.ai>
2017-11-14 19:06:05 +01:00
Hector Sanjuan
09cd86e183
Merge pull request #234 from ipfs/feat/snap-travis-ci
Enable travis for snapcraft builds
2017-11-14 18:24:32 +01:00
Hector Sanjuan
798571d3fc
Merge pull request #226 from ipfs/feat/219-peers
Fix #219: WIP: Remove duplicate peer accounting
2017-11-14 18:23:39 +01:00
Hector Sanjuan
4c3f16d7aa
Merge pull request #233 from ipfs/feat/raft-state-backups
Raft: do not ever remove state, rename it and leave it around
2017-11-14 18:13:15 +01:00
Hector Sanjuan
33c325b865 cluster-service: Fix too many arguments for panic
License: MIT
Signed-off-by: Hector Sanjuan <hector@protocol.ai>
2017-11-14 18:01:32 +01:00
Hector Sanjuan
18ff7a79cc cluster-service: Upstream snap patch
Snaps set a custom $HOME, but we were using /etc/passwd.
There might be other cases where using a custom $HOME might be
handy.

In UNIX systems, $HOME should always be set. For all the rest,
we fall back to the original os/user.HomeDir method.

License: MIT
Signed-off-by: Hector Sanjuan <hector@protocol.ai>
2017-11-14 13:23:43 +01:00
Hector Sanjuan
dc3f5b202e
Merge pull request #231 from ipfs/fix/224-pin-progress
Fix #224: Better handling of progress updates when proxy-adding file
2017-11-14 12:33:09 +01:00
Hector Sanjuan
2837170d51 Raft: improve test error message
License: MIT
Signed-off-by: Hector Sanjuan <hector@protocol.ai>
2017-11-14 12:31:58 +01:00
Hector Sanjuan
b6ba6d5a1e Issue #219: Clean up peer manager. Rename Peers RPC call
License: MIT
Signed-off-by: Hector Sanjuan <hector@protocol.ai>
2017-11-14 12:26:42 +01:00
Hector Sanjuan
ef86f718e7 Enable travis for snapcraft builds
License: MIT
Signed-off-by: Hector Sanjuan <hector@protocol.ai>
2017-11-14 01:12:36 +01:00
Hector Sanjuan
4ca12026dc
Merge pull request #232 from elopio/snapcraft
Add the packaging metadata to build the ipfs-cluster snap
2017-11-14 00:29:27 +01:00
Hector Sanjuan
1a06baeb23 Raft: do not ever remove state, rename it and leave it around
This commit changes the way that consensus.Clean() works. Before,
it deleted the whole data folder. Now it renames it as <name>.old.0
and leaves it. When Clean() is called again, it renames <name>.old.0
to <name>.old.1, and the current data becomes <name>.old.0. A higher number
means an older backup. The number of backups is fixed to 5. When 5 backups exist
and a new one comes up, the oldest one is discarded.

License: MIT
Signed-off-by: Hector Sanjuan <hector@protocol.ai>
2017-11-13 18:18:52 +01:00
Leo Arias
4a45e5def5 Add the packaging metadata to build the ipfs-cluster snap
2017-11-13 16:43:48 +00:00
Hector Sanjuan
f14f2f4863 Fix #224: Better handling of progress updates when proxy-adding file
Up to now, we hardcoded progress to "false" in the proxy, regardless
of what the original request said. We now leave it as it is, and
just ignore any progress updates when processing the response.

Since the response is buffered and sent back all at once, the progress updates are
still useless, but at least the clients (ipfs cli) won't show a 0%
progress bar when successfully adding a file.

License: MIT
Signed-off-by: Hector Sanjuan <hector@protocol.ai>
2017-11-13 14:09:20 +01:00
Hector Sanjuan
2323deac6c Issue #219: Fix TestClustersPeerRemoveReallocsPins
This test failed if the leader was randomly selected to be
the node on which we wait for a leader. We needed to remove
the shut-down leader from the clusters slice.

License: MIT
Signed-off-by: Hector Sanjuan <hector@protocol.ai>
2017-11-10 21:26:37 +01:00
Hector Sanjuan
f37c6c0620 Issue #219: Fix api types tests
License: MIT
Signed-off-by: Hector Sanjuan <hector@protocol.ai>
2017-11-10 18:23:38 +01:00
Hector Sanjuan
a6ab9fab47 Issue #219: Fix PeerAdd test again
License: MIT
Signed-off-by: Hector Sanjuan <hector@protocol.ai>
2017-11-10 17:32:07 +01:00
Hector Sanjuan
c0310ea995 Issue #219: Fix PeerAdd test
ClusterPeers now contains ourselves.

License: MIT
Signed-off-by: Hector Sanjuan <hector@protocol.ai>
2017-11-10 17:09:08 +01:00
Hector Sanjuan
d1473fd3be Issue #219: Fix waiting for save on shutdown
Improve the logic

License: MIT
Signed-off-by: Hector Sanjuan <hector@protocol.ai>
2017-11-10 16:55:39 +01:00
Hector Sanjuan
01fc55550a Issue #219: make sure peers are saved after Join
License: MIT
Signed-off-by: Hector Sanjuan <hector@protocol.ai>
2017-11-10 16:22:31 +01:00
Hector Sanjuan
2a616aeddb Issue #219: Provide a list of peers and a list of addresses in the ID object.
Fix cluster-ctl to show right number of peers

License: MIT
Signed-off-by: Hector Sanjuan <hector@protocol.ai>
2017-11-10 16:11:09 +01:00
Hector Sanjuan
435012a7fd
Merge pull request #228 from elopio/patch-2
Add the instructions to install the snap
2017-11-10 03:28:52 +01:00
Leo Arias
a4248092bf Add instructions to install the snap to the guide
2017-11-10 02:09:35 +00:00
Leo Arias
66e9747733
Add the instructions to install the snap
I have just pushed an initial version of ipfs-cluster to the snapcraft store:
https://discuss.ipfs.io/t/start-testing-the-ipfs-cluster-snap/1422
2017-11-08 18:46:07 -06:00
Hector Sanjuan
9dd239666c
Merge pull request #227 from elopio/patch-1
Fix typo: ins/is
2017-11-09 01:35:56 +01:00
Leo Arias
6d3e96c7b1
Fix typo: ins/is
2017-11-08 18:29:57 -06:00
Hector Sanjuan
289890ad6c Back to 6 testing peers
License: MIT
Signed-off-by: Hector Sanjuan <hector@protocol.ai>
2017-11-09 01:27:26 +01:00
Hector Sanjuan
b852dfa892 Fix #219: WIP: Remove duplicate peer accounting
This change removes the duplicated peer accounting in the PeerManager component:

* No more committing PeerAdd and PeerRm log entries
* The Raft peer set is the source of truth
* Basic broadcasting is used to communicate peer multiaddresses
  in the cluster
* A peer can only be added in a healthy cluster
* A peer can be removed from any cluster which can still commit
* This also adds support for multiple multiaddresses per peer

License: MIT
Signed-off-by: Hector Sanjuan <hector@protocol.ai>
2017-11-08 20:04:04 +01:00
Hector Sanjuan
aced97cfa1
Merge pull request #199 from ipfs/feat/131-new-raft
Fix #139: Update cluster to Raft 1.0.0
2017-11-02 16:06:05 +01:00
Hector Sanjuan
bff1ec3635 Issue #131: rename addFromMultiaddrs to setFromMultiaddrs
License: MIT
Signed-off-by: Hector Sanjuan <hector@protocol.ai>
2017-11-01 19:38:46 +01:00
Hector Sanjuan
f64d3edb91 config: save 1 time per second at most.
This prevents hammering and saving multiple times in a row, for example
when bootstrapping.

License: MIT
Signed-off-by: Hector Sanjuan <hector@protocol.ai>
2017-11-01 14:42:23 +01:00
Hector Sanjuan
073c43e291 Issue #131: Make sure peers are moved to bootstrap when leaving
Also, do not shut down when seeing our own departure during bootstrap.

License: MIT
Signed-off-by: Hector Sanjuan <hector@protocol.ai>
2017-11-01 13:58:57 +01:00
Hector Sanjuan
c912cfd205 Issue #131: Destroy raft data when the peer has been removed
License: MIT
Signed-off-by: Hector Sanjuan <hector@protocol.ai>
2017-11-01 13:25:28 +01:00
Hector Sanjuan
7a5f8f184b Issue #131: Improvements adding and removing
This works on remove+shutdown procedure and fixes a few small
issues.

License: MIT
Signed-off-by: Hector Sanjuan <hector@protocol.ai>
2017-11-01 13:00:32 +01:00
Hector Sanjuan
10c7afbd59 Raft: re-enable: do not start with inconsistent peers.
Improved error messages

License: MIT
Signed-off-by: Hector Sanjuan <hector@protocol.ai>
2017-11-01 12:17:33 +01:00
Hector Sanjuan
74ed634653 Raft: add cachestore for the log store
Just like Consul does.

License: MIT
Signed-off-by: Hector Sanjuan <hector@protocol.ai>
2017-11-01 12:17:33 +01:00
Hector Sanjuan
107ad3bdfc Shutdown: do not save peers if we did not become ready
License: MIT
Signed-off-by: Hector Sanjuan <hector@protocol.ai>
2017-11-01 12:17:33 +01:00