Commit Graph

64 Commits

Author SHA1 Message Date
dgrisham
98335901fc Refactored private network implementation + config. 2017-07-08 11:11:49 -06:00
dgrisham
59fde30e1e swarm secret implementation started 2017-07-03 14:00:01 -06:00
dgrisham
1d90130f65 initial tests passing 2017-06-29 18:59:36 -06:00
Hector Sanjuan
2d8cd236b5 Fix #75: Overriding options should not make them permanent in config
License: MIT
Signed-off-by: Hector Sanjuan <hector@protocol.ai>
2017-03-28 19:10:50 +02:00
Hector Sanjuan
2bbbea79cc Issue #49: Add disk informer
The disk informer uses "ipfs repo stat" to fetch the RepoSize value and
reports it as a metric.

The numpinalloc allocator is now a generalized ascendalloc, which
sorts metrics in ascending order and returns the ones with the lowest
values.

License: MIT
Signed-off-by: Hector Sanjuan <hector@protocol.ai>
2017-03-27 20:40:49 +02:00
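
For illustration, here is a minimal Go sketch of the ascending-order allocation described in the commit above, assuming the metrics boil down to a peer-to-value map (e.g. RepoSize in bytes from the disk informer). The names peerMetric and allocateAscending are hypothetical, not the actual ascendalloc API.

```go
package main

import (
	"fmt"
	"sort"
)

// peerMetric pairs a peer ID with the numeric value of its last metric.
// (Hypothetical type, for illustration only.)
type peerMetric struct {
	peer  string
	value uint64
}

// allocateAscending returns up to n peers ordered by ascending metric
// value, so the peers with the lowest values (least-used disk, in the
// disk informer's case) come first.
func allocateAscending(metrics map[string]uint64, n int) []string {
	pms := make([]peerMetric, 0, len(metrics))
	for p, v := range metrics {
		pms = append(pms, peerMetric{peer: p, value: v})
	}
	sort.Slice(pms, func(i, j int) bool { return pms[i].value < pms[j].value })

	if n > len(pms) {
		n = len(pms)
	}
	allocs := make([]string, 0, n)
	for _, pm := range pms[:n] {
		allocs = append(allocs, pm.peer)
	}
	return allocs
}

func main() {
	// Hypothetical RepoSize values (bytes), as would be reported
	// via "ipfs repo stat" by each peer's informer.
	metrics := map[string]uint64{
		"peerA": 5000000,
		"peerB": 1200000,
		"peerC": 9800000,
	}
	fmt.Println(allocateAscending(metrics, 2)) // [peerB peerA]
}
```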
Hector Sanjuan
6ee0f3bead Issue #45: Detect expired metrics and trigger re-pins
An initial, simple approach to this: the PeerMonitor
checks its metrics, compares them to the current set of peers and puts
an alert in the alerts channel if the metrics for a peer have expired.

Cluster reads this channel looking for "ping" alerts. The leader
is in charge of triggering re-pins of all the Cids allocated to
a given peer.

Also, metrics are now broadcast to the cluster instead of pushed only
to the leader. Since they happen every few seconds, this should scale
acceptably. The main problem was that if the leader is the node going
down, the new leader will not know about it, as it has no metrics for
that peer, so it won't trigger an alert. Working around this would
mean the component needs to know it is the leader, or cluster needs to
handle alerts in complicated ways when leadership changes. Detecting
leadership changes, or letting a component know who the leader is, adds
another dependency on the consensus algorithm that should be avoided.
Therefore we broadcast, for the moment.

License: MIT
Signed-off-by: Hector Sanjuan <hector@protocol.ai>
2017-03-02 14:59:45 +01:00
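
As a rough illustration of the mechanism above, here is a hedged Go sketch of the expired-metric check: the monitor compares its last known "ping" metric per peer against the current peer set and pushes an alert to a channel when a metric is missing or expired. The types and names (metric, alert, checkAlerts) are illustrative, not the real PeerMonitor API.

```go
package main

import (
	"fmt"
	"time"
)

// metric is the last "ping" metric received from a peer.
// (Illustrative type, not the actual ipfs-cluster one.)
type metric struct {
	peer    string
	expires time.Time
}

// alert signals that a peer's metric has expired.
type alert struct {
	peer string
	name string
}

// checkAlerts emits one alert per peer whose ping metric is missing or
// past its expiry. The leader would react by re-pinning the Cids
// allocated to that peer.
func checkAlerts(peers []string, last map[string]metric, alerts chan<- alert) {
	now := time.Now()
	for _, p := range peers {
		m, ok := last[p]
		if !ok || now.After(m.expires) {
			alerts <- alert{peer: p, name: "ping"}
		}
	}
}

func main() {
	alerts := make(chan alert, 8)
	last := map[string]metric{
		"peerA": {peer: "peerA", expires: time.Now().Add(10 * time.Second)},
		"peerB": {peer: "peerB", expires: time.Now().Add(-5 * time.Second)}, // expired
	}
	// peerC has no metric at all, which also triggers an alert.
	checkAlerts([]string{"peerA", "peerB", "peerC"}, last, alerts)
	close(alerts)
	for a := range alerts {
		fmt.Printf("alert: %q metric expired for %s\n", a.name, a.peer)
	}
}
```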
Hector Sanjuan
34fdc329fc Fix #24: Auto-join and auto-leave operations for Cluster
This is the third implementation attempt. This time, rather than
broadcasting PeerAdd/Join requests to the whole cluster, we use the
consensus log to broadcast new peers joining.

This makes it easier to recover from errors and to know exactly who
is a member of the cluster and who is not. The consensus is, after all,
meant to agree on things, and the list of cluster peers is something
everyone has to agree on.

Raft itself uses a special log operation to maintain the peer set.

The tests are almost unchanged from the previous attempts, so behaviour
should be the same, except that it doesn't seem possible to bootstrap a
bunch of nodes at the same time using different bootstrap nodes; it
works when they all use the same one. I'm not sure this worked before
either, but the code is simpler than recursively contacting peers, and
scales better for larger clusters.

Nodes have to be careful about joining clusters while keeping the state
from a different cluster (disjoint logs). This may cause problems with
Raft.

License: MIT
Signed-off-by: Hector Sanjuan <hector@protocol.ai>
2017-02-07 18:46:09 +01:00
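
A hedged sketch of the idea above: a "peer added" operation is committed to the replicated log and applied by every member in the same order, so the peer set is agreed on like any other state. The types below (logOp, state) are illustrative, not the actual ipfs-cluster consensus operations.

```go
package main

import "fmt"

type opType int

const (
	opPeerAdd opType = iota
	opPeerRm
)

// logOp is the entry committed to the consensus log.
// (Illustrative; the real log operations differ.)
type logOp struct {
	op   opType
	addr string // multiaddress of the peer joining or leaving
}

// state is the replicated peer set every member agrees on.
type state struct {
	peers map[string]struct{}
}

// apply is called on every cluster member, in log order, once the
// operation has been committed by consensus.
func (s *state) apply(op logOp) {
	switch op.op {
	case opPeerAdd:
		s.peers[op.addr] = struct{}{}
	case opPeerRm:
		delete(s.peers, op.addr)
	}
}

func main() {
	s := &state{peers: map[string]struct{}{}}
	log := []logOp{
		{opPeerAdd, "/ip4/10.0.0.1/tcp/9096/ipfs/QmPeer1"},
		{opPeerAdd, "/ip4/10.0.0.2/tcp/9096/ipfs/QmPeer2"},
		{opPeerRm, "/ip4/10.0.0.1/tcp/9096/ipfs/QmPeer1"},
	}
	for _, op := range log {
		s.apply(op)
	}
	fmt.Println(len(s.peers), "peer(s) in the agreed set")
}
```

Committing the membership change through the log, rather than broadcasting requests, is what makes recovery easier: a member that missed a broadcast still replays the operation from the log.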
Hector Sanjuan
89ecc1ce89 Encapsulate Raft functions better and simplify the Consensus component
License: MIT
Signed-off-by: Hector Sanjuan <hector@protocol.ai>
2017-02-02 13:51:49 +01:00
Hector Sanjuan
6c18c02106 Issue #10: peers/add and peers/rm feature + tests
This commit adds PeerAdd() and PeerRemove() endpoints, CLI support, and
tests. Peer management is a delicate issue because of how the consensus
works underneath and the number of places that need to track such peers.

When adding a peer, the procedure is as follows (see the sketch after
this entry):

* Try to open a connection to the new peer and abort if it is not
reachable.
* Broadcast a PeerManagerAddPeer operation which tells all cluster
members to add the new peer. The Raft leader will add it to Raft's
peerset and the multiaddress will be saved in the ClusterPeers
configuration key.
* If the above fails because some cluster node is not responding,
broadcast a PeerRemove() and try to undo any damage.
* If the broadcast succeeds, send our ClusterPeers to the new peer,
along with the local multiaddress we are using in the connection opened
in the first step (that is, the multiaddress through which the other
peer can reach us).
* The new peer updates its configuration with the new list and joins
the consensus.

License: MIT
Signed-off-by: Hector Sanjuan <hector@protocol.ai>
2017-02-02 13:51:49 +01:00
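
Here is the sketch referenced above: a hedged Go outline of the PeerAdd flow with its rollback step. The helper methods (connect, broadcastAdd, broadcastRemove, sendPeerList) are hypothetical stand-ins for the real RPC calls.

```go
package main

import (
	"errors"
	"fmt"
)

// cluster holds the local view of the peer set.
type cluster struct {
	peers []string
}

// The methods below are hypothetical stand-ins for network/RPC calls.
func (c *cluster) connect(addr string) error      { return nil }
func (c *cluster) broadcastAdd(addr string) error { return nil }
func (c *cluster) broadcastRemove(addr string)    {}
func (c *cluster) sendPeerList(addr string, peers []string, local string) error {
	return nil
}

// peerAdd follows the procedure from the commit message above.
func (c *cluster) peerAdd(addr, localAddr string) error {
	// 1. Abort early if the new peer is unreachable.
	if err := c.connect(addr); err != nil {
		return fmt.Errorf("new peer not reachable: %w", err)
	}
	// 2. Tell every member (and, via the leader, Raft's peerset) to add it.
	if err := c.broadcastAdd(addr); err != nil {
		// 3. A node did not respond: broadcast a removal to undo the damage.
		c.broadcastRemove(addr)
		return errors.New("cluster did not accept the new peer")
	}
	c.peers = append(c.peers, addr)
	// 4. Send our peer list plus the multiaddress the new peer can use to
	//    reach us; it updates its configuration and joins the consensus.
	return c.sendPeerList(addr, c.peers, localAddr)
}

func main() {
	c := &cluster{peers: []string{"/ip4/10.0.0.1/tcp/9096/ipfs/QmPeer1"}}
	err := c.peerAdd("/ip4/10.0.0.2/tcp/9096/ipfs/QmPeer2",
		"/ip4/10.0.0.1/tcp/9096/ipfs/QmPeer1")
	fmt.Println("peers:", len(c.peers), "err:", err)
}
```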
Hector Sanjuan
e932b2f3f6 Fix tests with raftCfg config section and do not panic if it's not there
License: MIT
Signed-off-by: Hector Sanjuan <hector@protocol.ai>
2017-01-23 21:07:16 +01:00
Hector Sanjuan
d1731ebd28 Use multiaddresses in the configuration and rename JSON entries for clarity
License: MIT
Signed-off-by: Hector Sanjuan <hector@protocol.ai>
2017-01-23 18:38:59 +01:00
Hector Sanjuan
f0c5350743 Get remote RPC requests working. First e2e tests.
License: MIT
Signed-off-by: Hector Sanjuan <hector@protocol.ai>
2016-12-16 22:00:08 +01:00
Hector Sanjuan
319c97585b Renames everywhere, removing redundant "Cluster" from "ClusterSomething".
Start preparing syncs() and status()

License: MIT
Signed-off-by: Hector Sanjuan <hector@protocol.ai>
2016-12-15 19:08:46 +01:00
Hector Sanjuan
0f31995bd6 consensus tests
License: MIT
Signed-off-by: Hector Sanjuan <hector@protocol.ai>
2016-12-14 15:31:50 +01:00