2463 Commits

Author SHA1 Message Date
Hector Sanjuan
b19b3c6882 Release 0.0.7
License: MIT
Signed-off-by: Hector Sanjuan <hector@protocol.ai>
2017-03-09 12:41:52 +01:00
Hector Sanjuan
ea1ad53df1 Fix tagging
License: MIT
Signed-off-by: Hector Sanjuan <hector@protocol.ai>
2017-03-09 12:41:43 +01:00
Hector Sanjuan
3dd0c47724 gx publish 0.0.6
License: MIT
Signed-off-by: Hector Sanjuan <hector@protocol.ai>
2017-03-09 12:24:50 +01:00
Hector Sanjuan
76707d4b9b Release 0.0.6
License: MIT
Signed-off-by: Hector Sanjuan <hector@protocol.ai>
2017-03-09 12:24:27 +01:00
Hector Sanjuan
9dff76b8da Merge pull request #58 from ipfs/pin-options
Allow specifying a replication factor per pin
2017-03-09 12:21:13 +01:00
Hector Sanjuan
a5275ad02e Merge branch 'small-fixes' into pin-options
2017-03-09 12:04:49 +01:00
Hector Sanjuan
271b6194b7 Merge pull request #56 from ipfs/docker
Add Dockerfile
2017-03-09 12:03:02 +01:00
Hector Sanjuan
e5c5909e42 Support a replication factor flag in "ipfs-cluster-ctl pin add".
License: MIT
Signed-off-by: Hector Sanjuan <hector@protocol.ai>
2017-03-08 18:51:03 +01:00
Hector Sanjuan
01d65a1595 Support replication factor as a pin parameter
This adds a replication_factor query argument to the API
endpoint, which allows setting a replication factor per Pin.

License: MIT
Signed-off-by: Hector Sanjuan <hector@protocol.ai>
2017-03-08 18:50:54 +01:00
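A minimal sketch of reading a per-pin replication_factor query argument in Go. The URL path and the helper name parseReplicationFactor are hypothetical, and treating an absent argument as 0 ("use the configured default") is an assumption, not documented behaviour.

```go
package main

import (
	"fmt"
	"net/url"
	"strconv"
)

// parseReplicationFactor reads the replication_factor query argument from a
// request URL. Hypothetical helper: returning 0 when the argument is absent
// (meaning "use the configured default") is an assumption.
func parseReplicationFactor(rawURL string) (int, error) {
	u, err := url.Parse(rawURL)
	if err != nil {
		return 0, err
	}
	v := u.Query().Get("replication_factor")
	if v == "" {
		return 0, nil // not provided: caller falls back to the default
	}
	rf, err := strconv.Atoi(v)
	if err != nil {
		return 0, fmt.Errorf("invalid replication_factor %q: %w", v, err)
	}
	return rf, nil
}

func main() {
	// The /pins/<cid> path is illustrative only.
	rf, err := parseReplicationFactor("http://localhost/pins/QmExample?replication_factor=3")
	fmt.Println(rf, err) // 3 <nil>
}
```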
Hector Sanjuan
9b652bcfb3 Rename CidArg to Pin.
CidArg used to be an internal name for an argument that carried a Cid.
Now that it has surfaced to the API level, the name makes no sense. It is a Pin:
a Cid together with its Allocations and Replication Factor.

License: MIT
Signed-off-by: Hector Sanjuan <hector@protocol.ai>
2017-03-08 16:57:27 +01:00
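The Pin described above can be pictured as a plain Go struct. This is a sketch only: field names follow the commit message, but the types (strings for the Cid and for peer IDs) are simplifying assumptions; a real implementation would use parsed CID and peer ID types.

```go
package main

import "fmt"

// Pin is a sketch of the type described in the commit: a Cid together with
// its Allocations and a Replication Factor. Types are assumptions.
type Pin struct {
	Cid               string   // content identifier, kept as a plain string here
	Allocations       []string // peers that should hold this pin (plain IDs, hypothetical)
	ReplicationFactor int      // how many peers should pin it
}

func main() {
	p := Pin{Cid: "QmExample", Allocations: []string{"peerA", "peerB"}, ReplicationFactor: 2}
	fmt.Printf("%+v\n", p)
}
```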
Hector Sanjuan
e014ba7751 Fix output when global pin info comes with an error
License: MIT
Signed-off-by: Hector Sanjuan <hector@protocol.ai>
2017-03-08 01:10:24 +01:00
Hector Sanjuan
ee57cf8a36 Add Dockerfile
License: MIT
Signed-off-by: Hector Sanjuan <hector@protocol.ai>
2017-03-07 17:39:19 +01:00
Hector Sanjuan
9d6121c426 gx publish 0.0.5
License: MIT
Signed-off-by: Hector Sanjuan <hector@protocol.ai>
2017-03-07 17:39:00 +01:00
Hector Sanjuan
7be6399433 Release 0.0.5
License: MIT
Signed-off-by: Hector Sanjuan <hector@protocol.ai>
2017-03-07 17:38:53 +01:00
Hector Sanjuan
e5543f3b30 Fix ipfs-cluster-service loop eating lots of CPU (bad select loop)
License: MIT
Signed-off-by: Hector Sanjuan <hector@protocol.ai>
2017-03-07 17:38:04 +01:00
Hector Sanjuan
101720bd91 gx publish 0.0.4
License: MIT
Signed-off-by: Hector Sanjuan <hector@protocol.ai>
2017-03-03 18:49:52 +01:00
Hector Sanjuan
135a222301 Release 0.0.4
License: MIT
Signed-off-by: Hector Sanjuan <hector@protocol.ai>
2017-03-03 18:49:45 +01:00
Hector Sanjuan
d72fc5b317 Correctly replace release number when releasing
License: MIT
Signed-off-by: Hector Sanjuan <hector@protocol.ai>
2017-03-03 17:57:28 +01:00
Hector Sanjuan
36c12fc297 Put tools README into dist/ folder
License: MIT
Signed-off-by: Hector Sanjuan <hector@protocol.ai>
2017-03-03 17:52:43 +01:00
Hector Sanjuan
85651ae8cb Issue #55: Move tooling information to its own README in the subprojects.
Add LICENSE there too, so that the build tool for distributions includes
it in the tar files.

License: MIT
Signed-off-by: Hector Sanjuan <hector@protocol.ai>
2017-03-03 16:13:09 +01:00
Hector Sanjuan
e038ea039a Fix test by adding more delay
License: MIT
Signed-off-by: Hector Sanjuan <hector@protocol.ai>
2017-03-03 15:38:53 +01:00
Hector Sanjuan
1c16b653a7 gx publish 0.0.3
License: MIT
Signed-off-by: Hector Sanjuan <hector@protocol.ai>
2017-03-03 15:08:51 +01:00
Hector Sanjuan
df3c190fc8 Release 0.0.3
License: MIT
Signed-off-by: Hector Sanjuan <hector@protocol.ai>
2017-03-03 15:08:31 +01:00
Hector Sanjuan
d0e2eaf12f Update libp2p and deps
License: MIT
Signed-off-by: Hector Sanjuan <hector@protocol.ai>
2017-03-02 17:05:21 +01:00
Hector Sanjuan
58d962de4c Update gx version
License: MIT
Signed-off-by: Hector Sanjuan <hector@protocol.ai>
2017-03-02 17:05:21 +01:00
Hector Sanjuan
31e32b9c8e Update README with latest feature and Captain's log
License: MIT
Signed-off-by: Hector Sanjuan <hector@protocol.ai>
2017-03-02 15:12:06 +01:00
Hector Sanjuan
6ee0f3bead Issue #45: Detect expired metrics and trigger re-pins
An initial, simple approach to this. The PeerMonitor will
check its metrics, compare them to the current set of peers and put
an alert in the alerts channel if the metrics for a peer have expired.

Cluster reads this channel looking for "ping" alerts. The leader
is in charge of triggering repins for all the Cids allocated to
a given peer.

Also, metrics are now broadcast to the whole cluster instead of being pushed
only to the leader. Since they happen only every few seconds, this should
scale acceptably. The main problem was that if the leader is the node
going down, the new leader will not know about it, as it doesn't have any
metrics for it, so it won't trigger an alert. Acting on that would require
either the component to know it is the leader, or cluster to
handle alerts in complicated ways when leadership changes. Detecting
leadership changes, or letting a component know who the leader is, adds another
dependency on the consensus algorithm that should be avoided. Therefore
we broadcast, for the moment.

License: MIT
Signed-off-by: Hector Sanjuan <hector@protocol.ai>
2017-03-02 14:59:45 +01:00
Hector Sanjuan
d87c64b611 Unify the way of handling context and cancellation in components.
Clean up some ugly code.

License: MIT
Signed-off-by: Hector Sanjuan <hector@protocol.ai>
2017-03-02 14:35:16 +01:00
Hector Sanjuan
c18b4beea3 Make the README more to the point. Move the quickstart guide to docs/
License: MIT
Signed-off-by: Hector Sanjuan <hector@protocol.ai>
2017-02-27 10:30:09 +01:00
Hector Sanjuan
857e38175e Double-delay on test to allow leader re-election.
License: MIT
Signed-off-by: Hector Sanjuan <hector@protocol.ai>
2017-02-15 15:46:51 +01:00
Hector Sanjuan
37046dc925 go vet fixes
License: MIT
Signed-off-by: Hector Sanjuan <hector@protocol.ai>
2017-02-15 15:46:51 +01:00
Hector Sanjuan
8e45ce64c2 Documentation: captain log, readme and golint fixes
License: MIT
Signed-off-by: Hector Sanjuan <hector@protocol.ai>
2017-02-15 15:46:51 +01:00
Hector Sanjuan
469cf51e62 Make sure all times appear as UTC
License: MIT
Signed-off-by: Hector Sanjuan <hector@protocol.ai>
2017-02-15 14:15:12 +01:00
Hector Sanjuan
56d68dded1 Avoid showing duplicate addresses in peers' IDs.Addresses.
License: MIT
Signed-off-by: Hector Sanjuan <hector@protocol.ai>
2017-02-15 14:15:12 +01:00
Hector Sanjuan
f935eb4245 Use c.id shorthand instead of c.host.ID()
License: MIT
Signed-off-by: Hector Sanjuan <hector@protocol.ai>
2017-02-15 14:15:12 +01:00
Hector Sanjuan
c04df7ecf3 Merge pull request #46 from ipfs/41-replication
Add replication factor support
2017-02-15 13:36:45 +01:00
Hector Sanjuan
3f5c072e2d Small improvements to coverage.sh and travis.yml
License: MIT
Signed-off-by: Hector Sanjuan <hector@protocol.ai>
2017-02-14 19:26:48 +01:00
Hector Sanjuan
2512ecb701 Issue #41: Add Replication factor
New PeerManager, Allocator, Informer components have been added along
with a new "replication_factor" configuration option.

First, cluster peers collect and push metrics (Informer) to the Cluster
leader regularly. The Informer is an interface that can be implemented
in custom ways to support custom metrics.

Second, on a pin operation, using the information from the collected metrics,
an Allocator can provide a list of preferences as to where the new pin
should be assigned. The Allocator is an interface, allowing different
allocation strategies to be plugged in.

Both Allocator and Informer are Cluster Components, and have access
to the RPC API.

The allocations are kept in the shared state. Cluster peer failure
detection and re-allocation are still missing, although re-pinning
something when a node is down or its metrics are missing does
re-allocate the pin somewhere else.

License: MIT
Signed-off-by: Hector Sanjuan <hector@protocol.ai>
2017-02-14 19:13:08 +01:00
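The Informer/Allocator split can be sketched with two small Go interfaces. The method names and the Metric type are assumptions, and freeSpaceAllocator (prefer peers reporting more free space) is just one example strategy, not necessarily the one shipped with ipfs-cluster.

```go
package main

import (
	"fmt"
	"sort"
	"strconv"
)

// Metric is a hypothetical metric pushed by a peer's Informer.
type Metric struct {
	Name  string
	Peer  string
	Value string
}

// Informer produces a metric for the local peer (shape assumed).
type Informer interface {
	GetMetric() Metric
}

// Allocator orders candidate peers by preference (shape assumed).
type Allocator interface {
	Allocate(candidates map[string]Metric) ([]string, error)
}

// freeSpaceAllocator prefers peers reporting more free space.
type freeSpaceAllocator struct{}

func (freeSpaceAllocator) Allocate(candidates map[string]Metric) ([]string, error) {
	type scored struct {
		peer string
		free uint64
	}
	var s []scored
	for p, m := range candidates {
		v, err := strconv.ParseUint(m.Value, 10, 64)
		if err != nil {
			continue // skip peers with unparsable metrics
		}
		s = append(s, scored{p, v})
	}
	sort.Slice(s, func(i, j int) bool {
		if s[i].free != s[j].free {
			return s[i].free > s[j].free
		}
		return s[i].peer < s[j].peer // deterministic tie-break
	})
	out := make([]string, len(s))
	for i, x := range s {
		out[i] = x.peer
	}
	return out, nil
}

// staticInformer is a trivial Informer returning a fixed value.
type staticInformer struct{ peer, value string }

func (i staticInformer) GetMetric() Metric {
	return Metric{Name: "freespace", Peer: i.peer, Value: i.value}
}

func main() {
	informers := []Informer{staticInformer{"peerA", "100"}, staticInformer{"peerB", "500"}, staticInformer{"peerC", "250"}}
	metrics := map[string]Metric{}
	for _, inf := range informers {
		m := inf.GetMetric()
		metrics[m.Peer] = m
	}
	var alloc Allocator = freeSpaceAllocator{}
	prefs, _ := alloc.Allocate(metrics)
	fmt.Println(prefs) // [peerB peerC peerA]
}
```

A caller would then take the first replication-factor entries of the preference list as the pin's allocations.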
Hector Sanjuan
9deb56b762 Improve raft log forwarder
It was very idiotic to poll a buffer, read it and then write, rather than
just having a custom writer.

License: MIT
Signed-off-by: Hector Sanjuan <hector@protocol.ai>
2017-02-09 22:31:04 +01:00
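A custom writer here means any io.Writer that forwards each chunk as soon as it is written, so no buffer needs polling. This is a sketch with a hypothetical logForwarder type, demonstrated with the standard log package (which calls Write once per entry); how it is wired into Raft's logging configuration is not shown.

```go
package main

import (
	"fmt"
	"log"
	"strings"
)

// logForwarder is a hypothetical io.Writer that forwards every written line
// to a callback instead of accumulating it in a buffer to be polled.
type logForwarder struct {
	emit func(line string)
}

func (w *logForwarder) Write(p []byte) (int, error) {
	for _, line := range strings.Split(strings.TrimRight(string(p), "\n"), "\n") {
		if line != "" {
			w.emit(line)
		}
	}
	return len(p), nil // the whole chunk counts as written
}

func main() {
	var got []string
	l := log.New(&logForwarder{emit: func(s string) { got = append(got, s) }}, "[raft] ", 0)
	l.Println("entering follower state")
	fmt.Println(got) // [[raft] entering follower state]
}
```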
Hector Sanjuan
0e7091c6cb Move testing mocks to subpackage so they can be re-used
Related to #18

License: MIT
Signed-off-by: Hector Sanjuan <hector@protocol.ai>
2017-02-09 17:51:19 +01:00
Hector Sanjuan
c0697599ac Add nice text formatters to the ipfs-cluster-ctl app
License: MIT
Signed-off-by: Hector Sanjuan <hector@protocol.ai>
2017-02-09 16:30:53 +01:00
Hector Sanjuan
1b3d04e18b Move all API-related types to the /api subpackage.
At the beginning we opted for native types which were
serializable (PinInfo had a CidStr field instead of Cid).

Now we provide types in two versions: native and serializable.

Go methods use the native versions. The other APIs (REST/RPC) always use
the serializable versions. Methods are provided to convert between the
two.

The reason for moving these out of the way is to be able to re-use
type definitions when parsing API responses in `ipfs-cluster-ctl` or
any other clients that come up. API responses are just the serializable
version of types in JSON encoding. This also avoids having
duplicate type definitions and parsing methods everywhere.

License: MIT
Signed-off-by: Hector Sanjuan <hector@protocol.ai>
2017-02-09 16:30:53 +01:00
Hector Sanjuan
08a0261aae Update Captain's log
License: MIT
Signed-off-by: Hector Sanjuan <hector@protocol.ai>
2017-02-08 20:09:15 +01:00
Hector Sanjuan
33095e6dc6 Merge pull request #40 from ipfs/24-auto-join-bis-bis
Fix #24: Auto-join and auto-leave operations for Cluster
2017-02-08 12:25:41 +01:00
Hector Sanjuan
34fdc329fc Fix #24: Auto-join and auto-leave operations for Cluster
This is the third implementation attempt. This time, rather than
broadcasting PeerAdd/Join requests to the whole cluster, we use the
consensus log to broadcast new peers joining.

This makes it easier to recover from errors and to know who exactly
is a member of a cluster and who is not. The consensus is, after all,
meant to agree on things, and the list of cluster peers is something
everyone has to agree on.

Raft itself uses a special log operation to maintain the peer set.

The tests are almost unchanged from the previous attempts, so behaviour should
be the same, except that it doesn't seem possible to bootstrap a bunch of nodes
at the same time using different bootstrap nodes. It works when using
the same one. I'm not sure this worked before either, but the code is
simpler than recursively contacting peers, and scales better for
larger clusters.

Nodes have to be careful about joining clusters while keeping the state
from a different cluster (disjoint logs). This may cause problems with
Raft.

License: MIT
Signed-off-by: Hector Sanjuan <hector@protocol.ai>
2017-02-07 18:46:09 +01:00
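The idea of using the consensus log for membership is that every member applies the same committed operations in the same order, so all of them derive the same peer set. This sketch shows that generic replicated-state-machine idea with hypothetical LogOp and peerSet types; it does not model Raft's own configuration-change entries.

```go
package main

import (
	"fmt"
	"sort"
)

// OpType and LogOp are hypothetical consensus log entries.
type OpType int

const (
	OpPeerAdd OpType = iota
	OpPeerRm
)

type LogOp struct {
	Type OpType
	Peer string
}

// peerSet is the membership state each member derives from the log.
type peerSet map[string]bool

func (s peerSet) apply(op LogOp) {
	switch op.Type {
	case OpPeerAdd:
		s[op.Peer] = true
	case OpPeerRm:
		delete(s, op.Peer)
	}
}

func (s peerSet) list() []string {
	out := make([]string, 0, len(s))
	for p := range s {
		out = append(out, p)
	}
	sort.Strings(out)
	return out
}

func main() {
	// The agreed log, in the order the consensus committed it.
	entries := []LogOp{{OpPeerAdd, "peerA"}, {OpPeerAdd, "peerB"}, {OpPeerAdd, "peerC"}, {OpPeerRm, "peerB"}}
	a, b := peerSet{}, peerSet{}
	for _, op := range entries {
		a.apply(op)
		b.apply(op)
	}
	fmt.Println(a.list(), b.list()) // [peerA peerC] [peerA peerC]
}
```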
Hector Sanjuan
4e0407ff5c Add ascii diagram to ipfs-cluster-service help. Add explicit "run" command.
License: MIT
Signed-off-by: Hector Sanjuan <hector@protocol.ai>
2017-02-02 14:34:51 +01:00
Hector Sanjuan
dd29be6dc7 Merge pull request #37 from ipfs/10-cluster-peers
PeerAdd and PeerRemove() functionality
2017-02-02 14:26:07 +01:00
Hector Sanjuan
89ecc1ce89 Encapsulate Raft functions better and simplify the Consensus component
License: MIT
Signed-off-by: Hector Sanjuan <hector@protocol.ai>
2017-02-02 13:51:49 +01:00
Hector Sanjuan
6c18c02106 Issue #10: peers/add and peers/rm feature + tests
This commit adds PeerAdd() and PeerRemove() endpoints, CLI support
and tests. Peer management is a delicate issue because of how the consensus
works underneath and the places that need to track such peers.

When adding a peer the procedure is as follows:

* Try to open a connection to the new peer and abort if not reachable
* Broadcast a PeerManagerAddPeer operation which tells all cluster members
to add the new Peer. The Raft leader will add it to Raft's peerset and
the multiaddress will be saved in the ClusterPeers configuration key.
* If the above fails because some cluster node is not responding,
broadcast a PeerRemove() and try to undo any damage.
* If the broadcast succeeds, send our ClusterPeers to the new Peer along with
the local multiaddress we are using in the connection opened in the
first step (that is the multiaddress through which the other peer can reach us)
* The new peer updates its configuration with the new list and joins
the consensus

License: MIT
Signed-off-by: Hector Sanjuan <hector@protocol.ai>
2017-02-02 13:51:49 +01:00
Hector Sanjuan
77b8830419 Rename captain's log to CAPTAIN.LOG.md
License: MIT
Signed-off-by: Hector Sanjuan <hector@protocol.ai>
2017-01-27 15:33:45 +01:00