Commit Graph

184 Commits

Author SHA1 Message Date
Hector Sanjuan
ca3fe646b1 Fix race condition when shutting down and watchPeers() run
License: MIT
Signed-off-by: Hector Sanjuan <code@hector.link>
2018-10-29 12:03:57 +01:00
Hector Sanjuan
765987ef2c Improvements
License: MIT
Signed-off-by: Hector Sanjuan <code@hector.link>
2018-10-29 12:03:32 +01:00
Hector Sanjuan
85a7dc5114 Allocate memory for response rpc types
Might trigger a very weird race when pointing to nil

License: MIT
Signed-off-by: Hector Sanjuan <code@hector.link>
2018-10-29 12:03:03 +01:00
Hector Sanjuan
19b1124999 Make metrics human
Issue #572 exposes metrics but they carry the peer ID in binary.

This was ok with our internal codecs but it doesn't seem to work
very well with json, and makes the output format unusable.

This makes the Metric.Peer field a string.

Additinoally, fixes calling the command without arguments and displaying
the date in the right format.

License: MIT
Signed-off-by: Hector Sanjuan <code@hector.link>
2018-10-26 14:11:30 +02:00
Hector Sanjuan
7d16108751 Start using libp2p/go-libp2p-gorpc
License: MIT
Signed-off-by: Hector Sanjuan <code@hector.link>
2018-10-17 15:28:03 +02:00
Hector Sanjuan
398cdaa957
Merge pull request #549 from ipfs/fix/543-add-unhealthy
Fix #543: Use only healthy peers when adding everywhere (+tests)
2018-09-27 16:52:12 +02:00
Hector Sanjuan
cbdd075bb2
Merge pull request #551 from ipfs/fix/minor-version-compat
Allow running peers with different cluster versions in the same cluster
2018-09-27 08:13:53 +02:00
Hector Sanjuan
e7b1eacf83 Fix #543: Use only healthy peers when adding everywhere (+tests)
License: MIT
Signed-off-by: Hector Sanjuan <code@hector.link>
2018-09-27 08:12:29 +02:00
Hector Sanjuan
59714f69d4 Allow running peers with different cluster versions in the same cluster
This patch modifies the RPC protocol tag to use Major and Minor parts of the
version and not all of it.

This means all peers on the 0.5.x can run in the same cluster.

As cluster has become more mature and I see less risks in letting peers from
similar versions run together. This is useful when upgrading too.

License: MIT
Signed-off-by: Hector Sanjuan <code@hector.link>
2018-09-27 07:35:57 +02:00
Kishan Sagathiya
c62c922e87 Issue #446 Adding peername to PinInfo
Avoid typecasting

License: MIT
Signed-off-by: Kishan Mohanbhai Sagathiya <kishansagathiya@gmail.com>
2018-09-26 08:35:56 +05:30
Kishan Sagathiya
2cd4420ee8 Issue #446 Adding peername to PinInfo
Fixing errors and better code placement

License: MIT
Signed-off-by: Kishan Mohanbhai Sagathiya <kishansagathiya@gmail.com>
2018-09-24 23:20:37 +05:30
Kishan Sagathiya
773b4de1f0 Issue #446 Adding peername to PinInfo
Removed comments and code used for debugging

License: MIT
Signed-off-by: Kishan Mohanbhai Sagathiya <kishansagathiya@gmail.com>
2018-09-24 23:04:16 +05:30
Kishan Sagathiya
ef85ba8780 Issue #446 Adding peername to PinInfo
This commit adds peername to PinInfo and GlobalPinInfo so that we have
a nicer and more meaningfull output for `ipfs-cluster-ctl` queries like
`status`, `sync` and `recover`

License: MIT
Signed-off-by: Kishan Mohanbhai Sagathiya <kishansagathiya@gmail.com>
2018-09-24 23:04:16 +05:30
Adrian Lanzafame
31474f6490
update go-cid and go-libp2p
License: MIT
Signed-off-by: Adrian Lanzafame <adrianlanzafame92@gmail.com>
2018-09-24 11:35:38 +10:00
Hector Sanjuan
5bbc699bb4 Issue #340: Fix some data races
Unfortunately, there are still some data races in yamux
https://github.com/libp2p/go-libp2p/issues/396 so we can't
enable this by default.

License: MIT
Signed-off-by: Hector Sanjuan <code@hector.link>
2018-08-15 12:27:01 +02:00
Hector Sanjuan
1b8967aa46 Merge branch 'master' into feat/sharding-v1
License: MIT
Signed-off-by: Hector Sanjuan <code@hector.link>
2018-08-14 11:40:03 +02:00
Hector Sanjuan
f4455d3733 Address comments
License: MIT
Signed-off-by: Hector Sanjuan <code@hector.link>
2018-08-13 11:56:16 +02:00
Hector Sanjuan
50fc3c4e95 Address comments from review
License: MIT
Signed-off-by: Hector Sanjuan <code@hector.link>
2018-08-09 13:22:47 +02:00
Hector Sanjuan
0151f5e312 Fixes for adding: set default timeouts to 0. Improve flags and param names.
License: MIT
Signed-off-by: Hector Sanjuan <code@hector.link>
2018-08-08 21:11:26 +02:00
Hector Sanjuan
87f4fcf958 Make the adder modules ipld.DAGService modules.
This removes a bunch of the channel dance and block forwarding
by having the adder submodules be DAGServices themselves and take
Add() directly from the ipfsAdder.

License: MIT
Signed-off-by: Hector Sanjuan <code@hector.link>
2018-08-07 20:12:05 +02:00
Hector Sanjuan
6ffc8b8347 Update metrics regularly after a few blockputs
License: MIT
Signed-off-by: Hector Sanjuan <code@hector.link>
2018-08-07 20:12:05 +02:00
Hector Sanjuan
3575f05753 Support adding Output (api.AddedOutput)
This was a long FIXME/TODO. Handling adding output and
reporting to the client of the progress of the adding process.

This attempts to do it. It is not sure that it works correctly
(response body being written while the multipart request is still being read)

License: MIT
Signed-off-by: Hector Sanjuan <code@hector.link>
2018-08-07 20:12:05 +02:00
Hector Sanjuan
c81f61eeea Make sure sync and recover operations receive all cids in a clusterDAG.
Cleanup some code and fixmes.

License: MIT
Signed-off-by: Hector Sanjuan <code@hector.link>
2018-08-07 20:12:05 +02:00
Hector Sanjuan
8f1a15b279 Move adder.Params to api.AddParams. Re-use in other modules
License: MIT
Signed-off-by: Hector Sanjuan <code@hector.link>
2018-08-07 20:12:05 +02:00
Hector Sanjuan
aa5589d0d8 codeclimate
License: MIT
Signed-off-by: Hector Sanjuan <code@hector.link>
2018-08-07 20:12:05 +02:00
Hector Sanjuan
327a81b85a golint govets
License: MIT
Signed-off-by: Hector Sanjuan <code@hector.link>
2018-08-07 20:12:05 +02:00
Hector Sanjuan
623120fd50 Start cluster tests
License: MIT
Signed-off-by: Hector Sanjuan <code@hector.link>
2018-08-07 20:12:05 +02:00
Hector Sanjuan
65dc17a78b testfixing
License: MIT
Signed-off-by: Hector Sanjuan <code@hector.link>
2018-08-07 20:12:05 +02:00
Hector Sanjuan
a96241941e WIP
License: MIT
Signed-off-by: Hector Sanjuan <code@hector.link>
2018-08-07 20:12:05 +02:00
Wyatt Daviau
82facd3629 fix for pinning same cid sharded and unsharded error
License: MIT
Signed-off-by: Wyatt Daviau <wdaviau@cs.stanford.edu>
2018-08-07 20:11:24 +02:00
Wyatt Daviau
2bc295ca9f address comments:
rebase, cleanup and separate tests,
AddParams type replaces map of strings

License: MIT
Signed-off-by: Wyatt Daviau <wdaviau@cs.stanford.edu>
2018-08-07 20:11:24 +02:00
Wyatt Daviau
f7c3dcce5b removing pointer across rpc
instead add logic is a submodule imported in api and cluster comp

License: MIT
Signed-off-by: Wyatt Daviau <wdaviau@cs.stanford.edu>
2018-08-07 20:11:24 +02:00
Wyatt Daviau
1704295331 Refactor add & general cleanup
addFile function is now a Cluster method accessed by RPC
residue from attempting to stream responses removed
ipfs-cluster-ctl ls bug fixed
problem with importer/add not printing resolved
new test now checks for this

License: MIT
Signed-off-by: Wyatt Daviau <wdaviau@cs.stanford.edu>
2018-08-07 20:11:23 +02:00
Wyatt Daviau
238f3726f3 Pin datastructure updated to support sharding
4 PinTypes specify how CID is pinned
Changes to Pin and Unpin to handle different PinTypes
Tests for different PinTypes
Migration for new state format using new Pin datastructures
Visibility of the PinTypes used internally limited by default

License: MIT
Signed-off-by: Wyatt Daviau <wdaviau@cs.stanford.edu>
2018-08-07 20:11:23 +02:00
Wyatt Daviau
e78ccbf6f4 Begin sharding component work:
Write basic scaffolding to include a sharding component in cluster
Sketch out a high level implementation in pseudo code
Share thoughts on upcoming design challenges

License: MIT
Signed-off-by: Wyatt Daviau <wdaviau@cs.stanford.edu>
2018-08-07 20:11:23 +02:00
Hector Sanjuan
d3d1f960f5 Feat: Enable DHT-based peer discovery and routing for cluster peers
This uses go-libp2p-kad-dht as routing provider for the Cluster Peers.

This means that:

* A cluster peer can discover other Cluster peers even if they are
not in their peerstore file.
* We remove a bunch of code sending and receiving peers multiaddresses
when a new peer was added to the Cluster.
* PeerAdd now takes an ID and not a multiaddress. We do not need to
ask the new peer which is our external multiaddress nor broadcast
the new multiaddress to everyone. This will fix problems when bootstrapping
a new peer to the Cluster while not all the other peers are online.
* Adding a new peer does not mean to open connections to all peers
anymore. The number of connections will be made according to the DHT
parameters (this is good to have for future work)

The that detecting a peer addition in the watchPeers() function does
no longer mean that we have connected to it or that we know its
multiaddresses. Therefore it's no point to save the peerstore in these
events anymore.

Here a question opens, should we save the peerstore at all, and should we
save multiaddresses only for cluster peers, or for everyone known?
Currently, the peerstore is only updated on clean shutdown,
and it is updated with all the multiaddresses known, and not limited to
peer IDs in the cluster, (because, why not).

License: MIT
Signed-off-by: Hector Sanjuan <code@hector.link>
2018-07-24 15:33:41 +02:00
Adrian Lanzafame
c89508035a Maptracker: extract optracker and make improvements
License: MIT
Signed-off-by: Hector Sanjuan <code@hector.link>
2018-05-28 11:59:26 +02:00
Hector Sanjuan
4d8f975d9b StateSync(): some improvements
This commit:

* Does not collect and return changed items when doing StateSync (they are
not used)
* Removes the StateSync RPC method (no longer used)
* Uses tracker.StatusAll() rather than requesting Status on each Cid (should
be faster with upcoming pintracker)
* Does not launch a go-routine to track every item. Track is an async
operation. This likely causes 1000s goroutines to be started with no good
reason.

License: MIT
Signed-off-by: Hector Sanjuan <code@hector.link>
2018-05-25 09:58:18 +02:00
Hector Sanjuan
e4844ca819 Monitor: address comments
License: MIT
Signed-off-by: Hector Sanjuan <code@hector.link>
2018-05-09 11:01:52 +02:00
Hector Sanjuan
a9d6fe3479 Types: rename metric.SetTTLDuration to metric.SetTTL
GetTTL returns duration. SetTTL should take duration too, not seconds.
This removes the original SetTTL method which used seconds.

License: MIT
Signed-off-by: Hector Sanjuan <code@hector.link>
2018-05-07 14:26:06 +02:00
Hector Sanjuan
72e1d64de2 Fix publish cancelling contexts too early.
License: MIT
Signed-off-by: Hector Sanjuan <code@hector.link>
2018-05-07 14:26:06 +02:00
Hector Sanjuan
3c3341e491 Monitor: add PublishMetric() to component interface
The monitor component should be in charge of deciding how it is
best to send metrics to other peers and what that means.

This adds the PublishMetric() method to the component interface
and moves that functionality from Cluster main component to the
basic monitor.

There is a behaviour change. Before, the metrics where sent only to
the leader, while the leader was the only peer to broadcast them everywhere.
Now, all peers broadcast all metrics everywhere. This is mostly
because we should not rely on the consensus layer providing a Leader(), so
we are taking the chance to remove this dependency.

Note that in any-case, pubsub monitoring should replace the
existing basic monitor. This is just paving the ground.

Additionally, in order to not duplicate the multiRPC code
in the monitor, I have moved that functionality to go-libp2p-gorpc
and added an rpcutil library to cluster which includes useful
methods to perform multiRPC requests (some of them existed in
util.go, others are new and help handling multiple contexts etc).

License: MIT
Signed-off-by: Hector Sanjuan <code@hector.link>
2018-05-07 14:26:06 +02:00
Hector Sanjuan
33d9cdd3c4 Feat: emancipate Consensus from the Cluster component
This commit promotes the Consensus component (and Raft) to become a fully
independent thing like other components, passed to NewCluster during
initialization. Cluster (main component) no longer creates the consensus
layer internally. This has triggered a number of breaking changes
that I will explain below.

Motivation: Future work will require the possibility of running Cluster
with a consensus layer that is not Raft. The "consensus" layer is in charge
of maintaining two things:
  * The current cluster peerset, as required by the implementation
  * The current cluster pinset (shared state)

While the pinset maintenance has always been in the consensus layer, the
peerset maintenance was handled by the main component (starting by the "peers"
key in the configuration) AND the Raft component (internally)
and this generated lots of confusion: if the user edited the peers in the
configuration they would be greeted with an error.

The bootstrap process (adding a peer to an existing cluster) and configuration
key also complicated many things, since the main component did it, but only
when the consensus was initialized and in single peer mode.

In all this we also mixed the peerstore (list of peer addresses in the libp2p
host) with the peerset, when they need not to be linked.

By initializing the consensus layer before calling NewCluster, all the
difficulties in maintaining the current implementation in the same way
have come to light. Thus, the following changes have been introduced:

* Remove "peers" and "bootstrap" keys from the configuration: we no longer
edit or save the configuration files. This was a very bad practice, requiring
write permissions by the process to the file containing the private key and
additionally made things like Puppet deployments of cluster difficult as
configuration would mutate from its initial version. Needless to say all the
maintenance associated to making sure peers and bootstrap had correct values
when peers are bootstrapped or removed. A loud and detailed error message has
been added when staring cluster with an old config, along with instructions on
how to move forward.

* Introduce a PeerstoreFile ("peerstore") which stores peer addresses: in
ipfs, the peerstore is not persisted because it can be re-built from the
network bootstrappers and the DHT. Cluster should probably also allow
discoverability of peers addresses (when not bootstrapping, as in that case
we have it), but in the meantime, we will read and persist the peerstore
addresses for cluster peers in this file, different from the configuration.
Note that dns multiaddresses are now fully supported and no IPs are saved
when we have DNS multiaddresses for a peer.

* The former "peer_manager" code is now a pstoremgr module, providing utilities
to parse, add, list and generally maintain the libp2p host peerstore, including
operations on the PeerstoreFile. This "pstoremgr" can now also be extended to
perform address autodiscovery and other things indepedently from Cluster.

* Create and initialize Raft outside of the main Cluster component: since we
can now launch Raft independently from Cluster, we have more degrees of
freedom. A new "staging" option when creating the object allows a raft peer to
be launched in Staging mode, waiting to be added to a running consensus, and
thus, not electing itself as leader or doing anything like we were doing
before. This additionally allows us to track when the peer has become a
Voter, which only happens when it's caught up with the state, something that
was wonky previously.

* The raft configuration now includes an InitPeerset key, which allows to
provide a peerset for new peers and which is ignored when staging==true. The
whole Raft initialization code is way cleaner and stronger now.

* Cluster peer bootsrapping is now an ipfs-cluster-service feature. The
--bootstrap flag works as before (additionally allowing comma-separated-list
of entries). What bootstrap does, is to initialize Raft with staging == true,
and then call Join in the main cluster component. Only when the Raft peer
transitions to Voter, consensus becomes ready, and cluster becomes Ready.
This is cleaner, works better and is less complex than before (supporting
both flags and config values). We also backup and clean the state whenever
we are boostrapping, automatically

* ipfs-cluster-service no longer runs the daemon. Starting cluster needs
now "ipfs-cluster-service daemon". The daemon specific flags (bootstrap,
alloc) are now flags for the daemon subcommand. Here we mimic ipfs ("ipfs"
does not start the daemon but print help) and pave the path for merging both
service and ctl in the future.

While this brings some breaking changes, it significantly reduces the
complexity of the configuration, the code and most importantly, the
documentation. It should be easier now to explain the user what is the
right way to launch a cluster peer, and more difficult to make mistakes.

As a side effect, the PR also:

* Fixes #381 - peers with dynamic addresses
* Fixes #371 - peers should be Raft configuration option
* Fixes #378 - waitForUpdates may return before state fully synced
* Fixes #235 - config option shadowing (no cfg saves, no need to shadow)

License: MIT
Signed-off-by: Hector Sanjuan <code@hector.link>
2018-05-07 07:39:41 +02:00
Sina Mahmoodi
d8f7a2adcc cluster: add version diff log to start errors
License: MIT
Signed-off-by: Sina Mahmoodi <itz.s1na@gmail.com>
2018-04-24 14:27:15 +02:00
Sina Mahmoodi
03cc809708 config: Add log and testcase for disable_repinning
* Test case creates a bunch of clusters, assigns a pin with replica factor
of n-1 to them, and removes one of the peers randomly. It then tests
to check that the number of clusters pinning the cid is n-2.
* Add warn log to let user know that due to disable_repinning option,
the cluster won't attempt to re-assign the pin.

License: MIT
Signed-off-by: Sina Mahmoodi <itz.s1na@gmail.com>
2018-04-23 22:01:52 +02:00
Sina Mahmoodi
0954c6d6fa Add disable_repinning cluster option
License: MIT
Signed-off-by: Sina Mahmoodi <itz.s1na@gmail.com>
2018-04-22 18:40:46 +02:00
Hector Sanjuan
dd4128affc Fix #339: Reduce Sleeps in tests
License: MIT
Signed-off-by: Hector Sanjuan <code@hector.link>
2018-04-05 16:49:26 +02:00
Hector Sanjuan
58acf16efa cluster: introduce PeerWatchInterval config option.
It should provide a way to speed up peer list updates when
peers join/part. It was hardcoded.

License: MIT
Signed-off-by: Hector Sanjuan <code@hector.link>
2018-04-05 16:49:26 +02:00
Hector Sanjuan
a4adce6592
Merge pull request #349 from ipfs/feat/restapi-libp2p
Feat #305: Libp2p support for REST API
2018-03-26 14:22:39 +02:00
Hector Sanjuan
6777122abf rest/libp2p-http: address @zenground0 comments
License: MIT
Signed-off-by: Hector Sanjuan <code@hector.link>
2018-03-20 19:51:57 +01:00