Commit Graph

148 Commits

Author SHA1 Message Date
Adrian Lanzafame
c89508035a Maptracker: extract optracker and make improvements
License: MIT
Signed-off-by: Hector Sanjuan <code@hector.link>
2018-05-28 11:59:26 +02:00
Hector Sanjuan
4d8f975d9b StateSync(): some improvements
This commit:

* Does not collect and return changed items when doing StateSync (they are
not used)
* Removes the StateSync RPC method (no longer used)
* Uses tracker.StatusAll() rather than requesting Status on each Cid (should
be faster with upcoming pintracker)
* Does not launch a go-routine to track every item. Track is an async
operation. This likely causes 1000s goroutines to be started with no good
reason.

License: MIT
Signed-off-by: Hector Sanjuan <code@hector.link>
2018-05-25 09:58:18 +02:00
Hector Sanjuan
e4844ca819 Monitor: address comments
License: MIT
Signed-off-by: Hector Sanjuan <code@hector.link>
2018-05-09 11:01:52 +02:00
Hector Sanjuan
a9d6fe3479 Types: rename metric.SetTTLDuration to metric.SetTTL
GetTTL returns duration. SetTTL should take duration too, not seconds.
This removes the original SetTTL method which used seconds.

License: MIT
Signed-off-by: Hector Sanjuan <code@hector.link>
2018-05-07 14:26:06 +02:00
Hector Sanjuan
72e1d64de2 Fix publish cancelling contexts too early.
License: MIT
Signed-off-by: Hector Sanjuan <code@hector.link>
2018-05-07 14:26:06 +02:00
Hector Sanjuan
3c3341e491 Monitor: add PublishMetric() to component interface
The monitor component should be in charge of deciding how it is
best to send metrics to other peers and what that means.

This adds the PublishMetric() method to the component interface
and moves that functionality from Cluster main component to the
basic monitor.

There is a behaviour change. Before, the metrics where sent only to
the leader, while the leader was the only peer to broadcast them everywhere.
Now, all peers broadcast all metrics everywhere. This is mostly
because we should not rely on the consensus layer providing a Leader(), so
we are taking the chance to remove this dependency.

Note that in any-case, pubsub monitoring should replace the
existing basic monitor. This is just paving the ground.

Additionally, in order to not duplicate the multiRPC code
in the monitor, I have moved that functionality to go-libp2p-gorpc
and added an rpcutil library to cluster which includes useful
methods to perform multiRPC requests (some of them existed in
util.go, others are new and help handling multiple contexts etc).

License: MIT
Signed-off-by: Hector Sanjuan <code@hector.link>
2018-05-07 14:26:06 +02:00
Hector Sanjuan
33d9cdd3c4 Feat: emancipate Consensus from the Cluster component
This commit promotes the Consensus component (and Raft) to become a fully
independent thing like other components, passed to NewCluster during
initialization. Cluster (main component) no longer creates the consensus
layer internally. This has triggered a number of breaking changes
that I will explain below.

Motivation: Future work will require the possibility of running Cluster
with a consensus layer that is not Raft. The "consensus" layer is in charge
of maintaining two things:
  * The current cluster peerset, as required by the implementation
  * The current cluster pinset (shared state)

While the pinset maintenance has always been in the consensus layer, the
peerset maintenance was handled by the main component (starting by the "peers"
key in the configuration) AND the Raft component (internally)
and this generated lots of confusion: if the user edited the peers in the
configuration they would be greeted with an error.

The bootstrap process (adding a peer to an existing cluster) and configuration
key also complicated many things, since the main component did it, but only
when the consensus was initialized and in single peer mode.

In all this we also mixed the peerstore (list of peer addresses in the libp2p
host) with the peerset, when they need not to be linked.

By initializing the consensus layer before calling NewCluster, all the
difficulties in maintaining the current implementation in the same way
have come to light. Thus, the following changes have been introduced:

* Remove "peers" and "bootstrap" keys from the configuration: we no longer
edit or save the configuration files. This was a very bad practice, requiring
write permissions by the process to the file containing the private key and
additionally made things like Puppet deployments of cluster difficult as
configuration would mutate from its initial version. Needless to say all the
maintenance associated to making sure peers and bootstrap had correct values
when peers are bootstrapped or removed. A loud and detailed error message has
been added when staring cluster with an old config, along with instructions on
how to move forward.

* Introduce a PeerstoreFile ("peerstore") which stores peer addresses: in
ipfs, the peerstore is not persisted because it can be re-built from the
network bootstrappers and the DHT. Cluster should probably also allow
discoverability of peers addresses (when not bootstrapping, as in that case
we have it), but in the meantime, we will read and persist the peerstore
addresses for cluster peers in this file, different from the configuration.
Note that dns multiaddresses are now fully supported and no IPs are saved
when we have DNS multiaddresses for a peer.

* The former "peer_manager" code is now a pstoremgr module, providing utilities
to parse, add, list and generally maintain the libp2p host peerstore, including
operations on the PeerstoreFile. This "pstoremgr" can now also be extended to
perform address autodiscovery and other things indepedently from Cluster.

* Create and initialize Raft outside of the main Cluster component: since we
can now launch Raft independently from Cluster, we have more degrees of
freedom. A new "staging" option when creating the object allows a raft peer to
be launched in Staging mode, waiting to be added to a running consensus, and
thus, not electing itself as leader or doing anything like we were doing
before. This additionally allows us to track when the peer has become a
Voter, which only happens when it's caught up with the state, something that
was wonky previously.

* The raft configuration now includes an InitPeerset key, which allows to
provide a peerset for new peers and which is ignored when staging==true. The
whole Raft initialization code is way cleaner and stronger now.

* Cluster peer bootsrapping is now an ipfs-cluster-service feature. The
--bootstrap flag works as before (additionally allowing comma-separated-list
of entries). What bootstrap does, is to initialize Raft with staging == true,
and then call Join in the main cluster component. Only when the Raft peer
transitions to Voter, consensus becomes ready, and cluster becomes Ready.
This is cleaner, works better and is less complex than before (supporting
both flags and config values). We also backup and clean the state whenever
we are boostrapping, automatically

* ipfs-cluster-service no longer runs the daemon. Starting cluster needs
now "ipfs-cluster-service daemon". The daemon specific flags (bootstrap,
alloc) are now flags for the daemon subcommand. Here we mimic ipfs ("ipfs"
does not start the daemon but print help) and pave the path for merging both
service and ctl in the future.

While this brings some breaking changes, it significantly reduces the
complexity of the configuration, the code and most importantly, the
documentation. It should be easier now to explain the user what is the
right way to launch a cluster peer, and more difficult to make mistakes.

As a side effect, the PR also:

* Fixes #381 - peers with dynamic addresses
* Fixes #371 - peers should be Raft configuration option
* Fixes #378 - waitForUpdates may return before state fully synced
* Fixes #235 - config option shadowing (no cfg saves, no need to shadow)

License: MIT
Signed-off-by: Hector Sanjuan <code@hector.link>
2018-05-07 07:39:41 +02:00
Sina Mahmoodi
d8f7a2adcc cluster: add version diff log to start errors
License: MIT
Signed-off-by: Sina Mahmoodi <itz.s1na@gmail.com>
2018-04-24 14:27:15 +02:00
Sina Mahmoodi
03cc809708 config: Add log and testcase for disable_repinning
* Test case creates a bunch of clusters, assigns a pin with replica factor
of n-1 to them, and removes one of the peers randomly. It then tests
to check that the number of clusters pinning the cid is n-2.
* Add warn log to let user know that due to disable_repinning option,
the cluster won't attempt to re-assign the pin.

License: MIT
Signed-off-by: Sina Mahmoodi <itz.s1na@gmail.com>
2018-04-23 22:01:52 +02:00
Sina Mahmoodi
0954c6d6fa Add disable_repinning cluster option
License: MIT
Signed-off-by: Sina Mahmoodi <itz.s1na@gmail.com>
2018-04-22 18:40:46 +02:00
Hector Sanjuan
dd4128affc Fix #339: Reduce Sleeps in tests
License: MIT
Signed-off-by: Hector Sanjuan <code@hector.link>
2018-04-05 16:49:26 +02:00
Hector Sanjuan
58acf16efa cluster: introduce PeerWatchInterval config option.
It should provide a way to speed up peer list updates when
peers join/part. It was hardcoded.

License: MIT
Signed-off-by: Hector Sanjuan <code@hector.link>
2018-04-05 16:49:26 +02:00
Hector Sanjuan
a4adce6592
Merge pull request #349 from ipfs/feat/restapi-libp2p
Feat #305: Libp2p support for REST API
2018-03-26 14:22:39 +02:00
Hector Sanjuan
6777122abf rest/libp2p-http: address @zenground0 comments
License: MIT
Signed-off-by: Hector Sanjuan <code@hector.link>
2018-03-20 19:51:57 +01:00
Hector Sanjuan
09f4c9fce3 rest/libp2p-http: address lanzafame's review
License: MIT
Signed-off-by: Hector Sanjuan <code@hector.link>
2018-03-20 19:35:42 +01:00
Hector Sanjuan
a6acb72a2c Improve errors for bootstraps
License: MIT
Signed-off-by: Hector Sanjuan <code@hector.link>
2018-03-20 14:27:49 +01:00
Hector Sanjuan
a73d7e6f7e Relocate multiaddrJoin and multiaddrSplit to api/types.h
So they can serve as multi-module helpers without having circular deps.

License: MIT
Signed-off-by: Hector Sanjuan <code@hector.link>
2018-03-16 13:37:32 +01:00
Hector Sanjuan
de07a9dd70
Merge pull request #344 from ipfs/feat/wrong-secrets
Fix #167: Useful messages when consensus doesn't start
2018-03-16 11:41:05 +01:00
Hector Sanjuan
5956dce69f Create cluster Host in ipfs-cluster-service.
License: MIT
Signed-off-by: Hector Sanjuan <code@hector.link>
2018-03-15 00:04:54 +01:00
Hector Sanjuan
41b17bf477 Cluster: add libp2p host parameter to constructor.
NewCluster() now takes an optional Host parameter.
The rationale is to allow to re-use an existing libp2p Host
when creating the cluster.

The NewClusterHost method now allows to create a host
with the options used by cluster.

License: MIT
Signed-off-by: Hector Sanjuan <code@hector.link>
2018-03-15 00:04:54 +01:00
Hector Sanjuan
4f9fccde72 Addressing feedback
License: MIT
Signed-off-by: Hector Sanjuan <code@hector.link>
2018-03-13 10:37:47 +01:00
Hector Sanjuan
740f314976 Fix #167: Useful messages when consensus doesn't start
This will display a few hints when consensus fails to start.
If consensus doesn't start (normally WaitForLeader times out),
it's because of libp2p not being able to reach other peers.

This sometimes also means that the wrong protector key (secret)
is being used, even though libp2p does not give us clear
indications.

License: MIT
Signed-off-by: Hector Sanjuan <code@hector.link>
2018-03-12 22:54:59 +01:00
Wyatt Daviau
b0e8452020 fix errors
License: MIT
Signed-off-by: Wyatt Daviau <wdaviau@cs.stanford.edu>
2018-03-12 11:33:46 -04:00
Wyatt Daviau
e2c4b6f5a9 consolidate Pin and PinTo
License: MIT
Signed-off-by: Wyatt Daviau <wdaviau@cs.stanford.edu>
2018-03-09 17:16:20 -05:00
Wyatt Daviau
0a34f3382b ToPin call and priority pinning
License: MIT
Signed-off-by: Wyatt Daviau <wdaviau@cs.stanford.edu>
2018-03-09 17:13:35 -05:00
Hector Sanjuan
5ba746a9ca Fix: official builds panic on start
Since the commit variable is not set in these builds :(

License: MIT
Signed-off-by: Hector Sanjuan <code@hector.link>
2018-02-20 17:55:18 +01:00
Hector Sanjuan
ebd167edc0
Merge pull request #294 from ipfs/fix/jenkins
Fix jenkins tests
2018-01-26 12:40:09 +01:00
Wyatt Daviau
eafc747305 fix/297 Resolve the lack of snapshot pushes:
Snapshot saving state commands (upgrade and import)
now save raft config peers as consensus peers in snapshot.
Snapshot index 1 -> 2 when saving from a fresh import to force
replication when bootstrapping.

License: MIT
Signed-off-by: Wyatt Daviau <wdaviau@cs.stanford.edu>
2018-01-25 16:47:12 -05:00
Hector Sanjuan
ddb5da18c9 Tests: Bind testing clusters on random port
Jenkins likes this very much.

License: MIT
Signed-off-by: Hector Sanjuan <code@hector.link>
2018-01-24 20:16:55 +01:00
Hector Sanjuan
4b6ee706e7 Fix #222: Fix overpinning or underpinning of pins after rejoin
The StateSync() function did not take into account that the maptracker
may think that some pinned items are local/remote when they
should not be. In those cases, it needs to trigger re-tracks.

License: MIT
Signed-off-by: Hector Sanjuan <code@hector.link>
2018-01-19 22:24:03 +01:00
Hector Sanjuan
dcfc962f24 Feat #277: Address review comments
License: MIT
Signed-off-by: Hector Sanjuan <code@hector.link>
2018-01-19 22:24:03 +01:00
Hector Sanjuan
a1ab106fcc Feat #277: Improve getCurretPin() call
License: MIT
Signed-off-by: Hector Sanjuan <code@hector.link>
2018-01-19 22:24:03 +01:00
Hector Sanjuan
ae1afe3af8 Feat #277: Avoid re-pinning log entries when the new pin is the same
This ensures that we don't re-pin something which is already correctly
pinned with the same allocations.

It also ensures that we do re-pin something when the replication
factor associated to it changes.

License: MIT
Signed-off-by: Hector Sanjuan <code@hector.link>
2018-01-19 22:24:03 +01:00
Hector Sanjuan
b013850f94 Fear #277: Add test about wanted < 0
License: MIT
Signed-off-by: Hector Sanjuan <code@hector.link>
2018-01-19 22:24:03 +01:00
Hector Sanjuan
4549282cba Fix #277: Introduce maximum and minimum replication factor
This PR replaces ReplicationFactor with ReplicationFactorMax
and ReplicationFactor min.

This allows a CID to be pinned even though the desired
replication factor (max) is not reached, and prevents triggering
re-pinnings when the replication factor has not crossed the
lower threshold (min).

License: MIT
Signed-off-by: Hector Sanjuan <code@hector.link>
2018-01-16 16:36:06 +01:00
Wyatt
fc237b21d4 Feat: Enable Jenkins builds
This enables support for testing in jenkins.

Several minor adjustments have been performed to improve the probability
that the tests pass, but there are still some random
problems appearing with libp2p conections not becoming available or
stopping working (similar to travis, but perhaps more often).

MacOS and Windows builds are broken in worse ways (those issues will
need to be addressed in the future).

Thanks to @zenground0 and @victorbjelkholm for support!

License: MIT
Signed-off-by: Hector Sanjuan <code@hector.link>
2018-01-11 18:11:46 +01:00
Hector Sanjuan
2b6dfa45cd cluster-service: add version subcommand and change some startup logging
The --version flag is default from our cli library so I left that. The
version subcommand prints only the version number + the short commit
so it's a bit more easy to parse.

I have additionally reduced the amount of output on start up by converting
some messages to debug. I wish there was a level between INFO and DEBUG
though.

License: MIT
Signed-off-by: Hector Sanjuan <code@hector.link>
2017-12-13 10:25:01 +01:00
Hector Sanjuan
9a246a237d fix go vet: address go vet warnings
License: MIT
Signed-off-by: Hector Sanjuan <code@hector.link>
2017-12-06 15:15:41 +01:00
Hector Sanjuan
c0628e43ff fix golint: Address a few golint warnings
License: MIT
Signed-off-by: Hector Sanjuan <code@hector.link>
2017-12-06 15:15:38 +01:00
Hector Sanjuan
1e87fccf0e
Merge pull request #257 from te0d/feat/add-peer-identifier
Added Hostname Property to Configuration
2017-12-04 14:25:49 +01:00
Hector Sanjuan
d6a7caf7a4 Issue #259: Address CR comments
License: MIT
Signed-off-by: Hector Sanjuan <code@hector.link>
2017-12-04 13:59:48 +01:00
Tom O'Donnell (te0d)
7d43be33a4 Added Peername Configuration Test and Renamed to Peername
I've modified the peer identifier to be 'peername'. I've also
modified the TestLoadJSON to check that it is correctly read from
config and set to a default if empty.

Also added 'peername' fields to configurations for various tests.
2017-12-01 13:50:13 -05:00
Hector Sanjuan
4922c95589 Support --local parameter for Status[Local] and Sync[Local] operations
This allows to call the Rest API's status and sync endpoints with a
"?local=true" parameter. This will trigger operations but only on the
local peer. Cluster *Local and RPC-*Local methods have been accordingly,
although they are aliases for the PinTracker methods (but otherwise they
would not be exposed in external APIs). ipfs-cluster-ctl has been updated to
support the new flag.

The rationaly behind this feature is that sometimes, a single cluster peer
(or the ipfs daemon in it) is misbehaving. The user then wants to Sync,
Recover, or see Status for that single peer. This is specially relevant
when working with big pinsets in larger clusters, as a Status() call will
be considerably more expensive when broadcasted everywhere.

Note that the Rest API keeps returning GlobalPinInfo objects even on local=true
calls. This ensures that the user always gets the same datatype from an endpoint.

License: MIT
Signed-off-by: Hector Sanjuan <code@hector.link>
2017-12-01 12:56:26 +01:00
Tom O'Donnell (te0d)
d1ef3d0493 Fixes for Adding Peer Identifier from Code Review
I renamed Hostname to simply Name as to not imply relation to DNS.
Removed quotes from formatter, used helper function setting config,
and added defensive error check.
2017-11-30 15:21:53 -05:00
Hector Sanjuan
e824aea55e RecoverAll: Implement RecoverAllLocal() which recovers all pins in a peer
This adds API, RPC calls to support RecoverAllLocal() (and expose RecoverLocal()
on the Rest API too). cluster-ctl is updated accordingly.

License: MIT
Signed-off-by: Hector Sanjuan <code@hector.link>
2017-11-30 01:53:31 +01:00
Tom O'Donnell (te0d)
c6c8512a27 Added Hostname Property to Configuration
I added a "hostname" property to a node's configuration file. Its
value defaults to the hostname provided by the OS, but can be
modified to anything beside an empty string in the config file.

The "hostname" was added to the output of the "id" call. Thus, peer
hostnames are available when listing peers.
2017-11-29 15:44:31 -05:00
Wyatt
47b744f1c0 ipfs-cluster-service state upgrade cli command
ipfs-cluster-service now has a migration subcommand that upgrades
    persistant state snapshots with an out-of-date format version to the
    newest version of raft state. If all cluster members shutdown with
    consistent state, upgrade ipfs-cluster, and run the state upgrade command,
    the new version of cluster will be compatible with persistent storage.
    ipfs-cluster now validates its persistent state upon loading it and exits
    with a clear error in the case the state format version is not up to date.

    Raft snapshotting is enforced on all shutdowns and the json backup is no
    longer run.  This commit makes use of recent changes to libp2p-raft
    allowing raft states to implement their own marshaling strategies. Now
    mapstate handles the logic for its (de)serialization.  In the interest of
    supporting various potential upgrade formats the state serialization
    begins with a varint (right now one byte) describing the version.

    Some go tests are modified and a go test is added to cover new ipfs-cluster
    raft snapshot reading functions.  Sharness tests are added to cover the
    state upgrade command.
2017-11-28 22:35:48 -05:00
Hector Sanjuan
1f93662b3e cluster: get first peerset from configuration
make sure we save a new config if the new peerset
is different than the one in the configuration at
boot.

Hopefully this fixes a race condition in PeerAdd test

License: MIT
Signed-off-by: Hector Sanjuan <code@hector.link>
2017-11-15 18:01:49 +01:00
Hector Sanjuan
a656e45375 cluster: safeguard consensus not set when calling ID
SwarmConnect on the ipfs connector calls rpc Peers() which
requests IDs for every peer member. If that peer member
is booting, it might get the request after RPC is setup
but before consensus is initialized. In which case
a panic happens. Probability that this happens is small, but still.

Also increase the connect swarms delay to 30 seconds, which
should be a bit longer than the default wait_for_leader timeout,
otherwise we might connect swarms while there's not even a leader.

License: MIT
Signed-off-by: Hector Sanjuan <hector@protocol.ai>
2017-11-15 16:38:21 +01:00
Hector Sanjuan
76fe62fae4 Fix error message when not enough candidates for pinning exist
License: MIT
Signed-off-by: Hector Sanjuan <hector@protocol.ai>
2017-11-15 03:17:08 +01:00