* Test case creates a bunch of clusters, assigns a pin with a replication
factor of n-1 to them, and removes one of the peers at random. It then
checks that the number of clusters pinning the cid is n-2.
* Add a warning log to let the user know that, because the disable_repinning
option is set, the cluster won't attempt to re-assign the pin.
License: MIT
Signed-off-by: Sina Mahmoodi <itz.s1na@gmail.com>
It should provide a way to speed up peer list updates when
peers join/part. It was hardcoded.
License: MIT
Signed-off-by: Hector Sanjuan <code@hector.link>
NewCluster() now takes an optional Host parameter.
The rationale is to allow re-using an existing libp2p Host
when creating the cluster.
The NewClusterHost method now allows creating a host
with the options used by cluster.
License: MIT
Signed-off-by: Hector Sanjuan <code@hector.link>
This will display a few hints when consensus fails to start.
If consensus doesn't start (normally WaitForLeader times out),
it's because of libp2p not being able to reach other peers.
This sometimes also means that the wrong protector key (secret)
is being used, even though libp2p does not give us clear
indications.
License: MIT
Signed-off-by: Hector Sanjuan <code@hector.link>
The state commands that save snapshots (upgrade and import)
now save raft config peers as consensus peers in the snapshot.
The snapshot index changes from 1 to 2 when saving from a fresh import,
to force replication when bootstrapping.
License: MIT
Signed-off-by: Wyatt Daviau <wdaviau@cs.stanford.edu>
The StateSync() function did not take into account that the maptracker
may think that some pinned items are local/remote when they
should not be. In those cases, it needs to trigger re-tracks.
License: MIT
Signed-off-by: Hector Sanjuan <code@hector.link>
This ensures that we don't re-pin something which is already correctly
pinned with the same allocations.
It also ensures that we do re-pin something when the replication
factor associated to it changes.
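
A minimal sketch of the decision described above, assuming a simplified
pin record. The pin type and the needsRepin/sameAllocations helpers are
hypothetical names for illustration, not the cluster's actual code:

```go
package main

import (
	"fmt"
	"sort"
)

// pin is a hypothetical, simplified pin record used only for this sketch.
type pin struct {
	Cid               string
	ReplicationFactor int
	Allocations       []string // IDs of the peers holding the pin
}

// sameAllocations reports whether a and b contain the same peers,
// ignoring order. Inputs are copied so callers' slices are not reordered.
func sameAllocations(a, b []string) bool {
	if len(a) != len(b) {
		return false
	}
	x := append([]string(nil), a...)
	y := append([]string(nil), b...)
	sort.Strings(x)
	sort.Strings(y)
	for i := range x {
		if x[i] != y[i] {
			return false
		}
	}
	return true
}

// needsRepin returns true when the replication factor changed or the
// allocations differ; otherwise the item is already correctly pinned.
func needsRepin(current, incoming pin) bool {
	if current.ReplicationFactor != incoming.ReplicationFactor {
		return true
	}
	return !sameAllocations(current.Allocations, incoming.Allocations)
}

func main() {
	cur := pin{Cid: "QmExample", ReplicationFactor: 2, Allocations: []string{"peerA", "peerB"}}
	same := pin{Cid: "QmExample", ReplicationFactor: 2, Allocations: []string{"peerB", "peerA"}}
	changed := pin{Cid: "QmExample", ReplicationFactor: 3, Allocations: []string{"peerA", "peerB"}}
	fmt.Println("same allocations, repin:", needsRepin(cur, same))
	fmt.Println("factor changed, repin:", needsRepin(cur, changed))
}
```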
License: MIT
Signed-off-by: Hector Sanjuan <code@hector.link>
This PR replaces ReplicationFactor with ReplicationFactorMax
and ReplicationFactorMin.
This allows a CID to be pinned even though the desired
replication factor (max) is not reached, and prevents triggering
re-pinnings when the replication factor has not crossed the
lower threshold (min).
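
A rough sketch of how the two thresholds could interact, under the
assumption that both factors are plain positive integers. The function
names targetAllocations and needsRepin are hypothetical and only
illustrate the idea:

```go
package main

import (
	"errors"
	"fmt"
)

// targetAllocations returns how many peers should hold the pin given the
// number of candidate peers. Pinning succeeds as long as at least rplMin
// candidates exist, even if rplMax cannot be reached.
func targetAllocations(candidates, rplMin, rplMax int) (int, error) {
	if rplMin > rplMax {
		return 0, errors.New("replication factor min is larger than max")
	}
	if candidates < rplMin {
		return 0, fmt.Errorf("need at least %d peers, only %d available", rplMin, candidates)
	}
	if candidates > rplMax {
		return rplMax, nil
	}
	return candidates, nil
}

// needsRepin reports whether the number of valid allocations has dropped
// below the lower threshold. Staying between min and max triggers nothing.
func needsRepin(validAllocations, rplMin int) bool {
	return validAllocations < rplMin
}

func main() {
	n, err := targetAllocations(2, 2, 3)
	fmt.Println("allocations:", n, "error:", err)
	fmt.Println("repin with 2 valid allocations:", needsRepin(2, 2))
	fmt.Println("repin with 1 valid allocation:", needsRepin(1, 2))
}
```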
License: MIT
Signed-off-by: Hector Sanjuan <code@hector.link>
This enables support for testing in jenkins.
Several minor adjustments have been performed to improve the probability
that the tests pass, but there are still some random
problems appearing with libp2p connections not becoming available or
stopping working (similar to travis, but perhaps more often).
MacOS and Windows builds are broken in worse ways (those issues will
need to be addressed in the future).
Thanks to @zenground0 and @victorbjelkholm for support!
License: MIT
Signed-off-by: Hector Sanjuan <code@hector.link>
The --version flag is default from our cli library so I left that. The
version subcommand prints only the version number + the short commit
so it's a bit easier to parse.
I have additionally reduced the amount of output on start up by converting
some messages to debug. I wish there was a level between INFO and DEBUG
though.
License: MIT
Signed-off-by: Hector Sanjuan <code@hector.link>
I've modified the peer identifier to be 'peername'. I've also
modified the TestLoadJSON to check that it is correctly read from
config and set to a default if empty.
Also added 'peername' fields to configurations for various tests.
This allows calling the REST API's status and sync endpoints with a
"?local=true" parameter. This will trigger operations but only on the
local peer. Cluster *Local and RPC-*Local methods have been added accordingly,
although they are aliases for the PinTracker methods (but otherwise they
would not be exposed in external APIs). ipfs-cluster-ctl has been updated to
support the new flag.
The rationale behind this feature is that sometimes, a single cluster peer
(or the ipfs daemon in it) is misbehaving. The user then wants to Sync,
Recover, or see Status for that single peer. This is especially relevant
when working with big pinsets in larger clusters, as a Status() call will
be considerably more expensive when broadcast everywhere.
Note that the REST API keeps returning GlobalPinInfo objects even on local=true
calls. This ensures that the user always gets the same datatype from an endpoint.
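
A small sketch of the idea, assuming simplified stand-ins for the real
types. The pinInfo and globalPinInfo structs, the wantsLocal and wrapLocal
helpers, and the "/pins/..." path are all hypothetical. It shows reading
the "local" query parameter and wrapping a single local result so the
caller still receives the global datatype:

```go
package main

import (
	"fmt"
	"net/url"
)

// pinInfo and globalPinInfo are simplified, hypothetical stand-ins for
// the per-peer and cluster-wide status objects.
type pinInfo struct {
	Peer   string
	Status string
}

type globalPinInfo struct {
	Cid     string
	PeerMap map[string]pinInfo
}

// wantsLocal reports whether the request URL carries "?local=true".
func wantsLocal(rawURL string) (bool, error) {
	u, err := url.Parse(rawURL)
	if err != nil {
		return false, err
	}
	return u.Query().Get("local") == "true", nil
}

// wrapLocal puts a single local result into the global container so that
// the endpoint returns the same datatype in both modes.
func wrapLocal(cid string, local pinInfo) globalPinInfo {
	return globalPinInfo{
		Cid:     cid,
		PeerMap: map[string]pinInfo{local.Peer: local},
	}
}

func main() {
	local, err := wantsLocal("/pins/QmExample?local=true")
	if err != nil {
		fmt.Println("bad URL:", err)
		return
	}
	if local {
		gpi := wrapLocal("QmExample", pinInfo{Peer: "peerA", Status: "pinned"})
		fmt.Println("local status only, peers in response:", len(gpi.PeerMap))
	}
}
```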
License: MIT
Signed-off-by: Hector Sanjuan <code@hector.link>
I renamed Hostname to simply Name so as not to imply a relation to DNS.
Removed quotes from formatter, used helper function setting config,
and added defensive error check.
This adds API and RPC calls to support RecoverAllLocal() (and exposes RecoverLocal()
on the REST API too). cluster-ctl is updated accordingly.
License: MIT
Signed-off-by: Hector Sanjuan <code@hector.link>
I added a "hostname" property to a node's configuration file. Its
value defaults to the hostname provided by the OS, but can be
modified to anything other than an empty string in the config file.
The "hostname" was added to the output of the "id" call. Thus, peer
hostnames are available when listing peers.
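
The defaulting rule can be sketched roughly as follows. The peerName
helper is a hypothetical name for illustration; only os.Hostname() is a
real standard library call:

```go
package main

import (
	"fmt"
	"os"
)

// peerName returns the configured value when it is set, and falls back to
// the hostname reported by the OS when the configured value is empty.
func peerName(configured string) (string, error) {
	if configured != "" {
		return configured, nil
	}
	return os.Hostname()
}

func main() {
	name, err := peerName("")
	if err != nil {
		fmt.Println("could not determine hostname:", err)
		return
	}
	fmt.Println("peer name:", name)
}
```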
ipfs-cluster-service now has a migration subcommand that upgrades
persistent state snapshots with an out-of-date format version to the
newest version of raft state. If all cluster members shut down with
consistent state, upgrade ipfs-cluster, and run the state upgrade command,
the new version of cluster will be compatible with persistent storage.
ipfs-cluster now validates its persistent state upon loading it and exits
with a clear error in the case the state format version is not up to date.
Raft snapshotting is enforced on all shutdowns and the json backup is no
longer run. This commit makes use of recent changes to libp2p-raft
allowing raft states to implement their own marshaling strategies. Now
mapstate handles the logic for its (de)serialization. In the interest of
supporting various potential upgrade formats the state serialization
begins with a varint (right now one byte) describing the version.
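
A minimal sketch of a version-prefixed serialization, assuming an unsigned
varint and an opaque payload. The names currentVersion, marshalState and
unmarshalState are hypothetical, and the real mapstate format may differ:

```go
package main

import (
	"encoding/binary"
	"errors"
	"fmt"
)

// currentVersion is a hypothetical format version for this sketch.
const currentVersion = 1

// marshalState prefixes the payload with the format version encoded as an
// unsigned varint (a single byte for small version numbers).
func marshalState(payload []byte) []byte {
	buf := make([]byte, binary.MaxVarintLen64)
	n := binary.PutUvarint(buf, currentVersion)
	return append(buf[:n], payload...)
}

// unmarshalState reads the version prefix and returns the payload, or an
// error when the version cannot be read or is not the current one.
func unmarshalState(data []byte) ([]byte, error) {
	v, n := binary.Uvarint(data)
	if n <= 0 {
		return nil, errors.New("cannot read state format version")
	}
	if v != currentVersion {
		return nil, fmt.Errorf("state format version %d is not up to date (want %d)", v, currentVersion)
	}
	return data[n:], nil
}

func main() {
	raw := marshalState([]byte("pinset"))
	payload, err := unmarshalState(raw)
	fmt.Println(string(payload), err)

	_, err = unmarshalState([]byte{0, 'x'})
	fmt.Println("old version rejected:", err != nil)
}
```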
Some go tests are modified and a go test is added to cover new ipfs-cluster
raft snapshot reading functions. Sharness tests are added to cover the
state upgrade command.
Make sure we save a new config if the new peerset
is different from the one in the configuration at
boot.
Hopefully this fixes a race condition in the PeerAdd test.
License: MIT
Signed-off-by: Hector Sanjuan <code@hector.link>
SwarmConnect on the ipfs connector calls rpc Peers() which
requests IDs for every peer member. If that peer member
is booting, it might get the request after RPC is set up
but before consensus is initialized, in which case
a panic happens. The probability that this happens is small, but still.
Also increase the connect swarms delay to 30 seconds, which
should be a bit longer than the default wait_for_leader timeout,
otherwise we might connect swarms while there's not even a leader.
License: MIT
Signed-off-by: Hector Sanjuan <hector@protocol.ai>
This is what was likely causing PeerRemove tests to fail randomly
but very often. We cancelled the Cluster context before shutting down
the Consensus component. This killed networking and aborted
the peer remove operations when the leader is removing itself.
As a result, it would error with "leadership lost", which would
trigger a retry which would set the final error to "context cancelled"
because the shutdown of the consensus component proceeds during the
retry, cancelling the consensus context.
This does not only affect tests; it might also affect operations when
running cluster.
License: MIT
Signed-off-by: Hector Sanjuan <hector@protocol.ai>
Sometimes tests fail because the returned api.ID for a new peer
does not include the current cluster peers. This is because the
new peerset has not yet been committed in the new peer at the time
of the request.
This commit retries the request until the correct peerset
comes in, or gives up after two seconds of retrying.
Beyond the failing tests, note that the returned ID is very
user-facing and should contain the current cluster peers after adding,
and not the former peerset, at least while the peerAdd operation is allowed.
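
The retry can be sketched along these lines. waitForPeers and its
parameters are hypothetical names, the polling interval is an assumption,
and the fetch callback stands in for the actual ID request:

```go
package main

import (
	"fmt"
	"time"
)

// waitForPeers calls fetch until it returns at least want peers or the
// timeout elapses. It returns the last result and whether it was complete.
func waitForPeers(fetch func() []string, want int, timeout, interval time.Duration) ([]string, bool) {
	deadline := time.Now().Add(timeout)
	for {
		peers := fetch()
		if len(peers) >= want {
			return peers, true
		}
		if time.Now().After(deadline) {
			return peers, false // give up and return what we have
		}
		time.Sleep(interval)
	}
}

func main() {
	calls := 0
	// Simulated request: the full peerset only shows up on the third call.
	fetch := func() []string {
		calls++
		if calls < 3 {
			return []string{"peerA"}
		}
		return []string{"peerA", "peerB"}
	}
	peers, ok := waitForPeers(fetch, 2, 2*time.Second, 50*time.Millisecond)
	fmt.Println(peers, ok, "after", calls, "calls")
}
```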
License: MIT
Signed-off-by: Hector Sanjuan <hector@protocol.ai>
It seems more logical that the new peer should know how to contact
everyone before it needs to start doing it, but it probably does not
matter much. Still, more logical.
License: MIT
Signed-off-by: Hector Sanjuan <hector@protocol.ai>
This change removes the duplication in the PeerManager component:
* No more committing PeerAdd and PeerRm log entries
* The Raft peer set is the source of truth
* Basic broadcasting is used to communicate peer multiaddresses
in the cluster
* A peer can only be added in a healthy cluster
* A peer can be removed from any cluster which can still commit
* This also adds support for multiple multiaddresses per peer
License: MIT
Signed-off-by: Hector Sanjuan <hector@protocol.ai>