This was likely what caused PeerRemove tests to fail randomly
but very often. We cancelled the Cluster context before shutting down
the Consensus component. This killed networking and aborted
the peer remove operation when the leader was removing itself.
As a result, it would error with "leadership lost", which would
trigger a retry, which in turn would set the final error to
"context cancelled" because the shutdown of the consensus component
proceeds during the retry, cancelling the consensus context.
This does not only affect tests; it might also have affected
operations when running cluster.
License: MIT
Signed-off-by: Hector Sanjuan <hector@protocol.ai>
Sometimes tests fail because the returned api.ID for a new peer
does not include the current cluster peers. This is because the
new peerset has not yet been committed in the new peer at the time
of the request.
This commit retries the request until the correct peerset
comes in, or gives up after two seconds of retrying.
Beyond fixing the failing tests, note that the returned ID is very
user-facing and should contain the current cluster peers after adding,
and not the former peerset, at least while the peerAdd operation is allowed.
License: MIT
Signed-off-by: Hector Sanjuan <hector@protocol.ai>
I think this will prevent some random test failures
which happen when we realize that we are no longer in the peerset
and trigger a shutdown, but Raft has not fully finished
committing the operation. This then triggers an error
and a retry. But the contexts are cancelled during the retry,
so it won't find a leader and will finally error
with that message.
License: MIT
Signed-off-by: Hector Sanjuan <hector@protocol.ai>
Also, start removing mentions of the `PeerAdd` operation, as things should
happen with `--bootstrap` (Join). PeerAdd should not be part of the user
workflow and might disappear or be hidden in the future.
License: MIT
Signed-off-by: Hector Sanjuan <hector@protocol.ai>
It seems more logical that the new peer should know how to contact
everyone before it needs to start doing it, but it probably does not
matter much. Still, more logical.
License: MIT
Signed-off-by: Hector Sanjuan <hector@protocol.ai>
Review documentation to be in line with the latest updates to Raft and
any other features introduced since 0.12.0.
License: MIT
Signed-off-by: Hector Sanjuan <hector@protocol.ai>
Snaps set a custom $HOME, but we were using /etc/passwd.
There might be other cases where using a custom $HOME might be
handy.
On UNIX systems, $HOME should always be set. For all the rest,
we fall back to the original os/user.HomeDir method.
License: MIT
Signed-off-by: Hector Sanjuan <hector@protocol.ai>
This commit changes the way that consensus.Clean() works. Before,
it deleted the whole data folder. Now it renames it to <name>.old.0
and leaves it. When Clean() is called again, it renames <name>.old.0
to <name>.old.1, and the actual data becomes <name>.old.0. A higher number
means an older backup. The number of backups is fixed to 5. When 5 backups exist
and a new one comes up, the oldest one is discarded.
License: MIT
Signed-off-by: Hector Sanjuan <hector@protocol.ai>
Up to now, we hardcoded progress to "false" in the proxy, regardless
of what the original request said. We now leave it as it is, and
just ignore any progress updates when processing the response.
Since the response is buffered and sent back all together, progress
updates are still useless, but at least the clients (ipfs cli) won't
show a 0% progress bar when successfully adding a file.
License: MIT
Signed-off-by: Hector Sanjuan <hector@protocol.ai>
This test failed if the leader was randomly selected as
the node on which we wait for a leader. We needed to remove
the shut-down leader from the clusters slice.
License: MIT
Signed-off-by: Hector Sanjuan <hector@protocol.ai>
This change removes the duplication in the PeerManager component:
* No more committing PeerAdd and PeerRm log entries
* The Raft peer set is the source of truth
* Basic broadcasting is used to communicate peer multiaddresses
  in the cluster
* A peer can only be added in a healthy cluster
* A peer can be removed from any cluster which can still commit
* This also adds support for multiple multiaddresses per peer
License: MIT
Signed-off-by: Hector Sanjuan <hector@protocol.ai>