This commit adds Badger v3 support as a datastore backend for IPFS Cluster.
Badger v3 has a number of improvements over v1, particularly around
compaction. It is significantly more modern and should behave better in
general (some testing is pending).
Long story: Since #1768 there has been a recurring repinning test failure with
Raft consensus.
Per the test, if a pin is allocated to a peer that has been shut down,
submitting the pin again should re-allocate it to a peer that is still
running.
Investigating why this test fails, and why it fails only with Raft, led to
realizing that this and other similar tests were passing by chance. The
needed re-allocations were made not by the new submission of the pin, but by
the automatic-repinning feature. The actual resubmitted pin was carrying the
same allocations (one of them being the peer that was down), but it was
silently failing because the RedirectToLeader() code path was using
cc.ctx and hitting the peer that had been shut down, which caused it to error.
Fixing the context propagation meant that we would re-overwrite the pin with
the old allocations, so the actual behaviour did not pass the test.
So, on one side, this fixes a number of tests that had not disabled automatic
repinning, which was probably getting in the way of things. On the other side,
it removes a condition that prevented re-allocation of pins when they already
exist and their options have not changed.
I don't fully understand why this condition was there though, since the
Allocate() code does return the old allocations anyway when they are enough,
so it should not re-allocate randomly. I suspect it was preventing some
misbehaviour in the Allocate() code from the time before it was improved with
multiple allocators etc.
We were propagating the wrong context (mostly from the Cluster top-level
methods). This means that request cancellations (and cancellations of the
associated contexts) were not propagated to many methods, which can result in
deadlocks when an operation that is holding a lock is not aborted.
This affects for example the operation tracker. Getting all operations from
the tracker relies on someone reading from the out channel, or on the context
being cancelled. When a request is aborted in the middle of the response, and
the context is not cancelled, everything that wants to list operations would
become deadlocked, including operations that need write locks like
TrackNewOperation.
This fixes it.
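For illustration, here is a rough sketch (simplified types, not the actual
pintracker code) of why the context must travel with the request: the
goroutine streaming operations holds a lock while writing to the out channel,
and only ctx.Done() lets it abort when the caller stops reading.

```go
package main

import (
	"context"
	"fmt"
	"sync"
	"time"
)

// Operation and Tracker are simplified stand-ins for the real types.
type Operation struct{ Cid string }

type Tracker struct {
	mu  sync.RWMutex
	ops map[string]Operation
}

// ListAll streams all tracked operations. Without the ctx check, an aborted
// request (nobody reading from out) would leave this goroutine blocked while
// holding t.mu, deadlocking anything that needs the write lock.
func (t *Tracker) ListAll(ctx context.Context, out chan<- Operation) error {
	defer close(out)
	t.mu.RLock()
	defer t.mu.RUnlock()

	for _, op := range t.ops {
		select {
		case out <- op:
		case <-ctx.Done(): // request aborted: release the lock and bail out
			return ctx.Err()
		}
	}
	return nil
}

// TrackNew needs the write lock, so it stalls whenever ListAll is stuck.
func (t *Tracker) TrackNew(op Operation) {
	t.mu.Lock()
	defer t.mu.Unlock()
	t.ops[op.Cid] = op
}

func main() {
	t := &Tracker{ops: map[string]Operation{"Qm1": {"Qm1"}, "Qm2": {"Qm2"}}}

	// Simulate an aborted request: the context is cancelled and nobody reads.
	ctx, cancel := context.WithCancel(context.Background())
	cancel()
	out := make(chan Operation) // unbuffered, no reader
	go t.ListAll(ctx, out)

	time.Sleep(100 * time.Millisecond)
	t.TrackNew(Operation{Cid: "Qm3"}) // does not deadlock thanks to ctx.Done()
	fmt.Println("tracker still responsive")
}
```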
When IPFS starts failing or doesn't respond (e.g. during a restart), cluster
is likely to start sending requests at very fast rates. For example, if there
are 100k items to be pinned and pins start failing immediately, cluster will
burn through the pin queue really fast and it will all be failures. At the
same time, ipfs is hammered non-stop until it recovers, which may make
recovery harder.
This commit introduces a rate limit when requests to IPFS fail. After 10
failed requests, requests are sent at a rate of at most 1 req/s. Once a
request succeeds, the rate limit is lifted.
This should prevent hammering the IPFS daemon, and also avoid increased CPU
usage in cluster from burning through pinning queues while IPFS is offline,
which makes the situation on those machines worse (and emits way more logs).
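Roughly, the mechanism looks like the sketch below (simplified, not the
actual ipfshttp connector code; doRequest and the sleep-based throttle are
stand-ins): once the failure threshold is reached, every request waits a
second, and a single success resets the counter.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
	"time"
)

type limitedClient struct {
	mu       sync.Mutex
	failures int
}

const failureThreshold = 10

// request throttles when too many consecutive failures have been seen,
// then runs the real request and updates the failure counter.
func (c *limitedClient) request(doRequest func() error) error {
	c.mu.Lock()
	throttle := c.failures >= failureThreshold
	c.mu.Unlock()

	if throttle {
		time.Sleep(time.Second) // at most 1 req/s while IPFS keeps failing
	}

	err := doRequest()

	c.mu.Lock()
	if err != nil {
		c.failures++
	} else {
		c.failures = 0 // a successful request lifts the rate limit
	}
	c.mu.Unlock()
	return err
}

func main() {
	c := &limitedClient{}
	fail := func() error { return errors.New("ipfs not responding") }

	start := time.Now()
	for i := 0; i < 12; i++ {
		c.request(fail) // first 10 go at full speed, the rest are throttled
	}
	fmt.Printf("12 failing requests took %s\n", time.Since(start).Round(time.Second))
}
```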
This fixes a bug in the API code that made it return 204 No Content when the
RPC methods failed with an error before any items were returned on the
channel.
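As an illustration, a minimal sketch of the fixed handler logic (hypothetical
names, not the actual REST API code): the response status is only decided once
the first item arrives or the stream finishes, so an early RPC error produces
an error status rather than 204.

```go
package main

import (
	"encoding/json"
	"fmt"
	"net/http"
)

// streamItems pretends to be the RPC streaming call: it writes items to out
// and returns an error if the underlying call failed.
func streamItems(out chan<- string) error {
	defer close(out)
	return fmt.Errorf("rpc: ipfs offline") // fails before sending anything
}

func handler(w http.ResponseWriter, r *http.Request) {
	out := make(chan string)
	errCh := make(chan error, 1)
	go func() { errCh <- streamItems(out) }()

	enc := json.NewEncoder(w)
	n := 0
	for item := range out {
		if n == 0 {
			w.WriteHeader(http.StatusOK) // the first item decides the status
		}
		enc.Encode(item)
		n++
	}

	err := <-errCh
	switch {
	case n == 0 && err != nil:
		http.Error(w, err.Error(), http.StatusInternalServerError) // not 204
	case n == 0:
		w.WriteHeader(http.StatusNoContent) // genuinely empty stream
	}
}

func main() {
	http.HandleFunc("/pins", handler)
	fmt.Println(http.ListenAndServe("127.0.0.1:8080", nil))
}
```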
Unfortunately we were not paying attention to errors while rpc-streaming pins
in the pintracker. As a result, the StatusAll operation would list all pins as
unexpectedly unpinned when ipfs is offline, which would trigger
recover/requeueing operations for every pin.
This commit changes the behaviour so that, if the IPFS pin/ls request results
in an error, the StatusAll operation errors out instead of completing.
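A rough sketch of the new behaviour (hypothetical types and names, not the
pintracker itself): unpinned statuses are only derived from the pin/ls stream
when it finished without error, otherwise the whole call fails.

```go
package main

import (
	"errors"
	"fmt"
)

type PinStatus struct {
	Cid    string
	Pinned bool
}

// pinLs stands in for the streaming "pin/ls" request to IPFS. It sends the
// CIDs it knows about and returns an error if the request failed.
func pinLs(out chan<- string) error {
	defer close(out)
	return errors.New("ipfs is offline") // nothing streamed, request failed
}

// statusAll only trusts the "not pinned" conclusion when pin/ls succeeded.
func statusAll(tracked []string) ([]PinStatus, error) {
	ipfsPins := make(map[string]bool)
	out := make(chan string)
	errCh := make(chan error, 1)
	go func() { errCh <- pinLs(out) }()
	for c := range out {
		ipfsPins[c] = true
	}
	if err := <-errCh; err != nil {
		// Previously the error was ignored and every tracked pin showed up
		// as unexpectedly unpinned, triggering mass recover operations.
		return nil, fmt.Errorf("cannot list ipfs pins: %w", err)
	}

	statuses := make([]PinStatus, 0, len(tracked))
	for _, c := range tracked {
		statuses = append(statuses, PinStatus{Cid: c, Pinned: ipfsPins[c]})
	}
	return statuses, nil
}

func main() {
	_, err := statusAll([]string{"Qm1", "Qm2"})
	fmt.Println("StatusAll error:", err)
}
```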
This commit introduces unlimited waiting on start until a request to `ipfs id`
succeeds.
Waiting has some consequences:
* State watching (recover/sync) and metrics publishing do not start until ipfs is ready
* swarm/connect is not triggered until ipfs is ready.
Once the first request to ipfs succeeds, everything goes back to how it was
before. This avoids attempting operations like sending our IDs in metrics when
IPFS is simply not there.
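The waiting loop is conceptually something like the sketch below (hypothetical
helper names; the retry interval and endpoint details are assumptions): keep
retrying the IPFS `id` endpoint until it succeeds or we are shutting down.

```go
package main

import (
	"context"
	"fmt"
	"net/http"
	"time"
)

// ipfsID stands in for the request to the IPFS API "id" endpoint.
func ipfsID(ctx context.Context) error {
	req, err := http.NewRequestWithContext(ctx, http.MethodPost,
		"http://127.0.0.1:5001/api/v0/id", nil)
	if err != nil {
		return err
	}
	resp, err := http.DefaultClient.Do(req)
	if err != nil {
		return err
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		return fmt.Errorf("unexpected status: %s", resp.Status)
	}
	return nil
}

// waitForIPFS blocks until "ipfs id" succeeds or the context is cancelled.
// State watching, metrics publishing and swarm connects start only after it.
func waitForIPFS(ctx context.Context) error {
	for {
		if err := ipfsID(ctx); err == nil {
			return nil
		}
		select {
		case <-time.After(5 * time.Second): // retry interval is an assumption
		case <-ctx.Done():
			return ctx.Err()
		}
	}
}

func main() {
	if err := waitForIPFS(context.Background()); err != nil {
		fmt.Println("never saw ipfs:", err)
		return
	}
	fmt.Println("ipfs is ready, starting state watching and metrics")
}
```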
This fixes two bugs. First, the "blockPut response CID does not match the
multihash" warning was coming up when it shouldn't. In particular, the
multipart reader called Node() several times for the same block, resulting in
CIDs being removed from the Seen set and causing the warning when the same
block appeared several times (usually the empty-dir block).
This also means we were counting BlockPuts (and total data added) wrong in
the metrics, double-counting some blocks since these were recorded in Node()
calls.
The fix moves the tracking to Next(), which is only called once for each
block. To avoid timing issues between block reads from the channel and
BlockPut responses, the Seen set now stores how many times we have seen each
block. Thus a duplicated block that will get two BlockPut responses will not
trigger a warning regardless of when those responses arrive.
Fixes #1706.
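Conceptually, the Seen tracking now works like this small sketch (simplified
types, not the adder code itself): counts go up when a block is sent and down
when a BlockPut response arrives, and only responses for blocks we never sent
trigger the warning.

```go
package main

import (
	"fmt"
	"sync"
)

type seenBlocks struct {
	mu   sync.Mutex
	seen map[string]int // cid -> number of times the block was sent
}

// Sent records that a block with this CID was streamed for a BlockPut.
func (s *seenBlocks) Sent(cid string) {
	s.mu.Lock()
	defer s.mu.Unlock()
	s.seen[cid]++
}

// Confirm consumes one pending response for this CID. It returns false when
// a response arrives for a CID we never sent, which is the only case that
// should produce the "response CID does not match" warning.
func (s *seenBlocks) Confirm(cid string) bool {
	s.mu.Lock()
	defer s.mu.Unlock()
	if s.seen[cid] == 0 {
		return false
	}
	s.seen[cid]--
	return true
}

func main() {
	s := &seenBlocks{seen: make(map[string]int)}
	// A duplicated block (e.g. the empty-dir block) is sent twice.
	s.Sent("QmEmptyDir")
	s.Sent("QmEmptyDir")
	fmt.Println(s.Confirm("QmEmptyDir")) // true
	fmt.Println(s.Confirm("QmEmptyDir")) // true, no warning
	fmt.Println(s.Confirm("QmOther"))    // false -> warning
}
```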
It has been observed that some peers have a growing number of goroutines,
usually stuck in the go-libp2p-gorpc MultiStream() function, which is waiting
to read items from the arguments channel.
We suspect this is due to aborted /add requests. When an add request is
aborted or fails, Finalize() is never called and the blocks channel stays
open, so MultiStream() can never exit, the BlockStreamer can never stop
streaming, etc.
As a fix, we added the requirement to call Close() when we stop using a
ClusterDAGService (error or not). This should ensure that the blocks channel
is always closed and not just on Finalize().
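The pattern is roughly the following (hypothetical minimal interface, not the
real ClusterDAGService API): Close() is idempotent, always called via defer,
and guarantees the blocks channel gets closed even when Finalize() is never
reached.

```go
package main

import (
	"context"
	"errors"
	"fmt"
)

type dagService struct {
	blocks chan []byte
	closed bool
}

func newDAGService() *dagService {
	return &dagService{blocks: make(chan []byte)}
}

// Finalize closes the blocks channel on the happy path.
func (d *dagService) Finalize(ctx context.Context) error {
	return d.Close()
}

// Close can be called more than once and guarantees the blocks channel is
// closed, which lets the block streamer on the other side stop.
func (d *dagService) Close() error {
	if !d.closed {
		d.closed = true
		close(d.blocks)
	}
	return nil
}

// add simulates an /add request that may fail before reaching Finalize().
func add(ctx context.Context, fail bool) error {
	d := newDAGService()
	defer d.Close() // the fix: always close, not only on Finalize()

	if fail {
		return errors.New("add request aborted")
	}
	return d.Finalize(ctx)
}

func main() {
	fmt.Println(add(context.Background(), true))  // channel still gets closed
	fmt.Println(add(context.Background(), false)) // Finalize + idempotent Close
}
```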