We are propagating the wrong context (mostly from the Cluster top-level
methods). This makes that request cancellations (and cancellations of the
associated contexts) are not propagated to many methods, and can result in
deadlocks when an operation that is holding a lock is not aborted.
This affects for example the operation tracker. Getting all operations from
the tracker relies on someone reading from the out channel, or on the context
being cancelled. When a request is aborted in the middle of the response, and
the context is not cancelled, everything that wants to list operations would
become deadlocked, including operations that need write locks like
TrackNewOperation.
This fixes it.
This fixes a bug in API code that made it return 204-No content when the RPC
methods failed with an error before any items were returned on the channel.
Unfortunately we were not paying attentions to errors while rpc-streaming pins
in the pintracker. The result is that the StatusAll operation would list all
the pins as unexpectedly unpinned when ipfs is offline, and this would result in
recover/requeing operations for all pins when ipfs is offline.
This commits changes the behaviour so that if IPFS Pin/ls has resulted in an
error, then the StatusAll operation cannot complete at all.
The operation tracker was not setting some fields correctly when producing
PinInfo objects. Additionally, recover operations were submitted with empty
pin objects, which resulted in the status for pins sent on recover operations
to be missing fields.
This new component broadcasts metrics about the current size of the pinqueue,
which can in turn be used to inform allocations.
It has a weight_bucket_size option that serves to divide the actual size by a
given factor. This allows considering peers with similar queue sizes to have
the same weight.
Additionally, some changes have been made to the balanced allocator so that a
combination of tags, pinqueue sizes and free-spaces can be used. When
allocating by [<tag>, pinqueue, freespace], the allocator will prioritize
choosing peers with the smallest pin queue weight first, and of those with the
same weight, it will allocate based on freespace.
We were currently tracking this metrics as a counter (SUM). The number is
good, but conceptually this is more a gauge (LastValue), given it can go down.
Thus we switch it by tracking the aggregation numbers directy in the operation
tracker.
This commit introduces an api.Cid type and replaces the usage of cid.Cid
everywhere.
The main motivation here is to override MarshalJSON so that Cids are
JSON-ified as '"Qm...."' instead of '{ "/": "Qm....." }', as this "ipld"
representation of IDs is horrible to work with, and our APIs are not issuing
IPLD objects to start with.
Unfortunately, there is no way to do this cleanly, and the best way is to just
switch everything to our own type.
This commit continues the work of taking advantage of the streaming
capabilities in go-libp2p-gorpc by improving the ipfsconnector and pintracker
components.
StatusAll and RecoverAll methods are now streaming methods, with the REST API
output changing accordingly to produce a stream of GlobalPinInfos rather than
a json array.
pin/ls request to the ipfs daemon now use ?stream=true and avoid having to
load the full pinset map on memory. StatusAllLocal and RecoverAllLocal
requests to the pin tracker stream all the way and no longer store the full
pinset, and the full PinInfo status slice before sending it out.
We have additionally switched to a pattern where streaming methods receive the
channel as an argument, allowing the caller to decide on whether to launch a
goroutine, do buffering etc.
This commit introduces the new go-libp2p-gorpc streaming capabilities for
Cluster. The main aim is to work towards heavily reducing memory usage when
working with very large pinsets.
As a side-effect, it takes the chance to revampt all types for all public
methods so that pointers to static what should be static objects are not used
anymore. This should heavily reduce heap allocations and GC activity.
The main change is that state.List now returns a channel from which to read
the pins, rather than pins being all loaded into a huge slice.
Things reading pins have been all updated to iterate on the channel rather
than on the slice. The full pinset is no longer fully loaded onto memory for
things that run regularly like StateSync().
Additionally, the /allocations endpoint of the rest API no longer returns an
array of pins, but rather streams json-encoded pin objects directly. This
change has extended to the restapi client (which puts pins into a channel as
they arrive) and to ipfs-cluster-ctl.
There are still pending improvements like StatusAll() calls which should also
stream responses, and specially BlockPut calls which should stream blocks
directly into IPFS on a single call.
These are coming up in future commits.
pinsvcapi: do not cache peer information here as all the needed information is
in the status objects.
This adds ipfs_addresses as a field broadcasted with the ping metrics.
This will facilitate building outputs for the Pinning Services API, saving a
round trip to query the cluster State, since all the needed information
already comes from the PinTracker, which has already accessed the state.
Since the pintracker already included a state attribute (Name), we are simply
going down that path.
Fixes#1554
Fixes: peer names unset for remote peers
This adds an IPFS field to pin status information (PinInfoShort).
It has not been easy to add this, given that the IPFS ID is something that
comes from outside of cluster (unlike the peer name). After several tries I
have settled in the following things:
- Use the ping metric to send out peer names and IPFS IDs to the peers in the
cluster.
- Cache the latest known IPFS ID (if IPFS dies we should still be setting
the ID).
- Provide an RPC method for the Pintracker to obtain IPFS ID from the cache.
- Given we now know information for peernames and IPFS IDs from other peers,
we can use that information even if the requests to them error or we are not
contacting (i.e. peers allocated as remote are not queried for status). We can
use the information from the last received ping metric.
- This means we should keep metrics around even if peers go away, at least for
a while rather than deleting them as soon as we detect that they have expired.
Puting it all together we now have a system to gossip peer information around on top
of the ping metrics.
We call RecoverAll regularly and I noticed it was way slower than it should be.
After all, it should just loop the pinset and enqueued items that are
unexpectedly unpinned or in pin error.
However, at some point we decided that RecoverAll would return information for
all pins, regardless of whether they were recovered or not. This ends up
resulting in a separate Status call for every pin that is already pinned, and
this call hits IPFS. This is pretty bad with big pinsets.
This commit fixes that, we return no state information for pins that are not
touched.
This must has been an oversight. We added a special unexpectedly_unpinned
status so that we could just return things from the operation tracker when
filtering by pin_error. unexpectedly_unpinned are things that we have no
operation for but are unpinned on ipfs.
Status however still returned a pin_error state for these, even though,
StatusAll would not show them when filtering with pin_error, and would show
them as unexpectedly_unpinned otherwise.
Since Recover correctly repins pin_error and unexpectedly_unpinned, this
change has no further consequences.
Shutting down the cluster while a recover operation is ongoing resulted in
each remaining pin in the recover loop producing an error in the logs, as the
loop kept going even though compontents were already shutdown.
With 8 million items, this meant a lot of log messages, and a shutdown delay
that forced the killing of the process in most cases.
The recover loop now exits when the component's context is cancelled.
This adds a Timestamp field to the pin objects. This allows to track when they were pinned.
This:
* Allows the pin-tracker to actually show accurate information on when the pin
entered the system for pins that are not part of ongoing operations
(currently it shows time.Now())
* Adds support for reporting timestamp on a pinning services api.
This is a follow up to #1360 which further optimizes StatusAll calls by
avoiding listing and filtering the cluster state when requesting status for
operations that should be direclty in the operation tracker because they are
ongoing (queued, pinning, unpinning, error).
This commit modifies the pintracker StatusAll call to take a status filter.
This allows to skip a PinLs call to ipfs when checking status for items that
are queued, pinning, unpinning or in error. Those status come directly from
the operation tracker. This should result in a significant performance
increase for those calls, particularly in nodes with several hundred thousand
pins and more, where the call to IPFS is very expensive.
A new TrackerStatusUnexpectedlyUnpinned status has been introduce to
differentiate between pin errors (tracked by the operation tracker) and "lost"
items (which before were pin errors too). This new status is handled by the
Recover() operation as before.
GlobalPinInfo objects carried redundant information (Cid, Peer) that takes
space and time to serialize.
This has been addressed by having GlobalPinInfo embed PinInfoShort rather than
PinInfo. This new types ommits redundant fields.
Receive the full pin object so that it can decide whether to check
for recursive or direct pins directly.
Additionally, unpin will not check for the pin presence anymore and
simply trigger unpins (ignoring errors)