We call RecoverAll regularly and I noticed it was way slower than it should be.
After all, it should just loop the pinset and enqueued items that are
unexpectedly unpinned or in pin error.
However, at some point we decided that RecoverAll would return information for
all pins, regardless of whether they were recovered or not. This ends up
resulting in a separate Status call for every pin that is already pinned, and
this call hits IPFS. This is pretty bad with big pinsets.
This commit fixes that, we return no state information for pins that are not
touched.
This adds a Timestamp field to the pin objects. This allows to track when they were pinned.
This:
* Allows the pin-tracker to actually show accurate information on when the pin
entered the system for pins that are not part of ongoing operations
(currently it shows time.Now())
* Adds support for reporting timestamp on a pinning services api.
On large pinsets this may take a very long time and prevents metrics and
re-boostrapping from starting, among other things. See bug description.
This lets watchPinset trigger an immediate RecoverAllLocal instead, but this
happens in its own goroutine and should allow everything else to start.
This commit modifies the pintracker StatusAll call to take a status filter.
This allows to skip a PinLs call to ipfs when checking status for items that
are queued, pinning, unpinning or in error. Those status come directly from
the operation tracker. This should result in a significant performance
increase for those calls, particularly in nodes with several hundred thousand
pins and more, where the call to IPFS is very expensive.
A new TrackerStatusUnexpectedlyUnpinned status has been introduce to
differentiate between pin errors (tracked by the operation tracker) and "lost"
items (which before were pin errors too). This new status is handled by the
Recover() operation as before.
StatusCID() and Peers() are calls that should return relatively quickly.
If they don't, it means they are hanging for some reason. We cannot let the
whole Multicall request hang waiting on a single peer, therefore, set a
hardcoded 15 second deadline for both.
The Allocations of a pin that has been added with default replication factor
are kept even when the replication factor turns out to be -1.
This resulted in the Status(cid) code skipping calls to a number of peers
and setting the pin directly as REMOTE.
The fix, on one side makes sure Allocations is always nil when the replication
factor is -1. On the other size, lets the globalPinInfoCid method check the
replication factor value, rather than the number of allocations to decide if
any nodes are bound to be remote.
On the plus side, the pin tracker used the IsRemotePin method, which uses the
replication factor, so things were pinned even if the Status(cid) method shows
them as remote.
* feat: make MDNS failure on start non-fatal
- if discovery.NewMdnsService errors on start, show warning "error message, MDNS service will be disabled"
- same as setting MDNSInterval to 0: NewCluster is still created and daemon runs, but without MDNS
GlobalPinInfo objects carried redundant information (Cid, Peer) that takes
space and time to serialize.
This has been addressed by having GlobalPinInfo embed PinInfoShort rather than
PinInfo. This new types ommits redundant fields.
We cannot rely on current pin allocations anymore since the allocations
may be follower peers that do not even handle alerts.
Instead, we assume that we are a trusted peer (because we are not in follower
mode), get all other trusted peers and act if we are the XOR-closest to the
CID.
This also means that replication-factor = 1 pins can be recovered too.
i.e. a direct pin can be repinned as recursive, but a recursive
pin cannot be pinned as direct (this fails in IPFS too).
Additionally, save a couple of calls to the datastore by obtaining the
existing pin only once.
Add necessary validators.
There is currently no way to run a dynamic-mode DHT that supports
LAN-based peers. Therefore for the time-being we are setting the
dht.Mode to "server".
* Libp2p protectors no longer needed, use PSK directly
* Generate cluster 32-byte secret here (helper gone from pnet)
* Switch to go-log/v2 in all places
* DHT bootstrapping not needed. Adjust DHT options for tests.
* Do not rely on dissappeared CidToDsKey and DsKeyToCid functions fro dshelp.
* Disable QUIC (does not support private networks)
* Fix tests: autodiscovery started working properly
StateSync() used to take care of this by issuing Track() calls. But this
functionality was removed.
This starts returning items that are in the state but not on IPFS as
PIN_ERRORs. It ensures that the Recover methods see them so that they can
trigger repinnings for missing items. This covers cases where the user
modifies the ipfs state manually, or resets the ipfs daemon but keeps the
cluster state, and cases where cluster was stopped half-way through a pinning.
This removes mappintracker and sets stateless tracker as the default (and only) pintracker component.
Because the stateless tracker matches the cluster state with only ongoing operations being kept on memory, and additional information provided by ipfs-pin-ls, syncing operations are not necessary. Therefore the Sync/SyncAll operations are removed cluster-wide.
This introduces the possiblity of running Cluster with multiple informer components. The first one in the list is the used for "allocations". This is just groundwork for working with several informers in the future.
This adds a PeerAddresses entry to the main cluster configuration.
The peer will ingest and potentially connect to those peer addresses during
the start (similarly to the ones in the peerstore).
This allows to provide "bootstrap" (as in "peers we connect to") addresses
directly in the configuration, which is useful when distributing a single
configuration template that will allow a cluster peer to know where to connect
on the first boot.