This commit introduces an api.Cid type and replaces the usage of cid.Cid
everywhere.
The main motivation here is to override MarshalJSON so that Cids are
JSON-ified as '"Qm...."' instead of '{ "/": "Qm....." }', as this "ipld"
representation of IDs is horrible to work with, and our APIs are not issuing
IPLD objects to start with.
Unfortunately, there is no way to do this cleanly, and the best way is to just
switch everything to our own type.
This commit continues the work of taking advantage of the streaming
capabilities in go-libp2p-gorpc by improving the ipfsconnector and pintracker
components.
StatusAll and RecoverAll methods are now streaming methods, with the REST API
output changing accordingly to produce a stream of GlobalPinInfos rather than
a json array.
pin/ls request to the ipfs daemon now use ?stream=true and avoid having to
load the full pinset map on memory. StatusAllLocal and RecoverAllLocal
requests to the pin tracker stream all the way and no longer store the full
pinset, and the full PinInfo status slice before sending it out.
We have additionally switched to a pattern where streaming methods receive the
channel as an argument, allowing the caller to decide on whether to launch a
goroutine, do buffering etc.
This commit introduces the new go-libp2p-gorpc streaming capabilities for
Cluster. The main aim is to work towards heavily reducing memory usage when
working with very large pinsets.
As a side-effect, it takes the chance to revampt all types for all public
methods so that pointers to static what should be static objects are not used
anymore. This should heavily reduce heap allocations and GC activity.
The main change is that state.List now returns a channel from which to read
the pins, rather than pins being all loaded into a huge slice.
Things reading pins have been all updated to iterate on the channel rather
than on the slice. The full pinset is no longer fully loaded onto memory for
things that run regularly like StateSync().
Additionally, the /allocations endpoint of the rest API no longer returns an
array of pins, but rather streams json-encoded pin objects directly. This
change has extended to the restapi client (which puts pins into a channel as
they arrive) and to ipfs-cluster-ctl.
There are still pending improvements like StatusAll() calls which should also
stream responses, and specially BlockPut calls which should stream blocks
directly into IPFS on a single call.
These are coming up in future commits.
The go-ds-crdt upgrade disables multi-head-processing by default again. We see
this causes a lot of branching.
We however increase the number of workers. With large deltas, it may be
possible that all the 5 workers are busy downloading a delta or processing
them, while we potentially have hundreds of children in the DAG. Thus it is
not bad to attempt to do more things in parallel.
This adds batching support to crdt-consensus per #1008 . The crdt component can now take
advantage of the BatchingState, which uses the batching-crdt datastore. In
batching mode, the crdt datastore groups any Add and Delete operations
in a single delta (instead of just 1, as it does by default).
Batching is enabled in the crdt configuration section by setting MaxBatchSize
**and** MaxBatchAge. These two settings control when a batch is committed,
either by reaching a maximum number of pin/unpin operations, or by reaching a
maximum age.
Batching unlocks large pin-ingestion scalability for clusters, but should be
set according to expected work loads. An additional, hidden MaxQueueSize
parameter provides the ability to perform backpressure on Pin/Unpin
requests. When more than MaxQueueSize pin/unpins are waiting to be included in
a batch, the LogPin/LogUnpin operations will fail. If this happens, it is
means cluster cannot commit batches as fast as pins are arriving. Thus,
MaxQueueSize should be increase (to accommodate bursts), or the batch size
increased (to perform less commits and hopefully handle the requests faster).
Note that the underlying CRDT library will auto-commit when batch deltas reach
1MB of size.
Currently, it was not looking at the signer of the message, but at the
peer that relayed it, to verify the validity of the message.
This caused that under certain gossip graph configurations, some nodes would only
get messages via untrusted peers, and thus be unable to sync the chain.
trusted_peers and peer_addresses were appended the same info twice when
reparsing the config.
Once when parsing json and once when re-parsing with env-vars.
Add necessary validators.
There is currently no way to run a dynamic-mode DHT that supports
LAN-based peers. Therefore for the time-being we are setting the
dht.Mode to "server".
* Libp2p protectors no longer needed, use PSK directly
* Generate cluster 32-byte secret here (helper gone from pnet)
* Switch to go-log/v2 in all places
* DHT bootstrapping not needed. Adjust DHT options for tests.
* Do not rely on dissappeared CidToDsKey and DsKeyToCid functions fro dshelp.
* Disable QUIC (does not support private networks)
* Fix tests: autodiscovery started working properly
This removes mappintracker and sets stateless tracker as the default (and only) pintracker component.
Because the stateless tracker matches the cluster state with only ongoing operations being kept on memory, and additional information provided by ipfs-pin-ls, syncing operations are not necessary. Therefore the Sync/SyncAll operations are removed cluster-wide.
* Update go-ds-crdt to 0.1.5 which adds a return statement in case of error fetching a node.
* Increase DAG-Get timeout to 2 minutes
* Downgrade go-bitswap to 0.1.6.
With this commit, cluster peer will observe on events of peer removal
from cluster. On occurence of the event, the cluster peer will clear
the removed peer from its peerstore.
* Init should take a list of peers
This commit adds `--peers` option to `ipfs-cluster-service init`
`ipfs-cluster-service init --peers <multiaddress,multiaddress>`
- Adds and writes the given peers to the peerstore file
- For raft config section, adds the peer IDs to the `init_peerset`
- For crdt config section, add the peer IDs to the `trusted_peers`
Specifying "*" as part of "trusted_peers" in the configuration will
result in trusting all peers.
This is useful for private clusters where we don't want to list every
peer ID in the config.