This commit introduces an api.Cid type and replaces the usage of cid.Cid
everywhere.
The main motivation here is to override MarshalJSON so that Cids are
JSON-ified as '"Qm...."' instead of '{ "/": "Qm....." }', as this "ipld"
representation of IDs is horrible to work with, and our APIs are not issuing
IPLD objects to start with.
Unfortunately, there is no way to do this cleanly, and the best way is to just
switch everything to our own type.
This commit continues the work of taking advantage of the streaming
capabilities in go-libp2p-gorpc by improving the ipfsconnector and pintracker
components.
StatusAll and RecoverAll methods are now streaming methods, with the REST API
output changing accordingly to produce a stream of GlobalPinInfos rather than
a json array.
pin/ls request to the ipfs daemon now use ?stream=true and avoid having to
load the full pinset map on memory. StatusAllLocal and RecoverAllLocal
requests to the pin tracker stream all the way and no longer store the full
pinset, and the full PinInfo status slice before sending it out.
We have additionally switched to a pattern where streaming methods receive the
channel as an argument, allowing the caller to decide on whether to launch a
goroutine, do buffering etc.
This commit introduces the new go-libp2p-gorpc streaming capabilities for
Cluster. The main aim is to work towards heavily reducing memory usage when
working with very large pinsets.
As a side-effect, it takes the chance to revampt all types for all public
methods so that pointers to static what should be static objects are not used
anymore. This should heavily reduce heap allocations and GC activity.
The main change is that state.List now returns a channel from which to read
the pins, rather than pins being all loaded into a huge slice.
Things reading pins have been all updated to iterate on the channel rather
than on the slice. The full pinset is no longer fully loaded onto memory for
things that run regularly like StateSync().
Additionally, the /allocations endpoint of the rest API no longer returns an
array of pins, but rather streams json-encoded pin objects directly. This
change has extended to the restapi client (which puts pins into a channel as
they arrive) and to ipfs-cluster-ctl.
There are still pending improvements like StatusAll() calls which should also
stream responses, and specially BlockPut calls which should stream blocks
directly into IPFS on a single call.
These are coming up in future commits.
* Libp2p protectors no longer needed, use PSK directly
* Generate cluster 32-byte secret here (helper gone from pnet)
* Switch to go-log/v2 in all places
* DHT bootstrapping not needed. Adjust DHT options for tests.
* Do not rely on dissappeared CidToDsKey and DsKeyToCid functions fro dshelp.
* Disable QUIC (does not support private networks)
* Fix tests: autodiscovery started working properly
This adds a new PinOption: ExpireAt.
The StateSync ticker will check and unpin expired pins from the Cluster.
ipfs-cluster-ctl supports an "expire-in" which gives a duration.
This adds a new "crdt" consensus component using go-ds-crdt.
This implies several refactors to fully make cluster consensus-component
independent:
* Delete mapstate and fully adopt dsstate (after people have migrated).
* Return errors from state methods rather than ignoring them.
* Add a new "datastore" modules so that we can configure datastores in the
main configuration like other components.
* Let the consensus components fully define the "state.State". Thus, they do
not receive the state, they receive the storage where we put the state (a
go-datastore).
* Allow to customize how the monitor component obtains Peers() (the current
peerset), including avoiding using the current peerset. At the moment the
crdt consensus uses the monitoring component to define the current peerset.
Therefore the monitor component cannot rely on the consensus component to
produce a peerset.
* Re-factor/re-implementation of "ipfs-cluster-service state"
operations. Includes the dissapearance of the "migrate" one.
The CRDT consensus component defines creates a crdt-datastore (with ipfs-lite)
and uses it to intitialize a dssate. Thus the crdt-store is elegantly
wrapped. Any modifications to the state get automatically replicated to other
peers. We store all the CRDT DAG blocks in the local datastore.
The consensus components only expose a ReadOnly state, as any modifications to
the shared state should happen through them.
DHT and PubSub facilities must now be created outside of Cluster and passed in
so they can be re-used by different components.
This takes advantange of the latest features in go-cid, peer.ID and
go-multiaddr and makes the Go types serializable by default.
This means we no longer need to copy between Pin <-> PinSerial, or ID <->
IDSerial etc. We can now efficiently binary-encode these types using short
field keys and without parsing/stringifying (in many cases it just a cast).
We still get the same json output as before (with minor modifications for
Cids).
This should greatly improve Cluster performance and memory usage when dealing
with large collections of items.
License: MIT
Signed-off-by: Hector Sanjuan <hector@protocol.ai>
Additionally, remove persisting the state version to the go-datastore. In the
future versions of the state, there is not a global format anymore (with a
global version). Instead, every pin element can potentially be stored in a
different version.
License: MIT
Signed-off-by: Hector Sanjuan <hector@protocol.ai>
Since the beginning, we have used a Go map to store the shared state (pinset)
in memory. The mapstate knew how to serialize itself so that libp2p-raft would
know how to write to disk when it:
* Saved snapshots of the state on shutdown
* Sent the state to a newcomer peer
hashicorp.Raft assumes an in-memory state which is snapshotted from time to
time and read from disk on boot.
This commit adds a `dsstate` implementation of the state interface using
`go-datastore`. This allows to effortlessly switch to a disk-backed state in
the future (as we will need), and also have at our disposal the different
implementations and utilities of Datastore for fine-tuning (caching, batching
etc.).
`mapstate` has been reworked to use dsstate. Ideally, we would not even need
`mapstate`, as it would suffice to initialize `dsstate` with a
`MapDatastore`. BUT, we still need it separate to be able to auto-migrate to
the new format.
This will be the last migration with the current system. Once this has been
released and users have been able to upgrade we will just remove `mapstate` as
it is now.
License: MIT
Signed-off-by: Hector Sanjuan <code@hector.link>
This commit adds support for OpenCensus tracing
and metrics collection. This required support for
context.Context propogation throughout the cluster
codebase, and in particular, the ipfscluster component
interfaces.
The tracing propogates across RPC and HTTP boundaries.
The current default tracing backend is Jaeger.
The metrics currently exports the metrics exposed by
the opencensus http plugin as well as the pprof metrics
to a prometheus endpoint for scraping.
The current default metrics backend is Prometheus.
Metrics are currently exposed by default due to low
overhead, can be turned off if desired, whereas tracing
is off by default as it has a much higher performance
overhead, though the extent of the performance hit can be
adjusted with smaller sampling rates.
License: MIT
Signed-off-by: Adrian Lanzafame <adrianlanzafame92@gmail.com>
4 PinTypes specify how CID is pinned
Changes to Pin and Unpin to handle different PinTypes
Tests for different PinTypes
Migration for new state format using new Pin datastructures
Visibility of the PinTypes used internally limited by default
License: MIT
Signed-off-by: Wyatt Daviau <wdaviau@cs.stanford.edu>
This ensures that we don't re-pin something which is already correctly
pinned with the same allocations.
It also ensures that we do re-pin something when the replication
factor associated to it changes.
License: MIT
Signed-off-by: Hector Sanjuan <code@hector.link>
This PR replaces ReplicationFactor with ReplicationFactorMax
and ReplicationFactor min.
This allows a CID to be pinned even though the desired
replication factor (max) is not reached, and prevents triggering
re-pinnings when the replication factor has not crossed the
lower threshold (min).
License: MIT
Signed-off-by: Hector Sanjuan <code@hector.link>
Raft will fail to take a snapshot when applied index is
different from the last index. Therefore, we wait for
all updates to be aplied before snapshotting.
If still it doesn't work, we retry a few times.
License: MIT
Signed-off-by: Hector Sanjuan <code@hector.link>
ipfs-cluster-service now has a migration subcommand that upgrades
persistant state snapshots with an out-of-date format version to the
newest version of raft state. If all cluster members shutdown with
consistent state, upgrade ipfs-cluster, and run the state upgrade command,
the new version of cluster will be compatible with persistent storage.
ipfs-cluster now validates its persistent state upon loading it and exits
with a clear error in the case the state format version is not up to date.
Raft snapshotting is enforced on all shutdowns and the json backup is no
longer run. This commit makes use of recent changes to libp2p-raft
allowing raft states to implement their own marshaling strategies. Now
mapstate handles the logic for its (de)serialization. In the interest of
supporting various potential upgrade formats the state serialization
begins with a varint (right now one byte) describing the version.
Some go tests are modified and a go test is added to cover new ipfs-cluster
raft snapshot reading functions. Sharness tests are added to cover the
state upgrade command.
This adds snapshot and restore methods to state and uses the snapshot
one to save a copy of the state when shutting down. Right now, this is
not used for anything else.
Some lines performing a migration, but this is only an idea of how it could
work.
License: MIT
Signed-off-by: Hector Sanjuan <hector@protocol.ai>
This adds a replication_factor query argument to the API
endpoint which allows to set a replication factor per Pin.
License: MIT
Signed-off-by: Hector Sanjuan <hector@protocol.ai>
CidArg used to be an internal name for an argument that carried a Cid.
Now it has surfaced to API level and makes no sense. It is a Pin. It
represents a Pin (Cid, Allocations, Replication Factor)
License: MIT
Signed-off-by: Hector Sanjuan <hector@protocol.ai>
New PeerManager, Allocator, Informer components have been added along
with a new "replication_factor" configuration option.
First, cluster peers collect and push metrics (Informer) to the Cluster
leader regularly. The Informer is an interface that can be implemented
in custom wayts to support custom metrics.
Second, on a pin operation, using the information from the collected metrics,
an Allocator can provide a list of preferences as to where the new pin
should be assigned. The Allocator is an interface allowing to provide
different allocation strategies.
Both Allocator and Informer are Cluster Componenets, and have access
to the RPC API.
The allocations are kept in the shared state. Cluster peer failure
detection is still missing and re-allocation is still missing, although
re-pinning something when a node is down/metrics missing does re-allocate
the pin somewhere else.
License: MIT
Signed-off-by: Hector Sanjuan <hector@protocol.ai>