This introduces a pin/update operation which allows to Pin a new item to
cluster indicating that said pin is an update to an already-existing pin.
When this is the case, all the configuration for the existing pin is copied to
the new one (including allocations). The IPFS connector will then trigger
pin/update directly in IPFS, allowing an efficient pinning based on
DAG-differences. Since the allocations where the same for both pins,
the pin/update can proceed.
PinUpdate does not unpin the previous pin (it is not possible to do this
atomically in cluster like it happens in IPFS). The user can manually do it
after the pin/update is done.
Internally, after a lot of deliberations on what the optimal way for this is,
I opted for adding a `PinUpdate` option to the `PinOptions` type (carries the
CID to update from). In order to carry this option from the REST API to the
IPFS Connector, it is serialized in the Protobuf (and stored in the
datastore). There is no other way to do this in a simple fashion since the Pin
object is piece of information that is sent around.
Additionally, making it a PinOption plays well with the Pin/PinPath APIs which
need little changes. Effectively, you are pinning a new thing. You are just
indicating that it should be configured from an existing one.
Fixes#732
With this commit
- If cid in `DELETE /pins/{cid}` isn't part of the pinset, it would
return 404
- If path in `DELETE /pins/{keyType}/{path}` resolves to a cid that
isn't part of the pinset, it would return 404
* Improve pin/unpin method signatures:
These changes the following Cluster Go API methods:
* -> Cluster.Pin(ctx, cid, options) (pin, error)
* -> Cluster.Unpin(ctx, cid) (pin, error)
* -> Cluster.PinPath(ctx, path, opts) (pin,error)
Pin and Unpin now return the pinned object.
The signature of the methods now matches that of the API Client, is clearer as
to what options the user can set and is aligned with PinPath, UnpinPath, which
returned pin methods.
The REST API now returns the Pinned/Unpinned object rather than 204-Accepted.
This was necessary for a cleaner pin/update approach, which I'm working on in
another branch.
Most of the changes here are updating tests to the new signatures
* Adapt load-balancing client to new Pin/Unpin signatures
* cluster.go: Fix typo
Co-Authored-By: Kishan Sagathiya <kishansagathiya@gmail.com>
* cluster.go: Fix typo
Co-Authored-By: Kishan Sagathiya <kishansagathiya@gmail.com>
This adds a LoadBalancing rest client implementation which is initialized with a set of client configurations and can use two strategies: failover and roundrobin (more strategies can be added by implementing the LBStrategy interface).
Those regex were compiled with each call to the function. As it's
called by PinLs, this resulted in a significant amount of memory used,
500MB in my case after a single call.
License: MIT
Signed-off-by: Michael Muré <batolettre@gmail.com>
Currently, unless doing Join() (--bootstrap), we do not connect to any peers on startup.
We however loaded up the peerstore file and Raft will automatically connect
older peers to figure out who is the leader etc. DHT bootstrap, after Raft
was working, did the rest.
For CRDTs we need to connect to people on a normal boot as otherwise, unless
bootstrapping, this does not happen, even if the peerstore contains known peers.
This introduces a number of changes:
* Move peerstore file management back inside the Cluster component, which was
already in charge of saving the peerstore file.
* We keep saving all "known addresses" but we load them with a non permanent
TTL, so that there will be clean up of peers we're not connected to for long.
* "Bootstrap" (connect) to a small number of peers during Cluster component creation.
* Bootstrap the DHT asap after this, so that other cluster components can
initialize with a working peer discovery mechanism.
* CRDT Trust() method will now:
* Protect the trusted Peer ID in the conn manager
* Give top priority in the PeerManager to that Peer (see below)
* Mark addresses as permanent in the Peerstore
The PeerManager now attaches priorities to peers when importing them and is
able to order them according to that priority. The result is that peers with
high priority are saved first in the peerstore file. When we load the peerstore
file, the first entries in it are given the highest priority.
This means that during startup we will connect to "trusted peers" first
(because they have been tagged with priority in the previous run and saved at
the top of the list). Once connected to a small number of peers, we let the
DHT bootstrap process in the background do the rest and discover the network.
All this makes the peerstore file a "bootstrap" list for CRDTs and we will attempt
to connect to peers on that list until some of those connections succeed.
TrustedPeers are specified in the configuration. Additional peers
can be added at runtime with Trust/Distrust functions.
Unfortunately we cannot use consensus.PeerAdd as a way to trust a peer as
cluster.PeerAdd+Join can be called by any peer and this calls
consensus.PeerAdd.
The result is consensus.PeerAdd doing a lot in Raft while consensus.Trust does
nothing, while in CRDTs consensus.Trust does something but consensus.PeerAdd
does nothing. But this is more or less consistent.
I had thought of this for a very long time but there were no compelling
reasons to do it. Specifying RPC endpoint permissions becomes however
significantly nicer if each Component is a different RPC Service. This also
fixes some naming issues like having to prefix methods with the component name
to separate them from methods named in the same way in some other component
(Pin and IPFSPin).
The IPFS pin/update endpoint takes two arguments and usually
unpins the first and pins the second. It is a bit more efficient
to do it in a single operation than two separate ones.
This will make the proxy endpoint hijack pin/update requests.
First, the FROM pin is fetched from the state. If present, we
set the options (replication factors, actual allocations) from
that pin to the new one. Then we pin the TO item and proceed
to unpin the FROM item when `unpin` is not false.
We need to support path resolving, just like IPFS, therefore
it was necessary to expose IPFSResolve() via RPC.
Currently `curl -X GET http://localhost:9094/allocations` results in an
empty array, because no filter is provided. Solution to this would be
either 1) to consider filter as all if no filter is provided by the user
or 2) to make the `filter` parameter mandatory and reply with BadRequest
if no filter is provided
This commit makes the default filter as all
This adds a new "crdt" consensus component using go-ds-crdt.
This implies several refactors to fully make cluster consensus-component
independent:
* Delete mapstate and fully adopt dsstate (after people have migrated).
* Return errors from state methods rather than ignoring them.
* Add a new "datastore" modules so that we can configure datastores in the
main configuration like other components.
* Let the consensus components fully define the "state.State". Thus, they do
not receive the state, they receive the storage where we put the state (a
go-datastore).
* Allow to customize how the monitor component obtains Peers() (the current
peerset), including avoiding using the current peerset. At the moment the
crdt consensus uses the monitoring component to define the current peerset.
Therefore the monitor component cannot rely on the consensus component to
produce a peerset.
* Re-factor/re-implementation of "ipfs-cluster-service state"
operations. Includes the dissapearance of the "migrate" one.
The CRDT consensus component defines creates a crdt-datastore (with ipfs-lite)
and uses it to intitialize a dssate. Thus the crdt-store is elegantly
wrapped. Any modifications to the state get automatically replicated to other
peers. We store all the CRDT DAG blocks in the local datastore.
The consensus components only expose a ReadOnly state, as any modifications to
the shared state should happen through them.
DHT and PubSub facilities must now be created outside of Cluster and passed in
so they can be re-used by different components.
Some command were failing with `input isn't valid multihash`, due to
empty peer.ID. This commit omit peer id from IPFSID when empty, thus
avoiding the error state
This was a leftover. For consisency, all CIDs coming out of the API
should have the canonical cid form ( { "/": "cid" } ), otherwise
we are inconsistent.