It has been observed that some peers have a growing number of goroutines,
usually stuck in go-libp2p-gorpc.MultiStream() function, which is waiting to
read items from the arguments channel.
We suspect this is due to aborted /add requests. In situations when the add
request is aborted or fails, Finalize() is never called and the blocks channel
stays open, so MultiStream() can never exit, and the BlockStreamer can never
stop streaming etc.
As a fix, we added the requirement to call Close() when we stop using a
ClusterDAGService (error or not). This should ensure that the blocks channel
is always closed and not just on Finalize().
This commit introduces an api.Cid type and replaces the usage of cid.Cid
everywhere.
The main motivation here is to override MarshalJSON so that Cids are
JSON-ified as '"Qm...."' instead of '{ "/": "Qm....." }', as this "ipld"
representation of IDs is horrible to work with, and our APIs are not issuing
IPLD objects to start with.
Unfortunately, there is no way to do this cleanly, and the best way is to just
switch everything to our own type.
The ipfsadder actually ends up DAG-putting some nodes several times
(i.e. non-leafs, root)... but usually one after the other. This prevents that
and avoids sending the same data multiple times over the wire (not a good
thing to 3x a small payload because of this).
This commit fixes#810 and adds block streaming to the final destinations when
adding. This should add major performance gains when adding data to clusters.
Before, everytime cluster issued a block, it was broadcasted individually to
all destinations (new libp2p stream), where it was block/put to IPFS (a single
block/put http roundtrip per block).
Now, blocks are streamed all the way from the adder module to the ipfs daemon,
by making every block as it arrives a single part in a multipart block/put
request.
Before, block-broadcast needed to wait for all destinations to finish in order
to process the next block. Now, buffers allow some destinations to be faster
than others while sending and receiving blocks.
Before, if a block put request failed to be broadcasted everywhere, an error
would happen at that moment.
Now, we keep streaming until the end and only then report any errors. The
operation succeeds as long as at least one stream finished successfully.
Errors block/putting to IPFS will not abort streams. Instead, subsequent
blocks are retried with a new request, although the method will return an
error when the stream finishes if there were errors at any point.
This commit introduces the new go-libp2p-gorpc streaming capabilities for
Cluster. The main aim is to work towards heavily reducing memory usage when
working with very large pinsets.
As a side-effect, it takes the chance to revampt all types for all public
methods so that pointers to static what should be static objects are not used
anymore. This should heavily reduce heap allocations and GC activity.
The main change is that state.List now returns a channel from which to read
the pins, rather than pins being all loaded into a huge slice.
Things reading pins have been all updated to iterate on the channel rather
than on the slice. The full pinset is no longer fully loaded onto memory for
things that run regularly like StateSync().
Additionally, the /allocations endpoint of the rest API no longer returns an
array of pins, but rather streams json-encoded pin objects directly. This
change has extended to the restapi client (which puts pins into a channel as
they arrive) and to ipfs-cluster-ctl.
There are still pending improvements like StatusAll() calls which should also
stream responses, and specially BlockPut calls which should stream blocks
directly into IPFS on a single call.
These are coming up in future commits.
This adds a new allocations field to add response objects which
provides the cluster peers to which the content has been allocated.
In the case of sharded dags, it provides peers for the current shard.
This does 3 things:
- Add a NoPin option to the adder. When set to true, the adding process does not
send a pin in the end.
- When user-allocations are set and local=true happens, we do not overwrite
the allocations returned by the allocator to include the local peer
anymore, as this could alter user-allocations.
- Some code improvement (remove pointers).
It is not good to add something locally only to pin it somewhere else:
* The locally used space is not GCed automatically or anything and is lost
* Pay the penalty of having to copy things somewhere else
This commit adds a new add option: "format".
This option specifies how IPFS Cluster is expected to build the DAG when
adding content. By default, it takes a "unixfs", which chunks and DAG-ifies as
it did before, resulting in a UnixFSv1 DAG.
Alternatively, it can be set to "car". In this case, Cluster will directly
read blocks from the CAR file and add them.
Adding CAR files or doing normal processing is independent from letting
cluster do sharding or not. If sharding is ever enabled, Cluster could
potentially shard a large CAR file among peers.
Currently, importing CAR files is limited to a single CAR file with a single
root (the one that is pinned). Future iterations may support multiple CARs
and/or multiple roots by transparently wrapping them.
* Libp2p protectors no longer needed, use PSK directly
* Generate cluster 32-byte secret here (helper gone from pnet)
* Switch to go-log/v2 in all places
* DHT bootstrapping not needed. Adjust DHT options for tests.
* Do not rely on dissappeared CidToDsKey and DsKeyToCid functions fro dshelp.
* Disable QUIC (does not support private networks)
* Fix tests: autodiscovery started working properly
Currently we were only specifying the block format. When adding with
a custom hash function, even though we produced the right cids, IPFS
did not know the hash function and ended up storing them using SHA256.
Additionally, since NodeWithMeta serializes the CID, we do not need
to carry a Format parameter (which specifies the Codec): it is already
embedded.
Tests have been added and BlockPut in ipfshttp now checks that the
response's CID matches the data sent. This will catch errors like
what was happening, but also any data corruption between cluster and
IPFS during the block upload.
This commit introduces `--local` option for `ctl add` which would add
content only the local ipfs peer and then pin it according to pin
options (fetching from the local peer)
For achieving this, a new local dag service is introduced
Local dagservice is not really a local as it add to other peers as well.
It is a dagservice that does not perform sharding. Since we are going to
have a local dagservice(one that adds only to the local peer), renaming
this `single` dagservice
* Improve pin/unpin method signatures:
These changes the following Cluster Go API methods:
* -> Cluster.Pin(ctx, cid, options) (pin, error)
* -> Cluster.Unpin(ctx, cid) (pin, error)
* -> Cluster.PinPath(ctx, path, opts) (pin,error)
Pin and Unpin now return the pinned object.
The signature of the methods now matches that of the API Client, is clearer as
to what options the user can set and is aligned with PinPath, UnpinPath, which
returned pin methods.
The REST API now returns the Pinned/Unpinned object rather than 204-Accepted.
This was necessary for a cleaner pin/update approach, which I'm working on in
another branch.
Most of the changes here are updating tests to the new signatures
* Adapt load-balancing client to new Pin/Unpin signatures
* cluster.go: Fix typo
Co-Authored-By: Kishan Sagathiya <kishansagathiya@gmail.com>
* cluster.go: Fix typo
Co-Authored-By: Kishan Sagathiya <kishansagathiya@gmail.com>
I had thought of this for a very long time but there were no compelling
reasons to do it. Specifying RPC endpoint permissions becomes however
significantly nicer if each Component is a different RPC Service. This also
fixes some naming issues like having to prefix methods with the component name
to separate them from methods named in the same way in some other component
(Pin and IPFSPin).
This was a leftover. For consisency, all CIDs coming out of the API
should have the canonical cid form ( { "/": "cid" } ), otherwise
we are inconsistent.
This takes advantange of the latest features in go-cid, peer.ID and
go-multiaddr and makes the Go types serializable by default.
This means we no longer need to copy between Pin <-> PinSerial, or ID <->
IDSerial etc. We can now efficiently binary-encode these types using short
field keys and without parsing/stringifying (in many cases it just a cast).
We still get the same json output as before (with minor modifications for
Cids).
This should greatly improve Cluster performance and memory usage when dealing
with large collections of items.
License: MIT
Signed-off-by: Hector Sanjuan <hector@protocol.ai>