Because we are adding blocks on a single call, and we choose the format
parameter based on the prefix of the first block, IPFS will return block CIDs
based on that option.
This caused warnings when adding content that has multiple CID prefixes: for
example, any cid-version=1 file will include both dag-pb CIDs and raw
CIDs. Since the first block is usually a leave, IPFS will only return
raw-cids, and cause a warning because of the CID-mistmatch.
This fixes things by comparing multihashes only.
But! We might be writing blocks with the wrong CID and then the good CID won't
work!
Correct, we might, in some corner cases.
In go-ipfs >= 0.12.0, all blocks are addressed by multihash so CID prefixes
are irrelevant. This problem does not exist in that case.
In go-ipfs < 0.12.0, if a read for a CIDv1 DAG-PB fails, it is retried as it
it was raw. This means that if we wrote something with cidv1/format=raw, that
should have been a cidv1/format=dag-pb, the read will still work. That covers
some common cases (i.e. adding with cid-version=1) because the first block
should be a raw-leaf. Default-params (cidv0) is not affected since everything
is raw multihashes. However, there are still possible CAR layouts etc. where
cluster will write blocks wrongly to older IPFS versions.
This commit introduces an api.Cid type and replaces the usage of cid.Cid
everywhere.
The main motivation here is to override MarshalJSON so that Cids are
JSON-ified as '"Qm...."' instead of '{ "/": "Qm....." }', as this "ipld"
representation of IDs is horrible to work with, and our APIs are not issuing
IPLD objects to start with.
Unfortunately, there is no way to do this cleanly, and the best way is to just
switch everything to our own type.
The ipfsadder actually ends up DAG-putting some nodes several times
(i.e. non-leafs, root)... but usually one after the other. This prevents that
and avoids sending the same data multiple times over the wire (not a good
thing to 3x a small payload because of this).
go-libp2p-connmgr does not build on some architectures due to some
dependencies (https://github.com/libp2p/go-libp2p-connmgr/pull/103). By
extension, go-libp2p does not build when using the connmgr.
The fix is in master.
This commit fixes#810 and adds block streaming to the final destinations when
adding. This should add major performance gains when adding data to clusters.
Before, everytime cluster issued a block, it was broadcasted individually to
all destinations (new libp2p stream), where it was block/put to IPFS (a single
block/put http roundtrip per block).
Now, blocks are streamed all the way from the adder module to the ipfs daemon,
by making every block as it arrives a single part in a multipart block/put
request.
Before, block-broadcast needed to wait for all destinations to finish in order
to process the next block. Now, buffers allow some destinations to be faster
than others while sending and receiving blocks.
Before, if a block put request failed to be broadcasted everywhere, an error
would happen at that moment.
Now, we keep streaming until the end and only then report any errors. The
operation succeeds as long as at least one stream finished successfully.
Errors block/putting to IPFS will not abort streams. Instead, subsequent
blocks are retried with a new request, although the method will return an
error when the stream finishes if there were errors at any point.
This commit makes all the changes to make Peers() a streaming call.
While Peers is usually a non problematic call, for consistency, all calls
returning collections assembled through broadcast to cluster peers are now
streaming calls.
This commit continues the work of taking advantage of the streaming
capabilities in go-libp2p-gorpc by improving the ipfsconnector and pintracker
components.
StatusAll and RecoverAll methods are now streaming methods, with the REST API
output changing accordingly to produce a stream of GlobalPinInfos rather than
a json array.
pin/ls request to the ipfs daemon now use ?stream=true and avoid having to
load the full pinset map on memory. StatusAllLocal and RecoverAllLocal
requests to the pin tracker stream all the way and no longer store the full
pinset, and the full PinInfo status slice before sending it out.
We have additionally switched to a pattern where streaming methods receive the
channel as an argument, allowing the caller to decide on whether to launch a
goroutine, do buffering etc.