Commit Graph

72 Commits

Author SHA1 Message Date
Hector Sanjuan
19d1ab3d3f
Dependency upgrades (#2026)
* Update dependencies

* Update dependencies

* Update tests. Remove Pretty()

* Fix problems with new behaviour of Paths

* Typo
2024-01-30 00:58:28 +01:00
Hector Sanjuan
11124ee224 Fix: repinning does not re-allocate as needed
Long story: Since #1768 there has been a recurring repinning test failure with
Raft consensus.

Per the test, if a pin is allocated to a peer that has been shutdown,
submitting the pin again should re-allocate it to a peer that is still
running.

Investigation on why this test fails and why it fails only in Raft lead to
realizing that this and other similar tests, were passing by chance. The
needed re-allocations were made not by the new submission of the pin, but by
the automatic-repinning feature. The actual resubmitted pin was carrying the
same allocations (one of them being the peer that was down), but it was
silently failing because the RedirectToLeader() code path was using
cc.ctx and hitting the peer that had been shutdown, which caused it to error.

Fixing the context propagation, meant that we would re-overwrite the pin with
the old allocations, thus the actual behaviour did not pass the test.

So, on one side, this fix an number of tests that had not disabled automatic
repinning and was probably getting in the way of things. On the other side,
this removes a condition that prevents re-allocation of pins if they exists
and options have not changed.

I don't fully understand why this was there though, since the Allocate() code
does return the old allocations anyways when they are enough, so it should not
re-allocate randomly. I suspect this was preventing some misbehaviour in the
Allocate() code from the time before it was improved with multiple allocators
etc.
2022-09-27 12:31:24 +02:00
Hector Sanjuan
2286ee73f8 api: return errors on stream response requests with 0 items
This fixes a bug in API code that made it return 204-No content when the RPC
methods failed with an error before any items were returned on the channel.
2022-09-15 16:40:34 +02:00
Hector Sanjuan
12b8ce63ce stateless: abort when ipfs PinLs errors
Unfortunately we were not paying attentions to errors while rpc-streaming pins
in the pintracker. The result is that the StatusAll operation would list all
the pins as unexpectedly unpinned when ipfs is offline, and this would result in
recover/requeing operations for all pins when ipfs is offline.

This commits changes the behaviour so that if IPFS Pin/ls has resulted in an
error, then the StatusAll operation cannot complete at all.
2022-09-15 16:40:34 +02:00
Hector Sanjuan
5452b59a2e
Dependency upgrades (#1755)
* Update go-libp2p to v0.22.0

* Testing with go1.19

* build(deps): bump github.com/multiformats/go-multicodec

Bumps [github.com/multiformats/go-multicodec](https://github.com/multiformats/go-multicodec) from 0.5.0 to 0.6.0.
- [Release notes](https://github.com/multiformats/go-multicodec/releases)
- [Commits](https://github.com/multiformats/go-multicodec/compare/v0.5.0...v0.6.0)

---
updated-dependencies:
- dependency-name: github.com/multiformats/go-multicodec
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>

* build(deps): bump github.com/ipld/go-car from 0.4.0 to 0.5.0

Bumps [github.com/ipld/go-car](https://github.com/ipld/go-car) from 0.4.0 to 0.5.0.
- [Release notes](https://github.com/ipld/go-car/releases)
- [Commits](https://github.com/ipld/go-car/compare/v0.4.0...v0.5.0)

---
updated-dependencies:
- dependency-name: github.com/ipld/go-car
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>

* build(deps): bump github.com/prometheus/client_golang

Bumps [github.com/prometheus/client_golang](https://github.com/prometheus/client_golang) from 1.12.2 to 1.13.0.
- [Release notes](https://github.com/prometheus/client_golang/releases)
- [Changelog](https://github.com/prometheus/client_golang/blob/main/CHANGELOG.md)
- [Commits](https://github.com/prometheus/client_golang/compare/v1.12.2...v1.13.0)

---
updated-dependencies:
- dependency-name: github.com/prometheus/client_golang
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>

* build(deps): bump github.com/hashicorp/go-hclog from 1.2.1 to 1.3.0

Bumps [github.com/hashicorp/go-hclog](https://github.com/hashicorp/go-hclog) from 1.2.1 to 1.3.0.
- [Release notes](https://github.com/hashicorp/go-hclog/releases)
- [Commits](https://github.com/hashicorp/go-hclog/compare/v1.2.1...v1.3.0)

---
updated-dependencies:
- dependency-name: github.com/hashicorp/go-hclog
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>

* build(deps): bump github.com/ipfs/go-ds-crdt from 0.3.6 to 0.3.7

Bumps [github.com/ipfs/go-ds-crdt](https://github.com/ipfs/go-ds-crdt) from 0.3.6 to 0.3.7.
- [Release notes](https://github.com/ipfs/go-ds-crdt/releases)
- [Commits](https://github.com/ipfs/go-ds-crdt/compare/v0.3.6...v0.3.7)

---
updated-dependencies:
- dependency-name: github.com/ipfs/go-ds-crdt
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>

* build(deps): bump github.com/urfave/cli/v2 from 2.10.2 to 2.14.1

Bumps [github.com/urfave/cli/v2](https://github.com/urfave/cli) from 2.10.2 to 2.14.1.
- [Release notes](https://github.com/urfave/cli/releases)
- [Changelog](https://github.com/urfave/cli/blob/main/docs/CHANGELOG.md)
- [Commits](https://github.com/urfave/cli/compare/v2.10.2...v2.14.1)

---
updated-dependencies:
- dependency-name: github.com/urfave/cli/v2
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>

* build(deps): bump github.com/libp2p/go-libp2p-http from 0.3.0 to 0.4.0

Bumps [github.com/libp2p/go-libp2p-http](https://github.com/libp2p/go-libp2p-http) from 0.3.0 to 0.4.0.
- [Release notes](https://github.com/libp2p/go-libp2p-http/releases)
- [Commits](https://github.com/libp2p/go-libp2p-http/compare/v0.3.0...v0.4.0)

---
updated-dependencies:
- dependency-name: github.com/libp2p/go-libp2p-http
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>

* build(deps): bump github.com/libp2p/go-libp2p-gorpc from 0.4.0 to 0.5.0

Bumps [github.com/libp2p/go-libp2p-gorpc](https://github.com/libp2p/go-libp2p-gorpc) from 0.4.0 to 0.5.0.
- [Release notes](https://github.com/libp2p/go-libp2p-gorpc/releases)
- [Commits](https://github.com/libp2p/go-libp2p-gorpc/compare/v0.4.0...v0.5.0)

---
updated-dependencies:
- dependency-name: github.com/libp2p/go-libp2p-gorpc
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>

* build(deps): bump contrib.go.opencensus.io/exporter/prometheus

Bumps [contrib.go.opencensus.io/exporter/prometheus](https://github.com/census-ecosystem/opencensus-go-exporter-prometheus) from 0.4.1 to 0.4.2.
- [Release notes](https://github.com/census-ecosystem/opencensus-go-exporter-prometheus/releases)
- [Commits](https://github.com/census-ecosystem/opencensus-go-exporter-prometheus/compare/v0.4.1...v0.4.2)

---
updated-dependencies:
- dependency-name: contrib.go.opencensus.io/exporter/prometheus
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>

* build(deps): bump github.com/libp2p/go-libp2p-raft from 0.1.8 to 0.2.0

Bumps [github.com/libp2p/go-libp2p-raft](https://github.com/libp2p/go-libp2p-raft) from 0.1.8 to 0.2.0.
- [Release notes](https://github.com/libp2p/go-libp2p-raft/releases)
- [Commits](https://github.com/libp2p/go-libp2p-raft/compare/v0.1.8...v0.2.0)

---
updated-dependencies:
- dependency-name: github.com/libp2p/go-libp2p-raft
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>

* build(deps): bump github.com/urfave/cli from 1.22.9 to 1.22.10

Bumps [github.com/urfave/cli](https://github.com/urfave/cli) from 1.22.9 to 1.22.10.
- [Release notes](https://github.com/urfave/cli/releases)
- [Changelog](https://github.com/urfave/cli/blob/main/docs/CHANGELOG.md)
- [Commits](https://github.com/urfave/cli/compare/v1.22.9...v1.22.10)

---
updated-dependencies:
- dependency-name: github.com/urfave/cli
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>

* Fix checker/linter/staticcheck warnings

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2022-09-06 16:57:17 +02:00
Hector Sanjuan
f4a445e019 pintracker: fix status objects missing or having wrong fields
The operation tracker was not setting some fields correctly when producing
PinInfo objects. Additionally, recover operations were submitted with empty
pin objects, which resulted in the status for pins sent on recover operations
to be missing fields.
2022-06-20 20:03:34 +02:00
Hector Sanjuan
bc8b65a099 Rename Cancelled to Canceled in operationtracker
Part of fixing misspellings.
2022-06-16 17:43:30 +02:00
Hector Sanjuan
755cebbe0d Enable spell checking and fix spelling errors (using US locale) 2022-06-16 17:43:30 +02:00
Hector Sanjuan
508791b547 Migrate from ipfs/ipfs-cluster to ipfs-cluster/ipfs-cluster
This performs the necessary renamings.
2022-06-16 17:43:30 +02:00
Hector Sanjuan
4daece2b98 Feat: add a new "pinqueue" informer component
This new component broadcasts metrics about the current size of the pinqueue,
which can in turn be used to inform allocations.

It has a weight_bucket_size option that serves to divide the actual size by a
given factor. This allows considering peers with similar queue sizes to have
the same weight.

Additionally, some changes have been made to the balanced allocator so that a
combination of tags, pinqueue sizes and free-spaces can be used. When
allocating by [<tag>, pinqueue, freespace], the allocator will prioritize
choosing peers with the smallest pin queue weight first, and of those with the
same weight, it will allocate based on freespace.
2022-06-16 17:43:29 +02:00
Hector Sanjuan
3169fba9d1 metrics: track total pins, queued, pinning, pin error.
This fixes #1470 and #1187.
2022-04-22 15:57:48 +02:00
Hector Sanjuan
3f07b20f3b pintracker: Deadlock on RecoverAll and StatusAll methods 2022-04-11 15:03:33 +02:00
Hector Sanjuan
a97ed10d0b Adopt api.Cid type - replaces cid.Cid everwhere.
This commit introduces an api.Cid type and replaces the usage of cid.Cid
everywhere.

The main motivation here is to override MarshalJSON so that Cids are
JSON-ified as '"Qm...."' instead of '{ "/": "Qm....." }', as this "ipld"
representation of IDs is horrible to work with, and our APIs are not issuing
IPLD objects to start with.

Unfortunately, there is no way to do this cleanly, and the best way is to just
switch everything to our own type.
2022-04-07 14:27:39 +02:00
Hector Sanjuan
0d73d33ef5 Pintracker: streaming methods
This commit continues the work of taking advantage of the streaming
capabilities in go-libp2p-gorpc by improving the ipfsconnector and pintracker
components.

StatusAll and RecoverAll methods are now streaming methods, with the REST API
output changing accordingly to produce a stream of GlobalPinInfos rather than
a json array.

pin/ls request to the ipfs daemon now use ?stream=true and avoid having to
load the full pinset map on memory. StatusAllLocal and RecoverAllLocal
requests to the pin tracker stream all the way and no longer store the full
pinset, and the full PinInfo status slice before sending it out.

We have additionally switched to a pattern where streaming methods receive the
channel as an argument, allowing the caller to decide on whether to launch a
goroutine, do buffering etc.
2022-03-22 15:38:01 +01:00
Hector Sanjuan
957b3ec278 Sharness: fix tests for new pin ls json output 2022-03-19 12:19:12 +01:00
Hector Sanjuan
9b9d76f92d Pinset streaming and method type revamp
This commit introduces the new go-libp2p-gorpc streaming capabilities for
Cluster. The main aim is to work towards heavily reducing memory usage when
working with very large pinsets.

As a side-effect, it takes the chance to revampt all types for all public
methods so that pointers to static what should be static objects are not used
anymore. This should heavily reduce heap allocations and GC activity.

The main change is that state.List now returns a channel from which to read
the pins, rather than pins being all loaded into a huge slice.

Things reading pins have been all updated to iterate on the channel rather
than on the slice. The full pinset is no longer fully loaded onto memory for
things that run regularly like StateSync().

Additionally, the /allocations endpoint of the rest API no longer returns an
array of pins, but rather streams json-encoded pin objects directly. This
change has extended to the restapi client (which puts pins into a channel as
they arrive) and to ipfs-cluster-ctl.

There are still pending improvements like StatusAll() calls which should also
stream responses, and specially BlockPut calls which should stream blocks
directly into IPFS on a single call.

These are coming up in future commits.
2022-03-19 03:02:55 +01:00
Hector Sanjuan
e4b11b783b Pinsvcapi: address comments from review
- Add "Created" field to pinInfo.
- Support before/after filter
- 404 when something is unpinned or on a non-recognize state
2022-03-14 12:21:08 +01:00
Hector Sanjuan
c871c85f98 stateless: cast empty peer.ID correctly 2022-03-11 16:41:22 +01:00
Hector Sanjuan
cd5f9c869d pinsvcapi: Test and fix bugs in Add Pin. Improve performance. 2022-03-11 14:01:15 +01:00
Hector Sanjuan
fe4c0f61c9 stateless: fix test mock rpc 2022-03-11 09:58:19 +01:00
Hector Sanjuan
5fed4a2c5e types: include IPFSAddresses in pinInfo objects.
pinsvcapi: do not cache peer information here as all the needed information is
in the status objects.

This adds ipfs_addresses as a field broadcasted with the ping metrics.
2022-03-10 23:49:01 +01:00
Hector Sanjuan
0787ffbe36 PinInfo type: include Allocations, Origins, Metadata
This will facilitate building outputs for the Pinning Services API, saving a
round trip to query the cluster State, since all the needed information
already comes from the PinTracker, which has already accessed the state.

Since the pintracker already included a state attribute (Name), we are simply
going down that path.
2022-02-02 00:52:38 +01:00
Hector Sanjuan
809b7fbda5 Pintracker: add IPFS ID to Pin Information
Fixes #1554
Fixes: peer names unset for remote peers

This adds an IPFS field to pin status information (PinInfoShort).

It has not been easy to add this, given that the IPFS ID is something that
comes from outside of cluster (unlike the peer name). After several tries I
have settled in the following things:

- Use the ping metric to send out peer names and IPFS IDs to the peers in the
  cluster.
- Cache the latest known IPFS ID (if IPFS dies we should still be setting
  the ID).
- Provide an RPC method for the Pintracker to obtain IPFS ID from the cache.
- Given we now know information for peernames and IPFS IDs from other peers,
  we can use that information even if the requests to them error or we are not
  contacting (i.e. peers allocated as remote are not queried for status). We can
  use the information from the last received ping metric.
- This means we should keep metrics around even if peers go away, at least for
  a while rather than deleting them as soon as we detect that they have expired.

Puting it all together we now have a system to gossip peer information around on top
of the ping metrics.
2022-01-31 17:53:09 +01:00
Hector Sanjuan
af0cf8b106
Merge pull request #1538 from ipfs/pintracker/recover-speedup
pintracker: RecoverAll should only return status for recovered items
2022-01-11 17:09:20 +01:00
Hector Sanjuan
592ce450ce pintracker: RecoverAll should only return status for recovered items
We call RecoverAll regularly and I noticed it was way slower than it should be.

After all, it should just loop the pinset and enqueued items that are
unexpectedly unpinned or in pin error.

However, at some point we decided that RecoverAll would return information for
all pins, regardless of whether they were recovered or not. This ends up
resulting in a separate Status call for every pin that is already pinned, and
this call hits IPFS. This is pretty bad with big pinsets.

This commit fixes that, we return no state information for pins that are not
touched.
2022-01-11 16:22:03 +01:00
Hector Sanjuan
789b633366 pintracker: set status unexpectedly_unpinned correctly on Status()
This must has been an oversight. We added a special unexpectedly_unpinned
status so that we could just return things from the operation tracker when
filtering by pin_error. unexpectedly_unpinned are things that we have no
operation for but are unpinned on ipfs.

Status however still returned a pin_error state for these, even though,
StatusAll would not show them when filtering with pin_error, and would show
them as unexpectedly_unpinned otherwise.

Since Recover correctly repins pin_error and unexpectedly_unpinned, this
change has no further consequences.
2022-01-10 13:04:21 +01:00
Hector Sanjuan
67eeb45798 pintracker: clean exit during recover
Shutting down the cluster while a recover operation is ongoing resulted in
each remaining pin in the recover loop producing an error in the logs, as the
loop kept going even though compontents were already shutdown.

With 8 million items, this meant a lot of log messages, and a shutdown delay
that forced the killing of the process in most cases.

The recover loop now exits when the component's context is cancelled.
2021-12-17 11:47:50 +01:00
Hector Sanjuan
33343751b0 Stateless: add tests to make sure AttemptCounts are set correctly 2021-11-30 04:43:16 +01:00
Hector Sanjuan
7cf40de354 pintracker: Fix attempt count not increasing. Expose priority queueing.
"RetryCount" has been renamed to "AttemptCount", because it counts attempts.
2021-11-30 04:20:35 +01:00
Hector Sanjuan
be5c2d1569 pintracker: PriorityPinMaxAge and PriorityPinMaxRetries
These new configuration settings control whether pins are enqueued in the
priority queue or not.
2021-11-30 04:20:35 +01:00
Hector Sanjuan
0a146dae76 pintracker: support a priority channel for pinning 2021-11-30 04:20:35 +01:00
Hector Sanjuan
29c277b67f Pintracker: add and track retry counts in the operation manager.
Report retry count in the PinStatus
2021-11-30 04:20:35 +01:00
Hector Sanjuan
e9857652f2 Add a timestamp to Pins
This adds a Timestamp field to the pin objects. This allows to track when they were pinned.

This:

* Allows the pin-tracker to actually show accurate information on when the pin
  entered the system for pins that are not part of ongoing operations
  (currently it shows time.Now())
* Adds support for reporting timestamp on a pinning services api.
2021-10-20 16:55:57 +02:00
Hector Sanjuan
27569bdf88 pintracker: avoid listing the state unless necessary
This is a follow up to #1360 which further optimizes StatusAll calls by
avoiding listing and filtering the cluster state when requesting status for
operations that should be direclty in the operation tracker because they are
ongoing (queued, pinning, unpinning, error).
2021-07-08 01:01:25 +02:00
Hector Sanjuan
edfcfa3fb0 Fix #1360: Efficient pinset status with filters
This commit modifies the pintracker StatusAll call to take a status filter.

This allows to skip a PinLs call to ipfs when checking status for items that
are queued, pinning, unpinning or in error. Those status come directly from
the operation tracker. This should result in a significant performance
increase for those calls, particularly in nodes with several hundred thousand
pins and more, where the call to IPFS is very expensive.

A new TrackerStatusUnexpectedlyUnpinned status has been introduce to
differentiate between pin errors (tracked by the operation tracker) and "lost"
items (which before were pin errors too). This new status is handled by the
Recover() operation as before.
2021-07-06 11:34:19 +02:00
Hector Sanjuan
88cfcf62fc
Pintracker: use cid.Cid as map keys. (#1322)
* No reason not to.
* Should save a non trivial amount of time when doing
"Status" operation by not having to stringify Cids.
2021-03-16 15:47:44 +01:00
Hector Sanjuan
e967238848
Merge pull request #1129 from ipfs/fix/1013-follow-list
Improvements to ipfs-cluster-follow * list
2020-05-16 02:31:06 +02:00
Hector Sanjuan
c026299b95 Include Name as GlobalPinInfo key and consolidate redundant keys
GlobalPinInfo objects carried redundant information (Cid, Peer) that takes
space and time to serialize.

This has been addressed by having GlobalPinInfo embed PinInfoShort rather than
PinInfo. This new types ommits redundant fields.
2020-05-16 02:27:24 +02:00
Kishan Mohanbhai Sagathiya
ae8e74453b Fix #937: Print full working configuration at startup
Only when using debug mode

Co-authored-by: Hector Sanjuan <code@hector.link>
2020-05-15 01:33:04 +02:00
Hector Sanjuan
7e9cece29c Include pin Names in PinInfo objects
Fixes #1013 by avoiding to have to make a specific request to allocations.
2020-05-15 00:18:14 +02:00
Hector Sanjuan
fa762d78cf Improve PinLsCid to check for pinned items
Receive the full pin object so that it can decide whether to check
for recursive or direct pins directly.

Additionally, unpin will not check for the pin presence anymore and
simply trigger unpins (ignoring errors)
2020-04-21 17:23:55 +02:00
Hector Sanjuan
f83ff9b655 staticcheck: fix all staticcheck warnings in the project 2020-04-14 20:16:10 +02:00
Hector Sanjuan
b3853caf36 Dependency ugprade: changes needed
* Libp2p protectors no longer needed, use PSK directly
* Generate cluster 32-byte secret here (helper gone from pnet)
* Switch to go-log/v2 in all places
* DHT bootstrapping not needed. Adjust DHT options for tests.
* Do not rely on dissappeared CidToDsKey and DsKeyToCid functions fro dshelp.
* Disable QUIC (does not support private networks)
* Fix tests: autodiscovery started working properly
2020-03-22 14:50:25 +01:00
Hector Sanjuan
09d933fde1 pintracker: take care of tests
Simplify the tests, remove things that are not used at all, align the
behaviour of the mocks, add methods to test the correct behaviour of Status
etc.
2019-12-13 12:03:01 +01:00
Hector Sanjuan
8b6fd1fabe Fix: stateless: cluster should pin items that are in the state but not on ipfs
StateSync() used to take care of this by issuing Track() calls. But this
functionality was removed.

This starts returning items that are in the state but not on IPFS as
PIN_ERRORs. It ensures that the Recover methods see them so that they can
trigger repinnings for missing items. This covers cases where the user
modifies the ipfs state manually, or resets the ipfs daemon but keeps the
cluster state, and cases where cluster was stopped half-way through a pinning.
2019-12-13 12:00:03 +01:00
Kishan Sagathiya
5258a4d428 Remove map pintracker (#944)
This removes mappintracker and sets stateless tracker as the default (and only) pintracker component.

Because the stateless tracker matches the cluster state with only ongoing operations being kept on memory, and additional information provided by ipfs-pin-ls, syncing operations are not necessary. Therefore the Sync/SyncAll operations are removed cluster-wide.
2019-12-12 21:22:54 +01:00
Kishan Mohanbhai Sagathiya
19cde2e8cf Error queue is full for stateless pintracker
- increase max pin queue size to 1 million
- hide max_pin_queue_size from configuration
2019-09-11 12:53:51 +07:00
Hector Sanjuan
62b7054d31 Fix: pintrackers: Do not spam the logs when running recover
Currently logs every pin we call recover with. We call recover regularly.
So it will print all pins.
2019-08-14 14:10:44 +02:00
Hector Sanjuan
b804e61ef0 Update deps along with go-libp2p-core refactor
Lots of rewrites in imports...
2019-06-14 13:10:45 +02:00
Hector Sanjuan
3d49ac26a5 Feat: Split components into RPC Services
I had thought of this for a very long time but there were no compelling
reasons to do it. Specifying RPC endpoint permissions becomes however
significantly nicer if each Component is a different RPC Service. This also
fixes some naming issues like having to prefix methods with the component name
to separate them from methods named in the same way in some other component
(Pin and IPFSPin).
2019-05-04 21:36:10 +01:00