Commit Graph

2920 Commits

Author SHA1 Message Date
Hector Sanjuan
048d168126
Merge pull request #1768 from ipfs-cluster/fix/bad-context-propagation
Fix bad context propagation / deadlocks
2022-09-26 19:36:13 +02:00
Hector Sanjuan
21855c3130 Fix bad context propagation / deadlocks
We are propagating the wrong context (mostly from the Cluster top-level
methods). As a result, request cancellations (and cancellations of the
associated contexts) are not propagated to many methods, which can result in
deadlocks when an operation that is holding a lock is not aborted.

This affects, for example, the operation tracker. Getting all operations from
the tracker relies on someone reading from the out channel, or on the context
being cancelled. When a request is aborted in the middle of the response, and
the context is not cancelled, everything that wants to list operations would
become deadlocked, including operations that need write locks like
TrackNewOperation.

This fixes it.
2022-09-26 19:35:55 +02:00
Hector Sanjuan
328f2388d0
Merge pull request #1770 from ipfs-cluster/fix/ipfshttp-panic-return
ipfshttp: fix return with nil error
2022-09-26 19:34:56 +02:00
Hector Sanjuan
1efd6a7bd1 ipfshttp: fix return with nil error 2022-09-26 18:50:33 +02:00
Hector Sanjuan
003da51c7b
Release v1.0.3 2022-09-16 12:01:03 +02:00
Hector Sanjuan
b9debdb79c
Merge pull request #1763 from ipfs-cluster/v1.0.3/changelog
Changelog updates for v1.0.3
2022-09-15 19:10:51 +02:00
Hector Sanjuan
cb6cdc8ad6 Changelog updates for v1.0.3 2022-09-15 18:29:54 +02:00
Hector Sanjuan
af1f69a6f0 ipfshttp: remove leftover Println 2022-09-15 18:02:04 +02:00
Hector Sanjuan
c9895bf607
Merge pull request #1762 from ipfs-cluster/fix/1733-ipfs-error-handling
Behaviour improvements when the ipfs daemon is unavailable
2022-09-15 17:59:04 +02:00
Hector Sanjuan
592d61b228 ipfshttp: rate limit requests when failures happen
When IPFS starts failing or doesn't respond (e.g. during a restart), cluster
is likely to start sending requests at very fast rates. For example, if there
are 100k items to be pinned, and pins start failing immediately, cluster will
consume the pin queue really fast and it will all be failures. At the same
time, ipfs is hammered non-stop until it recovers, which may make recovery
harder.

This commit introduces a rate limit when requests to IPFS fail. After 10
failed requests, requests will be sent at a rate of at most 1 req/s. Once a
request succeeds, the rate limit is lifted.

This should prevent hammering the IPFS daemon, and also avoid increased CPU
usage in cluster as it burns through pinning queues when IPFS is offline,
which makes the situation on machines worse (and emits far more logs).
2022-09-15 17:37:26 +02:00
Hector Sanjuan
2286ee73f8 api: return errors on stream response requests with 0 items
This fixes a bug in API code that made it return 204-No content when the RPC
methods failed with an error before any items were returned on the channel.
2022-09-15 16:40:34 +02:00
Hector Sanjuan
12b8ce63ce stateless: abort when ipfs PinLs errors
Unfortunately we were not paying attention to errors while rpc-streaming pins
in the pintracker. The result is that the StatusAll operation would list all
the pins as unexpectedly unpinned when ipfs is offline, and this would result in
recover/requeueing operations for all pins when ipfs is offline.

This commit changes the behaviour so that if IPFS Pin/ls has resulted in an
error, then the StatusAll operation cannot complete at all.
2022-09-15 16:40:34 +02:00
Hector Sanjuan
a60a835e36 Wait for IPFS to be ready during start
This commit introduces unlimited waiting on start until a request to `ipfs id`
succeeds.

Waiting has some consequences:

* State watching (recover/sync) and metrics publishing do not start until ipfs is ready.
* swarm/connect is not triggered until ipfs is ready.

Once the first request to ipfs succeeds, everything behaves as it did before.

This avoids attempting operations like sending our IDs in metrics when IPFS is
simply not there.
2022-09-15 16:40:34 +02:00
Hector Sanjuan
b80f89dd01
Merge pull request #1516 from ipfs-cluster/crdtdot
Allow exporting CRDT dag to dot files with: "ipfs-cluster-service state crdt dot"
2022-09-14 10:58:58 +02:00
Hector Sanjuan
29d6f69819 Add "service state crdt dot" command
This subcommand allows exporting the peer's CRDT DAG as a dot file.

It also opens the door to having more crdt-specific subcommands, e.g. CAR
export etc.
2022-09-13 16:36:10 +02:00
Hector Sanjuan
fe11730e58
Merge pull request #1757 from ipfs-cluster/update-raft
Update raft libraries
2022-09-09 17:33:07 +02:00
Hector Sanjuan
b81379e383 Update raft libraries
This updates Raft to v1.3.0. We can also remove some logging glue that is no
longer necessary in here.
2022-09-09 17:17:45 +02:00
Hector Sanjuan
920ba03b1e
Merge pull request #1756 from ipfs-cluster/fix/1738-proxy-block-dag-put-intercept
ipfsproxy: intercept block/put and dag/put and pin to cluster on pin=true
2022-09-09 16:44:22 +02:00
Hector Sanjuan
7af52bdb4e ipfsproxy: forward block/put and dag/put with pin=false
The pin argument is interpreted and performed by cluster, and pins may be allocated to other nodes.
2022-09-09 16:41:58 +02:00
Hector Sanjuan
b2b33e8668 ipfsproxy: add missing heads to dag/put, block/put handlers 2022-09-09 16:35:55 +02:00
Hector Sanjuan
06f0cac9c0
Merge pull request #1753 from ipfs-cluster/fix/1706-fix-block-warning
ipfshttp: Fix "seen blocks" tracking and blockPut metrics
2022-09-09 16:32:09 +02:00
Hector Sanjuan
573d7e8916 ipfsproxy: add tests for block/put and dag/put intercepts 2022-09-09 16:27:09 +02:00
Hector Sanjuan
6ce90dfe47 ipfsproxy: intercept block/put and dag/put and pin to cluster on pin=true
This fixes #1738. Tests still missing
2022-09-09 00:49:50 +02:00
Hector Sanjuan
5452b59a2e
Dependency upgrades (#1755)
* Update go-libp2p to v0.22.0

* Testing with go1.19

* build(deps): bump github.com/multiformats/go-multicodec

Bumps [github.com/multiformats/go-multicodec](https://github.com/multiformats/go-multicodec) from 0.5.0 to 0.6.0.
- [Release notes](https://github.com/multiformats/go-multicodec/releases)
- [Commits](https://github.com/multiformats/go-multicodec/compare/v0.5.0...v0.6.0)

---
updated-dependencies:
- dependency-name: github.com/multiformats/go-multicodec
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>

* build(deps): bump github.com/ipld/go-car from 0.4.0 to 0.5.0

Bumps [github.com/ipld/go-car](https://github.com/ipld/go-car) from 0.4.0 to 0.5.0.
- [Release notes](https://github.com/ipld/go-car/releases)
- [Commits](https://github.com/ipld/go-car/compare/v0.4.0...v0.5.0)

---
updated-dependencies:
- dependency-name: github.com/ipld/go-car
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>

* build(deps): bump github.com/prometheus/client_golang

Bumps [github.com/prometheus/client_golang](https://github.com/prometheus/client_golang) from 1.12.2 to 1.13.0.
- [Release notes](https://github.com/prometheus/client_golang/releases)
- [Changelog](https://github.com/prometheus/client_golang/blob/main/CHANGELOG.md)
- [Commits](https://github.com/prometheus/client_golang/compare/v1.12.2...v1.13.0)

---
updated-dependencies:
- dependency-name: github.com/prometheus/client_golang
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>

* build(deps): bump github.com/hashicorp/go-hclog from 1.2.1 to 1.3.0

Bumps [github.com/hashicorp/go-hclog](https://github.com/hashicorp/go-hclog) from 1.2.1 to 1.3.0.
- [Release notes](https://github.com/hashicorp/go-hclog/releases)
- [Commits](https://github.com/hashicorp/go-hclog/compare/v1.2.1...v1.3.0)

---
updated-dependencies:
- dependency-name: github.com/hashicorp/go-hclog
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>

* build(deps): bump github.com/ipfs/go-ds-crdt from 0.3.6 to 0.3.7

Bumps [github.com/ipfs/go-ds-crdt](https://github.com/ipfs/go-ds-crdt) from 0.3.6 to 0.3.7.
- [Release notes](https://github.com/ipfs/go-ds-crdt/releases)
- [Commits](https://github.com/ipfs/go-ds-crdt/compare/v0.3.6...v0.3.7)

---
updated-dependencies:
- dependency-name: github.com/ipfs/go-ds-crdt
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>

* build(deps): bump github.com/urfave/cli/v2 from 2.10.2 to 2.14.1

Bumps [github.com/urfave/cli/v2](https://github.com/urfave/cli) from 2.10.2 to 2.14.1.
- [Release notes](https://github.com/urfave/cli/releases)
- [Changelog](https://github.com/urfave/cli/blob/main/docs/CHANGELOG.md)
- [Commits](https://github.com/urfave/cli/compare/v2.10.2...v2.14.1)

---
updated-dependencies:
- dependency-name: github.com/urfave/cli/v2
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>

* build(deps): bump github.com/libp2p/go-libp2p-http from 0.3.0 to 0.4.0

Bumps [github.com/libp2p/go-libp2p-http](https://github.com/libp2p/go-libp2p-http) from 0.3.0 to 0.4.0.
- [Release notes](https://github.com/libp2p/go-libp2p-http/releases)
- [Commits](https://github.com/libp2p/go-libp2p-http/compare/v0.3.0...v0.4.0)

---
updated-dependencies:
- dependency-name: github.com/libp2p/go-libp2p-http
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>

* build(deps): bump github.com/libp2p/go-libp2p-gorpc from 0.4.0 to 0.5.0

Bumps [github.com/libp2p/go-libp2p-gorpc](https://github.com/libp2p/go-libp2p-gorpc) from 0.4.0 to 0.5.0.
- [Release notes](https://github.com/libp2p/go-libp2p-gorpc/releases)
- [Commits](https://github.com/libp2p/go-libp2p-gorpc/compare/v0.4.0...v0.5.0)

---
updated-dependencies:
- dependency-name: github.com/libp2p/go-libp2p-gorpc
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>

* build(deps): bump contrib.go.opencensus.io/exporter/prometheus

Bumps [contrib.go.opencensus.io/exporter/prometheus](https://github.com/census-ecosystem/opencensus-go-exporter-prometheus) from 0.4.1 to 0.4.2.
- [Release notes](https://github.com/census-ecosystem/opencensus-go-exporter-prometheus/releases)
- [Commits](https://github.com/census-ecosystem/opencensus-go-exporter-prometheus/compare/v0.4.1...v0.4.2)

---
updated-dependencies:
- dependency-name: contrib.go.opencensus.io/exporter/prometheus
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>

* build(deps): bump github.com/libp2p/go-libp2p-raft from 0.1.8 to 0.2.0

Bumps [github.com/libp2p/go-libp2p-raft](https://github.com/libp2p/go-libp2p-raft) from 0.1.8 to 0.2.0.
- [Release notes](https://github.com/libp2p/go-libp2p-raft/releases)
- [Commits](https://github.com/libp2p/go-libp2p-raft/compare/v0.1.8...v0.2.0)

---
updated-dependencies:
- dependency-name: github.com/libp2p/go-libp2p-raft
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>

* build(deps): bump github.com/urfave/cli from 1.22.9 to 1.22.10

Bumps [github.com/urfave/cli](https://github.com/urfave/cli) from 1.22.9 to 1.22.10.
- [Release notes](https://github.com/urfave/cli/releases)
- [Changelog](https://github.com/urfave/cli/blob/main/docs/CHANGELOG.md)
- [Commits](https://github.com/urfave/cli/compare/v1.22.9...v1.22.10)

---
updated-dependencies:
- dependency-name: github.com/urfave/cli
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>

* Fix checker/linter/staticcheck warnings

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2022-09-06 16:57:17 +02:00
Hector Sanjuan
8c93d4cb81
Merge pull request #1754 from ipfs-cluster/struct-align-comment
Add comments to struct fields that must be aligned.
2022-09-06 16:23:24 +02:00
Hector Sanjuan
ed54c665b8 ipfshttp: Fix "seen blocks" tracking and blockPut metrics
This fixes two bugs. First, the "blockPut response CID does not match the
multihash" warning was coming up when it shouldn't. Particularly, the
multipart reader called Node() several times for the same block, resulting in
CIDs being removed from the Seen set, and causing the warning when there were
several blocks (usually the empty dir block).

This also means we were counting Blockputs (and total data added) wrong in the
metrics, double-counting some blocks as these were recorded in Node() calls.

The fix moves the tracking to Next(), which is only called once for each
block. To avoid timing issues between Block reads from the channel and
blockput responses, the Seen set now stores how many times we have seen a
block. Thus a duplicated block that will get two BlockPut responses will not
trigger a warning regardless of the time when those responses arrive.

Fixes #1706.
2022-09-05 18:10:36 +02:00
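The counting "Seen set" described in ed54c665b8 can be sketched as a map from CID to the number of times the block was seen. The type and method names below are hypothetical, and CIDs are plain strings for brevity.

```go
package main

import "sync"

// seenSet counts how many times each block has been seen, so that a
// duplicated block can be matched by the same number of responses.
type seenSet struct {
	mu     sync.Mutex
	counts map[string]int
}

func newSeenSet() *seenSet {
	return &seenSet{counts: make(map[string]int)}
}

// Add records one more sighting of cid (called once per block read).
func (s *seenSet) Add(cid string) {
	s.mu.Lock()
	defer s.mu.Unlock()
	s.counts[cid]++
}

// Remove consumes one sighting of cid. It returns false when no sighting
// is left, which is the case that would warrant a warning.
func (s *seenSet) Remove(cid string) bool {
	s.mu.Lock()
	defer s.mu.Unlock()
	n := s.counts[cid]
	if n == 0 {
		return false
	}
	if n == 1 {
		delete(s.counts, cid)
	} else {
		s.counts[cid] = n - 1
	}
	return true
}

func main() {
	seen := newSeenSet()
	seen.Add("emptydir") // the same block appears twice in the stream
	seen.Add("emptydir")
	if !seen.Remove("emptydir") || !seen.Remove("emptydir") {
		panic("both responses should match a sighting")
	}
	if seen.Remove("emptydir") {
		panic("a third response has no matching sighting")
	}
}
```

With a plain set, the first response would remove the entry and the second would look unexpected. With a counter, the order and timing of the two responses no longer matter.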
Hector Sanjuan
a84869d3db Add comments to struct fields that must be aligned. 2022-09-05 17:03:30 +02:00
Hector Sanjuan
71bda2d658
Merge pull request #1736 from cldy309/fix/1735-panic
fix panic: unaligned 64-bit atomic operation on Linux@armv7
2022-09-05 16:44:31 +02:00
chenlong348
bd3b88b933 fix panic: unaligned 64-bit atomic operation on Linux@armv7 2022-08-15 10:13:37 +08:00
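For context on bd3b88b933: on 32-bit platforms such as linux/arm, the Go `sync/atomic` documentation makes the caller responsible for 64-bit alignment of words accessed with the primitive atomic functions, and guarantees that alignment only for the first word of an allocated struct, array, or slice. The sketch below shows the usual layout fix with a hypothetical struct. It is not the struct changed by that commit.

```go
package main

import (
	"sync/atomic"
	"unsafe"
)

// stats is a hypothetical struct. The 64-bit counter is kept as the first
// field so that it is 64-bit aligned even on 32-bit platforms.
type stats struct {
	hits int64 // must stay first: accessed with sync/atomic
	name string
	flag bool
}

// hit atomically increments the counter and returns the new value.
func (s *stats) hit() int64 {
	return atomic.AddInt64(&s.hits, 1)
}

// hitsOffset reports the byte offset of the counter inside the struct.
func hitsOffset() uintptr {
	var s stats
	return unsafe.Offsetof(s.hits)
}

func main() {
	s := &stats{name: "example"}
	s.hit()
	if s.hit() != 2 {
		panic("expected the counter to reach 2")
	}
	if hitsOffset() != 0 {
		panic("the atomic counter must be the first field")
	}
}
```

Since Go 1.19 the `atomic.Int64` type is aligned automatically, which is an alternative to ordering fields by hand.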
Hector Sanjuan
c4d78d52f8
Merge pull request #1732 from ipfs-cluster/fix/goroutine-leak-adder
Fix: leaking goroutines on aborted /add requests
2022-07-08 17:40:24 +02:00
Hector Sanjuan
d19c7facff Fix: leaking goroutines on aborted /add requests
It has been observed that some peers have a growing number of goroutines,
usually stuck in go-libp2p-gorpc.MultiStream() function, which is waiting to
read items from the arguments channel.

We suspect this is due to aborted /add requests. In situations when the add
request is aborted or fails, Finalize() is never called and the blocks channel
stays open, so MultiStream() can never exit, and the BlockStreamer can never
stop streaming etc.

As a fix, we added the requirement to call Close() when we stop using a
ClusterDAGService (error or not). This should ensure that the blocks channel
is always closed and not just on Finalize().
2022-07-08 17:39:59 +02:00
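The fix in d19c7facff boils down to making the channel close independent of the success path. A minimal sketch, with hypothetical names (`blockStream`, `consumerExits`) standing in for the real ClusterDAGService and its streaming consumer:

```go
package main

import (
	"sync"
	"time"
)

// blockStream is a hypothetical stand-in for a service that feeds blocks
// to a streaming consumer through a channel.
type blockStream struct {
	blocks    chan []byte
	closeOnce sync.Once
}

func newBlockStream() *blockStream {
	return &blockStream{blocks: make(chan []byte, 1)}
}

// Close closes the blocks channel exactly once, so it is safe to call on
// both the error path and after Finalize.
func (s *blockStream) Close() {
	s.closeOnce.Do(func() { close(s.blocks) })
}

// Finalize ends a successful add; it also closes the channel.
func (s *blockStream) Finalize() {
	s.Close()
}

// consumerExits runs one add request and reports whether the consumer
// goroutine terminated. When finalize is false the request is "aborted".
func consumerExits(finalize bool) bool {
	s := newBlockStream()
	done := make(chan struct{})
	go func() {
		defer close(done)
		for range s.blocks { // returns only when the channel is closed
		}
	}()
	func() {
		defer s.Close() // always runs, error or not
		s.blocks <- []byte("block")
		if finalize {
			s.Finalize()
		}
	}()
	select {
	case <-done:
		return true
	case <-time.After(time.Second):
		return false
	}
}

func main() {
	if !consumerExits(true) || !consumerExits(false) {
		panic("consumer goroutine leaked")
	}
}
```

Without the deferred `Close()`, the aborted case would leave the consumer blocked on the channel forever, which matches the growing goroutine count described above.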
Hector Sanjuan
b2ce7d916d
Release v1.0.2 2022-07-06 18:26:32 +02:00
Hector Sanjuan
ff50c119e4
Merge pull request #1731 from ipfs-cluster/v1.0.2/changelog
Changelog for v1.0.2
2022-07-06 18:25:18 +02:00
Hector Sanjuan
a1ff94c504 Changelog for v1.0.2 2022-07-06 18:25:03 +02:00
Hector Sanjuan
f3662b8e0f
Merge pull request #1730 from ipfs-cluster/update-deps-car
Update go-car to v0.4.0
2022-07-06 18:08:05 +02:00
Hector Sanjuan
ad7329f602 Update go-car to v0.4.0 2022-07-06 17:49:44 +02:00
Hector Sanjuan
d2cbd8f910
Merge pull request #1729 from ipfs-cluster/fix/pins-error-negative
Fix: operationtracker metrics go negative
2022-07-04 20:20:13 +02:00
Hector Sanjuan
38e3c4a695 Fix: operationtracker metrics go negative
By subtracting 1 on every cancel we are double-counting.
2022-07-04 20:09:10 +02:00
Hector Sanjuan
04177fa545
Release candidate v1.0.2-rc1 2022-06-30 14:26:54 +02:00
Hector Sanjuan
3e6577c22a
Merge pull request #1725 from ipfs-cluster/metrics-freespace
Metrics freespace
2022-06-23 14:19:00 +02:00
Hector Sanjuan
c454769887 Informer/disk: record issued metric weights as prometheus metric. 2022-06-23 11:58:35 +02:00
Hector Sanjuan
2aec92301d metrics: set block/added_size unit to bytes 2022-06-23 11:58:07 +02:00
Hector Sanjuan
e8695dc6f3 informer/disk: set repoSize weight to negative
smaller repositories should have more priority
2022-06-23 11:49:37 +02:00
Hector Sanjuan
f7e62beee2
Merge pull request #1724 from ipfs-cluster/dependency-upgrades
Dependency upgrades
2022-06-23 11:42:45 +02:00
Hector Sanjuan
5ff5cb2e68 Update dependencies 2022-06-23 11:41:57 +02:00
Hector Sanjuan
12490c959a
Merge pull request #1719 from ipfs-cluster/fix/1697-commit-batches
crdt: Commit batches on shutdown
2022-06-23 11:03:28 +02:00
Hector Sanjuan
5e7a694cd1 crdt: Implement proper Batch commit on shutdown
This implements committing batches on shutdown properly.

Now the batchWorker will only finish when there are no more things queued to
be included in the final batch(es).

LogPin/Unpin operations will fail while we are shutting down and they cannot
be included in the batch.
2022-06-22 20:19:12 +02:00
Hector Sanjuan
a393ebd8d8 crdt: Commit batches on shutdown
This attempts to commit any pending batches when the crdt component is being
shut down. A commit should succeed if the new DAG node is created, heads are
replaced and broadcast.

The latest version of CRDT ensures that the datastore does not unnecessarily
get marked as dirty when a broadcast head cannot be fetched/processed, so
the side effect of publishing a head before shutting down should be under
control at least.
2022-06-21 19:11:05 +02:00
Hector Sanjuan
d6166d802b
Merge pull request #1717 from ipfs-cluster/fix/1702-neg-metrics
pintracker: fix some races resulting in wrong metric counts
2022-06-20 22:29:51 +02:00
Hector Sanjuan
28c24931b6 pintracker: fix some races resulting in wrong metric counts
I believe this fixes the issue with some metrics like pinning going into
negative numbers occasionally. Fixes #1702.
2022-06-20 22:16:36 +02:00