Commit Graph

103 Commits

Author SHA1 Message Date
Hector Sanjuan
11124ee224 Fix: repinning does not re-allocate as needed
Long story: Since #1768 there has been a recurring repinning test failure with
Raft consensus.

Per the test, if a pin is allocated to a peer that has been shutdown,
submitting the pin again should re-allocate it to a peer that is still
running.

Investigation on why this test fails and why it fails only in Raft lead to
realizing that this and other similar tests, were passing by chance. The
needed re-allocations were made not by the new submission of the pin, but by
the automatic-repinning feature. The actual resubmitted pin was carrying the
same allocations (one of them being the peer that was down), but it was
silently failing because the RedirectToLeader() code path was using
cc.ctx and hitting the peer that had been shutdown, which caused it to error.

Fixing the context propagation, meant that we would re-overwrite the pin with
the old allocations, thus the actual behaviour did not pass the test.

So, on one side, this fix an number of tests that had not disabled automatic
repinning and was probably getting in the way of things. On the other side,
this removes a condition that prevents re-allocation of pins if they exists
and options have not changed.

I don't fully understand why this was there though, since the Allocate() code
does return the old allocations anyways when they are enough, so it should not
re-allocate randomly. I suspect this was preventing some misbehaviour in the
Allocate() code from the time before it was improved with multiple allocators
etc.
2022-09-27 12:31:24 +02:00
Hector Sanjuan
21855c3130 Fix bad context propagation / deadlocks
We are propagating the wrong context (mostly from the Cluster top-level
methods). This makes that request cancellations (and cancellations of the
associated contexts) are not propagated to many methods, and can result in
deadlocks when an operation that is holding a lock is not aborted.

This affects for example the operation tracker. Getting all operations from
the tracker relies on someone reading from the out channel, or on the context
being cancelled. When a request is aborted in the middle of the response, and
the context is not cancelled, everything that wants to list operations would
become deadlocked, including operations that need write locks like
TrackNewOperation.

This fixes it.
2022-09-26 19:35:55 +02:00
Hector Sanjuan
b81379e383 Update raft libraries
This updates Raft to v1.3.0. We can also remove some logging glue that is no
longer necessary in here.
2022-09-09 17:17:45 +02:00
Hector Sanjuan
5452b59a2e
Dependency upgrades (#1755)
* Update go-libp2p to v0.22.0

* Testing with go1.19

* build(deps): bump github.com/multiformats/go-multicodec

Bumps [github.com/multiformats/go-multicodec](https://github.com/multiformats/go-multicodec) from 0.5.0 to 0.6.0.
- [Release notes](https://github.com/multiformats/go-multicodec/releases)
- [Commits](https://github.com/multiformats/go-multicodec/compare/v0.5.0...v0.6.0)

---
updated-dependencies:
- dependency-name: github.com/multiformats/go-multicodec
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>

* build(deps): bump github.com/ipld/go-car from 0.4.0 to 0.5.0

Bumps [github.com/ipld/go-car](https://github.com/ipld/go-car) from 0.4.0 to 0.5.0.
- [Release notes](https://github.com/ipld/go-car/releases)
- [Commits](https://github.com/ipld/go-car/compare/v0.4.0...v0.5.0)

---
updated-dependencies:
- dependency-name: github.com/ipld/go-car
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>

* build(deps): bump github.com/prometheus/client_golang

Bumps [github.com/prometheus/client_golang](https://github.com/prometheus/client_golang) from 1.12.2 to 1.13.0.
- [Release notes](https://github.com/prometheus/client_golang/releases)
- [Changelog](https://github.com/prometheus/client_golang/blob/main/CHANGELOG.md)
- [Commits](https://github.com/prometheus/client_golang/compare/v1.12.2...v1.13.0)

---
updated-dependencies:
- dependency-name: github.com/prometheus/client_golang
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>

* build(deps): bump github.com/hashicorp/go-hclog from 1.2.1 to 1.3.0

Bumps [github.com/hashicorp/go-hclog](https://github.com/hashicorp/go-hclog) from 1.2.1 to 1.3.0.
- [Release notes](https://github.com/hashicorp/go-hclog/releases)
- [Commits](https://github.com/hashicorp/go-hclog/compare/v1.2.1...v1.3.0)

---
updated-dependencies:
- dependency-name: github.com/hashicorp/go-hclog
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>

* build(deps): bump github.com/ipfs/go-ds-crdt from 0.3.6 to 0.3.7

Bumps [github.com/ipfs/go-ds-crdt](https://github.com/ipfs/go-ds-crdt) from 0.3.6 to 0.3.7.
- [Release notes](https://github.com/ipfs/go-ds-crdt/releases)
- [Commits](https://github.com/ipfs/go-ds-crdt/compare/v0.3.6...v0.3.7)

---
updated-dependencies:
- dependency-name: github.com/ipfs/go-ds-crdt
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>

* build(deps): bump github.com/urfave/cli/v2 from 2.10.2 to 2.14.1

Bumps [github.com/urfave/cli/v2](https://github.com/urfave/cli) from 2.10.2 to 2.14.1.
- [Release notes](https://github.com/urfave/cli/releases)
- [Changelog](https://github.com/urfave/cli/blob/main/docs/CHANGELOG.md)
- [Commits](https://github.com/urfave/cli/compare/v2.10.2...v2.14.1)

---
updated-dependencies:
- dependency-name: github.com/urfave/cli/v2
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>

* build(deps): bump github.com/libp2p/go-libp2p-http from 0.3.0 to 0.4.0

Bumps [github.com/libp2p/go-libp2p-http](https://github.com/libp2p/go-libp2p-http) from 0.3.0 to 0.4.0.
- [Release notes](https://github.com/libp2p/go-libp2p-http/releases)
- [Commits](https://github.com/libp2p/go-libp2p-http/compare/v0.3.0...v0.4.0)

---
updated-dependencies:
- dependency-name: github.com/libp2p/go-libp2p-http
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>

* build(deps): bump github.com/libp2p/go-libp2p-gorpc from 0.4.0 to 0.5.0

Bumps [github.com/libp2p/go-libp2p-gorpc](https://github.com/libp2p/go-libp2p-gorpc) from 0.4.0 to 0.5.0.
- [Release notes](https://github.com/libp2p/go-libp2p-gorpc/releases)
- [Commits](https://github.com/libp2p/go-libp2p-gorpc/compare/v0.4.0...v0.5.0)

---
updated-dependencies:
- dependency-name: github.com/libp2p/go-libp2p-gorpc
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>

* build(deps): bump contrib.go.opencensus.io/exporter/prometheus

Bumps [contrib.go.opencensus.io/exporter/prometheus](https://github.com/census-ecosystem/opencensus-go-exporter-prometheus) from 0.4.1 to 0.4.2.
- [Release notes](https://github.com/census-ecosystem/opencensus-go-exporter-prometheus/releases)
- [Commits](https://github.com/census-ecosystem/opencensus-go-exporter-prometheus/compare/v0.4.1...v0.4.2)

---
updated-dependencies:
- dependency-name: contrib.go.opencensus.io/exporter/prometheus
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>

* build(deps): bump github.com/libp2p/go-libp2p-raft from 0.1.8 to 0.2.0

Bumps [github.com/libp2p/go-libp2p-raft](https://github.com/libp2p/go-libp2p-raft) from 0.1.8 to 0.2.0.
- [Release notes](https://github.com/libp2p/go-libp2p-raft/releases)
- [Commits](https://github.com/libp2p/go-libp2p-raft/compare/v0.1.8...v0.2.0)

---
updated-dependencies:
- dependency-name: github.com/libp2p/go-libp2p-raft
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>

* build(deps): bump github.com/urfave/cli from 1.22.9 to 1.22.10

Bumps [github.com/urfave/cli](https://github.com/urfave/cli) from 1.22.9 to 1.22.10.
- [Release notes](https://github.com/urfave/cli/releases)
- [Changelog](https://github.com/urfave/cli/blob/main/docs/CHANGELOG.md)
- [Commits](https://github.com/urfave/cli/compare/v1.22.9...v1.22.10)

---
updated-dependencies:
- dependency-name: github.com/urfave/cli
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>

* Fix checker/linter/staticcheck warnings

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2022-09-06 16:57:17 +02:00
Hector Sanjuan
755cebbe0d Enable spell checking and fix spelling errors (using US locale) 2022-06-16 17:43:30 +02:00
Hector Sanjuan
508791b547 Migrate from ipfs/ipfs-cluster to ipfs-cluster/ipfs-cluster
This performs the necessary renamings.
2022-06-16 17:43:30 +02:00
Hector Sanjuan
3169fba9d1 metrics: track total pins, queued, pinning, pin error.
This fixes #1470 and #1187.
2022-04-22 15:57:48 +02:00
Hector Sanjuan
a97ed10d0b Adopt api.Cid type - replaces cid.Cid everwhere.
This commit introduces an api.Cid type and replaces the usage of cid.Cid
everywhere.

The main motivation here is to override MarshalJSON so that Cids are
JSON-ified as '"Qm...."' instead of '{ "/": "Qm....." }', as this "ipld"
representation of IDs is horrible to work with, and our APIs are not issuing
IPLD objects to start with.

Unfortunately, there is no way to do this cleanly, and the best way is to just
switch everything to our own type.
2022-04-07 14:27:39 +02:00
Hector Sanjuan
0d73d33ef5 Pintracker: streaming methods
This commit continues the work of taking advantage of the streaming
capabilities in go-libp2p-gorpc by improving the ipfsconnector and pintracker
components.

StatusAll and RecoverAll methods are now streaming methods, with the REST API
output changing accordingly to produce a stream of GlobalPinInfos rather than
a json array.

pin/ls request to the ipfs daemon now use ?stream=true and avoid having to
load the full pinset map on memory. StatusAllLocal and RecoverAllLocal
requests to the pin tracker stream all the way and no longer store the full
pinset, and the full PinInfo status slice before sending it out.

We have additionally switched to a pattern where streaming methods receive the
channel as an argument, allowing the caller to decide on whether to launch a
goroutine, do buffering etc.
2022-03-22 15:38:01 +01:00
Hector Sanjuan
9b9d76f92d Pinset streaming and method type revamp
This commit introduces the new go-libp2p-gorpc streaming capabilities for
Cluster. The main aim is to work towards heavily reducing memory usage when
working with very large pinsets.

As a side-effect, it takes the chance to revampt all types for all public
methods so that pointers to static what should be static objects are not used
anymore. This should heavily reduce heap allocations and GC activity.

The main change is that state.List now returns a channel from which to read
the pins, rather than pins being all loaded into a huge slice.

Things reading pins have been all updated to iterate on the channel rather
than on the slice. The full pinset is no longer fully loaded onto memory for
things that run regularly like StateSync().

Additionally, the /allocations endpoint of the rest API no longer returns an
array of pins, but rather streams json-encoded pin objects directly. This
change has extended to the restapi client (which puts pins into a channel as
they arrive) and to ipfs-cluster-ctl.

There are still pending improvements like StatusAll() calls which should also
stream responses, and specially BlockPut calls which should stream blocks
directly into IPFS on a single call.

These are coming up in future commits.
2022-03-19 03:02:55 +01:00
Hector Sanjuan
4739ed9210 Changes pertaining to go-libp2p v0.16.0 2021-11-30 06:25:15 +01:00
Kishan Mohanbhai Sagathiya
ae8e74453b Fix #937: Print full working configuration at startup
Only when using debug mode

Co-authored-by: Hector Sanjuan <code@hector.link>
2020-05-15 01:33:04 +02:00
Hector Sanjuan
b513ec194d Fix some mispellings 2020-04-14 23:47:09 +02:00
Hector Sanjuan
f83ff9b655 staticcheck: fix all staticcheck warnings in the project 2020-04-14 20:16:10 +02:00
Hector Sanjuan
b3853caf36 Dependency ugprade: changes needed
* Libp2p protectors no longer needed, use PSK directly
* Generate cluster 32-byte secret here (helper gone from pnet)
* Switch to go-log/v2 in all places
* DHT bootstrapping not needed. Adjust DHT options for tests.
* Do not rely on dissappeared CidToDsKey and DsKeyToCid functions fro dshelp.
* Disable QUIC (does not support private networks)
* Fix tests: autodiscovery started working properly
2020-03-22 14:50:25 +01:00
Hector Sanjuan
b306bda877 Raft logging: update logger to new interface 2019-11-05 12:51:18 +01:00
Hector Sanjuan
d63a7fd641
Merge pull request #877 from ipfs/fix/ipfs-to-p2p
Use `p2p` protocol name over `ipfs` for multiaddr
2019-09-06 15:00:36 +02:00
Kishan Mohanbhai Sagathiya
6656b80a00 Some more occurences of /ipfs
and use  SwapToP2pMultiaddrs (very helpful since ipfs still send
addresses with `/ipfs` tag)
2019-08-16 11:56:09 +05:30
Hector Sanjuan
28ae394fa9 Fix #883: Tweak timeouts for better tests 2019-08-13 19:44:48 +02:00
Kishan Sagathiya
0a5598a922 Fix #211: Remove commented code around LeaderObservation (#858)
* Remove 32bit safegaurd and remove LeaderObersvation
2019-07-29 19:11:24 +02:00
Kishan Sagathiya
7f52242f35 Fix #840: Removed Raft peers should dissapear from peerstore (#846)
With this commit, cluster peer will observe on events of peer removal
from cluster. On occurence of the event, the cluster peer will clear
the removed peer from its peerstore.
2019-07-25 14:40:05 +02:00
Hector Sanjuan
b804e61ef0 Update deps along with go-libp2p-core refactor
Lots of rewrites in imports...
2019-06-14 13:10:45 +02:00
Hector Sanjuan
83c4866100 Remove Leftover println 2019-06-13 17:27:31 +02:00
Hector Sanjuan
27368ab077 Fix: alert at most once PER METRIC
Before it would alert at most once per peer, which prevented some metrics
from alerting at all.
2019-06-11 11:44:12 +02:00
Hector Sanjuan
b46f022884 Raft: rewrite logger
New Raft update has changed the type of the logger
2019-05-25 00:24:30 +02:00
Hector Sanjuan
21032f2101 Raft: remove TODO. Trust all peers. 2019-05-13 23:22:08 +02:00
Hector Sanjuan
dbc52ae981 rpc auth: golint 2019-05-09 22:36:03 +02:00
Hector Sanjuan
949e6f2364 RPC auth: Support Trusted Peers in CRDT consensus component.
TrustedPeers are specified in the configuration. Additional peers
can be added at runtime with Trust/Distrust functions.

Unfortunately we cannot use consensus.PeerAdd as a way to trust a peer as
cluster.PeerAdd+Join can be called by any peer and this calls
consensus.PeerAdd.

The result is consensus.PeerAdd doing a lot in Raft while consensus.Trust does
nothing, while in CRDTs consensus.Trust does something but consensus.PeerAdd
does nothing. But this is more or less consistent.
2019-05-09 19:48:40 +02:00
Hector Sanjuan
70f4cad613 RPC Auth: start using the RPC policy in the RPC server. 2019-05-09 15:14:26 +02:00
Hector Sanjuan
3d49ac26a5 Feat: Split components into RPC Services
I had thought of this for a very long time but there were no compelling
reasons to do it. Specifying RPC endpoint permissions becomes however
significantly nicer if each Component is a different RPC Service. This also
fixes some naming issues like having to prefix methods with the component name
to separate them from methods named in the same way in some other component
(Pin and IPFSPin).
2019-05-04 21:36:10 +01:00
Hector Sanjuan
acbd7fda60 Consensus: add new "crdt" consensus component
This adds a new "crdt" consensus component using go-ds-crdt.

This implies several refactors to fully make cluster consensus-component
independent:

* Delete mapstate and fully adopt dsstate (after people have migrated).
* Return errors from state methods rather than ignoring them.
* Add a new "datastore" modules so that we can configure datastores in the
   main configuration like other components.
* Let the consensus components fully define the "state.State". Thus, they do
not receive the state, they receive the storage where we put the state (a
go-datastore).
* Allow to customize how the monitor component obtains Peers() (the current
  peerset), including avoiding using the current peerset. At the moment the
  crdt consensus uses the monitoring component to define the current peerset.
  Therefore the monitor component cannot rely on the consensus component to
  produce a peerset.
* Re-factor/re-implementation of "ipfs-cluster-service state"
  operations. Includes the dissapearance of the "migrate" one.

The CRDT consensus component defines creates a crdt-datastore (with ipfs-lite)
and uses it to intitialize a dssate. Thus the crdt-store is elegantly
wrapped. Any modifications to the state get automatically replicated to other
peers. We store all the CRDT DAG blocks in the local datastore.

The consensus components only expose a ReadOnly state, as any modifications to
the shared state should happen through them.

DHT and PubSub facilities must now be created outside of Cluster and passed in
so they can be re-used by different components.
2019-04-17 19:14:26 +02:00
Hector Sanjuan
ea85cf7805 Rename "test.Test*" to "test.*" (test.TestCid1 -> test.Cid1)
License: MIT
Signed-off-by: Hector Sanjuan <hector@protocol.ai>
2019-02-27 20:19:10 +00:00
Hector Sanjuan
9df6344a07 Avoid using string testing CIDs and use cid.Cids directly
License: MIT
Signed-off-by: Hector Sanjuan <hector@protocol.ai>
2019-02-27 20:09:31 +00:00
Hector Sanjuan
cbf51a2b66 Fix struct tags
License: MIT
Signed-off-by: Hector Sanjuan <hector@protocol.ai>
2019-02-27 18:50:46 +00:00
Hector Sanjuan
c4b18cd5f6 Address issues from self-review
License: MIT
Signed-off-by: Hector Sanjuan <hector@protocol.ai>
2019-02-27 18:43:29 +00:00
Hector Sanjuan
6447ea51d2 Remove *Serial types. Use pointers for all types.
This takes advantange of the latest features in go-cid, peer.ID and
go-multiaddr and makes the Go types serializable by default.

This means we no longer need to copy between Pin <-> PinSerial, or ID <->
IDSerial etc. We can now efficiently binary-encode these types using short
field keys and without parsing/stringifying (in many cases it just a cast).

We still get the same json output as before (with minor modifications for
Cids).

This should greatly improve Cluster performance and memory usage when dealing
with large collections of items.

License: MIT
Signed-off-by: Hector Sanjuan <hector@protocol.ai>
2019-02-27 17:04:35 +00:00
Hector Sanjuan
0fed61192a Remove backwards compatibility hacks
The things removed here have been live for more than 2 releases.

License: MIT
Signed-off-by: Hector Sanjuan <hector@protocol.ai>
2019-02-20 14:02:09 +00:00
Hector Sanjuan
d57b81490f State: Use go-datastore to implement the state interface
Since the beginning, we have used a Go map to store the shared state (pinset)
in memory. The mapstate knew how to serialize itself so that libp2p-raft would
know how to write to disk when it:

* Saved snapshots of the state on shutdown
* Sent the state to a newcomer peer

hashicorp.Raft assumes an in-memory state which is snapshotted from time to
time and read from disk on boot.

This commit adds a `dsstate` implementation of the state interface using
`go-datastore`. This allows to effortlessly switch to a disk-backed state in
the future (as we will need), and also have at our disposal the different
implementations and utilities of Datastore for fine-tuning (caching, batching
etc.).

`mapstate` has been reworked to use dsstate. Ideally, we would not even need
`mapstate`, as it would suffice to initialize `dsstate` with a
`MapDatastore`. BUT, we still need it separate to be able to auto-migrate to
the new format.

This will be the last migration with the current system. Once this has been
released and users have been able to upgrade we will just remove `mapstate` as
it is now.

License: MIT
Signed-off-by: Hector Sanjuan <code@hector.link>
2019-02-19 18:31:14 +00:00
Hector Sanjuan
3059ab387a
Merge pull request #663 from roignpar/issue_656
Read config values from env on init command
2019-02-19 17:48:47 +00:00
Robert Ignat
bac982c5aa Add ApplyEnvVars test to raft config
License: MIT
Signed-off-by: Robert Ignat <robert.ignat91@gmail.com>
2019-02-18 17:43:54 +02:00
Robert Ignat
168cf76224 Change ApplyEnvVars strategy for all config components
Get jsonConfig from Config, apply env vars to it, load jsonConfig
back into Config.

License: MIT
Signed-off-by: Robert Ignat <robert.ignat91@gmail.com>
2019-02-15 19:07:20 +02:00
Hector Sanjuan
10d6a37304 Update deps: fix the things that need fixing
License: MIT
Signed-off-by: Hector Sanjuan <hector@protocol.ai>
2019-02-15 12:40:53 +00:00
Robert Ignat
032f02802f Implement ApplyEnvVars for all ComponentConfigs
License: MIT
Signed-off-by: Robert Ignat <robert.ignat91@gmail.com>
2019-02-08 23:57:16 +02:00
Robert Ignat
ed30ac1ab4 Add ApplyEnvVars() to ComponentConfig interface
* cluster and restapi configs can also get values from environment variables
* other config components don't read any values from the environment

License: MIT
Signed-off-by: Robert Ignat <robert.ignat91@gmail.com>
2019-02-07 20:51:20 +02:00
Adrian Lanzafame
3b3f786d68
add opencensus tracing and metrics
This commit adds support for OpenCensus tracing
and metrics collection. This required support for
context.Context propogation throughout the cluster
codebase, and in particular, the ipfscluster component
interfaces.

The tracing propogates across RPC and HTTP boundaries.
The current default tracing backend is Jaeger.

The metrics currently exports the metrics exposed by
the opencensus http plugin as well as the pprof metrics
to a prometheus endpoint for scraping.
The current default metrics backend is Prometheus.

Metrics are currently exposed by default due to low
overhead, can be turned off if desired, whereas tracing
is off by default as it has a much higher performance
overhead, though the extent of the performance hit can be
adjusted with smaller sampling rates.

License: MIT
Signed-off-by: Adrian Lanzafame <adrianlanzafame92@gmail.com>
2019-02-04 18:53:21 +10:00
Adrian Lanzafame
4f194f52d3
use DecodeCid in log_op
License: MIT
Signed-off-by: Adrian Lanzafame <adrianlanzafame92@gmail.com>
2018-10-30 21:07:27 +10:00
Adrian Lanzafame
91358e1ed1
only call ToPin when absolutely required
License: MIT
Signed-off-by: Adrian Lanzafame <adrianlanzafame92@gmail.com>
2018-10-30 21:07:26 +10:00
Hector Sanjuan
d63a5e2667 Fix race on ApplyTo
The FSM tries to decode an operation on top of the
*LogOp. We might still be using the *LogOp.Cid.Allocations
slice. We need to make a deep of *LogOp.Cid before
returning from ApplyTo.

This one was tricky...

License: MIT
Signed-off-by: Hector Sanjuan <code@hector.link>
2018-10-29 12:03:47 +01:00
Hector Sanjuan
9330ac82e2 Fix tests with latest libp2p
License: MIT
Signed-off-by: Hector Sanjuan <code@hector.link>
2018-10-25 14:55:01 +02:00
Hector Sanjuan
7d16108751 Start using libp2p/go-libp2p-gorpc
License: MIT
Signed-off-by: Hector Sanjuan <code@hector.link>
2018-10-17 15:28:03 +02:00