Commit Graph

70 Commits

Author SHA1 Message Date
Hector Sanjuan
508791b547 Migrate from ipfs/ipfs-cluster to ipfs-cluster/ipfs-cluster
This performs the necessary renamings.
2022-06-16 17:43:30 +02:00
Hector Sanjuan
116cef5cfc allocator: Filter out metrics from non-trusted peers when configured
Adds pin_only_on_cluster_peers configuration option
2022-03-01 11:25:16 +01:00
Hector Sanjuan
2c204968b8
Merge pull request #1559 from ipfs/fix/repo-stat-hammering
Fix: repo/stat gets hammered on busy cluster peers
2022-02-01 14:36:27 +01:00
Hector Sanjuan
acde3f16d0 Fix: repo/stat gets hammered on busy cluster peers
Given that every pin and block/put writes something to IPFS and thus increases
the repo size, a while ago we added a check to let the IPFS connector directly
trigger the sending of metrics every 10 of such requests. This was meant to
update the metrics more often so that balancing happened more granularly
(particularly the freespace one).

In practice, on a cluster that receives several hundreds of pin/adds
operations in a few seconds, this is just bad.

So:

* We disable by default the whole thing.
* We add a new InformerTriggerInterval configuration option to enable the thing.
* Fix a bug that made this always call the first informer, which may not
  have been the freespace one).
2022-02-01 01:34:17 +01:00
Hector Sanjuan
5e89c0ba41 Pintracker: set Name in operation tracker. Fixes #1212. 2022-01-31 21:04:11 +01:00
Hector Sanjuan
809b7fbda5 Pintracker: add IPFS ID to Pin Information
Fixes #1554
Fixes: peer names unset for remote peers

This adds an IPFS field to pin status information (PinInfoShort).

It has not been easy to add this, given that the IPFS ID is something that
comes from outside of cluster (unlike the peer name). After several tries I
have settled in the following things:

- Use the ping metric to send out peer names and IPFS IDs to the peers in the
  cluster.
- Cache the latest known IPFS ID (if IPFS dies we should still be setting
  the ID).
- Provide an RPC method for the Pintracker to obtain IPFS ID from the cache.
- Given we now know information for peernames and IPFS IDs from other peers,
  we can use that information even if the requests to them error or we are not
  contacting (i.e. peers allocated as remote are not queried for status). We can
  use the information from the last received ping metric.
- This means we should keep metrics around even if peers go away, at least for
  a while rather than deleting them as soon as we detect that they have expired.

Puting it all together we now have a system to gossip peer information around on top
of the ping metrics.
2022-01-31 17:53:09 +01:00
Hector Sanjuan
96db605c50
Merge pull request #1468 from ipfs/fix/159-improved-allocators
Add tags informer and enable partition-based peer allocations for intelligent distribution
2021-10-06 14:35:16 +02:00
Hector Sanjuan
26e229df94 Rename allocator/metrics to allocator/balanced 2021-10-06 11:26:38 +02:00
Hector Sanjuan
63972f2b2e API: Refactor REST API. Extract all functionality.
This is a preparatory PR to add additional APIs (Pinning Service API) easily
to cluster.

Instead of copy-pasting most of what the REST API does, I have refactored so
that the whole configuration, routing and request-handling utilities can be
re-used.

The worst part has been to divide the test between tests that test core
(common.API) functionality and tests that test specific REST API endpoint
functionality. I could not get away without an additional common/test package
to provide functions that are used from both places. This is a side effect of
testing both http and libp2p endpoints for every request etc.
2021-09-16 15:52:25 +02:00
Hector Sanjuan
b6a46cd8a4 allocator: rework the whole allocator system
The new "metrics" allocator is about to partition metrics and distribe
allocations among the partitions.

For example: given a region, an availability zone and free space on disk, the
allocator would be able to choose allocations by distributing among regions
and availability zones as much as possible, and for those peers in the same
region/az, selecting those with most free space first.

This requires a major overhaul of the allocator component.
2021-09-13 12:24:00 +02:00
Hector Sanjuan
4ac2cf3eb0 Fix #1320: Add automatic GC to Badger datastore
This takes advantage of go-ds-badger "auto-gc" feature.

It will run a GC cycle made of multiple GC rounds (until it cannot GC more)
automatically. The behaviour is enabled by default in the configuration and
can be disabled by setting "gc_interval" to "0m". Hopefully this prevents
badger datastores from growing crazy.
2021-06-28 21:51:29 +02:00
Hector Sanjuan
099e23cbd1 Run tests also using leveldb backend 2021-06-11 18:43:54 +02:00
Hector Sanjuan
09d933fde1 pintracker: take care of tests
Simplify the tests, remove things that are not used at all, align the
behaviour of the mocks, add methods to test the correct behaviour of Status
etc.
2019-12-13 12:03:01 +01:00
Kishan Sagathiya
5258a4d428 Remove map pintracker (#944)
This removes mappintracker and sets stateless tracker as the default (and only) pintracker component.

Because the stateless tracker matches the cluster state with only ongoing operations being kept on memory, and additional information provided by ipfs-pin-ls, syncing operations are not necessary. Therefore the Sync/SyncAll operations are removed cluster-wide.
2019-12-12 21:22:54 +01:00
Hector Sanjuan
7b4d647267 Tests: multiple fixes, increase timings 2019-11-08 12:46:11 +01:00
Kishan Sagathiya
295915272b Tests: multiple fixes to tests reliability (#943)
This makes a number of fixes to improve the reliability of tests.
2019-10-31 21:51:13 +01:00
Hector Sanjuan
33f111c44d mDNS: attach mDNS inside the Cluster. Allow interval configuration.
Setting up mDNS outside the Cluster is dirtier and allows less configuration.

This adds MDNSInterval to the cluster config options and allow disabling it
when the option is set to 0.
2019-08-24 17:24:18 +02:00
Hector Sanjuan
28ae394fa9 Fix #883: Tweak timeouts for better tests 2019-08-13 19:44:48 +02:00
Hector Sanjuan
b4f6fe284d Remove all references to pin_method 2019-08-13 16:16:45 +02:00
Hector Sanjuan
b349aacc83 crdt: Allow to configure CRDT in "TrustAll" mode
Specifying "*" as part of "trusted_peers" in the configuration will
result in trusting all peers.

This is useful for private clusters where we don't want to list every
peer ID in the config.
2019-06-10 13:35:25 +02:00
Hector Sanjuan
ba5e423f58 Feat: introduce a ConnectionManager for the libp2p host
As follow up to #787, this uses the default libp2p connection manager for the
cluster libp2p host. The connection manager settings can be set in the main
configuration section (but it should be compatible with previous
configurations which have it unset).

This PR is just introducing the connection manager. Peer connection
protection etc will come in additional PRs.
2019-05-23 00:34:47 +02:00
Hector Sanjuan
8e6eefb714 Tests: multiple fixes
This fixes multiple issues in and around tests while
increasing ttls and delays in 100ms. Multiple issues, including
races, tests not running with consensus-crdt missing log messages
and better initialization have been fixed.

License: MIT
Signed-off-by: Hector Sanjuan <code@hector.link>
2019-05-20 23:45:04 +02:00
Hector Sanjuan
d51c2a0377 Merge branch 'master' into feat/monitor-ring 2019-05-16 15:46:30 +02:00
Hector Sanjuan
7a66fc3484
Merge pull request #775 from ipfs/feat/rpc-auth
Feat: RPC Authorization
2019-05-16 11:08:13 +01:00
Hector Sanjuan
949e6f2364 RPC auth: Support Trusted Peers in CRDT consensus component.
TrustedPeers are specified in the configuration. Additional peers
can be added at runtime with Trust/Distrust functions.

Unfortunately we cannot use consensus.PeerAdd as a way to trust a peer as
cluster.PeerAdd+Join can be called by any peer and this calls
consensus.PeerAdd.

The result is consensus.PeerAdd doing a lot in Raft while consensus.Trust does
nothing, while in CRDTs consensus.Trust does something but consensus.PeerAdd
does nothing. But this is more or less consistent.
2019-05-09 19:48:40 +02:00
Kishan Mohanbhai Sagathiya
f05af75abc Tests for identity separation
Added tests for identity.go and modifies others according ly

License: MIT
Signed-off-by: Kishan Mohanbhai Sagathiya <kishansagathiya@gmail.com>
2019-05-08 21:54:59 +05:30
Adrian Lanzafame
4f0e3c85ff
fix threshold test config value
add full tracing config

License: MIT
Signed-off-by: Adrian Lanzafame <adrianlanzafame92@gmail.com>
2019-05-07 19:06:14 +10:00
Adrian Lanzafame
eae4329cb3
address pr feedback
License: MIT
Signed-off-by: Adrian Lanzafame <adrianlanzafame92@gmail.com>
2019-04-18 16:18:19 +10:00
Adrian Lanzafame
31af640e33
use allocations list to choose peer to repin
License: MIT
Signed-off-by: Adrian Lanzafame <adrianlanzafame92@gmail.com>
2019-04-18 16:16:40 +10:00
Hector Sanjuan
acbd7fda60 Consensus: add new "crdt" consensus component
This adds a new "crdt" consensus component using go-ds-crdt.

This implies several refactors to fully make cluster consensus-component
independent:

* Delete mapstate and fully adopt dsstate (after people have migrated).
* Return errors from state methods rather than ignoring them.
* Add a new "datastore" modules so that we can configure datastores in the
   main configuration like other components.
* Let the consensus components fully define the "state.State". Thus, they do
not receive the state, they receive the storage where we put the state (a
go-datastore).
* Allow to customize how the monitor component obtains Peers() (the current
  peerset), including avoiding using the current peerset. At the moment the
  crdt consensus uses the monitoring component to define the current peerset.
  Therefore the monitor component cannot rely on the consensus component to
  produce a peerset.
* Re-factor/re-implementation of "ipfs-cluster-service state"
  operations. Includes the dissapearance of the "migrate" one.

The CRDT consensus component defines creates a crdt-datastore (with ipfs-lite)
and uses it to intitialize a dssate. Thus the crdt-store is elegantly
wrapped. Any modifications to the state get automatically replicated to other
peers. We store all the CRDT DAG blocks in the local datastore.

The consensus components only expose a ReadOnly state, as any modifications to
the shared state should happen through them.

DHT and PubSub facilities must now be created outside of Cluster and passed in
so they can be re-used by different components.
2019-04-17 19:14:26 +02:00
Kishan Sagathiya
962d249e74
Remove basic monitor (#726)
Remove basic monitor

This commit removes `basic` monitor component, because it is not being
used by default since few releases ago pubsub monitor was introduced.

Issue #689
2019-03-21 22:48:40 +05:30
Adrian Lanzafame
3b3f786d68
add opencensus tracing and metrics
This commit adds support for OpenCensus tracing
and metrics collection. This required support for
context.Context propogation throughout the cluster
codebase, and in particular, the ipfscluster component
interfaces.

The tracing propogates across RPC and HTTP boundaries.
The current default tracing backend is Jaeger.

The metrics currently exports the metrics exposed by
the opencensus http plugin as well as the pprof metrics
to a prometheus endpoint for scraping.
The current default metrics backend is Prometheus.

Metrics are currently exposed by default due to low
overhead, can be turned off if desired, whereas tracing
is off by default as it has a much higher performance
overhead, though the extent of the performance hit can be
adjusted with smaller sampling rates.

License: MIT
Signed-off-by: Adrian Lanzafame <adrianlanzafame92@gmail.com>
2019-02-04 18:53:21 +10:00
Hector Sanjuan
9b2f6f7522 Remove proxy_ prefix from testing configurations
License: MIT
Signed-off-by: Hector Sanjuan <code@hector.link>
2018-12-17 12:34:04 +01:00
Kishan Sagathiya
aef68f4101 Issue #453 Extract the IPFS Proxy from ipfshttp
Fixed breaking tests

License: MIT
Signed-off-by: Kishan Mohanbhai Sagathiya <kishansagathiya@gmail.com>
2018-11-11 16:23:36 +05:30
Hector Sanjuan
322e87dd59 Restapi: Add configurable response headers
By default, CORS headers allowing GET requests from everywhere are
set. This should facilitate the IPFS Web UI integration with the
Cluster API.

This commit refactors the sendResponse methods in the API, merging
them into one as it was difficult to follow the flows that actually
send something to the client. All tests now check the presence of
the configured headers too, to make sure no route was missed.

License: MIT
Signed-off-by: Hector Sanjuan <code@hector.link>
2018-10-17 11:42:25 +02:00
Adrian Lanzafame
df2753dfc6 implements a stateless pintracker
Also updates to the optracker to make retrieving information easier.

License: MIT
Signed-off-by: Adrian Lanzafame <adrianlanzafame92@gmail.com>
2018-08-14 13:33:36 +02:00
Hector Sanjuan
0151f5e312 Fixes for adding: set default timeouts to 0. Improve flags and param names.
License: MIT
Signed-off-by: Hector Sanjuan <code@hector.link>
2018-08-08 21:11:26 +02:00
Hector Sanjuan
65dc17a78b testfixing
License: MIT
Signed-off-by: Hector Sanjuan <code@hector.link>
2018-08-07 20:12:05 +02:00
Wyatt Daviau
9f74f6f47d Addressing third round of comments
License: MIT
Signed-off-by: Wyatt Daviau <wdaviau@cs.stanford.edu>
2018-08-07 20:11:23 +02:00
Wyatt Daviau
edb38b2830 fixing make check errors
License: MIT
Signed-off-by: Wyatt Daviau <wdaviau@cs.stanford.edu>
2018-08-07 20:11:23 +02:00
Hector Sanjuan
5ca8ca39eb Monitor/tests: Allow to run tests using the basic monitor.
Do it in additional stage in Travis.

Also, test fixes.

License: MIT
Signed-off-by: Hector Sanjuan <code@hector.link>
2018-05-09 11:39:21 +02:00
Hector Sanjuan
bb8c20b2fb Enable pubsubmon in cluster e2e tests
License: MIT
Signed-off-by: Hector Sanjuan <code@hector.link>
2018-05-07 18:47:05 +02:00
Hector Sanjuan
029cd77c27
Merge pull request #398 from ipfs/feat/promote-consensus
Emancipate the consensus component
2018-05-07 08:29:14 +02:00
Hector Sanjuan
33d9cdd3c4 Feat: emancipate Consensus from the Cluster component
This commit promotes the Consensus component (and Raft) to become a fully
independent thing like other components, passed to NewCluster during
initialization. Cluster (main component) no longer creates the consensus
layer internally. This has triggered a number of breaking changes
that I will explain below.

Motivation: Future work will require the possibility of running Cluster
with a consensus layer that is not Raft. The "consensus" layer is in charge
of maintaining two things:
  * The current cluster peerset, as required by the implementation
  * The current cluster pinset (shared state)

While the pinset maintenance has always been in the consensus layer, the
peerset maintenance was handled by the main component (starting by the "peers"
key in the configuration) AND the Raft component (internally)
and this generated lots of confusion: if the user edited the peers in the
configuration they would be greeted with an error.

The bootstrap process (adding a peer to an existing cluster) and configuration
key also complicated many things, since the main component did it, but only
when the consensus was initialized and in single peer mode.

In all this we also mixed the peerstore (list of peer addresses in the libp2p
host) with the peerset, when they need not to be linked.

By initializing the consensus layer before calling NewCluster, all the
difficulties in maintaining the current implementation in the same way
have come to light. Thus, the following changes have been introduced:

* Remove "peers" and "bootstrap" keys from the configuration: we no longer
edit or save the configuration files. This was a very bad practice, requiring
write permissions by the process to the file containing the private key and
additionally made things like Puppet deployments of cluster difficult as
configuration would mutate from its initial version. Needless to say all the
maintenance associated to making sure peers and bootstrap had correct values
when peers are bootstrapped or removed. A loud and detailed error message has
been added when staring cluster with an old config, along with instructions on
how to move forward.

* Introduce a PeerstoreFile ("peerstore") which stores peer addresses: in
ipfs, the peerstore is not persisted because it can be re-built from the
network bootstrappers and the DHT. Cluster should probably also allow
discoverability of peers addresses (when not bootstrapping, as in that case
we have it), but in the meantime, we will read and persist the peerstore
addresses for cluster peers in this file, different from the configuration.
Note that dns multiaddresses are now fully supported and no IPs are saved
when we have DNS multiaddresses for a peer.

* The former "peer_manager" code is now a pstoremgr module, providing utilities
to parse, add, list and generally maintain the libp2p host peerstore, including
operations on the PeerstoreFile. This "pstoremgr" can now also be extended to
perform address autodiscovery and other things indepedently from Cluster.

* Create and initialize Raft outside of the main Cluster component: since we
can now launch Raft independently from Cluster, we have more degrees of
freedom. A new "staging" option when creating the object allows a raft peer to
be launched in Staging mode, waiting to be added to a running consensus, and
thus, not electing itself as leader or doing anything like we were doing
before. This additionally allows us to track when the peer has become a
Voter, which only happens when it's caught up with the state, something that
was wonky previously.

* The raft configuration now includes an InitPeerset key, which allows to
provide a peerset for new peers and which is ignored when staging==true. The
whole Raft initialization code is way cleaner and stronger now.

* Cluster peer bootsrapping is now an ipfs-cluster-service feature. The
--bootstrap flag works as before (additionally allowing comma-separated-list
of entries). What bootstrap does, is to initialize Raft with staging == true,
and then call Join in the main cluster component. Only when the Raft peer
transitions to Voter, consensus becomes ready, and cluster becomes Ready.
This is cleaner, works better and is less complex than before (supporting
both flags and config values). We also backup and clean the state whenever
we are boostrapping, automatically

* ipfs-cluster-service no longer runs the daemon. Starting cluster needs
now "ipfs-cluster-service daemon". The daemon specific flags (bootstrap,
alloc) are now flags for the daemon subcommand. Here we mimic ipfs ("ipfs"
does not start the daemon but print help) and pave the path for merging both
service and ctl in the future.

While this brings some breaking changes, it significantly reduces the
complexity of the configuration, the code and most importantly, the
documentation. It should be easier now to explain the user what is the
right way to launch a cluster peer, and more difficult to make mistakes.

As a side effect, the PR also:

* Fixes #381 - peers with dynamic addresses
* Fixes #371 - peers should be Raft configuration option
* Fixes #378 - waitForUpdates may return before state fully synced
* Fixes #235 - config option shadowing (no cfg saves, no need to shadow)

License: MIT
Signed-off-by: Hector Sanjuan <code@hector.link>
2018-05-07 07:39:41 +02:00
Hector Sanjuan
9856bcdb94 Pintracker: remove timeouts
Pinning/unpinning timeouts are controlled by the ipfs connector component.

License: MIT
Signed-off-by: Hector Sanjuan <code@hector.link>
2018-05-02 15:24:26 +02:00
Sina Mahmoodi
0954c6d6fa Add disable_repinning cluster option
License: MIT
Signed-off-by: Sina Mahmoodi <itz.s1na@gmail.com>
2018-04-22 18:40:46 +02:00
Hector Sanjuan
0069c0062f Fix metric expire type. Do not discard metrics in Allocate().
License: MIT
Signed-off-by: Hector Sanjuan <code@hector.link>
2018-04-05 17:57:24 +02:00
Hector Sanjuan
dd4128affc Fix #339: Reduce Sleeps in tests
License: MIT
Signed-off-by: Hector Sanjuan <code@hector.link>
2018-04-05 16:49:26 +02:00
Hector Sanjuan
3b715041ac RestAPI: support libp2p host parameters in configuration
This adds support for parameters to create a libp2p host
in the REST API configuration: ID, PrivateKey and ListenMultiaddr.

These parameters default to nil/empty and are ommited in the default
configuration. They are only supposed to be used when the user wants
the REST API to use a different libp2p host than a provided one (upcoming
changes).

Pnet protector not supported yet in this case. Underlying basic auth
should cover that front. Will implement if someone has a usecase.

License: MIT
Signed-off-by: Hector Sanjuan <code@hector.link>
2018-03-15 00:04:54 +01:00
Hector Sanjuan
f469966d02 Fix tests with bad json
License: MIT
Signed-off-by: Hector Sanjuan <code@hector.link>
2018-03-09 17:30:06 +01:00