Commit Graph

57 Commits

Author SHA1 Message Date
Kishan Sagathiya
5258a4d428 Remove map pintracker (#944)
This removes mappintracker and sets stateless tracker as the default (and only) pintracker component.

Because the stateless tracker matches the cluster state with only ongoing operations being kept on memory, and additional information provided by ipfs-pin-ls, syncing operations are not necessary. Therefore the Sync/SyncAll operations are removed cluster-wide.
2019-12-12 21:22:54 +01:00
Hector Sanjuan
7b4d647267 Tests: multiple fixes, increase timings 2019-11-08 12:46:11 +01:00
Kishan Sagathiya
295915272b Tests: multiple fixes to tests reliability (#943)
This makes a number of fixes to improve the reliability of tests.
2019-10-31 21:51:13 +01:00
Hector Sanjuan
33f111c44d mDNS: attach mDNS inside the Cluster. Allow interval configuration.
Setting up mDNS outside the Cluster is dirtier and allows less configuration.

This adds MDNSInterval to the cluster config options and allow disabling it
when the option is set to 0.
2019-08-24 17:24:18 +02:00
Hector Sanjuan
28ae394fa9 Fix #883: Tweak timeouts for better tests 2019-08-13 19:44:48 +02:00
Hector Sanjuan
b4f6fe284d Remove all references to pin_method 2019-08-13 16:16:45 +02:00
Hector Sanjuan
b349aacc83 crdt: Allow to configure CRDT in "TrustAll" mode
Specifying "*" as part of "trusted_peers" in the configuration will
result in trusting all peers.

This is useful for private clusters where we don't want to list every
peer ID in the config.
2019-06-10 13:35:25 +02:00
Hector Sanjuan
ba5e423f58 Feat: introduce a ConnectionManager for the libp2p host
As follow up to #787, this uses the default libp2p connection manager for the
cluster libp2p host. The connection manager settings can be set in the main
configuration section (but it should be compatible with previous
configurations which have it unset).

This PR is just introducing the connection manager. Peer connection
protection etc will come in additional PRs.
2019-05-23 00:34:47 +02:00
Hector Sanjuan
8e6eefb714 Tests: multiple fixes
This fixes multiple issues in and around tests while
increasing ttls and delays in 100ms. Multiple issues, including
races, tests not running with consensus-crdt missing log messages
and better initialization have been fixed.

License: MIT
Signed-off-by: Hector Sanjuan <code@hector.link>
2019-05-20 23:45:04 +02:00
Hector Sanjuan
d51c2a0377 Merge branch 'master' into feat/monitor-ring 2019-05-16 15:46:30 +02:00
Hector Sanjuan
7a66fc3484
Merge pull request #775 from ipfs/feat/rpc-auth
Feat: RPC Authorization
2019-05-16 11:08:13 +01:00
Hector Sanjuan
949e6f2364 RPC auth: Support Trusted Peers in CRDT consensus component.
TrustedPeers are specified in the configuration. Additional peers
can be added at runtime with Trust/Distrust functions.

Unfortunately we cannot use consensus.PeerAdd as a way to trust a peer as
cluster.PeerAdd+Join can be called by any peer and this calls
consensus.PeerAdd.

The result is consensus.PeerAdd doing a lot in Raft while consensus.Trust does
nothing, while in CRDTs consensus.Trust does something but consensus.PeerAdd
does nothing. But this is more or less consistent.
2019-05-09 19:48:40 +02:00
Kishan Mohanbhai Sagathiya
f05af75abc Tests for identity separation
Added tests for identity.go and modifies others according ly

License: MIT
Signed-off-by: Kishan Mohanbhai Sagathiya <kishansagathiya@gmail.com>
2019-05-08 21:54:59 +05:30
Adrian Lanzafame
4f0e3c85ff
fix threshold test config value
add full tracing config

License: MIT
Signed-off-by: Adrian Lanzafame <adrianlanzafame92@gmail.com>
2019-05-07 19:06:14 +10:00
Adrian Lanzafame
eae4329cb3
address pr feedback
License: MIT
Signed-off-by: Adrian Lanzafame <adrianlanzafame92@gmail.com>
2019-04-18 16:18:19 +10:00
Adrian Lanzafame
31af640e33
use allocations list to choose peer to repin
License: MIT
Signed-off-by: Adrian Lanzafame <adrianlanzafame92@gmail.com>
2019-04-18 16:16:40 +10:00
Hector Sanjuan
acbd7fda60 Consensus: add new "crdt" consensus component
This adds a new "crdt" consensus component using go-ds-crdt.

This implies several refactors to fully make cluster consensus-component
independent:

* Delete mapstate and fully adopt dsstate (after people have migrated).
* Return errors from state methods rather than ignoring them.
* Add a new "datastore" modules so that we can configure datastores in the
   main configuration like other components.
* Let the consensus components fully define the "state.State". Thus, they do
not receive the state, they receive the storage where we put the state (a
go-datastore).
* Allow to customize how the monitor component obtains Peers() (the current
  peerset), including avoiding using the current peerset. At the moment the
  crdt consensus uses the monitoring component to define the current peerset.
  Therefore the monitor component cannot rely on the consensus component to
  produce a peerset.
* Re-factor/re-implementation of "ipfs-cluster-service state"
  operations. Includes the dissapearance of the "migrate" one.

The CRDT consensus component defines creates a crdt-datastore (with ipfs-lite)
and uses it to intitialize a dssate. Thus the crdt-store is elegantly
wrapped. Any modifications to the state get automatically replicated to other
peers. We store all the CRDT DAG blocks in the local datastore.

The consensus components only expose a ReadOnly state, as any modifications to
the shared state should happen through them.

DHT and PubSub facilities must now be created outside of Cluster and passed in
so they can be re-used by different components.
2019-04-17 19:14:26 +02:00
Kishan Sagathiya
962d249e74
Remove basic monitor (#726)
Remove basic monitor

This commit removes `basic` monitor component, because it is not being
used by default since few releases ago pubsub monitor was introduced.

Issue #689
2019-03-21 22:48:40 +05:30
Adrian Lanzafame
3b3f786d68
add opencensus tracing and metrics
This commit adds support for OpenCensus tracing
and metrics collection. This required support for
context.Context propogation throughout the cluster
codebase, and in particular, the ipfscluster component
interfaces.

The tracing propogates across RPC and HTTP boundaries.
The current default tracing backend is Jaeger.

The metrics currently exports the metrics exposed by
the opencensus http plugin as well as the pprof metrics
to a prometheus endpoint for scraping.
The current default metrics backend is Prometheus.

Metrics are currently exposed by default due to low
overhead, can be turned off if desired, whereas tracing
is off by default as it has a much higher performance
overhead, though the extent of the performance hit can be
adjusted with smaller sampling rates.

License: MIT
Signed-off-by: Adrian Lanzafame <adrianlanzafame92@gmail.com>
2019-02-04 18:53:21 +10:00
Hector Sanjuan
9b2f6f7522 Remove proxy_ prefix from testing configurations
License: MIT
Signed-off-by: Hector Sanjuan <code@hector.link>
2018-12-17 12:34:04 +01:00
Kishan Sagathiya
aef68f4101 Issue #453 Extract the IPFS Proxy from ipfshttp
Fixed breaking tests

License: MIT
Signed-off-by: Kishan Mohanbhai Sagathiya <kishansagathiya@gmail.com>
2018-11-11 16:23:36 +05:30
Hector Sanjuan
322e87dd59 Restapi: Add configurable response headers
By default, CORS headers allowing GET requests from everywhere are
set. This should facilitate the IPFS Web UI integration with the
Cluster API.

This commit refactors the sendResponse methods in the API, merging
them into one as it was difficult to follow the flows that actually
send something to the client. All tests now check the presence of
the configured headers too, to make sure no route was missed.

License: MIT
Signed-off-by: Hector Sanjuan <code@hector.link>
2018-10-17 11:42:25 +02:00
Adrian Lanzafame
df2753dfc6 implements a stateless pintracker
Also updates to the optracker to make retrieving information easier.

License: MIT
Signed-off-by: Adrian Lanzafame <adrianlanzafame92@gmail.com>
2018-08-14 13:33:36 +02:00
Hector Sanjuan
0151f5e312 Fixes for adding: set default timeouts to 0. Improve flags and param names.
License: MIT
Signed-off-by: Hector Sanjuan <code@hector.link>
2018-08-08 21:11:26 +02:00
Hector Sanjuan
65dc17a78b testfixing
License: MIT
Signed-off-by: Hector Sanjuan <code@hector.link>
2018-08-07 20:12:05 +02:00
Wyatt Daviau
9f74f6f47d Addressing third round of comments
License: MIT
Signed-off-by: Wyatt Daviau <wdaviau@cs.stanford.edu>
2018-08-07 20:11:23 +02:00
Wyatt Daviau
edb38b2830 fixing make check errors
License: MIT
Signed-off-by: Wyatt Daviau <wdaviau@cs.stanford.edu>
2018-08-07 20:11:23 +02:00
Hector Sanjuan
5ca8ca39eb Monitor/tests: Allow to run tests using the basic monitor.
Do it in additional stage in Travis.

Also, test fixes.

License: MIT
Signed-off-by: Hector Sanjuan <code@hector.link>
2018-05-09 11:39:21 +02:00
Hector Sanjuan
bb8c20b2fb Enable pubsubmon in cluster e2e tests
License: MIT
Signed-off-by: Hector Sanjuan <code@hector.link>
2018-05-07 18:47:05 +02:00
Hector Sanjuan
029cd77c27
Merge pull request #398 from ipfs/feat/promote-consensus
Emancipate the consensus component
2018-05-07 08:29:14 +02:00
Hector Sanjuan
33d9cdd3c4 Feat: emancipate Consensus from the Cluster component
This commit promotes the Consensus component (and Raft) to become a fully
independent thing like other components, passed to NewCluster during
initialization. Cluster (main component) no longer creates the consensus
layer internally. This has triggered a number of breaking changes
that I will explain below.

Motivation: Future work will require the possibility of running Cluster
with a consensus layer that is not Raft. The "consensus" layer is in charge
of maintaining two things:
  * The current cluster peerset, as required by the implementation
  * The current cluster pinset (shared state)

While the pinset maintenance has always been in the consensus layer, the
peerset maintenance was handled by the main component (starting by the "peers"
key in the configuration) AND the Raft component (internally)
and this generated lots of confusion: if the user edited the peers in the
configuration they would be greeted with an error.

The bootstrap process (adding a peer to an existing cluster) and configuration
key also complicated many things, since the main component did it, but only
when the consensus was initialized and in single peer mode.

In all this we also mixed the peerstore (list of peer addresses in the libp2p
host) with the peerset, when they need not to be linked.

By initializing the consensus layer before calling NewCluster, all the
difficulties in maintaining the current implementation in the same way
have come to light. Thus, the following changes have been introduced:

* Remove "peers" and "bootstrap" keys from the configuration: we no longer
edit or save the configuration files. This was a very bad practice, requiring
write permissions by the process to the file containing the private key and
additionally made things like Puppet deployments of cluster difficult as
configuration would mutate from its initial version. Needless to say all the
maintenance associated to making sure peers and bootstrap had correct values
when peers are bootstrapped or removed. A loud and detailed error message has
been added when staring cluster with an old config, along with instructions on
how to move forward.

* Introduce a PeerstoreFile ("peerstore") which stores peer addresses: in
ipfs, the peerstore is not persisted because it can be re-built from the
network bootstrappers and the DHT. Cluster should probably also allow
discoverability of peers addresses (when not bootstrapping, as in that case
we have it), but in the meantime, we will read and persist the peerstore
addresses for cluster peers in this file, different from the configuration.
Note that dns multiaddresses are now fully supported and no IPs are saved
when we have DNS multiaddresses for a peer.

* The former "peer_manager" code is now a pstoremgr module, providing utilities
to parse, add, list and generally maintain the libp2p host peerstore, including
operations on the PeerstoreFile. This "pstoremgr" can now also be extended to
perform address autodiscovery and other things indepedently from Cluster.

* Create and initialize Raft outside of the main Cluster component: since we
can now launch Raft independently from Cluster, we have more degrees of
freedom. A new "staging" option when creating the object allows a raft peer to
be launched in Staging mode, waiting to be added to a running consensus, and
thus, not electing itself as leader or doing anything like we were doing
before. This additionally allows us to track when the peer has become a
Voter, which only happens when it's caught up with the state, something that
was wonky previously.

* The raft configuration now includes an InitPeerset key, which allows to
provide a peerset for new peers and which is ignored when staging==true. The
whole Raft initialization code is way cleaner and stronger now.

* Cluster peer bootsrapping is now an ipfs-cluster-service feature. The
--bootstrap flag works as before (additionally allowing comma-separated-list
of entries). What bootstrap does, is to initialize Raft with staging == true,
and then call Join in the main cluster component. Only when the Raft peer
transitions to Voter, consensus becomes ready, and cluster becomes Ready.
This is cleaner, works better and is less complex than before (supporting
both flags and config values). We also backup and clean the state whenever
we are boostrapping, automatically

* ipfs-cluster-service no longer runs the daemon. Starting cluster needs
now "ipfs-cluster-service daemon". The daemon specific flags (bootstrap,
alloc) are now flags for the daemon subcommand. Here we mimic ipfs ("ipfs"
does not start the daemon but print help) and pave the path for merging both
service and ctl in the future.

While this brings some breaking changes, it significantly reduces the
complexity of the configuration, the code and most importantly, the
documentation. It should be easier now to explain the user what is the
right way to launch a cluster peer, and more difficult to make mistakes.

As a side effect, the PR also:

* Fixes #381 - peers with dynamic addresses
* Fixes #371 - peers should be Raft configuration option
* Fixes #378 - waitForUpdates may return before state fully synced
* Fixes #235 - config option shadowing (no cfg saves, no need to shadow)

License: MIT
Signed-off-by: Hector Sanjuan <code@hector.link>
2018-05-07 07:39:41 +02:00
Hector Sanjuan
9856bcdb94 Pintracker: remove timeouts
Pinning/unpinning timeouts are controlled by the ipfs connector component.

License: MIT
Signed-off-by: Hector Sanjuan <code@hector.link>
2018-05-02 15:24:26 +02:00
Sina Mahmoodi
0954c6d6fa Add disable_repinning cluster option
License: MIT
Signed-off-by: Sina Mahmoodi <itz.s1na@gmail.com>
2018-04-22 18:40:46 +02:00
Hector Sanjuan
0069c0062f Fix metric expire type. Do not discard metrics in Allocate().
License: MIT
Signed-off-by: Hector Sanjuan <code@hector.link>
2018-04-05 17:57:24 +02:00
Hector Sanjuan
dd4128affc Fix #339: Reduce Sleeps in tests
License: MIT
Signed-off-by: Hector Sanjuan <code@hector.link>
2018-04-05 16:49:26 +02:00
Hector Sanjuan
3b715041ac RestAPI: support libp2p host parameters in configuration
This adds support for parameters to create a libp2p host
in the REST API configuration: ID, PrivateKey and ListenMultiaddr.

These parameters default to nil/empty and are ommited in the default
configuration. They are only supposed to be used when the user wants
the REST API to use a different libp2p host than a provided one (upcoming
changes).

Pnet protector not supported yet in this case. Underlying basic auth
should cover that front. Will implement if someone has a usecase.

License: MIT
Signed-off-by: Hector Sanjuan <code@hector.link>
2018-03-15 00:04:54 +01:00
Hector Sanjuan
f469966d02 Fix tests with bad json
License: MIT
Signed-off-by: Hector Sanjuan <code@hector.link>
2018-03-09 17:30:06 +01:00
Hector Sanjuan
fb4812ec79 Feat #326: Adds "refs -r" pinning method support + multiple pin workers
This fixes #326. It adds a new `pin_method` configuration option to the
`ipfshttp` component allows to configure it to perform `refs -r <cid>` before
the `pin/add` call. By fetching content before pinning, we don't have
a global lock in place, and we can have several pin-requests to
ipfs in parallel.

It also adds a `concurrent_pins` option to the pin tracker, which
launches more pin workers so it can potentially trigger more pins at
the same time. This is a minimal intervention in the pintracker as #308
is still pending.

Documentation for the configuration file has been updated.

License: MIT
Signed-off-by: Hector Sanjuan <code@hector.link>
2018-03-09 15:01:29 +01:00
Wyatt
fc237b21d4 Feat: Enable Jenkins builds
This enables support for testing in jenkins.

Several minor adjustments have been performed to improve the probability
that the tests pass, but there are still some random
problems appearing with libp2p conections not becoming available or
stopping working (similar to travis, but perhaps more often).

MacOS and Windows builds are broken in worse ways (those issues will
need to be addressed in the future).

Thanks to @zenground0 and @victorbjelkholm for support!

License: MIT
Signed-off-by: Hector Sanjuan <code@hector.link>
2018-01-11 18:11:46 +01:00
Hector Sanjuan
11a8926236 MapPinTracker: support configuration section
This also generates a default configuration section when it
doesn't exist, so it's backwards compatible.

License: MIT
Signed-off-by: Hector Sanjuan <code@hector.link>
2017-11-29 14:42:50 +01:00
Hector Sanjuan
848023e381 Fix #139: Update cluster to Raft 1.0.0
The main differences is that the new version of Raft is more strict
about starting raft peers which already contain configurations.

For a start, cluster will fail to start if the configured cluster
peers are different from the Raft peers. The user will have to
manually cleanup Raft (TODO: an ipfs-cluster-service command for it).

Additionally, this commit adds extra options to the consensus/raft
configuration section, adds tests and improves existing ones and
improves certain code sections.

License: MIT
Signed-off-by: Hector Sanjuan <hector@protocol.ai>
2017-11-01 12:17:33 +01:00
Hector Sanjuan
8f06baa1bf Issue #162: Rework configuration format
The following commit reimplements ipfs-cluster configuration under
the following premises:

  * Each component is initialized with a configuration object
  defined by its module
  * Each component decides how the JSON representation of its
  configuration looks like
  * Each component parses and validates its own configuration
  * Each component exposes its own defaults
  * Component configurations are make the sections of a
  central JSON configuration file (which replaces the current
  JSON format)
  * Component configurations implement a common interface
  (config.ComponentConfig) with a set of common operations
  * The central configuration file is managed by a
  config.ConfigManager which:
    * Registers ComponentConfigs
    * Assigns the correspondent sections from the JSON file to each
    component and delegates the parsing
    * Delegates the JSON generation for each section
    * Can be notified when the configuration is updated and must be
    saved to disk

The new service.json would then look as follows:

```json
{
  "cluster": {
    "id": "QmTVW8NoRxC5wBhV7WtAYtRn7itipEESfozWN5KmXUQnk2",
    "private_key": "<...>",
    "secret": "00224102ae6aaf94f2606abf69a0e278251ecc1d64815b617ff19d6d2841f786",
    "peers": [],
    "bootstrap": [],
    "leave_on_shutdown": false,
    "listen_multiaddress": "/ip4/0.0.0.0/tcp/9096",
    "state_sync_interval": "1m0s",
    "ipfs_sync_interval": "2m10s",
    "replication_factor": -1,
    "monitor_ping_interval": "15s"
  },
  "consensus": {
    "raft": {
      "heartbeat_timeout": "1s",
      "election_timeout": "1s",
      "commit_timeout": "50ms",
      "max_append_entries": 64,
      "trailing_logs": 10240,
      "snapshot_interval": "2m0s",
      "snapshot_threshold": 8192,
      "leader_lease_timeout": "500ms"
    }
  },
  "api": {
    "restapi": {
      "listen_multiaddress": "/ip4/127.0.0.1/tcp/9094",
      "read_timeout": "30s",
      "read_header_timeout": "5s",
      "write_timeout": "1m0s",
      "idle_timeout": "2m0s"
    }
  },
  "ipfs_connector": {
    "ipfshttp": {
      "proxy_listen_multiaddress": "/ip4/127.0.0.1/tcp/9095",
      "node_multiaddress": "/ip4/127.0.0.1/tcp/5001",
      "connect_swarms_delay": "7s",
      "proxy_read_timeout": "10m0s",
      "proxy_read_header_timeout": "5s",
      "proxy_write_timeout": "10m0s",
      "proxy_idle_timeout": "1m0s"
    }
  },
  "monitor": {
    "monbasic": {
      "check_interval": "15s"
    }
  },
  "informer": {
    "disk": {
      "metric_ttl": "30s",
      "metric_type": "freespace"
    },
    "numpin": {
      "metric_ttl": "10s"
    }
  }
}
```

This new format aims to be easily extensible per component. As such,
it already surfaces quite a few new options which were hardcoded
before.

Additionally, since Go API have changed, some redundant methods have been
removed and small refactoring has happened to take advantage of the new
way.

License: MIT
Signed-off-by: Hector Sanjuan <hector@protocol.ai>
2017-10-18 00:00:12 +02:00
dgrisham
90d1e97a8e Cluster secret: Docs, error handling, internal key mgmt. 2017-07-13 11:17:30 -06:00
dgrisham
98335901fc Refactored private network implementation + config. 2017-07-08 11:11:49 -06:00
dgrisham
59fde30e1e swarm secret implementation started 2017-07-03 14:00:01 -06:00
dgrisham
1d90130f65 initial tests passing 2017-06-29 18:59:36 -06:00
Hector Sanjuan
2d8cd236b5 Fix #75: Overriding options should not make them permanent in config
License: MIT
Signed-off-by: Hector Sanjuan <hector@protocol.ai>
2017-03-28 19:10:50 +02:00
Hector Sanjuan
2bbbea79cc Issue #49: Add disk informer
The disk informer uses "ipfs repo stat" to fetch the RepoSize value and
uses it as a metric.

The numpinalloc allocator is now a generalized ascendalloc which
sorts metrics in ascending order and return the ones with lowest
values.

License: MIT
Signed-off-by: Hector Sanjuan <hector@protocol.ai>
2017-03-27 20:40:49 +02:00
Hector Sanjuan
6ee0f3bead Issue #45: Detect expired metrics and trigger re-pins
An initial, simple approach to this. The PeerMonitor will
check it's metrics, compare to the current set of peers and put
an alert in the alerts channel if the metrics for a peer have expired.

Cluster reads this channel looking for "ping" alerts. The leader
is in charge of triggering repins in all the Cids allocated to
a given peer.

Also, metrics are now broadcasted to the cluster instead of pushed only
to the leader. Since they happen every few seconds it should be okay
regarding how it scales. Main problem was that if the leader is the node
going down, the new leader will not now about it as it doesn't have any
metrics for it, so it won't trigger an alert. If it acted on that then
the component needs to know it is the leader, or cluster needs to
handle alerts in complicated ways when leadership changes. Detecting
leadership changes or letting a component know who is the leader is another
dependency from the consensus algorithm that should be avoided. Therefore
we broadcast, for the moment.

License: MIT
Signed-off-by: Hector Sanjuan <hector@protocol.ai>
2017-03-02 14:59:45 +01:00
Hector Sanjuan
34fdc329fc Fix #24: Auto-join and auto-leave operations for Cluster
This is the third implementation attempt. This time, rather than
broadcasting PeerAdd/Join requests to the whole cluster, we use the
consensus log to broadcast new peers joining.

This makes it easier to recover from errors and to know who exactly
is member of a cluster and who is not. The consensus is, after all,
meant to agree on things, and the list of cluster peers is something
everyone has to agree on.

Raft itself uses a special log operation to maintain the peer set.

The tests are almost unchanged from the previous attempts so it should
be the same, except it doesn't seem possible to bootstrap a bunch of nodes
at the same time using different bootstrap nodes. It works when using
the same. I'm not sure this worked before either, but the code is
simpler than recursively contacting peers, and scales better for
larger clusters.

Nodes have to be careful about joining clusters while keeping the state
from a different cluster (disjoint logs). This may cause problems with
Raft.

License: MIT
Signed-off-by: Hector Sanjuan <hector@protocol.ai>
2017-02-07 18:46:09 +01:00