Commit Graph

99 Commits

Author SHA1 Message Date
Adrian Lanzafame
e187b800cf
rename TS to ReceivedAt
License: MIT
Signed-off-by: Adrian Lanzafame <adrianlanzafame92@gmail.com>
2019-04-18 16:14:13 +10:00
Adrian Lanzafame
c4b76619c1
Add failure_threshold monitors config
License: MIT
Signed-off-by: Adrian Lanzafame <adrianlanzafame92@gmail.com>
2019-04-18 16:14:13 +10:00
Adrian Lanzafame
3d6eb64db6
Add accrual failure detection method
License: MIT
Signed-off-by: Adrian Lanzafame <adrianlanzafame92@gmail.com>
2019-04-18 16:14:13 +10:00
Adrian Lanzafame
13ed78786c
fix distribution test and general clean up
License: MIT
Signed-off-by: Adrian Lanzafame <adrianlanzafame92@gmail.com>
2019-04-18 16:09:19 +10:00
Hector Sanjuan
4e61935379
Use defer for locks. Move to Prev() in All()
License: MIT
Signed-off-by: Hector Sanjuan <code@hector.link>
2019-04-18 16:09:19 +10:00
Hector Sanjuan
da3c543ce2
Revert "attempt copying slice"
This reverts commit 0d4d40513fccd31b9cdc4db369aa87e87c529be4.
2019-04-18 16:09:19 +10:00
Adrian Lanzafame
46d6cb155d
attempt copying slice
License: MIT
Signed-off-by: Adrian Lanzafame <adrianlanzafame92@gmail.com>
2019-04-18 16:09:19 +10:00
Adrian Lanzafame
2b1b8a4389
remove use of last
License: MIT
Signed-off-by: Adrian Lanzafame <adrianlanzafame92@gmail.com>
2019-04-18 16:09:18 +10:00
Adrian Lanzafame
ebcf40cf7d
rename TS to ReceivedAt
License: MIT
Signed-off-by: Adrian Lanzafame <adrianlanzafame92@gmail.com>
2019-04-18 16:09:18 +10:00
Hector Sanjuan
7711ab8cfd
Replace underlying slice with ring.Ring in metrics window
License: MIT
Signed-off-by: Adrian Lanzafame <adrianlanzafame92@gmail.com>
2019-04-18 16:09:18 +10:00
Hector Sanjuan
acbd7fda60 Consensus: add new "crdt" consensus component
This adds a new "crdt" consensus component using go-ds-crdt.

This implies several refactors to fully make cluster consensus-component
independent:

* Delete mapstate and fully adopt dsstate (after people have migrated).
* Return errors from state methods rather than ignoring them.
* Add a new "datastore" modules so that we can configure datastores in the
   main configuration like other components.
* Let the consensus components fully define the "state.State". Thus, they do
not receive the state, they receive the storage where we put the state (a
go-datastore).
* Allow to customize how the monitor component obtains Peers() (the current
  peerset), including avoiding using the current peerset. At the moment the
  crdt consensus uses the monitoring component to define the current peerset.
  Therefore the monitor component cannot rely on the consensus component to
  produce a peerset.
* Re-factor/re-implementation of "ipfs-cluster-service state"
  operations. Includes the dissapearance of the "migrate" one.

The CRDT consensus component defines creates a crdt-datastore (with ipfs-lite)
and uses it to intitialize a dssate. Thus the crdt-store is elegantly
wrapped. Any modifications to the state get automatically replicated to other
peers. We store all the CRDT DAG blocks in the local datastore.

The consensus components only expose a ReadOnly state, as any modifications to
the shared state should happen through them.

DHT and PubSub facilities must now be created outside of Cluster and passed in
so they can be re-used by different components.
2019-04-17 19:14:26 +02:00
Kishan Sagathiya
962d249e74
Remove basic monitor (#726)
Remove basic monitor

This commit removes `basic` monitor component, because it is not being
used by default since few releases ago pubsub monitor was introduced.

Issue #689
2019-03-21 22:48:40 +05:30
Hector Sanjuan
ea85cf7805 Rename "test.Test*" to "test.*" (test.TestCid1 -> test.Cid1)
License: MIT
Signed-off-by: Hector Sanjuan <hector@protocol.ai>
2019-02-27 20:19:10 +00:00
Hector Sanjuan
6447ea51d2 Remove *Serial types. Use pointers for all types.
This takes advantange of the latest features in go-cid, peer.ID and
go-multiaddr and makes the Go types serializable by default.

This means we no longer need to copy between Pin <-> PinSerial, or ID <->
IDSerial etc. We can now efficiently binary-encode these types using short
field keys and without parsing/stringifying (in many cases it just a cast).

We still get the same json output as before (with minor modifications for
Cids).

This should greatly improve Cluster performance and memory usage when dealing
with large collections of items.

License: MIT
Signed-off-by: Hector Sanjuan <hector@protocol.ai>
2019-02-27 17:04:35 +00:00
Robert Ignat
38886dae42 Add ApplyEnvVars test to pubsubmon config
License: MIT
Signed-off-by: Robert Ignat <robert.ignat91@gmail.com>
2019-02-18 17:48:15 +02:00
Robert Ignat
40d1077af2 Add ApplyEnvVars test to monbasic config
License: MIT
Signed-off-by: Robert Ignat <robert.ignat91@gmail.com>
2019-02-18 17:47:28 +02:00
Robert Ignat
168cf76224 Change ApplyEnvVars strategy for all config components
Get jsonConfig from Config, apply env vars to it, load jsonConfig
back into Config.

License: MIT
Signed-off-by: Robert Ignat <robert.ignat91@gmail.com>
2019-02-15 19:07:20 +02:00
Robert Ignat
032f02802f Implement ApplyEnvVars for all ComponentConfigs
License: MIT
Signed-off-by: Robert Ignat <robert.ignat91@gmail.com>
2019-02-08 23:57:16 +02:00
Robert Ignat
ed30ac1ab4 Add ApplyEnvVars() to ComponentConfig interface
* cluster and restapi configs can also get values from environment variables
* other config components don't read any values from the environment

License: MIT
Signed-off-by: Robert Ignat <robert.ignat91@gmail.com>
2019-02-07 20:51:20 +02:00
Adrian Lanzafame
3b3f786d68
add opencensus tracing and metrics
This commit adds support for OpenCensus tracing
and metrics collection. This required support for
context.Context propogation throughout the cluster
codebase, and in particular, the ipfscluster component
interfaces.

The tracing propogates across RPC and HTTP boundaries.
The current default tracing backend is Jaeger.

The metrics currently exports the metrics exposed by
the opencensus http plugin as well as the pprof metrics
to a prometheus endpoint for scraping.
The current default metrics backend is Prometheus.

Metrics are currently exposed by default due to low
overhead, can be turned off if desired, whereas tracing
is off by default as it has a much higher performance
overhead, though the extent of the performance hit can be
adjusted with smaller sampling rates.

License: MIT
Signed-off-by: Adrian Lanzafame <adrianlanzafame92@gmail.com>
2019-02-04 18:53:21 +10:00
Hector Sanjuan
8a045ae2d2 Update to latest go-libp2p-pubsub (with renames)
License: MIT
Signed-off-by: Hector Sanjuan <code@hector.link>
2018-10-29 17:25:52 +01:00
Hector Sanjuan
19b1124999 Make metrics human
Issue #572 exposes metrics but they carry the peer ID in binary.

This was ok with our internal codecs but it doesn't seem to work
very well with json, and makes the output format unusable.

This makes the Metric.Peer field a string.

Additinoally, fixes calling the command without arguments and displaying
the date in the right format.

License: MIT
Signed-off-by: Hector Sanjuan <code@hector.link>
2018-10-26 14:11:30 +02:00
Hector Sanjuan
b0b826de39
Merge pull request #580 from ipfs/libp2p-6.0.19
Upgrade to libp2p-6.0.19. Update deps.
2018-10-18 12:07:46 +02:00
Hector Sanjuan
7d16108751 Start using libp2p/go-libp2p-gorpc
License: MIT
Signed-off-by: Hector Sanjuan <code@hector.link>
2018-10-17 15:28:03 +02:00
Hector Sanjuan
fcbfc7f46a Pubsubmon: fix test by reducing gossipsub heartbeat
License: MIT
Signed-off-by: Hector Sanjuan <code@hector.link>
2018-10-11 11:27:42 +02:00
Hector Sanjuan
7f60cb318c Pubsubmon: Gossipsub
License: MIT
Signed-off-by: Hector Sanjuan <code@hector.link>
2018-10-10 16:00:37 +02:00
Hector Sanjuan
5ca8ca39eb Monitor/tests: Allow to run tests using the basic monitor.
Do it in additional stage in Travis.

Also, test fixes.

License: MIT
Signed-off-by: Hector Sanjuan <code@hector.link>
2018-05-09 11:39:21 +02:00
Hector Sanjuan
69c47fe811 Monitor: remove safe parameter for metrics.Window
License: MIT
Signed-off-by: Hector Sanjuan <code@hector.link>
2018-05-09 11:01:52 +02:00
Hector Sanjuan
e4844ca819 Monitor: address comments
License: MIT
Signed-off-by: Hector Sanjuan <code@hector.link>
2018-05-09 11:01:52 +02:00
Hector Sanjuan
954ede931f Monitor: more refactoring. Rename util to metrics
License: MIT
Signed-off-by: Hector Sanjuan <code@hector.link>
2018-05-09 11:01:41 +02:00
Hector Sanjuan
6f84b3bb01 Add new pubsubmon: A monitor that uses pubsub to send and receive metrics
License: MIT
Signed-off-by: Hector Sanjuan <code@hector.link>
2018-05-07 18:47:05 +02:00
Hector Sanjuan
73b962f799 Basic Monitor: test Publish()
License: MIT
Signed-off-by: Hector Sanjuan <code@hector.link>
2018-05-07 14:26:06 +02:00
Hector Sanjuan
a9d6fe3479 Types: rename metric.SetTTLDuration to metric.SetTTL
GetTTL returns duration. SetTTL should take duration too, not seconds.
This removes the original SetTTL method which used seconds.

License: MIT
Signed-off-by: Hector Sanjuan <code@hector.link>
2018-05-07 14:26:06 +02:00
Hector Sanjuan
8f8e76ac9a Monitor: extract MetricsChecker to util module
License: MIT
Signed-off-by: Hector Sanjuan <code@hector.link>
2018-05-07 14:26:06 +02:00
Hector Sanjuan
72e1d64de2 Fix publish cancelling contexts too early.
License: MIT
Signed-off-by: Hector Sanjuan <code@hector.link>
2018-05-07 14:26:06 +02:00
Hector Sanjuan
3c3341e491 Monitor: add PublishMetric() to component interface
The monitor component should be in charge of deciding how it is
best to send metrics to other peers and what that means.

This adds the PublishMetric() method to the component interface
and moves that functionality from Cluster main component to the
basic monitor.

There is a behaviour change. Before, the metrics where sent only to
the leader, while the leader was the only peer to broadcast them everywhere.
Now, all peers broadcast all metrics everywhere. This is mostly
because we should not rely on the consensus layer providing a Leader(), so
we are taking the chance to remove this dependency.

Note that in any-case, pubsub monitoring should replace the
existing basic monitor. This is just paving the ground.

Additionally, in order to not duplicate the multiRPC code
in the monitor, I have moved that functionality to go-libp2p-gorpc
and added an rpcutil library to cluster which includes useful
methods to perform multiRPC requests (some of them existed in
util.go, others are new and help handling multiple contexts etc).

License: MIT
Signed-off-by: Hector Sanjuan <code@hector.link>
2018-05-07 14:26:06 +02:00
Hector Sanjuan
1886782530 Feat pubsubmon: Extract MetricsWindow to utils module
License: MIT
Signed-off-by: Hector Sanjuan <code@hector.link>
2018-05-07 14:24:49 +02:00
Hector Sanjuan
0069c0062f Fix metric expire type. Do not discard metrics in Allocate().
License: MIT
Signed-off-by: Hector Sanjuan <code@hector.link>
2018-04-05 17:57:24 +02:00
Hector Sanjuan
dd4128affc Fix #339: Reduce Sleeps in tests
License: MIT
Signed-off-by: Hector Sanjuan <code@hector.link>
2018-04-05 16:49:26 +02:00
Wyatt
fc237b21d4 Feat: Enable Jenkins builds
This enables support for testing in jenkins.

Several minor adjustments have been performed to improve the probability
that the tests pass, but there are still some random
problems appearing with libp2p conections not becoming available or
stopping working (similar to travis, but perhaps more often).

MacOS and Windows builds are broken in worse ways (those issues will
need to be addressed in the future).

Thanks to @zenground0 and @victorbjelkholm for support!

License: MIT
Signed-off-by: Hector Sanjuan <code@hector.link>
2018-01-11 18:11:46 +01:00
Hector Sanjuan
b6ba6d5a1e Issue #219: Clean up peer manager. Rename Peers RPC call
License: MIT
Signed-off-by: Hector Sanjuan <hector@protocol.ai>
2017-11-14 12:26:42 +01:00
Hector Sanjuan
fb8fdb94c5 Issue #162: Improve Config.ToJSON() tests
License: MIT
Signed-off-by: Hector Sanjuan <hector@protocol.ai>
2017-10-20 10:42:41 +02:00
Hector Sanjuan
67ecd6d171 Issue #162: Add tests for basic.Config
License: MIT
Signed-off-by: Hector Sanjuan <hector@protocol.ai>
2017-10-19 20:53:18 +02:00
Hector Sanjuan
8f06baa1bf Issue #162: Rework configuration format
The following commit reimplements ipfs-cluster configuration under
the following premises:

  * Each component is initialized with a configuration object
  defined by its module
  * Each component decides how the JSON representation of its
  configuration looks like
  * Each component parses and validates its own configuration
  * Each component exposes its own defaults
  * Component configurations are make the sections of a
  central JSON configuration file (which replaces the current
  JSON format)
  * Component configurations implement a common interface
  (config.ComponentConfig) with a set of common operations
  * The central configuration file is managed by a
  config.ConfigManager which:
    * Registers ComponentConfigs
    * Assigns the correspondent sections from the JSON file to each
    component and delegates the parsing
    * Delegates the JSON generation for each section
    * Can be notified when the configuration is updated and must be
    saved to disk

The new service.json would then look as follows:

```json
{
  "cluster": {
    "id": "QmTVW8NoRxC5wBhV7WtAYtRn7itipEESfozWN5KmXUQnk2",
    "private_key": "<...>",
    "secret": "00224102ae6aaf94f2606abf69a0e278251ecc1d64815b617ff19d6d2841f786",
    "peers": [],
    "bootstrap": [],
    "leave_on_shutdown": false,
    "listen_multiaddress": "/ip4/0.0.0.0/tcp/9096",
    "state_sync_interval": "1m0s",
    "ipfs_sync_interval": "2m10s",
    "replication_factor": -1,
    "monitor_ping_interval": "15s"
  },
  "consensus": {
    "raft": {
      "heartbeat_timeout": "1s",
      "election_timeout": "1s",
      "commit_timeout": "50ms",
      "max_append_entries": 64,
      "trailing_logs": 10240,
      "snapshot_interval": "2m0s",
      "snapshot_threshold": 8192,
      "leader_lease_timeout": "500ms"
    }
  },
  "api": {
    "restapi": {
      "listen_multiaddress": "/ip4/127.0.0.1/tcp/9094",
      "read_timeout": "30s",
      "read_header_timeout": "5s",
      "write_timeout": "1m0s",
      "idle_timeout": "2m0s"
    }
  },
  "ipfs_connector": {
    "ipfshttp": {
      "proxy_listen_multiaddress": "/ip4/127.0.0.1/tcp/9095",
      "node_multiaddress": "/ip4/127.0.0.1/tcp/5001",
      "connect_swarms_delay": "7s",
      "proxy_read_timeout": "10m0s",
      "proxy_read_header_timeout": "5s",
      "proxy_write_timeout": "10m0s",
      "proxy_idle_timeout": "1m0s"
    }
  },
  "monitor": {
    "monbasic": {
      "check_interval": "15s"
    }
  },
  "informer": {
    "disk": {
      "metric_ttl": "30s",
      "metric_type": "freespace"
    },
    "numpin": {
      "metric_ttl": "10s"
    }
  }
}
```

This new format aims to be easily extensible per component. As such,
it already surfaces quite a few new options which were hardcoded
before.

Additionally, since Go API have changed, some redundant methods have been
removed and small refactoring has happened to take advantage of the new
way.

License: MIT
Signed-off-by: Hector Sanjuan <hector@protocol.ai>
2017-10-18 00:00:12 +02:00
Hector Sanjuan
8d3c72b766 Fix tests: Make metric broadcasting async
When a peer is down, metric broadcasting hangs, and no more
ticks are sent for a while.
2017-07-21 23:45:33 +02:00
Hector Sanjuan
faa755f43a Re-allocate pins on peer removal
PeerRm now triggers re-pinning of all the Cids allocated to
the removed peer.

License: MIT
Signed-off-by: Hector Sanjuan <hector@protocol.ai>
2017-07-05 16:38:36 +02:00
Hector Sanjuan
e2efef8469 go lint, go vet, put the Consensus component behind interface.
License: MIT
Signed-off-by: Hector Sanjuan <hector@protocol.ai>
2017-03-14 16:37:29 +01:00
Hector Sanjuan
a40e90a78d Fix: metrics with nanoseconds TTLs
It turns out they only worked in round seconds. Tests send a metric
every second, so sometimes they were expired right away.

License: MIT
Signed-off-by: Hector Sanjuan <hector@protocol.ai>
2017-03-14 14:18:23 +01:00
Hector Sanjuan
c2faf48177 Issue #18: Move Consensus and PeerMonitor to its own submodules
License: MIT
Signed-off-by: Hector Sanjuan <hector@protocol.ai>
2017-03-13 18:40:35 +01:00