Commit Graph

45 Commits

Author SHA1 Message Date
Hector Sanjuan
08aea3b3aa Fix: ttl for tags should be default when not provided (now 0). 2022-01-07 14:52:35 +01:00
Hector Sanjuan
db00d651bf Balanced allocator: weight-based ordering of partitions
This fixes the issue about partitions not being picked based
on the amount of freespace available in them.

It additionally removes the metrics registry and carries information directly
in the metric.

Metrics have two additional fields: Weight and Partitionable.

Informers have been updated to make use of these fields. Partitions have
weights that equals to the weight of the metrics under them.

Older cluster versions will not set these fields. Partitionable is false by
default and weight has a GetWeight() function to convert value->weight when
unset. This provides backwards compatibility for the freespace metric.
2021-10-06 14:10:06 +02:00
Hector Sanjuan
26e229df94 Rename allocator/metrics to allocator/balanced 2021-10-06 11:26:38 +02:00
Hector Sanjuan
6b31f44351 Address most comments from PR review 2021-10-05 14:04:28 +02:00
Hector Sanjuan
cf4fb74993 service/follow: Enable new metrics allocator 2021-09-15 21:49:26 +02:00
Hector Sanjuan
1c0abde8a5 Informer: add tags informer
The tags informer produces metrics in the form tags:name/value
with the name and values defined in its configuration.
2021-09-15 20:07:37 +02:00
Hector Sanjuan
ea5e18078c Informers: GetMetric() -> GetMetrics()
Support returning multiple metrics per informer.
2021-09-15 20:07:37 +02:00
Hector Sanjuan
b6a46cd8a4 allocator: rework the whole allocator system
The new "metrics" allocator is about to partition metrics and distribe
allocations among the partitions.

For example: given a region, an availability zone and free space on disk, the
allocator would be able to choose allocations by distributing among regions
and availability zones as much as possible, and for those peers in the same
region/az, selecting those with most free space first.

This requires a major overhaul of the allocator component.
2021-09-13 12:24:00 +02:00
Hector Sanjuan
4060f4196b Revert "Informer/disk: deprecate RepoSize metric"
This reverts commit 3e54c4c695.
2021-09-10 18:53:59 +02:00
Hector Sanjuan
3e54c4c695 Informer/disk: deprecate RepoSize metric
The allocator is hardcoded to descendalloc for freespace so it is not even useful
as it would allocate to peers with largest reposize first no matter what.

We are, in any case, reworking allocators and informers etc.
2021-09-08 17:36:54 +02:00
Ian Davis
33f1334c4a
Guard access to informer rpc client with mutex
Fixes data race

==================
WARNING: DATA RACE
Read at 0x00c02be544a8 by goroutine 4718:
  github.com/ipfs/ipfs-cluster/informer/disk.(*Informer).GetMetric()
      /home/iand/wip/iand/ipfs-cluster/informer/disk/disk.go:75 +0x135
  github.com/ipfs/ipfs-cluster.(*Cluster).sendInformerMetric()
      /home/iand/wip/iand/ipfs-cluster/cluster.go:279 +0x149
  github.com/ipfs/ipfs-cluster.(*Cluster).pushInformerMetrics()
      /home/iand/wip/iand/ipfs-cluster/cluster.go:324 +0x264
  github.com/ipfs/ipfs-cluster.(*Cluster).run.func3()
      /home/iand/wip/iand/ipfs-cluster/cluster.go:583 +0xba

Previous write at 0x00c02be544a8 by goroutine 3049:
  github.com/ipfs/ipfs-cluster/informer/disk.(*Informer).Shutdown()
      /home/iand/wip/iand/ipfs-cluster/informer/disk/disk.go:65 +0x10e
  github.com/ipfs/ipfs-cluster.(*Cluster).Shutdown()
      /home/iand/wip/iand/ipfs-cluster/cluster.go:768 +0x9d7
  github.com/ipfs/ipfs-cluster.shutdownCluster()
      /home/iand/wip/iand/ipfs-cluster/ipfscluster_test.go:420 +0x64
  github.com/ipfs/ipfs-cluster.shutdownClusters()
      /home/iand/wip/iand/ipfs-cluster/ipfscluster_test.go:414 +0x74
  github.com/ipfs/ipfs-cluster.TestClustersReplicationRealloc()
      /home/iand/wip/iand/ipfs-cluster/ipfscluster_test.go:1680 +0x10c6
  testing.tRunner()
      /opt/go/src/testing/testing.go:1194 +0x202

Goroutine 4718 (running) created at:
  github.com/ipfs/ipfs-cluster.(*Cluster).run()
      /home/iand/wip/iand/ipfs-cluster/cluster.go:581 +0x16d
  github.com/ipfs/ipfs-cluster.NewCluster.func1()
      /home/iand/wip/iand/ipfs-cluster/cluster.go:208 +0xa4

Goroutine 3049 (running) created at:
  testing.(*T).Run()
      /opt/go/src/testing/testing.go:1239 +0x5d7
  testing.runTests.func1()
      /opt/go/src/testing/testing.go:1512 +0xa6
  testing.tRunner()
      /opt/go/src/testing/testing.go:1194 +0x202
  testing.runTests()
      /opt/go/src/testing/testing.go:1510 +0x612
  testing.(*M).Run()
      /opt/go/src/testing/testing.go:1418 +0x3b3
  github.com/ipfs/ipfs-cluster.TestMain()
      /home/iand/wip/iand/ipfs-cluster/ipfscluster_test.go:134 +0x7dc
  main.main()
      _testmain.go:179 +0x271
==================
--- FAIL: TestClustersReplicationRealloc (13.60s)
    ipfscluster_test.go:1641: Shutting down QmYc1fpiBd7owHgDsnr42E7XtKZCTVjjm7FLnQMozcKid3
    testing.go:1093: race detected during execution of test
2021-07-28 13:05:23 +01:00
Kishan Mohanbhai Sagathiya
ae8e74453b Fix #937: Print full working configuration at startup
Only when using debug mode

Co-authored-by: Hector Sanjuan <code@hector.link>
2020-05-15 01:33:04 +02:00
Hector Sanjuan
1880daf59d Fix #1120: Do not let the size metric underflow uint. 2020-05-12 21:17:25 +02:00
Hector Sanjuan
f83ff9b655 staticcheck: fix all staticcheck warnings in the project 2020-04-14 20:16:10 +02:00
Hector Sanjuan
b3853caf36 Dependency ugprade: changes needed
* Libp2p protectors no longer needed, use PSK directly
* Generate cluster 32-byte secret here (helper gone from pnet)
* Switch to go-log/v2 in all places
* DHT bootstrapping not needed. Adjust DHT options for tests.
* Do not rely on dissappeared CidToDsKey and DsKeyToCid functions fro dshelp.
* Disable QUIC (does not support private networks)
* Fix tests: autodiscovery started working properly
2020-03-22 14:50:25 +01:00
Kishan Mohanbhai Sagathiya
586253a2ce JSON Config object key should match JSON tags
License: MIT
Signed-off-by: Kishan Mohanbhai Sagathiya <kishansagathiya@gmail.com>
2019-07-15 17:47:35 +05:30
Hector Sanjuan
b804e61ef0 Update deps along with go-libp2p-core refactor
Lots of rewrites in imports...
2019-06-14 13:10:45 +02:00
Hector Sanjuan
d51c2a0377 Merge branch 'master' into feat/monitor-ring 2019-05-16 15:46:30 +02:00
Hector Sanjuan
3d49ac26a5 Feat: Split components into RPC Services
I had thought of this for a very long time but there were no compelling
reasons to do it. Specifying RPC endpoint permissions becomes however
significantly nicer if each Component is a different RPC Service. This also
fixes some naming issues like having to prefix methods with the component name
to separate them from methods named in the same way in some other component
(Pin and IPFSPin).
2019-05-04 21:36:10 +01:00
Adrian Lanzafame
c4b76619c1
Add failure_threshold monitors config
License: MIT
Signed-off-by: Adrian Lanzafame <adrianlanzafame92@gmail.com>
2019-04-18 16:14:13 +10:00
Hector Sanjuan
6447ea51d2 Remove *Serial types. Use pointers for all types.
This takes advantange of the latest features in go-cid, peer.ID and
go-multiaddr and makes the Go types serializable by default.

This means we no longer need to copy between Pin <-> PinSerial, or ID <->
IDSerial etc. We can now efficiently binary-encode these types using short
field keys and without parsing/stringifying (in many cases it just a cast).

We still get the same json output as before (with minor modifications for
Cids).

This should greatly improve Cluster performance and memory usage when dealing
with large collections of items.

License: MIT
Signed-off-by: Hector Sanjuan <hector@protocol.ai>
2019-02-27 17:04:35 +00:00
Robert Ignat
e38ceab222 Add ApplyEnvVars test to numpin config
License: MIT
Signed-off-by: Robert Ignat <robert.ignat91@gmail.com>
2019-02-18 17:46:15 +02:00
Robert Ignat
08580c3d5c Add ApplyEnvVars test to disk config
License: MIT
Signed-off-by: Robert Ignat <robert.ignat91@gmail.com>
2019-02-18 17:45:42 +02:00
Robert Ignat
168cf76224 Change ApplyEnvVars strategy for all config components
Get jsonConfig from Config, apply env vars to it, load jsonConfig
back into Config.

License: MIT
Signed-off-by: Robert Ignat <robert.ignat91@gmail.com>
2019-02-15 19:07:20 +02:00
Robert Ignat
032f02802f Implement ApplyEnvVars for all ComponentConfigs
License: MIT
Signed-off-by: Robert Ignat <robert.ignat91@gmail.com>
2019-02-08 23:57:16 +02:00
Robert Ignat
ed30ac1ab4 Add ApplyEnvVars() to ComponentConfig interface
* cluster and restapi configs can also get values from environment variables
* other config components don't read any values from the environment

License: MIT
Signed-off-by: Robert Ignat <robert.ignat91@gmail.com>
2019-02-07 20:51:20 +02:00
Adrian Lanzafame
3b3f786d68
add opencensus tracing and metrics
This commit adds support for OpenCensus tracing
and metrics collection. This required support for
context.Context propogation throughout the cluster
codebase, and in particular, the ipfscluster component
interfaces.

The tracing propogates across RPC and HTTP boundaries.
The current default tracing backend is Jaeger.

The metrics currently exports the metrics exposed by
the opencensus http plugin as well as the pprof metrics
to a prometheus endpoint for scraping.
The current default metrics backend is Prometheus.

Metrics are currently exposed by default due to low
overhead, can be turned off if desired, whereas tracing
is off by default as it has a much higher performance
overhead, though the extent of the performance hit can be
adjusted with smaller sampling rates.

License: MIT
Signed-off-by: Adrian Lanzafame <adrianlanzafame92@gmail.com>
2019-02-04 18:53:21 +10:00
Hector Sanjuan
7d16108751 Start using libp2p/go-libp2p-gorpc
License: MIT
Signed-off-by: Hector Sanjuan <code@hector.link>
2018-10-17 15:28:03 +02:00
Hector Sanjuan
2ffa3d80de Fix 466: Hijack repo/stat in the proxy and return aggregates from the cluster.
License: MIT
Signed-off-by: Hector Sanjuan <code@hector.link>
2018-08-20 20:43:27 +02:00
Hector Sanjuan
a9d6fe3479 Types: rename metric.SetTTLDuration to metric.SetTTL
GetTTL returns duration. SetTTL should take duration too, not seconds.
This removes the original SetTTL method which used seconds.

License: MIT
Signed-off-by: Hector Sanjuan <code@hector.link>
2018-05-07 14:26:06 +02:00
Hector Sanjuan
a746255719 Fix tests
License: MIT
Signed-off-by: Hector Sanjuan <code@hector.link>
2018-03-13 16:30:47 +01:00
Hector Sanjuan
c231fe8060 Update libp2p and deps to correct versions
License: MIT
Signed-off-by: Hector Sanjuan <code@hector.link>
2018-03-13 15:32:56 +01:00
Hector Sanjuan
5831b251fe Issue #202: Fix mock informer
License: MIT
Signed-off-by: Hector Sanjuan <hector@protocol.ai>
2017-10-26 16:01:41 +02:00
Hector Sanjuan
00e871ddec Fix #202: Fix informers and allocators for 32-bit architectures 2017-10-26 16:01:41 +02:00
Hector Sanjuan
fb8fdb94c5 Issue #162: Improve Config.ToJSON() tests
License: MIT
Signed-off-by: Hector Sanjuan <hector@protocol.ai>
2017-10-20 10:42:41 +02:00
Hector Sanjuan
89364fd671 Issue #162: Add tests for informer Configs
License: MIT
Signed-off-by: Hector Sanjuan <hector@protocol.ai>
2017-10-19 21:09:03 +02:00
Hector Sanjuan
8f06baa1bf Issue #162: Rework configuration format
The following commit reimplements ipfs-cluster configuration under
the following premises:

  * Each component is initialized with a configuration object
  defined by its module
  * Each component decides how the JSON representation of its
  configuration looks like
  * Each component parses and validates its own configuration
  * Each component exposes its own defaults
  * Component configurations are make the sections of a
  central JSON configuration file (which replaces the current
  JSON format)
  * Component configurations implement a common interface
  (config.ComponentConfig) with a set of common operations
  * The central configuration file is managed by a
  config.ConfigManager which:
    * Registers ComponentConfigs
    * Assigns the correspondent sections from the JSON file to each
    component and delegates the parsing
    * Delegates the JSON generation for each section
    * Can be notified when the configuration is updated and must be
    saved to disk

The new service.json would then look as follows:

```json
{
  "cluster": {
    "id": "QmTVW8NoRxC5wBhV7WtAYtRn7itipEESfozWN5KmXUQnk2",
    "private_key": "<...>",
    "secret": "00224102ae6aaf94f2606abf69a0e278251ecc1d64815b617ff19d6d2841f786",
    "peers": [],
    "bootstrap": [],
    "leave_on_shutdown": false,
    "listen_multiaddress": "/ip4/0.0.0.0/tcp/9096",
    "state_sync_interval": "1m0s",
    "ipfs_sync_interval": "2m10s",
    "replication_factor": -1,
    "monitor_ping_interval": "15s"
  },
  "consensus": {
    "raft": {
      "heartbeat_timeout": "1s",
      "election_timeout": "1s",
      "commit_timeout": "50ms",
      "max_append_entries": 64,
      "trailing_logs": 10240,
      "snapshot_interval": "2m0s",
      "snapshot_threshold": 8192,
      "leader_lease_timeout": "500ms"
    }
  },
  "api": {
    "restapi": {
      "listen_multiaddress": "/ip4/127.0.0.1/tcp/9094",
      "read_timeout": "30s",
      "read_header_timeout": "5s",
      "write_timeout": "1m0s",
      "idle_timeout": "2m0s"
    }
  },
  "ipfs_connector": {
    "ipfshttp": {
      "proxy_listen_multiaddress": "/ip4/127.0.0.1/tcp/9095",
      "node_multiaddress": "/ip4/127.0.0.1/tcp/5001",
      "connect_swarms_delay": "7s",
      "proxy_read_timeout": "10m0s",
      "proxy_read_header_timeout": "5s",
      "proxy_write_timeout": "10m0s",
      "proxy_idle_timeout": "1m0s"
    }
  },
  "monitor": {
    "monbasic": {
      "check_interval": "15s"
    }
  },
  "informer": {
    "disk": {
      "metric_ttl": "30s",
      "metric_type": "freespace"
    },
    "numpin": {
      "metric_ttl": "10s"
    }
  }
}
```

This new format aims to be easily extensible per component. As such,
it already surfaces quite a few new options which were hardcoded
before.

Additionally, since Go API have changed, some redundant methods have been
removed and small refactoring has happened to take advantage of the new
way.

License: MIT
Signed-off-by: Hector Sanjuan <hector@protocol.ai>
2017-10-18 00:00:12 +02:00
dgrisham
b1356cd33a removing name arg from NewInformer 2017-09-01 15:02:15 -06:00
dgrisham
456603f7a8 adding error output to NewInformerWithMetric 2017-09-01 09:48:15 -06:00
dgrisham
407fd9f68a Informer impl refactored; SortNumeric added for allocators. 2017-08-28 08:51:01 -06:00
dgrisham
2d2a9da793 FreeSpace metric impl (including descendalloc) refactored and tested. 2017-08-11 13:57:42 -06:00
dgrisham
a46ab3dda9 freespace metric partial impl 2017-08-03 11:24:19 -06:00
Hector Sanjuan
2bbbea79cc Issue #49: Add disk informer
The disk informer uses "ipfs repo stat" to fetch the RepoSize value and
uses it as a metric.

The numpinalloc allocator is now a generalized ascendalloc which
sorts metrics in ascending order and return the ones with lowest
values.

License: MIT
Signed-off-by: Hector Sanjuan <hector@protocol.ai>
2017-03-27 20:40:49 +02:00
Hector Sanjuan
8e45ce64c2 Documentation: captain log, readme and golint fixes
License: MIT
Signed-off-by: Hector Sanjuan <hector@protocol.ai>
2017-02-15 15:46:51 +01:00
Hector Sanjuan
2512ecb701 Issue #41: Add Replication factor
New PeerManager, Allocator, Informer components have been added along
with a new "replication_factor" configuration option.

First, cluster peers collect and push metrics (Informer) to the Cluster
leader regularly. The Informer is an interface that can be implemented
in custom wayts to support custom metrics.

Second, on a pin operation, using the information from the collected metrics,
an Allocator can provide a list of preferences as to where the new pin
should be assigned. The Allocator is an interface allowing to provide
different allocation strategies.

Both Allocator and Informer are Cluster Componenets, and have access
to the RPC API.

The allocations are kept in the shared state. Cluster peer failure
detection is still missing and re-allocation is still missing, although
re-pinning something when a node is down/metrics missing does re-allocate
the pin somewhere else.

License: MIT
Signed-off-by: Hector Sanjuan <hector@protocol.ai>
2017-02-14 19:13:08 +01:00