Commit Graph

52 Commits

Author SHA1 Message Date
Hector Sanjuan
c454769887 Informer/disk: record issued metric weights as prometheus metric. 2022-06-23 11:58:35 +02:00
Hector Sanjuan
e8695dc6f3 informer/disk: set repoSize weight to negative
smaller repositories should have more priority
2022-06-23 11:49:37 +02:00
Hector Sanjuan
508791b547 Migrate from ipfs/ipfs-cluster to ipfs-cluster/ipfs-cluster
This performs the necessary renamings.
2022-06-16 17:43:30 +02:00
Hector Sanjuan
4daece2b98 Feat: add a new "pinqueue" informer component
This new component broadcasts metrics about the current size of the pinqueue,
which can in turn be used to inform allocations.

It has a weight_bucket_size option that serves to divide the actual size by a
given factor. This allows considering peers with similar queue sizes to have
the same weight.

Additionally, some changes have been made to the balanced allocator so that a
combination of tags, pinqueue sizes and free-spaces can be used. When
allocating by [<tag>, pinqueue, freespace], the allocator will prioritize
choosing peers with the smallest pin queue weight first, and of those with the
same weight, it will allocate based on freespace.
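
The bucketing and tie-breaking described above can be sketched roughly as
follows. This is an illustration only: the peerMetrics type and the
queueBucket and orderPeers helpers are hypothetical names, and the sign
conventions used for weights in the actual informer may differ.

```go
package main

import (
	"fmt"
	"sort"
)

// peerMetrics is a hypothetical per-peer summary used only for this sketch.
type peerMetrics struct {
	Name      string
	QueueSize int    // items currently waiting in the pin queue
	FreeSpace uint64 // free bytes reported by the peer
}

// queueBucket divides the queue size by bucketSize so that peers with
// similar queue sizes end up with the same value.
func queueBucket(queueSize, bucketSize int) int {
	if bucketSize <= 0 {
		bucketSize = 1 // assumption: invalid sizes mean "no bucketing"
	}
	return queueSize / bucketSize
}

// orderPeers sorts by queue bucket (smallest first) and breaks ties by
// free space (largest first). The input slice is left untouched.
func orderPeers(peers []peerMetrics, bucketSize int) []peerMetrics {
	out := append([]peerMetrics(nil), peers...)
	sort.SliceStable(out, func(i, j int) bool {
		bi := queueBucket(out[i].QueueSize, bucketSize)
		bj := queueBucket(out[j].QueueSize, bucketSize)
		if bi != bj {
			return bi < bj
		}
		return out[i].FreeSpace > out[j].FreeSpace
	})
	return out
}

func main() {
	peers := []peerMetrics{
		{"a", 120000, 500},
		{"b", 101000, 900},
		{"c", 5000, 100},
	}
	for _, p := range orderPeers(peers, 100000) {
		fmt.Println(p.Name)
	}
}
```

With a bucket size of 100000, peers "a" and "b" fall into the same bucket and
are then ordered by free space only.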
2022-06-16 17:43:29 +02:00
Hector Sanjuan
060ef812b2 Do not issue freespace metric when free-space is 0. 2022-05-03 18:39:44 +02:00
Hector Sanjuan
0d73d33ef5 Pintracker: streaming methods
This commit continues the work of taking advantage of the streaming
capabilities in go-libp2p-gorpc by improving the ipfsconnector and pintracker
components.

StatusAll and RecoverAll methods are now streaming methods, with the REST API
output changing accordingly to produce a stream of GlobalPinInfos rather than
a json array.

pin/ls requests to the ipfs daemon now use ?stream=true and avoid having to
load the full pinset map into memory. StatusAllLocal and RecoverAllLocal
requests to the pin tracker stream all the way and no longer store the full
pinset and the full PinInfo status slice before sending them out.

We have additionally switched to a pattern where streaming methods receive the
channel as an argument, allowing the caller to decide on whether to launch a
goroutine, do buffering etc.
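
A minimal sketch of the channel-as-argument pattern, assuming a simplified
pinInfo type and hypothetical statusAll and collect functions (these are not
the actual Cluster method signatures):

```go
package main

import (
	"context"
	"fmt"
)

// pinInfo is a hypothetical stand-in for a per-pin status item.
type pinInfo struct {
	Cid    string
	Status string
}

// statusAll writes one item per CID into out and closes out when finished.
// It never creates the channel or a goroutine itself; both are left to the
// caller.
func statusAll(ctx context.Context, cids []string, out chan<- pinInfo) error {
	defer close(out)
	for _, c := range cids {
		select {
		case out <- pinInfo{Cid: c, Status: "pinned"}:
		case <-ctx.Done():
			return ctx.Err()
		}
	}
	return nil
}

// collect shows one possible caller: it picks the buffer size, launches the
// goroutine and drains the channel.
func collect(cids []string) ([]pinInfo, error) {
	out := make(chan pinInfo, 8) // buffering is the caller's decision
	errCh := make(chan error, 1)
	go func() { errCh <- statusAll(context.Background(), cids, out) }()

	var items []pinInfo
	for pi := range out {
		items = append(items, pi)
	}
	return items, <-errCh
}

func main() {
	items, err := collect([]string{"cid1", "cid2"})
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	for _, pi := range items {
		fmt.Println(pi.Cid, pi.Status)
	}
}
```

A different caller could pass an unbuffered channel, or call statusAll
synchronously with a buffer large enough for the expected items.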
2022-03-22 15:38:01 +01:00
Hector Sanjuan
9b9d76f92d Pinset streaming and method type revamp
This commit introduces the new go-libp2p-gorpc streaming capabilities for
Cluster. The main aim is to work towards heavily reducing memory usage when
working with very large pinsets.

As a side-effect, it takes the chance to revamp all types for all public
methods so that pointers to what should be static objects are not used
anymore. This should heavily reduce heap allocations and GC activity.

The main change is that state.List now returns a channel from which to read
the pins, rather than pins being all loaded into a huge slice.

Things reading pins have all been updated to iterate on the channel rather
than on the slice. The full pinset is no longer loaded into memory for
things that run regularly like StateSync().

Additionally, the /allocations endpoint of the rest API no longer returns an
array of pins, but rather streams json-encoded pin objects directly. This
change has extended to the restapi client (which puts pins into a channel as
they arrive) and to ipfs-cluster-ctl.

There are still pending improvements like StatusAll() calls, which should also
stream responses, and especially BlockPut calls, which should stream blocks
directly into IPFS in a single call.

These are coming up in future commits.
2022-03-19 03:02:55 +01:00
Hector Sanjuan
08aea3b3aa Fix: ttl for tags should be default when not provided (now 0). 2022-01-07 14:52:35 +01:00
Hector Sanjuan
db00d651bf Balanced allocator: weight-based ordering of partitions
This fixes the issue about partitions not being picked based
on the amount of freespace available in them.

It additionally removes the metrics registry and carries information directly
in the metric.

Metrics have two additional fields: Weight and Partitionable.

Informers have been updated to make use of these fields. Partitions have
weights equal to the weight of the metrics under them.

Older cluster versions will not set these fields. Partitionable is false by
default and weight has a GetWeight() function to convert value->weight when
unset. This provides backwards compatibility for the freespace metric.
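
The fallback can be sketched as below. The metric type and its getWeight
method are hypothetical simplifications, and the sketch assumes that a zero
Weight means "unset" and that the value is a base-10 integer string; the
actual conversion may differ.

```go
package main

import (
	"fmt"
	"strconv"
)

// metric is a hypothetical, trimmed-down metric carrying the two new fields.
type metric struct {
	Name          string
	Value         string
	Weight        int64
	Partitionable bool
}

// getWeight returns Weight when it is set. Otherwise it falls back to
// parsing Value, so metrics from peers that never set Weight still sort
// in a meaningful way.
func (m metric) getWeight() int64 {
	if m.Weight != 0 {
		return m.Weight
	}
	v, err := strconv.ParseInt(m.Value, 10, 64)
	if err != nil {
		return 0 // assumption: unparsable values carry no weight
	}
	return v
}

func main() {
	old := metric{Name: "freespace", Value: "2048"}
	cur := metric{Name: "freespace", Value: "2048", Weight: 10}
	fmt.Println(old.getWeight(), cur.getWeight())
}
```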
2021-10-06 14:10:06 +02:00
Hector Sanjuan
26e229df94 Rename allocator/metrics to allocator/balanced 2021-10-06 11:26:38 +02:00
Hector Sanjuan
6b31f44351 Address most comments from PR review 2021-10-05 14:04:28 +02:00
Hector Sanjuan
cf4fb74993 service/follow: Enable new metrics allocator 2021-09-15 21:49:26 +02:00
Hector Sanjuan
1c0abde8a5 Informer: add tags informer
The tags informer produces metrics in the form tags:name/value
with the name and values defined in its configuration.
2021-09-15 20:07:37 +02:00
Hector Sanjuan
ea5e18078c Informers: GetMetric() -> GetMetrics()
Support returning multiple metrics per informer.
2021-09-15 20:07:37 +02:00
Hector Sanjuan
b6a46cd8a4 allocator: rework the whole allocator system
The new "metrics" allocator is able to partition metrics and distribute
allocations among the partitions.

For example: given a region, an availability zone and free space on disk, the
allocator would be able to choose allocations by distributing among regions
and availability zones as much as possible, and for those peers in the same
region/az, selecting those with most free space first.
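
A simplified sketch of this idea, using a single partition level (region)
instead of region plus availability zone. The peer type and the allocate
function are hypothetical and only illustrate the approach, not the
allocator's actual code.

```go
package main

import (
	"fmt"
	"sort"
)

// peer is a hypothetical view of a cluster peer for this sketch.
type peer struct {
	Name   string
	Region string
	Free   uint64
}

// allocate picks up to n peers, taking one peer per region in turn and
// preferring the peer with the most free space inside each region.
func allocate(peers []peer, n int) []string {
	byRegion := map[string][]peer{}
	for _, p := range peers {
		byRegion[p.Region] = append(byRegion[p.Region], p)
	}
	regions := make([]string, 0, len(byRegion))
	for r := range byRegion {
		group := byRegion[r]
		sort.Slice(group, func(i, j int) bool { return group[i].Free > group[j].Free })
		regions = append(regions, r)
	}
	sort.Strings(regions) // deterministic partition order for the sketch

	var chosen []string
	for len(chosen) < n {
		progressed := false
		for _, r := range regions {
			if len(byRegion[r]) == 0 || len(chosen) == n {
				continue
			}
			chosen = append(chosen, byRegion[r][0].Name)
			byRegion[r] = byRegion[r][1:]
			progressed = true
		}
		if !progressed {
			break // fewer than n peers available
		}
	}
	return chosen
}

func main() {
	peers := []peer{
		{"a", "eu", 100}, {"b", "eu", 300},
		{"c", "us", 50}, {"d", "us", 200},
	}
	fmt.Println(allocate(peers, 3))
}
```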

This requires a major overhaul of the allocator component.
2021-09-13 12:24:00 +02:00
Hector Sanjuan
4060f4196b Revert "Informer/disk: deprecate RepoSize metric"
This reverts commit 3e54c4c695.
2021-09-10 18:53:59 +02:00
Hector Sanjuan
3e54c4c695 Informer/disk: deprecate RepoSize metric
The allocator is hardcoded to descendalloc for freespace so it is not even useful
as it would allocate to peers with largest reposize first no matter what.

We are, in any case, reworking allocators and informers etc.
2021-09-08 17:36:54 +02:00
Ian Davis
33f1334c4a Guard access to informer rpc client with mutex
Fixes data race

==================
WARNING: DATA RACE
Read at 0x00c02be544a8 by goroutine 4718:
  github.com/ipfs/ipfs-cluster/informer/disk.(*Informer).GetMetric()
      /home/iand/wip/iand/ipfs-cluster/informer/disk/disk.go:75 +0x135
  github.com/ipfs/ipfs-cluster.(*Cluster).sendInformerMetric()
      /home/iand/wip/iand/ipfs-cluster/cluster.go:279 +0x149
  github.com/ipfs/ipfs-cluster.(*Cluster).pushInformerMetrics()
      /home/iand/wip/iand/ipfs-cluster/cluster.go:324 +0x264
  github.com/ipfs/ipfs-cluster.(*Cluster).run.func3()
      /home/iand/wip/iand/ipfs-cluster/cluster.go:583 +0xba

Previous write at 0x00c02be544a8 by goroutine 3049:
  github.com/ipfs/ipfs-cluster/informer/disk.(*Informer).Shutdown()
      /home/iand/wip/iand/ipfs-cluster/informer/disk/disk.go:65 +0x10e
  github.com/ipfs/ipfs-cluster.(*Cluster).Shutdown()
      /home/iand/wip/iand/ipfs-cluster/cluster.go:768 +0x9d7
  github.com/ipfs/ipfs-cluster.shutdownCluster()
      /home/iand/wip/iand/ipfs-cluster/ipfscluster_test.go:420 +0x64
  github.com/ipfs/ipfs-cluster.shutdownClusters()
      /home/iand/wip/iand/ipfs-cluster/ipfscluster_test.go:414 +0x74
  github.com/ipfs/ipfs-cluster.TestClustersReplicationRealloc()
      /home/iand/wip/iand/ipfs-cluster/ipfscluster_test.go:1680 +0x10c6
  testing.tRunner()
      /opt/go/src/testing/testing.go:1194 +0x202

Goroutine 4718 (running) created at:
  github.com/ipfs/ipfs-cluster.(*Cluster).run()
      /home/iand/wip/iand/ipfs-cluster/cluster.go:581 +0x16d
  github.com/ipfs/ipfs-cluster.NewCluster.func1()
      /home/iand/wip/iand/ipfs-cluster/cluster.go:208 +0xa4

Goroutine 3049 (running) created at:
  testing.(*T).Run()
      /opt/go/src/testing/testing.go:1239 +0x5d7
  testing.runTests.func1()
      /opt/go/src/testing/testing.go:1512 +0xa6
  testing.tRunner()
      /opt/go/src/testing/testing.go:1194 +0x202
  testing.runTests()
      /opt/go/src/testing/testing.go:1510 +0x612
  testing.(*M).Run()
      /opt/go/src/testing/testing.go:1418 +0x3b3
  github.com/ipfs/ipfs-cluster.TestMain()
      /home/iand/wip/iand/ipfs-cluster/ipfscluster_test.go:134 +0x7dc
  main.main()
      _testmain.go:179 +0x271
==================
--- FAIL: TestClustersReplicationRealloc (13.60s)
    ipfscluster_test.go:1641: Shutting down QmYc1fpiBd7owHgDsnr42E7XtKZCTVjjm7FLnQMozcKid3
    testing.go:1093: race detected during execution of test
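
The kind of guard named in the subject of this commit can be sketched as
follows. The rpcClient and informer types here are hypothetical stand-ins;
the point is only that the read in the metric path and the write in the
shutdown path go through the same mutex.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// rpcClient is a hypothetical stand-in for the informer's RPC client.
type rpcClient struct{}

func (c *rpcClient) repoSize() uint64 { return 42 }

// informer guards its client pointer with a mutex so that concurrent
// metric reads and shutdown do not race.
type informer struct {
	mu     sync.Mutex
	client *rpcClient
}

func (i *informer) setClient(c *rpcClient) {
	i.mu.Lock()
	defer i.mu.Unlock()
	i.client = c
}

// shutdown clears the client under the lock.
func (i *informer) shutdown() {
	i.mu.Lock()
	defer i.mu.Unlock()
	i.client = nil
}

// getMetric copies the pointer under the lock and then works on the copy.
func (i *informer) getMetric() (uint64, error) {
	i.mu.Lock()
	c := i.client
	i.mu.Unlock()
	if c == nil {
		return 0, errors.New("informer has no rpc client")
	}
	return c.repoSize(), nil
}

func main() {
	inf := &informer{}
	inf.setClient(&rpcClient{})
	fmt.Println(inf.getMetric())
	inf.shutdown()
	fmt.Println(inf.getMetric())
}
```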
2021-07-28 13:05:23 +01:00
Kishan Mohanbhai Sagathiya
ae8e74453b Fix #937: Print full working configuration at startup
Only when using debug mode

Co-authored-by: Hector Sanjuan <code@hector.link>
2020-05-15 01:33:04 +02:00
Hector Sanjuan
1880daf59d Fix #1120: Do not let the size metric underflow uint. 2020-05-12 21:17:25 +02:00
Hector Sanjuan
f83ff9b655 staticcheck: fix all staticcheck warnings in the project 2020-04-14 20:16:10 +02:00
Hector Sanjuan
b3853caf36 Dependency upgrade: changes needed
* Libp2p protectors no longer needed, use PSK directly
* Generate cluster 32-byte secret here (helper gone from pnet)
* Switch to go-log/v2 in all places
* DHT bootstrapping not needed. Adjust DHT options for tests.
* Do not rely on disappeared CidToDsKey and DsKeyToCid functions from dshelp.
* Disable QUIC (does not support private networks)
* Fix tests: autodiscovery started working properly
2020-03-22 14:50:25 +01:00
Kishan Mohanbhai Sagathiya
586253a2ce JSON Config object key should match JSON tags
License: MIT
Signed-off-by: Kishan Mohanbhai Sagathiya <kishansagathiya@gmail.com>
2019-07-15 17:47:35 +05:30
Hector Sanjuan
b804e61ef0 Update deps along with go-libp2p-core refactor
Lots of rewrites in imports...
2019-06-14 13:10:45 +02:00
Hector Sanjuan
d51c2a0377 Merge branch 'master' into feat/monitor-ring 2019-05-16 15:46:30 +02:00
Hector Sanjuan
3d49ac26a5 Feat: Split components into RPC Services
I had thought of this for a very long time but there were no compelling
reasons to do it. Specifying RPC endpoint permissions becomes however
significantly nicer if each Component is a different RPC Service. This also
fixes some naming issues like having to prefix methods with the component name
to separate them from methods named in the same way in some other component
(Pin and IPFSPin).
2019-05-04 21:36:10 +01:00
Adrian Lanzafame
c4b76619c1 Add failure_threshold monitors config
License: MIT
Signed-off-by: Adrian Lanzafame <adrianlanzafame92@gmail.com>
2019-04-18 16:14:13 +10:00
Hector Sanjuan
6447ea51d2 Remove *Serial types. Use pointers for all types.
This takes advantage of the latest features in go-cid, peer.ID and
go-multiaddr and makes the Go types serializable by default.

This means we no longer need to copy between Pin <-> PinSerial, or ID <->
IDSerial etc. We can now efficiently binary-encode these types using short
field keys and without parsing/stringifying (in many cases it is just a cast).

We still get the same json output as before (with minor modifications for
Cids).

This should greatly improve Cluster performance and memory usage when dealing
with large collections of items.

License: MIT
Signed-off-by: Hector Sanjuan <hector@protocol.ai>
2019-02-27 17:04:35 +00:00
Robert Ignat
e38ceab222 Add ApplyEnvVars test to numpin config
License: MIT
Signed-off-by: Robert Ignat <robert.ignat91@gmail.com>
2019-02-18 17:46:15 +02:00
Robert Ignat
08580c3d5c Add ApplyEnvVars test to disk config
License: MIT
Signed-off-by: Robert Ignat <robert.ignat91@gmail.com>
2019-02-18 17:45:42 +02:00
Robert Ignat
168cf76224 Change ApplyEnvVars strategy for all config components
Get jsonConfig from Config, apply env vars to it, load jsonConfig
back into Config.
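
This round trip can be sketched as below. The config and jsonConfig types,
the EXAMPLE_DISK_METRICTTL variable name and the injectable lookup function
are assumptions made for the illustration; the real components may read the
environment through a different mechanism.

```go
package main

import (
	"fmt"
	"os"
	"time"
)

// envMetricTTL is a hypothetical environment variable name.
const envMetricTTL = "EXAMPLE_DISK_METRICTTL"

// jsonConfig is the string-based form that is written to the JSON file.
type jsonConfig struct {
	MetricTTL string
}

// config is the typed form used by the component at runtime.
type config struct {
	MetricTTL time.Duration
}

func (c *config) toJSONConfig() *jsonConfig {
	return &jsonConfig{MetricTTL: c.MetricTTL.String()}
}

func (c *config) applyJSONConfig(j *jsonConfig) error {
	d, err := time.ParseDuration(j.MetricTTL)
	if err != nil {
		return fmt.Errorf("metric_ttl: %w", err)
	}
	c.MetricTTL = d
	return nil
}

// applyEnvVars gets the jsonConfig from the config, overrides fields from
// the environment and loads the result back, so that environment values go
// through the same parsing and validation as values from the file.
func (c *config) applyEnvVars(lookup func(string) (string, bool)) error {
	j := c.toJSONConfig()
	if v, ok := lookup(envMetricTTL); ok {
		j.MetricTTL = v
	}
	return c.applyJSONConfig(j)
}

func main() {
	cfg := &config{MetricTTL: 30 * time.Second}
	if err := cfg.applyEnvVars(os.LookupEnv); err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(cfg.MetricTTL)
}
```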

License: MIT
Signed-off-by: Robert Ignat <robert.ignat91@gmail.com>
2019-02-15 19:07:20 +02:00
Robert Ignat
032f02802f Implement ApplyEnvVars for all ComponentConfigs
License: MIT
Signed-off-by: Robert Ignat <robert.ignat91@gmail.com>
2019-02-08 23:57:16 +02:00
Robert Ignat
ed30ac1ab4 Add ApplyEnvVars() to ComponentConfig interface
* cluster and restapi configs can also get values from environment variables
* other config components don't read any values from the environment

License: MIT
Signed-off-by: Robert Ignat <robert.ignat91@gmail.com>
2019-02-07 20:51:20 +02:00
Adrian Lanzafame
3b3f786d68 add opencensus tracing and metrics
This commit adds support for OpenCensus tracing
and metrics collection. This required support for
context.Context propagation throughout the cluster
codebase, and in particular, the ipfscluster component
interfaces.

The tracing propagates across RPC and HTTP boundaries.
The current default tracing backend is Jaeger.

Metrics collection currently exports the metrics exposed by
the opencensus http plugin as well as the pprof metrics
to a prometheus endpoint for scraping.
The current default metrics backend is Prometheus.

Metrics are currently exposed by default due to their low
overhead and can be turned off if desired, whereas tracing
is off by default as it has a much higher performance
overhead, though the extent of the performance hit can be
adjusted with smaller sampling rates.

License: MIT
Signed-off-by: Adrian Lanzafame <adrianlanzafame92@gmail.com>
2019-02-04 18:53:21 +10:00
Hector Sanjuan
7d16108751 Start using libp2p/go-libp2p-gorpc
License: MIT
Signed-off-by: Hector Sanjuan <code@hector.link>
2018-10-17 15:28:03 +02:00
Hector Sanjuan
2ffa3d80de Fix 466: Hijack repo/stat in the proxy and return aggregates from the cluster.
License: MIT
Signed-off-by: Hector Sanjuan <code@hector.link>
2018-08-20 20:43:27 +02:00
Hector Sanjuan
a9d6fe3479 Types: rename metric.SetTTLDuration to metric.SetTTL
GetTTL returns duration. SetTTL should take duration too, not seconds.
This removes the original SetTTL method which used seconds.

License: MIT
Signed-off-by: Hector Sanjuan <code@hector.link>
2018-05-07 14:26:06 +02:00
Hector Sanjuan
a746255719 Fix tests
License: MIT
Signed-off-by: Hector Sanjuan <code@hector.link>
2018-03-13 16:30:47 +01:00
Hector Sanjuan
c231fe8060 Update libp2p and deps to correct versions
License: MIT
Signed-off-by: Hector Sanjuan <code@hector.link>
2018-03-13 15:32:56 +01:00
Hector Sanjuan
5831b251fe Issue #202: Fix mock informer
License: MIT
Signed-off-by: Hector Sanjuan <hector@protocol.ai>
2017-10-26 16:01:41 +02:00
Hector Sanjuan
00e871ddec Fix #202: Fix informers and allocators for 32-bit architectures 2017-10-26 16:01:41 +02:00
Hector Sanjuan
fb8fdb94c5 Issue #162: Improve Config.ToJSON() tests
License: MIT
Signed-off-by: Hector Sanjuan <hector@protocol.ai>
2017-10-20 10:42:41 +02:00
Hector Sanjuan
89364fd671 Issue #162: Add tests for informer Configs
License: MIT
Signed-off-by: Hector Sanjuan <hector@protocol.ai>
2017-10-19 21:09:03 +02:00
Hector Sanjuan
8f06baa1bf Issue #162: Rework configuration format
The following commit reimplements ipfs-cluster configuration under
the following premises:

  * Each component is initialized with a configuration object
  defined by its module
  * Each component decides what the JSON representation of its
  configuration looks like
  * Each component parses and validates its own configuration
  * Each component exposes its own defaults
  * Component configurations make up the sections of a
  central JSON configuration file (which replaces the current
  JSON format)
  * Component configurations implement a common interface
  (config.ComponentConfig) with a set of common operations
  * The central configuration file is managed by a
  config.ConfigManager which:
    * Registers ComponentConfigs
    * Assigns the corresponding sections from the JSON file to each
    component and delegates the parsing
    * Delegates the JSON generation for each section
    * Can be notified when the configuration is updated and must be
    saved to disk
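
The premises above can be sketched in Go roughly as follows. The interface
methods, the diskConfig type and the manager are reduced, hypothetical
versions written for illustration; they are not the actual
config.ComponentConfig or config.ConfigManager definitions.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// componentConfig is a reduced, hypothetical common interface.
type componentConfig interface {
	ConfigKey() string
	Default() error
	LoadJSON(raw []byte) error
	ToJSON() ([]byte, error)
	Validate() error
}

// diskConfig owns its JSON shape, defaults and validation.
type diskConfig struct {
	MetricTTL  string `json:"metric_ttl"`
	MetricType string `json:"metric_type"`
}

func (c *diskConfig) ConfigKey() string { return "disk" }

func (c *diskConfig) Default() error {
	c.MetricTTL, c.MetricType = "30s", "freespace"
	return nil
}

func (c *diskConfig) LoadJSON(raw []byte) error {
	if err := c.Default(); err != nil {
		return err
	}
	if err := json.Unmarshal(raw, c); err != nil {
		return err
	}
	return c.Validate()
}

func (c *diskConfig) ToJSON() ([]byte, error) { return json.Marshal(c) }

func (c *diskConfig) Validate() error {
	if c.MetricType != "freespace" && c.MetricType != "reposize" {
		return fmt.Errorf("unknown metric_type %q", c.MetricType)
	}
	return nil
}

// manager registers components per section and delegates parsing to them.
type manager struct {
	sections map[string]map[string]componentConfig
}

func (m *manager) register(section string, c componentConfig) {
	if m.sections == nil {
		m.sections = map[string]map[string]componentConfig{}
	}
	if m.sections[section] == nil {
		m.sections[section] = map[string]componentConfig{}
	}
	m.sections[section][c.ConfigKey()] = c
}

func (m *manager) loadJSON(raw []byte) error {
	var file map[string]map[string]json.RawMessage
	if err := json.Unmarshal(raw, &file); err != nil {
		return err
	}
	for section, comps := range m.sections {
		for key, c := range comps {
			sub, ok := file[section][key]
			if !ok { // section missing from the file: use defaults
				if err := c.Default(); err != nil {
					return err
				}
				continue
			}
			if err := c.LoadJSON(sub); err != nil {
				return fmt.Errorf("%s/%s: %w", section, key, err)
			}
		}
	}
	return nil
}

func main() {
	d := &diskConfig{}
	m := &manager{}
	m.register("informer", d)
	err := m.loadJSON([]byte(`{"informer":{"disk":{"metric_ttl":"10s"}}}`))
	fmt.Println(d.MetricTTL, d.MetricType, err)
}
```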

The new service.json would then look as follows:

```json
{
  "cluster": {
    "id": "QmTVW8NoRxC5wBhV7WtAYtRn7itipEESfozWN5KmXUQnk2",
    "private_key": "<...>",
    "secret": "00224102ae6aaf94f2606abf69a0e278251ecc1d64815b617ff19d6d2841f786",
    "peers": [],
    "bootstrap": [],
    "leave_on_shutdown": false,
    "listen_multiaddress": "/ip4/0.0.0.0/tcp/9096",
    "state_sync_interval": "1m0s",
    "ipfs_sync_interval": "2m10s",
    "replication_factor": -1,
    "monitor_ping_interval": "15s"
  },
  "consensus": {
    "raft": {
      "heartbeat_timeout": "1s",
      "election_timeout": "1s",
      "commit_timeout": "50ms",
      "max_append_entries": 64,
      "trailing_logs": 10240,
      "snapshot_interval": "2m0s",
      "snapshot_threshold": 8192,
      "leader_lease_timeout": "500ms"
    }
  },
  "api": {
    "restapi": {
      "listen_multiaddress": "/ip4/127.0.0.1/tcp/9094",
      "read_timeout": "30s",
      "read_header_timeout": "5s",
      "write_timeout": "1m0s",
      "idle_timeout": "2m0s"
    }
  },
  "ipfs_connector": {
    "ipfshttp": {
      "proxy_listen_multiaddress": "/ip4/127.0.0.1/tcp/9095",
      "node_multiaddress": "/ip4/127.0.0.1/tcp/5001",
      "connect_swarms_delay": "7s",
      "proxy_read_timeout": "10m0s",
      "proxy_read_header_timeout": "5s",
      "proxy_write_timeout": "10m0s",
      "proxy_idle_timeout": "1m0s"
    }
  },
  "monitor": {
    "monbasic": {
      "check_interval": "15s"
    }
  },
  "informer": {
    "disk": {
      "metric_ttl": "30s",
      "metric_type": "freespace"
    },
    "numpin": {
      "metric_ttl": "10s"
    }
  }
}
```

This new format aims to be easily extensible per component. As such,
it already surfaces quite a few new options which were hardcoded
before.

Additionally, since the Go APIs have changed, some redundant methods have been
removed and small refactorings have been made to take advantage of the new
approach.

License: MIT
Signed-off-by: Hector Sanjuan <hector@protocol.ai>
2017-10-18 00:00:12 +02:00
dgrisham
b1356cd33a removing name arg from NewInformer 2017-09-01 15:02:15 -06:00
dgrisham
456603f7a8 adding error output to NewInformerWithMetric 2017-09-01 09:48:15 -06:00
dgrisham
407fd9f68a Informer impl refactored; SortNumeric added for allocators. 2017-08-28 08:51:01 -06:00
dgrisham
2d2a9da793 FreeSpace metric impl (including descendalloc) refactored and tested. 2017-08-11 13:57:42 -06:00
dgrisham
a46ab3dda9 freespace metric partial impl 2017-08-03 11:24:19 -06:00
Hector Sanjuan
2bbbea79cc Issue #49: Add disk informer
The disk informer uses "ipfs repo stat" to fetch the RepoSize value and
uses it as a metric.

The numpinalloc allocator is now a generalized ascendalloc which
sorts metrics in ascending order and returns the ones with the lowest
values.
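
The ascending selection can be sketched as follows. The peerMetric type and
the ascend function are hypothetical and only illustrate "sort ascending,
take the lowest n"; they are not the ascendalloc code itself.

```go
package main

import (
	"fmt"
	"sort"
)

// peerMetric is a hypothetical (peer, numeric metric value) pair.
type peerMetric struct {
	Peer  string
	Value uint64
}

// ascend returns the n peers with the lowest metric values, lowest first.
// The input slice is not modified.
func ascend(metrics []peerMetric, n int) []string {
	sorted := append([]peerMetric(nil), metrics...)
	sort.SliceStable(sorted, func(i, j int) bool {
		return sorted[i].Value < sorted[j].Value
	})
	if n > len(sorted) {
		n = len(sorted)
	}
	if n < 0 {
		n = 0
	}
	out := make([]string, 0, n)
	for _, m := range sorted[:n] {
		out = append(out, m.Peer)
	}
	return out
}

func main() {
	metrics := []peerMetric{{"p1", 900}, {"p2", 100}, {"p3", 500}}
	fmt.Println(ascend(metrics, 2))
}
```

A descending variant for metrics such as free space would only flip the
comparison.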

License: MIT
Signed-off-by: Hector Sanjuan <hector@protocol.ai>
2017-03-27 20:40:49 +02:00