Commit Graph

92 Commits

Author SHA1 Message Date
Hector Sanjuan
d4591b8442 Monitor: remove accrual detection. Add LatestForPeer method.
Fixes #939
2022-01-31 17:53:09 +01:00
Hector Sanjuan
4739ed9210 Changes pertaining to go-libp2p v0.16.0 2021-11-30 06:25:15 +01:00
Hector Sanjuan
b6a46cd8a4 allocator: rework the whole allocator system
The new "metrics" allocator is about to partition metrics and distribe
allocations among the partitions.

For example: given a region, an availability zone and free space on disk, the
allocator would be able to choose allocations by distributing among regions
and availability zones as much as possible, and for those peers in the same
region/az, selecting those with most free space first.

This requires a major overhaul of the allocator component.
2021-09-13 12:24:00 +02:00
Hector Sanjuan
ecf287c8e6 Fix #1409: Better support of metrics from older peers
Before: receiving a metric from a peer <= 0.13.3 causes decoding error on logs.
Now: metric is correctly parsed and a warning message is printed once.
2021-08-12 00:03:05 +02:00
Hector Sanjuan
adb15feb6e
Dependency upgrades (#1395)
* build(deps): bump github.com/multiformats/go-multiaddr-dns

Bumps [github.com/multiformats/go-multiaddr-dns](https://github.com/multiformats/go-multiaddr-dns) from 0.2.0 to 0.3.1.
- [Release notes](https://github.com/multiformats/go-multiaddr-dns/releases)
- [Commits](https://github.com/multiformats/go-multiaddr-dns/compare/v0.2.0...v0.3.1)

Signed-off-by: dependabot-preview[bot] <support@dependabot.com>

* build(deps): bump github.com/hashicorp/go-hclog from 0.15.0 to 0.16.0

Bumps [github.com/hashicorp/go-hclog](https://github.com/hashicorp/go-hclog) from 0.15.0 to 0.16.0.
- [Release notes](https://github.com/hashicorp/go-hclog/releases)
- [Commits](https://github.com/hashicorp/go-hclog/compare/v0.15.0...v0.16.0)

Signed-off-by: dependabot-preview[bot] <support@dependabot.com>

* build(deps): bump github.com/ipfs/go-unixfs from 0.2.4 to 0.2.5

Bumps [github.com/ipfs/go-unixfs](https://github.com/ipfs/go-unixfs) from 0.2.4 to 0.2.5.
- [Release notes](https://github.com/ipfs/go-unixfs/releases)
- [Commits](https://github.com/ipfs/go-unixfs/compare/v0.2.4...v0.2.5)

Signed-off-by: dependabot-preview[bot] <support@dependabot.com>

* build(deps): bump github.com/libp2p/go-libp2p-peerstore

Bumps [github.com/libp2p/go-libp2p-peerstore](https://github.com/libp2p/go-libp2p-peerstore) from 0.2.6 to 0.2.7.
- [Release notes](https://github.com/libp2p/go-libp2p-peerstore/releases)
- [Commits](https://github.com/libp2p/go-libp2p-peerstore/compare/v0.2.6...v0.2.7)

Signed-off-by: dependabot[bot] <support@github.com>

* build(deps): bump go.uber.org/multierr from 1.6.0 to 1.7.0

Bumps [go.uber.org/multierr](https://github.com/uber-go/multierr) from 1.6.0 to 1.7.0.
- [Release notes](https://github.com/uber-go/multierr/releases)
- [Changelog](https://github.com/uber-go/multierr/blob/master/CHANGELOG.md)
- [Commits](https://github.com/uber-go/multierr/compare/v1.6.0...v1.7.0)

Signed-off-by: dependabot[bot] <support@github.com>

* Chore: update deps

* Update changelog

* Update to go1.16. Downgrade unixfs.

* go mod tidy

* travis: use go install

* golint no more

* Update configuration for dependabot

* Fix wrong dependabot config

* dependabot

* Revert update of go-unixfs

* Dependency upgrades

* Bump github.com/libp2p/go-libp2p-gorpc from 0.1.2 to 0.1.3

Bumps [github.com/libp2p/go-libp2p-gorpc](https://github.com/libp2p/go-libp2p-gorpc) from 0.1.2 to 0.1.3.
- [Release notes](https://github.com/libp2p/go-libp2p-gorpc/releases)
- [Commits](https://github.com/libp2p/go-libp2p-gorpc/compare/v0.1.2...v0.1.3)

---
updated-dependencies:
- dependency-name: github.com/libp2p/go-libp2p-gorpc
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>

* Fix deprecated objects with prometheus

* chore: update dependencies

* monitor: remove dependency to go-multicodec

go-multicodec has been deprecated and it was just a wrapper.

This switches directly to ugorji/go/codec's msgpack for cluster metrics
serialization.

* Upgrade mfs so it works with latest go-unixfs

* Bump github.com/ugorji/go/codec from 1.2.5 to 1.2.6 (#1391)

Bumps [github.com/ugorji/go/codec](https://github.com/ugorji/go) from 1.2.5 to 1.2.6.
- [Release notes](https://github.com/ugorji/go/releases)
- [Commits](https://github.com/ugorji/go/compare/v1.2.5...v1.2.6)

---
updated-dependencies:
- dependency-name: github.com/ugorji/go/codec
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>

Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* Bump github.com/hashicorp/go-hclog from 0.16.0 to 0.16.1 (#1392)

Bumps [github.com/hashicorp/go-hclog](https://github.com/hashicorp/go-hclog) from 0.16.0 to 0.16.1.
- [Release notes](https://github.com/hashicorp/go-hclog/releases)
- [Commits](https://github.com/hashicorp/go-hclog/compare/v0.16.0...v0.16.1)

---
updated-dependencies:
- dependency-name: github.com/hashicorp/go-hclog
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>

Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

Co-authored-by: dependabot-preview[bot] <27856297+dependabot-preview[bot]@users.noreply.github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2021-07-06 16:47:04 +02:00
Hector Sanjuan
c02130245a
Revert "monitor: remove dependency to go-multicodec (#1313)" (#1330)
This reverts commit fab2ed8149.
2021-03-23 23:59:36 +01:00
Hector Sanjuan
fab2ed8149
monitor: remove dependency to go-multicodec (#1313)
go-multicodec has been deprecated and it was just a wrapper.

This switches directly to ugorji/go/codec's msgpack for cluster metrics
serialization.
2021-02-19 09:59:17 +01:00
Hector Sanjuan
12756e0220 monitor: fix panic in checker
Alert might be launched when no metrics for peer are received at all.
2021-01-14 00:04:40 +01:00
Hector Sanjuan
90208b45f9 health/alerts endpoint: brush up old PR 2021-01-13 22:09:21 +01:00
Hector Sanjuan
4bcb91ee2b Merge branch 'master' into feat/alerts 2021-01-13 21:08:49 +01:00
Hector Sanjuan
0dfa9ca185
(chore) Upgrade dependencies
Upgrade dependencies and bump to go1.15.
2020-08-27 14:10:58 +02:00
Kishan Mohanbhai Sagathiya
ae8e74453b Fix #937: Print full working configuration at startup
Only when using debug mode

Co-authored-by: Hector Sanjuan <code@hector.link>
2020-05-15 01:33:04 +02:00
Hector Sanjuan
b513ec194d Fix some mispellings 2020-04-14 23:47:09 +02:00
Hector Sanjuan
f83ff9b655 staticcheck: fix all staticcheck warnings in the project 2020-04-14 20:16:10 +02:00
Hector Sanjuan
b3853caf36 Dependency ugprade: changes needed
* Libp2p protectors no longer needed, use PSK directly
* Generate cluster 32-byte secret here (helper gone from pnet)
* Switch to go-log/v2 in all places
* DHT bootstrapping not needed. Adjust DHT options for tests.
* Do not rely on dissappeared CidToDsKey and DsKeyToCid functions fro dshelp.
* Disable QUIC (does not support private networks)
* Fix tests: autodiscovery started working properly
2020-03-22 14:50:25 +01:00
Kishan Mohanbhai Sagathiya
618ebd23f4 Check expiry in alert 2019-12-13 12:25:28 +05:30
Kishan Sagathiya
31534a429b Fix #374: health metrics improvements
- Human-sizes for freespace metrics. Display whether if metric is
expires in something like "expires in 3m".
- When not passing metric name `ipfs-cluster-ctl health metrics` hits
the the metrics endpoint which returns a list of available metrics and
displays to user
- Humanize metrics output
- Sort metrics output
2019-10-24 16:37:26 +02:00
Kishan Mohanbhai Sagathiya
76857112b2 Test that expired PeerMetrics gets deleted 2019-09-13 08:01:15 +07:00
Hector Sanjuan
e240c2a19f Simplify failed peer detection 2019-06-27 16:55:51 +01:00
Adrian Lanzafame
2255ba737b
fix ttl expiration check
License: MIT
Signed-off-by: Adrian Lanzafame <adrianlanzafame92@gmail.com>
2019-06-25 12:54:41 +02:00
Hector Sanjuan
563a0da9ae Do alert for all metric types 2019-06-23 10:14:29 +01:00
Adrian Lanzafame
27295c10ac fix check failed
License: MIT
Signed-off-by: Adrian Lanzafame <adrianlanzafame92@gmail.com>
2019-06-23 10:14:29 +01:00
Adrian Lanzafame
5e09da9d63 address pr feedback
License: MIT
Signed-off-by: Adrian Lanzafame <adrianlanzafame92@gmail.com>
2019-06-23 10:14:29 +01:00
Adrian Lanzafame
e1b40d49c1 fix how accrual fd treats ttls
License: MIT
Signed-off-by: Adrian Lanzafame <adrianlanzafame92@gmail.com>
2019-06-23 10:14:29 +01:00
Hector Sanjuan
b804e61ef0 Update deps along with go-libp2p-core refactor
Lots of rewrites in imports...
2019-06-14 13:10:45 +02:00
Hector Sanjuan
27368ab077 Fix: alert at most once PER METRIC
Before it would alert at most once per peer, which prevented some metrics
from alerting at all.
2019-06-11 11:44:12 +02:00
Hector Sanjuan
a0d93fc62c Change MaxAlertThreshold to 1 2019-06-11 10:54:12 +02:00
Adrian Lanzafame
14841e4e24 address pr feedback
License: MIT
Signed-off-by: Adrian Lanzafame <adrianlanzafame92@gmail.com>
2019-06-11 10:54:12 +02:00
Adrian Lanzafame
7459917275 alerting for peers stops after one alert
License: MIT
Signed-off-by: Adrian Lanzafame <adrianlanzafame92@gmail.com>
2019-06-11 10:54:12 +02:00
Hector Sanjuan
6caf78a57b monitor config: make threshold optional in the configuration
takes default when not set.
2019-05-16 12:52:40 +02:00
Adrian Lanzafame
a763560e0c
extend the initial size of metrics distribution to 5
This prevents accrual failure detection from kicking in too
soon after a cluster has started.

License: MIT
Signed-off-by: Adrian Lanzafame <adrianlanzafame92@gmail.com>
2019-05-07 19:07:11 +10:00
Adrian Lanzafame
9464759ae6
remove hard timeout limits and use only accrual failure detection
License: MIT
Signed-off-by: Adrian Lanzafame <adrianlanzafame92@gmail.com>
2019-04-30 12:06:01 +10:00
Adrian Lanzafame
42693eb06d
fix passing ctx from daemon to pubsub
License: MIT
Signed-off-by: Adrian Lanzafame <adrianlanzafame92@gmail.com>
2019-04-29 17:58:28 +10:00
Adrian Lanzafame
32ca9167d6
use accrual instead of metric expiration
License: MIT
Signed-off-by: Adrian Lanzafame <adrianlanzafame92@gmail.com>
2019-04-26 17:58:29 +10:00
Adrian Lanzafame
3c09ebcc71
add Alerts measure
License: MIT
Signed-off-by: Adrian Lanzafame <adrianlanzafame92@gmail.com>
2019-04-26 17:56:44 +10:00
Adrian Lanzafame
b0dbcbaa8d
add reference to original prob.go
License: MIT
Signed-off-by: Adrian Lanzafame <adrianlanzafame92@gmail.com>
2019-04-26 12:20:31 +10:00
Adrian Lanzafame
d5ecd9ef6a
WIP
License: MIT
Signed-off-by: Adrian Lanzafame <adrianlanzafame92@gmail.com>
2019-04-23 20:30:26 +10:00
Adrian Lanzafame
bf1b5eff90
comment config value
License: MIT
Signed-off-by: Adrian Lanzafame <adrianlanzafame92@gmail.com>
2019-04-18 16:18:19 +10:00
Adrian Lanzafame
eae4329cb3
address pr feedback
License: MIT
Signed-off-by: Adrian Lanzafame <adrianlanzafame92@gmail.com>
2019-04-18 16:18:19 +10:00
Adrian Lanzafame
31af640e33
use allocations list to choose peer to repin
License: MIT
Signed-off-by: Adrian Lanzafame <adrianlanzafame92@gmail.com>
2019-04-18 16:16:40 +10:00
Adrian Lanzafame
1349e99c1e
fix time taken by tests
License: MIT
Signed-off-by: Adrian Lanzafame <adrianlanzafame92@gmail.com>
2019-04-18 16:16:39 +10:00
Adrian Lanzafame
4338ea6905
refactor prob to use gonum and pass []float64
License: MIT
Signed-off-by: Adrian Lanzafame <adrianlanzafame92@gmail.com>
2019-04-18 16:16:39 +10:00
Adrian Lanzafame
bcbe7b453f
refactor from big.Float to float64 and add prob tests
License: MIT
Signed-off-by: Adrian Lanzafame <adrianlanzafame92@gmail.com>
2019-04-18 16:14:13 +10:00
Adrian Lanzafame
e187b800cf
rename TS to ReceivedAt
License: MIT
Signed-off-by: Adrian Lanzafame <adrianlanzafame92@gmail.com>
2019-04-18 16:14:13 +10:00
Adrian Lanzafame
c4b76619c1
Add failure_threshold monitors config
License: MIT
Signed-off-by: Adrian Lanzafame <adrianlanzafame92@gmail.com>
2019-04-18 16:14:13 +10:00
Adrian Lanzafame
3d6eb64db6
Add accrual failure detection method
License: MIT
Signed-off-by: Adrian Lanzafame <adrianlanzafame92@gmail.com>
2019-04-18 16:14:13 +10:00
Adrian Lanzafame
13ed78786c
fix distribution test and general clean up
License: MIT
Signed-off-by: Adrian Lanzafame <adrianlanzafame92@gmail.com>
2019-04-18 16:09:19 +10:00
Hector Sanjuan
4e61935379
Use defer for locks. Move to Prev() in All()
License: MIT
Signed-off-by: Hector Sanjuan <code@hector.link>
2019-04-18 16:09:19 +10:00
Hector Sanjuan
da3c543ce2
Revert "attempt copying slice"
This reverts commit 0d4d40513fccd31b9cdc4db369aa87e87c529be4.
2019-04-18 16:09:19 +10:00
Adrian Lanzafame
46d6cb155d
attempt copying slice
License: MIT
Signed-off-by: Adrian Lanzafame <adrianlanzafame92@gmail.com>
2019-04-18 16:09:19 +10:00