Commit Graph

302 Commits

Author SHA1 Message Date
Hector Sanjuan
e9857652f2 Add a timestamp to Pins
This adds a Timestamp field to the pin objects. This allows to track when they were pinned.

This:

* Allows the pin-tracker to actually show accurate information on when the pin
  entered the system for pins that are not part of ongoing operations
  (currently it shows time.Now())
* Adds support for reporting timestamp on a pinning services api.
2021-10-20 16:55:57 +02:00
Hector Sanjuan
6b31f44351 Address most comments from PR review 2021-10-05 14:04:28 +02:00
Hector Sanjuan
ea5e18078c Informers: GetMetric() -> GetMetrics()
Support returning multiple metrics per informer.
2021-09-15 20:07:37 +02:00
Hector Sanjuan
67497c4eb4 Fix #1436: Do not block peer startup waiting for RecoverAll
On large pinsets this may take a very long time and prevents metrics and
re-boostrapping from starting, among other things. See bug description.

This lets watchPinset trigger an immediate RecoverAllLocal instead, but this
happens in its own goroutine and should allow everything else to start.
2021-08-06 11:30:29 +02:00
Ian Davis
cb4023855c
Ensure read of alerts slice is performed while holding lock
Fixes data race:

==================
WARNING: DATA RACE
Read at 0x00c029ae7f10 by goroutine 4785:
  github.com/ipfs/ipfs-cluster.(*Cluster).Alerts()
      /home/iand/wip/iand/ipfs-cluster/cluster.go:395 +0x64
  github.com/ipfs/ipfs-cluster.TestClusterAlerts()
      /home/iand/wip/iand/ipfs-cluster/ipfscluster_test.go:2159 +0x238
  testing.tRunner()
      /opt/go/src/testing/testing.go:1194 +0x202

Previous write at 0x00c029ae7f10 by goroutine 5062:
  github.com/ipfs/ipfs-cluster.(*Cluster).alertsHandler()
      /home/iand/wip/iand/ipfs-cluster/cluster.go:429 +0x48c
  github.com/ipfs/ipfs-cluster.(*Cluster).run.func5()
      /home/iand/wip/iand/ipfs-cluster/cluster.go:596 +0x76

Goroutine 4785 (running) created at:
  testing.(*T).Run()
      /opt/go/src/testing/testing.go:1239 +0x5d7
  testing.runTests.func1()
      /opt/go/src/testing/testing.go:1512 +0xa6
  testing.tRunner()
      /opt/go/src/testing/testing.go:1194 +0x202
  testing.runTests()
      /opt/go/src/testing/testing.go:1510 +0x612
  testing.(*M).Run()
      /opt/go/src/testing/testing.go:1418 +0x3b3
  github.com/ipfs/ipfs-cluster.TestMain()
      /home/iand/wip/iand/ipfs-cluster/ipfscluster_test.go:134 +0x7dc
  main.main()
      _testmain.go:179 +0x271

Goroutine 5062 (running) created at:
  github.com/ipfs/ipfs-cluster.(*Cluster).run()
      /home/iand/wip/iand/ipfs-cluster/cluster.go:594 +0x1f6
  github.com/ipfs/ipfs-cluster.NewCluster.func1()
      /home/iand/wip/iand/ipfs-cluster/cluster.go:208 +0xa4
==================
--- FAIL: TestClusterAlerts (8.69s)
    testing.go:1093: race detected during execution of test
2021-07-28 13:08:56 +01:00
Hector Sanjuan
edfcfa3fb0 Fix #1360: Efficient pinset status with filters
This commit modifies the pintracker StatusAll call to take a status filter.

This allows to skip a PinLs call to ipfs when checking status for items that
are queued, pinning, unpinning or in error. Those status come directly from
the operation tracker. This should result in a significant performance
increase for those calls, particularly in nodes with several hundred thousand
pins and more, where the call to IPFS is very expensive.

A new TrackerStatusUnexpectedlyUnpinned status has been introduce to
differentiate between pin errors (tracked by the operation tracker) and "lost"
items (which before were pin errors too). This new status is handled by the
Recover() operation as before.
2021-07-06 11:34:19 +02:00
Hector Sanjuan
5419d0ff7c Issue #1350: Ensure CID status and Peers do not take too long
StatusCID() and Peers() are calls that should return relatively quickly.

If they don't, it means they are hanging for some reason. We cannot let the
whole Multicall request hang waiting on a single peer, therefore, set a
hardcoded 15 second deadline for both.
2021-05-03 17:39:33 +02:00
Hector Sanjuan
d1700dbe81 Fixes #1319: Status wrongly shows pins as REMOTE
The Allocations of a pin that has been added with default replication factor
are kept even when the replication factor turns out to be -1.

This resulted in the Status(cid) code skipping calls to a number of peers
and setting the pin directly as REMOTE.

The fix, on one side makes sure Allocations is always nil when the replication
factor is -1. On the other size, lets the globalPinInfoCid method check the
replication factor value, rather than the number of allocations to decide if
any nodes are bound to be remote.

On the plus side, the pin tracker used the IsRemotePin method, which uses the
replication factor, so things were pinned even if the Status(cid) method shows
them as remote.
2021-03-24 00:47:15 +01:00
Sergei Udris
43fa2994ac
feat: make MDNS failure on start non-fatal (#1310)
* feat: make MDNS failure on start non-fatal

- if discovery.NewMdnsService errors on start, show warning "error message, MDNS service will be disabled"
- same as setting MDNSInterval to 0: NewCluster is still created and daemon runs, but without MDNS
2021-02-19 09:47:46 +01:00
Hector Sanjuan
7ea11da75f Fix linter problem 2021-01-14 00:18:16 +01:00
Hector Sanjuan
90208b45f9 health/alerts endpoint: brush up old PR 2021-01-13 22:09:21 +01:00
Hector Sanjuan
4bcb91ee2b Merge branch 'master' into feat/alerts 2021-01-13 21:08:49 +01:00
Hector Sanjuan
e967238848
Merge pull request #1129 from ipfs/fix/1013-follow-list
Improvements to ipfs-cluster-follow * list
2020-05-16 02:31:06 +02:00
Hector Sanjuan
c026299b95 Include Name as GlobalPinInfo key and consolidate redundant keys
GlobalPinInfo objects carried redundant information (Cid, Peer) that takes
space and time to serialize.

This has been addressed by having GlobalPinInfo embed PinInfoShort rather than
PinInfo. This new types ommits redundant fields.
2020-05-16 02:27:24 +02:00
Hector Sanjuan
b0dcfe68c7
Merge pull request #1127 from ipfs/fix/1064-repin-followers
Fix #1064: Make the peer closest to the CID in charge of repinning
2020-05-15 20:21:26 +02:00
Hector Sanjuan
2e49d522ec Fix latest staticcheck errors and let travis test it 2020-05-14 23:54:11 +02:00
Hector Sanjuan
9bfbde8e76 Fix #1064: Make the peer closest to the CID in charge of repinning
We cannot rely on current pin allocations anymore since the allocations
may be follower peers that do not even handle alerts.

Instead, we assume that we are a trusted peer (because we are not in follower
mode), get all other trusted peers and act if we are the XOR-closest to the
CID.

This also means that replication-factor = 1 pins can be recovered too.
2020-05-14 23:47:56 +02:00
Hector Sanjuan
1499107835 Cluster: update docstring for setupPin() 2020-04-21 22:59:35 +02:00
Hector Sanjuan
99eb29a7d6 cluster: do not allow to repin recursive pins as something else.
i.e. a direct pin can be repinned as recursive, but a recursive
pin cannot be pinned as direct (this fails in IPFS too).

Additionally, save a couple of calls to the datastore by obtaining the
existing pin only once.
2020-04-21 17:23:55 +02:00
Hector Sanjuan
a6d8e00d20
Merge pull request #1065 from gargdeepak/fix/cluster/pinupdate
Fixes #996 pin expiry is updated if set in options
2020-04-16 10:55:18 +02:00
Hector Sanjuan
7ffd18e41b Feat: upgrade to dual DHT 2020-04-14 22:03:24 +02:00
Hector Sanjuan
f83ff9b655 staticcheck: fix all staticcheck warnings in the project 2020-04-14 20:16:10 +02:00
Hector Sanjuan
eba3246b98 chore: update deps: dht to cypress
Add necessary validators.

There is currently no way to run a dynamic-mode DHT that supports
LAN-based peers. Therefore for the time-being we are setting the
dht.Mode to "server".
2020-04-07 16:22:46 +02:00
Hector Sanjuan
65ad4bd632 cluster/daemons: Close the datastore AFTER the DHT.
Avoids panics.

This also removes the abnormality of cluster closing a datastore
that it did not create.
2020-04-02 16:29:41 +02:00
deepakgarg
788ecff327 Fixes #996 pin expiry is updated if set in options 2020-03-26 20:07:23 -07:00
Hector Sanjuan
b3853caf36 Dependency ugprade: changes needed
* Libp2p protectors no longer needed, use PSK directly
* Generate cluster 32-byte secret here (helper gone from pnet)
* Switch to go-log/v2 in all places
* DHT bootstrapping not needed. Adjust DHT options for tests.
* Do not rely on dissappeared CidToDsKey and DsKeyToCid functions fro dshelp.
* Disable QUIC (does not support private networks)
* Fix tests: autodiscovery started working properly
2020-03-22 14:50:25 +01:00
Yang Hau
7986d94242
fix: Fix typos (#1001)
Fix typos in files
2020-02-03 10:30:04 +01:00
Kishan Mohanbhai Sagathiya
68abae9287 Merge branch 'master' into feat/alerts 2019-12-23 12:45:22 +05:30
Kishan Mohanbhai Sagathiya
a3b8767e87 Added tests for Alerts
- tests for related cluster method, rest api, client method etc
- clean expired alerts everytime a new alerts come in
2019-12-23 12:42:38 +05:30
Hector Sanjuan
ad1e739bfb Cluster: follower peers should ignore alerts/re-allocations. 2019-12-16 13:42:35 +01:00
Hector Sanjuan
9f660ba38e Fix: stateless: cluster should pin items that are in the state but not on ipfs
StateSync() used to take care of this by issuing Track() calls. But this
functionality was removed.

This starts returning items that are in the state but not on IPFS as
PIN_ERRORs. It ensures that the Recover methods see them so that they can
trigger repinnings for missing items. This covers cases where the user
modifies the ipfs state manually, or resets the ipfs daemon but keeps the
cluster state, and cases where cluster was stopped half-way through a pinning.
2019-12-16 13:42:35 +01:00
Hector Sanjuan
6ad8974bde cluster: Shutdown informer components 2019-12-13 09:48:12 +01:00
Hector Sanjuan
31f4afadea cluster: Improve Shutdown() docstring 2019-12-13 09:47:42 +01:00
Kishan Mohanbhai Sagathiya
618ebd23f4 Check expiry in alert 2019-12-13 12:25:28 +05:30
Kishan Mohanbhai Sagathiya
d21860eee7 Merge branch 'master' into feat/alerts 2019-12-13 10:22:06 +05:30
Kishan Sagathiya
5258a4d428 Remove map pintracker (#944)
This removes mappintracker and sets stateless tracker as the default (and only) pintracker component.

Because the stateless tracker matches the cluster state with only ongoing operations being kept on memory, and additional information provided by ipfs-pin-ls, syncing operations are not necessary. Therefore the Sync/SyncAll operations are removed cluster-wide.
2019-12-12 21:22:54 +01:00
Kishan Mohanbhai Sagathiya
04069b8c81 Introduction of ctl alerts
- basic version, just alerts if peers are down
2019-12-13 00:21:28 +05:30
Kishan Sagathiya
30fd5ee6a0 Feat: Allow and run multiple informer components (#962)
This introduces the possiblity of running Cluster with multiple informer components. The first one in the list is the used for "allocations". This is just groundwork for working with several informers in the future.
2019-12-05 15:08:43 +01:00
Hector Sanjuan
d2bf1bc7b6 config: Add PeerAddresses
This adds a PeerAddresses entry to the main cluster configuration.

The peer will ingest and potentially connect to those peer addresses during
the start (similarly to the ones in the peerstore).

This allows to provide "bootstrap" (as in "peers we connect to") addresses
directly in the configuration, which is useful when distributing a single
configuration template that will allow a cluster peer to know where to connect
on the first boot.
2019-12-02 15:08:37 +01:00
Kishan Mohanbhai Sagathiya
95016492c3 Merge branch 'master' into fix/broadcast-ops 2019-11-08 20:50:19 +05:30
Kishan Mohanbhai Sagathiya
d42c0fd651 move comment with variables 2019-11-08 20:49:06 +05:30
Kishan Mohanbhai Sagathiya
0e7ed97e59 Make it clearer 2019-11-08 19:45:01 +05:30
Kishan Mohanbhai Sagathiya
108fcff8a9 temp 2019-11-08 12:43:01 +05:30
Hector Sanjuan
249d9007d2 Merge branch 'master' into feat/cluster-gc 2019-11-07 18:35:42 +01:00
Kishan Mohanbhai Sagathiya
a2ce37d52d If pin is not part of the pinset mark it as unpinned 2019-11-06 10:14:59 +05:30
Kishan Mohanbhai Sagathiya
7289b55e99 Merge branch 'master' into fix/broadcast-ops 2019-11-05 21:02:48 +05:30
Kishan Mohanbhai Sagathiya
3a7addc946 globalPinInfo should contain entries for all peers except for
followermode
2019-11-05 21:01:29 +05:30
Hector Sanjuan
669e75aefc libp2p host: add secio as alternative, do not rewrap host
Only use QUIC for tests, as TCP+TLS has proven very unreliable.
2019-11-05 12:50:46 +01:00
Hector Sanjuan
af29bbc944 Pin-expiration: error when pinning expired pins 2019-11-05 12:22:52 +01:00
Kishan Mohanbhai Sagathiya
e4e1cbea6e Fix #481: Pin expiration
This adds a new PinOption: ExpireAt.

The StateSync ticker will check and unpin expired pins from the Cluster.

ipfs-cluster-ctl supports an "expire-in" which gives a duration.
2019-11-05 10:40:48 +01:00