Commit Graph

184 Commits

Author SHA1 Message Date
Hector Sanjuan
eee53bfa4f Streaming Peers(): make Peers() a streaming call
This commit makes all the changes to make Peers() a streaming call.

While Peers is usually a non problematic call, for consistency, all calls
returning collections assembled through broadcast to cluster peers are now
streaming calls.
2022-03-23 01:27:57 +01:00
Hector Sanjuan
0d73d33ef5 Pintracker: streaming methods
This commit continues the work of taking advantage of the streaming
capabilities in go-libp2p-gorpc by improving the ipfsconnector and pintracker
components.

StatusAll and RecoverAll methods are now streaming methods, with the REST API
output changing accordingly to produce a stream of GlobalPinInfos rather than
a json array.

pin/ls request to the ipfs daemon now use ?stream=true and avoid having to
load the full pinset map on memory. StatusAllLocal and RecoverAllLocal
requests to the pin tracker stream all the way and no longer store the full
pinset, and the full PinInfo status slice before sending it out.

We have additionally switched to a pattern where streaming methods receive the
channel as an argument, allowing the caller to decide on whether to launch a
goroutine, do buffering etc.
2022-03-22 15:38:01 +01:00
Hector Sanjuan
9b9d76f92d Pinset streaming and method type revamp
This commit introduces the new go-libp2p-gorpc streaming capabilities for
Cluster. The main aim is to work towards heavily reducing memory usage when
working with very large pinsets.

As a side-effect, it takes the chance to revampt all types for all public
methods so that pointers to static what should be static objects are not used
anymore. This should heavily reduce heap allocations and GC activity.

The main change is that state.List now returns a channel from which to read
the pins, rather than pins being all loaded into a huge slice.

Things reading pins have been all updated to iterate on the channel rather
than on the slice. The full pinset is no longer fully loaded onto memory for
things that run regularly like StateSync().

Additionally, the /allocations endpoint of the rest API no longer returns an
array of pins, but rather streams json-encoded pin objects directly. This
change has extended to the restapi client (which puts pins into a channel as
they arrive) and to ipfs-cluster-ctl.

There are still pending improvements like StatusAll() calls which should also
stream responses, and specially BlockPut calls which should stream blocks
directly into IPFS on a single call.

These are coming up in future commits.
2022-03-19 03:02:55 +01:00
Hector Sanjuan
d0e905babe Fix critical log level
The CRITICAL log level no longer exists and we were logging more than we
wanted to.
2022-02-15 19:38:33 +01:00
Hector Sanjuan
5e89c0ba41 Pintracker: set Name in operation tracker. Fixes #1212. 2022-01-31 21:04:11 +01:00
Hector Sanjuan
809b7fbda5 Pintracker: add IPFS ID to Pin Information
Fixes #1554
Fixes: peer names unset for remote peers

This adds an IPFS field to pin status information (PinInfoShort).

It has not been easy to add this, given that the IPFS ID is something that
comes from outside of cluster (unlike the peer name). After several tries I
have settled in the following things:

- Use the ping metric to send out peer names and IPFS IDs to the peers in the
  cluster.
- Cache the latest known IPFS ID (if IPFS dies we should still be setting
  the ID).
- Provide an RPC method for the Pintracker to obtain IPFS ID from the cache.
- Given we now know information for peernames and IPFS IDs from other peers,
  we can use that information even if the requests to them error or we are not
  contacting (i.e. peers allocated as remote are not queried for status). We can
  use the information from the last received ping metric.
- This means we should keep metrics around even if peers go away, at least for
  a while rather than deleting them as soon as we detect that they have expired.

Puting it all together we now have a system to gossip peer information around on top
of the ping metrics.
2022-01-31 17:53:09 +01:00
Hector Sanjuan
592ce450ce pintracker: RecoverAll should only return status for recovered items
We call RecoverAll regularly and I noticed it was way slower than it should be.

After all, it should just loop the pinset and enqueued items that are
unexpectedly unpinned or in pin error.

However, at some point we decided that RecoverAll would return information for
all pins, regardless of whether they were recovered or not. This ends up
resulting in a separate Status call for every pin that is already pinned, and
this call hits IPFS. This is pretty bad with big pinsets.

This commit fixes that, we return no state information for pins that are not
touched.
2022-01-11 16:22:03 +01:00
Hector Sanjuan
26e229df94 Rename allocator/metrics to allocator/balanced 2021-10-06 11:26:38 +02:00
Hector Sanjuan
b6a46cd8a4 allocator: rework the whole allocator system
The new "metrics" allocator is about to partition metrics and distribe
allocations among the partitions.

For example: given a region, an availability zone and free space on disk, the
allocator would be able to choose allocations by distributing among regions
and availability zones as much as possible, and for those peers in the same
region/az, selecting those with most free space first.

This requires a major overhaul of the allocator component.
2021-09-13 12:24:00 +02:00
Hector Sanjuan
ce2490c64f
Merge pull request #1389 from ipfs/fix/db-close-tests
Fix: tests: close datastore on cluster node shutdown.
2021-07-06 14:05:36 +02:00
Hector Sanjuan
8ce98ceae3 Fix: tests: close datastore on cluster node shutdown.
This resulted in too-many-files-open when running with leveldb.
2021-07-06 12:28:03 +02:00
Hector Sanjuan
edfcfa3fb0 Fix #1360: Efficient pinset status with filters
This commit modifies the pintracker StatusAll call to take a status filter.

This allows to skip a PinLs call to ipfs when checking status for items that
are queued, pinning, unpinning or in error. Those status come directly from
the operation tracker. This should result in a significant performance
increase for those calls, particularly in nodes with several hundred thousand
pins and more, where the call to IPFS is very expensive.

A new TrackerStatusUnexpectedlyUnpinned status has been introduce to
differentiate between pin errors (tracked by the operation tracker) and "lost"
items (which before were pin errors too). This new status is handled by the
Recover() operation as before.
2021-07-06 11:34:19 +02:00
Hector Sanjuan
099e23cbd1 Run tests also using leveldb backend 2021-06-11 18:43:54 +02:00
Hector Sanjuan
39a7f5fdad alerts: fix TestClusterAlerts 2021-01-13 22:23:51 +01:00
Hector Sanjuan
4bcb91ee2b Merge branch 'master' into feat/alerts 2021-01-13 21:08:49 +01:00
Hector Sanjuan
4125f12f52 chore: update dependencies 2020-09-02 12:18:16 +02:00
Hector Sanjuan
c026299b95 Include Name as GlobalPinInfo key and consolidate redundant keys
GlobalPinInfo objects carried redundant information (Cid, Peer) that takes
space and time to serialize.

This has been addressed by having GlobalPinInfo embed PinInfoShort rather than
PinInfo. This new types ommits redundant fields.
2020-05-16 02:27:24 +02:00
Hector Sanjuan
fe4a5b32f4 tests: improve TestClustersPinDirect mode
Check that a recursive pin cannot be repinned as direct
2020-04-23 18:28:16 +02:00
Hector Sanjuan
84eddf4ac8 Direct pins: fix add tests for direct pinning 2020-04-23 18:05:05 +02:00
Hector Sanjuan
74110b9b96 Tests: improve explicitness of expiry time update check 2020-04-16 11:02:33 +02:00
Hector Sanjuan
a6d8e00d20
Merge pull request #1065 from gargdeepak/fix/cluster/pinupdate
Fixes #996 pin expiry is updated if set in options
2020-04-16 10:55:18 +02:00
Hector Sanjuan
7ffd18e41b Feat: upgrade to dual DHT 2020-04-14 22:03:24 +02:00
deepakgarg
bf9a089ae3 Fixes #996 Rounding expiry time to 2s in TestClusterPingUpdate test 2020-04-14 12:45:34 -07:00
Hector Sanjuan
f83ff9b655 staticcheck: fix all staticcheck warnings in the project 2020-04-14 20:16:10 +02:00
deepakgarg
4aae7080a8 Fixes #996. Updated the TestClustersPinUpdate test 2020-04-10 23:41:48 -07:00
Hector Sanjuan
eba3246b98 chore: update deps: dht to cypress
Add necessary validators.

There is currently no way to run a dynamic-mode DHT that supports
LAN-based peers. Therefore for the time-being we are setting the
dht.Mode to "server".
2020-04-07 16:22:46 +02:00
Hector Sanjuan
bebd1e168c cluster: re-add quic to defaults, but keep transport disabled
This still works.
2020-04-02 16:12:45 +02:00
Hector Sanjuan
b3853caf36 Dependency ugprade: changes needed
* Libp2p protectors no longer needed, use PSK directly
* Generate cluster 32-byte secret here (helper gone from pnet)
* Switch to go-log/v2 in all places
* DHT bootstrapping not needed. Adjust DHT options for tests.
* Do not rely on dissappeared CidToDsKey and DsKeyToCid functions fro dshelp.
* Disable QUIC (does not support private networks)
* Fix tests: autodiscovery started working properly
2020-03-22 14:50:25 +01:00
Hector Sanjuan
531379b1d9
Feature: Support multiple listeners in configuration
* add ipv6 listening addresses to the default config

* ipfsproxy: support multiple listeners. Add default ipv6.

* mm

* restapi: support multiple listen addresses. enable ipv6

* cluster_config: format default listen addresses

* commands: update for multiple listeners. Fix randomports for udp and ipv6.

* ipfs-cluster-service: fix randomports test

* multiple listeners: fix remaining tests

* golint

* Disable ipv6 in defaults

It is not supported by docker by default. It is not supported in travis-CI
build environments. User can enable it now manually.

* proxy: disable ipv6 in test

* ipfshttp: fix test

Co-authored-by: @RubenKelevra <cyrond@gmail.com>
2020-02-28 11:16:16 -05:00
Kishan Mohanbhai Sagathiya
a3b8767e87 Added tests for Alerts
- tests for related cluster method, rest api, client method etc
- clean expired alerts everytime a new alerts come in
2019-12-23 12:42:38 +05:30
Kishan Sagathiya
5258a4d428 Remove map pintracker (#944)
This removes mappintracker and sets stateless tracker as the default (and only) pintracker component.

Because the stateless tracker matches the cluster state with only ongoing operations being kept on memory, and additional information provided by ipfs-pin-ls, syncing operations are not necessary. Therefore the Sync/SyncAll operations are removed cluster-wide.
2019-12-12 21:22:54 +01:00
Hector Sanjuan
cf0afdcbc3 Tests: re-enable quic listen address on tests 2019-12-07 12:26:55 +01:00
Kishan Sagathiya
30fd5ee6a0 Feat: Allow and run multiple informer components (#962)
This introduces the possiblity of running Cluster with multiple informer components. The first one in the list is the used for "allocations". This is just groundwork for working with several informers in the future.
2019-12-05 15:08:43 +01:00
Hector Sanjuan
7b4d647267 Tests: multiple fixes, increase timings 2019-11-08 12:46:11 +01:00
Hector Sanjuan
9649664dbd Run tests by default with crdt and not with raft 2019-11-07 20:19:34 +01:00
Hector Sanjuan
249d9007d2 Merge branch 'master' into feat/cluster-gc 2019-11-07 18:35:42 +01:00
Kishan Mohanbhai Sagathiya
4d8ef92b3d health/graph: Improve graph
Mark local, trusted peers. Add peernames. Improve display.
2019-11-07 10:47:29 +01:00
Hector Sanjuan
e34bddd3ab Do not start Autonat when not used
Avoid chaining options as not necessary.

Add functions for common options and protector creation.
2019-11-05 12:51:18 +01:00
Hector Sanjuan
8d7ff58787 Use fixed version of go-libp2p-tls 2019-11-05 12:51:15 +01:00
Hector Sanjuan
669e75aefc libp2p host: add secio as alternative, do not rewrap host
Only use QUIC for tests, as TCP+TLS has proven very unreliable.
2019-11-05 12:50:46 +01:00
Kishan Mohanbhai Sagathiya
56ef75b50c Use TLS instead of secio for security 2019-11-05 12:50:46 +01:00
Kishan Mohanbhai Sagathiya
ce85bfc745 Added support for QUIC
- Cluster peers will now be able dial and listen using QUIC
- By default QUIC is enabled, to disable it remove QUIC listen address
from service.json
- This commit also adds a config option for whether to act as relay or
not, EnableRelayHop
2019-11-05 12:50:46 +01:00
Kishan Mohanbhai Sagathiya
e4e1cbea6e Fix #481: Pin expiration
This adds a new PinOption: ExpireAt.

The StateSync ticker will check and unpin expired pins from the Cluster.

ipfs-cluster-ctl supports an "expire-in" which gives a duration.
2019-11-05 10:40:48 +01:00
Kishan Sagathiya
295915272b Tests: multiple fixes to tests reliability (#943)
This makes a number of fixes to improve the reliability of tests.
2019-10-31 21:51:13 +01:00
Kishan Mohanbhai Sagathiya
492b5612e7 Add ability to run Garbage Collector on all peers
- cluster method, ipfs connector method, rpc and rest apis,
command, etc for repo gc
    - Remove extra space from policy generator
    - Added special timeout for `/repo/gc` call to IPFS
    - Added `RepoGCLocal` cluster rpc method, which will be used to run gc
    on local IPFS daemon
    - Added peer name to the repo gc struct
    - Sorted with peer ids, while formatting(only affects cli
    results)
    - Special timeout setting where timeout gets checked from last update
    - Added `local` argument, which would run gc only on contacted peer
2019-10-22 11:13:19 +05:30
Hector Sanjuan
05be173241 Fix test program for go1.13 2019-10-04 20:01:40 +02:00
Hector Sanjuan
a98292bfa6
Merge pull request #893 from ipfs/feat/recover-all
Pin recover on all peers
2019-09-23 13:13:48 -04:00
Hector Sanjuan
18dad223b2
Merge pull request #912 from ipfs/fix/allocations
Fix: handling allocations
2019-09-09 10:22:14 +02:00
Kishan Mohanbhai Sagathiya
9cb1cdeaff Merge branch 'master' into feat/recover-all 2019-09-08 17:06:43 +07:00
Hector Sanjuan
96752e4e58 Fix: handling allocations
* pin() should not allocate if allocations are already provided
* pin() should not skip pinning if the exact same pin exists
  * Additionally this was unreliable as it allocated it before
    so the pin may have existed but the allocations may have been
    artificially changed.
* pin() re-uses existing pin when pin options are the same and thus
  avoids changing the allocations of a pin.

As a side effect, this fixes re-allocations which were broken: peers
called `shouldPeerRepinCid()` and instead of repinning that single
cid proceeded to repin the full state. For every pin.

Additionally tests have been adapted. It may be that some re-alloc tests
were very unreliable for the problems above.
2019-09-06 17:56:00 +02:00