ipfs-cluster

Author	SHA1	Message	Date
Wyatt	47b744f1c0	ipfs-cluster-service state upgrade cli command ipfs-cluster-service now has a migration subcommand that upgrades persistent state snapshots with an out-of-date format version to the newest version of raft state. If all cluster members shut down with consistent state, upgrade ipfs-cluster, and run the state upgrade command, the new version of cluster will be compatible with persistent storage. ipfs-cluster now validates its persistent state upon loading it and exits with a clear error if the state format version is not up to date. Raft snapshotting is enforced on all shutdowns and the json backup is no longer run. This commit makes use of recent changes to libp2p-raft allowing raft states to implement their own marshaling strategies. Now mapstate handles the logic for its (de)serialization. In the interest of supporting various potential upgrade formats, the state serialization begins with a varint (right now one byte) describing the version. Some go tests are modified and a go test is added to cover new ipfs-cluster raft snapshot reading functions. Sharness tests are added to cover the state upgrade command.	2017-11-28 22:35:48 -05:00
Hector Sanjuan	2bc7aec079	Merge pull request #242 from elopio/test-snap-in-travis Move the snap deploy script to a separate file	2017-11-16 17:30:18 +01:00
Leo Arias	22c1d2401f	Move back the docker call to travis	2017-11-16 14:36:42 +00:00
Leo Arias	980c812d20	Pass the multiarch environment variables	2017-11-16 07:05:19 +00:00
Leo Arias	58a884373b	Move the snap deploy script to a separate file	2017-11-16 01:52:06 +00:00
Hector Sanjuan	c8dd79b235	Merge pull request #241 from elopio/snap-push-escape Escape the variable in the snapcraft loop	2017-11-16 00:30:55 +01:00
Leo Arias	ce574a6849	Escape the variable in the snapcraft loop	2017-11-15 23:25:22 +00:00
Hector Sanjuan	ff698141f2	sign release commits too License: MIT Signed-off-by: Hector Sanjuan <code@hector.link>	2017-11-15 23:54:31 +01:00
Hector Sanjuan	0c40d50497	gx publish 0.3.0 License: MIT Signed-off-by: Hector Sanjuan <code@hector.link>	2017-11-15 23:40:03 +01:00
Hector Sanjuan	16acaa67c0	Release 0.3.0 License: MIT Signed-off-by: Hector Sanjuan <code@hector.link>	2017-11-15 23:39:50 +01:00
Hector Sanjuan	3726ef8c9d	Include a tag annotation, sign tags License: MIT Signed-off-by: Hector Sanjuan <code@hector.link>	2017-11-15 23:39:22 +01:00
Leo Arias	81e1f3855e	Enable the snap builds for i386, armhf and arm64 (#237)	2017-11-15 23:23:01 +01:00
Hector Sanjuan	1afde29fa3	Merge branch '0.3.0/changelog'	2017-11-15 23:21:18 +01:00
Hector Sanjuan	1d67cb3c66	Merge pull request #238 from ipfs/fix/start-panic cluster: safeguard consensus not set when calling ID	2017-11-15 20:48:08 +01:00
Hector Sanjuan	081384fb7f	cluster: Make peersFromMultiaddrs remove any duplicates. Use it to find out the number of peers in the config and prevent peerAdd test failures. License: MIT Signed-off-by: Hector Sanjuan <code@hector.link>	2017-11-15 18:55:55 +01:00
Hector Sanjuan	47116ab0ce	Merge pull request #239 from ipfs/elopio-patch-1 Get the snap version from git	2017-11-15 18:45:35 +01:00
Leo Arias	bfdfe5e584	Get the snap version from git	2017-11-15 11:22:22 -06:00
Hector Sanjuan	1f93662b3e	cluster: get first peerset from configuration make sure we save a new config if the new peerset is different than the one in the configuration at boot. Hopefully this fixes a race condition in PeerAdd test License: MIT Signed-off-by: Hector Sanjuan <code@hector.link>	2017-11-15 18:01:49 +01:00
Hector Sanjuan	a656e45375	cluster: safeguard consensus not set when calling ID SwarmConnect on the ipfs connector calls rpc Peers() which requests IDs for every peer member. If that peer member is booting, it might get the request after RPC is setup but before consensus is initialized. In which case a panic happens. Probability that this happens is small, but still. Also increase the connect swarms delay to 30 seconds, which should be a bit longer than the default wait_for_leader timeout, otherwise we might connect swarms while there's not even a leader. License: MIT Signed-off-by: Hector Sanjuan <hector@protocol.ai>	2017-11-15 16:38:21 +01:00
Hector Sanjuan	7230704dd2	Add one more bugfix License: MIT Signed-off-by: Hector Sanjuan <hector@protocol.ai>	2017-11-15 16:29:10 +01:00
Hector Sanjuan	baa6bc65c0	Update to captain's log License: MIT Signed-off-by: Hector Sanjuan <hector@protocol.ai>	2017-11-15 14:55:34 +01:00
Hector Sanjuan	133117e3e8	Changelog for v0.3.0 License: MIT Signed-off-by: Hector Sanjuan <hector@protocol.ai>	2017-11-15 14:55:12 +01:00
Hector Sanjuan	a1f1ef15d8	Merge pull request #236 from ipfs/fix/minor-fixes Fix/minor fixes	2017-11-15 12:09:30 +01:00
Hector Sanjuan	76fe62fae4	Fix error message when not enough candidates for pinning exist License: MIT Signed-off-by: Hector Sanjuan <hector@protocol.ai>	2017-11-15 03:17:08 +01:00
Hector Sanjuan	145dced3e8	Cluster: Fix libp2p host getting shutdown in the middle of peer removal This is what was likely causing PeerRemove tests to fail randomly but very often. We cancelled the Cluster context before shutting down the Consensus component. This killed networking and aborted the peer remove operations when the leader is removing itself. As a result, it would error with "leadership lost", which would trigger a retry which would set the final error to "context cancelled" because the shutdown of the consensus component proceeds during the retry, cancelling the consensus context. This does not only affect tests; it might also affect operations when running cluster. License: MIT Signed-off-by: Hector Sanjuan <hector@protocol.ai>	2017-11-15 02:33:46 +01:00
Hector Sanjuan	41b6a114c1	Merge pull request #223 from ipfs/192-docs Documentation: bring in line to 0.3.0	2017-11-14 23:58:33 +01:00
Hector Sanjuan	cc81ffe96b	cluster: peerAdd: try to return an up-to-date new peer ID Sometimes tests fail because the returned api.ID for a new peer does not include the current cluster peers. This is because the new peerset has not yet been committed in the new peer at the time of the request. This commit retries the request until the correct peerset comes in, or gives up after two seconds of retrying. Beyond the test failures, note that the returned ID is very user-facing and should contain the current cluster peers after adding, and not the former peerset, at least while the peerAdd operation is allowed. License: MIT Signed-off-by: Hector Sanjuan <hector@protocol.ai>	2017-11-14 23:54:23 +01:00
Hector Sanjuan	417f30c9ea	Avoid shutting down consensus in the middle of a commit I think this will prevent some random test failures when we realize that we are no longer in the peerset and trigger a shutdown but Raft has not fully finished committing the operation, which then triggers an error, and a retry. But the contexts are cancelled in the retry so it won't find a leader and will finally error with that message. License: MIT Signed-off-by: Hector Sanjuan <hector@protocol.ai>	2017-11-14 23:29:56 +01:00
Hector Sanjuan	15bb953afd	Issue #192: Update docs to the new peerset handling Also, start removing mentions of `PeerAdd` operation, as things should happen with `--bootstrap` (Join). PeerAdd should not be part of the user workflow and might disappear or be hidden in the future. License: MIT Signed-off-by: Hector Sanjuan <hector@protocol.ai>	2017-11-14 22:06:59 +01:00
Hector Sanjuan	0525403f8c	cluster-service: make sure we wait for configuration to be saved on init. License: MIT Signed-off-by: Hector Sanjuan <hector@protocol.ai>	2017-11-14 21:06:10 +01:00
Hector Sanjuan	5e465c3f62	PeerAdd: send cluster multiaddresses to new peer before adding it to raft It seems more logical that the new peer should know how to contact everyone before it needs to start doing it, but it probably does not matter much. Still, more logical. License: MIT Signed-off-by: Hector Sanjuan <hector@protocol.ai>	2017-11-14 21:04:33 +01:00
Hector Sanjuan	398865f381	Issue #192: Fix typos and address feedback License: MIT Signed-off-by: Hector Sanjuan <hector@protocol.ai>	2017-11-14 19:26:49 +01:00
Hector Sanjuan	7baa4c9c6d	Guide/Troubleshooting: add the meanings for some libp2p errors License: MIT Signed-off-by: Hector Sanjuan <hector@protocol.ai>	2017-11-14 19:26:49 +01:00
Hector Sanjuan	2c3085586c	Documentation: bring in line to 0.3.0 Review documentation to be in line with latest updates to Raft and any other feature introduced since 0.12.0. License: MIT Signed-off-by: Hector Sanjuan <hector@protocol.ai>	2017-11-14 19:26:49 +01:00
Hector Sanjuan	5a7c1fc847	Re-enable snapcraft build on travis License: MIT Signed-off-by: Hector Sanjuan <hector@protocol.ai>	2017-11-14 19:06:05 +01:00
Hector Sanjuan	09cd86e183	Merge pull request #234 from ipfs/feat/snap-travis-ci Enable travis for snapcraft builds	2017-11-14 18:24:32 +01:00
Hector Sanjuan	798571d3fc	Merge pull request #226 from ipfs/feat/219-peers Fix #219: WIP: Remove duplicate peer accounting	2017-11-14 18:23:39 +01:00
Hector Sanjuan	4c3f16d7aa	Merge pull request #233 from ipfs/feat/raft-state-backups Raft: do not ever remove state, rename it and leave it around	2017-11-14 18:13:15 +01:00
Hector Sanjuan	33c325b865	cluster-service: Fix too many arguments for panic License: MIT Signed-off-by: Hector Sanjuan <hector@protocol.ai>	2017-11-14 18:01:32 +01:00
Hector Sanjuan	18ff7a79cc	cluster-service: Upstream snap patch Snaps set a custom $HOME, but we were using /etc/passwd. There might be other cases where using a custom $HOME might be handy. On UNIX systems, $HOME should always be set. For all the rest, we fall back to the original os/user.HomeDir method. License: MIT Signed-off-by: Hector Sanjuan <hector@protocol.ai>	2017-11-14 13:23:43 +01:00
Hector Sanjuan	dc3f5b202e	Merge pull request #231 from ipfs/fix/224-pin-progress Fix #224: Better handling of progress updates when proxy-adding file	2017-11-14 12:33:09 +01:00
Hector Sanjuan	2837170d51	Raft: improve test error message License: MIT Signed-off-by: Hector Sanjuan <hector@protocol.ai>	2017-11-14 12:31:58 +01:00
Hector Sanjuan	b6ba6d5a1e	Issue #219: Clean up peer manager. Rename Peers RPC call License: MIT Signed-off-by: Hector Sanjuan <hector@protocol.ai>	2017-11-14 12:26:42 +01:00
Hector Sanjuan	ef86f718e7	Enable travis for snapcraft builds License: MIT Signed-off-by: Hector Sanjuan <hector@protocol.ai>	2017-11-14 01:12:36 +01:00
Hector Sanjuan	4ca12026dc	Merge pull request #232 from elopio/snapcraft Add the packaging metadata to build the ipfs-cluster snap	2017-11-14 00:29:27 +01:00
Hector Sanjuan	1a06baeb23	Raft: do not ever remove state, rename it and leave it around This commit changes the way that consensus.Clean() works. Before, it deleted the whole data folder. Now it renames it as <name>.old.0 and leaves it. When Clean() is called again, it renames <name>.old.0 as <name>.old.1, and the actual data becomes <name>.old.0. A higher number means an older backup. The number of backups is fixed to 5. When 5 backups exist and a new one comes up, the last one is discarded. License: MIT Signed-off-by: Hector Sanjuan <hector@protocol.ai>	2017-11-13 18:18:52 +01:00
Leo Arias	4a45e5def5	Add the packaging metadata to build the ipfs-cluster snap	2017-11-13 16:43:48 +00:00
Hector Sanjuan	f14f2f4863	Fix #224: Better handling of progress updates when proxy-adding file Up to now, we hardcoded progress to "false" in the proxy, regardless of what the original request said. We now leave it as it is, and just ignore any progress updates when processing the response. Since the response is buffered and sent back all together, they are still useless, but at least the clients (ipfs cli) won't show a 0% progress bar when successfully adding a file. License: MIT Signed-off-by: Hector Sanjuan <hector@protocol.ai>	2017-11-13 14:09:20 +01:00
Hector Sanjuan	2323deac6c	Issue #219: Fix TestClustersPeerRemoveReallocsPins This test failed if the leader was randomly selected to be the node on which we wait for leader. Needed to remove the shutdown-leader from the clusters slice. License: MIT Signed-off-by: Hector Sanjuan <hector@protocol.ai>	2017-11-10 21:26:37 +01:00
Hector Sanjuan	f37c6c0620	Issue #219: Fix api types tests License: MIT Signed-off-by: Hector Sanjuan <hector@protocol.ai>	2017-11-10 18:23:38 +01:00
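
The backup rotation described in commit 1a06baeb23 (rename the data folder to <name>.old.0, shift older backups up by one, keep at most 5) can be illustrated with a small sketch. This is a Python illustration of the scheme as the commit message words it, not the project's Go code: the function name `rotate_backups`, its parameters, and the demo folder name are all hypothetical.

```python
import os
import shutil
import tempfile

MAX_BACKUPS = 5  # the commit message says the number of backups is fixed to 5


def rotate_backups(data_dir: str, keep: int = MAX_BACKUPS) -> None:
    """Rename data_dir to data_dir.old.0, shifting older backups up by one.

    Hypothetical helper: mirrors the behaviour described for
    consensus.Clean(), but is not taken from ipfs-cluster itself.
    """
    if not os.path.isdir(data_dir):
        return  # nothing to back up

    # If the maximum number of backups already exists, discard the oldest.
    oldest = f"{data_dir}.old.{keep - 1}"
    if os.path.exists(oldest):
        shutil.rmtree(oldest)

    # Shift remaining backups: .old.3 -> .old.4, ..., .old.0 -> .old.1.
    # Walk from the highest index down so no rename overwrites a backup.
    for i in range(keep - 2, -1, -1):
        src = f"{data_dir}.old.{i}"
        if os.path.exists(src):
            os.rename(src, f"{data_dir}.old.{i + 1}")

    # The current data becomes the newest backup.
    os.rename(data_dir, f"{data_dir}.old.0")


if __name__ == "__main__":
    base = tempfile.mkdtemp()
    data = os.path.join(base, "raft-data")  # hypothetical folder name
    for _ in range(7):
        os.makedirs(data)
        rotate_backups(data)
    print(sorted(os.listdir(base)))
```

Renaming from the highest index downward is what keeps the shift safe: each target name is free by the time its source is moved, so no backup is clobbered along the way.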

500 Commits