Discreet Log #6: A Tour of the Cwtch Integration Test

30 Apr 2021

Welcome to Discreet Log, a fortnightly technical development blog that provides an in-depth look at the research, projects and tools we work on at Open Privacy. For our sixth post, Dan Ballard takes us on a tour of Cwtch's core integration test, with illustrations from Marcia Díaz Agudelo, our Staff Designer.

I talked previously about how we built our automated build and test system and the quality benefits it brought us. Today I’d like to zoom in on one of my favourite pieces of our quality assurance infrastructure, the Cwtch integration test. This test is important because we’ve written it to exercise as much of the Cwtch code base as possible and verify that it works, all in one go. It also sits on top of several other crucial components, such as Tapir and our Connectivity package, and so gives them extra coverage and a workout too.

Broad integration test coverage helps make sure that code we don’t use in our daily development doesn’t get silently broken by accident. For Cwtch, our core library, covering the protocol, engine, storage, peer logic, and app interfaces is of the utmost importance. So without further ado, welcome to the tour of the core Cwtch integration test.

Setting up the Network

func TestCwtchPeerIntegration(t *testing.T) {
	numGoRoutinesStart := runtime.NumGoroutine()

	...

	tor.NewTorrc().WithSocksPort(socksPort).WithOnionTrafficOnly().
		WithHashedPassword(base64.StdEncoding.EncodeToString(key)).
		WithControlPort(controlPort).Build("tordir/tor/torrc")
	acn, err := tor.NewTorACNWithAuth("./tordir", path.Join("..", "tor"), controlPort, tor.HashedPasswordAuthenticator{Password: base64.StdEncoding.EncodeToString(key)})
	if err != nil {
		t.Fatalf("Could not start Tor: %v", err)
	}
	...

The test starts with some boilerplate setup that any app would require, such as checking for and creating directories for storing app data, assigning port numbers, and generating a Tor password. This all culminates in a call to NewTorACNWithAuth from our separate connectivity package, which supplies Anonymous Communication Network (ACN) primitives (currently only Tor is supported). If this succeeds, we have a properly set up ACN using Tor and the directories in place to start some Cwtch apps.
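
Both the port assignment and the password are ordinary Go. Since the test's own setup code is elided above, here is a minimal sketch under stated assumptions: the hypothetical helpers `freeLocalPorts` and `newControlPassword` let the OS pick unused loopback ports and draw random bytes from `crypto/rand`, base64-encoding them the same way the test encodes `key`. The real test may choose its ports differently.

```go
package main

import (
	"crypto/rand"
	"encoding/base64"
	"fmt"
	"net"
)

// freeLocalPorts (hypothetical) asks the OS for n distinct unused loopback TCP
// ports by binding to port 0. Every listener stays open until all ports are
// chosen, so they are distinct; all are released on return. Another process
// could still grab a port before Tor binds it, which is usually fine in tests.
func freeLocalPorts(n int) ([]int, error) {
	listeners := make([]net.Listener, 0, n)
	defer func() {
		for _, l := range listeners {
			l.Close()
		}
	}()
	ports := make([]int, 0, n)
	for i := 0; i < n; i++ {
		l, err := net.Listen("tcp", "127.0.0.1:0")
		if err != nil {
			return nil, err
		}
		listeners = append(listeners, l)
		ports = append(ports, l.Addr().(*net.TCPAddr).Port)
	}
	return ports, nil
}

// newControlPassword (hypothetical) returns n random bytes from crypto/rand,
// base64-encoded like the key the test hands to Tor.
func newControlPassword(n int) (string, error) {
	key := make([]byte, n)
	if _, err := rand.Read(key); err != nil {
		return "", err
	}
	return base64.StdEncoding.EncodeToString(key), nil
}

func main() {
	ports, err := freeLocalPorts(2)
	if err != nil {
		panic(err)
	}
	socksPort, controlPort := ports[0], ports[1]
	password, err := newControlPassword(32)
	if err != nil {
		panic(err)
	}
	fmt.Println("socks:", socksPort, "control:", controlPort, "password chars:", len(password))
}
```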

Also of note: throughout the test, starting at the very beginning, we record the number of goroutines so that at the end we can account for all of them and catch any goroutine leak, in addition to gathering usage metrics. This tracking has caught a lot of leaks over the years, never more so than when we first wrote it and immediately had to track down the several leaks it discovered.

Setting up the Server

	var server *cwtchserver.Server

	var serverAddr string
	var serverKeyBundle []byte
	// launch app with new key
	server = new(cwtchserver.Server)
	fmt.Println("Starting cwtch server...")
	os.Remove("server-test.json")
	config := cwtchserver.LoadConfig(".", "server-test.json")
	identity := config.Identity()
	serverAddr = identity.Hostname()
	server.Setup(config)
	serverKeyBundle, _ = json.Marshal(server.KeyBundle())
	log.Debugf("server key bundle %s", serverKeyBundle)
	go server.Run(acn)

	// let tor get established
	fmt.Printf("Establishing Tor hidden service: %v...\n", serverAddr)

	numGoRoutinesPostServer := runtime.NumGoroutine()

We set up a Cwtch server to host a group. Using a group, rather than just the quicker peer-to-peer messages, lets this test cover a whole host of group and server functionality.

Setting up the App

	app := app2.NewApp(acn, "./storage")
	...
	bridgeClient := bridge.NewPipeBridgeClient(path.Join(cwtchDir, "testing/clientPipe"), path.Join(cwtchDir, "testing/servicePipe"))
	bridgeService := bridge.NewPipeBridgeService(path.Join(cwtchDir, "testing/servicePipe"), path.Join(cwtchDir, "testing/clientPipe"))
	appClient := app2.NewAppClient("./storage", bridgeClient)
	appService := app2.NewAppService(acn, "./storage", bridgeService)

	numGoRoutinesPostAppStart := runtime.NumGoroutine()

	// ***** cwtchPeer setup *****

	fmt.Println("Creating Alice...")
	app.CreatePeer("alice", "asdfasdf")

	fmt.Println("Creating Bob...")
	app.CreatePeer("bob", "asdfasdf")

	fmt.Println("Creating Carol...")
	appClient.CreatePeer("carol", "asdfasdf")

Here we create a Cwtch app instance and the first two profiles, Alice and Bob. We then create an app client and an app service, connected by a pipe bridge, and use the app client to create a third profile, Carol.

That last step requires some explanation. Desktop Cwtch uses the all-in-one app, but on Android we need to split the application into parts so that only a small part of Cwtch runs in the background, conserving battery and allowing Cwtch to resume as seamlessly as possible.

Having Carol test out the split application architecture has been very valuable, as we rarely run this app mode during desktop-based development. On many occasions the integration test has been the sole early indicator that some logic might fail on Android, without the overhead of constant Android testing.

Launching the Peers

	app.LaunchPeers()
	appClient.LaunchPeers()
	...
	numGoRoutinesPostPeerStart := runtime.NumGoroutine()

	fmt.Println("Alice joining server...")
	if err := alice.AddServer(string(serverKeyBundle)); err != nil {
		t.Fatalf("Failed to Add Server Bundle %v", err)
	}
	alice.JoinServer(serverAddr)

	fmt.Println("Alice peering with Bob...")
	alice.PeerWithOnion(bob.GetOnion())

	fmt.Println("Alice peering with Carol...")
	alice.PeerWithOnion(carol.GetOnion())

Alice connects to the server and peers with Bob and Carol.

Starting a Group

	groupID, _, err := alice.StartGroup(serverAddr)
	if err != nil {
		t.Errorf("Failed to init group: %v", err)
		return
	}
	fmt.Printf("Created group: %v!\n", groupID)

	bob.AddContact("alice?", alice.GetOnion(), model.AuthApproved)
	bob.AddServer(string(serverKeyBundle))
	bob.SetContactAuthorization(alice.GetOnion(), model.AuthApproved)

	waitForPeerPeerConnection(t, alice, carol)
	carol.AddContact("alice?", alice.GetOnion(), model.AuthApproved)
	carol.AddServer(string(serverKeyBundle))
	carol.SetContactAuthorization(alice.GetOnion(), model.AuthApproved)

	alice.SendGetValToPeer(bob.GetOnion(), attr.PublicScope, "name")
	bob.SendGetValToPeer(alice.GetOnion(), attr.PublicScope, "name")

	alice.SendGetValToPeer(carol.GetOnion(), attr.PublicScope, "name")
	carol.SendGetValToPeer(alice.GetOnion(), attr.PublicScope, "name")

Alice and Bob exchange attribute values for name

Alice creates a group on the server. Bob and Carol add Alice as a contact, and then we follow the same process the UI does, having the peers request each other’s attributes. We then test that each peer received the other’s attributes, which covers sending peer-to-peer messages and the whole attribute processing system.
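
As an illustration of what a get-value request involves, here is a toy model, not Cwtch's implementation: we assume attributes are stored per scope and that a peer answering a remote request only discloses values in the public scope. The `Scope` type, its constants and the `profile` methods are all hypothetical, loosely named after `attr.PublicScope` and `SendGetValToPeer` above.

```go
package main

import "fmt"

// Scope is a hypothetical attribute scope, loosely modelled on attr.PublicScope.
type Scope string

const (
	PublicScope Scope = "public"
	LocalScope  Scope = "local"
)

// profile (hypothetical) stores attributes under (scope, key).
type profile struct {
	attrs map[Scope]map[string]string
}

func newProfile() *profile {
	return &profile{attrs: map[Scope]map[string]string{}}
}

// SetScopedAttr stores val under key within scope s.
func (p *profile) SetScopedAttr(s Scope, key, val string) {
	if p.attrs[s] == nil {
		p.attrs[s] = map[string]string{}
	}
	p.attrs[s][key] = val
}

// HandleGetVal answers a remote peer's request. In this sketch only public
// attributes are ever disclosed; anything else reports not-found.
func (p *profile) HandleGetVal(s Scope, key string) (string, bool) {
	if s != PublicScope {
		return "", false
	}
	v, ok := p.attrs[s][key]
	return v, ok
}

func main() {
	bob := newProfile()
	bob.SetScopedAttr(PublicScope, "name", "Bob")
	bob.SetScopedAttr(LocalScope, "name", "private nickname")

	// Alice asks for Bob's public name, as in SendGetValToPeer(..., "name").
	if v, ok := bob.HandleGetVal(PublicScope, "name"); ok {
		fmt.Println("alice learns bob's name:", v)
	}
	_, ok := bob.HandleGetVal(LocalScope, "name")
	fmt.Println("local scope disclosed:", ok)
}
```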

Sending out the Invites

	fmt.Println("Alice inviting Bob to group...")
	err = alice.InviteOnionToGroup(bob.GetOnion(), groupID)

Alice now uses the peer connection to invite Bob to the group on the server. This is followed by code that finds the invite on Bob’s peer and accepts it, mimicking a client surfacing the invite to a real person who then accepts it.

Time to Chat

	_, err = alice.SendMessageToGroupTracked(groupID, aliceLines[0])
...
	_, err = bob.SendMessageToGroupTracked(groupID, bobLines[0])
...
	alice.SendMessageToGroupTracked(groupID, aliceLines[1])
...
	bob.SendMessageToGroupTracked(groupID, bobLines[1])

Alice and Bob take turns sending pre-canned lines (so we can later check that the others saw them). We check that each was able to send their first message, leaving the remaining group timeline checks until the end to catch any further errors.

Growing

	err = alice.InviteOnionToGroup(carol.GetOnion(), groupID)
	if err != nil {
		t.Fatalf("Error for Alice inviting Carol to group: %v", err)
	}

	fmt.Println("Carol examining groups and accepting invites...")
	for _, gid := range carol.GetGroups() {
		group := carol.GetGroup(gid)
		fmt.Printf("Carol group: %v (Accepted: %v)\n", group.GroupID, group.Accepted)
		if !group.Accepted {
			fmt.Printf("Carol received and accepting group invite: %v\n", group.GroupID)
			carol.AcceptInvite(group.GroupID)
		}
	}

	fmt.Println("Shutting down Alice...")
	app.ShutdownPeer(alice.GetOnion())
	numGoRoutinesPostAlice := runtime.NumGoroutine()

	fmt.Println("Carol joining server...")
	carol.JoinServer(serverAddr)
	waitForPeerGroupConnection(t, carol, groupID)
	numGoRotinesPostCarolConnect := runtime.NumGoroutine()

Alice, done talking with Bob, now invites Carol to the group. This sets up our chance to test group history resumption: Alice and Bob should both have this history, as they were both in the group for it, and if we can later confirm that Carol has it too, then fetching history is covered and working. Then Alice shuts down and Carol joins the group. While it really shouldn’t be needed, this also confirms that peer shutdown works: Alice should not receive any further group history while Bob and Carol talk.
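
The timeline counts verified in the next section follow from this sequence of events. In a deliberately simplified toy model (the hypothetical `groupServer` keeps messages in a slice; Cwtch's real server and group protocol are considerably more involved), a server that retains every message lets a late joiner fetch the full history, while a peer that shut down early keeps only what it synced before leaving. The message strings are placeholders:

```go
package main

import "fmt"

// groupServer (hypothetical) is an append-only message store for one group.
type groupServer struct{ messages []string }

// Post appends a message to the group history.
func (s *groupServer) Post(m string) { s.messages = append(s.messages, m) }

// FetchSince returns a copy of every message at index >= since.
func (s *groupServer) FetchSince(since int) []string {
	return append([]string(nil), s.messages[since:]...)
}

// member is a peer's local copy of the group timeline.
type member struct{ timeline []string }

// Sync pulls any messages this member has not seen yet.
func (m *member) Sync(s *groupServer) {
	m.timeline = append(m.timeline, s.FetchSince(len(m.timeline))...)
}

func main() {
	srv := &groupServer{}
	alice, bob, carol := &member{}, &member{}, &member{}

	// Alice and Bob exchange four messages while both are online.
	for _, m := range []string{"a0", "b0", "a1", "b1"} {
		srv.Post(m)
	}
	alice.Sync(srv)
	bob.Sync(srv)

	// Alice shuts down (no more syncs); Carol joins; Bob and Carol post two more.
	srv.Post("b2")
	srv.Post("c0")
	bob.Sync(srv)
	carol.Sync(srv) // the late joiner still receives the full history

	fmt.Println(len(alice.timeline), len(bob.timeline), len(carol.timeline)) // 4 6 6
}
```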

Time to Verify

	bob.SendMessageToGroupTracked(groupID, bobLines[2])
...	
	carol.SendMessageToGroupTracked(groupID, carolLines[0])

	fmt.Printf("Alice's TimeLine:\n")
	aliceVerified := printAndCountVerifedTimeline(t, alicesGroup.GetTimeline())
	if aliceVerified != 4 {
		t.Errorf("Alice did not have 4 verified messages")
	}
...	
	fmt.Printf("Bob's TimeLine:\n")
	bobVerified := printAndCountVerifedTimeline(t, bobsGroup.GetTimeline())
	if bobVerified != 6 {
		t.Errorf("Bob did not have 6 verified messages")
	}
...
	if len(alicesGroup.GetTimeline()) != 4 {
		t.Errorf("Alice's timeline does not have all messages")
	} else {
		// check message 0,1,2,3
		aliceGroupTimeline := alicesGroup.GetTimeline()
		if aliceGroupTimeline[0].Message != aliceLines[0] || aliceGroupTimeline[1].Message != bobLines[0] ||
			aliceGroupTimeline[2].Message != aliceLines[1] || aliceGroupTimeline[3].Message != bobLines[1] {
			t.Errorf("Some of Alice's timeline messages did not have the expected content!")
		}
	}

	if len(bobsGroup.GetTimeline()) != 6 {
		t.Errorf("Bob's timeline does not have all messages")
	} else {
		// check message 0,1,2,3,4,5
		bobGroupTimeline := bobsGroup.GetTimeline()
		if bobGroupTimeline[0].Message != aliceLines[0] || bobGroupTimeline[1].Message != bobLines[0] ||
			bobGroupTimeline[2].Message != aliceLines[1] || bobGroupTimeline[3].Message != bobLines[1] ||
			bobGroupTimeline[4].Message != bobLines[2] || bobGroupTimeline[5].Message != carolLines[0] {
			t.Errorf("Some of Bob's timeline messages did not have the expected content!")
		}
	}

	if len(carolsGroup.GetTimeline()) != 6 {
		t.Errorf("Carol's timeline does not have all messages")
	} else {
		// check message 0,1,2,3,4,5
		carolGroupTimeline := carolsGroup.GetTimeline()
		if carolGroupTimeline[0].Message != aliceLines[0] || carolGroupTimeline[1].Message != bobLines[0] ||
			carolGroupTimeline[2].Message != aliceLines[1] || carolGroupTimeline[3].Message != bobLines[1] ||
			carolGroupTimeline[4].Message != bobLines[2] || carolGroupTimeline[5].Message != carolLines[0] {
			t.Errorf("Some of Carol's timeline messages did not have the expected content!")
		}
	}

We check that each peer has the correct message count in their timeline and then further verify that each message has the expected contents. This is where we catch lots of errors if there are protocol or message problems.
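
The three timeline checks above repeat the same comparison. A sketch of how they might be folded into one helper, assuming a hypothetical `timelineMismatches` function and a stand-in `message` type carrying the single `Message` field the checks read. It also reports every mismatching position rather than one combined error, which makes failures easier to diagnose:

```go
package main

import "fmt"

// message is a hypothetical stand-in for a timeline entry; only the Message
// field used by the checks above is modelled.
type message struct{ Message string }

// timelineMismatches compares a timeline with the expected contents and
// returns a description of every problem found, not just the first.
func timelineMismatches(timeline []message, expected []string) []string {
	var problems []string
	if len(timeline) != len(expected) {
		problems = append(problems, fmt.Sprintf("got %d messages, want %d", len(timeline), len(expected)))
	}
	for i := 0; i < len(timeline) && i < len(expected); i++ {
		if timeline[i].Message != expected[i] {
			problems = append(problems, fmt.Sprintf("message %d: got %q, want %q", i, timeline[i].Message, expected[i]))
		}
	}
	return problems
}

func main() {
	expected := []string{"a0", "b0", "a1", "b1"}
	good := []message{{"a0"}, {"b0"}, {"a1"}, {"b1"}}
	bad := []message{{"a0"}, {"a1"}, {"b0"}}

	fmt.Println(len(timelineMismatches(good, expected))) // 0
	for _, p := range timelineMismatches(bad, expected) {
		fmt.Println(p)
	}
}
```

With a small adapter from the real timeline type, each peer's check would become a loop such as `for _, p := range timelineMismatches(bobTimeline, want) { t.Errorf("Bob: %s", p) }`.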

Final Checks

	app.ShutdownPeer(bob.GetOnion())	
	numGoRoutinesPostBob := runtime.NumGoroutine()
	if server != nil {
		fmt.Println("Shutting down server...")
		server.Shutdown()
		time.Sleep(time.Second * 3)
	}
	numGoRoutinesPostServerShutdown := runtime.NumGoroutine()

	fmt.Println("Shutting down Carol...")
	appClient.ShutdownPeer(carol.GetOnion())	
	numGoRoutinesPostCarol := runtime.NumGoroutine()

	fmt.Println("Shutting down apps...")
	fmt.Printf("app Shutdown: %v\n", runtime.NumGoroutine())
	app.Shutdown()
	fmt.Printf("appClientShutdown: %v\n", runtime.NumGoroutine())
	appClient.Shutdown()
	fmt.Printf("appServiceShutdown: %v\n", runtime.NumGoroutine())
	appService.Shutdown()

	fmt.Printf("bridgeClientShutdown: %v\n", runtime.NumGoroutine())
	bridgeClient.Shutdown()

	fmt.Printf("bridgeServiceShutdown: %v\n", runtime.NumGoroutine())
	bridgeService.Shutdown()

	fmt.Printf("Done shutdown: %v\n", runtime.NumGoroutine())
	numGoRoutinesPostAppShutdown := runtime.NumGoroutine()

	fmt.Println("Shutting down ACN...")
	acn.Close()
	numGoRoutinesPostACN := runtime.NumGoroutine()

We shut down Bob next, then the server, and finally Carol. When writing integration tests it’s always good to subvert the logical order (e.g. by shutting down the server in between the clients, so you cover events such as a profile handling the server going away). After tearing down Carol, we shut down the rest of the Cwtch application infrastructure: the apps, the app client and service, their communication bridge, and finally the ACN.
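
Shutting the server down while Carol is still running exercises the path where a profile loses its server. A minimal sketch of that idea, in which a hypothetical `serverConn` closes its `done` channel when the server goes away and a watcher goroutine moves the profile's link to a disconnected state; the type and state names are made up for illustration:

```go
package main

import (
	"fmt"
	"sync"
)

// connState is a hypothetical connection state for a profile's server link.
type connState int

const (
	connected connState = iota
	disconnected
)

// serverConn (hypothetical): done is closed when the server shuts down.
type serverConn struct{ done chan struct{} }

// profileLink tracks what a profile believes about its server connection.
type profileLink struct {
	mu    sync.Mutex
	state connState
}

// State returns the current connection state.
func (p *profileLink) State() connState {
	p.mu.Lock()
	defer p.mu.Unlock()
	return p.state
}

// watch blocks until the server goes away, then records the disconnect.
func (p *profileLink) watch(c *serverConn) {
	<-c.done
	p.mu.Lock()
	p.state = disconnected
	p.mu.Unlock()
}

func main() {
	conn := &serverConn{done: make(chan struct{})}
	link := &profileLink{state: connected}

	finished := make(chan struct{})
	go func() {
		link.watch(conn)
		close(finished)
	}()

	close(conn.done) // the server shuts down while the profile is still running
	<-finished
	fmt.Println("disconnected:", link.State() == disconnected) // disconnected: true
}
```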

Metrics and Resource Leaks

	pprof.Lookup("goroutine").WriteTo(os.Stdout, 1)

	fmt.Printf("numGoRoutinesStart: %v\nnumGoRoutinesPostServer: %v\nnumGoRoutinesPostAppStart: %v\nnumGoRoutinesPostPeerStart: %v\nnumGoRoutinesPostPeerAndServerConnect: %v\n"+
		"numGoRoutinesPostAlice: %v\nnumGoRotinesPostCarolConnect: %v\nnumGoRoutinesPostBob: %v\nnumGoRoutinesPostServerShutdown: %v\nnumGoRoutinesPostCarol: %v\nnumGoRoutinesPostAppShutdown: %v\nnumGoRoutinesPostACN: %v\n",
		numGoRoutinesStart, numGoRoutinesPostServer, numGoRoutinesPostAppStart, numGoRoutinesPostPeerStart, numGoRoutinesPostServerConnect,
		numGoRoutinesPostAlice, numGoRotinesPostCarolConnect, numGoRoutinesPostBob, numGoRoutinesPostServerShutdown, numGoRoutinesPostCarol, numGoRoutinesPostAppShutdown, numGoRoutinesPostACN)

	if numGoRoutinesStart != numGoRoutinesPostACN {
		t.Errorf("Number of GoRoutines at start (%v) does not match number of goRoutines after cleanup of peers and servers (%v), clean up failed, leak detected!", numGoRoutinesStart, numGoRoutinesPostACN)
	}

Lastly, we print a profile of goroutines and a log of the goroutine measurements taken throughout the test’s run, and check that we are back to the starting number of goroutines. As mentioned previously, this final check has been incredibly valuable in preventing easily introduced resource leaks, and I would strongly recommend that anyone working with Go try it in their tests as an extra layer of coverage.

Continuous Integration

Integration tests aren’t worth much if they aren’t run, and we can’t rely on ourselves to run slow or long tests every time we make a change during development, so we need to make sure they are plugged into our build automation. In our case, they run automatically against every pull request on our build server, which reports the results back to the PR.

Continuous Investment

Integration tests are an investment, and the cost is often steepest at first if you are introducing them to an existing code base that needs coverage. As Cwtch has grown, we’ve continually revisited the test to cover new features.

In a case of “not quite” Test Driven Development, the Cwtch integration test soon became our primary library client, before work on the full client UI began. It started paying off as soon as we wrote it: it found protocol bugs that our unit tests couldn’t catch, including bugs that could only be discovered by spinning up peers and a server and making them interact over a group, and it immediately caught resource leaks.

Coupled with our automation, it’s been one of our most important pieces of quality assurance. If you work on software and don’t have integration tests with full end-to-end coverage of usage, especially if you work on networked software, I really hope this inspires you to start.

Quality work takes time and resources; if you’d like to help us with Cwtch, please consider donating!
