Discreet Log #23: Improving Cwtch testing using Cucumber/Gherkin scripts
28 Jan 2022
For a small team producing a complex application, automated testing is an important part of the Cwtch development process, providing confidence that new features or fixes aren’t breaking old ones that we may not always think to test manually. In previous posts, we’ve talked about how we use widget testing, integration testing of the Cwtch backend library, and fuzzing to accomplish this. What we’ve been lacking to date, however, is a wide-ranging (feature complete) frontend integration test. This week’s Discreet Log will talk about how we’re accomplishing that alongside the 1.6 release using Gherkin scripting.
At the time of writing, these updates are still in development. Once reviewed and deployed to our automation pipeline, this post will be updated with links to the relevant files and scripts!
UI Scripting
In Discreet Log #5: Improving Safety with Cwtch UI Tests, I showed what UI testing with the basic Flutter toolkit looks like. While this remains great for individual widgets and, technically, can be used for integration testing of larger app features, the team has subsequently found that writing interaction tests in this style can be quite cumbersome. Reading the tests also takes some time and patience, resulting in the test cases falling behind development of new features and updates.
To solve this problem, we’re now implementing a full suite of tests using flutter_gherkin. Gherkin is an interaction scripting language for the Cucumber ecosystem, and using it gives us access to a wider variety of testing and reporting tools. The flutter_gherkin package (specifically, the 3.0.0+ releases from the integration_test__package_support branch) parses Gherkin feature scripts and converts them to standard Flutter integration_test scripts, which makes it trivial to add to our existing automation pipeline.
Discreet Log #5 shows an example of the sort of highly-specific, frame-by-frame scripting that needs to be done to implement a test using the integration_test package alone. The core advantage of using Gherkin is that our tests now look something like this:
@env:aliceandbob1
Feature: Sending and receiving chat messages
  Background:
    Given I wait until the widget with type "ProfileRow" is present
    And I wait for 4 seconds
    Given I tap the button that contains the text "Alice"
    And I tap the button that contains the text "Bob"
    And I wait until the text "Contact is offline, messages can't be delivered right now" is absent
    When I fill the "txtCompose" field with "hello! this is a test!"
    And I tap the "btnSend" button
    Then I expect a "MessageBubble" widget with text "hello! this is a test!\u202F" to be present within 5 seconds
    And I tap the back button
    And I tap the back button

  Scenario: Bob receives the message from Alice
    Given I tap the button that contains the text "Bob"
    And I tap the button that contains the text "Alice"
    Then I expect a "MessageBubble" widget with text "hello! this is a test!\u202F" to be present within 5 seconds

  Scenario: Bob replies to a message from Alice
    Given I tap the button that contains the text "Bob"
    And I tap the button that contains the text "Alice"
    When I swipe right by 15 pixels on the widget of type "MessageBubble" with text "hello! this is a test!\u202F"
    And I fill the "txtCompose" field with "yay the test worked"
    And I tap the "btnSend" button
    Then I expect to see the message "yay the test worked\u202F" replying to "hello! this is a test!" within 5 seconds
    And I take a screenshot
This format is much easier to both read and write, and its natural organization makes it clear what each feature is attempting to test or verify. The flutter_gherkin package comes with a number of pre-defined steps like “I tap the button that contains the text {string}”, but many of the steps we use are written for our specific application and defined using regular expressions. For example, the “message replying to…” step definition looks like this:
StepDefinitionGeneric ExpectReply() {
  return given3<String, String, int, FlutterWorld>(
    RegExp(
        r'I expect to see the message {string} replying to {string} within {int} second(s)$'),
    // Arguments arrive in pattern order: the reply first, then the message it quotes.
    (responseMessage, originalMessage, seconds, context) async {
      await context.world.appDriver.waitUntil(
        () async {
          await context.world.appDriver.waitForAppToSettle();
          // Both texts must appear inside a QuotedMessageBubble widget.
          return await context.world.appDriver.isPresent(
                context.world.appDriver.findByDescendant(
                  context.world.appDriver.findBy(QuotedMessageBubble, FindType.type),
                  context.world.appDriver.findBy(responseMessage, FindType.text),
                ),
              ) &&
              await context.world.appDriver.isPresent(
                context.world.appDriver.findByDescendant(
                  context.world.appDriver.findBy(QuotedMessageBubble, FindType.type),
                  context.world.appDriver.findBy(originalMessage, FindType.text),
                ),
              );
        },
        timeout: Duration(seconds: seconds),
      );
    },
    configuration: StepDefinitionConfiguration()
      ..timeout = const Duration(days: 1),
  );
}
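To make the placeholder syntax less mysterious, here is a minimal, self-contained sketch of how a pattern such as the one above could be turned into an ordinary regular expression and matched against a step line. This is a simplified illustration, not the gherkin package's actual implementation; the function names expandPlaceholders and matchStep are hypothetical, and only {string}, {int} and the plural marker (s) are handled.

```dart
// Simplified illustration only: this is NOT how the gherkin package is
// implemented internally. Function names here are hypothetical.

/// Expands the placeholders used in step patterns into regex groups.
RegExp expandPlaceholders(String pattern) {
  final source = pattern
      .replaceAll('(s)', '(?:s)?') // optional plural, non-capturing
      .replaceAll('{string}', '"([^"]*)"') // a double-quoted string
      .replaceAll('{int}', r'(-?\d+)'); // a whole number
  return RegExp('^$source');
}

/// Returns the captured arguments (as text) in pattern order,
/// or null if the step text does not match the pattern.
List<String>? matchStep(String pattern, String stepText) {
  final match = expandPlaceholders(pattern).firstMatch(stepText);
  if (match == null) return null;
  return [
    for (var i = 1; i <= match.groupCount; i++) match.group(i)!,
  ];
}

void main() {
  const pattern =
      r'I expect to see the message {string} replying to {string} within {int} second(s)$';
  final args = matchStep(
    pattern,
    'I expect to see the message "yay the test worked" '
    'replying to "hello! this is a test!" within 5 seconds',
  );
  print(args); // [yay the test worked, hello! this is a test!, 5]
}
```

The captured values come back in the order they appear in the step text, which is why the reply is the first argument passed to the step function and the quoted message the second.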
Since Cwtch inherently supports multiple profiles, this deceptively simple-looking test is actually verifying parts of our entire stack, all the way from entering the text in the frontend, to encrypting it and sending it out via an onion-to-onion service connection over the Tor network, to receiving, authenticating, and displaying it on the other end. The only other obvious way to accomplish this sort of testing would be to write bots (and indeed, we plan to add Fuzzbot testing to our Gherkin scripts). This setup instead lets us confine everything relevant to the test to a single feature file.
Environments and persistence
Cwtch started as a desktop application before Android support was added, and we occasionally encounter Flutter packages that are aimed primarily at mobile environments. Cucumber also has a tendency toward testing web applications, where many examples manipulate initial application state by logging in to different user roles. As a desktop application with no remote data storage, our state is persistent, and this can pose a challenge in frameworks that expect all tests to be standalone.
Fortunately, it’s simple to remedy this using tags and hooks. Before each scenario is run, our custom hooks reset the Cwtch working directory according to the tags specified on the feature. For example, we created an @env:clean tag to test first-run directory setup, instead of the default configuration that resets to a pre-initialized directory for each test. The example feature above uses @env:aliceandbob1 to load a Cwtch profile that has already been initialized with “Alice” and “Bob” profiles and contacts, which keeps the feature file concise and lets the test suite run more quickly.
We also implemented persist environment tags, which tell the test driver not to reset the environment between scenarios. This allows us to test persistent features, like feature saving-and-loading, or turning conversation history saving on and off.
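As a rough sketch of the idea (not our actual hook code), the tag handling described above could look something like the following. Everything named here is an assumption made for illustration: the EnvPlan class, the planFor and applyPlan functions, the 'default' template name, the exact spelling @env:persist, and the layout of one template directory per environment. In a real suite, logic like this would run from a before-scenario hook.

```dart
import 'dart:io';

/// What to do to the working directory before a scenario runs.
/// (Hypothetical type, for illustration only.)
class EnvPlan {
  final bool reset; // wipe and re-create the working directory?
  final String? template; // template to copy in, or null for an empty dir
  const EnvPlan({required this.reset, this.template});
}

/// Chooses a plan from the feature's tags. Tag names other than
/// @env:clean and @env:aliceandbob1 are assumptions.
EnvPlan planFor(Iterable<String> tags, {String defaultTemplate = 'default'}) {
  final envTags = tags
      .where((t) => t.startsWith('@env:'))
      .map((t) => t.substring('@env:'.length))
      .toList();
  // Simplification: a persisting feature never resets in this sketch.
  if (envTags.contains('persist')) return const EnvPlan(reset: false);
  // First-run testing starts from an empty working directory.
  if (envTags.contains('clean')) return const EnvPlan(reset: true);
  final name = envTags.isEmpty ? defaultTemplate : envTags.first;
  return EnvPlan(reset: true, template: name);
}

/// Resets [workDir] and copies the chosen template into it, if any.
void applyPlan(EnvPlan plan, Directory workDir, Directory templatesRoot) {
  if (!plan.reset) return;
  if (workDir.existsSync()) workDir.deleteSync(recursive: true);
  workDir.createSync(recursive: true);
  final template = plan.template;
  if (template == null) return;
  final source = Directory('${templatesRoot.path}/$template');
  for (final entity in source.listSync(recursive: true)) {
    final relative = entity.path.substring(source.path.length + 1);
    final target = '${workDir.path}/$relative';
    if (entity is Directory) {
      Directory(target).createSync(recursive: true);
    } else if (entity is File) {
      File(target).parent.createSync(recursive: true);
      entity.copySync(target);
    }
  }
}

void main() {
  final plan = planFor(['@env:aliceandbob1']);
  print('reset=${plan.reset} template=${plan.template}');
}
```

Keeping the decision (planFor) separate from the file operations (applyPlan) means the tag logic can be unit tested without touching the disk.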
Reporting
Since flutter_gherkin reports test results using Cucumber’s standard JSON format, we’re able to integrate with the many Cucumber reporting/visualization tools out there. For example, cucumber-html-reporter creates something like this:
This report is a single HTML file with images embedded as base64 strings (our tests are configured to attach a screenshot if the test fails!), making it easy to host as a build artifact and have our buildbot link to results in pull requests on Gitea. It also allows us to attach useful information about the test environment or results, such as the currently-packaged Tor version string shown in the screenshot above.
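Because the results are plain JSON, it is also easy to post-process them ourselves. The sketch below counts step results by status, assuming the usual Cucumber JSON layout of features containing "elements" (scenarios), which in turn contain "steps" with a "result.status" field. The function name countStepStatuses is hypothetical and the sample report in main is made-up data, not real Cwtch test output.

```dart
import 'dart:convert';

/// Counts steps by result status ("passed", "failed", "skipped", ...)
/// in a Cucumber-style JSON report. Steps without a result are counted
/// as "unknown". (Hypothetical helper, for illustration only.)
Map<String, int> countStepStatuses(String reportJson) {
  final counts = <String, int>{};
  final features = jsonDecode(reportJson) as List<dynamic>;
  for (final feature in features) {
    final elements = (feature['elements'] as List<dynamic>?) ?? const [];
    for (final scenario in elements) {
      final steps = (scenario['steps'] as List<dynamic>?) ?? const [];
      for (final step in steps) {
        final status = (step['result']?['status'] as String?) ?? 'unknown';
        counts[status] = (counts[status] ?? 0) + 1;
      }
    }
  }
  return counts;
}

void main() {
  // Made-up sample data in the assumed report shape.
  const sample = '''
[{"name": "Sending and receiving chat messages",
  "elements": [{"name": "Bob receives the message from Alice",
    "steps": [{"name": "first step", "result": {"status": "passed"}},
              {"name": "second step", "result": {"status": "failed"}}]}]}]
''';
  print(countStepStatuses(sample)); // {passed: 1, failed: 1}
}
```

A summary like this could be used, for example, to fail a build or to add a one-line result to a pull request comment alongside the link to the full HTML report.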
The Open Privacy Cwtch team is extremely excited to incorporate these new tests into our workflow. The Cwtch 1.5.2 release was primarily to fix some regressions we had accidentally introduced in existing features, and end-to-end testing like this should help prevent similar bugs in the future. The natural language syntax provided by Gherkin will also make it easier for us to maintain the tests going forward, and for volunteer contributors to add tests of their own!
Cwtch and Open Privacy depend on individual donations from people like you in order to keep bringing steady, free improvements to Cwtch IM and Cwtch infrastructure projects. If you’re able to, please consider donating or becoming a Patron!