Discreet Log #23: Improving Cwtch testing using Cucumber/Gherkin scripts

28 Jan 2022

Welcome to Discreet Log! A fortnightly technical development blog to provide an in-depth look into the research, projects and tools that we work on at Open Privacy. For our twenty-third post Erinn Atwater talks about upgrading Cwtch integration and regression testing using Cucumber and Gherkin scripts.

As a small team producing a complex application, automated testing is an important part of the Cwtch development process, providing confidence that new features or fixes aren’t breaking old ones that we may not always think to test manually. In previous posts, we’ve talked about how we use widget testing, integration testing of the Cwtch backend library, and fuzzing to accomplish this. What we’ve been lacking to date, however, is a wide-ranging (feature complete) frontend integration test. This week’s Discreet Log will talk about how we’re accomplishing that alongside the 1.6 release using Gherkin scripting.

At the time of writing, these updates are still in development. Once reviewed and deployed to our automation pipeline, this post will be updated with links to the relevant files and scripts!

Animated gif showing the automated test running on the Cwtch UI

UI Scripting

In Discreet Log #5: Improving Safety with Cwtch UI Tests, I showed what UI testing with the basic Flutter toolkit looks like. While this remains great for individual widgets and, technically, can be used for integration testing of larger app features, the team has subsequently found that writing interaction tests in this style can be quite cumbersome. Reading the tests also takes some time and patience, resulting in the test cases falling behind development of new features and updates.

To solve this problem, we’re now implementing a full suite of tests using flutter_gherkin. Gherkin is an interaction scripting language for the Cucumber ecosystem, and using it gives us access to a wider variety of testing and reporting tools. The flutter_gherkin package (specifically, the 3.0.0+ releases from the integration_test__package_support branch) parses Gherkin feature scripts and converts them to standard Flutter integration_test scripts, which makes it trivial to add to our existing automation pipeline.

Discreet Log #5 shows an example of the sort of highly-specific, frame-by-frame scripting that needs to be done to implement a test using the integration_test package alone. The core advantage of using Gherkin is that our tests now look something like this:

@env:aliceandbob1
Feature: Sending and receiving chat messages
    Given I wait until the widget with type "ProfileRow" is present
    And I wait for 4 seconds
    Given I tap the button that contains the text "Alice"
    And I tap the button that contains the text "Bob"
    And I wait until the text "Contact is offline, messages can't be delivered right now" is absent
    When I fill the "txtCompose" field with "hello! this is a test!"
    And I tap the "btnSend" button
    Then I expect a "MessageBubble" widget with text "hello! this is a test!\u202F" to be present within 5 seconds
    And I tap the back button
    And I tap the back button

  Scenario: Bob receives the message from Alice
    Given I tap the button that contains the text "Bob"
    And I tap the button that contains the text "Alice"
    Then I expect a "MessageBubble" widget with text "hello! this is a test!\u202F" to be present within 5 seconds

  Scenario: Bob replies to a message from Alice
    Given I tap the button that contains the text "Bob"
    And I tap the button that contains the text "Alice"
    When I swipe right by 15 pixels on the widget of type "MessageBubble" with text "hello! this is a test!\u202F"
    And I fill the "txtCompose" field with "yay the test worked"
    And I tap the "btnSend" button
    Then I expect to see the message "yay the test worked\u202F" replying to "hello! this is a test!" within 5 seconds
    And I take a screenshot

This format is much easier to both read and write, and its Feature/Scenario organization makes it clear what each test is actually attempting to verify. The flutter_gherkin package comes with a number of pre-defined steps like “I tap the button that contains the text {string}”, but many of the steps we use are written for our specific application and defined using regular expressions. For example, the “message replying to…” step definition looks like this:

import 'package:flutter_gherkin/flutter_gherkin.dart';
import 'package:gherkin/gherkin.dart';
// QuotedMessageBubble is the Cwtch UI widget that renders a reply; its import is omitted here.

StepDefinitionGeneric ExpectReply() {
  return given3<String, String, int, FlutterWorld>(
    RegExp(r'I expect to see the message {string} replying to {string} within {int} second(s)$'),
    // The first {string} is the reply; the second is the message being replied to.
    (responseMessage, originalMessage, seconds, context) async {
      final driver = context.world.appDriver;
      await driver.waitUntil(
        () async {
          await driver.waitForAppToSettle();
          // Both texts must appear inside a QuotedMessageBubble (findByDescendant nesting is assumed).
          return await driver.isPresent(driver.findByDescendant(
                driver.findBy(QuotedMessageBubble, FindType.type),
                driver.findBy(originalMessage, FindType.text),
              )) &&
              await driver.isPresent(driver.findByDescendant(
                driver.findBy(QuotedMessageBubble, FindType.type),
                driver.findBy(responseMessage, FindType.text),
              ));
        },
        timeout: Duration(seconds: seconds),
      );
    },
    configuration: StepDefinitionConfiguration()
      ..timeout = const Duration(days: 1),
  );
}

Since Cwtch inherently supports multiple profiles, this deceptively simple-looking test is actually verifying parts of our entire stack: from entering the text in the frontend, to encrypting it and sending it out via an onion-to-onion service connection over the Tor network, to receiving, authenticating, and displaying it on the other end. The only other obvious way to accomplish this sort of testing would be to write bots (and indeed, we plan to add Fuzzbot testing to our Gherkin scripts). This setup instead lets us confine everything relevant to the test in a single feature file.

Environments and persistence

Cwtch started as a desktop application before Android support was added, and we occasionally encounter Flutter packages that are aimed primarily at mobile environments. Cucumber also has a tendency toward testing web applications, where many examples manipulate initial application state by logging in to different user roles. As a desktop application with no remote data storage, our state is persistent, and this can pose a challenge in frameworks that expect all tests to be standalone.

Fortunately, it’s simple to remedy this using tags and hooks. Before each scenario is run, our custom hooks reset the Cwtch working directory according to the tags specified on the feature. For example, we created an @env:clean tag to test first-run directory setup, instead of the default configuration that resets to a pre-initialized directory for each test. The example feature above uses @env:aliceandbob1 to load a Cwtch profile directory that has already been initialized with “Alice” and “Bob” profiles and contacts, which keeps the feature file concise and lets the test suite run more quickly.

We also implemented persist environment tags, which tell the test driver not to reset the environment between scenarios. This allows us to test persistent features, like feature saving-and-loading, or turning conversation history saving on and off.
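To make the tag-to-environment idea concrete, here is a minimal sketch in plain Dart of the decision a before-scenario hook might make. It does not use the flutter_gherkin API: the names EnvironmentPlan, planEnvironment and applyPlan, the 'default' template name, the @env:persist tag spelling, and the one-folder-per-template layout are all assumptions made for illustration. Only @env:clean and @env:aliceandbob1 come from the text above.

```dart
import 'dart:io';

/// What the test driver should do with the working directory before a scenario.
class EnvironmentPlan {
  /// Template directory to copy in, or null to start from an empty directory.
  final String? template;

  /// When true, leave the directory exactly as the previous scenario left it.
  final bool persist;

  const EnvironmentPlan({this.template, this.persist = false});
}

/// Hypothetical name of the pre-initialized directory used when no tag is set.
const defaultTemplate = 'default';

/// Maps a scenario's tags to a plan. '@env:persist' is an assumed spelling.
EnvironmentPlan planEnvironment(Iterable<String> tags) {
  if (tags.contains('@env:persist')) {
    return const EnvironmentPlan(persist: true);
  }
  for (final tag in tags) {
    if (tag == '@env:clean') return const EnvironmentPlan(template: null);
    if (tag.startsWith('@env:')) {
      return EnvironmentPlan(template: tag.substring('@env:'.length));
    }
  }
  return const EnvironmentPlan(template: defaultTemplate);
}

/// Resets [workDir] according to [plan], copying from templatesRoot/<template>.
void applyPlan(EnvironmentPlan plan, Directory workDir, Directory templatesRoot) {
  if (plan.persist) return; // keep state from the previous scenario

  if (workDir.existsSync()) workDir.deleteSync(recursive: true);
  workDir.createSync(recursive: true);

  final template = plan.template;
  if (template == null) return; // first-run test: nothing to copy

  final source = Directory('${templatesRoot.path}/$template');
  for (final entity in source.listSync(recursive: true)) {
    final relative = entity.path.substring(source.path.length + 1);
    final target = '${workDir.path}/$relative';
    if (entity is Directory) {
      Directory(target).createSync(recursive: true);
    } else if (entity is File) {
      File(target).parent.createSync(recursive: true);
      entity.copySync(target);
    }
  }
}
```

Keeping the tag parsing separate from the file copying means the mapping can be checked without touching the disk, and the hook itself only has to pass along the scenario's tags.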


Since flutter_gherkin reports test results using Cucumber’s standard JSON format, we’re able to integrate with the many Cucumber reporting/visualization tools out there. For example, cucumber-html-reporter creates something like this:

Screenshot of cucumber reporting tool showing all tests passed and a screenshot of Cwtch taken during the test

This report is a single HTML file with images embedded as base64 strings (our tests are configured to attach a screenshot if the test fails!), making it easy to host as a build artifact and have our buildbot link to results in pull requests on Gitea. It also allows us to attach useful information about the test environment or results, such as the currently-packaged Tor version string shown in the screenshot above.
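As a rough illustration of what "embedded as base64" means, the sketch below builds one attachment entry in the shape that Cucumber's JSON format uses for embeddings (a mime_type plus base64 data), as far as we understand that format. The function name pngEmbedding is hypothetical, and this is not how flutter_gherkin itself produces the report.

```dart
import 'dart:convert';

/// Builds one entry for an "embeddings" list in a Cucumber-style JSON report.
/// [pngBytes] would normally be the bytes of a screenshot taken during the test.
Map<String, String> pngEmbedding(List<int> pngBytes) {
  return {
    'mime_type': 'image/png',
    'data': base64Encode(pngBytes),
  };
}
```

Because the image travels inside the JSON as text, a reporter can inline it into the generated HTML, which is why the finished report can stay a single file.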

The Open Privacy Cwtch team is extremely excited to incorporate these new tests into our workflow. The Cwtch 1.5.2 release was primarily to fix some regressions we had accidentally introduced in existing features, and end-to-end testing like this should help prevent similar bugs in the future. The natural language syntax provided by Gherkin will also make it easier for us to maintain the tests going forward, and for volunteer contributors to add to them as well!

Cwtch and Open Privacy depend on individual donations from people like you in order to keep bringing steady, free improvements to Cwtch IM and Cwtch infrastructure projects. If you’re able to, please consider donating or becoming a Patron!

