Discreet Log #1: Anonymity, Bandwidth and Fuzzytags.

19 Feb 2021

Welcome to Discreet Log! A fortnightly technical development blog to provide an in-depth look into the research, projects and tools that we work on at Open Privacy. For our first post Sarah Jamie Lewis will take us through her investigations into some new research on “fuzzy message detection” schemes that might hold the key to bandwidth efficient metadata resistant offline messaging in Cwtch…

Anonymous messaging systems (and other privacy-preserving applications) often require a mechanism for one party to learn that another party has messaged them.

Many schemes rely on a bandwidth-intensive “download everything and attempt-decryption” approach. Others rely on a trusted 3rd party, or non-collusion assumptions, to provide a “private” service.

It would be awesome if we could get an untrusted, adversarial server to do some of that work for us without compromising metadata-resistance.

Recently a paper called “Fuzzy Message Detection” was published by Gabrielle Beck, Julia Len, Ian Miers, Matthew Green. They presented a number of novel “fuzzy message detection” schemes.

These schemes “allow a recipient to derive a specialized message detection key that can identify correct messages, while also incorrectly identifying non-matching messages with a specific and chosen false positive rate p. This allows recipients to outsource detection work to an untrustworthy server, without revealing precisely which messages belong to the receiver”.

This is really cool! Traditionally we’ve only been able to narrow down bandwidth requirements by “bucketing” recipients - this comes at the cost of partitioning the anonymity set for a round. Fuzzy message detection schemes allow us to keep one large anonymity set, and don’t require senders to know anything about the false positive rate of the receiver (or derive any kind of special bucket to put messages in).

One of the big problems we have in Cwtch is how to support groups while preserving metadata resistance. Our prototypes have always relied on the “download-everything and decrypt approach” which is very bandwidth intensive for both peers and servers - which is especially problematic for mobile applications. As such we are very interested in exploring any possible solutions that new research might present.

Introducing Fuzzytags

To kickstart our investigations we released fuzzytags an experimental probabilistic cryptographic tagging structure based on the FMD2 scheme described in Fuzzy Message Detection paper, written in Rust.

Having a concrete open source implementation has forced us to start exploring what a secure, misuse-resistant API for fuzzy message detection might look like, in addition to various integration concerns like serialization of fuzzytags and keys.

A Brief Introduction

With the help of several volunteers we have arrived at some specific terminology for the various parts that make up the fuzzytag system. Some of this terminology differs from the original paper, so we will provide a brief introduction here:

Every party generates a RootSecret, from which they can derive a DetectionKey and a TaggingKey. These keys will be generated with a parameter γ that relates to the minimum false-positive probability 2^-γ.

When submitting messages to the server for an intended recipient, the sender will generate a new Tag from the recipients TaggingKey.

All parties will extract a DetectionKey from their key pair. This key will be of length n and provide a false positive detection probability of 0 <= 2^-n <= 2^-γ. This DetectionKey can be given to an adversarial server.

When fetching new messages from the adversarial server, the server first runs a test of the Tag of the message against the parties’ detection key. If the Tag passes the test, the message (along with the tag) is provided to the recipient.

Simulations, Choosing Parameters, and Breaking Fuzzytags

The privacy and security properties of fuzzytags is highly dependent on selecting a false positive rate p and scheme constant γ for your system. There is no one-size-fits-all approach.

If p is too low, then the probability of false positives for a given party will be very high.

If p is too high, then an adversarial server will be able to link messages to recipients with probability approaching 1.

Likewise a large γ means higher bandwidth costs, but a small γ reveals more of the root secret to the server while also increasing the change of perfect (but false) matches across all parties.

This is why we are releasing fuzzytags-sim, a simulator for fuzzytags deployments.

We built fuzzytags-sim to better understand how different parties choosing different false positive rates impact the overall anonymity of the system, and explore intersection and statistical attacks against various deployment scenarios.

fuzzytags-sim can play through real datasets or draw events from a distribution. Through experimentation, we’ve been able to provide a few high level guidelines on deploying fuzzytags securely.

Entangling Tags

We’ve also been investigating some novel properties of fuzzytags that we have dubbed “entangling”. Through search it is possible to construct special kinds of tags that can match to more than one detection key. These tags have a number of possible applications, including:

Multi-party broadcast - when a party wants to notify more than one party of a message.
Deniable tagging - when a party deliberately entangles to an unrelated party in order to implicate them in a message.
Forging false positives - this is a more general application of deniable tagging where parties deliberately attempt to manipulate any statistical analysis that a server might be conducting.

We believe that these strategies are likely essential to mitigating several forms of statistical analysis on fuzzytag deployment that are possible when parties choose non-optimal false positive rates. We have start collecting notes about these entangling strategies and simulating these strategies in our simulator against realistic datasets.

The Future of Fuzzytags and Integrations with Cwtch

We are still exploring the security bounds of fuzzytags. In particular we believe that with the right combination of entangling strategies it will be possible to significantly cut down the bandwidth involved in using Cwtch servers for offline message delivery. This would greatly increase the robustness of the platform.

One possible deployment approach that we are currently investigating is to have a one or more known popular services using fuzzytags to receive messages and for all parties to entangle their messages to both the intended recipient and a popular service.

This approach provides a number of distinct benefits:

Only the tagging keys of popular services need to be known - these can be public and heavily popularized, discounting most security and privacy concerns with maintaining a database of keys to entangle to.
Every tag in this scheme is individually deniable, increasing the anonymity set of all received tags.
Such services provide a bootstrapping mechanism for a platform based on fuzzytags

The deniability properties still need to be explored through simulation and analysis - we will continue to update the fuzzytags crates, documentation, and notebooks with progress in these areas.

More Information

For more information about fuzzytags, or to get involved, check out these resources:

fuzzytags source code repository and issue tracker
fuzzytags-sim simlator source code repository and issue tracker
fuzzytags crate
fuzzytags crate documentation - the rust documentation for the fuzzytags crate.
fuzzytags notebook - for more indepth analysis on entangling schemes, statistical attacks and simulation results.
Original Paper “Fuzzy Message Detection” by Gabrielle Beck and Julia Len and Ian Miers and Matthew Green
gophertags - an independent go implementation by George Tankersley
#redux.resistant.tech:matrix.org - a matrix chat room to discuss fuzzytags and other technologies and projects for decentralized, privacy preserving applications.