At Brave, we want our browser not only to provide the best protection against the surveillance economy, but to be the very best way to experience the web. We rely extensively on community feedback to make sure that the product provides the most vital features and is as reliable as it can possibly be. Sometimes, however, this simply is not enough to make sure we are providing the best experience to as many users as possible. Many people simply don’t have time to provide feedback, and there are many questions left unanswered. Do people make it through onboarding, or do we need to make it shorter? Are people using Brave Rewards? Are people using sync and if so, on how many devices? How many people still need to download important browser updates?
In an ordinary software company, these questions would be answered by using one of dozens of third-party analytics services. But the way such services operate would mean that Brave users could be individually identified and tracked by a third party, and in some cases that behavior would be aggregated with other tracked behavior from the ad/tracker ecosystem for the benefit of the third party alone. None of this would be remotely acceptable to Brave given our commitment to user privacy.
We believe that completely private product analytics are the most effective way for us to make Brave the best it can be — by providing us with insights into how the various features of the product are actually being used, so we can shape the product to better match the needs of our users. As always, our code is open source and available for third-party audits and verification.
Privacy is our first value. We really, genuinely don’t want to know anything about you individually, or to know anything that could be used to track you. That means that we need to approach product analytics very differently from most other companies. We’ve built a completely private system which we’re calling Privacy-Preserving Product Analytics, or P3A for short. This project goes well beyond industry norms and GDPR requirements when it comes to privacy preservation. Here are the mechanics:
- P3A doesn’t collect any personal information. Nothing that could identify you, and nothing sensitive like your browser history, search queries, etc.
- Every so often, in the background, the browser sends reports containing simple, non-identifying information on product feature usage. These are essentially automatically-delivered answers to specific questions defined by Brave.
- All the “questions” we ask of the browser (the measurements collected) will be posted publicly in human-readable form. You can find the current list on our GitHub wiki: https://github.com/brave/brave-browser/wiki/P3A
- You can turn P3A off at any time in the “Privacy and Security” section of the browser preferences.
- All the P3A code will be open source (as is all our code except anti-fraud server-side code) — you can always check that your browser is only sharing the specific things we promise.
How P3A Works
Our work on P3A is split into two initial phases. In the first phase, we will use a simple protocol that sends a single answer to a single question, one at a time. In the second phase, we will follow up with a more complex protocol that incorporates technologies such as oblivious shuffling and secure enclaves to support more complex questions while retaining the strict privacy goals of phase 1. Our objective is to keep it impossible for us to associate any particular data with any particular user, no matter how much analysis we perform on the data collected.
The first phase of P3A will collect “answers” to a set of 18 specific multiple-choice questions. These answers provide straightforward usage metrics, such as how many tabs people have open, or what fraction of people have turned on Brave Rewards. For example:
Question: Number of open tabs
Some (randomized) time after you open your browser during the week, the browser counts the number of open tabs and picks the corresponding answer from the list. This multiple-choice answer style is the first privacy safeguard. None of the questions have exact, detailed answers: only a small number of predetermined options are available. This helps ensure that no device ever has a unique or distinctive answer to any question.
Roughly once an hour, the encoder prepares to send out one such answer, which looks a little like “Question: 7, Answer: 3”. The exact send time is obscured somewhat by adding a random delay of 0-5 minutes. The answer is combined with information about the version of Brave it comes from, which looks like this:
- Distribution channel (nightly/dev/beta/release)
- Week the browser was installed (only sent within 90 days of installation)
- Country (removed for countries with fewer than 6000 installs per week)
- Referral code which indicates (broadly) what category of link brought you to the Brave website when you downloaded Brave. This is only sent within 90 days of installation, and only for referrers which we’re sure are big enough not to have a privacy impact. You can find a detailed description of our referral codes here.
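To make the steps above concrete, here is a minimal Python sketch of how a single report could be assembled. It is an illustration only, not Brave's actual implementation: the bucket boundaries, the `Report` fields, and every function name are hypothetical. Only the 90-day window, the 6000-installs-per-week threshold, and the 0-5 minute delay come from the description above.

```python
import random
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Optional, Sequence

# Hypothetical bucket upper bounds for the "number of open tabs" question.
# The real answer options are defined by Brave; these are placeholders.
TAB_BUCKET_UPPER_BOUNDS = [1, 5, 10, 50]

INSTALL_INFO_WINDOW = timedelta(days=90)  # from the text
MIN_COUNTRY_INSTALLS_PER_WEEK = 6000      # from the text
MAX_SEND_DELAY_SECONDS = 5 * 60           # from the text: 0-5 minutes


def bucket_answer(value: int, upper_bounds: Sequence[int]) -> int:
    """Map an exact count to the index of the first bucket that contains it."""
    for index, bound in enumerate(upper_bounds):
        if value <= bound:
            return index
    return len(upper_bounds)  # final, open-ended bucket


@dataclass(frozen=True)
class Report:
    question: int
    answer: int
    channel: str
    install_week: Optional[str]
    country: Optional[str]
    referral_code: Optional[str]


def week_label(day: date) -> str:
    """Reduce a date to its ISO year and week, e.g. '2019-W27'."""
    iso = day.isocalendar()
    return f"{iso[0]}-W{iso[1]:02d}"


def build_report(question: int, answer: int, channel: str,
                 installed_on: date, today: date,
                 country: str, country_weekly_installs: int,
                 referral_code: str, referral_is_large: bool) -> Report:
    """Attach version information, dropping fields that could be distinctive."""
    recent = today - installed_on <= INSTALL_INFO_WINDOW
    country_is_large = country_weekly_installs >= MIN_COUNTRY_INSTALLS_PER_WEEK
    return Report(
        question=question,
        answer=answer,
        channel=channel,
        install_week=week_label(installed_on) if recent else None,
        country=country if country_is_large else None,
        referral_code=referral_code if (recent and referral_is_large) else None,
    )


def send_delay_seconds(rng: random.Random) -> float:
    """Random delay that obscures the exact send time."""
    return rng.uniform(0, MAX_SEND_DELAY_SECONDS)
```

With these placeholder buckets, `bucket_answer(7, TAB_BUCKET_UPPER_BOUNDS)` returns `2`: the report carries only the bucket index, never the exact tab count.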
This combined information — the answer and the version information — is finally sent to Brave’s content delivery network (CDN), operated by Fastly. When an answer reaches the edge of the Fastly CDN, it’s stripped of the IP address and precise timing information.
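The stripping step at the CDN edge can be pictured with a small sketch. This is not Fastly's or Brave's configuration: the request layout and the `strip_at_edge` function are hypothetical, and reducing the timestamp to a calendar date is an assumed granularity, since the description above says only that precise timing is removed.

```python
from datetime import datetime, timezone
from typing import Any, Dict


def strip_at_edge(request: Dict[str, Any]) -> Dict[str, Any]:
    """Return only what is forwarded: the payload and a coarse receive date.

    The client IP is never copied into the output, and the timestamp is
    reduced to a UTC calendar date (assumed granularity).
    """
    received_at: datetime = request["received_at"]
    return {
        "payload": dict(request["payload"]),
        "received_on": received_at.astimezone(timezone.utc).date().isoformat(),
    }


# Hypothetical incoming request as seen at the edge.
incoming = {
    "client_ip": "203.0.113.7",  # documentation-range address
    "received_at": datetime(2019, 7, 1, 13, 45, 12, tzinfo=timezone.utc),
    "payload": {"question": 7, "answer": 3},
}
forwarded = strip_at_edge(incoming)
```

The output is built from an allow-list rather than by deleting fields from the incoming request, so anything not explicitly copied is dropped by default.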
This system is designed so that we, Brave, are unable to associate any particular response with any other, so we do not have sufficient information to link together any particular user’s “answers”. Instead, each response is an independent data point.
You can see the full list of questions on our GitHub here: https://github.com/brave/brave-browser/wiki/P3A
Log-level records are automatically deleted from our servers within 30 days. Note that these log-level records will not contain IP addresses or exact timing information. Our completely anonymous summaries of the data are intended to be kept indefinitely.
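As a rough illustration of the difference between log-level records and anonymous summaries, the sketch below counts answers per question and then discards records past the retention window. The record layout and function names are hypothetical, not a description of Brave's servers. Only the 30-day figure comes from the text.

```python
from collections import Counter
from datetime import date, timedelta
from typing import Any, Dict, List

LOG_RETENTION = timedelta(days=30)  # from the text


def summarize(records: List[Dict[str, Any]]) -> Counter:
    """Count how often each (question, answer) pair was reported."""
    return Counter((r["question"], r["answer"]) for r in records)


def purge_expired(records: List[Dict[str, Any]], today: date) -> List[Dict[str, Any]]:
    """Keep only log-level records younger than the retention window."""
    cutoff = today - LOG_RETENTION
    return [r for r in records if r["received_on"] > cutoff]


# Hypothetical log-level records: no IP address, date-only timing.
logs = [
    {"question": 7, "answer": 3, "received_on": date(2019, 5, 20)},
    {"question": 7, "answer": 3, "received_on": date(2019, 6, 25)},
    {"question": 7, "answer": 1, "received_on": date(2019, 6, 30)},
]
summary = summarize(logs)                       # kept indefinitely
logs = purge_expired(logs, date(2019, 7, 1))    # old records dropped
```

The summary holds only totals per answer, so it remains useful after the individual records it was built from are gone.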
Most of the software you use includes some sort of product analytics, or usage data collection, as does every major browser. And for good reason — knowing which features are resonating and which need work is an important part of making software that’s a pleasure to use. We’ve been cautious about building analytics because we knew we had to get it exactly right. Some other browsers collect thousands of measurements along with a substantial amount of information about what you’ve searched for and which sites you visited. None of the commercial analytics products we’ve seen come anywhere close to our privacy standards. Building this ourselves took a lot longer than using an existing system, but we think that’s time well spent. We hope you agree.