September 9, 2021

Zero-Trust Architecture is the Way Forward

Making Good on the Original Promises of Self-Sovereign Identity

Correlation is the primary tool used by established online services to create real-world political aggression against dissenters. It powers their ability to continually and pervasively identify and track individuals with the intent of branding them thought criminals, cutting them out of the public conversation and eliminating all possibility for peaceful resolution of our political differences.

In my two most recent articles I discussed something called “Zero Architecture”. It is a new approach to designing decentralized systems using zero-trust security combined with business logic that operates entirely on zero-knowledge proofs — based on authentic data — and contains zero personally identifiable information. The primary motivation for this new way of thinking is building fully user-sovereign systems that automate regulatory compliance, drive out fraud, and eliminate surveillance capitalism and its attendant societal problems.

Correlation, n. — the linking of two or more observations of the same subject at different times and/or places.

The first principle of user sovereignty is absolute privacy by default. In practice this means that people interact with online services using cryptography to deny the service operator any correlating information. Without absolute privacy, correlated observations enable tracking and the build-up of a behavioral and predictive model that shifts the balance of power away from everyday people towards service operators such as large tech companies. It is a form of power consolidation that can, with enough scale, lead to absolute and corrupting power. The only way to prevent the power imbalance is to use cryptography to eliminate correlation of observations and maintain absolute privacy.

As an aside, this is the underlying idea for my assertion that surveillance capitalism was baked into the world wide web from the beginning. There is nothing we can do to fix the web other than to replace it with a user-sovereign alternative. Zero architecture is an earnest effort to provide an alternative to all online service designs, the web included.

The goal of this article is to build up your understanding of exactly how online correlation works so that you can look at the design decisions behind Zero Architecture from an informed perspective. This is where information warfare comes in. One discipline of information warfare deals directly with correlating and tracking. It is an application of analytical combinatorics. You can go read about analytical combinatorics, but it quickly gets into scary-looking math that really only serves two goals: the first is to keep uninitiated plebs, like us, from further diluting the pool of grant money for postdoc research, and the second is to justify the same postdoc researchers giving each other awards for being smart. My approach is to present analytical combinatorics in an intuitive way so that everybody grasps the basic ideas needed to understand how correlation and tracking work online.

The Onedees

To begin with, I would like you to meet my friends, the Onedees (pronounced wun-dees). Onedees are interesting creatures because they only have one observable trait: their length. Some Onedees are long, some are short, but most of them are close to average in length. In fact, the distribution of Onedee length for their whole population follows what is called a “normal distribution” or a bell curve. Out of all of the Onedees I know, there are two that are relevant to this example: Mid Onedee and Long Onedee. As you may have guessed, Mid Onedee has a length that is very near the median length for Onedees. Long Onedee however is so long that no other Onedee is longer.

Now imagine looking down a staircase to a subway platform full of Onedees going to work. Remember, you can only observe the lengths of each Onedee. Picture looking down on top of a sea of human heads, all shaped the same, all with the same skin tone, hair color and hair cut. The only observable trait is the height of the humans, just like the length of the Onedees.

Out of nowhere you hear Mid Onedee calling your name to get your attention but no matter how hard you try, you cannot pinpoint which Onedee is Mid. Why not? Because Mid Onedee has a length so similar to most other Onedees that Mid doesn’t stand out in any observable way. In fact, you and Mid could run this experiment every morning, day after day, and you would never be able to determine which one was Mid. It is statistically improbable without any other observable traits besides length.

After giving up on finding Mid, you ask Long to repeat the same experiment. While standing in the middle of all of the Onedees, Long yells your name and you immediately know which Onedee is Long. Why? Because Long is the longest Onedee and sticks out. Long is an outlier. Long’s length is so different from every other Onedee that simply observing Long’s length is enough to identify Long and correlate Long over time and space. Given any crowd of Onedees that includes Long, you can immediately identify Long. Given multiple photos of Onedee crowds taken each day, over a few weeks, Long is easily identified and his behavior modeled over time. With each correlated observation, the behavioral model that predicts Long’s behavior — is Long in or not in the crowd on a given day — gets more and more accurate. But now consider the opposite. Since Mid’s presence in a crowd is impossible to determine, even an infinite number of observations of Onedee crowds is not enough to build a behavior model of Mid. The only difference is the correlation of observations of Long and non-correlation of Mid.

Correlation

Correlation requires finding outliers in a population. An outlier is an observed trait of an individual (e.g. height, weight, etc) that is so far out of the range of what is most common that a single individual can be identified. This technically isn’t the correct mathematical definition of an outlier but it is close enough for this discussion. The important thing to remember is that correlation typically requires observing multiple different traits that, when combined, make each individual observed into an outlier. For Long Onedee, we only need to observe length. For humans, it usually takes many different linked trait observations to identify an individual. How do different traits affect correlation? How does the binding of traits (e.g. height and weight) affect correlation? How does time affect traits? How many different traits must be observed to uniquely identify an individual? Answers to those questions are where we are headed.
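To make this concrete, here is a minimal sketch in Python. The population, traits, distributions, and tolerances are all invented for illustration; the point is only to show how each additional linked trait shrinks the anonymity set, the number of people an observation could plausibly belong to, until very few people (often just one) remain.

```python
import random

random.seed(42)

# A synthetic population with a handful of loosely "bound" traits.
# Every number and distribution here is invented purely for illustration.
def make_person():
    height = random.gauss(170, 10)                       # cm
    weight = 0.9 * (height - 100) + random.gauss(0, 8)   # kg, loosely bound to height
    return {
        "height": height,
        "weight": weight,
        "age": random.randint(18, 80),
        "shoe": 0.2 * height - 9 + random.gauss(0, 1),   # shoe size, bound to height
        "birthday": random.randint(1, 365),              # day of year
    }

population = [make_person() for _ in range(100_000)]

# An observer watches one individual and links more and more trait observations.
target = population[0]

def anonymity_set(linked):
    """How many people are indistinguishable from the target on the linked traits?"""
    return sum(
        1 for p in population
        if all(abs(p[t] - target[t]) <= tol for t, tol in linked.items())
    )

linked = {}
for trait, tolerance in [("height", 2), ("weight", 2), ("age", 1), ("shoe", 0.5), ("birthday", 0)]:
    linked[trait] = tolerance
    print(f"linked traits {sorted(linked)}: {anonymity_set(linked)} people still match")
```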

Let us first consider how two traits that are bound together in some way can be used to identify individuals and correlate observations. For instance, in humans, our height and weight are bound traits; taller people tend to be heavier than shorter people. How these are bound together is unimportant, but just know that the binding can be a very complicated function that, when modeled from empirical data, conveys to an observer far more information about the individual than the observed trait values alone.

The height and weight of an individual are bound together by a function that has many different inputs such as average calories consumed in a day, the age of the individual, the activity levels of the individual, their genetic heritage and many other traits that are not directly observable. It is these non-observable traits that can be modeled using empirical data and used to expand on the limited data available to an observer. To put it simply, just by observing the height and weight of an individual, other traits that are part of the binding function can be predicted with the empirical model. Many non-observable trait values can be predicted with a high degree of confidence. A trait such as average daily caloric intake is predictable just by observing a person’s height and weight. This is statistical inference and it is the primary means by which online services take relatively few linked trait observations and expand them into a larger set of data that in turn uniquely identifies an individual. Once the individual is identified, the service provider correlates the latest observation with all other observations and information related to the person.
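Here is a small, purely illustrative sketch of that kind of statistical inference. The data is synthetic and the linear "binding function" and its coefficients are assumptions; real empirical models are far more sophisticated. The sketch fits a model on observations of height, weight, and caloric intake, then predicts the unobserved caloric intake of a new individual from their height and weight alone.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic training data: an invented "binding function" linking height, weight,
# and a non-observable trait (daily caloric intake).
n = 5_000
height = rng.normal(170, 10, n)                                # cm
weight = 0.9 * (height - 100) + rng.normal(0, 8, n)            # kg, loosely bound to height
calories = 10 * weight + 6 * height + rng.normal(0, 150, n)    # kcal/day, hidden trait

# Build an empirical model of the binding: predict calories from (height, weight).
X = np.column_stack([height, weight, np.ones(n)])
coef, *_ = np.linalg.lstsq(X, calories, rcond=None)

# An observer sees only height and weight for a new individual...
observed_height, observed_weight = 185.0, 90.0
predicted_calories = coef @ np.array([observed_height, observed_weight, 1.0])

# ...and infers a non-observable trait with fairly high confidence.
print(f"predicted daily caloric intake: {predicted_calories:.0f} kcal")
```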

Notice that the model of how traits are bound together is critical for turning simple observations of a person’s height and weight into predicted values for other non-observed traits. That expansion of data is only as accurate as the model of the binding function. Better models predict greater numbers of traits with a higher degree of certainty. The most economically and politically valuable models take a few easily observable traits and expand them into enough predicted traits to cluster, or even identify, individuals for targeted manipulation to boost profits or increase political power. Over the last two decades, the large tech companies have recorded millions of observations of you and me and everybody online. They now possess the most sophisticated, well-trained, and accurate empirical models of human traits and behavior in history. Armed with this knowledge, go (re-)watch the documentary The Great Hack. I’ll wait.

How are these empirical models created? A perhaps over-simplified explanation is: companies use artificial intelligence and deep learning. There is a reason that the bulk of the breakthroughs in deep learning are coming from Google and Facebook. It is because those two companies, probably more than any others, have millions of observations on every one of their billions of users. To extract value from those observations they need to build models that drive revenue and serve the powerful in our society. The most efficient way of building models is to teach a computer how to build models and then turn it loose on the data. This is exactly what TensorFlow and other machine learning tools were developed to do.

There is another interesting way to understand what is going on when several simple trait values are expanded via a model into many more predicted trait values. Expanding a small amount of data into its corresponding larger amount of data is called decompression in computer science. The opposite action — turning large amounts of data into corresponding smaller amounts — is of course called compression. I’m sure most of you have used the Zip program to combine and compress computer files into a “zip file”. Then on the other end, you “unzipped” the file back into the original files. Creating the zip file is compression and unzipping it is decompression. I want you to consider the idea that creating an empirical model from observations creates a kind of compression and decompression tool. Given some compressed data — an observation of a person’s height and weight — the empirical model can decompress it into the person’s height, weight, and average caloric intake and any other human traits that are closely linked to height and weight (e.g. diabetes risk, activity levels, etc). The Zip program uses a “lossless” compression algorithm: it always recreates exact copies of the original files. Empirical models are a form of “lossy” compression because the decompression creates approximate values. It is the lossy nature of empirical modeling that provides us the key we need to preserve absolute privacy. More on that later.

Computers are very good at searching for and finding optimal compression algorithms for a given set of data which, in this case, is the same thing as building the predictive empirical model of trait binding. In fact, the primary way to do deep learning for correlation purposes is to take a known set of observed trait values and train a model on how all of the different non-observable traits contribute to the binding of a smaller set of easily observable traits. The training process involves automatically searching for the optimal compression algorithm for taking some number of different traits, say ten, and compressing them down to a smaller number of traits, say two. Getting back to height and weight, if we had a data set of trait values for height, weight, caloric intake, and activity levels, and we knew that only height and weight were traits we could observe, the deep learning process would search for the optimal compression algorithm that maps the combinations of all four traits to their corresponding height and weight combination. Once completed, feeding just height and weight into the model decompresses them back into the most likely combination of all four traits.
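As a sketch of what that training process can look like, here is a toy TensorFlow model trained on invented data for the four traits above. Only the first two traits are "observable"; the network learns the lossy decompression from height and weight back to all four traits. This is only meant to show the shape of the technique, not anyone's actual pipeline, and the trait relationships are made up.

```python
import numpy as np
import tensorflow as tf

rng = np.random.default_rng(1)

# Synthetic population of four bound traits; only the first two are observable.
n = 10_000
height = rng.normal(170, 10, n)
activity = rng.uniform(0, 10, n)                                              # hidden
weight = 0.9 * (height - 100) - 1.5 * activity + rng.normal(0, 5, n)
calories = 10 * weight + 6 * height + 80 * activity + rng.normal(0, 100, n)   # hidden

observed = np.column_stack([height, weight]).astype("float32")                  # the "compressed" data
all_traits = np.column_stack([height, weight, calories, activity]).astype("float32")

# Standardize inputs and outputs so the toy network trains easily.
x = (observed - observed.mean(0)) / observed.std(0)
y = (all_traits - all_traits.mean(0)) / all_traits.std(0)

# Train the lossy "decompressor": two observed traits in, all four traits out.
model = tf.keras.Sequential([
    tf.keras.layers.Dense(32, activation="relu", input_shape=(2,)),
    tf.keras.layers.Dense(32, activation="relu"),
    tf.keras.layers.Dense(4),
])
model.compile(optimizer="adam", loss="mse")
model.fit(x, y, epochs=20, batch_size=128, verbose=0)

# Decompress one observation of height and weight back into approximate values
# for all four traits. The reconstruction is lossy, not lossless.
query = (np.array([[185.0, 90.0]], dtype="float32") - observed.mean(0)) / observed.std(0)
predicted = model.predict(query, verbose=0) * all_traits.std(0) + all_traits.mean(0)
print(predicted)   # approximate height, weight, calories, activity
```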

Again, remember this is a lossy process, but the amount of loss is determined by two things: the amount of data preserved during the compression process and the number of observations in the training set for the decompression process. If we have to take 100 bound traits and compress them down to just one trait, there is a large degree of loss when decompressing the single trait back into predicted values for the other 99 traits. It is usually so lossy that it is all but impossible to recreate the other 99 traits with any level of confidence. The same is true with empirical models trained from relatively few observations. Even if we’re only decompressing height and weight into a predicted trait value for caloric intake, it is all but impossible to get an accurate result if the model was trained from just a few observations of height, weight, and caloric intake.

Since we know that large tech companies have billions of observations and have built the best empirical models ever created, our only option for preserving privacy is to rely on cryptography to prevent them from observing enough information to decompress it into profiles accurate enough to identify, correlate, and track us.

The scary-good empirical models that tech companies have are being used to identify us everywhere we go and manipulate us in ways that we don’t understand. I think it is a grave threat to the stability of human society. The most immediate fallout is the way these models are being used to eliminate the possibility of political solutions to our differences. Their ability to pervasively identify and track all of us enables them to target individuals, deny them access to online services, and prevent their participation in the online public conversation. This is creating the conditions where violent conflict is inevitable.

Are we doomed? Even with the adoption of Zero Architecture, is our privacy forever lost? Can we undo those models and eliminate the economic value and political power they create for tech companies? The good news is I don’t think we’re doomed, our privacy isn’t lost forever and we can undo the models. Time is on our side.

The Half-Life of Trait Observations

In every interaction, time is the one observable trait that nobody can hide. Time can serve as the single trait that correlates different observations. If a person does the same action repeatedly at regular intervals such as the exact same time every day, the time of the event observation correlates multiple events as coming from the same person. Time also has an effect on the value of observable traits. Let us assume that Onedees are born short, grow up and get longer, and then shorten a bit in old age. The value of their one observable trait varies over time and it has a profound effect on how much information it conveys to the observer. This stability of traits over time is very important to correlation. Fingerprints, for example, not only provide enough information in a single observation to identify an individual but they are also stable over the entire lifespan of the individual. Any two observations of a person’s fingerprint — even decades apart — are easily linked together and correlated. Because of this, it is common for people who research Internet correlation and tracking to use the term “fingerprinting” to mean the act of collecting enough linked observable data to uniquely identify an individual and track them.
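As a toy illustration of time acting as a correlating trait, the sketch below generates an invented, otherwise anonymous event log in which one person acts at almost the same minute every day. Simply bucketing events by time of day makes that recurring pattern stand out and links the events as coming from the same person.

```python
from collections import Counter
from datetime import datetime, timedelta
import random

random.seed(7)

# Anonymous event log: one person posts at almost exactly 07:42 every day,
# everyone else posts at random times. All of this data is invented.
events = []
start = datetime(2021, 9, 1)
for day in range(30):
    events.append(start + timedelta(days=day, hours=7, minutes=42, seconds=random.randint(0, 59)))
    for _ in range(50):
        events.append(start + timedelta(days=day, seconds=random.randint(0, 86_399)))

# Bucket events by minute of day and look for buckets that recur across many days.
by_minute = Counter((e.hour, e.minute) for e in events)
recurring = [(minute, count) for minute, count in by_minute.items() if count >= 20]
print("minute-of-day buckets that recur on most days:", recurring)
```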

The most common form of fingerprinting is of course web browser fingerprinting. Mozilla likes to claim that Firefox “blocks fingerprinting” but it doesn’t. Browser fingerprinting is done using Javascript to make multiple observations of the browser and computer a person is using and combining those linked trait values to uniquely identify each and every person using the web. Mozilla’s claim is based on blocking only 3rd party requests to prevent “cross-site tracking”, the kind done by the Facebook like button. All of this is fine and dandy except that it does nothing to stop ad placement networks and 1st party Javascript from fingerprinting web clients and colluding on the server with data aggregators to combine observations into correlating and tracking data. Surveillance capitalism is baked into the core design of the web. Mozilla even admits, “despite a near complete agreement between standards bodies and browser vendors that fingerprinting is harmful, its use on the web has steadily increased over the past decade.”
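In practice this is done with JavaScript running inside the browser; the Python sketch below only illustrates the final combining step, where a handful of hypothetical linked trait observations are hashed into a single stable fingerprint that is then easy to match across sites and sessions. Real fingerprinting scripts combine far more traits.

```python
import hashlib
import json

# A few hypothetical linked trait observations a fingerprinting script might collect.
# Real scripts combine hundreds of traits (canvas rendering, installed fonts, audio
# stack quirks, etc). These example values are made up.
traits = {
    "user_agent": "Mozilla/5.0 (X11; Linux x86_64; rv:92.0) Gecko/20100101 Firefox/92.0",
    "screen": "2560x1440x24",
    "timezone": "America/Denver",
    "language": "en-US",
    "fonts": ["DejaVu Sans", "Liberation Mono", "Noto Color Emoji"],
    "hardware_concurrency": 8,
}

# Combining the linked observations into one value: the "fingerprint".
fingerprint = hashlib.sha256(json.dumps(traits, sort_keys=True).encode()).hexdigest()
print(fingerprint[:16], "...")  # stays stable as long as the underlying traits stay stable
```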

Me (circa 2015) presenting on all of the linked observable traits in web browsers.

When I worked on Firefox years ago, my contribution to web privacy was to show that browser fingerprint traits differ in how stable they are over time, and that some traits are therefore more useful than others in correlating and tracking users. Some traits such as IP address network routing information are not stable at all and change fairly rapidly, on the order of minutes. Other traits such as your computer operating system and display resolution are fairly stable over years. This understanding of the stability of traits led to a classification system by which changes to the web API could be judged. Stability became a consideration whenever new web APIs, with new observable traits, were proposed. The WebVR API, when first proposed, included exposing the person’s interpupillary distance. The interpupillary distance is the distance between your eyes and it is a very precise measurement, different in each human, that is stable over a person’s lifespan once they reach adulthood. It was the first super-trait identified using the new time-stability criteria.

I don’t want us to get totally distracted by the lack of web privacy and browser fingerprinting. I think there are three points I want you to take away from this part. The first is that browsers expose many linked observable traits and are perfect tools for correlating and tracking people who use the web. That has translated into massive profits for surveillance capitalists and concentrated so much power that the corruption of social media platform companies is upon us and it is threatening the stability of society. The web is designed so poorly that the only way to regain any privacy on the web is to stop using it altogether. The second point is that time enhances the models for predicting unobservable trait values. As we grow up our height increases, then it becomes stable in adulthood, and then it slightly declines as we enter old age. Observing our height over several years can accurately predict our age during certain parts of our life and less accurately at other times. The third, and last, point is that time plays a critical role in the stability of trait values and nearly all traits vary over time. This variation means that observed trait values do have a shelf-life and become stale and less valuable as time progresses. Adopting Zero Architecture will draw a line in the sands of time, after which no more linked observations are possible and the already observed trait values start getting older with each passing minute, hour, day, and year. At some point all of the data already gathered will have so little economic and predictive value that it will no longer support surveillance capitalism.

Practicum

Now that you’re armed with a working understanding of correlation and tracking, you can easily see that the goal of Zero Architecture is to eliminate linked observations of multiple traits. Observations can be linked in time, in space, or both. Linking in time is when several independent trait observations are linked together as coming from the same individual. In the case of Long Onedee, observations of his presence in a crowd of Onedees were possible over subsequent days and linked together. Linking in space is when multiple observations of different traits are presented together in a single interaction. Web browsers suffer most from space-linked observations. In a single request, a web browser can download and execute Javascript that makes observations of a couple hundred observable traits to calculate a fingerprint that most likely identifies the person using the browser.

Zero-Knowledge Proofs aren’t Magic Privacy Powder

Zero Architecture is primarily a set of techniques for building systems that operate entirely on zero-knowledge proofs (ZKPs) calculated from verifiably authentic data instead of the data itself. The ZKPs allow for systems to make operational decisions without the person using the system revealing any of their personal data to the system. Unfortunately, basic ZKPs constitute observable traits themselves and even though they do not reveal the exact value of the underlying trait, they can reveal things such as the range the value falls in.

The most common example given by the self-sovereign identity (SSI) community is proving that you are old enough to buy alcohol (i.e. your age is equal to, or greater than, 21) without revealing exactly how old you are. This is an example of what is called a zero-knowledge range proof. What isn’t commonly understood in the SSI community is that this example only preserves privacy because it is a single attribute presented only one time. There is no linking in time or space. If there was any linking, my privacy would be eroded. For instance, if I tried to buy alcohol the day before my 21st birthday and was rejected, and then the next day I tried again and was accepted, the linking in time reveals my exact date of birth and age. If the age range proof was presented alongside, and therefore linked with, multiple other range proofs, it is possible that even with ZKPs I revealed enough data to be correlated. For instance, suppose I am 23 years old, 7 feet tall, and weigh 198 pounds, and I present proofs that my age is equal to or greater than 21, my height is greater than 6 feet 6 inches, and my weight is less than 200 pounds. The combination of those range proofs probably constitutes enough information to make me an outlier and therefore correlatable and trackable. There aren’t very many people that fall in the intersection of those three range proofs.
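A rough way to check that intuition is to count, in a synthetic population with invented distributions, how many people satisfy all three of those range proofs at once. The numbers below are not real demographics; the point is only that three coarse range proofs already shrink a population of a million down to a tiny sliver, and any additional linked information isolates a single person.

```python
import random

random.seed(3)

# Synthetic adult population; the distributions are invented for illustration.
def person():
    age = random.randint(18, 90)
    height_in = random.gauss(67, 4)      # inches
    weight_lb = random.gauss(180, 35)    # pounds
    return age, height_in, weight_lb

population = [person() for _ in range(1_000_000)]

# The three linked range proofs from the example:
# age >= 21, height > 6'6" (78 inches), weight < 200 pounds.
matches = [
    p for p in population
    if p[0] >= 21 and p[1] > 78 and p[2] < 200
]
print(f"{len(matches)} out of {len(population)} satisfy all three range proofs")
```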

Multiple linked ZKPs do not preserve privacy. Unfortunately that is how all of the existing SSI stacks work. The notion of selective disclosure is a joke. It really just means each of us will give up all of our personal data a little slower than we used to. And the current version of the proposed W3C standard for presenting zero-knowledge proofs shows no basic understanding of correlation and tracking. There is no consideration for preventing any linking in space or time. All of the SSI systems currently implemented do nothing to preserve privacy in the long run, and anything based on DID methods also suffers from rampant centralization and siloing of data. SSI, in my opinion, is retreating from the privacy battlefield, not pushing forward.

Qualifications

Finally I can get to the raison d’être of this article. In Zero Architecture systems we have to break linking zero-knowledge proof presentations in both time and space while also compressing the amount of data being presented down to a single boolean value: true or false. Why? Because presenting anything more fails to preserve privacy and absolute privacy by default is the goal. Today I’d like to introduce you to what I call “Qualifications”. Qualifications are a form of zero-knowledge proof that resolves to a single boolean value. Am I qualified to buy alcohol? Yes. Am I licensed to fly an airplane? No.

Qualifications are the solution to what I call the “20 questions” attack against linked ZKP presentations. They prevent verifiers of ZKPs from asking for enough linked ZKP proofs to correlate the presenter of the ZKPs. Going back to the age range proof (e.g. >= 21 years old), if the verifier is able to ask for multiple arbitrary age range proofs, they only need to request seven or fewer proofs to know a person’s exact age (⌈log₂ 100⌉ = 7 for ages 0 through 99). It’s nothing more complicated than the algorithm to solve the guess-a-number game. In the following imaginary dialog, the verifier is a computer constructing proof demands and the holder is a computer providing range proofs that satisfy those demands. When the holder says “yes” that means they are able to provide a range proof and when they say “no” they cannot.

Verifier: is your age < 50?
Holder: yes.
Verifier: is your age < 25?
Holder: no.
Verifier: is your age < 37?
Holder: no.
Verifier: is your age < 44?
Holder: yes.
Verifier: is your age < 41?
Holder: no.
Verifier: is your age < 42?
Holder: no.
Verifier: is your age < 43?
Holder: no.
Verifier: your age is 43.

The verifier only asked the holder for seven ZKP range proofs to know the age of the holder exactly. This example uses linked observations over time. It seems contrived, but if we remember the example above where the linked ZKP range proofs for age, height, and weight correlated an individual, it is easy to see how a system that carefully chooses the age range proof over multiple interactions with a single holder can reveal their exact age. It is the ZKPs linked in space at each interaction that correlate the holder across interactions and enable this.
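For completeness, here is the verifier's side of that attack as a few lines of Python. The can_prove_less_than callback is a stand-in for the holder: each call represents one demanded ZKP range proof of the form "age < n", and plain binary search recovers the exact age in at most seven proofs for ages 0 through 99.

```python
def guess_age(can_prove_less_than, low=0, high=99):
    """Recover an exact age using only 'age < n' range proofs (binary search).

    can_prove_less_than(n) models the holder: it returns True when the holder
    can produce a valid ZKP that their age is less than n.
    """
    proofs_requested = 0
    while low < high:
        mid = (low + high + 1) // 2
        proofs_requested += 1
        if can_prove_less_than(mid):
            high = mid - 1       # holder proved age < mid
        else:
            low = mid            # holder could not, so age >= mid
    return low, proofs_requested

# A holder whose actual (secret) age is 43.
secret_age = 43
age, proofs = guess_age(lambda n: secret_age < n)
print(f"recovered age {age} after {proofs} range proofs")   # at most 7 for ages 0-99
```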

The solution is to use a single Qualification in place of multiple linked ZKPs. A Qualification is the combination of standardized authentic data (e.g. digital passport, digital drivers’ license, etc) with a standard policy-as-code script (i.e. smart contract) that translates into a cryptographically bound proof of authenticity of the underlying data, proof of completeness of the underlying data, and proof of computation of the policy-as-code script along with the result of the computation. To understand what I mean, let us unpack that and look at the different pieces.

Qualifications are calculated from authentic data. As described in my article about the authentic data economy, authentic data is data with a cryptographic record, called a provenance log, of where it came from and to whom it was issued, along with proof that it hasn’t been modified or revoked. A driver’s license is a form of real-world authentic data. It identifies who issued it and to whom it was given; its security features ensure it hasn’t been modified; and its license number and expiration date serve to verify whether it is still valid. Authentic data in computer systems uses cryptography instead of holograms, photos and expiration dates.
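To give a feel for the idea (and only the idea; this is not the actual provenance log format), here is a toy hash-linked log in Python. Each entry commits to the hash of the previous entry, so tampering with earlier history breaks the chain. A real provenance log would also carry digital signatures from the issuer and controller, plus revocation information; the identifiers and values below are placeholders.

```python
import hashlib
import json
import time

def entry_hash(entry: dict) -> str:
    """Deterministic hash of a log entry."""
    return hashlib.sha256(json.dumps(entry, sort_keys=True).encode()).hexdigest()

log = []

def append(event: dict) -> None:
    # Each new entry commits to the hash of the previous entry, forming a chain.
    prev = entry_hash(log[-1]) if log else None
    log.append({"prev": prev, "time": int(time.time()), **event})

# Placeholder events and identifiers, purely for illustration.
append({"event": "issued", "issuer": "did:example:dmv", "holder": "did:example:alice"})
append({"event": "attribute-added", "attribute": "birth_year", "commitment": "c0ffee..."})
append({"event": "key-rotated", "new_key": "z6Mk...example"})

def verify(log: list) -> bool:
    """Walk the chain and confirm every link matches; any edit to history fails."""
    return all(log[i]["prev"] == entry_hash(log[i - 1]) for i in range(1, len(log)))

print("provenance log intact:", verify(log))
```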

Qualifications are calculated from complete authentic data. Certain Qualifications require multiple pieces of authentic data to be calculated. For instance, for me to be qualified to vote, I have to live in the voting district and be registered to vote. A Qualification that proves I can vote requires information from my proof of residence as well as information from my proof of voter registration. The proof of completeness is just proof that all of the required inputs for the Qualification’s policy-as-code script were present, valid, and non-revoked at the time the Qualification was calculated.

Qualifications are calculated using a standard policy-as-code script. The fourth principle of user sovereignty is: open and standard protocols and formats for all data. For Qualifications to have any meaning and operational value, they must be based upon standard computation over standard data. For example, what constitutes a valid age check for purchasing alcohol? First of all, only certain kinds of identification can be used (e.g. a drivers’ license); the age value must be present; and it must be equal to or greater than the age required to purchase alcohol. That one is simple and I think it will be easy for us to develop a standard alcohol age check policy-as-code script. It goes deeper than that though: before we can standardize the alcohol age check script we first need to standardize a policy-as-code scripting language. To avoid a lot of unnecessary complications, the language has to have some specific properties such as non-Turing completeness. I’m not going into the required properties of the scripting language here, but I do want to point out that I am working with several other like-minded organizations to propose such a standard in the Applied Crypto Working Group at the Decentralized Identity Foundation.
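To make the idea of a policy-as-code script tangible, here is a sketch of an alcohol age check written in Python. The real policy language being proposed is not Python, and the field names and accepted credential types below are assumptions; the sketch only illustrates the shape of such a policy: fixed inputs of standard authentic data, a few straight-line checks with no loops or recursion, and a single boolean result.

```python
from datetime import date

# Hypothetical policy parameters; a standardized policy would pin these down.
ACCEPTED_CREDENTIAL_TYPES = {"drivers_license", "passport"}
MINIMUM_AGE_YEARS = 21

def alcohol_age_check(credential: dict, today: date) -> bool:
    # Only certain kinds of identification are acceptable.
    if credential.get("type") not in ACCEPTED_CREDENTIAL_TYPES:
        return False
    # The credential must not be expired.
    if credential["expires"] < today:
        return False
    # The date of birth must be present and imply an age of at least 21.
    dob = credential.get("date_of_birth")
    if dob is None:
        return False
    age = today.year - dob.year - ((today.month, today.day) < (dob.month, dob.day))
    return age >= MINIMUM_AGE_YEARS

# The holder evaluates the policy locally over their authentic data and presents
# only the single boolean result (plus proofs of authenticity, completeness, and
# computation), never the underlying attributes themselves.
credential = {
    "type": "drivers_license",
    "expires": date(2026, 5, 1),
    "date_of_birth": date(1998, 3, 14),
}
print("qualified to purchase alcohol:", alcohol_age_check(credential, date(2021, 9, 9)))
```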

Qualifications bind together standard authentic data and standard policies into privacy preserving authorization “tokens”. Qualifications themselves are independently verifiable and a form of authentic data. Their primary goal is to break all linking over time and/or space and to prevent any possibility of correlation. By reducing the presented proof to a single yes/no value we have compressed all of the bound traits to a single observable trait that has only two possible values. If we go back to the compression/decompression perspective mentioned earlier, the process of constructing a Qualification is a form of extremely lossy compression. It takes an arbitrarily large set of authentic data as inputs and compresses all of it down to a single 1-bit value representing whether I am qualified or not. It is so lossy that no observer of the 1-bit value can — even with the world’s greatest empirical models — decompress it into predicted trait values with any degree of confidence.

Conclusion

The next time you have to qualify yourself by presenting multiple pieces of personal information (e.g. a utility bill, two forms of identification, bank statement, etc), you will remember that it doesn’t have to be like this and soon enough, it won’t be. Next time you use your web browser, you will remember that you are being fingerprinted by hundreds of linked trait observations that uniquely identify you everywhere you go on the web. Next time you read about Self-Sovereign Identity (SSI) and selective disclosure and privacy preserving BBS+ signatures, you will remember that those approaches don’t preserve privacy.

As stated in the opening quote, correlation is the primary tool used to identify and track individuals both online and in the real world. The push for digital vaccine passports will finally link online correlation with real-world access. Being able to present Qualifications that use cryptography to enforce privacy is critical for breaking correlation and preventing individualized political aggression. If I get banned for expressing a dissenting opinion on Twitter but my vaccine passport is built using Zero Architecture and presents my vaccination status as a Qualification, it is impossible for powerful people to identify and correlate me from my Qualification and deny me freedom of movement, access to food, or the ability to work.

The entire western world is scrambling to put gates in front of every physical and digital space. The intention is to use those gates to deny people access to services and resources that have always been openly available to the public. The first digital system that will utilize these gates is the digital vaccine passport. Once in place and proven to work, the vaccine passport will immediately expand to include all of the other observable and non-observable traits. Miss a payment on your car loan? You can’t buy gas today. You cracked an off-color joke online and somebody was offended? You can’t ride the bus to your university any more. Once these gates go up, they will be all but impossible to tear down peacefully. If we cannot stop the gate construction, the only way to prevent digital vaccine passports and other digital credential checks from immediately becoming a western social credit system is to use Zero Architecture and Qualifications. It is the only way forward from here.