The Evolution of Digital Identity

This piece is the first installment of “Please Allow Me to Introduce Myself: The Past, Present, and Future of Digital Identity,” a series on digital identity by Arjun Govind. The series explores the evolution of digital identity, the state of self-sovereign identity today, and its use cases.

Think back to the last time you called your bank’s customer service line, only to be asked to recall obscure facts about your elementary school to authenticate yourself. Or the scramble to see whether your personal information was compromised when a fitness app or airline you use got hacked (looking at you, MyFitnessPal and British Airways). Or the hassle of setting up yet another account on yet another e-commerce site, not knowing whether it is secure enough to store even your email address. Digital identity underpins every step we take on the internet, and yet, despite decades of evolution, problems still abound.

The notion of identity is arguably our most intrinsic characteristic: it’s our answer to “Who are you?” In the online realm, however, figuring out who is who is far more complex. Who assigns names? Who ensures that names are unique? Can we trust the entity in charge of all our IDs? Our answers to these questions have evolved along with the Internet, and in this essay I will walk through the long road of identity solutions we’ve traveled, where we are now, and where we could be headed in the future.

Identity Models of the Past

The very first days of the “internet” introduced us to the most basic model of storing identities: the humble database. ARPANET, a precursor to the Internet funded by the U.S. Department of Defense, gave each computer a numeric address. To make the network more intuitive to navigate, each computer could also be given a “hostname”, essentially a nickname that made communication easier. But who kept track of these nicknames and which numeric address each one corresponded to? That responsibility fell to the Stanford Research Institute (SRI).[1] The SRI maintained a single central database, a file simply known as hosts.txt, listing each hostname and its corresponding address. To add a new hostname to hosts.txt, users had to call the SRI (only during business hours, of course!) and have the entry added to the file by hand.
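To make the central-file model concrete, here is a minimal Python sketch of resolving a name against one hosts.txt-style table. It assumes a simplified “address hostname” line format, similar to a modern /etc/hosts file, rather than the original ARPANET syntax, and the hostnames and addresses are made up for illustration.

```python
# A toy model of the ARPANET-era approach: one central text file mapping
# hostnames to numeric addresses. ASSUMPTION: a simplified "address hostname"
# format like a modern /etc/hosts, not the original file's syntax.
# All hostnames and addresses here are hypothetical.

HOSTS_TXT = """
# address     hostname
10.0.0.73     SRI-NIC
10.1.0.6      UCLA-TEST
10.3.0.11     MIT-AI
"""


def parse_hosts(text: str) -> dict[str, str]:
    """Build a hostname -> address table from the central file."""
    table: dict[str, str] = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and blank lines
        if not line:
            continue
        address, hostname = line.split()
        table[hostname.upper()] = address  # treat names case-insensitively
    return table


def lookup(table: dict[str, str], hostname: str) -> str:
    """Resolve a hostname; anything the maintainer hasn't added is unknown."""
    try:
        return table[hostname.upper()]
    except KeyError:
        raise LookupError(f"{hostname} is not in hosts.txt") from None


hosts = parse_hosts(HOSTS_TXT)
print(lookup(hosts, "mit-ai"))  # 10.3.0.11
```

The bottleneck is built into the design: every new machine requires someone to edit this one file, and every host needs a fresh copy before it can resolve the new name.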

Clearly, this solution doesn’t scale. It may have been workable in the ’70s, when there weren’t many computers on the network, but the hassle of maintaining the file by hand meant that people quickly began searching for an automated alternative. Moreover, imagine the chaos if someone at the SRI made a typo while noting a hostname or its corresponding address!

From this search, the Domain Name System (DNS) was born. DNS is part of the Internet protocol suite, often called TCP/IP after two of its fundamental protocols. The premise behind it is that navigating the internet by IP address is cumbersome and unintuitive; a much better system is to assign human-readable names, just as was done for hostnames earlier. This lets us reach R3’s website simply by going to https://r3.com instead of typing a convoluted IP address. Unlike the previous approach, which had to be updated by hand, DNS runs on a distributed, hierarchical database in which no single organization holds every record. The top-level authority overseeing domain names is the Internet Corporation for Assigned Names and Numbers (ICANN), a non-profit. DNS thus served as the first step towards a more robust digital identity.
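The key difference from hosts.txt is that no single party holds the whole table. Below is a deliberately simplified Python model of iterative resolution: the root knows only which server handles .com, that server knows only which server handles r3.com, and only that last server knows the address. The server names and the address (taken from the 192.0.2.0/24 range reserved for documentation) are hypothetical, and real DNS adds caching, many record types, and redundant servers that this sketch omits.

```python
# A toy model of DNS delegation. ASSUMPTIONS: server names and the address are
# hypothetical; real resolvers also cache answers, handle many record types,
# and query several redundant servers.

# Each "server" holds only its own zone: either a referral to another server
# (a delegation) or a final answer (an address record).
SERVERS = {
    "root": {"com.": ("referral", "com-server")},
    "com-server": {"r3.com.": ("referral", "r3-server")},
    "r3-server": {"r3.com.": ("address", "192.0.2.10")},
}


def resolve(name: str, start: str = "root") -> tuple[str, list[str]]:
    """Follow referrals from the root until some server returns an address.

    Returns the address and the list of servers consulted along the way.
    """
    labels = name.rstrip(".").split(".")
    server, path = start, []
    while True:
        path.append(server)
        zone = SERVERS[server]
        # Try the most specific suffix first, e.g. "r3.com." before "com.".
        for i in range(len(labels)):
            suffix = ".".join(labels[i:]) + "."
            if suffix in zone:
                kind, value = zone[suffix]
                break
        else:
            raise LookupError(f"{server} has no record for {name}")
        if kind == "address":
            return value, path
        server = value  # delegated: ask the next server down the hierarchy


address, path = resolve("r3.com")
print(address, path)  # 192.0.2.10 ['root', 'com-server', 'r3-server']
```

Because each zone can be run by a different organization, adding a name only requires updating the relevant server, not a single global file.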

And now, as with most tech discussions nowadays, we must take a detour to talk about public-key cryptography.

When you were first introduced to cryptography, you were probably given a metaphor involving a lock and a key: I lock a message with my key, and if you have a copy of the same key, you can open the lock. Public-key cryptography is a little more complicated, because there are two linked keys: a public key and a private key. A message encrypted with the public key can only be decrypted with the matching private key, and vice versa. While the two keys work hand in hand, the private key cannot feasibly be deduced from the public key, which allows us to distribute the public key freely. For encryption, this means anyone can encrypt a message using your public key, and only you, the holder of the private key, can decrypt it.

Another application is to do the reverse. On the face of it, encrypting something with a private key makes little sense. After all, anyone with the public key (which is readily available) can decrypt the message, right? While that is true, “encrypting” a message with a private key serves another purpose: it acts as a digital signature. Only the holder of the private key (you) could have produced something that decrypts correctly with your public key, so anyone can confirm that the message came from you and was not altered. (In practice, signature schemes sign a cryptographic hash of the message, and only some, like RSA, literally resemble encryption, but the intuition carries over.) Applications abound, from signing contracts to financial services.
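The signing idea can be sketched with textbook RSA, where signing really is “encryption” with the private exponent. The Python below is insecure on purpose: it uses small, publicly known Mersenne primes (2^31 - 1 and 2^61 - 1) and no padding, so treat it as an illustration of the arithmetic, not a usable scheme. As real systems do, it signs a SHA-256 hash of the message rather than the message itself; for real use, rely on a vetted cryptography library.

```python
# Textbook RSA signatures. INSECURE on purpose: the primes are small, publicly
# known Mersenne primes and there is no padding. The point is only to show
# "sign with the private key, verify with the public key".
import hashlib

# Key generation.
p, q = 2**31 - 1, 2**61 - 1        # known primes: far too small and public for real use
n = p * q                          # modulus, shared by both keys
e = 65537                          # public exponent
d = pow(e, -1, (p - 1) * (q - 1))  # private exponent via modular inverse (Python 3.8+)

public_key = (n, e)
private_key = (n, d)


def digest(message: bytes, n: int) -> int:
    """Hash the message with SHA-256 and reduce it into the range [0, n)."""
    return int.from_bytes(hashlib.sha256(message).digest(), "big") % n


def sign(message: bytes, private_key: tuple[int, int]) -> int:
    """'Encrypt' the digest with the private exponent d."""
    n, d = private_key
    return pow(digest(message, n), d, n)


def verify(message: bytes, signature: int, public_key: tuple[int, int]) -> bool:
    """'Decrypt' with the public exponent e and compare with a fresh digest."""
    n, e = public_key
    return pow(signature, e, n) == digest(message, n)


msg = b"I agree to the contract."
sig = sign(msg, private_key)
print(verify(msg, sig, public_key))                     # True
print(verify(b"I agree to nothing.", sig, public_key))  # False
```

Verifying needs only `public_key`, while producing a signature that passes requires the private exponent `d`; changing even one word of the message makes the old signature fail.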

We can take digital signatures one step further. On its own, a signature only proves that a message was signed by whoever holds a particular private key; it says nothing about who that person actually is. To link a public key to a real identity, we need a Public Key Infrastructure (PKI). A common approach to PKI is to have a third-party Certificate Authority (CA) validate and certify the relationship between a public key and an actual person or organization. The implicit assumption, however, is that the CA can be trusted: if the person verifying the identity (or the key owner) doesn’t trust the CA, the model breaks down.
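A certificate is, at heart, a CA’s signature over the statement “this public key belongs to this name.” The Python sketch below models that with textbook-RSA arithmetic that is deliberately insecure (small, public Mersenne primes, no padding). The subject name `alice@example.com`, the subject key values, and the `Certificate` layout are hypothetical; real certificates, such as X.509 certificates, also carry expiry dates, issuer chains, and much more.

```python
# A toy certificate authority. INSECURE textbook RSA with small, publicly known
# primes; the names, keys, and certificate layout are hypothetical.
import hashlib
from dataclasses import dataclass

# The CA's key pair. Verifiers are assumed to already trust CA_PUBLIC.
_P, _Q = 2**31 - 1, 2**61 - 1
CA_N, CA_E = _P * _Q, 65537
CA_D = pow(CA_E, -1, (_P - 1) * (_Q - 1))
CA_PUBLIC = (CA_N, CA_E)


def digest(data: bytes, n: int) -> int:
    """SHA-256 of the data, reduced into the range [0, n)."""
    return int.from_bytes(hashlib.sha256(data).digest(), "big") % n


@dataclass(frozen=True)
class Certificate:
    subject: str                  # the real-world identity being vouched for
    subject_key: tuple[int, int]  # that identity's public key (n, e)
    signature: int = 0            # the CA's signature over subject + subject_key

    def payload(self) -> bytes:
        """The bytes the CA signs: the name bound to the key."""
        n, e = self.subject_key
        return f"{self.subject}|{n}|{e}".encode()


def issue(subject: str, subject_key: tuple[int, int]) -> Certificate:
    """Sign the name-to-key binding (identity checks happen out of band)."""
    unsigned = Certificate(subject, subject_key)
    signature = pow(digest(unsigned.payload(), CA_N), CA_D, CA_N)
    return Certificate(subject, subject_key, signature)


def check(cert: Certificate, trusted_ca: tuple[int, int]) -> bool:
    """Accept the binding only if the trusted CA's signature verifies."""
    n, e = trusted_ca
    return pow(cert.signature, e, n) == digest(cert.payload(), n)


alice_cert = issue("alice@example.com", (3233, 17))  # hypothetical subject key
print(check(alice_cert, CA_PUBLIC))  # True

# An attacker who swaps in their own key cannot re-sign without CA_D.
forged = Certificate("alice@example.com", (3127, 3), alice_cert.signature)
print(check(forged, CA_PUBLIC))  # False
```

Note that `check` never contacts Alice: it accepts the binding purely because it trusts `CA_PUBLIC`, which is exactly the assumption the paragraph above calls out.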

The digital signature and CA system worked, but it concentrated a great deal of trust in a small number of CAs. That concentration led people to look for alternative, decentralized solutions. The most notable example is Pretty Good Privacy (PGP), an attempt to use public-key cryptography to secure email. The idea behind PGP was to have people vouch for the keys of people they knew or whose identity they had verified, creating a peer-to-peer “Web of Trust”, as discussed further in this paper. In principle, this removes the need for a central certificate authority to link public keys to real identities, but it has its drawbacks, and adoption never took off on a large scale. Most significantly, email is highly centralized in practice: most mailboxes sit under the control of the institutions offering the service (Google, Yahoo, and the like). This, coupled with clunky, hard-to-use software, meant that the only real examples of web-of-trust models come from highly technical, niche domains, such as the Debian operating system.
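The web-of-trust idea amounts to a graph search. In the simplified Python model below, an edge means “this person verified and signed that person’s key,” and a key counts as trustworthy if a chain of signatures from my own key reaches it within a few hops. This is an assumption for illustration: real PGP implementations such as GnuPG also weigh how much you trust each person as an introducer, and all names here are hypothetical.

```python
# A simplified web of trust. ASSUMPTION: a key is treated as valid if a chain
# of signatures starting from my own key reaches it within MAX_HOPS. Real
# PGP/GnuPG also weighs how much you trust each introducer. Names are hypothetical.
from collections import deque
from typing import Optional

MAX_HOPS = 3

# signer -> people whose keys that signer has personally verified and signed
SIGNATURES = {
    "me": {"alice", "bob"},
    "alice": {"carol"},
    "bob": {"carol", "dave"},
    "carol": {"erin"},
    "erin": {"frank"},
}


def trust_path(start: str, target: str, max_hops: int = MAX_HOPS) -> Optional[list[str]]:
    """Breadth-first search for the shortest signature chain from start to target."""
    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        if path[-1] == target:
            return path
        if len(path) - 1 == max_hops:
            continue  # chain is already as long as we are willing to accept
        for signed in sorted(SIGNATURES.get(path[-1], ())):
            if signed not in seen:
                seen.add(signed)
                queue.append(path + [signed])
    return None


print(trust_path("me", "erin"))   # ['me', 'alice', 'carol', 'erin']
print(trust_path("me", "frank"))  # None (four hops exceeds MAX_HOPS)
```

With no central authority, reachability depends entirely on how densely people have signed each other’s keys; in a sparse graph most keys are simply out of reach.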

Now that we have reviewed a range of historical approaches to digital identity, let’s turn to some popular options today.

Identity Models Today

In this section, I explore the identity models in use today, broadly dividing them into siloed and federated identity solutions.

A sad reality today is that one of the most common approaches to digital identity still resembles the central database we saw in ARPANET, except that each service now keeps its own silo: massive, clunky databases holding customer identities and information. Consider the last time you bought something on a small e-commerce site and it asked you to set up a username and password. Your information is probably stored in one such database. What’s the problem here?

Unfortunately, your online identity is only as strong as its weakest link. It takes only one breach to expose information like your payment details and address (continuing the e-commerce example). The more accounts you have, the more likely it is that at least one of them will be compromised, putting your identity at risk. This seemingly poor method is so common because it is easy to implement and lets firms gather data on their users. However, more and more firms are realizing that extensive user data can be as much a liability as an asset: the more customer data a firm holds, the more attractive a target it is for hackers, and a breach can badly damage the trust users place in the app.
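The “more accounts, more risk” point can be put in numbers. Under the simplifying assumption that each account is breached independently with the same probability p, the chance that at least one of k accounts is breached is 1 - (1 - p)^k. The 5% figure below is purely illustrative, not a measured breach rate.

```python
# Probability that at least one of k accounts is breached, ASSUMING each is
# breached independently with the same probability p (a deliberate simplification).
def prob_any_breach(p: float, k: int) -> float:
    return 1 - (1 - p) ** k


# Illustrative only: p = 0.05 is a made-up per-account breach probability.
for k in (1, 5, 20):
    print(k, round(prob_any_breach(0.05, k), 2))
# 1 0.05
# 5 0.23
# 20 0.64
```

Even a modest per-account risk compounds quickly as accounts multiply, which is the weakest-link problem in a single formula.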

Perhaps the most substantial drawback is that users in this model need to keep track of a slew of logins and passwords, which badly hurts convenience. An important takeaway from the PGP example, one that academics have reinforced countless times, is that convenience is paramount in consumer security products. A better solution would be a single identity that can be used in many places, which brings us to our next model.

If you’ve ever encountered a “Log In With Google” page, then you’ve used a federated identity solution. The main goal of federated identity (FID) is to avoid the duplication of identities that happens in the siloed model, thereby reducing the number of weak links that could be compromised. While this is undeniably convenient, it brings its own challenges. For starters, federated identity requires placing a substantial amount of trust in the identity provider (IDP), such as Google in the example above. A common criticism is that a change in Google’s Terms of Service (for instance, the imposition of geographical restrictions on use) could lock you out of every linked account.

The Future of Digital Identity: Self-Sovereign Identity

While I certainly don’t have a crystal ball, I believe the next evolution of digital identity is Self-Sovereign Identity (SSI). For the uninitiated, SSI is an identity solution that most commonly relies on credentials held in a mobile wallet, much as cryptocurrencies are held in a wallet. These credentials, issued by the same organizations that vouch for us today, such as governments, can then be presented to anyone who needs to verify a claim about you, say a bar that wants to check whether you are of legal drinking age.

While the mechanics of SSI are a topic for another blog post, I thought it worthwhile to conclude with a seminal post by Christopher Allen in which he lays out the “Ten Principles of Self-Sovereign Identity”.[2]

  1. Existence. Users must have an independent existence.
  2. Control. Users must control their identities.
  3. Access. Users must have access to their own data.
  4. Transparency. Systems and algorithms must be transparent.
  5. Persistence. Identities must be long-lived.
  6. Portability. Information and services about identity must be transportable.
  7. Interoperability. Identities should be as widely usable as possible.
  8. Consent. Users must agree to the use of their identity.
  9. Minimalization. Disclosure of claims must be minimized.
  10. Protection. The rights of users must be protected.

Several of these tenets, especially control, consent, and transparency, are largely missing from today’s identity approaches. SSI could give users greater security and control over their online identities and make everything from proving your age at a restaurant to completing Know Your Customer (KYC) requirements more convenient.

Digital identity has come a long way from manually updated .txt files, driven by great innovations like DNS and public-key cryptography. In the process, however, identities have become either duplicated ad nauseam or deeply centralized. Earlier on, trust was concentrated in Certificate Authorities; nowadays, it has migrated to social networks and similar IDPs. Self-Sovereign Identity has the potential to fundamentally alter the way we view identity, eliminating the need to trust a single entity and returning control to the user.

[1] https://ns1.com/resources/dns-protocol

[2] http://www.lifewithalacrity.com/2016/04/the-path-to-self-soverereign-identity.html