Session is designed to be privacy-first (decentralised servers, no phone numbers, the whole lot), but that creates some problems. Like, how do you let users exchange encryption keys and establish a chat ‘session’ (wink, wink) as quickly and smoothly as possible?
Session was originally a fork of Signal, but we wanted to address some of Signal’s inherent weaknesses. When you use Signal or WhatsApp, your phone number is used to identify you. It’s how your friends know that it’s really you when they send you a message.
When you register your phone number on Signal, you upload several pre-key bundles to Signal’s servers. Once someone adds you as a friend, the server can then hand out your keys, even if you’re offline. The person adding you can then immediately establish an encrypted chat ‘session’. Using phone numbers and central servers makes exchanging encryption keys (and communication) super easy, but it compromises user privacy. For example, your phone number can be hijacked through a SIM swap, or your provider can reassign it to another person or account.
We didn’t want to make this compromise — user privacy is our core mission.
Where centralised messaging apps like Signal and WhatsApp collect, store and distribute keys when users receive friend requests, Session uses decentralised servers, so we needed to find a way to do this asynchronously.
MISSION: To design a ‘friend request’ system without using phone numbers, in order to establish a Signal protocol session in an asynchronous network.
Signal’s friend request system uses something called the Diffie-Hellman Key Exchange protocol — and so does Session’s, but we had to make some modifications.
The Diffie-Hellman protocol allows two people to generate a shared secret. This shared secret is made by mixing keys together: a sprinkle of your secret key, a dash of your partner’s secret key, and a final splash of an agreed public key.
Here’s how it works, step by step:
- Alice has Secret Key A and Bob has Secret Key B.
- Alice and Bob both agree to use Public Key C.
- Alice mixes A and C (AC) while Bob mixes B and C (BC).
- They share these results with each other.
- Alice then mixes A with BC (ABC) and Bob mixes B with AC (ABC) so they both come to the final key.
Note: Anybody listening to the communications would see C, AC and BC, but without A or B on its own, they cannot get to the final key, ABC.
Basically, Diffie-Hellman allows you to create a new secret key together. Encrypting with that shared key allows Alice and Bob to decrypt and read their messages easily without anyone else snooping.
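The mixing steps above can be sketched with a few lines of Python. This is a toy illustration only: the prime 23, the base 5 and the two secret values are textbook-sized numbers assumed here for readability, and the function names are made up for this example. Real messengers use elliptic-curve keys that are far too large to guess.

```python
# Toy Diffie-Hellman with textbook-sized numbers (assumed for illustration only).
P = 23  # public prime modulus, part of "C"
G = 5   # public base, part of "C"

def public_value(secret):
    """Mix a secret key with the public parameters: G^secret mod P."""
    return pow(G, secret, P)

def final_key(my_secret, their_public_value):
    """Mix my secret key with the value the other side sent me."""
    return pow(their_public_value, my_secret, P)

alice_secret = 6   # Secret Key A, never leaves Alice's device
bob_secret = 15    # Secret Key B, never leaves Bob's device

ac = public_value(alice_secret)  # Alice sends this to Bob
bc = public_value(bob_secret)    # Bob sends this to Alice

alice_key = final_key(alice_secret, bc)  # "ABC" computed on Alice's side
bob_key = final_key(bob_secret, ac)      # "ABC" computed on Bob's side

print(ac, bc, alice_key, bob_key)  # prints: 8 19 2 2
```

An eavesdropper sees 23, 5, 8 and 19, but recovering 6 or 15 from them is the hard part once the numbers are realistically large.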
Signal uses a protocol called Extended Triple Diffie-Hellman (X3DH), in which several of these Diffie-Hellman exchanges are combined. The public keys needed for them are bundled up into something called a ‘pre-key bundle’. People in conversation can use these pre-key bundles to mutually authenticate each other.
When you use Signal, you will generate a large number of these pre-key bundles and store them on Signal servers. This way, if somebody wants to try and message you, they can immediately initiate a session, because Signal can provide a pre-key bundle.
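To make the role of the central server concrete, here is a minimal in-memory sketch of a pre-key store. The class and method names (`PreKeyDirectory`, `upload`, `fetch`) are hypothetical and the bundles are placeholder bytes. It is not Signal’s server API, just the general idea of handing out a stored bundle while its owner is offline.

```python
from collections import defaultdict, deque

class PreKeyDirectory:
    """Hypothetical central store that hands out pre-key bundles on request."""

    def __init__(self):
        self._bundles = defaultdict(deque)

    def upload(self, user_id, bundles):
        """Called by a user while online: store a batch of bundles."""
        self._bundles[user_id].extend(bundles)

    def fetch(self, user_id):
        """Called by a contact: take one stored bundle, or None if none are left."""
        queue = self._bundles.get(user_id)
        if not queue:
            return None
        return queue.popleft()

directory = PreKeyDirectory()
directory.upload("bob", [b"bundle-1", b"bundle-2"])  # Bob uploads, then goes offline

first = directory.fetch("bob")     # Alice can start a session straight away
second = directory.fetch("bob")
third = directory.fetch("bob")     # nothing left until Bob uploads more
```

Everything hinges on that always-available store, which is exactly the piece a decentralised network does not have.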
But, because Session is decentralised, it doesn’t have servers for this kind of storage, which is extremely tricky to get around. Basically, this means there’s no way for a sender to know that the receiver has got their message until they respond. Establishing a session requires certain encryption primitives, so you need to be certain that both parties have the correct keys (and enough of them!) otherwise they won’t be able to read each other’s messages.
When you send a friend request in Session, you’re not just sending one Diffie-Hellman key: you’re sending four. Using ephemeral keys (one-time-use public keys, designed to be deleted once they’ve been used), you can make pre-keys that allow you to continue sending messages once the session has started.
These four keys are then bundled together using a key derivation function, forging one key to rule them all.
Finally, you can send this single pre-key bundle (the one to rule them all) as a friend request.
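As a rough sketch of what ‘bundling with a key derivation function’ can look like, the snippet below feeds four secrets through HKDF with SHA-256 (RFC 5869), built from Python’s standard `hmac` and `hashlib` modules. The choice of HKDF, the all-zero salt, the `info` label and the four placeholder inputs are all assumptions for illustration; the text above does not say which function or inputs Session actually uses.

```python
import hashlib
import hmac

def hkdf_sha256(input_key_material, salt, info, length=32):
    """HKDF (RFC 5869) with SHA-256: extract a pseudorandom key, then expand it."""
    prk = hmac.new(salt, input_key_material, hashlib.sha256).digest()
    output = b""
    block = b""
    counter = 1
    while len(output) < length:
        block = hmac.new(prk, block + info + bytes([counter]), hashlib.sha256).digest()
        output += block
        counter += 1
    return output[:length]

# Placeholder stand-ins for the four Diffie-Hellman results (assumed, not real keys).
four_secrets = [hashlib.sha256(label).digest() for label in (b"dh1", b"dh2", b"dh3", b"dh4")]

# Concatenate the four inputs and derive a single 32-byte key from them.
session_key = hkdf_sha256(
    b"".join(four_secrets),
    salt=bytes(32),                  # assumed all-zero salt
    info=b"example-friend-request",  # hypothetical context label
)
```

Changing any one of the four inputs produces a completely different output key, which is why both sides must hold exactly the same set.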
A few other messages need to be sent before all parties agree that the session has begun, and also agree on its parameters. To better understand this, let’s go back to Alice and Bob:
- Alice generates a pre-key bundle and sends it to Bob as a friend request.
- Bob generates his own pre-key bundle and mixes it with Alice’s, initiating a session on Bob’s end.
- Bob then sends an empty message back to Alice with his pre-key bundle attached in the background. Note: Public keys are being sent in this step, but the message is still encrypted.
- Alice receives Bob’s keys, and mixes them with the ones she sent him earlier, initiating the session on Alice’s end. Note: Alice knows the session is valid IF she can decrypt Bob’s empty message.
- Alice confirms that the session is established on her end by sending an empty message back to Bob encrypted as a standard protocol message.
- Once Bob receives this message, he knows the session has been established.
The intricacies we needed to work out involved modelling the state of the exchange on both ends, and being able to smoothly recover any lost messages — without sacrificing on UX.
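One way to picture ‘modelling the state of the exchange on both ends’ is a small state machine for the Alice and Bob walkthrough. The state and message names below are hypothetical, no real cryptography happens, and retries for lost messages are left out. It only tracks who believes what at each step.

```python
from enum import Enum, auto

class State(Enum):
    NONE = auto()                   # no friend request sent or received yet
    REQUEST_SENT = auto()           # Alice after sending her pre-key bundle
    AWAITING_CONFIRMATION = auto()  # Bob after replying with his bundle
    ESTABLISHED = auto()            # this side considers the session valid

class Message(Enum):
    FRIEND_REQUEST = auto()  # carries the sender's pre-key bundle
    BUNDLE_REPLY = auto()    # empty message with the responder's bundle attached
    CONFIRMATION = auto()    # empty standard protocol message

# (current state, incoming message) -> (next state, reply to send or None)
TRANSITIONS = {
    (State.NONE, Message.FRIEND_REQUEST): (State.AWAITING_CONFIRMATION, Message.BUNDLE_REPLY),
    (State.REQUEST_SENT, Message.BUNDLE_REPLY): (State.ESTABLISHED, Message.CONFIRMATION),
    (State.AWAITING_CONFIRMATION, Message.CONFIRMATION): (State.ESTABLISHED, None),
}

class Peer:
    def __init__(self, name):
        self.name = name
        self.state = State.NONE

    def send_friend_request(self):
        if self.state is not State.NONE:
            raise RuntimeError(f"{self.name} already has an exchange in progress")
        self.state = State.REQUEST_SENT
        return Message.FRIEND_REQUEST

    def receive(self, message):
        """Advance the state; return the reply to send, or None."""
        key = (self.state, message)
        if key not in TRANSITIONS:
            return None  # unexpected or duplicate message: ignore it
        self.state, reply = TRANSITIONS[key]
        return reply

alice, bob = Peer("Alice"), Peer("Bob")
request = alice.send_friend_request()
reply = bob.receive(request)          # Bob starts the session on his end
confirmation = alice.receive(reply)   # Alice's side is now established
bob.receive(confirmation)             # Bob's side is now established
```

Writing the transitions as a table makes edge cases easier to spot: any (state, message) pair that is missing from it is a situation the design still has to decide how to handle.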
Contrast this with Signal’s protocol. When you send someone a friend request, they have already prepared a bunch of pre-key bundles, which are stored on Signal’s servers. So you never need to contact your new friend directly; you just need to communicate with Signal’s servers.
Session doesn’t have this luxury. Its decentralised design means you can’t have a bunch of pre-key bundles ready to go — they have to be created and sent peer to peer when you establish a chat session.
It wasn’t easy. We came back to this problem three or four times over several months, and it took a few iterations. We’d think we’d reached a solution, then a month later we’d discover an ‘edge case’, and need to keep revising.
This sounds exhausting, but the truth is that our solutions became more ironed out: they just got better and better.