Understanding Privacy on the Blockchain
By its very nature, a blockchain is transparent. This transparency is a feature, but it can quickly become a problem.
For example, suppose a ride-hailing platform runs on the blockchain, and every ride you order and every payment you make is recorded on-chain. Many people would be very uncomfortable having their location history available for anyone to view. This is where privacy on the blockchain becomes very important.
In this article, we will discuss a few ways privacy is achieved: hashing, Merkle trees, and zero-knowledge proofs.
A cryptographic hash is the output of a cryptographic hash function. A hash function is a mathematical calculation that, when applied to data of any size, always produces an output of the same fixed length.
Hashing involves two parts:
- The pre-image
- The hash value
The pre-image is the input to the hash function: the data we are trying to hash. For example, let us say we want to upload to a blockchain the text “This is a major secret. Nobody except the people with this hash must know”. This text becomes the pre-image that gets hashed.
The hash value is the result of the calculation: the hash that the function outputs.
Hashes are considered the fingerprint of the blockchain because they are deterministic: as long as the input stays the same, you’ll get the same hash.
It is important to note that there are different hashing algorithms and each blockchain chooses its own. For example, Bitcoin uses the SHA-256 hashing algorithm while Ethereum uses the Keccak-256 hashing algorithm.
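To make the fixed-length and deterministic properties concrete, here is a minimal sketch using Python's standard `hashlib` module. It uses SHA-256 only; as far as I know, `hashlib` does not ship Ethereum's Keccak-256 (its `sha3_256` is the later standardised variant and gives different outputs). The helper name `sha256_hex` is made up for this example.

```python
import hashlib


def sha256_hex(pre_image: str) -> str:
    """Return the SHA-256 hash of a text pre-image as a hex string."""
    return hashlib.sha256(pre_image.encode("utf-8")).hexdigest()


secret = "This is a major secret. Nobody except the people with this hash must know"

short_hash = sha256_hex("hi")
long_hash = sha256_hex(secret)

# Fixed length: both digests are 64 hex characters (256 bits),
# no matter how long the pre-image is.
print(len(short_hash), len(long_hash))  # 64 64

# Deterministic: hashing the same pre-image again gives the same hash.
print(sha256_hex(secret) == long_hash)  # True

# Changing the pre-image, even by one character, gives a different hash.
print(sha256_hex(secret + "!") == long_hash)  # False
```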
The major drawback of hashing is that the data isn’t truly private. If someone has the hash for a transaction, they can easily look it up on a blockchain explorer. This means that you need to keep your hash a secret if you don’t want anyone to discover the transaction.
Another issue is that even though people may not be able to see the details, a hash is generated every time you make a transaction, so anyone can tell that you made one. Say you took 5 rides in a day: a stalker may not know where you went, but they know you generated 5 hashes that day.
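The ride example can be sketched in a few lines. The ledger layout, the address strings, and the date below are all invented for illustration; the point is only that an observer can count hashes per sender without ever seeing a pre-image.

```python
import hashlib
from collections import Counter

# Hypothetical public ledger: each entry is (sender, date, hash of the ride details).
rides = ["home -> office", "office -> gym", "gym -> cafe", "cafe -> office", "office -> home"]
ledger = [
    ("0xRIDER", "2024-01-01", hashlib.sha256(f"{i}:{ride}".encode()).hexdigest())
    for i, ride in enumerate(rides)
]
ledger.append(("0xSOMEONE_ELSE", "2024-01-01", hashlib.sha256(b"other ride").hexdigest()))

# The observer never sees the ride details, only who posted a hash and when.
activity = Counter((sender, day) for sender, day, _hash in ledger)
print(activity[("0xRIDER", "2024-01-01")])  # 5
```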
One way to reduce this problem is a structure known as the Merkle tree.
With regular hashing we create a hash for each individual transaction. With a Merkle tree, we combine multiple pre-images and end up with one single hash.
Let us assume we have 8 pre-images that we want hashed. Each pre-image is hashed, and each hash is then combined with its nearest neighbour and hashed again, which leaves 4 hashes. Each of these 4 hashes is combined with its nearest neighbour in the same way, which leaves 2 hashes. Finally, the last two hashes are combined to form a final hash known as the Merkle root.
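The 8-to-4-to-2-to-1 process above can be sketched as follows. This is a simplified illustration, assuming SHA-256 and a number of pre-images that is a power of two; real blockchains differ in details such as how they handle an odd number of leaves. The names `h` and `merkle_root` are made up for this example.

```python
import hashlib


def h(data: bytes) -> bytes:
    """Hash helper (SHA-256 is an assumption; any hash function would do)."""
    return hashlib.sha256(data).digest()


def merkle_root(pre_images: list[bytes]) -> bytes:
    """Hash every pre-image, then keep hashing neighbouring pairs until one hash is left."""
    count = len(pre_images)
    if count == 0 or count & (count - 1) != 0:
        raise ValueError("this sketch expects a power-of-two number of pre-images")
    level = [h(p) for p in pre_images]  # 8 hashes
    while len(level) > 1:
        # Combine each hash with its nearest neighbour: 8 -> 4 -> 2 -> 1.
        level = [h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
    return level[0]


pre_images = [f"ride {i}".encode() for i in range(8)]
root = merkle_root(pre_images)
print(root.hex())
```

Only `root` needs to go on-chain, which is what hides how many individual pre-images sit underneath it.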
When you want to verify that a particular pre-image is part of a Merkle tree, you do not need the whole tree. All you need are the neighbouring (sibling) hashes along the path from that pre-image up to the Merkle root, and this set of hashes is known as a Merkle proof. For a tree with 8 pre-images, that is just 3 hashes.
Think of the Merkle tree as a nest: you can’t really decipher the exact data I uploaded because it has been covered up by multiple layers of hashes. The Merkle tree addresses some of the most glaring issues with regular hashing.
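Here is a minimal sketch of building and checking a Merkle proof for one pre-image out of 8. It assumes SHA-256 and a power-of-two number of pre-images, and the function names are my own. The verifier only receives the pre-image, 3 sibling hashes, and the root.

```python
import hashlib


def h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def build_levels(pre_images):
    """Return every level of the tree, from the leaf hashes up to the root."""
    levels = [[h(p) for p in pre_images]]
    while len(levels[-1]) > 1:
        prev = levels[-1]
        levels.append([h(prev[i] + prev[i + 1]) for i in range(0, len(prev), 2)])
    return levels


def merkle_proof(levels, index):
    """Collect the sibling hash at each level on the path from leaf `index` to the root."""
    proof = []
    for level in levels[:-1]:
        sibling = index ^ 1  # neighbour in the same pair
        proof.append((level[sibling], sibling < index))  # (hash, sibling is on the left?)
        index //= 2
    return proof


def verify_proof(pre_image, proof, root):
    """Recompute the path to the root using only the pre-image and the sibling hashes."""
    current = h(pre_image)
    for sibling, sibling_is_left in proof:
        current = h(sibling + current) if sibling_is_left else h(current + sibling)
    return current == root


pre_images = [f"ride {i}".encode() for i in range(8)]
levels = build_levels(pre_images)
root = levels[-1][0]

proof = merkle_proof(levels, 5)
print(len(proof))                                 # 3
print(verify_proof(pre_images[5], proof, root))   # True
print(verify_proof(b"forged ride", proof, root))  # False
```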
Zero-knowledge, often referred to as “zk”, is a cryptographic technique that powers some of the most advanced Layer-2 (L2) solutions for improving the privacy and scalability of Layer-1 blockchains.
To understand what zero-knowledge is, let us use an illustration. Journalists oftentimes say that they have been “reliably informed” that a certain thing happened but do not divulge the informant. Zero-knowledge is similar in spirit, with one important difference: instead of asking to be trusted, you prove that a statement is true, or that you hold certain data, without revealing the underlying information itself.
There are two types of Zero-knowledge proofs:
- Interactive Zero-Knowledge: As the name suggests, this involves an interaction between two parties. For example, I can claim that I know how to predict heads or tails whenever you flip a coin. The proof involves you flipping a coin while I guess correctly. This style is very conversational, and a single round leaves a lot of doubt: “what if it was beginner’s luck? what if I rigged the coin?” Repeating the round many times makes luck less and less likely, but a third party watching might still think we rehearsed the performance. This lack of transferable trust, plus the fact that a blockchain has no practical way to hold a back-and-forth conversation, is why interactive zero-knowledge proofs are not what blockchains use.
- Non-Interactive Zero-Knowledge: These are the ones currently being used on the blockchain. They require no interaction: the prover publishes a single proof that anyone can check. For example, suppose you know that between us there are 4 apples, 4 oranges, and 2 pineapples. I say I have all the pineapples in my hands, but I don’t want to show you the pineapples. I can instead display the 4 apples and 4 oranges for anyone to confirm. There are two main types: zk-SNARK and zk-STARK.
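To give a feel for what a non-interactive proof looks like in code, here is a toy sketch of a Schnorr proof made non-interactive by deriving the challenge from a hash (the Fiat-Shamir approach). This is a much simpler construction than a zk-SNARK or zk-STARK, and it is not something this article's rollups use. The tiny numbers `P`, `Q` and `G` are chosen so the maths is easy to follow and are nowhere near secure. The prover shows they know a secret number behind a public value without revealing the secret, and nobody has to send them a challenge.

```python
import hashlib
import secrets

# Toy group parameters (insecure, for illustration only):
# P = 2*Q + 1 with both prime, and G generates the subgroup of order Q.
P, Q, G = 1019, 509, 4


def challenge(*values) -> int:
    """Derive the challenge from a hash instead of asking a live verifier."""
    data = "|".join(str(v) for v in values).encode()
    return int.from_bytes(hashlib.sha256(data).digest(), "big") % Q


def prove(secret: int):
    """Prove knowledge of `secret` such that public == G**secret mod P."""
    public = pow(G, secret, P)
    r = secrets.randbelow(Q)               # one-time random nonce
    commitment = pow(G, r, P)
    c = challenge(G, public, commitment)
    response = (r + c * secret) % Q        # blends the secret with the nonce
    return public, commitment, response


def verify(public: int, commitment: int, response: int) -> bool:
    """Anyone can run this check; it never sees the secret."""
    c = challenge(G, public, commitment)
    return pow(G, response, P) == (commitment * pow(public, c, P)) % P


public, commitment, response = prove(secret=123)
print(verify(public, commitment, response))            # True
print(verify(public, commitment, (response + 1) % Q))  # False
```

The proof is just the pair `(commitment, response)`, which can be posted once and checked by anyone later. That is the "no interaction" property described above.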
zk-SNARK stands for Zero-Knowledge Succinct Non-Interactive Argument of Knowledge:
S: Succinct. This means that the size of the proof generated is small in relation to the size of the data. This is very important on the blockchain, as storing a large amount of data can become expensive.
N: Non-interactive. This has already been explained.
AR: Argument. Think of the argument as the “points” you are raising to prove that your assertions are correct. In the example above, the argument I used was that I had displayed ALL the other fruits, so whatever is left is definitely the pineapples.
K: Knowledge. This is the secret information you are proving you hold, without revealing it.
zk-SNARK is the more widely used zk proof, as it has been around for longer than zk-STARK.
To understand how zk-SNARKs work, we need to understand how zero-knowledge rollups work.
A zero-knowledge rollup batches many transactions and sends a single transaction to the blockchain. This means far less data has to be stored on the blockchain, which keeps transactions cheap. For example, let us assume that you want to purchase 3 items on Amazon. You could order each item and check out immediately, which means you’ll get 3 separate deliveries and spend a lot more on delivery fees. Alternatively, you can put all the items in one cart and check out once. This is how zk-rollups work.
A zk-rollup has two main actors: the transactors and the relayers.
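The delivery-fee analogy can be put into numbers. The cost model and the figures below are entirely hypothetical (real fee calculations are more involved); the sketch only shows how a fixed cost shrinks per transaction when it is shared across a batch.

```python
def cost_per_transaction(base_fee: float, per_item_fee: float, batch_size: int) -> float:
    """Cost each transaction bears when `batch_size` of them share one on-chain submission."""
    return base_fee / batch_size + per_item_fee


BASE_FEE = 24.0  # hypothetical fixed cost of one on-chain transaction ("delivery fee")
PER_ITEM = 0.5   # hypothetical extra cost for each transaction included in the batch

for size in (1, 3, 8):
    print(size, cost_per_transaction(BASE_FEE, PER_ITEM, size))
# 1 24.5
# 3 8.5
# 8 3.5
```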
Transactors, as the name implies, are the ones that create a transaction and then broadcast it to the network. The transaction data consists of indexed “to” and “from” addresses (to understand what indexes are read this doc on Solidity events), the value to transact, the network fee, and a nonce. If the value of the transaction is greater than zero the contract creates a deposit, but if it is less than zero it creates a withdrawal.
The smart contract records the data in two Merkle trees: addresses in one Merkle tree and transfer amounts in the other.
Relayers then collect all these different transactions, batch them together, and create a rollup. The relayer is then tasked with creating a SNARK proof.
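The transaction data described above could be modelled roughly like this. The class name, field names and types are my own guesses for illustration and are not taken from any specific rollup implementation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RollupTransaction:
    # All field names are hypothetical; they mirror the list in the paragraph above.
    from_index: int  # indexed "from" address
    to_index: int    # indexed "to" address
    value: int       # value to transact
    fee: int         # network fee
    nonce: int       # counter that keeps each transaction unique


def classify(tx: RollupTransaction) -> str:
    """Apply the rule from the text: positive value is a deposit, negative is a withdrawal."""
    if tx.value > 0:
        return "deposit"
    if tx.value < 0:
        return "withdrawal"
    return "unspecified"  # the text does not say what a value of zero means


print(classify(RollupTransaction(from_index=1, to_index=2, value=50, fee=1, nonce=0)))   # deposit
print(classify(RollupTransaction(from_index=1, to_index=2, value=-20, fee=1, nonce=1)))  # withdrawal
```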
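The relayer's job can be sketched as follows: take the pending transactions, build one Merkle root for the addresses and another for the amounts, and package them as a single rollup. This is a simplified illustration with made-up names and leaf encodings. Generating the actual SNARK proof requires a dedicated proving system, so it is left as an empty placeholder here.

```python
import hashlib
from dataclasses import dataclass
from typing import Optional


def h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def merkle_root(leaves: list[bytes]) -> bytes:
    """Merkle root of a power-of-two number of leaves (SHA-256 is an assumption)."""
    level = [h(leaf) for leaf in leaves]
    while len(level) > 1:
        level = [h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
    return level[0]


@dataclass(frozen=True)
class Rollup:
    address_root: bytes            # root of the Merkle tree of addresses
    amount_root: bytes             # root of the Merkle tree of transfer amounts
    tx_count: int
    proof: Optional[bytes] = None  # placeholder: a real relayer attaches a SNARK proof


def build_rollup(pending) -> Rollup:
    """Batch pending (from_address, to_address, value) tuples into one rollup."""
    address_leaves = [f"{sender}->{receiver}".encode() for sender, receiver, _ in pending]
    amount_leaves = [str(value).encode() for _, _, value in pending]
    return Rollup(merkle_root(address_leaves), merkle_root(amount_leaves), len(pending))


pending = [("0xA", "0xB", 10), ("0xB", "0xC", 5), ("0xC", "0xA", -3), ("0xA", "0xD", 7)]
rollup = build_rollup(pending)
print(rollup.tx_count, rollup.address_root.hex()[:16], rollup.amount_root.hex()[:16])
```

Four transactions go in, and only two 32-byte roots (plus, in a real system, the proof) come out as a single on-chain submission.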
The major issues with zk-SNARKs are:
- They require a high degree of trust. Many SNARK constructions depend on a one-time trusted setup that generates the public parameters used to create and check proofs. If whoever ran the setup kept the secret values behind it, they could forge proofs that look legitimate. So when I confirm that all the transactions you batched and sent to the blockchain are legit, I am also trusting that the setup was done honestly. As you can see, it requires A LOT of trust. Think of it as your electoral commission that counts votes for elections: we have to trust that the counting is not rigged.
- They are vulnerable to quantum computing. Most SNARKs rely on elliptic-curve cryptography, which a sufficiently powerful quantum computer could break, and this poses a threat to the blockchain.
zk-STARK stands for Zero-Knowledge Scalable Transparent Argument of Knowledge. zk-STARKs were pioneered by StarkWare Industries.
zk-STARKs rely on hash functions rather than elliptic-curve cryptography, which is believed to make them resistant to quantum computers and reduces the chances of a hack. The heavy computation is done off-chain: off-chain services generate the STARK proofs, and these proofs are then placed back on-chain (i.e. after the computation) for anyone to verify their authenticity. Off-chain proof generation is not unique to STARKs; what sets them apart from SNARKs is the cryptography they rely on.
They are also publicly verifiable and need no trusted setup, which makes them trustless, unlike SNARKs, which require a lot of trust.
Let me know if you found this article useful or if there are things I missed (or incorrectly explained).
You can also connect with me on Twitter.