Create your own blockchain using Python (pt. 2)

Merkle Tree

Guillaume Belanger
7 min readJul 4, 2021

In pt.1 of this tutorial, we created our own blockchain using Python. Even though this was awesome and fun, we ignored an important part of the Wikipedia definition of blockchain:

A blockchain is a growing list of records, called blocks, that are linked together using cryptography. Each block contains a cryptographic hash of the previous block, a timestamp, and transaction data (generally represented as a Merkle tree).

Here, I’m specifically referring to the last couple of words on Merkle trees.

Merkle tree

What is a Merkle tree? Here’s the Wikipedia definition:

Merkle tree is a tree in which every leaf node is labelled with the cryptographic hash of a data block, and every non-leaf node is labelled with the cryptographic hash of the labels of its child nodes. For example, in the picture hash 0 is the result of hashing the concatenation of hash 0–0 and hash 0–1. That is, hash 0 = hash( hash(0–0) + hash(0–1) ) where + denotes concatenation.

https://en.wikipedia.org/wiki/Merkle_tree

You can read more about Merkle trees here, but for the current tutorial, this simple definition will be enough.

Merkle trees and blockchains

The reason why Merkle trees are often used in blockchain is highlighted in Satoshi Nakamoto’s white paper on bitcoin:

Once the latest transaction in a coin is buried under enough blocks, the spent transactions before it can be discarded to save disk space. To facilitate this without breaking the block’s hash, transactions are hashed in a Merkle Tree, with only the root included in the block’s hash. Old blocks can then be compacted by stubbing off branches of the tree. The interior hashes do not need to be stored.

Bitcoin: A Peer-to-Peer Electronic Cash System, Satoshi Nakamoto

Wikipedia also mentions:

Hash trees allow efficient and secure verification of the contents of large data structures. Hash trees are a generalization of hash lists and hash chains.

Simply put, Merkle trees allow our blockchain to be as compact as possible in terms of disk space and to allow for secure verification of transactions. In the world of cryptocurrencies, there are the concepts of full nodes and light nodes. While full nodes store all information held on a blockchain, light nodes only hold block headers. In other words, full nodes will contain the whole transaction history while light nodes will contain the Merkle root hash.

Source: Blockchain Basics & Cryptography, MIT 15.S12 Blockchain and Money, Fall 2018, Gary Gensler (Youtube)

Building a Merkle tree in Python

As always, we will follow the Wikipedia definition of Merkle trees in order to create our own.

A binary tree

A Merkle tree is typically a binary tree. Two definitions are necessary here:

  • A tree is a collection of nodes, where each node is a data structure consisting of a value, together with a list of references to nodes.
  • A binary tree is a tree whose elements have at most 2 children. Since each element in a binary tree can have only 2 children, we typically name them the left and right child.

Let’s create a binary tree in Python using Object Oriented Programming.

A tree
# merkle_tree.pyclass Node:
def __init__(self, value: int, left_child=None, right_child=None):
self.value = value
self.left_child = left_child
self.right_child = right_child

# Example
head_node = Node(3)
node_0 = Node(4)
node_1 = Node(1)
node_2 = Node(1)
node_3 = Node(6)
node_4 = Node(2)

Computing a hash

The Merkle tree that we will be implementing will require us to compute hashes of strings of data. Let’s re-use the method we created in pt.1:

# utils.pyfrom Crypto.Hash import SHA256


def calculate_hash(data: str) -> str:
bytes_data = bytearray(data, "utf-8")
h = SHA256.new()
h.update(bytes_data)
return h.hexdigest()

Computing the tree depth

For a given set of data (or transactions), tree depth can also be easily computed. The question we need to answer is how many times do I have to multiply 2 in order to build a large enough tree to contain all of my leaves? In mathematical terms:

Where y is the number of leaves and x is the depth of our tree (the value we’re looking for). To find x, we simply take the base 2 logarithm on both sides and we get the following:

Now using Python and the built-in math library, we can create a function that will compute our tree depth:

import mathdef compute_tree_depth(number_of_leaves: int) -> int:
return math.ceil(math.log2(number_of_leaves))

And obviously, we round up to the next full integer because our depth has to be an integer.

Merkle tree: The simple case

All right, now that we have all the tools we need, here’s the assignement: Given a set of transactions create a method that will return the associated Merkle tree.

We first create a list of leaves by calculating the transaction’s hashes and creating node type objects, we call this list old_set_of_nodes. Then we have a set of 2 loops. The first loops iterates through each depth of our tree while the second loop iterates through each node inside of a said depth. For each node, we compute the hash of both of its children’s hashes concatenated together.

# merkle_tree.pyfrom utils import calculate_hash


def build_merkle_tree(node_data: [str]) -> Node:
old_set_of_nodes = [Node(calculate_hash(data)) for data in node_data]
tree_depth = compute_tree_depth(len(old_set_of_nodes))

for i in range(0, tree_depth):
num_nodes = 2**(tree_depth-i)
new_set_of_nodes = []
for j in range(0, num_nodes, 2):
child_node_0 = old_set_of_nodes[j]
child_node_1 = old_set_of_nodes[j+1]
new_node = Node(
value=calculate_hash(f"{child_node_0.value}{child_node_1.value}"),
left_child=child_node_0,
right_child=child_node_1
)
new_set_of_nodes.append(new_node)
old_set_of_nodes = new_set_of_nodes
return new_set_of_nodes[0]

Merkle tree: Generalized

As you might have observed, the method we implement will only work for an amount of transactions that is a power of 2 and it will crash for all other cases. How do we deal with cases where the amount of transaction is not a power of 2?

We simply append to our list of transactions. For even numbers, we copy and paste the last 2 transactions until the number of transaction is a power of 2. For odd numbers, we only copy the last transaction.

For the even number example below, we have a total of 6 transactions so we need to copy the last 2 transactions (l5 and l6) until we get to a total of 8 transactions.

Even amount of transactions

For the odd number example, we have a total of 5 transactions so we need to copy the last transactions (l5) until we get to a total of 8 transactions.

Odd amount of transactions

We know that a given number is a power of 2 if its log in base 2 will be an integer:

# merkle_tree.pyimport math
def is_power_of_2(number_of_leaves: int) -> bool:
return math.log2(number_of_leaves).is_integer()

We can now create a fill_set method that will take care of increasing the size of our set of transactions for it to be a power of 2. This method will be called before creating the actual tree.

# merkle_tree.pydef fill_set(list_of_nodes):
current_number_of_leaves = len(list_of_nodes)
if is_power_of_2(current_number_of_leaves):
return list_of_nodes
total_number_of_leaves = 2**compute_tree_depth(current_number_of_leaves)
if current_number_of_leaves % 2 == 0:
for i in range(current_number_of_leaves, total_number_of_leaves, 2):
list_of_nodes = list_of_nodes + [list_of_nodes[-2], list_of_nodes[-1]]
else:
for i in range(current_number_of_leaves, total_number_of_leaves):
list_of_nodes.append(list_of_nodes[-1])
return list_of_nodes

And that’s it, you’ve created your Merkle Tree in Python.

Note: This implementation of Merkle Trees is definitely not the most efficient as I purposefully tried to not to look at what already existed. As a fun exercise, I would suggest you to create your own implementation.

A more secure blockchain

While the blockchain we created is awesome, it contains a serious security flaw. When Alice wants to make a transaction, we are not validating that she is the true originator of the transaction nor that she has the funds to complete it. In the next section, we will take a closer look at transactions security and how we can solve this problem using public key cryptography.

Code repository

Create your own blockchain using Python

References

--

--

Guillaume Belanger

Guillaume is a software developer from Montreal who writes about bip bop stuff.