Summary:
Delphi Digital recently released a report titled “The Dawn of Bitcoin Programmability: Paving the Way for Rollups,” which outlines key concepts related to Bitcoin Rollups, including the BitVM suite, OP_CAT and Covenant restrictions, the Bitcoin ecosystem’s DA layer, bridges, and four major Layer 2 solutions using BitVM: Bitlayer, Citrea, Yona, and Bob. While the report provides an overview of Bitcoin Layer 2 technology, it remains quite general and lacks detailed descriptions, making it somewhat hard to grasp. Geek Web3 expanded on the Delphi report to help more people systematically understand technologies like BitVM.
We will be working with the Bitlayer research team and the BitVM Chinese community to launch a series called “Approaching BTC.” This series will focus on key topics such as BitVM, OP_CAT, and Bitcoin cross-chain bridges, aiming to demystify Bitcoin Layer 2 technologies for a broader audience and pave the way for more enthusiasts.

A few months ago, Robin Linus, the leader of ZeroSync, published an article titled “BitVM: Compute Anything on Bitcoin,” officially introducing the concept of BitVM and pushing forward Bitcoin Layer 2 technology. This is considered one of the most revolutionary innovations in the Bitcoin ecosystem, sparking significant interest and activity in the Bitcoin Layer 2 space. It has attracted notable projects like Bitlayer, Citrea, and BOB, bringing new energy to the market. Since then, more researchers have joined in to improve BitVM, resulting in several iterative versions such as BitVM1, BitVM2, BitVMX, and BitSNARK. The general overview is as follows:
At present, the BitVM developer ecosystem is taking clearer shape, and the supporting tools are visibly improving with each iteration. Compared with last year, the BitVM ecosystem has gone from a “castle in the air” to something whose outline is now faintly visible, which is drawing more and more developers and VCs into the Bitcoin ecosystem.
For most people, understanding BitVM and the technical terms related to Bitcoin Layer 2 is not easy. It requires a systematic grasp of foundational knowledge, especially Bitcoin scripts and Taproot. Existing online resources are often either too lengthy and filled with irrelevant details or too brief to be clear. We aim to solve these issues by using clear and concise language to help more people understand the fundamental concepts of Bitcoin Layer 2 and build a comprehensive understanding of the BitVM system.

MATT and Commitments: The Core Concept of BitVM
The core concept of BitVM revolves around MATT, which stands for Merkleize All The Things. This approach uses a Merkle Tree, a hierarchical data storage structure, to represent the execution of complex programs. It aims to enable native fraud proof verification on the Bitcoin network. MATT can capture the details of a complex program and its data processing activities, but it does not publish this data directly on the Bitcoin blockchain because the data is too large. Instead, the MATT approach stores the data in an off-chain Merkle tree and publishes only the Merkle Root (the topmost summary of the Merkle tree) on the blockchain. The Merkle tree primarily contains three key components:

(A simple schematic diagram of Merkle Tree. Its Merkle Root is calculated from the 8 data fragments at the bottom of the picture through multi-layer hashing)
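To make the diagram concrete, here is a minimal Python sketch of how a Merkle Root could be computed from 8 data fragments. The fragment contents, the use of SHA-256, and the rule of duplicating the last node on odd-sized levels are assumptions chosen for illustration; they are not taken from any specific BitVM implementation.

```python
import hashlib

def sha256(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()

def merkle_root(fragments: list[bytes]) -> bytes:
    """Hash every fragment, then hash neighbouring pairs until one node is left."""
    if not fragments:
        raise ValueError("need at least one fragment")
    level = [sha256(f) for f in fragments]
    while len(level) > 1:
        if len(level) % 2 == 1:
            level.append(level[-1])  # assumed convention for odd-sized levels
        level = [sha256(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
    return level[0]

# Eight hypothetical data fragments, as in the diagram.
fragments = [f"data-{i}".encode() for i in range(8)]
root = merkle_root(fragments)
print(root.hex())  # 32 bytes, no matter how much data sits underneath
```

Changing any single fragment changes the root, which is why publishing only the root is enough to pin down the whole data set.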
Under the MATT scheme, only the very small Merkle Root is stored on-chain, while the complete data set contained in the Merkle Tree is stored off-chain. This relies on an idea called “commitment,” which is explained below.
A commitment is a succinct statement; it can be understood as the “fingerprint” obtained by compressing a large amount of data. Generally speaking, whoever publishes a commitment on-chain is claiming that certain data stored off-chain is accurate. That off-chain data must correspond to the succinct statement, and this statement is the “commitment.”
In the simplest case, the hash of the data can serve as a commitment to the data itself. Other commitment schemes include KZG commitments and Merkle Trees. In the fraud proof protocols commonly used by Layer 2, the data publisher publishes the complete data set off-chain and publishes a commitment to that data set on-chain. If someone discovers invalid data in the off-chain data set, they can challenge the on-chain commitment.
Through commitments, Layer 2 can compress a large amount of data and publish only its “commitment” on the Bitcoin chain. Of course, it must also ensure that the complete data set published off-chain can be observed by the outside world.
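The sketch below illustrates why a Merkle Root works as a commitment: anyone holding the off-chain data can show that a particular fragment belongs to the committed data set by supplying only a short list of sibling hashes. The function names, the SHA-256 hash, and the power-of-two fragment count are assumptions made for this example; a real fraud proof protocol involves far more than this single check.

```python
import hashlib

def h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()

def build_levels(fragments):
    """Return every level of the tree, leaves first. Assumes a power-of-two count."""
    if not fragments or len(fragments) & (len(fragments) - 1):
        raise ValueError("this sketch needs a power-of-two number of fragments")
    levels = [[h(f) for f in fragments]]
    while len(levels[-1]) > 1:
        prev = levels[-1]
        levels.append([h(prev[i] + prev[i + 1]) for i in range(0, len(prev), 2)])
    return levels

def make_proof(levels, index):
    """Collect the sibling hash at each level on the path from a leaf to the root."""
    proof = []
    for level in levels[:-1]:
        proof.append(level[index ^ 1])  # the neighbour in the same pair
        index //= 2
    return proof

def verify_proof(fragment, index, proof, root):
    """Recompute the root from one fragment plus its sibling hashes."""
    node = h(fragment)
    for sibling in proof:
        node = h(node + sibling) if index % 2 == 0 else h(sibling + node)
        index //= 2
    return node == root

dataset = [f"state-{i}".encode() for i in range(8)]   # hypothetical off-chain data
levels = build_levels(dataset)
commitment = levels[-1][0]                            # the only value put on-chain
proof = make_proof(levels, 5)
print(verify_proof(dataset[5], 5, proof, commitment))    # fragment matches the commitment
print(verify_proof(b"forged", 5, proof, commitment))     # forged fragment does not
```

For 8 fragments the proof contains only 3 hashes, so a dispute about one fragment never requires putting the whole data set on-chain.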

Currently, major BitVM schemes such as BitVM0, BitVM1, BitVM2, and BitVMX all follow a similar abstract structure:
Understanding Bitcoin can be more challenging than Ethereum, as even the simplest transactions involve several key concepts. These include UTXO (Unspent Transaction Output), locking scripts (also known as ScriptPubKey), and unlocking scripts (also known as ScriptSig). Let’s break down these fundamental concepts first.

(A sample of Bitcoin script code, which consists of opcodes that are lower-level than high-level languages)
Ethereum’s method of asset representation is akin to systems like Alipay or WeChat, where each transaction simply adjusts the balances of different accounts. This account-based approach treats asset balances as just numbers associated with accounts. In contrast, Bitcoin’s asset representation is more like dealing with gold, where each piece of gold (UTXO) is tagged with an owner. A Bitcoin transaction essentially destroys old UTXOs and creates new ones, with the ownership changing in the process. A Bitcoin UTXO includes two key components:

(The unlocking script must match the locking script)
A Bitcoin transaction consists of one or more Inputs and Outputs. Each Input specifies a UTXO to unlock and provides an unlocking script to do so, which then unlocks and destroys the UTXO. The Outputs of the transaction show the newly created UTXOs and publicly display the associated locking scripts. For instance, in a transaction’s Inputs, you prove you are Sam by unlocking multiple UTXOs that others have sent you, destroying them in the process. Then, you create multiple new UTXOs and specify who can unlock them in the future.

Specifically, in the Input data of a transaction, you need to declare which UTXOs you intend to unlock and specify the “storage location” of these UTXO data. It’s important to understand that Bitcoin and Ethereum handle this differently. Ethereum uses contract accounts and Externally Owned Accounts (EOAs) to store data, with asset balances recorded as numbers under these accounts. All this information is stored in a database called the “world state.” When a transaction occurs, the “world state” directly updates the balances of specific accounts, making it easy to locate the data. In contrast, Bitcoin does not have a “world state.” Instead, asset data is distributed across previous blocks as unspent UTXOs, stored individually in the Output of each transaction.
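The following Python sketch models the UTXO idea described above: there are no account balances, only individual outputs that a transaction destroys and replaces. The class names, the plain-string `owner` field (a stand-in for a locking script), and the string transaction IDs are hypothetical simplifications; signature and script checks are left out entirely.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Output:
    amount: int   # in satoshis
    owner: str    # stand-in for a locking script

@dataclass(frozen=True)
class Transaction:
    txid: str        # hypothetical ID; a real txid is a hash of the transaction
    inputs: tuple    # (txid, output index) pairs naming the UTXOs to spend
    outputs: tuple   # new Output objects to create

def apply_transaction(utxo_set: dict, tx: Transaction) -> None:
    """Destroy the referenced UTXOs and create the new ones."""
    if len(set(tx.inputs)) != len(tx.inputs):
        raise ValueError("the same UTXO is referenced twice")
    for outpoint in tx.inputs:
        if outpoint not in utxo_set:
            raise ValueError(f"{outpoint} does not exist or is already spent")
    spent = sum(utxo_set[o].amount for o in tx.inputs)
    created = sum(out.amount for out in tx.outputs)
    if created > spent:
        raise ValueError("outputs exceed inputs")
    for outpoint in tx.inputs:
        del utxo_set[outpoint]
    for index, out in enumerate(tx.outputs):
        utxo_set[(tx.txid, index)] = out

utxo_set = {("tx0", 0): Output(50, "sam")}
tx1 = Transaction("tx1", (("tx0", 0),), (Output(30, "bob"), Output(19, "sam")))
apply_transaction(utxo_set, tx1)   # the 1 satoshi difference is the fee
print(sorted(utxo_set))
```

Note that the sender computes all outputs in advance, including the change sent back to themselves; the function above only checks and applies what the transaction already declares.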

To unlock a UTXO, you must state which past transaction’s Output contains it and provide that transaction’s ID (its hash), so that Bitcoin nodes can look it up in the history. To query the Bitcoin balance of an address, you would need to traverse all blocks from the beginning and find the unspent UTXOs associated with that address.
In everyday use, a Bitcoin wallet can show the balance of an address quickly. This is usually because the wallet service has already scanned the blocks and indexed all addresses, which makes such queries fast.
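As a rough sketch of the indexing that wallet services do, the Python function below replays a list of blocks from the beginning, tracks which outputs remain unspent, and sums them per address. The block and transaction layout (plain dictionaries with `txid`, `inputs`, and `outputs` keys) is a hypothetical simplification, not the real Bitcoin serialization.

```python
from collections import defaultdict

def scan_balances(blocks):
    """Replay every block in order and return {address: unspent total}."""
    unspent = {}  # (txid, output index) -> (address, amount)
    for block in blocks:
        for tx in block:
            for outpoint in tx["inputs"]:
                del unspent[outpoint]            # spent outputs leave the set
            for index, (address, amount) in enumerate(tx["outputs"]):
                unspent[(tx["txid"], index)] = (address, amount)
    balances = defaultdict(int)
    for address, amount in unspent.values():
        balances[address] += amount
    return dict(balances)

# Two hypothetical blocks: Sam receives 50, then pays Bob 30 and keeps 20 as change.
blocks = [
    [{"txid": "a", "inputs": [], "outputs": [("sam", 50)]}],
    [{"txid": "b", "inputs": [("a", 0)], "outputs": [("bob", 30), ("sam", 20)]}],
]
print(scan_balances(blocks))  # {'bob': 30, 'sam': 20}
```

A real indexer would store this mapping persistently and update it block by block instead of rescanning the chain for each query.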

(When you create a transaction to transfer your UTXO to someone else, you need to specify the location of that UTXO in Bitcoin’s transaction history by referencing the transaction hash/ID it belongs to.)
Interestingly, Bitcoin transaction results are computed off-chain. When users generate transactions on their local devices, they must create all Inputs and Outputs beforehand, effectively calculating the transaction’s outputs. The transaction is then broadcast to the Bitcoin network, verified by nodes, and added to the blockchain. This “off-chain computation, on-chain verification” model is entirely different from Ethereum’s. On Ethereum, you only need to provide transaction input parameters, and the transaction results are calculated and output by Ethereum nodes.
Moreover, the locking script of a UTXO can be customized. You can set a UTXO to be “unlockable by the owner of a specific Bitcoin address,” requiring the owner to provide a digital signature and public key (P2PKH). In Pay-to-Script-Hash (P2SH) transactions, you can add a Script Hash to the UTXO’s locking script. Anyone who can submit the script corresponding to this hash and meet the conditions specified in the script can unlock the UTXO. The Taproot script, which BitVM relies on, uses features similar to those in P2SH.
To understand the triggering mechanism of Bitcoin scripts, we’ll start with the P2PKH example, which stands for “Pay to Public Key Hash.” In this setup, the locking script of a UTXO contains a public key hash, and to unlock it, the corresponding public key must be provided. This mechanism aligns with the standard process of Bitcoin transactions. In this context, a Bitcoin node must verify that the public key in the unlocking script matches the public key hash specified in the locking script. Essentially, it checks that the “key” provided by the user fits the “lock” set by the UTXO. In more detail, under the P2PKH scheme, when a Bitcoin node receives a transaction, it combines the user’s unlocking script (ScriptSig) with the locking script (ScriptPubKey) of the UTXO to be unlocked and then executes this combined script in the Bitcoin script execution environment. The image below illustrates the concatenated result before execution:

Readers might not be familiar with the BTC script execution environment, so let’s briefly introduce it. Bitcoin scripts consist of two elements: data and opcodes. These elements are pushed onto a stack sequentially from left to right and executed according to the specified logic to produce the final result (for an explanation of what a stack is, readers can consult ChatGPT).
In the example above, the left side shows the unlocking script (ScriptSig) provided by someone, which includes their digital signature and public key. The right side shows the locking script (ScriptPubKey), which contains a series of opcodes and data set by the UTXO creator when generating that UTXO (understanding the general idea is enough; we don’t need to delve into each opcode’s meaning).
The opcodes in the right-side locking script, such as DUP, HASH160, and EQUALVERIFY, hash the public key from the left-side unlocking script and compare it to the preset public key hash in the locking script. If they match, it confirms that the public key in the unlocking script matches the public key hash in the locking script, passing the first verification.
However, there’s an issue: the locking script’s content is publicly visible on the blockchain, meaning anyone can see the public key hash. Therefore, anyone could submit the corresponding public key and falsely claim to be the authorized person. To address this, after verifying the public key and the public key hash, the system must also verify whether the transaction initiator actually controls the public key, which involves verifying the digital signature. The CHECKSIG opcode in the locking script handles this verification.
In summary, under the P2PKH scheme, the transaction initiator’s unlocking script must include the public key and digital signature. The public key must match the public key hash specified in the locking script, and the digital signature must be correct. These conditions must be met to successfully unlock the UTXO.

(This is a dynamic illustration: A diagram of Bitcoin unlocking scripts under the P2PKH scheme
Source: https://learnmeabitcoin.com/technical/script)  
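The stack-based execution described above can be sketched in a few lines of Python. This is a toy interpreter, not Bitcoin’s real one, and it rests on two loudly stated assumptions: `hash160` here is a stand-in (Bitcoin uses RIPEMD160 over SHA-256, but RIPEMD160 is not always available in Python’s hashlib), and the signature scheme is a tiny, insecure textbook RSA key used only so the example runs with the standard library (Bitcoin uses ECDSA or Schnorr signatures). The message being signed is also a hypothetical placeholder for the real transaction digest.

```python
import hashlib

def hash160(data: bytes) -> bytes:
    # Stand-in only: real Bitcoin computes RIPEMD160(SHA256(data)).
    return hashlib.sha256(hashlib.sha256(data).digest()).digest()[:20]

# Toy RSA key (insecure, illustration only). Bitcoin does not use RSA.
N, E, D = 3233, 17, 2753
PUBKEY = N.to_bytes(2, "big") + bytes([E])

def digest_mod(message: bytes, n: int) -> int:
    return int.from_bytes(hashlib.sha256(message).digest(), "big") % n

def toy_sign(message: bytes) -> bytes:
    return pow(digest_mod(message, N), D, N).to_bytes(2, "big")

def toy_verify(sig: bytes, pubkey: bytes, message: bytes) -> bool:
    n, e = int.from_bytes(pubkey[:2], "big"), pubkey[2]
    return pow(int.from_bytes(sig, "big"), e, n) == digest_mod(message, n)

def run_script(script, message: bytes) -> bool:
    """Process items left to right: bytes are pushed, opcode names are executed."""
    stack = []
    for item in script:
        if isinstance(item, bytes):
            stack.append(item)
        elif item == "OP_DUP":
            stack.append(stack[-1])
        elif item == "OP_HASH160":
            stack.append(hash160(stack.pop()))
        elif item == "OP_EQUALVERIFY":
            if stack.pop() != stack.pop():
                return False                     # the key does not fit the lock
        elif item == "OP_CHECKSIG":
            pubkey, sig = stack.pop(), stack.pop()
            stack.append(b"\x01" if toy_verify(sig, pubkey, message) else b"")
        else:
            raise ValueError(f"unsupported opcode: {item}")
    return bool(stack) and stack[-1] != b""

message = b"hypothetical transaction digest"
sig = toy_sign(message)
script_sig = [sig, PUBKEY]                                   # unlocking script
script_pubkey = ["OP_DUP", "OP_HASH160", hash160(PUBKEY),    # locking script
                 "OP_EQUALVERIFY", "OP_CHECKSIG"]
print(run_script(script_sig + script_pubkey, message))
```

The two failure paths match the two checks in the text: a wrong public key fails at OP_EQUALVERIFY, while the right public key with a bad signature fails at OP_CHECKSIG.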
It’s important to note that the Bitcoin network supports various transaction types beyond Pay to Public Key/Public Key Hash, such as P2SH (Pay to Script Hash). The specific type of transaction depends on how the locking script is configured when the UTXO is created.

It’s important to understand that under the P2SH scheme, the locking script can preset a Script Hash, and the unlocking script must provide the complete script content that corresponds to this Script Hash. The Bitcoin node can then execute this script, and if it includes multi-signature verification logic, it effectively enables multi-signature wallets on the Bitcoin blockchain. In the P2SH scheme, the UTXO creator needs to inform the person who will unlock the UTXO in the future about the script content corresponding to the Script Hash. As long as both parties are aware of the script content, we can implement even more complex business logic than just multi-signature. It’s also worth noting that the Bitcoin blockchain does not directly record which UTXOs are linked to which addresses. Instead, it records which UTXOs can be unlocked by which public key hash or script hash. However, we can quickly derive the corresponding address (the string of characters that looks like gibberish displayed in wallet interfaces) from the public key hash or script hash.
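A minimal sketch of the P2SH flow, under stated assumptions: the locking side stores only a script hash, the spender reveals the full script, and the node first checks the hash and then runs the revealed script. The text-based script encoding, the two supported opcodes, the `hash160` stand-in (real Bitcoin uses RIPEMD160 over SHA-256), and the “know the secret” condition are all hypothetical choices made to keep the example short; a multi-signature script would follow the same two steps.

```python
import hashlib

def hash160(data: bytes) -> bytes:
    # Stand-in only: real Bitcoin computes RIPEMD160(SHA256(data)).
    return hashlib.sha256(hashlib.sha256(data).digest()).digest()[:20]

def serialize(script) -> bytes:
    """Toy encoding: opcode names and hex-encoded data joined by spaces."""
    return " ".join(script).encode()

def execute(script, stack) -> bool:
    for token in script:
        if token == "OP_SHA256":
            stack.append(hashlib.sha256(stack.pop()).digest())
        elif token == "OP_EQUAL":
            stack.append(b"\x01" if stack.pop() == stack.pop() else b"")
        else:
            stack.append(bytes.fromhex(token))   # anything else is pushed as data
    return bool(stack) and stack[-1] != b""

def spend_p2sh(script_hash: bytes, unlocking_data, revealed_script) -> bool:
    # Step 1: the revealed script must hash to the value preset in the locking script.
    if hash160(serialize(revealed_script)) != script_hash:
        return False
    # Step 2: run the revealed script against the data supplied by the spender.
    return execute(revealed_script, list(unlocking_data))

secret = b"open sesame"
script = ("OP_SHA256", hashlib.sha256(secret).hexdigest(), "OP_EQUAL")
script_hash = hash160(serialize(script))    # only this hash goes into the UTXO

print(spend_p2sh(script_hash, [secret], script))        # correct script and data
print(spend_p2sh(script_hash, [b"wrong guess"], script))  # correct script, wrong data
```

The creator of the UTXO never publishes the script itself, which is why the future spender has to learn it through some other channel.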

The reason you can see the amount of Bitcoin associated with a specific address on block explorers and wallet interfaces is that these services parse and interpret the blockchain data for you. They scan all blocks and, based on the public key hash or script hash declared in the locking scripts, calculate the corresponding “address.” This allows them to display how much Bitcoin is associated with that address.
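For legacy address types, the step from a 20-byte hash to a displayed address is Base58Check encoding: prepend a version byte, append a 4-byte double SHA-256 checksum, and write the result in base 58. The sketch below shows that derivation; the sample hash values are placeholders, and SegWit addresses use a different encoding (Bech32) that is not covered here.

```python
import hashlib

ALPHABET = "123456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz"

def base58check(version: int, payload: bytes) -> str:
    """Encode a version byte plus payload as a Base58Check string."""
    data = bytes([version]) + payload
    checksum = hashlib.sha256(hashlib.sha256(data).digest()).digest()[:4]
    raw = data + checksum
    number = int.from_bytes(raw, "big")
    encoded = ""
    while number > 0:
        number, remainder = divmod(number, 58)
        encoded = ALPHABET[remainder] + encoded
    # Each leading zero byte is written as the character "1".
    leading_zeros = len(raw) - len(raw.lstrip(b"\x00"))
    return "1" * leading_zeros + encoded

pubkey_hash = bytes(20)          # placeholder 20-byte public key hash
script_hash = b"\x11" * 20       # placeholder 20-byte script hash

p2pkh_address = base58check(0x00, pubkey_hash)   # version 0x00: P2PKH on mainnet
p2sh_address = base58check(0x05, script_hash)    # version 0x05: P2SH on mainnet
print(p2pkh_address)
print(p2sh_address)
```

This is why P2PKH addresses begin with “1” and P2SH addresses begin with “3”: the prefix is a direct consequence of the version byte.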
Understanding P2SH brings us closer to Taproot, a crucial component for BitVM. However, before diving into Taproot, it’s essential to grasp the concept of Witness and Segregated Witness (SegWit). Reviewing the unlocking and locking scripts, as well as the UTXO unlocking process, highlights an issue: the digital signature for a transaction is included in the unlocking script. When generating this signature, the unlocking script itself cannot be part of the data being signed (as the parameters used to generate the signature cannot include the signature itself).
Consequently, the digital signature can only cover parts of the transaction data outside of the unlocking script, meaning it can’t fully protect the entire transaction data. This leads to a vulnerability where an intermediary can slightly modify the unlocking script without affecting the signature verification. For example, Bitcoin nodes or mining pools could insert additional data into the unlocking script. Although this alteration doesn’t impact the verification and outcome of the transaction, it slightly changes the transaction data, which in turn alters the calculated transaction hash/transaction ID. This issue is known as transaction malleability.
The problem with this is that if you plan to initiate multiple sequential transactions that depend on each other (for instance, transaction 3 references the output of transaction 2, and transaction 2 references the output of transaction 1), the subsequent transactions must reference the hashes of the preceding transactions. Any intermediary, like a mining pool or Bitcoin node, can make slight modifications to the unlocking script, causing the transaction hash to differ from your expectation once it is on the blockchain.
This discrepancy can invalidate your pre-planned sequence of interdependent transactions. This issue is particularly relevant in the context of DLC bridges and BitVM2, where batches of sequentially related transactions are constructed, making such scenarios quite common.

In simple terms, the transaction malleability problem occurs because the transaction ID/hash calculation includes data from the unlocking script. Intermediaries, such as Bitcoin nodes, can make slight modifications to the unlocking script, resulting in a transaction ID that doesn’t match the user’s expectations. This issue stems from early design limitations in Bitcoin. The Segregated Witness (SegWit) upgrade addresses this problem by decoupling the transaction ID from the unlocking script. With SegWit, the transaction hash calculation excludes the unlocking script data. Locking scripts for the original (version 0) SegWit outputs start with an “OP_0” opcode as a marker, and the corresponding unlocking data is moved out of the ScriptSig into a separate field called the Witness.
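The difference between the two ID rules can be shown with a toy model. In the Python sketch below, a transaction is just a dictionary, the “serialization” is simple byte concatenation, and the malleation is a hypothetical extra byte prepended to the unlocking data; none of this matches Bitcoin’s real wire format, but the effect on the ID is the same in spirit.

```python
import hashlib

def double_sha256(data: bytes) -> str:
    return hashlib.sha256(hashlib.sha256(data).digest()).hexdigest()

def legacy_txid(tx: dict) -> str:
    """Pre-SegWit rule: the ID covers the unlocking script as well."""
    return double_sha256(tx["spends"].encode() + tx["unlock"] + tx["outputs"].encode())

def segwit_txid(tx: dict) -> str:
    """SegWit rule: the witness (unlocking data) is left out of the ID."""
    return double_sha256(tx["spends"].encode() + tx["outputs"].encode())

tx1 = {"spends": "tx0:0", "unlock": b"<signature><pubkey>", "outputs": "1 BTC -> sam"}

# An intermediary tweaks the unlocking data without changing what the transaction does.
tx1_malleated = dict(tx1, unlock=b"\x00" + tx1["unlock"])

# A pre-signed child transaction (tx2) references its parent by the expected ID.
expected_parent_id = legacy_txid(tx1)

print(legacy_txid(tx1_malleated) == expected_parent_id)   # False: tx2 now points at nothing
print(segwit_txid(tx1_malleated) == segwit_txid(tx1))     # True: the reference stays valid
```

Under the legacy rule, the pre-signed child transaction is orphaned as soon as the malleated parent is confirmed; under the SegWit rule, the chain of dependent transactions stays intact.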

By adhering to Segregated Witness (SegWit) rules, the transaction malleability issue is effectively resolved, eliminating concerns about transaction data being tampered with by Bitcoin nodes. The functionality of P2WSH (Pay to Witness Script Hash) is essentially the same as P2SH (Pay to Script Hash). You can preset a script hash in the UTXO locking script, and the person submitting the unlocking script will provide the corresponding script content to the chain for execution.
However, if the script content you need is very large and contains a lot of code, conventional methods may not allow you to submit the entire script to the Bitcoin blockchain (due to script size and block size limits). In such cases, Taproot comes into play. Taproot enables the compression of on-chain script content, making it possible to handle larger scripts. BitVM leverages Taproot to build more complex solutions.
In the next article of our “Approaching BTC” series, we will provide detailed explanations of Taproot, pre-signature, and other advanced technologies related to BitVM. Stay tuned!





