Zero knowledge proofs (ZKPs) are remarkable cryptographic constructs with exciting applications in privacy and verifiability. However, the difficulty of writing circuits and high cost of proving greatly hinders its adoption. Many promising startups in the industry are tackling the first challenge by creating zkVMs with automatic circuit generation, and we at Giro Labs are working hard to solve the second one with our decentralized proving protocol.
High cost of proving comes from the fact that proving a statement is often orders of magnitude slower than executing the same statement while also having a similar blowup in memory footprint. As a result, expensive GPU servers with hundreds of GBs of RAMs, whether on premise or rented from the cloud, are required for proving large circuits.
There are many strategies for cutting down cost, ranging from dedicated hardware accelerators to theoretical research into more efficient proof systems. However, we have identified something that we can work on and deliver today: ZKP has a hardware gatekeeping problem, and we can solve it by splitting the proving workload into smaller components that can be completed on consumer-grade hardware.
Given the diminishing return of performance per buck on high-end hardware, proof splitting allows us to deliver similar proving latency and throughput with a cluster of regular machines compared to a single powerful workstation, at a much lower price point. Moreover, it enables us to tap into the vast market of idle consumer hardware in abandoned mining farms, off-hour gaming lounges, PCs in people’s homes and even mobile devices in their pockets. With compute power scattered around the world aggregated into a decentralized compute pool, not only can we further drive down cost, we also get the traditional decentralization goodies: censorship resistance and network liveness.
There are two approaches to proof splitting: we either split the statement to be proven or the proving workload itself. With the first approach, a ZKP is generated for each sub-statement and then aggregated into a single proof at the end using proof recursion. Although conceptually simple, this approach can only be applied on a case-by-case basis because it depends on the decomposability of the statement. In contrast, the second approach is statement-agnostic, but it does require careful modification of the proof system in use. We are taking the second approach for its generality, but note that those two approaches are not mutually exclusive and can be employed simultaneously to maximize proving efficiency.
More concretely, we are working on two splitting strategies powering our work-in-progress halo2 splitter, with plans to extend them to other proof systems in the future.
Target expensive and widely used cryptographic primitives such as MSMs and NTTs. For example, a single MSM used to commit a high-degree polynomial could take minutes to complete without hardware acceleration, but by decomposing it into smaller MSMs and distributing them across many nodes, we can scale linearly in the number of nodes with little to no overhead.
Divide proving workload by required input polynomials. For example, to produce a commitment for some advice column, only that advice column itself is required as input, so a natural way to divide the proving workload is to split the halo2 trace table vertically into column regions and assign each node a different region to work on. With some modifications, the same technique can be applied to constraint polynomials as well.
We also recognize that there will be challenges. For example, connectivity within a decentralized network can be slow and unreliable, so we are employing techniques such as independent circuit synthesis to minimize data transfer over the network at the cost of some duplicated compute. In addition, it is more difficult to effectively split some work than others, and among the most challenging is the construction and evaluation of h polynomials towards the end of the halo2 proving process. Unlike advice commitments that can be divided column by column, working with h polynomials involves more complex dependencies and incurs larger communication volume.
Despite those technical hurdles, we are on track to deliver an alpha release targeted for April 2024. The release is aimed to compatible with all prominent halo2 forks used in the industry, and by extension will support seamless integration into existing and upcoming halo2 projects. We will share more details in the coming weeks.
With proof splitting, a single proving task may be broken down into many child tasks distributed across multiple prover nodes, which requires careful coordination to complete successfully and in a timely fashion. In addition, the fact that prover nodes are not under our control but come from a decentralized network introduces further complexities. Hardware configurations vary across nodes, making it difficult to saturate their capacity. Network connection may be slow or unstable, making data transfer between nodes expensive. Even worse, nodes could behave maliciously and sabotage the network by withholding proofs or generating false proofs.
To tackle those challenges, our proof orchestrator will be equipped with the following tools.
A workload analyzer that estimates the minimum hardware requirement for each proof stage and the time required to complete each stage on any combination of hardware configurations.
A task distributor that collocates child tasks originating from the same proving task within a set of nodes with optimal network connections to each other (for example, they should be geographically close) to minimize communication cost.
An availability forecaster that predicts how responsive a node will be given its past behavior so that we can provision necessary redundancy to ensure the overall success rate
To kickstart the prover network, we will launch an alpha testnet in May 2024. As more nodes join the network, we can continuously enhance our orchestration algorithm by accumulating more data on workload performance profile and node characteristics, which in the future will enable us to deliver advance features such as customized proving SLAs that allow users to specify maximum tolerable latency and failure rates.
We envision that a specialized L1 blockchain will tie everything together in the end. Validators of the chain will run our orchestration algorithm to generate and assign child proving tasks, and nodes in our decentralized prover network will constantly monitor blockchain state to claim and run tasks in a timely manner. The generated proof will be recorded on chain after being verified by validators, and the record will entitle all contributing nodes to their fair share of our native tokens as proving reward. On the other hand, validators will also detect node misbehavior and slash accordingly.
With this approach, our entire proving protocol is fully decentralized, as we invite people with commodity hardware to not only join as compute nodes in the proving network but also as validators for the L1 blockchain.