Polygon unpacks zkEVM outage and ‘emergency’ upgrade
Polygon’s zkEVM rollup entered an emergency state on March 23 after the chain stopped processing blocks. Last week, Polygon released an analysis of the outage that took its zkEVM down.
Polygon reported late Saturday that its zkEVM sequencer had failed, attributing the cause to a reorg of the Ethereum mainnet. The postmortem states that the synchronizer for Polygon zkEVM mishandled the reorg, and the chain started processing batches of transactions with incorrect timestamps.
A blockchain reorganization, or reorg, happens when a network discards blocks it had previously accepted and adopts a new chain of blocks as the accurate history of transactions. This can occur when different parts of the network temporarily disagree on recent transaction order or validity. The network eventually converges on a single version of the truth, choosing the chain of blocks with the most computational work (or, in some consensus mechanisms, the most stake backing it) as the correct sequence. Reorgs help to ensure that the network remains secure and consistent, though they can lead to temporary uncertainties in transaction histories.
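As a rough illustration of that selection rule, the sketch below picks between two competing forks by total weight. It is a toy model rather than any client's actual fork-choice code: the function name `pick_canonical`, the fork labels and the per-block weights are all made up for the example.

```python
def pick_canonical(forks: dict[str, list[int]]) -> str:
    """Return the label of the fork whose blocks carry the most total weight.

    Each fork is a list of per-block weights (work or stake, depending on
    the consensus mechanism). All values here are hypothetical.
    """
    return max(forks, key=lambda label: sum(forks[label]))


# Two forks that share history up to some block and then diverge.
forks = {
    "fork_a": [10, 10, 10],      # total weight 30
    "fork_b": [10, 10, 12, 11],  # total weight 43
}

winner = pick_canonical(forks)
# Nodes that were following the losing fork drop its blocks: that is the reorg.
print(winner)
```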
Polygon zkEVM’s sequencer, which assembles and broadcasts batches of transactions to the network nodes, plays a crucial role in the ecosystem. It is not only responsible for ordering transactions but also assigns them timestamps, critical for maintaining a coherent history of transactions on the network. However, during the incident, the sequencer did not detect this reorg on the Ethereum mainnet.
This led to the processing of transactions based on outdated information, adding an incorrect global exit root (Polygon zkEVM’s cryptographic method to ensure Ethereum layer-1 and Polygon’s layer-2 agreement) to the next transaction block. The resultant mismatch caused inaccuracies, disrupting the network’s ability to maintain a consistent ledger.
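The kind of check that guards against this can be sketched in a few lines: compare the layer-1 block hashes a node has stored with what the canonical chain now reports, and flag the first height where they differ. This is a minimal sketch, not Polygon’s synchronizer code. The `Block` type, the `first_reorged_height` function and the sample hashes are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Block:
    number: int
    hash: str


def first_reorged_height(local: list[Block], canonical: dict[int, str]) -> Optional[int]:
    """Return the lowest block number whose stored hash no longer matches
    the canonical chain, or None if the local view is still valid."""
    for block in sorted(local, key=lambda b: b.number):
        canonical_hash = canonical.get(block.number)
        if canonical_hash is not None and canonical_hash != block.hash:
            return block.number
    return None


# Hypothetical data: the node's stored view vs. what layer-1 reports now.
local_view = [Block(100, "0xaa"), Block(101, "0xbb"), Block(102, "0xcc")]
canonical_now = {100: "0xaa", 101: "0xb2", 102: "0xc2"}

reorg_from = first_reorged_height(local_view, canonical_now)
# Anything derived from blocks at or above this height is stale and would
# need to be recomputed before new batches are built on top of it.
print(reorg_from)
```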
During the confusion, a number of transactions were processed by the network as no-operation, with the network acting “as if they were not there,” the postmortem said. Synchronizing these empty transactions was part of why the network’s downtime lasted as long as it did.
All told, it took around 14 hours for the rollup to reopen for withdrawals, and around 4,000 transactions may have been affected by the outage, Polygon said.
The outage only affected Polygon zkEVM, which is in so-called mainnet beta and therefore could have bugs and availability issues, according to Polygon’s support page. The Polygon PoS layer-1 and chains using Polygon’s chain development kit (CDK) were not impacted.
Polygon’s zkEVM, short for zero-knowledge Ethereum Virtual Machine, is a layer-2 that uses zero-knowledge proofs to verify transactions with the Polygon-operated sequencer before settling batches of transactions on Ethereum. StarkWare and zkSync are building similar products. This is all in contrast to popular layer-2s like Arbitrum and Optimism, which function as optimistic rollups and don’t use zero-knowledge proofs.
Shortly after the outage began, Polygon’s zkEVM went into an emergency state that halted the network and let an upgrade to the network’s prover and verifier be pushed through alongside a fix to keep the sequencer from malfunctioning during a reorg.
Once the upgrade was complete, the emergency state was deactivated, meaning any further updates to the system are subject to a 10-day timelock before taking effect.
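The difference between the two modes can be sketched as a simple rule: outside the emergency state an upgrade waits out the 10-day delay, while inside it the delay is skipped. The snippet is an illustration under that assumption, not the logic of Polygon’s contracts. The function name `earliest_execution` and the sample date are invented for the example.

```python
from datetime import datetime, timedelta, timezone

TIMELOCK = timedelta(days=10)  # delay cited in the article


def earliest_execution(scheduled_at: datetime, emergency: bool = False) -> datetime:
    """Return the earliest moment a scheduled upgrade could take effect.

    Assumption for this sketch: the emergency state bypasses the delay.
    """
    return scheduled_at if emergency else scheduled_at + TIMELOCK


# Arbitrary example date, not tied to the incident.
scheduled = datetime(2024, 1, 1, tzinfo=timezone.utc)

normal = earliest_execution(scheduled)                  # ten days later
urgent = earliest_execution(scheduled, emergency=True)  # immediately
print(normal.isoformat(), urgent.isoformat())
```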
This event marked the first time Polygon zkEVM’s emergency state function was used, and doing so required approval from the network’s Security Council. When activated, this state halts certain operations, including the updating of transactions and the functioning of the blockchain bridge, to protect users’ assets.
The council operates as a 6/8 multisig. Polygon says two of the council members are from Polygon Labs, but the identities of the Security Council members are not publicly disclosed.
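In practice, a 6/8 threshold means an action goes ahead only when at least six of the eight designated members sign off. The sketch below shows that rule in isolation. It is not the council’s actual multisig contract, and the member identifiers are placeholders, since the real identities are not public.

```python
THRESHOLD = 6
# Placeholder identifiers standing in for the eight council members.
COUNCIL = frozenset(f"member_{i}" for i in range(1, 9))


def is_approved(signers: set[str]) -> bool:
    """Return True when at least THRESHOLD distinct council members signed.

    Signatures from anyone outside the council are ignored.
    """
    return len(set(signers) & COUNCIL) >= THRESHOLD


six_members = {f"member_{i}" for i in range(1, 7)}
five_plus_outsider = {f"member_{i}" for i in range(1, 6)} | {"outsider"}

print(is_approved(six_members))         # six valid signers meet the threshold
print(is_approved(five_plus_outsider))  # five valid signers do not
```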
A 2023 blog post said the group is “made up of highly reputable members of the Ethereum community” and will be dismantled once Polygon zkEVM is ready to ditch its “training wheels.”
The outage and its remediation point to lingering centralization issues across Ethereum layer-2 networks, according to Jarrod Watts, a developer relations engineer at Polygon Labs:
“The worst case scenario is an upgrade is performed that steals user funds,” he said on X. “This kind of malicious upgrade is something that is possible on almost every [layer-2] today.”
But Watts added, “over time, IMO the risk of malicious upgrades to steal user funds on [layer-2s] will be zero or very close to zero!”
Polygon zkEVM has struggled to gain total value locked (TVL) and usage, and has yet to benefit from Ethereum’s Dencun hard fork. The chain is slated to add support for the “blobspace” of EIP-4844 in about a month’s time, following its “Feijoa” upgrade.