Why Lean?
-
The Threat: AI find exploit fast
-
AIs like Mythos can find bugs really fast.
-
TODO: Attack /Defence asymmetry tilting.
-
-
The Opportunity: Formal method used to be costly
-
AI now makes it cheap
-
-
The Need: Automate coding beyond human reasoning
-
People might want to work with 20 agents, but still want some guarantees -- Interactive Theorem Proving helps. People call it vericoding code paper
-
The Lean Atlas paper argues that if we can trust Lean’s type checking and Mathlib, 95%~99% of the proofs need no human review (depending on whether the codebase is definition-heavy or proof-heavy). Naively this translates to a 20x~100x improvement in review efficiency. code demo
-
My experience
-
Lean Natural Number Game is a good place to start. It’s a fun game to play. Lean is like a chess game. You have a game board with a goal, and then you slowly apply theorems in your toolbox, to move the shape of statements to the final goal.
-
Arrow’s Impossibility Theorem is my attempt to verify a real world math paper. I chose a paper with just one page proof. But it takes me painful 2~3 weeks to carry the proof to the final form.
-
There was a struggle to convert between
Fin NandFin (N+1), which is the Intensional equality issue mentioned below.
-
UX Issues in Proving Assistants
Lean 4 / Mathlib
-
Universe polymorphism forces users to occasionally annotate or reconcile universe levels, which is conceptually heavy for newcomers
-
Intensional equality means
T(N+1)andT(1+N)are distinct types, requiring manual coercions or rewrites in places that feel obviously trivial -
Error messages can be cryptic when type unification fails — the system often reports the symptom deep in elaboration rather than the actual source of the mismatch
-
Classical logic (
Classical.em) is assumed silently by Mathlib, meaning non-constructive proofs like the √2 irrationality case work out of the box with no ceremony — a deliberate, pragmatic choice that smooths everyday proof writing
In the following code snippet, we show a non-constructive proof. We show that a irrational power of a irrational could be a rational number. However, we never tell you whether √2 ^ √2 is rational or not in this proof. “by_cases” here uses the p ∨ ¬p Law of exclusion of middle.
You can run the code here
leanimport Mathlib
theorem irrational_pow_irrational_rational :
∃ (a b : ℝ), Irrational a ∧ Irrational b ∧ ¬ Irrational (a ^ b) := by
by_cases h : Irrational (√2 ^ √2)
· -- Case: √2 ^ √2 is irrational → use it as base
refine ⟨√2 ^ √2, √2, h, irrational_sqrt_two, ?_⟩
show ¬Irrational ((√2 ^ √2) ^ √2)
rw[
← Real.rpow_mul (Real.sqrt_nonneg 2),
Real.mul_self_sqrt (Nat.ofNat_nonneg 2),
Real.rpow_ofNat,
Real.sq_sqrt (Nat.ofNat_nonneg 2)
]
exact not_irrational_ofNat 2
· -- Case: √2 ^ √2 is rational → we're already done
exact ⟨√2, √2, irrational_sqrt_two, irrational_sqrt_two, h⟩
Rocq/Coq
The same proof is much longer in Rocq.
-
Classical excluded middle requires an explicit
Require Import Classical, breaking proof flow when doing everyday mathematics; in the √2 case, this is just one line, but the cultural expectation of constructivity historically discouraged it. -
Sparse standard library means even basic facts — such as the irrationality of √2 — are
Admittedor must be proved from scratch, not because Rocq is incapable, but due to lack of community library investment compared to Mathlib.
High level things aside, Lean 4’s devEx is much better than Rocq. Lean 4 is very easy to install with VSCode extensions. And here are some issues I encountered when setting up Rocq.
-
Goals panel shows “Not in proof mode” — unlike Lean 4’s always-live InfoView, the vsrocq plugin requires the cursor to be inside a Proof./Qed. block, and in manual mode (proof.mode: 0) you must advance the proof tip with Alt+↓. Setting vsrocq.proof.mode: 1 gives continuous checking closer to Lean’s behavior.
-
VSCode extension can’t find the language server — vsrocq.path must point to the vsrocqtop binary explicitly; the extension does not search the OPAM environment automatically since VSCode launches outside of it.
-
Two conflicting Rocq installations (Homebrew + OPAM) — make picked up the wrong binary, causing mismatched library paths. Rocq does not have a single canonical package manager like Lean’s elan/lake.
-
Standard library is a separate OPAM package (rocq-stdlib) — not bundled with the core install. From Stdlib Require Export String fails silently until rocq-stdlib is installed explicitly. In Lean 4, Std is managed automatically via lake.
-
Makefile.coq / Makefile.coq.conf build system — Software Foundations uses a generated Makefile (rocq makefile) rather than a declarative build manifest like lakefile.lean. The generated files must be kept in sync with the active Rocq binary.
Cryptography
We’re seeing a trend that people move on beyond pens, papers, Latex, and human review, then craft machine verifiable math proofs.
I’m not sure if this is already helpful in software implementation side. AI says yes, but I’m now incompetent to verify the claim thus I’m skeptical.
VCV-io: Library for Cryptography Security Proofs
-
Repo:
VCV-io— Lean 4 framework for formally verified cryptographic proofs, built on Mathlib -
Organization: Academic / open collaboration; primary authors Devon Tuma and Quang Dao, with contributions from Alexander Hicks, Pietro Monticone, Gabe Robison, Bolton Bailey, and others
-
Language: Lean 4, with Mathlib as the mathematical foundation; C FFI for differential testing against native ML-DSA/ML-KEM/Falcon backends
-
Timeline: March 2024 – present (~26 months); development accelerated sharply in early 2026 (113 commits in April 2026 alone)
-
Goal: Machine-checked security proofs for cryptographic schemes — IND-CPA, EUF-CMA, perfect secrecy, UC emulation, forking lemmas, Fiat-Shamir transforms — quantified over all adversaries, not just sampled inputs
-
Scale: 491 Lean files, ~155,000 lines of Lean; 308 commits; 10+ contributors
-
License: Apache 2.0
Where Software Engineering is Going?
These are reflections after reading the case studies in the next section.
Traditional programming languages are optimized for humans — English-speaking humans especially. But now that agents provide cheap and capable coding skills, the organization is changing. From the point of view of Rich Sutton’s “The Bitter Lesson,” we should do things that favor agent productivity.
The way we channel our intents to implementation is changing. Originally we tried to speak English in a way logical enough that raw machines could understand and turn it into low-level data/compute operations.
Now, since implementations will grow beyond human reasoning, we speak the language of math to solidify our intents. Smart machines write opcodes directly to skip “compilation,” and that favors math.
It also seems the software input/output bug is solved — or must be solved — due to external threats. People now move a level up. We model the attackers and their mental state. This level of modeling used to belong to the academic world and paper writing. But the trend looks like it can cover implementation.
Where Will Vulnerabilities Go?
The arms race dynamic of formal verification doesn’t eliminate vulnerabilities — it relocates them. Each layer you formally verify pushes the attack surface to the boundary above or below it. The clean verification table is the canonical illustration: the constraint system is proven correct, but the serialization to the prover is unverified. The adversary simply targets the seam. This is close to a conservation law: verification hardens one layer, and the boundary becomes the most attractive target.
Concretely, the seams that remain:
-
Spec-to-intent gap. Formal verification proves your implementation matches your spec — not that your spec matches your intent. A subtly wrong spec is invisible to the type checker. The Circom
IsZeromissing-constraint bug would have failed to compile inclean, but only because the spec was written to catch it. This gap lives in human cognition, not in any tool. -
Trusted computing base. Lean’s kernel, Mathlib, the RISC-V semantics model — these are the axioms everything rests on. The lean-zip fuzzing campaign illustrates the irony: 105 million executions found no memory vulnerabilities in the verified library, but found a bug in Lean 4’s own runtime.
-
Composition boundaries. Individual components get verified; emergent behaviors from composition are harder. Each EVM opcode gets a proof, but reentrancy patterns and cross-contract state arise from opcode sequences. Verifying ADD doesn’t verify a flash loan attack.
-
Social and supply chain. If an agent writes both the spec and the proof, who audits the agent? A subtly manipulated model could write a spec with a deliberate hole — a software supply chain attack, one level up, disguised as a rigorous proof.
Case studies
Lean-zip: Formal Verification of a Compression Library in Lean 4
Background
-
Repo:
kim-em/lean-zip— a Lean 4 library for zlib, gzip, DEFLATE, and ZIP archive formats -
Author: Kim Morrison, core developer at the Lean Focused Research Organization (FRO)
-
AI co-contributor: Claude Code is listed as a GitHub contributor; ~660 AI-assisted sessions produced ~653 merged PRs
-
Timeline: February 19 – April 22, 2026 (roughly two months)
-
Goal: Not just a working library — a fully proved-correct one, with zero unfinished proof obligations (“sorries”) at completion
-
Toolchain: Lean 4 (v4.29.1), with specs in
Zip/Spec/(42 files, 20,606 lines) and native code inZip/Native/
Highlights
-
Proving costs 6–20× more than writing. Writing the DEFLATE decompressor took 4 sessions; proving it correct took 25. The compressor and full roundtrip proofs consumed ~80 sessions. The implementation is the easy part.
-
The capstone theorem is a universal guarantee.
inflate_deflateRawstates that for every input under 1 GiB, compress-then-decompress returns the original data exactly — a claim no test suite can make. The 1 GiB bound is an explicit zip-bomb guard, not a proof limitation; formal verification forces implicit assumptions to become named preconditions.
lean/-- Unified DEFLATE roundtrip: inflate ∘ deflateRaw = identity.
This is the Phase B4 capstone theorem from PLAN.md. Generalized to any
`maxOutputSize` large enough to hold the input. -/
theorem inflate_deflateRaw (data : ByteArray) (level : UInt8)
(maxOutputSize : Nat) (hsize : data.size < maxOutputSize) :
Zip.Native.Inflate.inflate (deflateRaw data level) maxOutputSize = .ok data := by
unfold deflateRaw
split
· exact inflate_deflateStoredPure data _ (by omega)
· split
· exact inflate_deflateFixedIter data _ (by omega)
· split
· exact inflate_deflateLazyIter data _ hsize
· exact inflate_deflateDynamic data _ (by omega)
-
Proof quality is first-class engineering. A campaign to eliminate fragile bare
simptactics (which silently break as Lean’s library evolves) required 30 pull requests to reduce ~129 occurrences to zero. Large proof files were split into focused modules. Reusable lemmas were extracted. The same discipline applied to code applies to proofs. -
Security fell out as a byproduct. Specifying what a “valid” ZIP entry is automatically enumerates everything it isn’t. The project closed dozens of security dimensions — NUL-byte injection, ZIP64 field smuggling, malformed EOCD consistency — and the malformed-fixture test suite grew from 12 to 47 entries as proof work surfaced new edge cases. These are exactly the bugs testing rarely finds.
-
AI handles implementation; humans handle architecture. The ~1:1 session-to-PR ratio shows a tight loop where each session yielded a mergeable contribution. Claude Code was effective at writing tactics, decomposing theorems, and mapping concepts into Lean’s type system. The human expert determined what to prove and diagnosed deep failures — a division of labor that may define AI-assisted formal verification going forward.
Claude Code investigation shows:
-
Structural invariants that are untestable. The Kraft inequality proof (nextCodes_plus_count_le) verifies that the canonical Huffman code assignment never overflows its code space at any bit length — a property about the internal state of a loop over arbitrary frequency distributions. No finite test set can cover all valid distributions.
-
Prefix-freedom for all code tables. canonical_prefix_free proves that no Huffman codeword is a prefix of another, for every valid length assignment. This property is combinatorially intractable to test exhaustively (all pairs of symbols, across all valid tables).
-
Bridging the imperative/spec gap. BitstreamCorrect.lean formally proves that the C-style BitReader (byte array + position/offset cursor) tracks the spec-level List Bool bit-by-bit at every step. Tests verify output correctness; the proof verifies that the cursor arithmetic is correct at every intermediate position — the source of the hardest-to-find off-by-one bugs in bit-packing code.
-
Algebraic checksum compositionality. Both CRC32 and Adler-32 specs prove checksum(xs ++ ys) = f(checksum(xs), ys) for all inputs. Beyond being untestable exhaustively, this identity is what enables larger proofs to compose — the gzip roundtrip theorem reuses it directly rather than re-reasoning about the full byte stream.
-
Proofs compose; tests don’t. The capstone gzip theorem (gzip_decompressSingle_compress) assembles from ~15 sub-theorems across bitstream, Huffman, LZ77, block-framing, and checksum layers. Each layer is independently verified and then reused. In a test suite, the same logic would be re-implemented in the test harness — which is itself unverified.
-
Scale: 20,606 lines of spec across 42 files, zero sorrys, ~653 merged PRs — the project demonstrates that this level of verification is achievable incrementally over ~660 sessions, not just in academic one-off proofs.
-
Bugs could still be found: 105 million fuzzing executions found:
-
No memory vulnerabilities.
-
A bug in Lean 4’s runtime.
-
A denial-of-service in a unverified archive parser.
-
Clean: Formal Verification of ZK Circuits
Background
-
Repo:
Verified-zkEVM/clean— Lean 4 library for writing and formally verifying ZK circuits -
Organization: zkSecurity, under the Verified-zkEVM grant program
-
Language: Lean 4, with a Rust backend (Plonky3)
-
Timeline: July 2024 – present (~22 months of active development, steep ramp-up in 2025)
-
Goal: Machine-checked proofs for ZK circuit correctness — soundness, completeness, and field wrap-around safety — over all possible field elements and adversarial witnesses, not just tested inputs
-
Scale: 26,404 lines of Lean across 153 files; ~800 commits/month peak in mid-2025, settled at ~200/month
Highlights
-
Familiar syntax, proof obligation attached. You write circuits with
do-notation — the same structure a Circom developer would recognise. The original Circom source is embedded as a comment alongside the Lean translation, making the correspondence explicit. The difference: you must also supply a machine-checked proof that the constraints do what the comment says.
lean-- Circom original (embedded as a comment):
-- inv <-- in!=0 ? 1/in : 0;
-- out <== -in*inv +1;
-- in*out === 0;
def main (input : Expression (F p)) := do
let inv ← witness fun env =>
let x := input.eval env
if x ≠ 0 then x⁻¹ else 0
let out <== -input * inv + 1
input * out === 0
return out
def circuit : FormalCircuit (F p) field field where
main
localLength _ := 2
Spec input output :=
output = (if input = 0 then 1 else 0)
soundness := by
circuit_proof_start
simp only [id_eq, h_holds]
split_ifs with h_ifs
. simp only [h_ifs, zero_mul, neg_zero, zero_add]
. rw [neg_add_eq_zero]
have h1 := h_holds.left
have h2 := h_holds.right
rw [h1] at h2
simp only [id_eq, mul_eq_zero] at h2
cases h2
case neg.inl hl => contradiction
case neg.inr hr =>
rw [neg_add_eq_zero] at hr
exact hr
completeness := by
circuit_proof_start
cases h_env with
| intro left right =>
simp only [left, id_eq, ite_not, mul_ite, mul_zero] at right
simp only [id_eq, right, left, ite_not, mul_ite, mul_zero, mul_eq_zero, true_and]
split_ifs <;> aesop
-
Proof is part of the type. A
FormalCircuitbundles the circuit with machine-checked soundness and completeness proofs. You cannot instantiate one without supplying both — the type system rejects incomplete definitions at compile time.
leanstructure FormalCircuit (F : Type) [Field F] (Input Output : TypeMap) where
main : Var Input F → Circuit F (Var Output F)
Assumptions : Input F → Prop
Spec : Input F → Output F → Prop
soundness : Soundness F ...
completeness : Completeness F ...
-
Universal quantification over adversaries. The soundness definition quantifies over
∀ env : Environment F— every possible witness assignment, including adversarial ones. This is a mathematical statement that no cheating prover can satisfy the constraints without satisfying the spec. The CircomIsZerobug (missingin*out === 0) would make this proof fail to compile, not just fail a test.
leandef Soundness ... :=
∀ offset : ℕ, ∀ env : Environment F,
∀ input_var : Var Input F, ∀ input : Input F, eval env input_var = input →
Assumptions input →
ConstraintsHold.Soundness env (circuit.main input_var |>.operations offset) →
let output := eval env (circuit.output input_var offset)
Spec input output ∧ ...
-
Even the simplest Boolean gadget comes with a proof.
assertBool— thex * (x - 1) = 0constraint every Circom circuit uses — is formally proven equivalent tox = 0 ∨ x = 1via a Mathlib lemma. TheNoZeroDivisorsconstraint is load-bearing: it is what lets Lean concludex = 0 ∨ x - 1 = 0from a zero product. The type system enforces this assumption is in scope.
leandef assertBool : FormalAssertion (F p) field where
main (x : Expression (F p)) := assertZero (x * (x - 1))
Spec (x : F p) := IsBool x
soundness := by circuit_proof_all [IsBool.iff_mul_sub_one, sub_eq_add_neg]
completeness := by circuit_proof_all [IsBool.iff_mul_sub_one, sub_eq_add_neg]
-
Field wrap-around bugs caught at compile time.
ByteDecompositionrequiresp_large_enough : Fact (p > 2^16 + 2^8)as a type-level assumption. Without it,ZMod.val_mul_of_ltdoesn’t apply and the proof fails to compile. You cannot accidentally deployByteDecompositionwith a too-small prime — the compiler enforces it. This is the class of bug that looks correct for small test values but wraps silently nearp.
leanhave : (2^n * x).val = 2^n * x.val := by
rw [ZMod.val_mul_of_lt (by linarith), h_mul_x]
-
Verified composition: proven once, trusted everywhere. Once a subcircuit is formally verified, it is a trusted black box.
IsEqualcallsIsZeroas a subcircuit; its soundness proof invokesIsZero’s proven spec without re-examining internals. In Circom, reusing a component still requires re-auditing it in every new context. Here, the proof already covers all cases.
leandef main (input : Expression (F p) × Expression (F p)) := do
let diff := input.1 - input.2
let out ← IsZero.circuit diff -- treated as a verified black box
return out
soundness := by
circuit_proof_start [IsZero.circuit]
rw [← h_input]
...
-
The trusted gap at the backend. The formal proofs cover the abstract constraint system inside Lean. Getting those constraints into an actual prover requires JSON serialization (witness generators are silently dropped:
| .witness m _ => ...— Json.lean#L49–L54) and a Rust re-interpretation layer — both unverified. The proofs guarantee the polynomial constraint system is correct; they do not guarantee the bitstring submitted to the verifier is the same system.
| Layer | Verified? |
|---|---|
| Lean kernel (type checker) | Yes |
FormalCircuit.soundness / completeness |
Yes |
toJson serialization |
No |
| Rust backend AST interpretation | No |
| Plonky3 proof system soundness | No |
-
Proof automation is still growing. The
circuit_proof_starttactic handles routine setup, but complex gadgets likeLessThanstill require substantial manual case analysis navigating variable offset arithmetic (env.get (i₀ + n + 1)). Proof automation improvements are listed as ongoing work in the roadmap.
evm-asm: Formally Verified EVM as RISC-V Assembly
Background
-
Repo:
Verified-zkEVM/evm-asm— Lean 4 verified macro assembler implementing the Ethereum EVM as RV64IM RISC-V assembly -
Organization: zkSecurity, Verified-zkEVM grant program. Principal author: Yoichi Hirai (~96% of commits)
-
Language: Lean 4; target runtime is RV64IM RISC-V (the SP1 zkVM substrate)
-
Timeline: February 2026 – present; ~4,550 commits in May 2026 alone, consistent with “200–600 commits per day” driven by AI agents
-
Goal: Machine-checked proofs that a RISC-V assembly program correctly implements the Ethereum state transition function, for use as a formally verified zkEVM guest
-
Scale: 9,904 Lean files, ~1.8M lines; 52+ EVM opcodes proved; 66 conformance vectors passing as theorems
Highlights
-
Why verify at the assembly level? Every language, compiler, and optimizer eventually produces machine instructions. Verifying there sidesteps all higher-level problems: C/C++ undefined behavior, Rust’s lack of a stable spec, and the 10–20% overhead of formally verified compilers like CompCert. RISC-V has no undefined behavior — every instruction has a total, formal semantics. In a zkVM context this bites even harder: if the guest program has a bug, the resulting SNARK proof is still valid. It just proves the wrong thing. Formal verification of the guest is the only defense.
-
A three-level proof pyramid for every opcode. Each of the 52+ opcodes is verified bottom-up. The 256-bit ADD illustrates the structure: Level 1 proves each 5-instruction limb group manipulates the right register and memory cell. Level 2 composes four limb proofs into a full 30-instruction carry-chain spec. Level 3 rewrites the raw limb postcondition into
evmWordIs (sp + 32) (a + b)— abstract EVM semantics visible to callers.
leantheorem evm_add_stack_spec_within (sp base : Word) (a b : EvmWord) ... :
cpsTripleWithin 30 base (base + 120) (evm_add_code base)
((.x12 ↦ᵣ sp) ** ... ** evmWordIs sp a ** evmWordIs (sp + 32) b)
((.x12 ↦ᵣ (sp + 32)) ** ... ** evmWordIs sp a ** evmWordIs (sp + 32) (a + b))
-
The frame rule is baked into the triple definition. In textbook separation logic the frame rule is a separate inference step. Here
cpsTripleWithinquantifies over an arbitrary frameRinternally — every spec says only what the code touches, and unchanged heap, registers, and code are automatically preserved. Each instruction is stated and proved once; callers never re-prove frame conditions.
leandef cpsTripleWithin (nSteps : Nat) (entry exit_ : Word) (cr : CodeReq)
(P Q : Assertion) : Prop :=
∀ (R : Assertion), R.pcFree → ∀ s, cr.SatisfiedBy s →
(P ** R).holdsFor s → s.pc = entry →
∃ k, k ≤ nSteps ∧ ∃ s', stepN k s = some s' ∧
s'.pc = exit_ ∧ (Q ** R).holdsFor s'
-
Proof as a compiler — the AI co-routine. AI agents write both the assembly subroutine and its Lean proof in the same session. If the proof fails to type-check, the agent knows the code or the spec is wrong — before any test is run, before any deployment. The proof failure is the bug report. This co-routine property is what makes 200–600 commits per day plausible: the type checker replaces the entire test-and-fix loop.
-
What this means for Ethereum clients.
| Dimension | geth / reth / besu | evm-asm |
|---|---|---|
| Correctness evidence | Passes ethereum/tests vectors | Machine-checked proof for all inputs |
| Spec | EIPs + Yellow Paper (informal) | Lean types + cpsTripleWithin + pure EL spec |
| Coverage | Test-driven (finite vectors) | Universal (∀ register values, ∀ memory layouts) |
| Trust base | Rust compiler + std | Lean kernel + Sail RISC-V model |
-
Caveats. The DIV/MOD semantic bridge is still in progress — Knuth’s Theorem B (trial quotient overestimates by ≤ 1) is the remaining blocker. 27 files use
native_decideagainst AGENTS.md policy, mostly in RLP round-trip tests over finite concrete values. And the PLAN.md is significantly behind the code: phases listed as future work (interpreter, gas, world state, transactions) already exist inEvm64/andEL/.
evm-smith: What if you can bring Ethereum contracts to final form?
Write EVM bytecode directly and skip HLL(High level language) like Solidity/Vyper directly.
Solvency property: Sum of all users’ balance is less or equal to contract’s ETH balance.
leandef WethInv (σ : AccountMap .EVM) (C : AccountAddress) : Prop :=
storageSum σ C ≤ balanceOf σ C
See Leo’s intro post for more.
Background
-
Repo:
evm-smith— Lean 4 framework for writing raw EVM bytecode and proving safety properties directly against the official Ethereum semantics -
Upstream:
leonardoalt/EVMYulLean— a mechanical encoding of the Ethereum Yellow Paper in Lean 4; EVM-Smith is a thin layer on top -
Language: Lean 4, with Solidity/Vyper comparisons (Foundry test suite)
-
Goal: Machine-checked proofs of EVM bytecode safety — without a compiler in the trust boundary. Bytecode is written as Lean values, and invariants are proved directly against the
EvmYul.stepsemantics -
Scale: 13,441 lines of Lean across 38 files; 1,652 lines of Solidity; 4 worked demos (Add3, Register, WETH, ERC-20)
-
Timeline: April 2026 – present (~6 weeks of active development, rapid ramp from framework skeleton to WETH solvency + ERC-20 refinement)
-
Axioms: 2 EVM-specific axioms (
precompile_preserves_accountMap,lambda_derived_address_ne_C); 0sorrydeclarations -
Investigated version:
eac777c
Highlights
-
The bytecode is the theorem source. Instead of proving “this Solidity source is safe and then trusting the compiler,” you write the bytecode directly as a Lean
Program— a list of(opcode, optional-immediate)pairs — and prove properties about it using the officialEvmYul.stepsemantics as ground truth. No compiler. No FFI. The proof checker is the auditor. -
Per-opcode lemmas as a proof API. Naively unfolding
EvmYul.step(a 60-branch dependent match) repeatedly explodes kernel term size. The solution: cache each opcode branch as a named lemma that closes by a singlerfl. An 8-opcode correctness proof then becomes 8 cheap rewrites plus one arithmetic goal, with every intermediate stack state readable as an explicit cons pattern (a :: b :: rest). -
Fully symbolic proofs: the Add3 example. The Add3 demo reads three calldata words and sums them. The correctness theorem is universally quantified over all
2^256³input combinations simultaneously — no fuzzing, no sampling. A counter-intuitive caveat: the proof cannot reach byte-level return correctness becauseMSTORE/RETURNroutes through anopaquedeclaration in EVMYulLean, irreducible by design. -
Cross-transaction balance monotonicity from 20 bytes of bytecode. The Register demo proves that Register’s ETH balance is monotonically non-decreasing across any Ethereum transaction — including arbitrary reentrancy. The key step is
bytecodePreservesBalance: given Register’s bytecode is installed, no Ξ-frame decreases the balance (the only CALL has value 0 hardcoded). This per-bytecode fact is plugged intoΥ_balanceOf_ge, which handles the full call tree including reentrancy, CREATE, SELFDESTRUCT, and gas accounting. The hardest part was threading the invariant through Ethereum’s full transaction driver, not the bytecode itself. -
WETH solvency proven end-to-end. The WETH demo proves
storageSum σ C ≤ balanceOf σ Cis preserved across any transaction. Getting there required 11,332 lines in two proof files and aWethTracepredicate with 64 disjuncts enumerating every reachable(pc, stack-shape, storage-facts, invariant-slack)combination. Each SSTORE/CALL transition is shown to leave slack unchanged: Deposit and Withdraw both net zero change tobalanceOf − storageSum. -
Type system as optimization validator. The ERC-20 demo investigates replacing Solidity’s
storage[keccak256(addr ++ slot_id)]withstorage[~addr](bitwise NOT), saving ~8 opcodes per balance read/write. The framework requires discharging two obligations — injectivity and disjointness from metadata slots — as proof fields in aSlotAbstractionstructure. The naïveidoptimization is silently rejected at definition time:id 0 = 0is a named slot, sodisjointcannot be proved and the structure cannot be instantiated. -
Two axioms, zero sorries, one trust boundary. The entire codebase rests on exactly two EVM-specific axioms:
precompile_preserves_accountMap(provable by case inspection; deferred) andlambda_derived_address_ne_C(equivalent to Keccak-256 collision resistance). Any reader can runlake buildand watch the kernel verify every proof independently. -
Caveats. ERC-20 proofs are peephole-local: they verify opcode-sequence equivalence for the same address but do not prove that compiled Solidity/Vyper bytecode matches the hand-rolled sequences (that correspondence is verified by disassembly, not proof). The structural hypotheses in
WethAssumptionsandRegSDExclusion— deployment pinning, SELFDESTRUCT exclusion — are explicit preconditions the deployer must discharge against actual chain state. The proofs are conditional, not unconditional.