Development Analysis · Feb 3–17, 2026

106 PRs in 14 Days. One Architect.

Output equivalent to a 3–4 person elite engineering team. 85% of builders completed fully autonomously. 20 pre-merge bugs caught by multi-agent review.

Data sources: 26 review files, 106 merged PRs, 105 closed issues, 801 commits, consult stats

106

PRs merged

85%

Fully autonomous

57m

Median build time

20

Pre-merge bugs caught

Key Metrics

53/week

PRs merged per week

$1.59

Consultation cost per PR

3.4x

Return on investment

100%

Context recovery success rate

Team Equivalent

By PR volume alone, one architect with autonomous builders matched roughly 11 developers at the industry elite benchmark, or about 19 at the median benchmark.

PRs merged / week (Codev)	53
PRs / dev / week (elite benchmark)	5.0
PRs / dev / week (median benchmark)	2.8
Developer-equivalents (elite)	~11
SPIR features only (conservative)	3–4 elite devs

Sources: LinearB 2026 (8.1M PRs), Worklytics 2025, byteiota 2026. Caveats: solo codebase, no cross-team coordination overhead, single TypeScript project.
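The developer-equivalent figures follow from dividing weekly PR throughput by a per-developer benchmark. A minimal sketch of that arithmetic, using only the numbers quoted above; the function name `developerEquivalents` is illustrative and not part of Codev:

```typescript
// Developer-equivalents: team throughput divided by a per-developer benchmark.
// The function name is illustrative; the figures are the ones quoted in this report.
function developerEquivalents(
  teamPrsPerWeek: number,
  benchmarkPrsPerDevPerWeek: number
): number {
  if (benchmarkPrsPerDevPerWeek <= 0) {
    throw new RangeError("benchmark must be positive");
  }
  return teamPrsPerWeek / benchmarkPrsPerDevPerWeek;
}

const codevPrsPerWeek = 106 / 2; // 106 merged PRs over two weeks = 53

const vsElite = developerEquivalents(codevPrsPerWeek, 5.0); // 10.6
const vsMedian = developerEquivalents(codevPrsPerWeek, 2.8); // about 18.9

console.log(Math.round(vsElite), Math.round(vsMedian)); // prints "11 19"
```

PR counts say nothing about PR size or difficulty, which is why the SPIR-only estimate of 3–4 elite developers is the more conservative reading.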

Autonomous Builder Performance

22 of 26 builders (85%) completed fully autonomously. The 4 interventions were caused by infrastructure issues, not builder capability gaps.

22/26

Fully autonomous (no intervention)

26/26

Reached PR creation (100%)

The 4 Interventions

0102, 0103: Pre-existing broken tests blocked porch advancement despite correct implementations.

0104: Claude consultation timed out on a 3,700-line file (motivated Spec 0105: Server Decomposition).

0106: Merge artifacts from a 40+ file rename (builder detected and self-resolved).

Context Recovery: 100% Success Rate

4 specs exhausted context windows (all had 40+ files or 5+ phases). Every recovery succeeded via porch status.yaml + git history. No builder had to redo completed work.

Spec	Files	Context Windows
0104	74	4 compactions
0105	28 (7 phases)	2
0112	124	3
0126	80+	3

Remaining 22 projects completed within a single context window.
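Recovery works because progress is persisted outside the context window. The sketch below shows the general idea of resuming from a persisted phase list. The real porch status.yaml schema is not shown in this report, so the `BuildState` shape, the phase names, and `nextPhase` are all invented for illustration:

```typescript
// Hypothetical shape of persisted build state. The actual status.yaml
// schema is not documented here; this is an assumption for illustration.
interface BuildState {
  phases: string[]; // ordered phase names from the plan
  completed: string[]; // phases already marked done
}

// Returns the first phase not yet completed, or null when all are done.
function nextPhase(state: BuildState): string | null {
  for (const phase of state.phases) {
    if (state.completed.indexOf(phase) === -1) {
      return phase;
    }
  }
  return null;
}

// A builder that lost its context after finishing two of four phases.
const recovered: BuildState = {
  phases: ["phase-1", "phase-2", "phase-3", "phase-4"],
  completed: ["phase-1", "phase-2"],
};

const resumeAt = nextPhase(recovered); // "phase-3"
```

Under this assumption a recovering builder only needs the state file to know where to resume, and git history to see what the completed phases produced.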

How Fast Builders Ship

Median autonomous implementation time: 57 minutes from first commit to PR creation.

57m

Median (SPIR features)

38h 12m

Total autonomous time

13/24

Completed in <60 min

Bugfix Pipeline

Issue filed → builder spawned → fix implemented → 3-way review → merged. Fully automated end-to-end.

66%

Ship in under 30 min

13m

Median PR→merge time

59

Bugfix PRs merged

Multi-Agent Review Catches

20 pre-merge bugs caught: 1 security-critical, 8 runtime failures, 11 quality/completeness gaps.

Security-Critical

Socket permissions gap

Codex · Spec 0104

The Shellper Unix socket was created without restrictive permissions (it should have been 0600), so any local user could connect.
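A check for this class of bug can be expressed as a bit mask over the file mode. The sketch below is not Shellper's actual fix; `isOwnerOnly` is a hypothetical helper. In Node.js the mode would come from `fs.statSync(path).mode`, and `fs.chmodSync(path, 0o600)` is one way to tighten it after the socket is created:

```typescript
// Permission bits for group and other (rwx each): 0o077.
const GROUP_OTHER_MASK = 0o077;

// True when the mode grants no access to group or other users,
// for example 0o600 (owner read/write only). Hypothetical helper.
function isOwnerOnly(mode: number): boolean {
  return (mode & GROUP_OTHER_MASK) === 0;
}

isOwnerOnly(0o600); // true
isOwnerOnly(0o755); // false: group and other can read and execute
isOwnerOnly(0o140600); // true: the socket file-type bits are ignored by the mask
```

Changing the mode after the socket exists leaves a brief window in which the default permissions apply. Creating the socket inside a directory only the owner can enter is a common way to close that window.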

Runtime Failures (8)

Catch	Spec	Reviewer	Description
Startup race condition	0105	Codex + Gemini	getInstances() returns [] before initInstances() completes, breaking dashboard on startup
body.name truthiness bug	0107	Codex	{ name: "" } treated as reconnect instead of validation error
Nonce placement error	0107	Claude	OAuth nonce placed on authUrl instead of callback, breaking CSRF protection
Pong timeout not armed	0109	Codex	Dead WebSocket connections never cleaned up when ws.ping() throws
stderrClosed value-copy bug	0113	Claude	Boolean was a local copy, never updating the session object
Zero-padded spec matching	0126	Codex	getProjectSummary() failed to match spec files with leading zeros
Bugfix regex extraction	0126	Codex	Issue number extraction didn't handle all bugfix branch naming patterns
Missing workspacePath	0126	Claude	Critical: Dashboard would have been completely non-functional for workspace views
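The body.name truthiness bug is a common JavaScript pitfall: an empty string is falsy, so a truthiness test cannot tell "field absent" from "field present but invalid". A minimal sketch of the pattern; the `CreateRequest` type, the `Decision` values, and both functions are hypothetical and not taken from the Spec 0107 code:

```typescript
// Hypothetical request handling: an absent name means "reconnect",
// a present name must be a non-empty string.
type Decision = "reconnect" | "create" | "invalid";

interface CreateRequest {
  name?: unknown;
}

// Buggy: "" is falsy, so { name: "" } falls through to reconnect.
function classifyBuggy(body: CreateRequest): Decision {
  return body.name ? "create" : "reconnect";
}

// Fixed: separate "absent" from "present but invalid".
function classify(body: CreateRequest): Decision {
  if (body.name === undefined) {
    return "reconnect";
  }
  if (typeof body.name !== "string" || body.name.trim() === "") {
    return "invalid";
  }
  return "create";
}

classifyBuggy({ name: "" }); // "reconnect" (the bug)
classify({ name: "" }); // "invalid"
classify({}); // "reconnect"
```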

Quality & Completeness (11)

Catch	Spec	Reviewer
gate-status.ts deletion prevented (would break build)	0108	Gemini
Gate transition vs re-request spam	0108	Codex
buildWorktreeLaunchScript side effect	0105	Claude
tower.html rename miss (124-file rename)	0112	Gemini
Stale StatusPanel test assertions	0112	Gemini
Pre-HELLO gating (unauthenticated socket access)	0118	Codex
Workspace scoping for sockets	0118	Codex
Unnecessary templates.test.ts change prevented	0111	Claude
ReconnectRestartOptions missing	0104	Codex
Documentation gaps (INSTALL, MIGRATION, DEPENDENCIES)	0104	Gemini
Secondary race path in /api/state	BF #274	Codex

Reviewer Effectiveness

No single reviewer caught all bugs. Each model's blind spot is another's strength.

Codex

11 unique catches · ~25% FP rate

Security edge cases, test completeness, exhaustive sweeps. Found the socket permissions gap, truthiness bug, and secondary race path that others missed.

Claude

5 catches · ~8% FP rate

Line-by-line traceability, type safety, critical runtime bugs. Caught the two most critical issues: missing workspacePath (would have broken entire dashboard) and stderrClosed value-copy bug.

Gemini

4 catches · ~3% FP rate

Architecture, documentation, build-breaking deletions. Caught the gate-status.ts deletion that would have broken the build and the tower.html rename miss in a 124-file change.
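The stderrClosed value-copy bug credited to Claude above is easy to miss in a diff because the buggy code looks like it updates state. A sketch of the general pattern, assuming a simplified `ShellSession` type that is not the project's real one:

```typescript
// Simplified, hypothetical session type.
interface ShellSession {
  stderrClosed: boolean;
}

// Buggy: booleans are copied by value, so assigning to the local
// never touches the session object.
function markClosedBuggy(session: ShellSession): boolean {
  let closed = session.stderrClosed; // copies the value
  closed = true; // updates only the local copy
  return closed;
}

// Fixed: write through the object reference.
function markClosed(session: ShellSession): void {
  session.stderrClosed = true;
}

const session: ShellSession = { stderrClosed: false };

markClosedBuggy(session);
const afterBuggy = session.stderrClosed; // still false

markClosed(session);
const afterFixed = session.stderrClosed; // true
```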

What Escaped Despite Review

16 post-merge escapes (8 code defects, 8 design gaps) show where multi-agent review reaches its limits.

Issue	Description	Why Review Missed It
#294	Shellper process leak	Resource lifecycle across process boundaries
#313	Terminal unresponsive under backpressure	Flow control behavior under load
#324	Processes don't survive restart	Pipe FD lifecycle dependency
#319	Duplicate notifications	Event ordering in async pipeline
#335	Notify before review completes	Race between notification and consultation
#336	Worktree changes leak via CWD	Side effect invisible in diff review
#342	Consult subprocesses never exit	Process cleanup in SDK teardown
#341	Orphaned processes accumulate	Resource lifecycle across sessions

Key Pattern

5 of 8 code-defect escapes were process lifecycle issues (shellper leaks, orphaned processes, pipe FD dependencies). These are fundamentally hard to catch via static code review — they require runtime observation of process behavior over time. Spec 0126 (80+ files, 6 phases) alone produced 7 of 16 total escapes, suggesting review effectiveness degrades with spec complexity.
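One way to get the runtime observation that static review lacks is to track spawned processes in a test harness and fail when any are left behind. The registry below is a hypothetical sketch, not something Codev is known to use. In a Node.js harness the PIDs would come from the `pid` property of a spawned `ChildProcess`:

```typescript
interface TrackedProcess {
  pid: number;
  label: string;
}

// Hypothetical leak detector: records every spawned process and
// reports the ones that were never released.
class ProcessRegistry {
  private live: TrackedProcess[] = [];

  register(pid: number, label: string): void {
    this.live.push({ pid: pid, label: label });
  }

  release(pid: number): void {
    this.live = this.live.filter((p) => p.pid !== pid);
  }

  // Returns a copy so callers cannot mutate internal state.
  leaked(): TrackedProcess[] {
    return this.live.slice();
  }
}

const registry = new ProcessRegistry();
registry.register(4101, "shellper");
registry.register(4102, "consult");
registry.release(4101); // shellper exited cleanly

const leaks = registry.leaked(); // only the "consult" entry (pid 4102) remains
```

A teardown hook that asserts `leaked()` is empty would turn an orphaned process into a failing test rather than a post-merge issue.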

Cost & ROI

Model	Invocations	Avg Duration	Cost	Success Rate
Claude	2,291	8s	$96.69	84%
Codex	613	21s	$70.81	63%
Gemini	211	64s	$1.14*	98%
Total	3,115	12.2h (total)	$168.64*	81%

*Gemini cost tracking was broken during this period (Bug #374). Actual Gemini costs likely 10–50x higher. Corrected estimate: $180–$225 total.
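The totals and the corrected cost range can be reproduced from the table rows. A sketch of that arithmetic, assuming the overall success rate is weighted by invocation count and that the 10–50x correction applies only to Gemini's tracked $1.14; the variable and function names are illustrative:

```typescript
interface ModelStats {
  model: string;
  invocations: number;
  costUsd: number;
  successRate: number;
}

// Rows from the table above.
const stats: ModelStats[] = [
  { model: "Claude", invocations: 2291, costUsd: 96.69, successRate: 0.84 },
  { model: "Codex", invocations: 613, costUsd: 70.81, successRate: 0.63 },
  { model: "Gemini", invocations: 211, costUsd: 1.14, successRate: 0.98 },
];

let totalInvocations = 0;
let trackedCostUsd = 0;
let weightedSuccesses = 0;
for (const s of stats) {
  totalInvocations += s.invocations;
  trackedCostUsd += s.costUsd;
  weightedSuccesses += s.invocations * s.successRate;
}

// Success rate weighted by invocation count: about 0.81.
const overallSuccessRate = weightedSuccesses / totalInvocations;

// Total cost if Gemini's tracked cost is understated by `factor`.
function correctedTotalUsd(factor: number): number {
  const geminiTrackedUsd = 1.14;
  return trackedCostUsd - geminiTrackedUsd + geminiTrackedUsd * factor;
}

const lowEstimate = correctedTotalUsd(10); // about 178.90
const highEstimate = correctedTotalUsd(50); // about 224.50
```

These bounds round to the $180–$225 range quoted above.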

Category	Hours
Savings: Security catches (1)	~10h
Savings: Runtime failures (8)	~12h
Savings: Quality/completeness (11)	~11h
Total Savings	~33h

Overhead: False positive iterations (~36 × 5 min)	~3.0h
Overhead: Consultation wait time (~200 rounds × 2 min)	~6.7h
Total Overhead	~9.7h

Net Value	~23.3 hours
ROI	~3.4x

Conservative floor (halving security estimate): ~28h saved, ~18.3h net, 2.9x ROI. Cost efficiency: $8.43 per catch, $5.11 per hour of engineering time saved.
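The ROI figures are hours saved divided by hours of overhead. The sketch below reproduces both the headline and the conservative floor from the estimates in the table; `computeRoi` and its field names are illustrative:

```typescript
interface RoiInputs {
  savedHours: number;
  falsePositiveIterations: number;
  minutesPerFalsePositive: number;
  consultRounds: number;
  waitMinutesPerRound: number;
}

interface RoiResult {
  overheadHours: number;
  netHours: number;
  ratio: number; // hours saved per hour of overhead
}

function computeRoi(input: RoiInputs): RoiResult {
  const overheadMinutes =
    input.falsePositiveIterations * input.minutesPerFalsePositive +
    input.consultRounds * input.waitMinutesPerRound;
  const overheadHours = overheadMinutes / 60;
  return {
    overheadHours: overheadHours,
    netHours: input.savedHours - overheadHours,
    ratio: input.savedHours / overheadHours,
  };
}

// Headline: 10h security + 12h runtime + 11h quality = 33h saved.
const headline = computeRoi({
  savedHours: 10 + 12 + 11,
  falsePositiveIterations: 36,
  minutesPerFalsePositive: 5,
  consultRounds: 200,
  waitMinutesPerRound: 2,
}); // overhead about 9.7h, net about 23.3h, ratio about 3.4

// Conservative floor: security estimate halved to 5h, so 28h saved.
const conservative = computeRoi({
  savedHours: 5 + 12 + 11,
  falsePositiveIterations: 36,
  minutesPerFalsePositive: 5,
  consultRounds: 200,
  waitMinutesPerRound: 2,
}); // net about 18.3h, ratio about 2.9

const costPerCatchUsd = 168.64 / 20; // about 8.43
const costPerSavedHourUsd = 168.64 / 33; // about 5.11
```

The cost-efficiency figures use the tracked $168.64 total, so they would rise somewhat under the corrected Gemini estimate.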

Raw Output

801

Non-merge commits

2,698

Files changed

+95K

Net lines of code

PR Type	Count	Additions	Deletions	Files
SPIR (feature)	30	+54,049	-22,049	614
Bugfix	59	+11,896	-7,407	410
Other (maintenance, docs)	17	+15,316	-27,310	447
Total	106	+81,261	-56,766	1,471

Test Suite Growth

~845

Tests at period start

~1,368

Tests at period end

+523

Net test growth

What's Working

85% fully autonomous builder completion
100% context recovery via porch state
Phase-gated review catches bugs early
66% of bugfixes ship in under 30 minutes
Reviewer complementarity — no single model catches everything

What Needs Improvement

Codex reads main instead of builder worktree (8+ wasted iterations)
Review effectiveness degrades above ~40 files
Process lifecycle bugs escape static review
No supermajority override when 2/3 reviewers approve
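The last item could be addressed with a simple voting rule. The sketch below is a hypothetical policy, not an existing Codev feature; the `Verdict` type and `hasSupermajority` are assumptions. It uses integer arithmetic so that exactly two of three approvals passes without floating-point comparison:

```typescript
type Verdict = "approve" | "request_changes";

// Hypothetical policy: pass when at least two thirds of reviewers approve.
function hasSupermajority(verdicts: Verdict[]): boolean {
  if (verdicts.length === 0) {
    return false; // no reviews means no approval
  }
  const approvals = verdicts.filter((v) => v === "approve").length;
  // approvals / total >= 2 / 3, rearranged to avoid division.
  return approvals * 3 >= verdicts.length * 2;
}

hasSupermajority(["approve", "approve", "request_changes"]); // true
hasSupermajority(["approve", "request_changes", "request_changes"]); // false
```

Given that the single security-critical catch in this period came from one reviewer, a real policy would probably need to exempt security findings from any override.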

Methodology

All claims backed by specific PR numbers, commit hashes, review file citations, or consult stats output. Data covers 26 SPIR/bugfix projects (Specs 0102–0127, 0350, 0364, bugfix-274, bugfix-324), 106 merged PRs, 105 closed issues, and 801 non-merge commits over Feb 3–17, 2026. Industry benchmarks from LinearB 2026 (8.1M PRs), Worklytics 2025, byteiota 2026. Full source reviews available in the Codev repository.
