CX — Conduit Patch Experiment

Purpose

This experiment cross-validates the GS claim that a well-specified codebase is easier for an AI to patch than an unspecified one. The method is inspired by SWE-bench’s patch-application methodology but is scoped to a single domain (RealWorld Conduit), using two implementations whose quality tiers were already characterized in BX.

Design

Conditions:

v7 — GS treatment-v7, rubric score 13/14, Hurl 13/13, 146 tests
repo-b — gothinkster/node-express-realworld-example-app, rubric score 7/14, 27 tests (BX repo B)

Task structure: five patch tasks, one per SWE-bench structural class. Each task is given to Claude as an issue description. Gate: a task passes only if the patch compiles and the relevant tests pass. The rubric is not used.
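A minimal sketch of the gate, assuming each patched checkout can be judged by running one build command and one test command and reading their exit codes. The command lists are placeholders, not the repositories' actual scripts, and `run_gate` / `GateResult` are hypothetical names used only for illustration.

```python
import subprocess
import sys
from dataclasses import dataclass
from typing import Sequence


@dataclass
class GateResult:
    compiles: bool
    tests_pass: bool

    @property
    def passed(self) -> bool:
        # The CX gate: a patch counts only if it compiles AND the relevant tests pass.
        return self.compiles and self.tests_pass


def run_gate(build_cmd: Sequence[str], test_cmd: Sequence[str], cwd: str = ".") -> GateResult:
    """Apply the gate to a patched checkout in `cwd`.

    build_cmd and test_cmd are hypothetical placeholders; the real commands
    depend on each condition's tooling (e.g. a TypeScript build and a filtered test run).
    """
    build = subprocess.run(list(build_cmd), cwd=cwd, capture_output=True)
    if build.returncode != 0:
        # A patch that does not compile fails the gate; tests are not run.
        return GateResult(compiles=False, tests_pass=False)
    tests = subprocess.run(list(test_cmd), cwd=cwd, capture_output=True)
    return GateResult(compiles=True, tests_pass=tests.returncode == 0)
```

Usage: `run_gate(["npm", "run", "build"], ["npm", "test"], cwd="path/to/patched-repo").passed`, where both commands are assumed examples. Short-circuiting on a failed build keeps "does not compile" distinguishable from "compiles but tests fail" in the recorded result.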

Hypothesis: v7 passes more patch tasks than repo-b, because the GS navigability properties (Self-describing, Bounded, Composable) reduce ambiguity about where the AI should make each change.
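Both conditions receive the same five tasks, so the hypothesis can be checked per task as well as by total passes. A minimal sketch, assuming outcomes are available as a mapping from task ID to a boolean gate result; `compare_conditions` is a hypothetical helper, and "v7 passes strictly more tasks" is treated here as a directional check, not a statistical test.

```python
from collections.abc import Mapping


def compare_conditions(v7: Mapping[str, bool], repo_b: Mapping[str, bool]) -> dict[str, object]:
    """Paired per-task comparison of gate outcomes for the two conditions.

    Only tasks present in both mappings are compared.
    """
    tasks = sorted(set(v7) & set(repo_b))
    v7_passes = sum(v7[t] for t in tasks)        # bools sum as 0/1
    repo_b_passes = sum(repo_b[t] for t in tasks)
    return {
        "v7_passes": v7_passes,
        "repo_b_passes": repo_b_passes,
        # Discordant tasks: where exactly one condition passed the gate.
        "only_v7": [t for t in tasks if v7[t] and not repo_b[t]],
        "only_repo_b": [t for t in tasks if repo_b[t] and not v7[t]],
        # Directional support for the hypothesis: v7 passes strictly more tasks.
        "supports_hypothesis": v7_passes > repo_b_passes,
    }
```

Listing the discordant tasks alongside the totals shows which structural classes drive any difference, which the totals alone would hide.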

SWE-bench Class Mapping

Task	Class	GS property tested
CX-1	Multi-file propagation	Bounded + Composable
CX-2	Contract mismatch	Verifiable
CX-3	Missing error case	Executable
CX-4	Config/constant extraction	Bounded
CX-5	Validation rule addition	Verifiable
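Because CX-1 exercises two properties, per-property results are a roll-up of per-task outcomes rather than a one-to-one mapping. A sketch of that roll-up, transcribing the table above into a dictionary; `CX_TASKS` and `passes_by_property` are hypothetical names, and the input format (task ID to gate outcome) is an assumption.

```python
from collections import defaultdict

# Task -> (SWE-bench class, GS properties tested), transcribed from the mapping table.
CX_TASKS = {
    "CX-1": ("Multi-file propagation", ("Bounded", "Composable")),
    "CX-2": ("Contract mismatch", ("Verifiable",)),
    "CX-3": ("Missing error case", ("Executable",)),
    "CX-4": ("Config/constant extraction", ("Bounded",)),
    "CX-5": ("Validation rule addition", ("Verifiable",)),
}


def passes_by_property(outcomes: dict[str, bool]) -> dict[str, tuple[int, int]]:
    """Roll per-task gate outcomes up to (passed, attempted) per GS property.

    A task that tests two properties (CX-1) counts toward both.
    """
    tally = defaultdict(lambda: [0, 0])
    for task, (_cls, props) in CX_TASKS.items():
        if task not in outcomes:
            continue  # task not yet run for this condition
        for prop in props:
            tally[prop][1] += 1
            if outcomes[task]:
                tally[prop][0] += 1
    return {prop: (passed, attempted) for prop, (passed, attempted) in tally.items()}
```

With this mapping, Bounded and Verifiable each rest on two tasks while Composable and Executable rest on one, so a single task outcome moves those two properties the most.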

Tasks

See the tasks/ directory for the individual issue descriptions.

Results

See results/v7/ and results/repo-b/ for the patch outputs and the per-task pass/fail outcomes.