CX — Conduit Patch Experiment

Purpose

CX cross-validates the GS claim that a well-specified codebase is more patchable than an unspecified one. It is inspired by SWE-bench’s patch-application methodology but scoped to a single domain (RealWorld Conduit), using two quality tiers already characterized by BX.

Design

Conditions:

  • v7 — GS treatment-v7, rubric score 13/14, Hurl 13/13, 146 tests
  • repo-b — gothinkster/node-express-realworld-example-app, rubric score 7/14, 27 tests (BX repo B)

Task structure: 5 patch tasks, one per SWE-bench structural class. Each task is presented to Claude as an issue description. Gate: a task passes only if the patch compiles and the relevant tests pass. The rubric plays no part in the gate.
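
The gate can be sketched as a small predicate. This is a minimal illustration only: the PatchOutcome record and its field names are hypothetical, not the experiment's actual harness, and treating a run with zero relevant tests as a failure is an assumption rather than something the design states.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PatchOutcome:
    """Hypothetical record of one patch attempt; field names are illustrative."""
    task_id: str        # e.g. "CX-1"
    compiles: bool      # did the patched codebase build?
    tests_run: int      # number of relevant tests executed
    tests_failed: int   # number of relevant tests that failed


def passes_gate(outcome: PatchOutcome) -> bool:
    """Gate from the design: the patch compiles and the relevant tests pass.

    Assumption: a run that executed no relevant tests does not count as a pass.
    """
    return (
        outcome.compiles
        and outcome.tests_run > 0
        and outcome.tests_failed == 0
    )
```

Keeping the gate a pure function of the recorded outcome means the same check applies unchanged to both conditions.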

Hypothesis: v7 passes more patch tasks than repo-b because GS navigability properties (Self-describing, Bounded, Composable) reduce the AI’s ambiguity about where to make changes.
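
One way to read the hypothesis operationally is as a comparison of pass counts over the same task set. The sketch below assumes each condition's results are available as a mapping from task ID to a pass/fail boolean; that representation and the function names are hypothetical and are not taken from the results/ directories.

```python
def pass_count(results: dict[str, bool]) -> int:
    """Number of tasks that passed the gate in one condition."""
    return sum(1 for passed in results.values() if passed)


def hypothesis_supported(v7: dict[str, bool], repo_b: dict[str, bool]) -> bool:
    """True if v7 passed strictly more tasks than repo-b.

    This is a plain count comparison, not a significance test.
    Both conditions must report on the same set of tasks.
    """
    if v7.keys() != repo_b.keys():
        raise ValueError("conditions must cover the same task IDs")
    return pass_count(v7) > pass_count(repo_b)
```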

SWE-bench Class Mapping

Task   Class                        GS property tested
CX-1   Multi-file propagation       Bounded + Composable
CX-2   Contract mismatch            Verifiable
CX-3   Missing error case           Executable
CX-4   Config/constant extraction   Bounded
CX-5   Validation rule addition     Verifiable
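
For scripting against the mapping, the table can be encoded as data. The sketch below copies the rows as given; the names CX_TASKS and tasks_for_property are illustrative and are not part of the repository.

```python
# Task -> (SWE-bench structural class, GS properties tested), copied from the table.
CX_TASKS: dict[str, tuple[str, tuple[str, ...]]] = {
    "CX-1": ("Multi-file propagation", ("Bounded", "Composable")),
    "CX-2": ("Contract mismatch", ("Verifiable",)),
    "CX-3": ("Missing error case", ("Executable",)),
    "CX-4": ("Config/constant extraction", ("Bounded",)),
    "CX-5": ("Validation rule addition", ("Verifiable",)),
}


def tasks_for_property(prop: str) -> list[str]:
    """Return the task IDs that test the given GS property, in table order."""
    return [task for task, (_cls, props) in CX_TASKS.items() if prop in props]
```

Grouping by property this way makes it easy to see which GS property a failed task implicates, for example that Bounded is exercised by both CX-1 and CX-4.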

Tasks

See tasks/ directory for individual issue descriptions.

Results

See results/v7/ and results/repo-b/ for patch outputs and pass/fail per task.