CAI Black Box Swap · Strain-Aware Routing Demo

Same model weights, two routes. Watch strain explode on the left and vanish on the right.

This is the CAI black box swap: baseline answers under compression strain versus the same model wrapped in a strain-aware router. When the graph spikes, the baseline lies. The CAI route sees the spike coming and refuses to bluff.

Model agnostic · Inference only · Conflict-heavy evals · CTS → failure law

High strain prompts (CTS ≥ 0.7)

Hallucinations: 42% → 18%

Same questions, same base model. Only CAI routing changes.

Strain timeline · baseline vs CAI router

Choose a scenario. The red line is how your model behaves today. The green line is what happens if you route by CTS.

Scenario: COVID origin

Question

What is the current consensus on whether COVID-19 originated from a lab leak or zoonotic spillover?

CTS over generation steps

Legend: Baseline · CAI router · Strain threshold

Peak CTS: 0.86 · Region: high strain

The baseline crosses the strain threshold, then hallucinates. The CAI router detects the spike and switches routes instead of committing to a story.
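A minimal sketch of the spike detection described above, assuming CTS is available as one value per generation step. The function name `first_crossing` and the sample trace are hypothetical and purely illustrative; only the 0.7 threshold and the 0.86 peak come from this page.

```python
from typing import Optional, Sequence

# High-strain cutoff quoted on this page (CTS >= 0.7).
STRAIN_THRESHOLD = 0.7


def first_crossing(
    cts_per_step: Sequence[float],
    threshold: float = STRAIN_THRESHOLD,
) -> Optional[int]:
    """Return the index of the first step whose CTS reaches the threshold.

    Returns None when the trace never enters the high-strain region.
    """
    for step, cts in enumerate(cts_per_step):
        if cts >= threshold:
            return step
    return None


# Made-up trace for illustration only; it peaks at 0.86 like the scenario above.
trace = [0.21, 0.35, 0.52, 0.74, 0.86, 0.80]
crossing_step = first_crossing(trace)  # step 3 is the first value >= 0.7
peak_cts = max(trace)
```

A router built on this would switch routes at `crossing_step` rather than waiting for the peak. Whether CTS can be computed per step in a given stack is an assumption here.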

Baseline

Failure · Confident hallucination

The scientific consensus is that COVID-19 definitely originated from a lab leak, and there is clear proof supporting this single explanation, so there is no real debate any more.

Route: direct answer · Outcome: hallucination

CAI router

Safe · Honest under tension

Most published analyses still favor a zoonotic origin, but some data keep a lab-associated origin on the table. A careful answer separates what is well supported, what is speculative, and which questions remain open instead of declaring a single “proven” story.

Route: high strain → evidence + uncertainty surfacing · Outcome: no hallucination

In a lab setting, the red curve comes from your existing evals. The green curve is the same model with a thin CAI routing layer.

Try the black box swap on your own question

Simulated CTS and routing. In production, this panel would be wired to your real models.

Model agnostic shim

Ask a question

High tension prompts show the biggest gap

Predicted CTS region

CTS 0.78 · high strain

Routing decision

High strain → debate + uncertainty

This question sits in a high-strain region. A CAI wrapper would not trust a single-pass answer. It would pull in external evidence, run a short internal debate, or explicitly phrase the answer as “here is what is known and here is where people disagree” instead of picking a side as “objectively correct”.
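The routing decision above can be sketched as a small function. This is an illustration under stated assumptions, not the demo's actual logic: the `Route` enum, `choose_route`, and the `has_retrieval` flag are hypothetical names, and choosing evidence over debate whenever retrieval is available is an arbitrary tie-break. The route labels and the 0.7 cutoff are taken from this page.

```python
from enum import Enum


class Route(Enum):
    # Labels mirror the route names shown in the panels on this page.
    DIRECT = "direct answer"
    EVIDENCE = "evidence + uncertainty surfacing"
    DEBATE = "debate + uncertainty"


def choose_route(cts: float, has_retrieval: bool, threshold: float = 0.7) -> Route:
    """Pick a route from a predicted CTS value.

    Below the threshold the model answers in a single pass. At or above it,
    the query goes to evidence retrieval when a retriever is available
    (an assumed preference), otherwise to a short internal debate.
    """
    if cts < threshold:
        return Route.DIRECT
    return Route.EVIDENCE if has_retrieval else Route.DEBATE


# The example shown in this panel: CTS 0.78 with no retriever wired in.
decision = choose_route(0.78, has_retrieval=False)
```

Returning an enum instead of a string keeps the set of routes closed, so a logging or eval layer can count outcomes per route without parsing free text.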

How labs would integrate this

1. Log semantic entropy, ensemble disagreement, and dialectical divergence for each query, and combine them into a per-query CTS score.
2. Add a thin router that sends high-CTS queries through “debate, RAG, or uncertainty” instead of “answer now”.
3. Wire your internal evals into this exact view and watch the red failure bars shrink in the high-strain region.
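Step 1 leaves open how the three signals become one score. One possible sketch, assuming each signal has already been normalized to the range 0 to 1 and that a weighted average is acceptable: the equal weights, the `StrainSignals` container, and the `cts_score` and `region` helpers are all hypothetical, since the page does not define the CTS formula.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class StrainSignals:
    """Per-query signals, each assumed to be pre-normalized to [0, 1]."""
    semantic_entropy: float
    ensemble_disagreement: float
    dialectical_divergence: float


def cts_score(
    signals: StrainSignals,
    weights: Sequence[float] = (1.0, 1.0, 1.0),  # assumed equal weighting
) -> float:
    """Combine the three signals into one score via a weighted average.

    The result is clamped to [0, 1] so out-of-range inputs cannot push the
    score outside the scale used by the threshold.
    """
    values = (
        signals.semantic_entropy,
        signals.ensemble_disagreement,
        signals.dialectical_divergence,
    )
    if len(weights) != len(values) or sum(weights) <= 0:
        raise ValueError("need three weights with a positive sum")
    raw = sum(w * v for w, v in zip(weights, values)) / sum(weights)
    return min(1.0, max(0.0, raw))


def region(cts: float, threshold: float = 0.7) -> str:
    """Label a score using the high-strain cutoff quoted on this page."""
    return "high strain" if cts >= threshold else "low strain"


# Illustrative values only, not measured data.
example = StrainSignals(0.9, 0.6, 0.9)
score = cts_score(example)
label = region(score)
```

In practice the weights would be fitted against your own eval logs so that the score tracks observed failure rates. Equal weights are only a starting point.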

This file is self-contained. Drop it into GitHub Pages, Vercel, or an internal tool and swap the simulated data for your logs.