00 Evidence: proof, not promise

Show your work.

Every clinical-decision-support tool claims accuracy. Few publish their eval portfolio, calibration error, threshold-sensitivity analysis, or model-disagreement rate. Verum runs every change through a 30-case fixture suite and surfaces the numbers below — moveable, inspectable, and per-case visible.

01 Eval portfolio · 30 synthetic cases

Thirty cases.
28 pass, 2 fail.

Each tile is a synthetic prior-auth case with a hand-curated expected verdict. Green = engine matched expected. Red = mismatch (a true regression you'd want to investigate). Click any tile for the case detail.

Cases evaluated: 30 (across 4 procedures × 3 payers)
Pass rate: 93.3% (28/30 match expected verdict)
Expected calibration error: lower is better · target ≤ 0.05
P50 latency: full pipeline · cached < 80ms
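
For readers unfamiliar with the metric, expected calibration error can be sketched as follows. This is a minimal illustration of the standard binned definition, not Verum's implementation: the page does not state a binning scheme, so the 10 equal-width bins and the function name `expected_calibration_error` are assumptions.

```python
from typing import Sequence


def expected_calibration_error(
    confidences: Sequence[float],
    correct: Sequence[bool],
    n_bins: int = 10,  # assumption: the page does not say how bins are chosen
) -> float:
    """Binned ECE: weighted mean of |accuracy - mean confidence| per bin."""
    if len(confidences) != len(correct):
        raise ValueError("confidences and correct must have the same length")
    total = len(confidences)
    if total == 0:
        return 0.0

    # Group each prediction into an equal-width confidence bin.
    bins = [[] for _ in range(n_bins)]
    for conf, ok in zip(confidences, correct):
        index = min(int(conf * n_bins), n_bins - 1)  # conf == 1.0 goes in the last bin
        bins[index].append((conf, ok))

    ece = 0.0
    for members in bins:
        if not members:
            continue
        mean_conf = sum(conf for conf, _ in members) / len(members)
        accuracy = sum(1 for _, ok in members if ok) / len(members)
        ece += (len(members) / total) * abs(accuracy - mean_conf)
    return ece


# Illustrative inputs only (not taken from the 30-case suite):
# four predictions at 0.9 confidence, three of them correct.
print(expected_calibration_error([0.9, 0.9, 0.9, 0.9], [True, True, True, False]))
```

A value at or below 0.05 would meet the stated target: on average, reported confidence and observed accuracy differ by no more than five percentage points.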
02 Policy simulator

Move the threshold.
Watch the portfolio shift.

The conservative-therapy threshold is ≥ 6 weeks by current Medicare LCD. Drag the slider to see how raising or lowering it would shift the 30-case portfolio between approve, pend, and deny outcomes — instantly, in pure code, using the same deterministic gate as the production engine.

conservativeTx threshold: 6.0 weeks (LCD baseline · 6.0; slider range 0 to 12 weeks)
Approve: 17/30 (56.7% of portfolio)
Pend: 5/30 (16.7% of portfolio)
Deny: 8/30 (26.7% of portfolio)
At the LCD baseline of 6 weeks, the portfolio is dominated by approvals — the engine matches expected verdicts on 28/30 cases.
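
A threshold sweep of this kind can be sketched in a few lines. The gate rule below is hypothetical, since the page does not publish the encoded rules: it assumes a case is approved at or above the threshold, pended when it falls short by at most `pend_margin_weeks`, and denied otherwise. The `Case` type, the margin, and the sample cases are illustrative and are not drawn from the 30-case suite.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Dict, Iterable, List


@dataclass(frozen=True)
class Case:
    case_id: str
    conservative_tx_weeks: float  # documented weeks of conservative therapy


def gate(case: Case, threshold_weeks: float, pend_margin_weeks: float = 1.0) -> str:
    """Hypothetical deterministic gate: a pure function of case and threshold."""
    if case.conservative_tx_weeks >= threshold_weeks:
        return "approve"
    if case.conservative_tx_weeks >= threshold_weeks - pend_margin_weeks:
        return "pend"  # assumed rule: near misses are pended, not denied
    return "deny"


def sweep(cases: List[Case], thresholds: Iterable[float]) -> Dict[float, Counter]:
    """Re-run the same gate over the portfolio at each candidate threshold."""
    return {t: Counter(gate(c, t) for c in cases) for t in thresholds}


# Illustrative portfolio, not the real fixture suite.
portfolio = [
    Case("a", 2.0),
    Case("b", 5.5),
    Case("c", 6.0),
    Case("d", 10.0),
]

for threshold, counts in sweep(portfolio, [3.0, 6.0, 9.0]).items():
    print(threshold, dict(counts))
```

Because the gate is a pure function, moving the slider amounts to re-evaluating the same cases with a different argument; no model call is needed.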
03 A/B model disagreement

When models disagree,
the gate decides.

Two LLM providers, same case, same prompt. They sometimes disagree on edge cases. The deterministic gate evaluates both extractions against the same encoded rules and produces one verdict — even when the upstream extractions vary. This is the core robustness claim: probabilistic intake, deterministic verdict.

Scenario: Lumbar MRI · 5.5 weeks PT · ODI 32 · no red flags
Ground truth: pend

Groq · gpt-oss-120b: DENY · conf 0.74 · ✗ does not match ground truth
Conservative therapy below the 6-week threshold; recommend additional documentation.

Anthropic · claude-sonnet-4.6: PEND · conf 0.68 · ✓ matches ground truth
Borderline conservative-therapy duration. Weight ODI ≥ 30 favorably; pend for additional records.

Disagreement detected — the deterministic gate breaks the tie.
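
One way to picture the tie-break is the sketch below. It rests on several assumptions the page does not confirm: that both providers extract the same 5.5 weeks of therapy, that the gate ignores each model's proposed verdict and decides from extracted facts alone, and that a near miss within one week of the threshold is pended. The `Extraction` type, `resolve`, and the fallback to "pend" when the two gate results differ are all hypothetical.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Extraction:
    provider: str
    conservative_tx_weeks: float  # fact extracted by the model
    proposed_verdict: str         # the model's own opinion, ignored by the gate
    confidence: float


def gate(weeks: float, threshold_weeks: float = 6.0, pend_margin_weeks: float = 1.0) -> str:
    """Hypothetical rule set: decide from extracted facts only."""
    if weeks >= threshold_weeks:
        return "approve"
    if weeks >= threshold_weeks - pend_margin_weeks:
        return "pend"
    return "deny"


def resolve(a: Extraction, b: Extraction) -> Tuple[str, bool]:
    """Return (final verdict, whether the models' proposed verdicts disagreed)."""
    models_disagree = a.proposed_verdict != b.proposed_verdict
    verdict_a = gate(a.conservative_tx_weeks)
    verdict_b = gate(b.conservative_tx_weeks)
    # Assumption: if the extracted facts themselves lead to different
    # gate results, fall back to "pend" so a human reviews the case.
    final = verdict_a if verdict_a == verdict_b else "pend"
    return final, models_disagree


groq = Extraction("Groq · gpt-oss-120b", 5.5, "deny", 0.74)
anthropic = Extraction("Anthropic · claude-sonnet-4.6", 5.5, "pend", 0.68)
print(resolve(groq, anthropic))
```

Under these assumptions the scenario above resolves to a single "pend" verdict with the disagreement flagged, regardless of which provider's opinion was closer to ground truth.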
04 Coverage matrix

What's encoded.
What's not.

Two ontology cells (lumbar Medicare 2026, cervical Medicare 2026) are flagship REAL: encoded verbatim from CMS LCD L34220 and Aetna CPB 0236, cross-validated against Carelon Imaging of the Spine. The other ten cells are synthetic-representative, flagged in their sourceDocument field. No hidden mocks, no overstated coverage.

Policy version · 2026 · 12 cells

Procedure        Medicare             Commercial      Medicaid
Lumbar MRI       REAL · LCD L34220    SYN · marked    SYN · marked
Cervical MRI     REAL · LCD L34220    SYN · marked    SYN · marked
Brain MRI        SYN · marked         SYN · marked    SYN · marked
CT Abd/Pelvis    SYN · marked         SYN · marked    SYN · marked

Policy version · 2024 · 12 cells

Procedure        Medicare        Commercial      Medicaid
Lumbar MRI       SYN · marked    SYN · marked    SYN · marked
Cervical MRI     SYN · marked    SYN · marked    SYN · marked
Brain MRI        SYN · marked    SYN · marked    SYN · marked
CT Abd/Pelvis    SYN · marked    SYN · marked    SYN · marked

Flagship REAL — verbatim from cited policy
Synthetic — flagged in sourceDocument
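
The matrix above could be represented and audited along these lines. This is a sketch under stated assumptions: the page names a sourceDocument field but not its format, so the `SYNTHETIC` prefix used to flag synthetic cells, the `Cell` type, and `build_matrix` are hypothetical.

```python
from dataclasses import dataclass
from itertools import product
from typing import Dict, List, Tuple

PROCEDURES = ["Lumbar MRI", "Cervical MRI", "Brain MRI", "CT Abd/Pelvis"]
PAYERS = ["Medicare", "Commercial", "Medicaid"]

# The two flagship REAL cells listed in the 2026 matrix.
REAL_2026: Dict[Tuple[str, str], str] = {
    ("Lumbar MRI", "Medicare"): "LCD L34220",
    ("Cervical MRI", "Medicare"): "LCD L34220",
}


@dataclass(frozen=True)
class Cell:
    procedure: str
    payer: str
    policy_version: int
    source_document: str  # stands in for the page's sourceDocument field

    @property
    def is_synthetic(self) -> bool:
        # Assumption: synthetic cells carry a "SYNTHETIC" prefix in this field.
        return self.source_document.startswith("SYNTHETIC")


def build_matrix(policy_version: int) -> List[Cell]:
    """Build all procedure × payer cells for one policy version."""
    real = REAL_2026 if policy_version == 2026 else {}
    return [
        Cell(proc, payer, policy_version,
             real.get((proc, payer), "SYNTHETIC: representative"))
        for proc, payer in product(PROCEDURES, PAYERS)
    ]


matrix = build_matrix(2026)
real_cells = [c for c in matrix if not c.is_synthetic]
print(len(matrix), len(real_cells))  # 12 cells, 2 of them REAL
```

Keeping the provenance flag on each cell means a coverage audit is a simple filter over the data, so the claimed split between real and synthetic cells can be checked mechanically.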