00 Evidence · proof, not promise

Show your work.

Every clinical-decision-support tool claims accuracy. Few publish their eval portfolio, calibration error, threshold-sensitivity analysis, or model-disagreement rate. Verum runs every change through a 30-case fixture suite and publishes the numbers below, with thresholds you can move and results you can inspect case by case.

01 Eval portfolio · 30 synthetic cases

Thirty cases.
28 pass, 2 fail.

Each tile is a synthetic prior-auth case with a hand-curated expected verdict. Green = engine matched expected. Red = mismatch (a true regression you'd want to investigate). Click any tile for the case detail.

Cases evaluated

30

across 4 procedures × 3 payers

Pass rate

93.3%

28/30 match expected verdict

Expected calibration error

0.00

lower is better · target ≤ 0.05

P50 latency

0.0s

full pipeline · cached < 80ms
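As a rough illustration of how the pass rate and expected calibration error above can be computed, here is a minimal sketch. It assumes each case result carries an expected verdict, a predicted verdict, and a confidence score; the `CaseResult` type, the field names, and the equal-width binning are assumptions made for this example, not Verum's actual implementation. The data is toy data.

```python
from dataclasses import dataclass


@dataclass
class CaseResult:
    expected: str      # hand-curated verdict: "approve", "pend", or "deny"
    predicted: str     # verdict produced by the engine
    confidence: float  # engine confidence in its own verdict, in [0, 1]


def pass_rate(results):
    """Fraction of cases where the predicted verdict matches the expected one."""
    return sum(r.predicted == r.expected for r in results) / len(results)


def expected_calibration_error(results, n_bins=10):
    """Equal-width-bin ECE: size-weighted mean of |accuracy - mean confidence|."""
    bins = [[] for _ in range(n_bins)]
    for r in results:
        # Clamp so that confidence == 1.0 lands in the last bin.
        idx = min(int(r.confidence * n_bins), n_bins - 1)
        bins[idx].append(r)

    ece = 0.0
    for members in bins:
        if not members:
            continue
        accuracy = sum(r.predicted == r.expected for r in members) / len(members)
        mean_conf = sum(r.confidence for r in members) / len(members)
        ece += (len(members) / len(results)) * abs(accuracy - mean_conf)
    return ece


# Toy data only: 28 matches and 2 mismatches, all at a made-up confidence of 0.9.
results = (
    [CaseResult("approve", "approve", 0.9)] * 28
    + [CaseResult("pend", "deny", 0.9)] * 2
)

print(f"pass rate = {pass_rate(results):.1%}")
print(f"ECE       = {expected_calibration_error(results):.3f}")
```

With these toy numbers every case falls in one bin, so the ECE is simply the gap between 28/30 accuracy and the 0.9 confidence. A real portfolio would spread confidences across bins.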

02 Policy simulator

Move the threshold.
Watch the portfolio shift.

Under the current Medicare LCD, the conservative-therapy threshold is at least 6 weeks. Drag the slider to see how raising or lowering it would shift the 30-case portfolio between approve, pend, and deny outcomes. The recalculation is instant and runs in pure code, using the same deterministic gate as the production engine.

conservativeTx threshold

6.0 weeks

LCD baseline · 6.0

Slider max · 12 weeks

Approve

17 / 30

56.7% of portfolio

Pend

5 / 30

16.7% of portfolio

Deny

8 / 30

26.7% of portfolio

At the LCD baseline of 6 weeks, approvals dominate the portfolio (17 of 30 cases), and the engine matches the expected verdict on 28 of 30 cases.
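To make the threshold sweep concrete, here is a minimal sketch of re-running a deterministic gate over a fixed set of cases at several thresholds. The rule inside `gate`, the one-week pend margin, and the toy case data are all assumptions made for illustration; the production rules are not reproduced here.

```python
from collections import Counter


def gate(weeks_conservative_tx, threshold_weeks, pend_margin_weeks=1.0):
    """Hypothetical deterministic rule, NOT the production logic.

    At or above the threshold: approve. Within the (assumed) margin
    below the threshold: pend. Otherwise: deny.
    """
    if weeks_conservative_tx >= threshold_weeks:
        return "approve"
    if weeks_conservative_tx >= threshold_weeks - pend_margin_weeks:
        return "pend"
    return "deny"


def portfolio(cases, threshold_weeks):
    """Count verdicts across all cases at a given threshold."""
    counts = Counter(gate(weeks, threshold_weeks) for weeks in cases)
    return {verdict: counts.get(verdict, 0) for verdict in ("approve", "pend", "deny")}


# Made-up weeks of conservative therapy for six toy cases.
toy_cases = [2.0, 4.5, 5.5, 6.0, 8.0, 12.0]

for threshold in (4.0, 6.0, 8.0):
    print(f"threshold {threshold:>4} weeks -> {portfolio(toy_cases, threshold)}")
```

Because the gate is a pure function of the case data and the threshold, the same inputs always give the same counts, which is what makes an instant recalculation on every slider move possible.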

03 A/B model disagreement

When models disagree,
the gate decides.

Two LLM providers, same case, same prompt. They sometimes disagree on edge cases. The deterministic gate evaluates both extractions against the same encoded rules and produces one verdict — even when the upstream extractions vary. This is the core robustness claim: probabilistic intake, deterministic verdict.

Scenario

Lumbar MRI · 5.5 weeks PT · ODI 32 · no red flags

ground truth · pend

Groq · gpt-oss-120b

DENY

conf 0.74

✗ off

“Conservative therapy below the 6-week threshold; recommend additional documentation.”

Anthropic · claude-sonnet-4.6

PEND

conf 0.68

✓ matches ground

“Borderline conservative-therapy duration. Weight ODI ≥ 30 favorably; pend for additional records.”

Disagreement detected — the deterministic gate breaks the tie.
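One way to picture "probabilistic intake, deterministic verdict" is sketched below. It assumes each provider returns structured fields plus its own suggested verdict, and that the gate reads only the fields. The `Extraction` type, the field names, and the rules inside `gate` are hypothetical, chosen to mirror the scenario above rather than taken from the production engine.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Extraction:
    provider: str
    weeks_pt: float         # weeks of physical therapy documented
    odi: int                # Oswestry Disability Index score
    red_flags: bool
    suggested_verdict: str  # the model's own opinion; advisory only


def gate(x, threshold_weeks=6.0):
    """Hypothetical rules. Reads extracted fields only, never suggested_verdict."""
    if x.red_flags or x.weeks_pt >= threshold_weeks:
        return "approve"
    # Assumed borderline rule: within one week of the threshold and ODI >= 30.
    if x.weeks_pt >= threshold_weeks - 1.0 and x.odi >= 30:
        return "pend"
    return "deny"


# Same case, two providers: identical extracted fields, different opinions.
a = Extraction("provider_a", weeks_pt=5.5, odi=32, red_flags=False, suggested_verdict="deny")
b = Extraction("provider_b", weeks_pt=5.5, odi=32, red_flags=False, suggested_verdict="pend")

models_disagree = a.suggested_verdict != b.suggested_verdict
gate_verdicts = {gate(a), gate(b)}

print("models disagree:", models_disagree)
print("gate verdicts:  ", gate_verdicts)
```

In this sketch the two models disagree, yet the gate returns a single verdict because both extractions carry the same fields. If the extracted fields themselves differed, `gate_verdicts` would contain two values, and that case would need its own handling, which this example does not attempt to model.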

04 Coverage matrix

What's encoded.
What's not.

Two ontology cells (lumbar Medicare 2026, cervical Medicare 2026) are flagship REAL: encoded verbatim from CMS LCD L34220 and Aetna CPB 0236, and cross-validated against Carelon Imaging of the Spine. The other ten cells in the 2026 policy version are synthetic-representative and flagged as such in their sourceDocument field. No hidden mocks, no overstated coverage.

Policy version · 2026 · 12 cells

Procedure	Medicare	Commercial	Medicaid
Lumbar MRI	REAL · LCD L34220	SYN · marked	SYN · marked
Cervical MRI	REAL · LCD L34220	SYN · marked	SYN · marked
Brain MRI	SYN · marked	SYN · marked	SYN · marked
CT Abd/Pelvis	SYN · marked	SYN · marked	SYN · marked

Policy version · 2024 · 12 cells

Procedure	Medicare	Commercial	Medicaid
Lumbar MRI	SYN · marked	SYN · marked	SYN · marked
Cervical MRI	SYN · marked	SYN · marked	SYN · marked
Brain MRI	SYN · marked	SYN · marked	SYN · marked
CT Abd/Pelvis	SYN · marked	SYN · marked	SYN · marked

Flagship REAL — verbatim from cited policy

Synthetic — flagged in sourceDocument
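The 2026 matrix above can be represented as a flat list of cells, each carrying a sourceDocument field that says where it came from. The sketch below assumes a simple string convention (a `SYNTHETIC` prefix for synthetic cells); that convention, the other key names, and the helper functions are hypothetical and only illustrate how such a flag makes coverage auditable.

```python
PROCEDURES = ["Lumbar MRI", "Cervical MRI", "Brain MRI", "CT Abd/Pelvis"]
PAYERS = ["Medicare", "Commercial", "Medicaid"]

# The two flagship REAL cells in the 2026 policy version, per the table above.
REAL_2026 = {("Lumbar MRI", "Medicare"), ("Cervical MRI", "Medicare")}


def build_cells(policy_version, real_cells):
    """Build one cell per procedure/payer pair, flagging provenance."""
    cells = []
    for procedure in PROCEDURES:
        for payer in PAYERS:
            is_real = (procedure, payer) in real_cells
            cells.append({
                "policyVersion": policy_version,
                "procedure": procedure,
                "payer": payer,
                # Assumed flag format; the real field contents may differ.
                "sourceDocument": "LCD L34220" if is_real else "SYNTHETIC (representative)",
            })
    return cells


def is_synthetic(cell):
    return cell["sourceDocument"].startswith("SYNTHETIC")


cells_2026 = build_cells(2026, REAL_2026)
real = [c for c in cells_2026 if not is_synthetic(c)]
synthetic = [c for c in cells_2026 if is_synthetic(c)]

print(f"{len(cells_2026)} cells: {len(real)} real, {len(synthetic)} synthetic")
for cell in real:
    print("REAL:", cell["procedure"], "/", cell["payer"], "/", cell["sourceDocument"])
```

Keeping the flag on the cell itself means any report or test can count real versus synthetic coverage without consulting a separate list.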

—Run it yourself
