AI quality system: combined views

Item status: Live · In progress · Planned · Blocked

Edge axes: Wired · Flowing · Conforming · Complete

Systems

Scout

Exploratory testing agent

AI: 3 (2 live, 1 in progress)

Features: 12 (5 live, 3 in progress, 4 planned)

Config: 8 (all live)

Probes Corpus Coach with persona bots, raises flags. Live in prod.

Corpus Coach

Subject under test (RAG)

AI: 4 (3 live, 1 planned)

Features: 5 (4 live, 1 in progress)

Config: 6 (all live)

Finance-domain mock RAG. Stand-in target for Scout + signal source for Crystal Ball.

Crystal Ball

Dashboard + threshold gate

AI: 1 (optional, planned)

Features: 8 (4 live, 2 in progress, 2 planned)

Config: 5 (3 live, 1 in progress, 1 planned)

Deterministic spine. Optional viewer: customers may BYO dashboard.

Engine (Prism)

Core value: scoring, DNA, narrative

AI: 3 (all live)

Features: 6 (3 live, 1 in progress, 2 planned)

Config: 5 (3 live; 2 in progress, r1)

Scoring + System DNA (incl. arch_pattern r1) + narrative + audit. The product.

Edges: data flow + adapter health

Scout → Corpus Coach

Probe channel. Scout sends crafted queries.

Wired · Flowing · Conforming 99% · Complete 5/5

Corpus Coach → Crystal Ball

Eval signal feed. Scores per attribute.

Wired · Flowing · Conforming 87% · Complete 8/12

Scout → Crystal Ball

Flag dispatch. Scout findings.

Wired · Flowing · Conforming 100% · Complete 4/4

Crystal Ball ↔ Engine

Breach → Engine; narrative → CB.

Wired · Flowing · Conforming 92% · Complete 6/8

External adapters: engine outbound, customer integrations

Engine → Customer dashboard

BYO-dashboard adapter. JSON output.

Wired · Flowing · Conforming 100% · Complete 3/5

Engine → Slack / Teams

Persona narrative push to channels.

Wired (planned) · Flowing · Conforming n/a · Complete 0/3

Crystal Ball → CSV / Data export

Audit trail extraction.

Wired (planned) · Flowing · Conforming n/a · Complete 0/2

Corpus Coach ← External corpora

Customer corpus ingest.

Wired (partial) · Flowing: broken · Conforming n/a · Complete 1/4

Detail

Click any tile or edge above

Drilldown shows items by category (AI / Features / Config) or edge contract details.

Reading the view. Tiles = systems. Pills show item counts per category and aggregate health. Edges show adapter health on four axes: Wired, Flowing, Conforming, Complete. Plumbing analogy: pipe there / water flowing / right water / enough water.
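The four axes can be sketched as a small record. This is only an illustrative sketch, not the dashboard's actual schema: the `EdgeHealth` class, its field names, and the reading of Conforming as a share and Complete as a have/want count are assumptions. The sample values come from the Corpus Coach → Crystal Ball edge above.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class EdgeHealth:
    """Hypothetical record for one edge; names are illustrative only."""
    name: str
    wired: bool                  # pipe there
    flowing: bool                # water flowing
    conforming: Optional[float]  # right water; None when no value is shown
    complete_have: int           # enough water: items delivered ...
    complete_want: int           # ... out of items expected

    def completeness(self) -> float:
        # Guard against an edge with nothing expected yet.
        if self.complete_want == 0:
            return 0.0
        return self.complete_have / self.complete_want

    def summary(self) -> str:
        conf = "n/a" if self.conforming is None else f"{self.conforming:.0%}"
        return (f"{self.name}: wired={self.wired} flowing={self.flowing} "
                f"conforming={conf} "
                f"complete={self.complete_have}/{self.complete_want}")


edge = EdgeHealth("Corpus Coach → Crystal Ball", True, True, 0.87, 8, 12)
print(edge.summary())
```

Keeping the four axes as separate fields, rather than folding them into one score, mirrors the plumbing analogy: a pipe can exist and carry water while still delivering too little of it.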

Quality bands: Good ≥ 0.85 · Watch 0.70 – 0.85 · Poor < 0.70 · Unknown N/A
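The bands can be expressed as a small lookup. A minimal sketch, assuming that a score of exactly 0.85 counts as Good (the legend's "≥ 0.85") and that an unmeasured score is represented as `None`; the function name `quality_band` is hypothetical.

```python
from typing import Optional


def quality_band(score: Optional[float]) -> str:
    """Map a 0..1 quality score to the legend's band names."""
    if score is None:        # nothing measured: Unknown / N/A
        return "Unknown"
    if score >= 0.85:
        return "Good"
    if score >= 0.70:        # 0.70 <= score < 0.85
        return "Watch"
    return "Poor"


for label, score in [("Engine composite", 0.86),
                     ("Scout composite", 0.81),
                     ("context_precision", 0.69),
                     ("unmeasured", None)]:
    print(f"{label}: {quality_band(score)}")
```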

Edge quality: Fidelity · Freshness · Calibration · Information

Systems: quality of measured set

Scout

Exploratory testing agent

Composite: 0.81

avg of 8 measured quality signals

AI quality: 0.81

Feature quality: 0.88

Config quality: 0.85

Corpus Coach

Subject under test (RAG)

Composite: 0.79

avg of 4 measured RAG attributes; material set TBD

AI quality: 0.79

Feature quality: 0.91

Config quality: 0.87

Crystal Ball

Dashboard + threshold gate

Composite: 0.84

avg of 5 product + governance metrics

AI quality: n/a

Feature quality: 0.87

Config quality: 0.78

Engine (Prism)

Core value: quality of audit

Composite: 0.86

meta: quality of the audit Engine produces

AI quality: 0.86

Feature quality: 0.90

Config quality: 0.75

Coverage Gap Audit: Corpus Coach

System DNA: supervised · advisory · external_regulated_consumer · financial_services · personal | architecture_pattern: rag (sibling field, r1 pending deploy)

Measured today (4)

0.91 faithfulness
0.74 retrieval_relevancy: threshold 0.80
0.69 context_precision: threshold 0.75
0.83 response_groundedness

Required, not yet wired (5+)

missing context_recall
missing answer_relevancy
missing fairness_subgroup
partial bias_demographic: instrument exists, not running
missing prompt_injection_resistance

Unknown: pending r1 derivation

tbd Calibration thresholds per DNA combo
tbd Architecture-specific attr promotions
tbd Regulatory-specific extensions (MAS FEAT)

Honesty. The "Required" set above is an informed estimate. A defensible material attribute set requires the System DNA + architecture_pattern derivation (r1 decision, pending deploy a3ce43f) plus the action_type × architecture_pattern cell matrix exercise. Until both land, the gap register can be sketched but not signed off. The gap register is the artefact the Coverage Gap Audit produces: today it is compiled manually; once r1 lands, it becomes deterministic.
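A sketched gap register for the panel above might look like the following. This is an illustration under stated assumptions, not the Engine's implementation: the names `MEASURED`, `NOT_WIRED`, `threshold_breaches` and `gap_register` are hypothetical, thresholds are included only where the panel shows one, and the "Required" entries are the informed estimate described above rather than a derived set.

```python
from typing import Dict, List, Optional, Tuple

# (score, threshold) per measured attribute; threshold is None where the
# panel shows no threshold.
MEASURED: Dict[str, Tuple[float, Optional[float]]] = {
    "faithfulness": (0.91, None),
    "retrieval_relevancy": (0.74, 0.80),
    "context_precision": (0.69, 0.75),
    "response_groundedness": (0.83, None),
}

# Estimated required attributes that are not yet producing a signal.
NOT_WIRED: Dict[str, str] = {
    "context_recall": "missing",
    "answer_relevancy": "missing",
    "fairness_subgroup": "missing",
    "bias_demographic": "partial",  # instrument exists, not running
    "prompt_injection_resistance": "missing",
}


def threshold_breaches(
        measured: Dict[str, Tuple[float, Optional[float]]]) -> List[str]:
    """Measured attributes scoring below their known threshold."""
    return [name for name, (score, thr) in measured.items()
            if thr is not None and score < thr]


def gap_register(measured: Dict[str, Tuple[float, Optional[float]]],
                 not_wired: Dict[str, str]) -> List[Tuple[str, str]]:
    """One (attribute, status) row per measured or unwired attribute."""
    rows = [(name, "measured") for name in measured]
    rows.extend(not_wired.items())
    return rows


for name, status in gap_register(MEASURED, NOT_WIRED):
    print(f"{status:<9} {name}")
print("below threshold:", threshold_breaches(MEASURED))
```

Because the register is built from two plain mappings, swapping the estimated `NOT_WIRED` set for a derived one would not change the reporting logic.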

Edges: quality of data flowing

Fidelity (signal arrives intact), Freshness (data arrives while still actionable), Calibration (thresholds informed by signal scale), Information (edge carries enough to act on)

Scout → Corpus Coach

Probe channel.

Fidelity 100% · Freshness <200ms · Calibration n/a · Information 5/5

Corpus Coach → Crystal Ball

Eval signal feed.

Fidelity 87% · Freshness <1s · Calibration partial · Information 4/12+

Scout → Crystal Ball

Flag dispatch.

Fidelity 100% · Freshness <500ms · Calibration n/a · Information 4/4

Crystal Ball ↔ Engine

Breach ↔ narrative.

Fidelity 92% · Freshness <2s · Calibration partial · Information 6/8

Detail

Click any tile or edge above

Drilldown: per-attribute quality, threshold, source, trend.

Reading the view. Composite scores are averaged over the measured set only; they do not claim coverage of the material set. The Coverage Gap Audit panel makes that gap explicit. The Engine quality lens is meta: it rates how good the audit Engine produces is, not the system being audited.

Why the gap matters. A high composite score over a small measured set is the canonical "looks fine, isn't" failure mode. Scoring 0.91 on faithfulness while not measuring fairness_subgroup is a reassuring number that hides the regulatory exposure. The view forces both numbers into the same eyeline.
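Putting both numbers in the same eyeline can be sketched in a few lines. The scores are the four Corpus Coach attributes measured today. Treating the "5+" unwired attributes as exactly five is an assumption that makes the coverage figure an upper bound. The variable names are illustrative.

```python
from statistics import mean

# Corpus Coach attributes measured today: faithfulness, retrieval_relevancy,
# context_precision, response_groundedness.
measured_scores = [0.91, 0.74, 0.69, 0.83]

# The panel lists "5+" required attributes not yet wired; using 5 makes the
# coverage below an upper bound, not an exact figure.
required_unwired_min = 5

composite = round(mean(measured_scores), 2)
coverage_upper = len(measured_scores) / (
    len(measured_scores) + required_unwired_min)

print(f"composite {composite:.2f} over {len(measured_scores)} measured; "
      f"coverage at most {coverage_upper:.0%} of the estimated material set")
```

Reporting the composite without the coverage bound would reproduce the "looks fine, isn't" failure mode described above.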

Quality band per phase + feature. Good: meeting target · Watch: below target / known gap · Poor: failing · Unknown / planned

User journey: five phases

Click a phase to drill into how three personas (Compliance officer / CTO / Business owner) experience it, what features support it, and what the non-AI version looks like.

Good

1

Arrive

"Something needs my attention, or I'm here for routine review."

User does

Opens alert / email / Slack
Logs in to dashboard
Joins scheduled review

What they see

Notification with one-line summary
Login + workspace picker
Welcome with last-visited state

Outcome: Situational awareness. Knows context.

Watch

2

Scan

"Where do I focus today across my portfolio?"

User does

Reviews portfolio summary
Filters by risk / project / status
Sorts by trend or breach severity

What they see

All projects with quality bands
Trend lines per key dimension
Latest run + breach indicators

Outcome: Knows where to dig.

Watch

3

Drill

"What's the state of one project I care about?"

User does

Opens a project
Reviews quality dimensions
Reads recent runs + breaches

What they see

Project quality scorecard
Per-dimension trend chart
Open breaches with recency

Outcome: Understands one system's posture.

Good

4

Diagnose

"What's wrong, what does it mean, and why does it matter to me?"

User does

Opens a finding card
Reads narrative in own language
Inspects evidence + regulatory mapping

What they see

Plain-language explanation
Sample failures + raw signal
Linked regulatory clauses

Outcome: Decision-grade understanding.

Watch

5

Act

"Closes the loop: what gets done, by whom, with what evidence."

User does

Requests new rule / escalates
Assigns action / mutes / accepts
Jumps to external eval tool
Exports evidence pack for audit

What they see

Action menu + status
External tool deep-link
Audit-pack download
Annotation history

Outcome: Loop closed. Trail recorded.

Phase detail

Click a phase above

Personas, supporting features, non-AI baseline, quality questions answered.

Reading the view. Business-user lens. Engines, rule libraries, narrative LLMs, and RAG infrastructure are hidden; the user experiences capabilities. AI is the delivery mechanism for some capabilities, not the capability itself. Each phase's "without AI" note shows what the deterministic baseline looks like.