Crystal Ball ingests evaluation signals from the tools your data scientists already use (Langfuse, RAGAS, DeepEval, PromptFoo, Phoenix) and translates them into per-persona governance views. On threshold breach, it pulls a business-grade narrative from AIQIDE so non-technical stakeholders can answer the "so what" question without retraining.
Logo concept aside, the operating principle is this: eval dashboards built for engineers don't translate for the people who sign off on AI deployments. Crystal Ball renders the same data through the lens that fits the role asking the question.
The market doesn't lack eval tools. It lacks a layer that consolidates their output and re-presents it to the people who need to consume it. The EvalSourceAdapter contract lets you connect what you already run, with no rip-and-replace.
When a metric goes red, raw numbers don't move executives. Crystal Ball calls AIQIDE on breach to produce a persona-specific narrative: what happened, what it means commercially, what it means under MAS FEAT/TRM, and what to do.
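The breach-to-narrative flow could look roughly like the sketch below. The source does not describe AIQIDE's interface, so `narrate` is a stand-in callable, and `on_score`, the request keys, and the "breach means the value fell below a minimum threshold" rule are all assumptions made for illustration.

```python
from typing import Callable, Optional

# The four parts of the narrative named in the text above.
NARRATIVE_SECTIONS = (
    "what happened",
    "what it means commercially",
    "what it means under MAS FEAT/TRM",
    "what to do",
)


def on_score(
    metric: str,
    value: float,
    threshold: float,
    persona: str,
    narrate: Callable[[dict], str],
) -> Optional[str]:
    """Return a narrative only when the score breaches its threshold.

    Assumption: a breach is a value below a minimum threshold.
    `narrate` is a hypothetical stand-in for the AIQIDE call.
    """
    if value >= threshold:
        return None  # No breach, so no narrative is requested.
    request = {
        "metric": metric,
        "value": value,
        "threshold": threshold,
        "persona": persona,
        "sections": NARRATIVE_SECTIONS,
    }
    return narrate(request)
```

Passing the narrator in as a callable keeps the threshold check testable without a live narrative service.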
Autonomous FSI advisory. High severity, external exposure, MAS FEAT + TRM in scope. Used as the anchor system in the current pilot.
Autonomous AI exploratory testing agent. Medium severity, internal exposure. Findings feed Crystal Ball as another eval source via the ScoutAdapter.
Assistive RAG over the MAS regulatory corpus. Low severity, internal, educational. Same framework, different severity calibration, different regulatory path.
Eval tools ship findings in their own shape. Adapters translate each tool's output into Crystal Ball's two canonical tables, then go quiet. Adding a new tool is one adapter, no schema work.
Live in prod. Pulls trace + score data. Confirmed running in pilot client production.
Push-based. Findings → risk_assessments + synthetic eval_results trend points. Works against any system Scout can probe.
Planned. Independent scheduling against Corpus Coach as reference target.
Planned. DeepEval G-Eval calibration in progress on Scout side.
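As a rough illustration of the adapter idea, the sketch below maps one tool's records into the two canonical tables. Only the names `EvalSourceAdapter`, `risk_assessments` and `eval_results` come from the text above; the method signatures, the row fields, and `ExampleToolAdapter` with its input record shape are hypothetical.

```python
from dataclasses import dataclass
from typing import Any, Iterable, Protocol


@dataclass(frozen=True)
class EvalResult:
    # Hypothetical row shape for the eval_results table.
    system_id: str
    metric: str
    score: float
    source: str


@dataclass(frozen=True)
class RiskAssessment:
    # Hypothetical row shape for the risk_assessments table.
    system_id: str
    finding: str
    severity: str
    source: str


class EvalSourceAdapter(Protocol):
    # Assumed contract: one translation method per canonical table.
    def eval_results(self, raw: Iterable[dict[str, Any]]) -> list[EvalResult]: ...
    def risk_assessments(self, raw: Iterable[dict[str, Any]]) -> list[RiskAssessment]: ...


class ExampleToolAdapter:
    """Adapter for an imaginary tool emitting 'sys', 'name', 'value', 'note' keys."""

    source = "example-tool"

    def eval_results(self, raw):
        # Records carrying a numeric value become eval_results rows.
        return [
            EvalResult(r["sys"], r["name"], float(r["value"]), self.source)
            for r in raw
            if "value" in r
        ]

    def risk_assessments(self, raw):
        # Records carrying a free-text note become risk_assessments rows.
        return [
            RiskAssessment(r["sys"], r["note"], r.get("severity", "medium"), self.source)
            for r in raw
            if "note" in r
        ]
```

Under this sketch, supporting another tool means writing one more class with the same two methods; the canonical row types stay untouched.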
Each system in Crystal Ball carries a DNA tuple (agency_level, action_type, exposure_surface, domain, data_sensitivity) plus an architecture_pattern sibling field. AIQIDE uses both to scope which quality attributes are material and which thresholds apply. The Quality Lead view surfaces the DNA so a stakeholder can see the system's classification at a glance.
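One way to picture the DNA tuple and its sibling field is sketched below. The six field names are taken from the text; the class names, the example values, and the scoping rule in `material_attributes` are invented for illustration and do not describe how AIQIDE actually decides materiality.

```python
from dataclasses import dataclass
from typing import NamedTuple


class SystemDNA(NamedTuple):
    agency_level: str
    action_type: str
    exposure_surface: str
    domain: str
    data_sensitivity: str


@dataclass(frozen=True)
class SystemProfile:
    name: str
    dna: SystemDNA
    architecture_pattern: str  # Sibling field, kept outside the tuple.


def material_attributes(profile: SystemProfile) -> set[str]:
    """Hypothetical scoping rule: choose quality attributes from the classification."""
    attrs = {"accuracy"}
    if profile.dna.exposure_surface == "external":
        attrs.add("fairness")
    if profile.architecture_pattern == "rag":
        attrs.add("groundedness")
    return attrs


# Illustrative placeholder values, not real classifications.
example = SystemProfile(
    name="example-advisor",
    dna=SystemDNA("autonomous", "advisory", "external", "fsi", "high"),
    architecture_pattern="agentic",
)
```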
Same risk_assessments, same eval_results, four different shapes. Each persona asks a different question; rendering is fitted to the question, not the data shape.
Question: Are we OK to ship and scale?
Question: Can we defend this to the regulator?
Question: What needs my attention this sprint?
Question: Can I release this build?
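A minimal sketch of "same data, four shapes": one list of scores rendered by four functions, keyed by the questions above. The `Score` type, the threshold semantics, and every rendering rule here are assumptions; the metric names and values in the usage note are illustrative only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Score:
    metric: str
    value: float
    threshold: float  # Hypothetical: minimum acceptable value.

    @property
    def breached(self) -> bool:
        return self.value < self.threshold


def ship_and_scale(scores: list[Score]) -> str:
    # One-line status for a go/no-go decision.
    n = sum(s.breached for s in scores)
    return "green" if n == 0 else f"red: {n} of {len(scores)} metrics below threshold"


def regulator_defence(scores: list[Score]) -> list[str]:
    # Evidence lines: every metric with its value and threshold.
    return [f"{s.metric}: {s.value:.2f} against threshold {s.threshold:.2f}" for s in scores]


def sprint_attention(scores: list[Score]) -> list[str]:
    # Only the metrics that need work, sorted by name.
    return sorted(s.metric for s in scores if s.breached)


def release_gate(scores: list[Score]) -> bool:
    # A yes/no answer for the build.
    return not any(s.breached for s in scores)


VIEWS = {
    "Are we OK to ship and scale?": ship_and_scale,
    "Can we defend this to the regulator?": regulator_defence,
    "What needs my attention this sprint?": sprint_attention,
    "Can I release this build?": release_gate,
}
```

For example, given `[Score("faithfulness", 0.72, 0.80), Score("answer_relevancy", 0.91, 0.85)]`, the release view returns `False` while the sprint view returns `["faithfulness"]`: the same rows, shaped to the question being asked.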