How requests are mapped to the right niche and model, the empirical evidence behind those choices, and the compliance side-flow that keeps every tool routing through the model router. The audit is the product.
Snapshot 2026-05-21 18:03 UTC · auto-refreshed
Head-to-head results that drive niche scores: actual model outputs verified by TAAS, not vendor claims.
Code-refactor routing: which model produces valid Python · coding / code_refactor · 2026-05-20
| Model | Result | Detail |
|---|---|---|
| claude-sonnet-4-6 | PASS | 2338 tokens, end_turn, valid Python |
| meta-llama/Llama-3.3-70B-Instruct-Turbo | FAIL | invalid Python (unterminated string) |
| gpt-4o | FAIL | invalid Python (unterminated string) |
| claude-haiku-4-5 | FAIL | invalid Python (unterminated string) |
→ Re-scored the coding niche: claude-sonnet-4-6 ranked #1; the three failing models demoted. code_refactor added as a first-class scored niche.
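The pass/fail column above depends on whether each model's output parses as Python. A minimal sketch of such a check, assuming the verifier only needs a syntax-level test (the page does not say how TAAS validates output; `check_python` and `rank_models` are hypothetical names used for illustration):

```python
import ast


def check_python(source: str) -> tuple[bool, str]:
    """Return (is_valid, detail) for a candidate Python source string.

    Assumption: "valid Python" means the text parses without a SyntaxError.
    Nothing is executed, so runtime errors are not detected.
    """
    try:
        ast.parse(source)
    except SyntaxError as exc:
        # exc.msg carries the parser's reason, e.g. an unterminated string.
        return False, f"invalid Python ({exc.msg})"
    return True, "valid Python"


def rank_models(outputs: dict[str, str]) -> list[tuple[str, bool, str]]:
    """Check every model's output and list the passing models first.

    sorted() is stable, so models keep their input order within each group.
    """
    results = [(model, *check_python(text)) for model, text in outputs.items()]
    return sorted(results, key=lambda row: not row[1])
```

A syntax-only check is cheap and deterministic, which suits a head-to-head comparison, but it says nothing about whether the refactored code behaves correctly.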
Unknown or newly phrased niches are resolved to the nearest scored niche (semantic + lexical matching), or flagged as genuinely new and queued for benchmarking, instead of falling back to a blind default.
| Requested | Resolved → niche | Method | Confidence | When |
|---|---|---|---|---|
| educational_chat | chat (chat) | semantic | 0.593 | 2026-05-21 18:01 |
| expert_evaluation | NEW — queued for scoring | new_niche | — | 2026-05-21 17:54 |
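The resolution step can be sketched roughly as follows. This is an illustration under stated assumptions, not the router's actual logic: the `threshold` of 0.5, the `exact` method label, and the function names are hypothetical, the lexical score uses `difflib` from the standard library, and the semantic scorer is passed in as a callable because the page does not say which embedding model is used.

```python
from difflib import SequenceMatcher
from typing import Callable, Optional

# Niches named on this page; the real scored list is not shown in full.
SCORED_NICHES = ["chat", "coding", "code_refactor"]

Scorer = Callable[[str, str], float]


def lexical_similarity(a: str, b: str) -> float:
    """Character-level similarity in [0, 1] from difflib.SequenceMatcher."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def resolve_niche(
    requested: str,
    scored: list[str],
    semantic: Optional[Scorer] = None,
    threshold: float = 0.5,  # hypothetical cut-off, not taken from the page
) -> dict:
    """Map a requested niche to the nearest scored niche, or flag it as new."""
    if requested in scored:
        return {"niche": requested, "method": "exact", "confidence": 1.0}

    best = None  # (niche, method, score)
    for niche in scored:
        candidates = [("lexical", lexical_similarity(requested, niche))]
        if semantic is not None:
            candidates.append(("semantic", semantic(requested, niche)))
        for method, score in candidates:
            if best is None or score > best[2]:
                best = (niche, method, score)

    # No sufficiently close match: queue for benchmarking rather than default.
    if best is None or best[2] < threshold:
        return {"niche": None, "method": "new_niche", "confidence": None}
    return {"niche": best[0], "method": best[1], "confidence": round(best[2], 3)}
```

Returning the method and confidence alongside the niche keeps each decision auditable, which matches the columns in the table above.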
Every tool must route LLM/AI calls through the model router. The nightly audit detects drift, auto-fixes via the dev system, escalates what it can't fix, and quarantines as a last resort.
| Run | Scanned | Compliant | Violators | Fixed | Escalated | When |
|---|---|---|---|---|---|---|
| v5_audit_20260521_030005 | 6385 | 6380 | 5 | 2 | 3 | 2026-05-21 03:00 |
| v5_audit_20260520_042942 | 6341 | 6341 | 0 | 0 | 0 | 2026-05-20 04:29 |
| v5_audit_20260520_041625 | 6334 | 6331 | 3 | 1 | 0 | 2026-05-20 04:16 |
| v5_audit_20260520_030001 | 6311 | 6310 | 1 | 0 | 0 | 2026-05-20 03:00 |
| v5_audit_20260519_232355 | 6282 | 6282 | 0 | 0 | 0 | 2026-05-19 23:23 |
| v5_audit_20260519_232335 | 6282 | 6282 | 0 | 0 | 0 | 2026-05-19 23:23 |
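Drift detection of this kind can be approximated with a static scan of each tool's source. The sketch below is an assumption-heavy illustration: it reuses the `rest_anthropic` and `model_literal` labels from the per-tool log, but it guesses their meaning (a direct call to the Anthropic REST endpoint, and a hard-coded model name), and the regular expressions and function names are hypothetical.

```python
import re

# Assumed meaning of the two violation labels seen in the per-tool log.
RULES = {
    # A tool talks to the vendor API directly instead of the model router.
    "rest_anthropic": re.compile(r"api\.anthropic\.com"),
    # A tool hard-codes a model name as a quoted string.
    "model_literal": re.compile(r"""["'](?:claude|gpt)-[\w.\-]+["']"""),
}


def find_violations(source: str) -> list[str]:
    """Return the labels of every rule that matches the tool's source text."""
    return [label for label, pattern in RULES.items() if pattern.search(source)]


def summarize(tools: dict[str, str]) -> dict[str, int]:
    """Produce the Scanned / Compliant / Violators counts for one audit run."""
    violators = 0
    for source in tools.values():
        if find_violations(source):
            violators += 1
    return {
        "scanned": len(tools),
        "compliant": len(tools) - violators,
        "violators": violators,
    }
```

A text scan like this is fast enough to run nightly over thousands of tools, at the cost of missing calls that are built dynamically.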
| Tool | Outcome | Violations | Attempts | When |
|---|---|---|---|---|
| image.fal_recraft_v3_v1 | fixed | | 1 | 2026-05-21 03:05 |
| forces.factory.generate_course_deck_v1 | needs_escalation | rest_anthropic, model_literal | 3 | 2026-05-21 03:03 |
| forces.content.generate_metaphors_v1 | capped | | | 2026-05-21 03:03 |
| forces.content.generate_card_art_v1 | fixed | | 1 | 2026-05-21 03:03 |
| aletheion.validate_field_intel_v1 | needs_escalation | rest_anthropic, model_literal | 3 | 2026-05-21 03:00 |
| test.v5_autofix_probe | fixed | | 1 | 2026-05-20 17:31 |
| forces.content.generate_metaphors_v1 | needs_escalation | rest_anthropic, model_literal | 2 | 2026-05-20 04:23 |
| forces.content.generate_metaphors_v1 | needs_escalation | rest_anthropic, model_literal | 2 | 2026-05-20 04:21 |
| expert.run_v1 | needs_escalation | rest_anthropic, model_literal | 2 | 2026-05-20 04:20 |
| forces.content.generate_metaphors_v1 | needs_escalation | rest_anthropic, model_literal | 3 | 2026-05-20 04:17 |
| forces.content.generate_card_art_v1 | fixed | | 1 | 2026-05-20 04:17 |
| expert.run_v1 | needs_escalation | rest_anthropic, model_literal | 3 | 2026-05-20 04:16 |
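The fix, escalate, quarantine ladder behind these outcomes might look like the sketch below. It is illustrative only: the cap of 3 attempts is inferred from the largest value in the Attempts column, the `quarantined` label and all function names are hypothetical, and the `capped` outcome is left out because the page does not define it.

```python
from dataclasses import dataclass
from typing import Callable

MAX_ATTEMPTS = 3  # assumption: the highest attempt count seen in the log


@dataclass
class ToolState:
    name: str
    violations: list[str]
    attempts: int = 0
    outcome: str = "pending"


def remediate(
    tool: ToolState,
    try_fix: Callable[[ToolState], list[str]],
    max_attempts: int = MAX_ATTEMPTS,
) -> ToolState:
    """Retry the auto-fix until the tool is clean or the attempt cap is hit.

    try_fix stands in for the dev system; it returns the violations that
    remain after one fix attempt.
    """
    while tool.violations and tool.attempts < max_attempts:
        tool.attempts += 1
        tool.violations = try_fix(tool)
    tool.outcome = "fixed" if not tool.violations else "needs_escalation"
    return tool


def last_resort(tool: ToolState, escalation_resolved: bool) -> ToolState:
    """Quarantine a tool only when escalation did not resolve it either."""
    if tool.outcome == "needs_escalation" and not escalation_resolved:
        tool.outcome = "quarantined"
    return tool
```

Bounding the retries keeps a stubborn tool from consuming the nightly window, and keeping quarantine as a separate final step means a tool is only taken out of service after escalation has had a chance.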