Mock fixtures
Run run_demo_001
Inspect what happened in this batch — every persona, prompt, error, and recommendation.
Surface
Design the product so insight emerges fast: status first, repeated patterns second, raw evidence last.
Mode
High-signal operator view for debugging agent behavior, docs quality, and product adoption risk.
Status
succeeded
Created May 16, 12:00 PM
Batch
gauntlet_203
Top blame: agent
Success rate
67%
6/9 tasks succeeded
Failures
3
Across all personas/prompts
Task status
succeeded: 6 · 67%
failed: 3 · 33%
Error class
clean-success: 6 · 67%
sdk-usage: 2 · 22%
reasoning/recovery: 1 · 11%
Blame surface
agent: 2 · 67%
docs: 1 · 33%
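The percentages in these breakdowns appear to be each count divided by its own total (9 tasks for task status and error class, 3 failures for blame surface), rounded to the nearest whole percent. A minimal Python sketch of that arithmetic, assuming this rounding rule; the function name share_breakdown is illustrative and not part of the product:

```python
def share_breakdown(counts: dict[str, int]) -> dict[str, int]:
    """Return each label's share of the total as a whole-number percentage."""
    total = sum(counts.values())
    if total == 0:
        # Avoid dividing by zero when a breakdown has no entries.
        return {label: 0 for label in counts}
    return {label: round(100 * n / total) for label, n in counts.items()}


# Counts copied from the report above.
task_status = share_breakdown({"succeeded": 6, "failed": 3})
error_class = share_breakdown(
    {"clean-success": 6, "sdk-usage": 2, "reasoning/recovery": 1}
)
# Blame is measured over the 3 failures only, not over all 9 tasks.
blame_surface = share_breakdown({"agent": 2, "docs": 1})
```

Under this assumption the blame surface percentages use a different denominator than the other two breakdowns, which is why agent shows 67% with a count of 2.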
Overview
Run id: run_demo_001
Project: demo-project
Workflow: —
Batch: gauntlet_203
Started: May 16, 12:00 PM
Finished: May 16, 12:03 PM
Progress
{
"current": 9,
"total": 9,
"message": "done"
}
Per-prompt results (3)
Use the analyst view below to compare all personas, isolate repeated failure patterns, and inspect one result at a time without wading through raw JSON.
There are 3 persona results in this batch across 3 personas.
Issue groups (2)
SDK method-path mismatch
×3 · Agents emitted Steel.scrape instead of the tool's expected method path.
blame: agent · severity: high
Repeated grounding loop
×2 · Some personas re-grounded instead of executing after docs retrieval.
blame: agent · severity: medium
Recommendations (2)
Normalize SDK method-path variants
high · Map common SDK shapes onto the expected runtime contract.
Owner: gauntlet
Tighten finalization checks
medium · Require evidence-backed extraction answers before finalizing a run.
Owner: gauntlet
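One way to read the first recommendation is as a small normalization step that runs before a method path is checked against the runtime contract. The sketch below is an assumption-heavy illustration in Python: the report does not state what the expected method path is, so the value "steel.scrape" in EXPECTED_METHODS is a placeholder, and normalize_method_path is a hypothetical helper, not an existing gauntlet function.

```python
from typing import Optional

# Placeholder contract: the report does not say what the expected method path
# is, so "steel.scrape" is an assumption used for illustration only.
EXPECTED_METHODS = {"steel.scrape"}


def normalize_method_path(raw: str) -> Optional[str]:
    """Return the expected method path for a variant, or None if unrecognized."""
    candidate = raw.strip()
    # Treat call syntax such as "Steel.scrape()" as the bare path.
    if candidate.endswith("()"):
        candidate = candidate[:-2]
    # Compare case-insensitively so "Steel.scrape" and "steel.scrape" agree.
    candidate = candidate.lower()
    return candidate if candidate in EXPECTED_METHODS else None
```

Returning None for unknown paths, rather than guessing, would let the caller still record a genuine sdk-usage error instead of silently rewriting it.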
Analyst view
What happened and what matters
This view is structured for triage: understand the batch outcome, compare personas quickly, inspect repeated failures, and only then drop into raw evidence.
Run status: succeeded
Created: May 16, 12:00 PM
Finished: May 16, 12:03 PM
Batch: gauntlet_203
Top takeaways
Insight 1
6 of 9 persona runs completed successfully (67%).
Insight 2
3 persona runs failed. Start with the failed cells in the matrix to inspect the exact breakpoints.
Insight 3
Most repeated issue: SDK method-path mismatch (3 occurrences).
Insight 4
Highest-leverage fix surfaced by the report: Normalize SDK method-path variants.
Immediate actions
Normalize SDK method-path variants
high · Map common SDK shapes onto the expected runtime contract.
Owner: gauntlet
Tighten finalization checks
medium · Require evidence-backed extraction answers before finalizing a run.
Owner: gauntlet
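The finalization recommendation could be enforced with a simple gate that refuses to finalize a run unless every extracted answer carries supporting evidence. The Python sketch below assumes a hypothetical ExtractionAnswer shape with a value and a list of evidence snippets; neither the class nor can_finalize comes from the report or from any real gauntlet API.

```python
from dataclasses import dataclass, field


@dataclass
class ExtractionAnswer:
    """Hypothetical shape of one extracted answer; not the product's schema."""

    value: str
    # Assumed to hold quoted source snippets that support the value.
    evidence: list[str] = field(default_factory=list)


def can_finalize(answers: list[ExtractionAnswer]) -> bool:
    """Allow finalization only when every answer is non-empty and cites evidence."""
    if not answers:
        # A run with no answers has nothing evidence-backed to finalize.
        return False
    return all(
        bool(a.value.strip()) and any(e.strip() for e in a.evidence)
        for a in answers
    )
```

A check like this only verifies that evidence is present; whether the evidence actually supports the answer would need a separate, stronger validation step.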
Most repeated failures
SDK method-path mismatch
×3 · Agents emitted Steel.scrape instead of the tool's expected method path.
Repeated grounding loop
×2 · Some personas re-grounded instead of executing after docs retrieval.
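The list above is ordered by occurrence count. A short Python sketch of one plausible ranking rule follows: occurrence count descending, with severity as a tie-breaker. The tie-breaker and the dictionary field names are assumptions; the counts and severities are taken from the issue groups in this report.

```python
# Issue groups as listed in this report; the field names are illustrative.
issue_groups = [
    {"title": "Repeated grounding loop", "count": 2, "severity": "medium"},
    {"title": "SDK method-path mismatch", "count": 3, "severity": "high"},
]

# Lower rank sorts first; unknown severities sort after all known ones.
SEVERITY_RANK = {"high": 0, "medium": 1, "low": 2}


def rank_issue_groups(groups: list[dict]) -> list[dict]:
    """Sort by occurrence count (descending), then by severity (high first)."""
    return sorted(
        groups,
        key=lambda g: (-g["count"], SEVERITY_RANK.get(g["severity"], len(SEVERITY_RANK))),
    )


ranked = rank_issue_groups(issue_groups)
```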
Cross-agent patterns
No cross-agent patterns were recorded.
Execution notes
Project id: demo-project
Workflow id: —
Progress message: done