Mock fixtures

Run run_demo_001

Inspect what happened in this batch: every persona, prompt, error, and recommendation.

Surface

Design the product so insight emerges fast: status first, repeated patterns second, raw evidence last.

Mode

High-signal operator view for debugging agent behavior, docs quality, and product adoption risk.

Status

succeeded

Created May 16, 12:00 PM

Batch

gauntlet_203

Top blame: agent

Success rate

67%

6/9 tasks succeeded

Failures

3

Across all personas/prompts

Task status

succeeded: 6 · 67%

failed: 3 · 33%

Error class

clean-success: 6 · 67%

sdk-usage: 2 · 22%

reasoning/recovery: 1 · 11%

Blame surface

agent: 2 · 67% of failures

docs: 1 · 33% of failures
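The breakdowns above use two different denominators: task status and error class are shares of all 9 tasks, while blame surface is a share of the 3 failed tasks (2/3 ≈ 67%, 1/3 ≈ 33%). A minimal sketch that reproduces these figures, assuming each breakdown is a plain count over a list of labels rounded to whole percentages; the functions breakdown and render are hypothetical, not the dashboard's own code.

```python
from collections import Counter


def breakdown(labels: list[str]) -> list[tuple[str, int, int]]:
    """Return (label, count, whole-number percent), most common first.

    The denominator is len(labels), so pass only the population the
    percentages should be relative to (all tasks, or failed tasks only).
    """
    total = len(labels)
    if total == 0:
        return []
    return [
        (label, n, round(100 * n / total))
        for label, n in Counter(labels).most_common()
    ]


def render(title: str, labels: list[str]) -> str:
    """Format a breakdown the way the dashboard lists it."""
    lines = [title]
    for label, n, pct in breakdown(labels):
        lines.append(f"{label}: {n} · {pct}%")
    return "\n".join(lines)


# Counts taken from this run's summary (9 tasks, 3 failures).
statuses = ["succeeded"] * 6 + ["failed"] * 3
error_classes = ["clean-success"] * 6 + ["sdk-usage"] * 2 + ["reasoning/recovery"]
blames = ["agent"] * 2 + ["docs"]  # failed tasks only

print(render("Task status", statuses))
print(render("Error class", error_classes))
print(render("Blame surface", blames))
```

Passing only the failed tasks' blame labels is what makes the blame percentages sum to 100% of failures rather than of all tasks.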

Overview

Run id: run_demo_001

Project: demo-project

Workflow: —

Batch: gauntlet_203

Started: May 16, 12:00 PM

Finished: May 16, 12:03 PM

Progress

{
  "current": 9,
  "total": 9,
  "message": "done"
}
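The progress payload above reads as a task counter plus a status message. A small sketch of parsing and validating it, assuming current and total count tasks and that a run is complete when current equals total; the Progress class and its members are illustrative, not part of any documented API.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class Progress:
    current: int
    total: int
    message: str

    @classmethod
    def from_json(cls, raw: str) -> "Progress":
        """Parse the payload and reject counters that cannot be right."""
        data = json.loads(raw)
        progress = cls(int(data["current"]), int(data["total"]), str(data["message"]))
        if not 0 <= progress.current <= progress.total:
            raise ValueError(f"inconsistent progress: {progress}")
        return progress

    @property
    def fraction(self) -> float:
        # Treat an empty batch as complete rather than dividing by zero.
        return 1.0 if self.total == 0 else self.current / self.total

    @property
    def is_complete(self) -> bool:
        return self.current == self.total


p = Progress.from_json('{"current": 9, "total": 9, "message": "done"}')
print(p.fraction, p.is_complete, p.message)
```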

Per-prompt results (3)

Use the analyst view below to compare all personas, isolate repeated failure patterns, and inspect one result at a time without wading through raw JSON.

This batch contains 3 persona results, one for each of its 3 personas.

Issue groups (2)

SDK method-path mismatch

×3

Agents emitted Steel.scrape instead of the tool's expected method path.

blame: agent · severity: high

Repeated grounding loop

×2

Some personas re-grounded instead of executing after docs retrieval.

blame: agent · severity: medium
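Each issue group reads as findings bucketed under a shared title, with the ×N figure appearing to count occurrences. A sketch of that grouping, plus a simple detector for the grounding loop described above. Everything here is assumed: the Finding record, the step names retrieve_docs, ground, and execute, and the max_regrounds threshold are hypothetical stand-ins for whatever the trace format actually contains.

```python
from collections import defaultdict
from dataclasses import dataclass

# Lower rank = more severe (hypothetical scale).
SEVERITY_RANK = {"high": 0, "medium": 1, "low": 2}


@dataclass(frozen=True)
class Finding:
    issue: str     # issue-group title, e.g. "SDK method-path mismatch"
    blame: str     # e.g. "agent" or "docs"
    severity: str  # "high", "medium", or "low"


@dataclass
class IssueGroup:
    issue: str
    count: int
    blames: set[str]
    severity: str


def group_findings(findings: list[Finding]) -> list[IssueGroup]:
    """Bucket findings by issue title and count occurrences per bucket."""
    buckets: dict[str, list[Finding]] = defaultdict(list)
    for f in findings:
        buckets[f.issue].append(f)
    groups = [
        IssueGroup(
            issue=issue,
            count=len(items),
            blames={f.blame for f in items},
            # A group inherits the most severe level among its findings.
            severity=min((f.severity for f in items), key=SEVERITY_RANK.__getitem__),
        )
        for issue, items in buckets.items()
    ]
    # Most repeated first; ties broken by severity.
    groups.sort(key=lambda g: (-g.count, SEVERITY_RANK[g.severity]))
    return groups


def has_grounding_loop(steps: list[str], max_regrounds: int = 1) -> bool:
    """Flag a trace that grounds more than max_regrounds times after
    retrieving docs without reaching an execute step."""
    seen_docs = False
    regrounds = 0
    for step in steps:
        if step == "retrieve_docs":
            seen_docs = True
        elif step == "execute":
            seen_docs, regrounds = False, 0
        elif step == "ground" and seen_docs:
            regrounds += 1
            if regrounds > max_regrounds:
                return True
    return False


findings = (
    [Finding("SDK method-path mismatch", "agent", "high")] * 3
    + [Finding("Repeated grounding loop", "agent", "medium")] * 2
)
for g in group_findings(findings):
    print(f"{g.issue} ×{g.count}  blame: {', '.join(sorted(g.blames))} · severity: {g.severity}")
```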

Recommendations (2)

Normalize SDK method-path variants

high

Map common SDK shapes onto the expected runtime contract.

Owner: gauntlet

Tighten finalization checks

medium

Require evidence-backed extraction answers before finalizing a run.

Owner: gauntlet
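Both recommendations can be sketched as small checks. The report does not name the tool's expected method path, so the alias table below maps the observed variant Steel.scrape onto a placeholder, EXPECTED_SCRAPE_PATH; the table, the separator folding, and the Evidence and ExtractionAnswer records are all hypothetical. The finalization gate shows one possible reading of "evidence-backed": every attached piece of evidence needs a source and an excerpt, and the answer text must appear in at least one excerpt.

```python
import re
from dataclasses import dataclass, field

# Hypothetical alias table: normalized variant -> path the runtime expects.
# Only "Steel.scrape" comes from the report; EXPECTED_SCRAPE_PATH is a
# placeholder for the tool's real method path, which the report omits.
ALIASES = {
    "steel.scrape": "EXPECTED_SCRAPE_PATH",
}


def normalize_method_path(emitted: str, aliases: dict[str, str] = ALIASES) -> str:
    """Map an emitted SDK call path onto the expected runtime contract.

    "::", "->", and "/" are folded to "." and case is ignored for lookup.
    Unknown paths come back unchanged so they still surface as mismatches.
    """
    key = re.sub(r"::|->|/", ".", emitted.strip()).lower()
    return aliases.get(key, emitted)


@dataclass
class Evidence:
    source_url: str
    excerpt: str


@dataclass
class ExtractionAnswer:
    value: str
    evidence: list[Evidence] = field(default_factory=list)


def can_finalize(answer: ExtractionAnswer) -> tuple[bool, str]:
    """Return (ok, reason); finalize only when the answer is evidence-backed."""
    value = answer.value.strip()
    if not value:
        return False, "empty answer"
    if not answer.evidence:
        return False, "no evidence attached"
    if any(not ev.source_url or not ev.excerpt.strip() for ev in answer.evidence):
        return False, "evidence missing source or excerpt"
    # One possible policy: the answer must appear (case-insensitively)
    # in at least one excerpt.
    if not any(value.lower() in ev.excerpt.lower() for ev in answer.evidence):
        return False, "answer not found in any evidence excerpt"
    return True, "ok"


print(normalize_method_path("Steel.scrape"))
print(can_finalize(ExtractionAnswer("42")))
```

Returning unknown paths unchanged keeps the normalizer conservative: it fixes known variants without hiding genuinely new mismatches from the issue groups.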

Analyst view

What happened and what matters

This view is structured for triage: understand the batch outcome, compare personas quickly, inspect repeated failures, and only then drop into raw evidence.

Run status: succeeded

Created: May 16, 12:00 PM

Finished: May 16, 12:03 PM

Batch: gauntlet_203

Top takeaways

Insight 1

6 of 9 persona runs completed successfully (67%).

Insight 2

3 persona runs failed. Start with the failed cells in the matrix to inspect the exact breakpoints.

Insight 3

Most repeated issue: SDK method-path mismatch (3 occurrences).

Insight 4

Highest-leverage fix surfaced by the report: Normalize SDK method-path variants.
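The takeaways read like templates filled from the summary counts. A sketch under that assumption; in particular, "highest-leverage" is taken here to mean highest severity, with the first-listed recommendation winning ties, which the report itself does not define. The function and record names are hypothetical.

```python
from dataclasses import dataclass

# Lower rank = more severe (hypothetical scale).
SEVERITY_RANK = {"high": 0, "medium": 1, "low": 2}


@dataclass(frozen=True)
class Recommendation:
    title: str
    severity: str


def top_takeaways(succeeded: int, total: int,
                  issue_counts: dict[str, int],
                  recommendations: list[Recommendation]) -> list[str]:
    """Fill the takeaway templates from summary counts."""
    failed = total - succeeded
    rate = round(100 * succeeded / total) if total else 0
    insights = [f"{succeeded} of {total} persona runs completed successfully ({rate}%)."]
    if failed:
        runs = "run" if failed == 1 else "runs"
        insights.append(
            f"{failed} persona {runs} failed. Start with the failed cells in the "
            "matrix to inspect the exact breakpoints."
        )
    if issue_counts:
        # max() keeps the first issue on ties.
        issue, n = max(issue_counts.items(), key=lambda kv: kv[1])
        noun = "occurrence" if n == 1 else "occurrences"
        insights.append(f"Most repeated issue: {issue} ({n} {noun}).")
    if recommendations:
        # min() keeps the first-listed recommendation on ties.
        best = min(recommendations, key=lambda r: SEVERITY_RANK[r.severity])
        insights.append(f"Highest-leverage fix surfaced by the report: {best.title}.")
    return insights


takeaways = top_takeaways(
    succeeded=6,
    total=9,
    issue_counts={"SDK method-path mismatch": 3, "Repeated grounding loop": 2},
    recommendations=[
        Recommendation("Normalize SDK method-path variants", "high"),
        Recommendation("Tighten finalization checks", "medium"),
    ],
)
for i, text in enumerate(takeaways, start=1):
    print(f"Insight {i}: {text}")
```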

Cross-agent patterns

No cross-agent patterns were recorded.

Execution notes

Project id: demo-project

Workflow id: —

Progress message: done