Agent Analytics: Metrics That Matter for AI-Assisted Operations

You can't improve what you don't measure. And you're probably measuring the wrong things when it comes to AI agents.

Why Traditional Metrics Fail

Standard application metrics:

  • Request latency
  • Error rates
  • Throughput
  • Uptime

These tell you if the system is running. They don't tell you if the agent is working.

// A "successful" request by traditional metrics
{
  "status": 200,
  "latency_ms": 450,
  "response": "I don't have enough context to answer that question."
}

// System worked perfectly. Agent was useless.

The New Metrics Stack

1. Context Quality Metrics

How good is the information the agent has access to?

{
  "context_coverage": {
    "description": "% of queries with sufficient context",
    "target": ">90%",
    "current": "78%",
    "trend": "improving"
  },
  "context_freshness": {
    "description": "Average age of retrieved context",
    "target": "<24 hours for active entities",
    "current": "18 hours",
    "stale_percentage": "12%"
  },
  "context_depth": {
    "description": "Average records per entity",
    "target": ">5 for key entities",
    "current": "3.2",
    "sparse_entities": 145
  }
}
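Freshness and staleness are straightforward to compute from a snapshot of your context store. A minimal sketch, assuming each record carries an age in hours (the `ContextRecord` shape here is illustrative, not a real API):

```typescript
interface ContextRecord {
  entityId: string;
  ageHours: number; // hours since the record was last updated
}

// Average age plus the percentage of records past a staleness threshold.
function contextFreshness(records: ContextRecord[], staleAfterHours: number) {
  if (records.length === 0) return { avgAgeHours: 0, stalePercentage: 0 };
  const totalAge = records.reduce((sum, r) => sum + r.ageHours, 0);
  const stale = records.filter((r) => r.ageHours > staleAfterHours).length;
  return {
    avgAgeHours: totalAge / records.length,
    stalePercentage: (100 * stale) / records.length,
  };
}
```

Run it nightly against the subset of records attached to active entities and you get the "context_freshness" numbers above for free.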

2. Retrieval Accuracy

Is the agent finding the right context?

{
  "retrieval_metrics": {
    "precision": {
      "description": "% of retrieved context that was relevant",
      "value": 0.82
    },
    "recall": {
      "description": "% of relevant context that was retrieved",
      "value": 0.71
    },
    "empty_retrievals": {
      "description": "Queries that returned no context",
      "rate": "8%",
      "top_patterns": ["new customer queries", "historical comparisons"]
    }
  }
}
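Precision and recall fall out directly once you have labeled relevance judgments for a sample of queries. A sketch, assuming retrieved and relevant context are tracked as sets of record IDs:

```typescript
// Precision: how much of what we retrieved was relevant.
// Recall: how much of what was relevant did we retrieve.
function retrievalMetrics(retrieved: Set<string>, relevant: Set<string>) {
  const hits = [...retrieved].filter((id) => relevant.has(id)).length;
  return {
    precision: retrieved.size ? hits / retrieved.size : 0,
    recall: relevant.size ? hits / relevant.size : 0,
    emptyRetrieval: retrieved.size === 0, // feeds the empty_retrievals rate
  };
}
```

Averaging these per-query values over a labeled sample gives the aggregate numbers in the example above.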

3. Action Success Rates

When the agent does something, does it work?

{
  "action_metrics": {
    "suggestions_accepted": {
      "rate": "67%",
      "by_type": {
        "email_drafts": "78%",
        "meeting_scheduling": "82%",
        "data_updates": "45%"  // Problem area
      }
    },
    "actions_completed": {
      "rate": "89%",
      "failure_reasons": {
        "permission_denied": "5%",
        "invalid_data": "4%",
        "timeout": "2%"
      }
    },
    "actions_reverted": {
      "rate": "3%",
      "indicates": "User disagreed with action"
    }
  }
}
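The per-type acceptance breakdown is just a grouped aggregation over your action log. A sketch, assuming each suggestion emits an event like this (the `ActionEvent` shape is an assumption about your instrumentation, not a standard schema):

```typescript
interface ActionEvent {
  type: string;      // e.g. "email_drafts", "data_updates"
  accepted: boolean; // did the user accept the suggestion?
}

// Acceptance rate per action type, e.g. to spot problem areas
// like the data_updates rate above.
function acceptanceByType(events: ActionEvent[]): Map<string, number> {
  const totals = new Map<string, { accepted: number; total: number }>();
  for (const e of events) {
    const t = totals.get(e.type) ?? { accepted: 0, total: 0 };
    t.total += 1;
    if (e.accepted) t.accepted += 1;
    totals.set(e.type, t);
  }
  const rates = new Map<string, number>();
  for (const [type, t] of totals) rates.set(type, t.accepted / t.total);
  return rates;
}
```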

User Satisfaction Signals

Implicit Signals

// What users DO tells you more than what they SAY
{
  "regeneration_rate": {
    "description": "How often users ask agent to try again",
    "value": "12%",
    "high_regen_queries": ["complex analysis", "forecasting"]
  },
  "abandonment_rate": {
    "description": "Started interaction, didn't finish",
    "value": "18%",
    "abandonment_points": ["after first response", "during multi-step"]
  },
  "edit_rate": {
    "description": "User modified agent output before using",
    "value": "34%",
    "heavy_edit_types": ["formal emails", "customer communications"]
  }
}
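All three implicit signals can be derived from one per-interaction record. A minimal sketch, assuming a hypothetical `Interaction` shape; adapt the field names to whatever your event pipeline already captures:

```typescript
interface Interaction {
  regenerations: number; // times the user asked the agent to retry
  completed: boolean;    // user finished the interaction
  outputEdited: boolean; // user modified the output before using it
}

function implicitSignals(interactions: Interaction[]) {
  const n = interactions.length || 1; // avoid divide-by-zero on empty input
  return {
    regenerationRate: interactions.filter((i) => i.regenerations > 0).length / n,
    abandonmentRate: interactions.filter((i) => !i.completed).length / n,
    editRate: interactions.filter((i) => i.outputEdited).length / n,
  };
}
```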

Explicit Signals

// Direct feedback (use sparingly - fatigue is real)
{
  "thumbs_up_rate": "71%",
  "feedback_by_category": {
    "helpful": 68,
    "not_helpful": 12,
    "wrong": 8,
    "needs_improvement": 12
  },
  "feedback_participation": "15%"  // Most users don't rate
}

Cost and Efficiency Metrics

{
  "cost_metrics": {
    "cost_per_successful_interaction": {
      "value": "$0.023",
      "breakdown": {
        "context_retrieval": "$0.003",
        "llm_inference": "$0.018",
        "action_execution": "$0.002"
      }
    },
    "cost_per_user_per_month": {
      "average": "$4.50",
      "median": "$2.10",
      "p99": "$45.00"  // Power users
    },
    "cost_efficiency": {
      "description": "Value generated per dollar spent",
      "estimated_time_saved": "15 min/user/day",
      "roi_multiple": "12x"
    }
  }
}
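Two small helpers cover most of this rollup: dividing the per-interaction cost by the success rate (so failed interactions are charged to the successes that justify them), and a nearest-rank percentile for the per-user distribution. Names here mirror the example above; this is a sketch, not a billing-grade implementation:

```typescript
// Per-component costs are USD per interaction; successRate is a fraction.
function costPerSuccessfulInteraction(
  breakdown: Record<string, number>,
  successRate: number
): number {
  const perInteraction = Object.values(breakdown).reduce((a, b) => a + b, 0);
  return perInteraction / successRate;
}

// Nearest-rank percentile for median / p99 of per-user monthly cost.
function percentile(values: number[], p: number): number {
  const sorted = [...values].sort((a, b) => a - b);
  const idx = Math.ceil((p / 100) * sorted.length) - 1;
  return sorted[Math.max(0, idx)];
}
```

Note that the median and p99 diverging sharply (as in the example) is normal: a small population of power users usually dominates spend.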

Building Your Dashboard

Real-Time View

// What ops needs to see NOW
{
  "current_status": {
    "active_agent_sessions": 1420,
    "avg_response_time": "1.2s",
    "error_rate": "0.3%",
    "context_hit_rate": "92%"
  },
  "last_hour": {
    "requests": 8540,
    "successful": 8489,
    "context_insufficient": 320,
    "hallucinations_detected": 12
  }
}

Daily Health Check

// Morning review metrics
{
  "yesterday_summary": {
    "total_interactions": 45000,
    "success_rate": "94%",
    "user_satisfaction": "4.2/5",
    "cost": "$1,035"
  },
  "trends": {
    "satisfaction": "up 3% WoW",
    "cost_per_interaction": "down 8% WoW",
    "context_quality": "stable"
  },
  "issues_requiring_attention": [
    "Retrieval accuracy for 'competitor analysis' queries down 15%",
    "High regeneration rate for email drafts (22%)"
  ]
}

Weekly Deep Dive

// Strategic review
{
  "context_health": {
    "total_records": 2400000,
    "added_this_week": 45000,
    "stale_records": 12000,
    "orphaned_records": 340
  },
  "capability_gaps": [
    "Users asking for features agent can't do",
    "Top requests: 'compare to last year', 'predict next quarter'"
  ],
  "improvement_opportunities": [
    "Add historical comparison context",
    "Improve email draft templates"
  ]
}

Alerting on the Right Things

// What to page on at 3am
alerts: {
  critical: [
    "Context retrieval failure rate > 20%",
    "Hallucination rate > 5%",
    "Error rate > 10%"
  ],
  warning: [
    "User satisfaction drop > 10% day-over-day",
    "Cost spike > 50% above baseline",
    "Context staleness > 48 hours for active entities"
  ],
  info: [
    "New capability requests trending",
    "Context coverage gaps identified"
  ]
}
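Rules like these are easy to keep in code next to the metrics they watch. A sketch of evaluating the critical tier against a metrics snapshot; the `Rule` shape and field names are assumptions, not a real alerting API:

```typescript
type Severity = "critical" | "warning" | "info";

interface Metrics {
  contextRetrievalFailureRate: number;
  hallucinationRate: number;
  errorRate: number;
}

interface Rule {
  name: string;
  severity: Severity;
  fires: (m: Metrics) => boolean;
}

// Thresholds match the critical tier above.
const rules: Rule[] = [
  { name: "context retrieval failures", severity: "critical",
    fires: (m) => m.contextRetrievalFailureRate > 0.20 },
  { name: "hallucination rate", severity: "critical",
    fires: (m) => m.hallucinationRate > 0.05 },
  { name: "error rate", severity: "critical",
    fires: (m) => m.errorRate > 0.10 },
];

function firingAlerts(rules: Rule[], m: Metrics): string[] {
  return rules.filter((r) => r.fires(m)).map((r) => r.name);
}
```

Keeping thresholds as plain predicates makes the 3am page list reviewable in a pull request rather than buried in a dashboard config.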

Getting Started

  1. Instrument context retrieval. Log every retrieval with query, results, and relevance scores.
  2. Track action outcomes. Did the suggested action get accepted? Completed? Reverted?
  3. Add implicit signals. Regeneration, abandonment, and edit rates tell the real story.
  4. Calculate true costs. Per-interaction, per-user, per-outcome.
  5. Build feedback loops. Connect metrics to context improvements.
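Step 1 can be as simple as one structured log entry per retrieval. A sketch with illustrative field names; emit it to whatever log pipeline you already run:

```typescript
interface RetrievalLog {
  timestamp: string;
  query: string;
  resultIds: string[];
  relevanceScores: number[]; // one score per result, from your ranker
}

function makeRetrievalLog(
  query: string,
  results: { id: string; score: number }[]
): RetrievalLog {
  return {
    timestamp: new Date().toISOString(),
    query,
    resultIds: results.map((r) => r.id),
    relevanceScores: results.map((r) => r.score),
  };
}
```

Every metric in this post except explicit feedback can be computed downstream from logs like this one plus the action and interaction events.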

The companies that win with agents will be the ones that know exactly how their agents are performing—and continuously improve them.

Analytics Built In

Xtended tracks context quality, retrieval accuracy, and usage patterns automatically. Know exactly how your agents perform.

See the Analytics Dashboard