I Built pytest for AI Agents — Here's What I Learned

Source: DEV Community
Every AI agent developer knows this pain: You run your agent. It works. You run it again. It doesn't. You have no idea why. You check the logs — there are no logs. You check the cost — $4.20 for a single run. You cry.

I got tired of this, so I built AgentProbe — an open-source testing framework for AI agents. Think pytest, but for LLM-powered agents.

What it does

- Record — Capture every LLM call, tool call, and decision your agent makes.
- Test — Run 35+ built-in assertions: cost limits, quality checks, safety validation, PII detection.
- Replay — Swap models and compare results. "What happens if I switch from GPT-4o to Claude?"
- Fuzz — 55 prompt injection attacks built-in. Find vulnerabilities before your users do.

The features nobody asked for (but everyone loves)

- Agent Roast — Run agentprobe roast and get a brutally honest (and funny) analysis of your agent: "Your agent spends money like a drunk sailor at a token store. Cost grade: D"
- X-Ray Mode — Visualize exactly how your agent thinks,
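To make the record-then-assert idea concrete, here is a minimal sketch in plain Python. This is not AgentProbe's actual API; every name here (Recorder, LLMCall, assert_cost_under) is a hypothetical stand-in for illustration only.

```python
# Hypothetical sketch of "record every LLM call, then assert on the run".
# None of these names come from AgentProbe itself.
from dataclasses import dataclass, field

@dataclass
class LLMCall:
    """One recorded model call with its token counts and cost."""
    model: str
    prompt_tokens: int
    completion_tokens: int
    cost_usd: float

@dataclass
class Recorder:
    """Collects every LLMCall made during an agent run."""
    calls: list = field(default_factory=list)

    def record(self, call: LLMCall) -> None:
        self.calls.append(call)

    @property
    def total_cost(self) -> float:
        return sum(c.cost_usd for c in self.calls)

def assert_cost_under(recorder: Recorder, limit_usd: float) -> None:
    """A cost-limit assertion in the style of a testing framework."""
    if recorder.total_cost > limit_usd:
        raise AssertionError(
            f"Agent spent ${recorder.total_cost:.2f}, "
            f"over the ${limit_usd:.2f} limit"
        )

# Usage: record each call during a run, then assert on the totals.
rec = Recorder()
rec.record(LLMCall("gpt-4o", 1200, 300, 0.02))
rec.record(LLMCall("gpt-4o", 5000, 900, 0.08))
assert_cost_under(rec, limit_usd=1.00)  # passes silently
print(f"total cost: ${rec.total_cost:.2f}")
```

The same pattern extends naturally to the other assertion types the framework advertises: a quality check or PII scan is just another function over the recorded calls.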