Real combos we've found in real AI agents. What the LLM can do, why it matters, and the gate that stops it.
A customer disputes a refund the agent denied three weeks ago. Without a logged decision, the answer is best guess. With one evidence event per supervised action — input fingerprint, policy version, reasons, threat signals — the answer is replay. Explainability is a logging discipline at the boundary, not a model property.
A B2B SaaS billing agent generated SQL with a missing tenant filter and emailed eleven customers an invoice belonging to someone else. The supervisor decision belongs at the data layer, not the email layer. Here's the failure, the wrap, and the policy that stops it.
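A minimal sketch of what a data-layer gate could look like, assuming query results arrive as a list of dicts that carry a `tenant_id` column. `gate_rows` and `TenantLeakError` are hypothetical names for illustration, not the wrap from the field note. The check runs on the rows the query returned, before anything reaches the email step, and it fails closed: a row with no tenant column counts as a leak.

```python
class TenantLeakError(Exception):
    """Raised when a result set contains rows outside the expected tenant."""


def gate_rows(rows, expected_tenant_id, tenant_key="tenant_id"):
    """Return rows unchanged if every row belongs to expected_tenant_id.

    Hypothetical helper: the column name and row shape are assumptions.
    Rows missing the tenant column are treated as leaks (fail closed).
    """
    leaked = [row for row in rows if row.get(tenant_key) != expected_tenant_id]
    if leaked:
        raise TenantLeakError(
            f"{len(leaked)} of {len(rows)} rows do not belong to "
            f"tenant {expected_tenant_id!r}"
        )
    return rows


# Usage: the agent's query forgot its tenant filter, so a foreign row came back.
rows = [
    {"tenant_id": "acme", "invoice_id": 101},
    {"tenant_id": "globex", "invoice_id": 202},
]
try:
    gate_rows(rows, expected_tenant_id="acme")
except TenantLeakError as err:
    print("blocked:", err)
```

Checking the returned rows rather than the generated SQL text sidesteps having to parse the query, at the cost of running the query first.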
Three months in, a regulator asks: show me every refund the agent issued in March. The application logs exist. The decision logs don't. Reconstruction is best-effort and the gap is the finding. Here's the evidence event your agent should be writing, the hash chain that keeps it honest, and the dry-run replay that closes the question.
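A minimal sketch of a hash-chained evidence log, assuming SHA-256 over canonical JSON. `EvidenceLog`, its field names, and the all-zero genesis value are illustrative assumptions, not the schema from the field note. Each event stores the hash of the one before it, so editing or deleting an earlier event breaks verification of everything after it. Timestamps are left out here only to keep the example deterministic.

```python
import hashlib
import json

GENESIS = "0" * 64  # assumed placeholder hash for the first event


def _digest(payload: dict) -> str:
    # Canonical JSON (sorted keys, fixed separators) so equal events hash equally.
    blob = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(blob.encode("utf-8")).hexdigest()


class EvidenceLog:
    """Append-only list of decision events linked by hash (hypothetical schema)."""

    def __init__(self):
        self.events = []

    def append(self, action, agent_input, policy_version, decision,
               reasons, threat_signals):
        prev_hash = self.events[-1]["hash"] if self.events else GENESIS
        body = {
            "action": action,
            # Fingerprint the input instead of storing it verbatim.
            "input_fingerprint": hashlib.sha256(
                agent_input.encode("utf-8")).hexdigest(),
            "policy_version": policy_version,
            "decision": decision,
            "reasons": list(reasons),
            "threat_signals": list(threat_signals),
            "prev_hash": prev_hash,
        }
        event = dict(body, hash=_digest(body))
        self.events.append(event)
        return event

    def verify(self) -> bool:
        # Recompute every hash and check each link back to the genesis value.
        prev_hash = GENESIS
        for event in self.events:
            body = {k: v for k, v in event.items() if k != "hash"}
            if body["prev_hash"] != prev_hash or _digest(body) != event["hash"]:
                return False
            prev_hash = event["hash"]
        return True


log = EvidenceLog()
log.append("refund", "order 1 refund request", "policy-v1", "deny",
           ["outside refund window"], [])
log.append("refund", "order 2 refund request", "policy-v1", "allow",
           ["within refund window"], [])
print(log.verify())
```

A chain like this only detects tampering inside the log. Keeping the latest hash somewhere the agent cannot write is what would make truncating the tail detectable too.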
In a live demo, the scanner reported plan_tool.py:8 as an agent chokepoint. Line 8 was a comment. main.py:813 came back as RCE-equivalent. Line 813 was warnings.filterwarnings. Two real bugs, eight false positives, one rule: the supervisor has to be more reliable than the agent it watches. Here are the five layers we shipped so the scanner can't lie that way again.
A real Series-A SaaS. Cron raced itself three ways. An override pointed at a deactivated template. A cancelled batch came back at 03:14. Four independent failures, one morning, 621 emails to real customers. Here's the timeline, the four wraps, and what shadow mode would have shown the founder at 00:01.
A real chat session: a manager asked her CRM bot about her team. The bot confidently analyzed five people who didn't exist in her data. Three hours, 35 messages, one lost user. Here's the failure mode, the wrap pattern, and the policy that stops it.
We scanned a real parenting assistant. Three innocent features (TTS, outbound calls, an ungated LLM) compose into a working voice-phishing weapon under one calendar-event injection. The exploit, the code, and the gate.