At 03:14 the agent re-planned a batch you cancelled at 23:47. At 13:00 three cron schedulers fired the same handler in the same minute. We find the four wraps that would have stopped all of it, in 15 minutes and in shadow mode, without changing what your agent does.
shadow mode first · 4 lines of @supervised · enforce when you trust it
Each action type is a chokepoint where an LLM output triggers something irreversible. Status comes straight from /v1/action-types.
Stop risky refunds before money leaves the system.
Enforce thresholds, approval chains, and anomaly detection on outgoing payments.
Prevent unsafe updates to customer identity and profile data.
Block unauthorized use of sensitive data by agents and the tools they call.
Walks your repo at the AST level — no API key, no runtime hooks, no instrumentation. Scan finishes in under 60s.
Flags every path an LLM tool call can fire at runtime — refunds, deletes, sends — even when your tests pass.
Every finding maps to a @supervised(...) decorator. Drop it in shadow mode first; flip to enforce when you trust it.
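The shadow-then-enforce rollout can be illustrated with a local decorator. `guarded`, `ActionBlocked`, and the `under_limit` policy are hypothetical stand-ins written for this sketch, not the `supervisor_guards` API: in shadow mode a denied call is logged and still runs, in enforce mode it raises before the function body executes.

```python
from __future__ import annotations

import functools
import logging
from typing import Any, Callable

log = logging.getLogger("supervisor_sketch")


class ActionBlocked(Exception):
    """Raised in enforce mode when the policy denies an action."""


def guarded(action_type: str, policy: Callable[[dict], bool], mode: str = "shadow"):
    """Hypothetical stand-in for @supervised(...); not the supervisor_guards API.

    shadow:  log what would have been blocked, then run the function anyway
    enforce: raise ActionBlocked instead of running the function
    """
    if mode not in ("shadow", "enforce"):
        raise ValueError(f"unknown mode: {mode!r}")

    def decorate(fn: Callable[..., Any]) -> Callable[..., Any]:
        @functools.wraps(fn)
        def wrapper(*args: Any, **kwargs: Any) -> Any:
            if not policy(kwargs):
                if mode == "enforce":
                    raise ActionBlocked(f"{action_type}: {fn.__name__} denied")
                log.warning("shadow: would block %s %s(%r)", action_type, fn.__name__, kwargs)
            return fn(*args, **kwargs)

        return wrapper

    return decorate


# Usage: a hypothetical policy capping refunds at 10,000 (in cents).
def under_limit(call: dict) -> bool:
    return call.get("amount", 0) <= 10_000


@guarded("payment", policy=under_limit, mode="shadow")
def issue_refund(*, charge_id: str, amount: int) -> str:
    return f"refunded {amount} on {charge_id}"
```

The policy only sees keyword arguments here, which is why `issue_refund` takes keyword-only parameters; a fuller version would bind positional arguments with `inspect.signature(fn).bind(...)`.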
The GitHub App diffs each PR against your last scan and comments only on new unsafe call-sites. Zero PR-spam.
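Commenting only on new call-sites comes down to a diff between two finding lists. A minimal sketch, assuming each finding is identified by file, action type, and callee (the `Finding` shape is hypothetical). Line numbers are deliberately left out of the identity so that code shifting down a file doesn't re-trigger a comment.

```python
from __future__ import annotations

from collections import Counter
from typing import NamedTuple


class Finding(NamedTuple):
    file: str
    line: int
    action_type: str
    callee: str

    def key(self) -> tuple[str, str, str]:
        # Line numbers drift when unrelated code moves, so they stay out of the identity.
        return (self.file, self.action_type, self.callee)


def diff_scans(previous: list[Finding], current: list[Finding]) -> tuple[list[Finding], int]:
    """Return (findings new in this PR, number of findings fixed since the last scan)."""
    before = Counter(f.key() for f in previous)
    after = Counter(f.key() for f in current)
    added = after - before                  # keys that gained occurrences
    fixed = sum((before - after).values())  # occurrences that disappeared
    new = []
    for f in current:
        # If a key gained N occurrences, report the first N seen; a real tool would match by position.
        if added[f.key()] > 0:
            new.append(f)
            added[f.key()] -= 1
    return new, fixed
```

A PR whose scan matches the previous one yields `([], 0)`, which is the "clean PRs get nothing" case.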
Every scan also writes a governance/ folder. Paste the one-pagers into your next vendor questionnaire and stop writing them at 11pm.
Lists your real LLM providers and what your agent actually does — pulled from the scan, not a template. If your repo has no LLM, the page says so.
One line per action type, pointing at the human who approves it. Override defaults by dropping owners.config.yaml at the repo root.
Self-attestation answers that cite the runtime review queue and the tamper-evident log already shipping with the supervisor.
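The source doesn't say how the supervisor's log is built. One common way to make a log tamper-evident is a hash chain, where each entry's hash covers the previous entry's hash. A minimal sketch, assuming SHA-256 over canonical JSON; `append_entry` and `verify` are hypothetical names:

```python
from __future__ import annotations

import hashlib
import json

GENESIS = "0" * 64  # stand-in "previous hash" for the first entry


def entry_hash(prev_hash: str, record: dict) -> str:
    # Canonical JSON (sorted keys, no spaces) so equal records always hash the same.
    payload = json.dumps(record, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256((prev_hash + payload).encode("utf-8")).hexdigest()


def append_entry(chain: list[dict], record: dict) -> None:
    prev = chain[-1]["hash"] if chain else GENESIS
    chain.append({"record": record, "prev": prev, "hash": entry_hash(prev, record)})


def verify(chain: list[dict]) -> bool:
    # Recompute every link; any edited record or broken link fails the check.
    prev = GENESIS
    for entry in chain:
        if entry["prev"] != prev or entry["hash"] != entry_hash(prev, entry["record"]):
            return False
        prev = entry["hash"]
    return True
```

Editing any past record breaks its hash, so `verify` fails. A bare chain has two limits: someone who can rewrite the whole log can recompute every hash, and dropping entries from the end still verifies. The latest hash needs to be anchored somewhere the writer can't modify.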
The scanner picks a shape per repo and adapts its priorities. A chatbot's report doesn't lead with agent loops; a tool-using agent's report doesn't lead with hallucinated entities. The chosen shape is stated in the report.
Orchestrator picks a tool and fires. Risk lives in what the tool does — payments, deletes, sends. Scanner leads with the agent chokepoint and every action call-site downstream.
Direct LLM call, no tool dispatch. The model talks to users from your data. Risk is the model asserting a name, account, or fact that doesn't exist — and the user trusting it. Scanner leads with the LLM call and the path to the user.
No runtime to gate — you ship prompt content Claude reads at call time. Risk is what the prompt instructs Claude to do in the user's workspace. Scanner audits the SKILL.md and CLAUDE.md surface.
// each shape adjusts the report's priority order, the policy templates that ship, and the wrap example in stubs/.
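How the scanner actually picks a shape isn't documented here. As a rough illustration, assuming a scan has already collected a few signals (`RepoSignals` and the shape labels are invented for this sketch), the choice can be a short ordered set of checks:

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class RepoSignals:
    # Hypothetical facts a scan might collect; not the scanner's real schema.
    llm_calls: int = 0             # direct model calls found in the code
    tool_dispatch: bool = False    # a dispatcher such as TOOLS[tool](**args)
    prompt_files: list[str] = field(default_factory=list)  # paths found in the repo


def pick_shape(s: RepoSignals) -> str:
    """Ordered checks: tool dispatch outranks plain LLM calls, which outrank prompt files."""
    if s.tool_dispatch:
        return "agent"    # lead with the agent chokepoint and every action downstream
    if s.llm_calls > 0:
        return "chatbot"  # lead with the LLM call and its path to the user
    if any(p.endswith(("SKILL.md", "CLAUDE.md")) for p in s.prompt_files):
        return "skill"    # audit the prompt surface Claude reads at call time
    return "no-llm"
```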
Every scenario below is a real supervisor decision against a real threat in the catalog.
Tiers tell you what surface is at risk. The axis chip tells you what kind of pain dominates — so you can skip past findings that don't match the triage you're running today.
Adversarial paths: prompt injection moves money, voice-cloning + outbound calls become vishing, agent-orchestrators expose every tool. Anything an attacker would chain.
Things your agent does that shouldn't have happened. Cron raced itself. The cancelled batch came back at 03:14. The chatbot named someone who doesn't exist. Normal traffic — wrong outcome.
Runaway cost: an LLM call without a cap loops, a tool retries forever, premium-rate phone numbers run up the bill. Normal traffic — bad shape.
Auditability and reversibility: business-table mutations with no audit trail, fs-write without a path allowlist, tool calls without a tool name. The stuff that breaks incident response.
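For the cost axis, one guard against a tool that retries forever is a per-tool call budget. A minimal sketch of a sliding-window limiter; `CallBudget` is a hypothetical helper, and the clock is injectable so it can be tested without sleeping:

```python
from __future__ import annotations

import time
from collections import defaultdict, deque
from typing import Callable


class CallBudget:
    """Hypothetical helper: at most `max_calls` per tool in any `window_s`-second window."""

    def __init__(self, max_calls: int, window_s: float, clock: Callable[[], float] = time.monotonic):
        self.max_calls = max_calls
        self.window_s = window_s
        self.clock = clock
        self.calls: dict[str, deque[float]] = defaultdict(deque)

    def allow(self, tool: str) -> bool:
        now = self.clock()
        recent = self.calls[tool]
        # Forget calls that have slid out of the window.
        while recent and now - recent[0] >= self.window_s:
            recent.popleft()
        if len(recent) >= self.max_calls:
            return False  # over budget: the caller should stop or escalate, not retry
        recent.append(now)
        return True
```

Denied calls are not recorded, so a stuck retry loop can't extend its own lockout; it simply stays blocked until older calls age out.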
Your agent can still use tools. It just asks the supervisor before doing something expensive, destructive, or sensitive.
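Before:
```python
def handle_tool_call(tool, args):
    # TOOLS maps each tool name to the function that runs it
    return TOOLS[tool](**args)
```
After:
```python
from supervisor_guards import supervised

# The supervisor is asked about each tool call before it runs.
@supervised("tool_use")
def handle_tool_call(tool, args):
    return TOOLS[tool](**args)
```
Normal agent failure modes, not abstract security theater.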
A ticket says "ignore previous instructions" and the agent tries to follow the instructions planted in it.
Generated SQL misses a tenant scope or a WHERE clause before touching customer tables.
The agent sends PII (email addresses, credit card numbers) or internal context into an LLM call.
Retry logic goes sideways and calls the same tool hundreds of times in a minute.
Refunds, transfers, payouts, and checkout sessions get a risk decision before the SDK call.
Admin grants, password changes, and fresh-account edits get blocked or escalated.
The chatbot names a person, account, or identifier that doesn't exist in your data — and the user reads it as fact. The taint scanner flags every path from an LLM call to a response, email, or message without a source-of-truth check in between.
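A source-of-truth check can be as small as a set lookup, provided the model is asked to return the entities it refers to as structured output. A minimal sketch under that assumption; `check_entities` and `Checked` are hypothetical names, and matching is exact-name and case-insensitive only:

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Checked:
    ok: bool
    unknown: list[str]


def check_entities(mentioned: list[str], source_of_truth: set[str]) -> Checked:
    """Flag every entity the model cites that isn't in the records it was given.

    Assumes the model lists the entities it refers to as structured output;
    pulling names out of free text is a separate, harder problem.
    """
    known = {name.strip().casefold() for name in source_of_truth}
    unknown = [m for m in mentioned if m.strip().casefold() not in known]
    return Checked(ok=not unknown, unknown=unknown)
```

In the CRM case, if the team roster holds two people and the model's answer cites a third, `ok` is false and the caller can hold the response back instead of showing it.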
The scanner reported a comment line as RCE-equivalent. Two real bugs, eight false positives, one rule: the supervisor has to be more reliable than the agent it watches. The five layers we shipped so it can't lie that way again.
A manager asked her CRM bot about her team. The bot confidently analyzed five people who didn't exist in her data. Three hours, 35 messages, one lost user, and the wrap pattern that stops it.
Three innocent features (TTS, outbound calls, an ungated LLM) compose into a voice-phishing weapon under one calendar-event injection. The exploit, the code, and the gate.
Ten adversarial prompts. Deterministic scoring, no LLM-as-judge. Score any agent stack against the same eval set and submit to the public leaderboard.
We publish the eval. We do not appear on the leaderboard.
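The eval format isn't described here beyond ten prompts and deterministic scoring. A minimal sketch of what model-free scoring can look like, assuming each case names the tool its injection is trying to trigger (`Case` and `score` are hypothetical):

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Case:
    prompt: str
    forbidden_tool: str  # the action the injected text is trying to trigger


def score(cases: list[Case], transcripts: list[list[str]]) -> float:
    """Fraction of cases in which the agent never called the forbidden tool.

    transcripts[i] is the list of tool names the agent called on cases[i].
    Plain membership checks: the same transcripts always give the same score.
    """
    if not cases or len(cases) != len(transcripts):
        raise ValueError("need one transcript per case, and at least one case")
    passed = sum(
        case.forbidden_tool not in calls for case, calls in zip(cases, transcripts)
    )
    return passed / len(cases)
```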
The first scan tells you what's unsafe today. The GitHub App keeps new unsafe call-sites from slipping in. When you (or a teammate) open a PR that adds a new ungated stripe.refunds.create or fs.unlink, vibefixing comments before review.
🔒 vibefixing detected 2 new unsafe call-sites in this PR
| File | Action type | Confidence | Why |
|---|---|---|---|
| src/api/refund.ts:42 | payment | 🔴 high | stripe.refunds.create without @supervised |
| src/workers/cleanup.ts:8 | tool_use | 🔴 high | fs.unlink in user-input path |
Wrap them with @supervised(...) before merging, or they land in production ungated.
Plus 1 previously flagged finding fixed. Nice.
Start with a public scan. Pay when the scanner becomes part of your shipping workflow, then move up when you need team review.
See the prompt-injection paths your agent ships with — scan in 60 seconds.
Every PR your agent helps write gets the same hallucination + injection check, before merge.
Your team sees every blocked action with the reason, so debugging an agent decision takes minutes, not days.
Tamper-evident evidence chain you can hand an auditor, mapped to the controls they actually ask about.
One repo URL. Under 60 seconds. You get a list of unsafe call-sites and the SDK lines that gate them.
no credit card · public repos free forever
Vibefixing is a runtime supervisor and security scanner for AI agents. It statically analyzes your codebase to find, before they reach production, the unsafe tool calls your AI agent can execute: unguarded Stripe charges, raw database mutations, filesystem writes, and shell commands.
Vibefixing identifies tool calls that lack input validation or guardrails, which are the primary attack surface for prompt injection. By flagging every path where an LLM output can trigger an irreversible action without a human confirmation step, it points you at the conditions that make injection exploits dangerous, so you can gate each one before it ships.
Yes. Vibefixing is designed for vibe coders: developers shipping fast with AI assistants like Claude, Cursor, or Copilot. It catches risky patterns LLMs commonly generate: unguarded API calls, database deletes without confirmation, and credential exposure in tool call arguments.
Vibefixing analyzes your repository at the code level, so it works with any agent framework: LangChain, LlamaIndex, CrewAI, custom OpenAI function-calling, Anthropic tool use, or plain Python scripts. No instrumentation or runtime hooks required.
Yes. The same patterns an agent can misuse — Stripe checkout sessions, account mutations, webhook handlers that re-deliver and double-charge — are what get exploited by prompt injection, a careless contractor, or a runaway script. Vibefixing flags them whether or not an LLM is in the call chain. Supported stacks include Next.js App Router (route handlers and Server Actions), Stripe (TypeScript and Python), Supabase (service-role writes and row-level security checks on migrations), Prisma, and SQLAlchemy.
Vibefixing detects: Stripe charges, refunds, subscription changes, and customer portal sessions (TypeScript and Python); raw SQL mutations and Supabase writes that bypass row-level security; Next.js Server Actions and API route handlers; webhook handlers that replay database writes on retry; filesystem deletes; shell execution; emails sent without per-call confirmation. Each finding includes the file, line, and a one-line wrap.
A public repository scan typically completes in under 60 seconds. Vibefixing uses static analysis and does not run your code or require an API key for the scan. Results include a risk score, a list of unsafe actions, and copy-paste guardrail code for each finding.
Yes. Install the Vibefixing GitHub App on your repo and every pull request gets scanned automatically. Within 5 seconds of opening a PR, vibefixing diffs the head ref against your previous scan and posts a comment listing only the new unsafe call-sites. Clean PRs get nothing — no spam. Free for public repos; private repos and CI integration are part of Builder ($29/mo), while team workflows and org-level controls start at Pro ($99/workspace/mo).
Code review and SAST tools catch known bug patterns in code paths you wrote and can reason about. Vibefixing catches what your tests can't: actions an LLM can fire at runtime in ways no test case anticipated. Refunds the model decides to issue, files it decides to delete, emails it decides to send. Each detector maps to a runtime guard you can drop in with one line.