for vibe coders shipping AI agents

Your agent sent 621 emails before you got out of bed.

At 03:14 it re-planned a batch you cancelled at 23:47. At 13:00 three cron schedulers fired the same handler in the same minute. We catch the four wraps that would have stopped all of it — in 15 minutes, in shadow mode, without changing what your agent does.

shadow mode first · 4 lines of @supervised · enforce when you trust it

4 action types supervised (preview catalog)
0 threat detectors (preview catalog)
3 risk axes: security · efficiency · quality (from the registry)

What every scan gates

Each action type is a chokepoint where an LLM output triggers something irreversible. Status comes straight from /v1/action-types.

refund · live

Refund supervision

Stop risky refunds before money leaves the system.

signals · 4 · policy refund.base@v1
payment · planned

Payment approvals

Enforce thresholds, approval chains, and anomaly detection on outgoing payments.

signals · 4
account_change · planned

Account changes

Prevent unsafe updates to customer identity and profile data.

signals · 3
data_access · planned

Restricted data access

Block unauthorized use of sensitive data by agents and the tools they call.

signals · 4
why vibefixing

Catches what tests can't

Static analysis

Walks your repo at the AST level — no API key, no runtime hooks, no instrumentation. Scan finishes in under 60s.
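To make the AST idea concrete, here is a minimal sketch using Python's standard ast module. It is not vibefixing's scanner: the RISKY_CALLS list, the helper names, and the sample source are all assumptions for illustration. It flags risky calls that sit inside functions with no @supervised decorator, and it never runs the code it reads.

```python
import ast

# Hypothetical list of dotted call names to treat as risky; not the product's catalog.
RISKY_CALLS = {"stripe.Refund.create", "os.remove", "subprocess.run"}


def dotted_name(node: ast.AST) -> str:
    """Rebuild a dotted name like 'stripe.Refund.create' from an AST node."""
    parts = []
    while isinstance(node, ast.Attribute):
        parts.append(node.attr)
        node = node.value
    if isinstance(node, ast.Name):
        parts.append(node.id)
        return ".".join(reversed(parts))
    return ""  # calls on non-name bases (e.g. f().g) are ignored in this sketch


def is_supervised(func: ast.AST) -> bool:
    """True if the function carries a @supervised or @supervised(...) decorator."""
    for dec in func.decorator_list:
        target = dec.func if isinstance(dec, ast.Call) else dec
        if dotted_name(target).split(".")[-1] == "supervised":
            return True
    return False


def find_ungated_calls(source: str) -> list[tuple[int, str]]:
    """Return sorted (line, call) pairs for risky calls inside undecorated functions."""
    findings: set[tuple[int, str]] = set()
    for func in ast.walk(ast.parse(source)):
        if not isinstance(func, (ast.FunctionDef, ast.AsyncFunctionDef)):
            continue  # module-level calls are out of scope for this sketch
        if is_supervised(func):
            continue
        for node in ast.walk(func):
            if isinstance(node, ast.Call) and dotted_name(node.func) in RISKY_CALLS:
                findings.add((node.lineno, dotted_name(node.func)))
    return sorted(findings)


SAMPLE = '''
import stripe
from supervisor_guards import supervised

def refund(charge_id):
    return stripe.Refund.create(charge=charge_id)

@supervised("refund")
def gated_refund(charge_id):
    return stripe.Refund.create(charge=charge_id)
'''

print(find_ungated_calls(SAMPLE))  # [(6, 'stripe.Refund.create')]
```

Only the undecorated refund function is reported. The sample is parsed as text, so neither stripe nor supervisor_guards needs to be installed.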

LLM-firable code

Flags every path an LLM tool call can fire at runtime — refunds, deletes, sends — even when your tests pass.

One-line guardrails

Every finding maps to a @supervised(...) decorator. Drop it in shadow mode first; flip to enforce when you trust it.

Continuous on PRs

The GitHub App diffs each PR against your last scan and comments only on new unsafe call-sites. Zero PR-spam.

governance pack

When your customer asks “do you have AI policies?”

Every scan also writes a governance/ folder. Paste the one-pagers into your next vendor questionnaire and stop writing them at 11pm.

Policy in one page

Lists your real LLM providers and what your agent actually does — pulled from the scan, not a template. If your repo has no LLM, the page says so.

Owners named

One line per action type, pointing at the human who approves it. Override defaults by dropping owners.config.yaml at the repo root.

Audit trail you can paste

Self-attestation answers that cite the runtime review queue and the tamper-evident log already shipping with the supervisor.
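"Tamper-evident" usually means each log entry commits to the one before it, so an after-the-fact edit breaks the chain. The sketch below shows that general hash-chain technique. It is an assumption about the approach, not the supervisor's actual log format, and the names GENESIS, append, and verify are hypothetical.

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder "previous hash" for the first entry


def entry_hash(prev_hash: str, record: dict) -> str:
    """Hash the previous entry's hash together with a canonical form of the record."""
    payload = json.dumps(record, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256((prev_hash + payload).encode("utf-8")).hexdigest()


def append(log: list[dict], record: dict) -> None:
    """Append a record, chaining it to the hash of the last entry."""
    prev_hash = log[-1]["hash"] if log else GENESIS
    log.append({"record": record, "prev": prev_hash, "hash": entry_hash(prev_hash, record)})


def verify(log: list[dict]) -> bool:
    """Recompute every hash from the start; any edited record breaks the chain."""
    prev_hash = GENESIS
    for entry in log:
        if entry["prev"] != prev_hash or entry["hash"] != entry_hash(prev_hash, entry["record"]):
            return False
        prev_hash = entry["hash"]
    return True


log: list[dict] = []
append(log, {"action": "refund", "decision": "review"})
append(log, {"action": "refund", "decision": "deny"})
print(verify(log))  # True

log[0]["record"]["decision"] = "allow"  # simulate an after-the-fact edit
print(verify(log))  # False
```

A chain like this only detects edits. Someone who can rewrite the whole log can also recompute every hash, so the latest hash has to be anchored somewhere the writer cannot change.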

agent shapes

What kind of agent are you building?

The scanner picks a shape per repo and adapts its priorities. A chatbot doesn't worry about agent loops; a tool-using agent doesn't worry about hallucinated entities. The framing is in the report.

Tool-using agent

langchain-agent · mcp-server

Orchestrator picks a tool and fires. Risk lives in what the tool does — payments, deletes, sends. Scanner leads with the agent chokepoint and every action call-site downstream.

e.g. LangChain · CrewAI · MCP servers · custom OpenAI function-calling

Chatbot or RAG

chatbot-rag

Direct LLM call, no tool dispatch. The model talks to users from your data. Risk is the model asserting a name, account, or fact that doesn't exist — and the user trusting it. Scanner leads with the LLM call and the path to the user.

e.g. Anthropic + Express · OpenAI + FastAPI · vector-store RAG · Q&A bots

Claude skill or plugin

claude-skill

No runtime to gate — you ship prompt content Claude reads at call time. Risk is what the prompt instructs Claude to do in the user's workspace. Scanner audits the SKILL.md and CLAUDE.md surface.

e.g. .claude/skills/* · claude-code-plugin.json · CLAUDE.md packages

// each shape adjusts the report's priority order, the policy templates that ship, and the wrap example in stubs/.

live attack scenarios

Inputs hit the supervisor first

Every scenario below is a real supervisor decision against a real threat in the catalog.

// static preview

scan your repo →
triage by axis

Each finding tells you what kind of pain dominates

Tiers tell you what surface is at risk. The axis chip tells you what kind of pain dominates — so you can skip past findings that don't match the triage you're running today.

Security

Adversarial paths: prompt injection moves money, voice-cloning + outbound calls become vishing, agent-orchestrators expose every tool. Anything an attacker would chain.

e.g. voice-actions, payment-calls, agent-orchestrators, fs-shell · shell-exec
Reliability

Things your agent does that shouldn't have happened. Cron raced itself. The cancelled batch came back at 03:14. The chatbot named someone who doesn't exist. Normal traffic — wrong outcome.

e.g. agent-loop, llm-calls (low-confidence drift), cron-overlap, hallucinated-entities
Efficiency

Runaway cost: an LLM call without a cap loops, a tool retries forever, premium-rate phone numbers run up the bill. Normal traffic — bad shape.

e.g. llm-calls (high confidence), tool retries, oversized prompts
Quality

Auditability and reversibility: business-table mutations with no audit trail, fs-write without a path allowlist, tool calls without a tool name. The stuff that breaks incident response.

e.g. db-mutations on business tables, fs-write, http-routes, cron-schedules
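For the Efficiency axis above, one common way to stop a tool that retries forever is a sliding-window call budget. This is a generic sketch, not a vibefixing feature: the CallBudget class, its parameters, and the send_email tool name are assumptions. The clock is injectable so the example runs without waiting.

```python
import time
from collections import deque


class CallBudget:
    """Allow at most max_calls per tool within a sliding window of window_s seconds."""

    def __init__(self, max_calls: int, window_s: float, clock=time.monotonic):
        self.max_calls = max_calls
        self.window_s = window_s
        self.clock = clock
        self.calls: dict[str, deque] = {}

    def allow(self, tool: str) -> bool:
        now = self.clock()
        recent = self.calls.setdefault(tool, deque())
        # Drop timestamps that have aged out of the window.
        while recent and now - recent[0] >= self.window_s:
            recent.popleft()
        if len(recent) >= self.max_calls:
            return False  # over budget: the caller should stop or escalate
        recent.append(now)
        return True


# Fake clock so the demo is deterministic.
fake_now = [0.0]
budget = CallBudget(max_calls=3, window_s=60.0, clock=lambda: fake_now[0])

print([budget.allow("send_email") for _ in range(4)])  # [True, True, True, False]
fake_now[0] = 61.0  # the window has passed
print(budget.allow("send_email"))  # True
```

The budget is tracked per tool name, so a loop on one tool does not starve the others.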
the fix

One gate before execution

Your agent can still use tools. It just asks the supervisor before doing something expensive, destructive, or sensitive.

# before
def handle_tool_call(tool, args):
    return TOOLS[tool](**args)

# after
from supervisor_guards import supervised

@supervised("tool_use")
def handle_tool_call(tool, args):
    return TOOLS[tool](**args)
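To show what "shadow first, enforce later" could mean in practice, here is a hypothetical stand-in decorator. It is not the supervisor_guards implementation: supervised_sketch, its policy callback, the mode argument, and the toy TOOLS table are all assumptions. In shadow mode a denied call is logged and still runs; in enforce mode it raises before the tool fires.

```python
import functools
import logging

logger = logging.getLogger("supervisor_sketch")


class ActionBlocked(Exception):
    """Raised in enforce mode when the policy denies a call."""


def supervised_sketch(action_type, policy, mode="shadow"):
    """Hypothetical stand-in for @supervised; not the SDK's implementation."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            allowed = policy(action_type, args, kwargs)
            if not allowed:
                logger.warning("would block %s call to %s", action_type, func.__name__)
                if mode == "enforce":
                    raise ActionBlocked(f"{action_type} denied for {func.__name__}")
            # Shadow mode (or an allowed call) falls through to the real function.
            return func(*args, **kwargs)
        return wrapper
    return decorator


def deny_deletes(action_type, args, kwargs):
    # Toy policy: deny any tool whose name starts with "delete".
    return not str(args[0]).startswith("delete")


TOOLS = {"delete_file": lambda path: f"deleted {path}", "echo": lambda text: text}


@supervised_sketch("tool_use", deny_deletes, mode="shadow")
def shadow_call(tool, args):
    return TOOLS[tool](**args)


@supervised_sketch("tool_use", deny_deletes, mode="enforce")
def enforced_call(tool, args):
    return TOOLS[tool](**args)


print(shadow_call("delete_file", {"path": "a.txt"}))  # deleted a.txt (logged, not blocked)
print(enforced_call("echo", {"text": "hi"}))          # hi
```

Calling enforced_call("delete_file", {"path": "a.txt"}) raises ActionBlocked instead of running the tool. Switching modes is a one-argument change, so the shadow-mode log can be reviewed before anything is blocked.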
field notes

Failure modes vibefixing has caught

Normal agent failure modes, not abstract security theater.

Prompt-injected tool calls

A support ticket says "ignore previous instructions" and the agent tries to carry out whatever the ticket asks for next.

Dangerous DB mutations

Generated SQL misses a tenant scope or a WHERE clause before touching customer tables.

Data leakage

The agent sends emails, PII, credit cards, or internal context into an LLM call.

Cost loops

Retry logic goes sideways and calls the same tool hundreds of times in a minute.

Unreviewed money movement

Refunds, transfers, payouts, and checkout sessions get a risk decision before the SDK call.

Role and account changes

Admin grants, password changes, and fresh-account edits get blocked or escalated.

Hallucinated entities reaching users

The chatbot names a person, account, or identifier that doesn't exist in your data — and the user reads it as fact. The taint scanner flags every path from an LLM call to a response, email, or message without a source-of-truth check in between.
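A "source-of-truth check" for the hallucinated-entities case can be as small as refusing to send any identifier that is not in your own data. The sketch below assumes a made-up ACC-1234 account ID format and an in-memory set standing in for a database lookup. None of these names come from vibefixing.

```python
import re

# Hypothetical source of truth; in practice this would be a database lookup.
KNOWN_ACCOUNT_IDS = {"ACC-1001", "ACC-1002"}
ACCOUNT_ID_PATTERN = re.compile(r"\bACC-\d{4}\b")  # assumed ID format

FALLBACK = "I couldn't verify that account. Let me connect you with support."


def unknown_account_ids(llm_text: str) -> set[str]:
    """Return account IDs mentioned in the model output that do not exist in our data."""
    return set(ACCOUNT_ID_PATTERN.findall(llm_text)) - KNOWN_ACCOUNT_IDS


def safe_reply(llm_text: str) -> str:
    """Pass the model output through only if every account ID it names is real."""
    if unknown_account_ids(llm_text):
        return FALLBACK
    return llm_text


print(unknown_account_ids("Your refund went to ACC-1001 and ACC-9999."))  # {'ACC-9999'}
print(safe_reply("Balance for ACC-1002 is $40."))  # Balance for ACC-1002 is $40.
```

The check sits between the LLM call and the user-facing response, which is the gap the taint scanner described above looks for.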

public benchmark

vf hallucination-rate

Ten adversarial prompts. Deterministic scoring, no LLM-as-judge. Score any agent stack against the same eval set and submit to the public leaderboard.

We publish the eval. We do not appear on the leaderboard.

# install + score any agent
$ npm i -g @runtime-supervisor/hallucination-eval
$ vf-hallucination-rate score \
--cmd 'python my_agent.py'
# sample output
eval-set: v0.1.0
total: 10 · passed: 7 · failed: 3
hallucination rate: 30.0%
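The rate in the sample output is plain arithmetic: failed cases divided by total cases. The sketch below reproduces only that calculation. How the eval decides whether a single case passes is not shown on this page, so the list of pass/fail results here is an assumption.

```python
def hallucination_rate(results: list[bool]) -> float:
    """Percentage of failed cases, where results[i] is True if case i passed."""
    if not results:
        raise ValueError("no results to score")
    failed = sum(1 for passed in results if not passed)
    return 100.0 * failed / len(results)


# 7 passed and 3 failed, as in the sample output above.
results = [True] * 7 + [False] * 3
print(f"hallucination rate: {hallucination_rate(results):.1f}%")  # hallucination rate: 30.0%
```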
continuous protection

Every PR, auto-scanned.
New unsafe code can't slip in.

The first scan tells you what's unsafe today. The GitHub App keeps it that way. When you (or a teammate) open a PR that adds a new ungated stripe.refunds.create or fs.unlink, vibefixing comments before review.

  • 5 seconds from PR open to comment posted.
  • Diffs against your previous scans — only new findings flagged.
  • Catches what tests can't — code your LLM can fire, not what your tests cover.
  • Same UX whether you ship 1 PR/week or 100.
install on a repo → · free for public repos
vibefixing commented on PR #142 · 5 seconds ago

🔒 vibefixing detected 2 new unsafe call-sites in this PR

| File | Type | Conf | Why |
| --- | --- | --- | --- |
| src/api/refund.ts:42 | payment | 🔴 high | stripe.refunds.create without @supervised |
| src/workers/cleanup.ts:8 | tool_use | 🔴 high | fs.unlink in user-input path |

Wrap them with @supervised(...) before merging — or this lands in production ungated.

Plus 1 previously-flagged finding fixed — nice. Full diff →
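"Only new findings" comes down to a set difference between the previous scan and the PR's scan. The sketch below is an assumption about how such a diff could work, not the GitHub App's code. The Finding fields and the src/api/charge.ts entry are hypothetical. Findings are keyed on path, type, and call rather than line number, so code that merely moves down a file is not reported as new.

```python
from typing import NamedTuple


class Finding(NamedTuple):
    path: str
    action_type: str
    call: str


def diff_findings(
    previous: set[Finding], current: set[Finding]
) -> tuple[set[Finding], set[Finding]]:
    """Return (new, fixed): findings only in the PR scan, and findings the PR removed."""
    return current - previous, previous - current


previous_scan = {
    Finding("src/api/charge.ts", "payment", "stripe.charges.create"),  # hypothetical
}
pr_scan = {
    Finding("src/api/refund.ts", "payment", "stripe.refunds.create"),
    Finding("src/workers/cleanup.ts", "tool_use", "fs.unlink"),
}

new, fixed = diff_findings(previous_scan, pr_scan)
print(len(new), len(fixed))  # 2 1
```

With these inputs the result matches the sample comment above: two new unsafe call-sites and one previously flagged finding fixed. A PR that changes nothing produces two empty sets, so there is nothing to comment on.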

pricing

For builders and teams

Start with a public scan. Pay when the scanner becomes part of your shipping workflow, then move up when you need team review.

Free

$0

See the prompt-injection paths your agent ships with — scan in 60 seconds.

  • Public GitHub repo scan
  • Top findings preview
  • Risk tier summary
  • Local CLI install
scan a public repo

Builder

$29/mo

Every PR your agent helps write gets the same hallucination + injection check, before merge.

  • Private repo scans
  • Full runtime-supervisor export
  • Stubs and YAML policies
  • Scan history and diffs
  • CI/GitHub PR comments
upgrade to builder

Pro

best first paid plan
$99/workspace/mo

Your team sees every blocked action with the reason, so debugging an agent decision takes minutes, not days.

  • Everything in Builder, for your team
  • Unlimited repos in this workspace
  • Shared fix queue + team review
  • Audit retention + webhooks
  • SSO (rolling out — email us)
contact us

Enterprise

Talk to us

Tamper-evident evidence chain you can hand an auditor, mapped to the controls they actually ask about.

  • Multi-workspace
  • Custom retention + audit export
  • Priority support + SLAs
  • Dedicated infrastructure
  • Mono-repos >200MB / high-volume scans
email sales

Ready to ship safer agents?

One repo URL. Under 60 seconds. You get a list of unsafe call-sites and the SDK lines that gate them.

no credit card · public repos free forever

Frequently Asked Questions

What is Vibefixing and what does it scan?

Vibefixing is a runtime supervisor and security scanner for AI agents. It statically analyzes your codebase to find unsafe tool calls your AI agent can execute before they reach production: unguarded Stripe charges, raw database mutations, filesystem writes, and shell commands.

How does Vibefixing prevent prompt injection in AI agents?

Vibefixing identifies tool calls that lack input validation or guardrails, which are the primary attack surface for prompt injection. It flags every path where an LLM output can trigger an irreversible action without a human confirmation step, so you can gate the paths that make injection exploits dangerous.

Is Vibefixing safe to use for vibe coders shipping with AI-generated code?

Yes. Vibefixing is designed for vibe coders: developers shipping fast with AI assistants like Claude, Cursor, or Copilot. It catches risky patterns LLMs commonly generate: unguarded API calls, database deletes without confirmation, and credential exposure in tool call arguments.

Does Vibefixing work with any AI agent framework?

Vibefixing analyzes your repository at the code level, so it works with any agent framework: LangChain, LlamaIndex, CrewAI, custom OpenAI function-calling, Anthropic tool use, or plain Python scripts. No instrumentation or runtime hooks required.

I'm not building an AI agent — does Vibefixing still help?

Yes. The same patterns an agent can misuse — Stripe checkout sessions, account mutations, webhook handlers that re-deliver and double-charge — are what get exploited by prompt injection, a careless contractor, or a runaway script. Vibefixing flags them whether or not an LLM is in the call chain. Supported stacks include Next.js App Router (route handlers and Server Actions), Stripe (TypeScript and Python), Supabase (service-role writes and row-level security checks on migrations), Prisma, and SQLAlchemy.

What unsafe actions does Vibefixing detect?

Vibefixing detects: Stripe charges, refunds, subscription changes, and customer portal sessions (TypeScript and Python); raw SQL mutations and Supabase writes that bypass row-level security; Next.js Server Actions and API route handlers; webhook handlers that replay database writes on retry; filesystem deletes; shell execution; emails sent without per-call confirmation. Each finding includes the file, line, and a one-line wrap.

How long does a Vibefixing scan take?

A public repository scan typically completes in under 60 seconds. Vibefixing uses static analysis and does not run your code or require an API key for the scan. Results include a risk score, a list of unsafe actions, and copy-paste guardrail code for each finding.

Does Vibefixing scan every pull request automatically?

Yes. Install the Vibefixing GitHub App on your repo and every pull request gets scanned automatically. Within 5 seconds of opening a PR, Vibefixing diffs the head ref against your previous scan and posts a comment listing only the new unsafe call-sites. Clean PRs get no comment, so there is no spam. Free for public repos; private repos and CI integration are part of Builder ($29/mo), while team workflows and org-level controls start at Pro ($99/workspace/mo).

How is Vibefixing different from regular code review or static analysis?

Code review and SAST tools catch bugs in code your tests already cover. Vibefixing catches what your tests can't: actions an LLM can fire at runtime in ways no test case anticipated, such as refunds the model decides to issue, files it decides to delete, and emails it decides to send. Each detector maps to a runtime guard you can drop in with one line.