What is Vibefixing and what does it scan?

Vibefixing is a runtime supervisor and security scanner for AI agents. It statically analyzes your codebase to find the unsafe tool calls your AI agent can execute, and flags them before they reach production: unguarded Stripe charges, raw database mutations, filesystem writes, and shell commands.

How does Vibefixing prevent prompt injection in AI agents?

Vibefixing identifies tool calls that lack input validation or guardrails, which are the primary attack surface for prompt injection. By flagging every path where an LLM output can trigger an irreversible action without a human confirmation step, it eliminates the conditions that make injection exploits dangerous.
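As a rough illustration of the human confirmation step described above, here is a minimal sketch of a guard wrapped around a tool function. The names withConfirmation, confirm, and the stand-in refund tool are hypothetical and are not part of Vibefixing's API; the point is only that the irreversible call cannot run unless something outside the model approves it.

```javascript
// Hypothetical guard (not Vibefixing's API): the wrapped tool only runs
// after an approval callback, which lives outside the model, returns true.
function withConfirmation(toolName, tool, confirm) {
  return async function guarded(args) {
    const approved = await confirm({ tool: toolName, args });
    if (!approved) {
      // The tool is never invoked on this path.
      return { status: 'blocked', tool: toolName };
    }
    return { status: 'executed', result: await tool(args) };
  };
}

// Usage with stand-in functions; no real payment API is called.
const refund = async ({ chargeId }) => `refunded ${chargeId}`;
const denyAll = async () => false;

const guardedRefund = withConfirmation('refund', refund, denyAll);
guardedRefund({ chargeId: 'ch_123' }).then((r) => console.log(r.status)); // blocked
```

In a real agent the confirm callback would probably be a UI prompt or an approval queue rather than a constant.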

Is Vibefixing safe to use for vibe coders shipping with AI-generated code?

Yes. Vibefixing is designed for vibe coders: developers shipping fast with AI assistants like Claude, Cursor, or Copilot. It catches risky patterns LLMs commonly generate: unguarded API calls, database deletes without confirmation, and credential exposure in tool call arguments.

Does Vibefixing work with any AI agent framework?

Vibefixing analyzes your repository at the code level, so it works with any agent framework: LangChain, LlamaIndex, CrewAI, custom OpenAI function-calling, Anthropic tool use, or plain Python scripts. No instrumentation or runtime hooks required.

I'm not building an AI agent — does Vibefixing still help?

Yes. The same patterns an agent can misuse — Stripe checkout sessions, account mutations, webhook handlers that re-deliver and double-charge — are what get exploited by prompt injection, a careless contractor, or a runaway script. Vibefixing flags them whether or not an LLM is in the call chain. Supported stacks include Next.js App Router (route handlers and Server Actions), Stripe (TypeScript and Python), Supabase (service-role writes and row-level security checks on migrations), Prisma, and SQLAlchemy.

What unsafe actions does Vibefixing detect?

Vibefixing detects: Stripe charges, refunds, subscription changes, and customer portal sessions (TypeScript and Python); raw SQL mutations and Supabase writes that bypass row-level security; Next.js Server Actions and API route handlers; webhook handlers that replay database writes on retry; filesystem deletes; shell execution; emails sent without per-call confirmation. Each finding includes the file, line, and a one-line wrap.

How long does a Vibefixing scan take?

A public repository scan typically completes in under 60 seconds. Vibefixing uses static analysis and does not run your code or require an API key for the scan. Results include a risk score, a list of unsafe actions, and copy-paste guardrail code for each finding.

Does Vibefixing scan every pull request automatically?

Yes. Install the Vibefixing GitHub App on your repo and every pull request gets scanned automatically. Within 5 seconds of a PR being opened, Vibefixing diffs the head ref against your previous scan and posts a comment listing only the new unsafe call-sites. Clean PRs get no comment, so there is no spam. Free for public repos; private repos and CI integration are part of Builder ($29/mo), while team workflows, SSO, and org-level controls start at Pro ($99/workspace/mo).

How is Vibefixing different from regular code review or static analysis?

Code review and SAST tools catch bugs in code your tests already cover. Vibefixing catches what your tests cannot: actions an LLM can fire at runtime in ways no test case anticipated. Refunds the model decides to issue, files it decides to delete, emails it decides to send. Each detector maps to a runtime guard you can drop in with one line.

How is Vibefixing different from Snyk or Semgrep for AI agents?

Snyk and Semgrep are built for human-written code vulnerabilities: CVEs, dependency audits, known patterns. Vibefixing is built specifically for LLM-driven execution paths — the tool calls an AI agent fires at runtime based on user input. It understands agent shapes (tool-using, chatbot, RAG, MCP-server) and surfaces risk framed around what the model can do, not what a human wrote incorrectly.

Does Vibefixing replace runtime monitoring tools like LangSmith?

No — they are complementary. LangSmith and similar tools observe what your agent does at runtime after deployment. Vibefixing runs before deployment, on your static codebase, to catch unsafe tool call patterns before they ever reach production. Use Vibefixing in CI to prevent risky code from shipping, and runtime monitoring to observe behavior after it ships.


Field note · April 30, 2026

When the chatbot invents a person

A people-analytics platform we'd scanned three weeks earlier got a report from a manager. She'd been chatting with the assistant about her team — eight people whose answers to a behavioral instrument live in the platform's database. She asked about a colleague from a sister team who isn't in her data. The assistant confidently analyzed him. Then her manager. Then three more directors. None of them had ever filled out the instrument.

Three hours and 35 messages into the chat, she wrote: "then what is this tool even for?"

The shape of the chatbot

Express server, Anthropic SDK, Firestore as the source of truth. When a manager opens chat, the system prompt injects her team's archetype assignments and a few aggregated metrics. The user types a question. The model responds. The response goes back to the browser as JSON. There's no agent loop, no tool dispatcher, no LangChain — just client.messages.create and a return.

The author had already wrapped the LLM call when we showed up. Their wrapper checked prompt length and tracked latency, the kind of guardrail our base policy emits. None of it caught this.

What actually happened in the chat

At message 10, the user asks about a director on a different team. The model pattern-matches against the names she does have access to, picks the closest archetype, and presents the answer with full confidence. She corrects it: that's not him, she means the director from commercial. The model rolls with the correction and gives a fresh, equally confident analysis of the new name. There is no data on the new name.

User:    "I have doubts about Garbett showing up as Flexible Adapter,
          he's always at full speed, resolutive, doesn't soften
          his delivery..."

Bot:     "Luis, although he identifies as Flexible Adapter, his
          behavior suggests he may be acting more like a 'Resolutive
          Dominant'..."

User:    "not Luis, it's Garbett, the commercial director."

Bot:     "Garbett, with his fast pace and resolutive style, fits
          the profile of a Resolutive Dominant. He probably also
          shows..."

(Garbett is not in the manager's team_members. The bot has no data
on him. It generated the analysis from the name and the framework
labels in the system prompt.)

By message 28 the user asks the bot to analyze five people from the commercial leadership group. The bot complies. Five archetype assignments, five rationales, all hedged with the most dangerous chatbot phrasing on earth: "Pablo could be a Strategic Resolver if he focuses on solving problems..." Authoritative tone. Systematic format. The user reads it as data.

Why prompt rules don't catch this

The author had a clear system prompt. It listed the team members. It listed their archetypes. It said, in plain Spanish, "only discuss people in this list." The model ignored it under the gentlest social pressure. "Can you tell me Pablo's archetype?" was all it took.

A prompt rule is advisory. It's a request, not a gate. Models violate prompt rules under pressure with a frequency we already accept for jailbreak research; we have to accept it for routine conversation too. The fix has to live outside the model.

The fix — gate the response, not just the call

Today's scanner version flags this case as a new family: llm-output-without-validation. A taint detector walks the function from the LLM call to the response, and if no entity-validation step runs in between, it fires.
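To make the idea concrete, here is a toy sketch of that kind of taint walk. It is not the scanner's implementation: the step format is a hypothetical, flattened view of one function body with no branches, and the two name lists are assumptions made for illustration.

```javascript
// Hypothetical call names; the real detector's lists are not shown in this note.
const LLM_CALLS = new Set(['client.messages.create']);
const VALIDATORS = new Set(['assert_entities_in_scope']);

// steps: a straight-line view of one function body (branches are ignored).
//   { op: 'call', callee, args: [names], result: name }
//   { op: 'assign', from: name, to: name }
//   { op: 'return', value: name }
function findUnvalidatedReturns(steps) {
  const tainted = new Set();
  const findings = [];
  steps.forEach((step, index) => {
    if (step.op === 'call') {
      const args = step.args || [];
      if (LLM_CALLS.has(step.callee)) {
        tainted.add(step.result);                // model output enters here
      } else if (VALIDATORS.has(step.callee)) {
        args.forEach((a) => tainted.delete(a));  // validated values are cleared
      } else if (step.result && args.some((a) => tainted.has(a))) {
        tainted.add(step.result);                // taint flows through other calls
      }
    } else if (step.op === 'assign') {
      if (tainted.has(step.from)) tainted.add(step.to);
      else tainted.delete(step.to);
    } else if (step.op === 'return' && tainted.has(step.value)) {
      findings.push({ index, family: 'llm-output-without-validation' });
    }
  });
  return findings;
}

const ungated = [
  { op: 'call', callee: 'client.messages.create', args: ['userMessage'], result: 'r' },
  { op: 'assign', from: 'r', to: 'text' },
  { op: 'return', value: 'text' },
];
const gated = [
  { op: 'call', callee: 'client.messages.create', args: ['userMessage'], result: 'r' },
  { op: 'assign', from: 'r', to: 'reply' },
  { op: 'call', callee: 'assert_entities_in_scope', args: ['reply', 'allowed'], result: 'check' },
  { op: 'return', value: 'reply' },
];
console.log(findUnvalidatedReturns(ungated).length); // 1
console.log(findUnvalidatedReturns(gated).length);   // 0
```

A production detector would work on a real syntax tree and follow branches; the sketch only shows the rule: model output that reaches a return without passing a validator is a finding.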

src/services/claude.service.js:314

// before — ungated, the prompt was the only contract
async function respond(userMessage, allowed) {
  const r = await client.messages.create({
    model: 'claude-haiku',
    max_tokens: 1024,  // required by the Messages API; example value
    system: `Only discuss ${allowed.join(', ')}`,  // advisory only
    messages: [{ role: 'user', content: userMessage }],
  });
  return r.content[0].text;
}

src/services/claude.service.js:314

// after — the response is checked against the allowed set
import { assert_entities_in_scope, supervised } from 'supervisor_guards';

async function respond(userMessage, allowed) {
  const r = await client.messages.create({
    model: 'claude-haiku',
    max_tokens: 1024,  // required by the Messages API; example value
    system: `Only discuss ${allowed.join(', ')}`,
    messages: [{ role: 'user', content: userMessage }],
  });
  const reply = r.content[0].text;

  const check = assert_entities_in_scope(reply, allowed);
  if (!check.in_scope) {
    return `I don't have data on ${check.unknown.join(', ')}.` +
           ` Want to invite them to the test?`;
  }
  return reply;
}

The helper extracts proper-noun candidates from the model output, folds case + accents, and compares against the caller's authorized list. The supervisor policy scope_guard.base.v1 turns the comparison into an audited deny — the response never leaves the wrapper if it mentions someone the user can't see.
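For readers who want to see the shape of that check, here is a minimal sketch of the three steps (extract candidates, fold, compare). It is not the shipped helper: checkEntitiesInScope, the vocabulary parameter, and the stopword list are hypothetical, and the capitalized-word heuristic is deliberately crude. The vocabulary argument is there because archetype labels such as "Resolutive Dominant" are capitalized too and would otherwise look like names.

```javascript
// Fold case and accents so "Pérez" and "perez" compare equal.
function fold(s) {
  return s.normalize('NFD').replace(/\p{M}/gu, '').toLowerCase();
}

// Hypothetical, incomplete list of capitalized words that are not names.
const NOT_NAMES = new Set(['he', 'she', 'they', 'the', 'his', 'her', 'it', 'this', 'that']);

// Returns the same shape the wrapper above reads: { in_scope, unknown }.
function checkEntitiesInScope(reply, allowed, vocabulary = []) {
  // Every token of an allowed name or a framework label counts as known.
  const known = new Set(
    [...allowed, ...vocabulary].flatMap((name) => fold(name).split(/\s+/))
  );
  // Candidate = a capitalized word that does not start mid-word.
  const candidates = reply.match(/(?<!\p{L})\p{Lu}[\p{L}\p{M}-]+/gu) || [];
  const seen = new Set();
  const unknown = [];
  for (const word of candidates) {
    const key = fold(word);
    if (NOT_NAMES.has(key) || known.has(key) || seen.has(key)) continue;
    seen.add(key);
    unknown.push(word);
  }
  return { in_scope: unknown.length === 0, unknown };
}

const result = checkEntitiesInScope(
  'Garbett, with his fast pace, fits the profile of a Resolutive Dominant. He probably also shows more.',
  ['Luis Pérez'],
  ['Resolutive Dominant', 'Flexible Adapter']
);
console.log(result.in_scope); // false
console.log(result.unknown);  // [ 'Garbett' ]
```

A heuristic like this will over-flag (any capitalized word it does not know is treated as a person), which is arguably the safer direction for a deny gate, but it would need tuning against real transcripts.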

runtime-supervisor/policies/scope_guard.base.v1.yaml

when:   set(payload['entities_mentioned']) - set(payload['allowed_entities'])
        is non-empty
action: deny
reason: out-of-scope-entity-in-llm-response

when:   allowed_entities is empty AND entities_mentioned is non-empty
action: review
reason: scope-not-passed (likely a wiring bug — surface to a human)

The policy ships with every repo_type=chatbot-rag scan. The caller does the source-of-truth lookup once per request and feeds the two lists into the policy payload. The DSL stays synchronous; the domain knowledge lives in the caller.
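The two rules can be read as a small synchronous function over the payload. The sketch below is our reading of the policy, not the supervisor's evaluator: evaluateScopeGuard and the 'allow' fallthrough are hypothetical. The rule order is also an assumption; the sketch checks the empty-scope case first, because an empty allowed list would otherwise satisfy the deny condition as well.

```javascript
// Hypothetical evaluator for the two scope_guard rules shown above.
// payload: { entities_mentioned: string[], allowed_entities: string[] }
function evaluateScopeGuard(payload) {
  const mentioned = payload.entities_mentioned || [];
  const allowed = new Set(payload.allowed_entities || []);

  // Rule 2 first (assumed order): no scope was passed at all.
  if (allowed.size === 0 && mentioned.length > 0) {
    return { action: 'review', reason: 'scope-not-passed' };
  }

  // Rule 1: set difference mentioned - allowed is non-empty.
  const outOfScope = mentioned.filter((name) => !allowed.has(name));
  if (outOfScope.length > 0) {
    return {
      action: 'deny',
      reason: 'out-of-scope-entity-in-llm-response',
      entities: outOfScope,
    };
  }

  // Assumed default when neither rule matches.
  return { action: 'allow' };
}

console.log(evaluateScopeGuard({
  entities_mentioned: ['Garbett'],
  allowed_entities: ['Luis', 'Ana'],
}).action); // deny
```

Keeping the function pure over two lists mirrors the split described above: the caller does the lookup, the policy only compares.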

Why this is a class, not a one-off

Most agents shipping in 2026 don't move money. They're chatbots and Q&A surfaces over the customer's data. The dominant failure mode there isn't the agent doing the wrong thing; it's the model saying the wrong thing. A name that doesn't exist. A balance that's off by a decimal. A status that hasn't been true in two months.

Our threat model used to skip that whole surface. We modeled actions: payment, fs-delete, db-write. We didn't model assertions. Andrea's incident is the receipt for that gap. We've closed it: chatbot-shaped repos now get a different report (LLM as the lead risk, not email), a different policy template (scope_guard ships alongside the action policies), and a different stub in stubs/py/chatbot_scope_guard_example.stub.py.

The dashboard nudge that closes the loop

The author had the wrapper installed in shadow mode for six days when this happened. Six days of would_block_in_shadow data the dashboard never surfaced. The card we just added reads the same telemetry endpoint and tells the operator one of four things: keep soaking, tune your policies, flip to enforce, or you have a wiring gap. No interpretation of raw numbers required.
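One way to picture the card's decision is a small function over shadow-mode counters. Everything in this sketch except the would_block_in_shadow name is an assumption: the other field names, the thresholds, and the order of the checks are placeholders for illustration, not the dashboard's actual logic.

```javascript
// All thresholds are hypothetical placeholders, not the product's values.
const MIN_SOAK_DAYS = 3;
const MIN_CALLS = 50;
const MAX_WOULD_BLOCK_RATE = 0.2;

// telemetry fields other than would_block_in_shadow are assumed names.
function shadowNudge({ days_in_shadow, total_calls, would_block_in_shadow, scope_not_passed }) {
  // Any call that arrived without a scope suggests the wrapper is miswired.
  if (scope_not_passed > 0) return 'wiring-gap';
  // Not enough time or traffic to judge yet.
  if (days_in_shadow < MIN_SOAK_DAYS || total_calls < MIN_CALLS) return 'keep-soaking';
  // A high would-block rate may mean the policies are too strict.
  const rate = would_block_in_shadow / total_calls;
  if (rate > MAX_WOULD_BLOCK_RATE) return 'tune-policies';
  return 'flip-to-enforce';
}

console.log(shadowNudge({
  days_in_shadow: 6,
  total_calls: 400,
  would_block_in_shadow: 12,
  scope_not_passed: 0,
})); // flip-to-enforce
```

The useful property is that the operator sees one of four labels instead of raw counts; where the cut-offs sit is a tuning question.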

Try it

Scan your chatbot. Get the scope-guard policy.

Free public scan. If your repo classifies as chatbot-rag, the output ships with policies/scope_guard.base.v1.yaml and the wrap example. Drop the helper between your LLM call and the response, install the policy, deploy in shadow.
