Polleo · Product N°01

Bring AI to your vault —
with guardrails at runtime.

Putting Claude inside your Obsidian vault is the easy part — the integration is ordinary engineering. The interesting part is what happens to your assumptions about security once an AI has reach into your second brain. Gryphon is built around a guardian that refuses passage on the operations no AI should ever auto-approve, even when you've told it to. The two-dimensional security model is the design choice the industry needs to adopt as a baseline for any AI tool with access to valuable personal data.

View on GitHub → Read the essay → Skip to the security model

§ 01 Why a guardian

Your vault is not a repository. Losing it is losing your externalized memory.

Most thinking about "AI can touch my files" happens in the context of a codebase, where the worst case is broken code and the fix is one git revert away. A vault is different. It holds years of reflection, decisions and their rationale, the source material for the book you've been planning. Some of it is replaceable. Most of it is not.

The operations an AI might perform on a vault are not all in the same risk bucket. Renaming a note is trivial. Rewriting a meeting summary is recoverable. A shell command writing into ~/.ssh is a different category entirely — the AI just touched the trust boundaries around your notes, not the notes themselves. Treating those as points on the same dimension of risk is what most permission systems get wrong. Gryphon doesn't.

§ 02 Two-dimensional security

Convenience and guardrail are independent. The two knobs do not interact.

The industry ships AI tools with a single permission dimension: how much friction the user tolerates between asking and acting. Reasonable for a coding agent in a scratch directory. Wrong for a plugin operating on a vault. The things you want auto-approved (edit a note, rename a file, draft a summary) and the things you never want auto-approved (recursive delete, writes into .ssh, pipes to a shell, sudo, scheduled-task persistence, fetches that deposit untrusted content into the vault) are not on the same spectrum. They are categorically different.

Dimension 1 · Convenience

Permission modes

Choose how often Gryphon asks you before acting. Prompt (the default) asks for every edit and every shell command. Safe auto-accepts file edits but still prompts on shell. YOLO silences ordinary prompts so you can move at speed. Plan proposes only — read-only mode for review.

Dimension 2 · Guardrail

Protected patterns

A curated catalogue of categorically-risky operations — recursive deletion, system and credential paths, shell-pipe interpreters, privilege escalation, persistence artifacts, vault-deposit web fetches — that always require explicit approval. Crank convenience to YOLO; the guardrail still fires.

The effect: silence prompts for the 95% of work that's safe without ever lowering the bar on the 5% that matters. Turning convenience up does not turn the guardrail down. Gryphon makes that property a hard invariant — not a best-effort default that erodes as the codebase grows.

For the full argument behind the design, read The Second Dimension: A Security Model for AI Inside Your Second Brain →

§ 03 The attack you can't see

The AI cannot be the last line of defense for AI.

The most underdiscussed real-world risk in AI-assisted knowledge work isn't a dramatic exploit. It's the fetch. You ask the AI to save an article, summarize a webpage, clip a markdown copy of a blog post. Ordinary second-brain work. The AI does exactly what you asked.

But that fetched content now sits in your vault as a note — indistinguishable, at the filesystem level, from content you wrote yourself. Future sessions will read it the same way they read anything else you authored. And it can contain instructions: prompt-injection payloads embedded in scraped HTML, in zero-width text, in summaries from sites that themselves contain injection attempts. Anything that enters your vault inherits its trust, and anything it carries gets executed as context in every subsequent session — until you notice and clean it up.

The shortcut is to hope the model is robust enough to ignore injection attempts. Every major AI lab has published research on this; the consensus is the same. Models can be made more resistant; they cannot be made reliable defenders against attacks aimed at themselves. And when the model fails, the failure is invisible — no error, no anomaly, no warning. The blast radius unfolds silently: a leaked credential, an exfiltrated note, a quiet persistence hook. By the time consequences surface, it's too late.

That's the argument for a layer that doesn't depend on the AI's judgment. Gryphon applies the second-dimension principle to the fetch step. Web fetches that deposit content into the vault always go through the guardrail, regardless of the convenience setting. You see what's about to enter, you decide whether to trust the source, and every downstream session benefits from the decision you made once.

§ 04 What it does

Obsidian-native

Gryphon lives in a side panel alongside your notes. It reads your vault with your permission — folder structure, tags, links, frontmatter — and respects every convention you already rely on.

Pick your provider

Use the Anthropic API directly with your own key, or route through a locally-installed Claude Code subprocess for advanced workflows. Pick the model per conversation: Opus for the hardest thinking, Sonnet for fast everyday drafting, Haiku for near-instant answers.

III

Your data stays yours

Your vault isn't uploaded, indexed on our servers, or used to train anything. API calls go directly from your machine to Anthropic — Polleo sits out of the data path entirely. Your notes, your keys, your machine.

Keyboard-first, palette-friendly

Every surface is built for how Obsidian power-users actually work: command palette entries, keybindings, slash commands, a typographic feed that reads like a document — not a chat app bolted on.

Untrusted-content framing

Web fetches, shell output, and out-of-vault reads are tagged when Claude sees them. Prompt injection in fetched content can't redirect the conversation — the model is told what came from you and what didn't.

Per-call approval modal

Every tool-use that matches one of your protected patterns surfaces in an Obsidian dialog with diff or command preview before it runs. Approve once, approve never, or remember the decision for the session. Bash decisions are never cached — every command is asked individually.

§ 05 Open source

Gryphon is open source under the MIT license. The full plugin lives on GitHub — read the code, file issues, send pull requests, or fork it and make it yours.

What sets Gryphon apart from other Claude plugins is how it treats user protection. Safety isn't a checkbox buried in a menu. It's a full second dimension, independent of convenience — so turning auto-approve all the way up never weakens the guardrails that matter. As more of your life becomes AI-reachable — your calendar, your email, your research tooling, your wallet — the guardian matters more, not less.

View on GitHub →

Repositorypolleoai / gryphon

LicenseMIT

HostObsidian

APIAnthropic · user key

Questions, feedback, or security reports: gryphon@polleo.ai.

Welcome the AI. Demand the guardian.

Bring AI to your vault —with guardrails at runtime.