An Enforceable Framework for Operating LLMs Safely

Most AI-safety advice stops at principles. "Detect sensitive data." "Keep an audit trail." "Don't send private data to third parties." Few things are wrong with the principles — the gap is that a principle you cannot enforce on every code path is a wish, not a control. A redaction rule one route skips, an audit log an admin can rewrite, a "local-only" flag a background job ignores: each reads as safety on the page and leaks in production.

This is a framework for operating LLMs safely that is organized around enforceability. It has seven pillars. For each one we state the principle (the one-sentence guarantee), the failure mode (what goes wrong when it is absent), and the reference control (how a gateway actually implements it). It is principle-based on purpose — it describes the failure a safe system must prevent, never a specific incident.

We build an open-source gateway, Membrain, that implements each of these, so "is this enforceable?" has a concrete answer rather than a hand-wave. Where a pillar names a control, the Membrain implementation is noted in parentheses. You do not need to run Membrain for the framework to be useful — the seven questions stand on their own.

The single test that runs through all seven pillars: a control is only as strong as its weakest covered path. Ask of every rule — on which paths does it run, what happens when it errors, and who can bypass or forge it?

The Seven Pillars

Pillar 1

Detection — know what is in the traffic

Principle. Sensitive content (PII, secrets, regulated data) and risky content (prompt injection, tool-description poisoning) must be identified in both directions before it crosses a trust boundary.

Failure mode. What you cannot see, you cannot govern. Undetected secrets ship to third-party providers; undetected injection rewrites the agent's intent. Exfiltration happens on the response path too, so scanning only the request is half a control.

Normalize before you match (Unicode NFKC, encoding variants) — homoglyph and width tricks slip naive patterns. Combine deterministic patterns with ML/NER and take the union of overlapping matches, never just the first. Let users register exact-match "guarded values" that are always caught regardless of confidence thresholds.

Reference control: a scanning service over a tunable pattern + NER engine with guarded-value overlays (Membrain: PIIService / PIIScanner, 25+ categories).

Pillar 2

Enforcement — act on what you detect

Principle. Detection without action is theater. Every finding maps to a declared action: pass, log, alert, redact, confirm, or block.

Failure mode. Computing a redaction and then forwarding the original; logging a "blocked" event while the request proceeds. The audit says safe; the wire says leak.

Make policy per-tenant and declarative (categories → actions). Fail closed: if the scanner errors, reject — do not forward. Enforce the redacted artifact on the wire, so what you logged equals what you sent. No "skip" mode bypasses non-negotiables — registered secrets stay redacted in every mode.

Reference control: a middleware pipeline that mutates the outgoing payload and can short-circuit a request (Membrain: rate-limit → budget → PII/data-policy → tool-policy → cache → knowledge).

Pillar 3

Memory — govern what the system remembers

Principle. Stored context — RAG knowledge, caches, transcripts — is an attack surface and a data-residency obligation, not a free convenience.

Failure mode. Private prompts embedded into a third-party vector store; one tenant's memory surfaced to another; secrets persisted in a shared cache.

Re-scan content for sensitive data before it is written to any store. Scope every read and write by tenant, and where relevant by actor. Treat "private/local-only" as covering the memory-write hop, not just inference — never seed a shared cache or embed off-box for a request marked private.

Reference control: a knowledge store with per-tenant scoping, PII re-scan on inject, and egress gating tied to the privacy flag (Membrain: KnowledgeStore, semantic cache).

Pillar 4

Visibility — be able to prove what happened

Principle. Every AI interaction — model, tokens, cost, findings, tools, actor — leaves a tamper-evident trail, and unsanctioned AI use is discoverable.

Failure mode. No record of what data left the building; "shadow AI" tools no one approved; metrics you can't trust because the log can be rewritten.

Audit every exit path — success, fail-closed rejection, and upstream error alike. Make the trail tamper-evident (keyed hash chain + external anchor), not merely append-only. Surface shadow-AI usage by tool, endpoint, and actor. Keep raw sensitive values out of audit rows; gate any plaintext reveal behind strong auth and tenant scope.

Reference control: structured audit with an HMAC-keyed, sequence-bound chain plus shadow-AI detection (Membrain: audit service, MCP audit, shadow endpoints).

Pillar 5

Routing — control where requests go

Principle. The destination of a request is a security decision. Privacy, residency, and cost constraints must bind the actual egress, including fallbacks.

Failure mode. A "local-only" request that reaches a cloud provider through a fallback chain, a default sentinel, or a side channel (embedder, cache, tagger).

Make privacy/residency a property of the pipeline context, not a local variable, so every egress-capable component honors it. Re-apply constraints to fallback targets, not just the primary. Fail closed (e.g., a hard 502) when no compliant route exists — never silently downgrade. "Local model" does not cover embedders, caches, or background jobs; each is its own egress.

Reference control: a router with explicit local-provider sets, fallback re-validation, and a propagated egress flag (Membrain: Router, private-egress guard).

Pillar 6

Coverage — no path is exempt

Principle. A control that protects one route and not another is a false sense of security. Every surface gets the same governance, or the weakest path defines your posture.

Failure mode. The application route redacts; the transparent proxy doesn't. The OpenAI path is governed; the Anthropic path drifts. Attackers find the gap.

Route every surface through one shared enforcement entry point — avoid parallel, divergent implementations. Test the contract per-surface against the database and runtime you actually deploy, not a lighter test double. Treat "skip" and "no-op" branches as security-relevant; that is where coverage silently ends.

Reference control: a single shared pipeline consumed by all entry points, per-surface contract tests, and CI that runs against the real datastore.

Pillar 7

Trust — identity, isolation, and integrity

Principle. Multi-tenant boundaries, operator identity, and the integrity of the controls themselves — certs, keys, audit chain, supply chain — are the foundation the other six stand on.

Failure mode. A tenant admin acting cross-tenant; an unconstrained interception CA with its key on disk; a security fix that never reaches installed clients; a forgeable audit chain. Break trust and the other six pillars are decorative.

Separate global authority from tenant authority; clamp every read and write to the caller's scope unless explicitly global. Name-constrain any interception CA and destroy its private key after use. Key integrity material (audit HMAC, signing keys) outside the database the data lives in. Make the update path itself a control — a fix that doesn't ship isn't a fix.

Reference control: a role hierarchy with global-vs-tenant scoping, a name-constrained proxy CA, an externally-keyed audit chain, and a guarded publish pipeline.

How to use the framework

Two of the seven pillars are force-multipliers. Coverage (pillar 6) and Trust (pillar 7) determine whether the other five are real: a perfect redactor on one of three ingress paths is a third of a control, and an audit chain an admin can forge is none. If you audit in order, audit those two first.

As an operator: treat each pillar as a checklist and find your weakest one — that, not your strongest, is your actual posture.
As an assessor: for each control, ask "on which paths?", "what happens on error?", and "who can forge or bypass it?" A control that fails any of the three is incomplete.
As a builder: wire detection, enforcement, memory, routing, and audit as one pipeline that shares state — the cross-cutting guarantees (PII-clean audit trails, privacy-bound routing) are properties of that architecture, not features you bolt on later.

These are guidelines, not a standard. Authority over how to run AI safely is earned by adoption, not declared — so treat this as v0.1, and tell us where it's wrong.

See the framework as running code

Membrain is the open-source reference implementation — each pillar maps to a control you can read and run yourself. Self-hosted, Apache-2.0.

Get started on GitHub →