The AI security tooling market has exploded. You can now buy a platform to discover your AI assets, a gateway to proxy your AI requests, a DLP tool to scan your AI traffic, and a compliance logger to record all of it. Each product does its job. Most do it reasonably well. And yet the combination of all four still leaves a category of risk fully unaddressed — the risk that lives in the gaps between them.
The problem isn't the individual tools. The problem is that they don't talk to each other. And in AI security, the decisions that matter most happen at the intersection of detection, routing, memory, and enforcement — not inside any single product's silo.
The Three Silos of AI Security
To understand the integration gap, it helps to be precise about what each category actually does.
Security Platforms (Varonis Atlas, Cisco AI Defense)
This category approaches AI risk from the discovery and monitoring angle. Varonis Atlas, for instance, scans your cloud environment and SaaS applications to build an inventory of AI tools in use across your organization. It identifies which AI assets exist, flags risky configurations and overpermissioned integrations, and maps findings to compliance frameworks like SOC 2 and HIPAA. It's a bird's-eye view of your AI attack surface.
What it cannot do: intercept a request. A security platform lives outside the request path. It observes metadata, configuration, and audit logs — not the actual content of what your employees are sending to AI providers. When a sensitive prompt leaves your network, the platform may eventually surface a compliance finding. But "eventually" is not the same as "before the data left."
Routing Gateways (LiteLLM, Portkey)
Routing gateways sit in the request path — but only for traffic explicitly configured to flow through them. LiteLLM proxies requests to 100+ model providers, handles load balancing, applies rate limits, and surfaces cost analytics. Portkey adds guardrails, 250+ provider integrations, and SOC 2-compliant logging. These are genuinely useful tools for engineering teams building AI-powered applications.
The scope boundary is the limitation. A routing gateway sees only the requests your application was built to send through it. It cannot see a developer using Claude CLI from their terminal. It cannot see an employee pasting data into the ChatGPT web interface. It cannot see a contractor using the Gemini app on a personal device. The gateway's policy engine has no opportunity to act on traffic it never sees.
DLP Tools (Nightfall)
Data Loss Prevention tools were designed to detect sensitive data leaving your environment — credit card numbers in email attachments, SSNs in web form submissions, health record IDs in file transfers. Nightfall applies this model to AI by scanning outbound API calls for PII patterns before they reach the provider.
The architectural mismatch is significant. Modern AI requests are JSON payloads containing nested message arrays, tool call blocks, multi-modal content, and multi-turn conversation history. PII doesn't sit in a predictable field — it's embedded in natural language inside deeply nested structures, distributed across conversation turns, sometimes split across streaming chunks. A DLP tool scanning at the wrong layer, or pattern-matching against concatenated text without understanding AI protocol structure, will generate both false positives and misses. And critically: a DLP detection that fires a webhook to a SIEM while the request continues to the provider is not enforcement. It's a log entry.
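To make the structural point concrete, here is a minimal sketch of what scanning an AI payload actually requires. The pattern set is a toy (two regexes standing in for a full pattern library plus ML NER), and the payload shape loosely follows the common chat-completions form; the point is the recursive walk, not the patterns.

```python
import re

# Toy pattern set for illustration -- a production scanner would combine
# many more patterns with ML-based NER.
PATTERNS = {
    "SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "EMAIL": re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.-]+\b"),
}

def iter_text_fields(node):
    """Yield every string in the payload, however deeply nested.

    AI requests bury natural language inside message arrays, tool-call
    blocks, and multi-modal content lists; scanning one flat field
    misses most of it.
    """
    if isinstance(node, str):
        yield node
    elif isinstance(node, dict):
        for value in node.values():
            yield from iter_text_fields(value)
    elif isinstance(node, list):
        for item in node:
            yield from iter_text_fields(item)

def scan_request(payload):
    """Return (label, match) pairs for every PII hit in the payload."""
    return [
        (label, m.group())
        for text in iter_text_fields(payload)
        for label, pattern in PATTERNS.items()
        for m in pattern.finditer(text)
    ]

request = {
    "model": "gpt-4o",
    "messages": [
        {"role": "user", "content": [
            {"type": "text", "text": "Ticket summary: customer SSN 123-45-6789."},
        ]},
        {"role": "assistant", "tool_calls": [
            {"function": {"arguments": '{"email": "jane@example.com"}'}},
        ]},
    ],
}

print(scan_request(request))
```

A scanner that only checked `messages[0].content` as a flat string would miss the email hiding inside the tool-call arguments entirely.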
The Integration Gap
Here is the scenario that illustrates the problem precisely.
Your DLP tool detects PII in an outbound prompt — a customer name and account number, mid-sentence in a support ticket summary. It fires an alert. The alert lands in your SIEM. An analyst sees it the next morning during triage.
Meanwhile, your routing gateway, which is handling requests from your internal application, has no idea the DLP scan happened. Its routing decision was already made: the request went to GPT-4o, as configured. The routing engine had no opportunity to redirect to a local Ollama instance — it doesn't know PII was present, because that information lives in a different system.
Your security platform will surface this incident in its dashboard by the time your weekly compliance review comes around. The mapping to your SOC 2 control framework will be accurate. The timestamp will match the SIEM log.
The PII already left the building.
The critical decisions in AI security — strip this data, redirect this request, block this tool, log this for compliance — need to happen in milliseconds, in the same system, on the same request. A pipeline of separate tools connected by webhooks and log exports cannot achieve this. By the time the alert crosses a system boundary, the enforcement window is closed.
This is the integration gap. It's not a gap in any individual product's features. It's a gap in the architecture of a stitched-together stack. Detection is in one system. Routing decisions are in another. Compliance logging is in a third. None of them share state in real time. None of them can act on each other's findings within the lifetime of a single request.
What Integration Actually Looks Like
Integration isn't a checkbox or a feature flag. It's an architectural property — the ability for detection, routing, memory, and compliance to operate on the same request, at the same moment, sharing state with each other. Here are five concrete examples of what this looks like when the layers are genuinely unified.
PII-Aware Cache Keys
When two users send semantically identical prompts — "summarize the Q4 board report" — but one prompt contains a real customer name embedded in context while the other doesn't, a naive cache system treats them as different keys. The prompts differ in their raw text, so no cache hit occurs.
A unified system strips PII before computing the cache key. Both prompts, after PII removal, resolve to the same normalized form. One LLM call serves both users. Cache hit rates improve. Provider costs drop.
This is only possible if PII detection and caching operate inside the same pipeline. A DLP tool that runs asynchronously, after the cache lookup has already been attempted, cannot participate in the key computation. The PII strip must happen before the cache lookup, which means PII detection must be upstream of the cache layer, in the same request handler.
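A sketch of the ordering this requires, under toy assumptions (two regex redactions standing in for the full classifier, placeholder names invented for illustration):

```python
import hashlib
import re

# Illustrative redaction table -- a real pipeline would use the full
# pattern + NER classifier, not two regexes.
REDACTIONS = [
    (re.compile(r"\bACCT-\d+\b"), "[ACCOUNT_NUMBER]"),
    (re.compile(r"\b\d{3}-\d{2}-\d{4}\b"), "[SSN]"),
]

def strip_pii(text: str) -> str:
    for pattern, placeholder in REDACTIONS:
        text = pattern.sub(placeholder, text)
    return text

def cache_key(prompt: str) -> str:
    # The key is computed over the PII-stripped form, so prompts that
    # differ only in embedded identifiers share one cache entry.
    return hashlib.sha256(strip_pii(prompt).encode("utf-8")).hexdigest()

a = cache_key("Summarize the Q4 board report for ACCT-20391")
b = cache_key("Summarize the Q4 board report for ACCT-88412")
print(a == b)  # True: one LLM call can serve both users
```

The crucial detail is that `strip_pii` runs inside `cache_key` itself. If stripping happened in a separate service called after the lookup, the two prompts would already have missed each other in the cache.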
Privacy-Based Routing
A request arrives containing a patient name, a diagnosis code, and a treatment history. The PII classifier identifies multiple HIPAA-relevant entities. That classification should immediately inform the routing decision: this request should not go to a third-party cloud provider. It should go to a self-hosted Ollama instance running inside your HIPAA-compliant environment, where the data never leaves your infrastructure.
For this to happen, PII classification and routing must share state. The classifier produces a signal — "this request contains sensitive health data" — and the router must receive that signal before selecting a provider. In a stitched-together stack, the DLP tool fires an alert after the routing decision has already been made. In an integrated system, detection feeds routing, and routing reflects the real-time state of every request.
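The shape of that shared state can be sketched in a few lines. Everything here is an assumption for illustration: the two health-data patterns are toys (an ICD-10-style code matcher and an invented `MRN-` format), and the provider names and tier split are placeholders.

```python
import re

# Toy classifier: two stand-in HIPAA-relevant patterns.
HEALTH_PATTERNS = {
    "DIAGNOSIS_CODE": re.compile(r"\b[A-TV-Z]\d{2}(?:\.\d{1,2})?\b"),  # ICD-10-ish
    "MRN": re.compile(r"\bMRN-\d{6}\b"),
}

def classify(text):
    """Return the set of sensitive-entity labels found in the text."""
    return {label for label, p in HEALTH_PATTERNS.items() if p.search(text)}

def route(text):
    """Detection feeds routing: the classifier runs before, and in the
    same handler as, provider selection."""
    labels = classify(text)
    if labels:
        return "ollama-local", labels   # stays inside the compliant boundary
    return "gpt-4o", labels             # non-sensitive traffic may go to cloud

provider, labels = route("Patient MRN-493021, diagnosis E11.9, started metformin.")
print(provider, sorted(labels))
```

Note that `route` calls `classify` directly, in-process. A webhook fired by an external DLP tool could carry the same labels, but it would arrive after `route` had already returned.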
Audit-to-Knowledge Pipeline
Every AI response that flows through your gateway is a potential knowledge artifact. An engineer documenting a debugging session, a lawyer drafting contract language, a consultant summarizing client findings — these responses contain institutional knowledge that currently lives nowhere persistent.
An integrated system extracts knowledge entries from responses automatically, embeds them using a semantic encoder, deduplicates against the existing knowledge store, and makes them retrievable for future interactions. This pipeline requires audit logging, embedder access, semantic deduplication, and knowledge store writes to happen as part of the same request lifecycle — not as a downstream batch job hours later. The audit trail and the knowledge base are the same artifact, produced by the same pipeline.
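A minimal sketch of the dedup step, with a deliberately crude stand-in embedder (bag-of-words counts instead of a semantic encoder) so the example stays self-contained; the in-request write path is the point, not the vector quality.

```python
import math
from collections import Counter

def embed(text):
    """Stand-in embedder: a bag-of-words vector. A real pipeline would
    use a semantic encoder; only the dedup logic matters here."""
    return Counter(text.lower().split())

def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a)
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0

class KnowledgeStore:
    """Knowledge capture in the same lifecycle that produces the audit
    log -- entries are written per-request, not by a batch job."""
    def __init__(self, threshold=0.9):
        self.entries = []          # (text, vector) pairs
        self.threshold = threshold

    def add(self, text):
        vec = embed(text)
        if any(cosine(vec, v) >= self.threshold for _, v in self.entries):
            return False           # near-duplicate: skip the write
        self.entries.append((text, vec))
        return True

store = KnowledgeStore()
store.add("Restarting the ingest worker clears the stale lock in Redis.")
store.add("Restarting the ingest worker clears the stale lock in Redis.")  # dup
store.add("Contract clause 4.2 caps liability at twelve months of fees.")
print(len(store.entries))  # 2
```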
Knowledge Context Injection
When a user asks a question that overlaps with a previous conversation — across providers, across sessions, across weeks — an integrated system can retrieve the relevant prior context and inject it into the new prompt. The user gets a better answer. The provider sees richer context. The organization's accumulated knowledge becomes an input to every future interaction.
This requires a knowledge store that is populated by the audit pipeline, indexed by a semantic embedder, and queried by the request handler before forwarding to the provider. All of these must be components of the same system. A routing gateway that calls an external RAG endpoint as a pre/post hook can approximate this, but the integration is brittle — the external call adds latency, the hook cannot access PII classification state, and the knowledge store is disconnected from the audit log that would populate it.
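The injection step itself is simple once the store lives in the same process. In this sketch, keyword overlap stands in for the semantic index, the stored entries are invented examples, and the message shape follows the common chat-completions form.

```python
# Invented prior-session entries standing in for a populated store.
KNOWLEDGE = [
    "Prior session: the billing outage was traced to a TLS cert expiry.",
    "Prior session: staging uses the eu-west-1 database replica.",
]

def retrieve(query, k=1):
    """Crude keyword-overlap retrieval; a real system queries the
    semantic index built by the audit pipeline."""
    q = set(query.lower().split())
    scored = sorted(KNOWLEDGE,
                    key=lambda e: len(q & set(e.lower().split())),
                    reverse=True)
    return scored[:k]

def build_messages(user_prompt):
    """Inject retrieved context before forwarding to the provider --
    this runs in the request handler, upstream of routing."""
    context = retrieve(user_prompt)
    return [
        {"role": "system",
         "content": "Relevant prior knowledge:\n" + "\n".join(context)},
        {"role": "user", "content": user_prompt},
    ]

msgs = build_messages("Why did the billing outage happen?")
print(msgs[0]["content"])
```

Because `build_messages` runs in-process, it could also consult the PII classification of each candidate entry before injecting it, which is exactly what an external pre-request hook cannot do.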
PII-Clean Audit Trail
Compliance logs must be auditable without becoming a liability themselves. If your audit trail contains the actual PII from every request — real names, account numbers, health record IDs — then the audit trail is itself a sensitive data store that requires the same protections as production data, subject to the same breach notification obligations, the same access controls, the same retention policies.
An integrated system strips PII before any persistence occurs. Compliance logs contain placeholders — [PERSON_NAME], [ACCOUNT_NUMBER], [DIAGNOSIS_CODE] — not the underlying values. The audit trail is complete and queryable. The sensitive data never reaches the log layer at all.
This requires PII detection to run before the logger, in the same pipeline. A DLP tool that scans logs after they're written cannot retroactively clean them. A logger that receives raw request content and calls an external PII service introduces a race condition and a latency spike. The strip must happen upstream, in-process, before any downstream component receives the raw content.
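A sketch of the in-process ordering, again with a toy redaction table (two regexes and invented placeholder names standing in for the full classifier):

```python
import json
import re

# Illustrative redaction table; a real pipeline pairs patterns with NER.
REDACTIONS = [
    (re.compile(r"\bACCT-\d+\b"), "[ACCOUNT_NUMBER]"),
    (re.compile(r"\b\d{3}-\d{2}-\d{4}\b"), "[SSN]"),
]

def redact(text):
    for pattern, placeholder in REDACTIONS:
        text = pattern.sub(placeholder, text)
    return text

def audit_record(request_text, provider):
    """Strip PII in-process, before the record exists anywhere.
    Only the redacted form is ever serialized or persisted."""
    return json.dumps({
        "provider": provider,
        "prompt": redact(request_text),
    })

entry = audit_record("Refund ACCT-20391, customer SSN 123-45-6789.", "gpt-4o")
print(entry)
```

The invariant worth testing in any real system is the one this sketch makes obvious: no raw identifier ever appears in a serialized record, because `redact` runs before `json.dumps`, not on a log file after the fact.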
Integration Depth Comparison
The standard capability comparisons — how many providers, which compliance frameworks, which PII patterns — obscure the dimension that actually matters: how deeply are the layers connected? A PII scanner that is architecturally isolated from the router, the cache, and the audit trail provides far less protection than its feature list suggests.
| Tool | PII / DLP | Knowledge / Memory | Routing | Audit / Compliance | Self-Hosted OSS | Layers Connected |
|---|---|---|---|---|---|---|
| MemBrain | 25+ patterns + ML NER, feeds cache + routing | Semantic store, auto-extraction, context injection | Privacy / cost / tier-based, fallback chains | PII-clean logs, compliance export | ✓ Yes (open core) | All layers connected |
| LiteLLM | Plugin (Presidio) | None | 100+ providers | Basic logging | ✓ Yes (MIT) | PII is isolated plugin |
| Cloudflare | DLP (Presidio + ML) | None | Limited (their workers) | Logging + analytics | ✗ No (SaaS only) | DLP is isolated feature |
| Portkey | 40+ guardrails | None | 250+ providers | SOC2 / HIPAA logging | ~ Partial (Gateway OSS) | Guardrails are pre/post hooks |
| Kong | Plugin (2025) | RAG pipeline (external) | Enterprise API mgmt | Enterprise audit | ~ Partial ($50K+) | Plugins don't share state |
The key column is the last one. Not how many providers a gateway supports, not how many PII patterns a scanner covers, not whether a compliance log is SOC 2 certified — but whether the layers are connected. Whether a PII detection in one layer can change a routing decision in another, whether an audit event can populate a knowledge store, whether a cache key reflects the current state of PII classification. That connection is what converts a collection of features into a functioning security system.
The Cognitive-First Approach
There is a tempting middle ground: assemble the best individual tools and connect them with APIs, webhooks, and shared databases. LiteLLM for routing. Presidio for PII. pgvector for knowledge. Grafana for observability. Each component is best-in-class. The integration is your problem.
This approach can approximate individual features. It cannot achieve compound intelligence.
When PII detection runs as an external Presidio call inside a LiteLLM plugin, it executes after the routing decision has already been committed. The router selected a provider without knowing whether PII was present. The PII scanner runs, flags the content, and fires a webhook — but the request is already in flight. The two components share no state because they were never designed to. They were designed to be connected by humans, with duct tape, at integration time.
The same limitation applies to every cross-layer capability. Knowledge injection via a pre-request hook cannot access PII classification state to decide which knowledge entries are safe to inject. Cache key computation that runs before PII stripping cannot produce PII-neutral keys. Audit logging that receives raw request content before PII stripping cannot produce a PII-clean compliance trail. Each layer, operating in isolation with only the information its position in the pipeline gives it, cannot make decisions that require information from other layers.
The brain metaphor is apt here. You cannot build a functioning nervous system by connecting separate organs with cables; the integration is the architecture. A brain does not have a "memory plugin": memory is a property of the whole system, emerging from the way detection, routing, and persistence are wired together. Loosely coupled components can be connected to each other, but the connections never become the system.
A cognitive-first architecture treats PII detection, routing, knowledge, caching, and compliance as layers of a single pipeline — not as separate products that happen to be pointed at the same traffic. Each layer sees the full state of the request. Each layer's outputs are available to every downstream layer. The compound capabilities that emerge from this — PII-aware caching, privacy-based routing, PII-clean audit trails — are not features that can be added later. They are properties of the architecture from the beginning, or they don't exist at all.
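The architectural claim above can be reduced to a toy sketch: one handler in which detection runs once and its output flows to caching, routing, and logging in turn. Every pattern, placeholder, and provider name here is an illustrative assumption, not a description of any product's internals.

```python
import hashlib
import re

# Toy sensitive-data table; real systems use full pattern + NER stacks.
SENSITIVE = [
    (re.compile(r"\bMRN-\d{6}\b"), "[MRN]"),
    (re.compile(r"\bACCT-\d+\b"), "[ACCOUNT_NUMBER]"),
]

def handle(prompt):
    # 1. Detect and strip PII once, up front.
    labels, clean = set(), prompt
    for pattern, placeholder in SENSITIVE:
        if pattern.search(clean):
            labels.add(placeholder)
            clean = pattern.sub(placeholder, clean)
    # 2. Cache key computed from the stripped form (PII-aware caching).
    key = hashlib.sha256(clean.encode("utf-8")).hexdigest()
    # 3. Routing reads the classification (privacy-based routing).
    provider = "ollama-local" if labels else "gpt-4o"
    # 4. The audit log receives only the stripped form (PII-clean trail).
    audit = {"provider": provider, "prompt": clean, "pii": sorted(labels)}
    return {"cache_key": key, "provider": provider, "audit": audit}

r = handle("Summarize treatment history for MRN-493021")
print(r["provider"], r["audit"]["prompt"])
```

Each downstream step consumes state produced by an upstream step in the same function call. Splitting any one of these steps into a separate service reintroduces exactly the gap the article describes.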
This is why we built MemBrain.
The only AI gateway where security, knowledge, and routing are one integrated system.
Talk to us about your deployment →