Membrain Quickstart Guide

Get the Membrain AI Safety Gateway running in under 5 minutes. This guide walks you through installation, configuration, your first API call, and key features.


Table of Contents

  1. Prerequisites
  2. Installation
  3. Configuration
  4. Database Setup (Optional)
  5. Start the Server
  6. First API Call
  7. Dashboard
  8. Common Workflows
  9. Next Steps

Prerequisites

  • Python 3.12+ (required)
  • PostgreSQL 16 with pgvector (optional -- enables persistence, knowledge search, API key auth)
  • Redis 7+ (optional -- enables caching, rate limiting, budget enforcement)

If you skip Postgres and Redis, Membrain runs entirely in-memory with no external dependencies.


Installation

git clone https://github.com/your-org/membrain.git
cd membrain
python -m venv .venv
source .venv/bin/activate

# Core install (minimal dependencies)
pip install -e .

# Full install (Postgres, Redis, caching, auth)
pip install -e ".[full]"

# Development install (full + pytest, ruff)
pip install -e ".[dev]"

Optional extras

# LiteLLM support (100+ model providers)
pip install -e ".[litellm]"

# Local sentence-transformers for knowledge embeddings
pip install -e ".[knowledge]"

# ML-based NER for PII detection (BERT model)
pip install -e ".[ml]"

Docker (all-in-one)

cp .env.example .env
# Edit .env with your API keys (see Configuration below)
docker compose up

This starts the gateway on http://localhost:8100, the dashboard on http://localhost:3100, plus Postgres and Redis automatically.


Configuration

Copy the example environment file and edit it:

cp .env.example .env

Minimal configuration

If you want to proxy requests to cloud AI providers, add their API keys to your .env:

# Cloud provider API keys (optional — only needed if routing to these providers)
OPENAI_API_KEY=sk-your-openai-key-here
# ANTHROPIC_API_KEY=sk-ant-your-anthropic-key-here

You can also run Membrain with local models only (Ollama), or with no provider keys at all if you only need auth, PII detection, or knowledge features. By default, Membrain runs with in-memory storage, no caching, and no auth.

For production or full-featured usage:

# Server
HOST=0.0.0.0
PORT=8000

# Cloud provider API keys (add the providers you use)
OPENAI_API_KEY=sk-your-openai-key-here
ANTHROPIC_API_KEY=sk-ant-your-anthropic-key-here

# PostgreSQL — enables persistence, knowledge, audit logs, API key auth
DATABASE_URL=postgresql+asyncpg://membrain:membrain@localhost:5432/membrain

# Redis — enables caching, rate limiting, budget enforcement
REDIS_URL=redis://localhost:6379

# Rate limiting (requests per minute per key; 0 = disabled)
RATE_LIMIT_RPM=60

# Budget enforcement (USD; 0 = disabled)
BUDGET_DAILY_LIMIT_USD=10.0
BUDGET_MONTHLY_LIMIT_USD=200.0

All configuration options

Variable | Default | Description
--- | --- | ---
HOST | 0.0.0.0 | Server bind address
PORT | 8000 | Server port
OPENAI_API_KEY | (none) | OpenAI API key
ANTHROPIC_API_KEY | (none) | Anthropic API key
GOOGLE_API_KEY | (none) | Google AI API key
DATABASE_URL | (none) | PostgreSQL connection string
REDIS_URL | (none) | Redis connection string
EMBEDDING_BACKEND | local | local (sentence-transformers) or openai
EMBEDDING_DIMENSION | 384 | 384 for local, 1536 for OpenAI embeddings
RATE_LIMIT_RPM | 60 | Requests per minute per key (0 = disabled)
BUDGET_DAILY_LIMIT_USD | 0.0 | Daily budget cap in USD (0 = disabled)
BUDGET_MONTHLY_LIMIT_USD | 0.0 | Monthly budget cap in USD (0 = disabled)
OLLAMA_URL | http://localhost:11434 | Ollama server for local models
PROXY_MODE | application | application, network, or hybrid
DEFAULT_PROVIDER | claude_cli | Default AI provider
DEFAULT_MODEL | sonnet | Default model name

Database Setup (Optional)

If you want persistence, knowledge search, audit logging, or API key authentication, you need PostgreSQL with the pgvector extension.

1. Create the database

# Using the pgvector Docker image (recommended)
docker run -d \
  --name membrain-postgres \
  -e POSTGRES_DB=membrain \
  -e POSTGRES_USER=membrain \
  -e POSTGRES_PASSWORD=membrain \
  -p 5432:5432 \
  pgvector/pgvector:pg16

# Or if you have Postgres installed locally, create the DB and enable pgvector:
# createdb membrain
# psql membrain -c "CREATE EXTENSION IF NOT EXISTS vector;"

2. Run migrations

Make sure DATABASE_URL is set in your .env, then run:

alembic upgrade head

This creates all tables: audit logs, knowledge entries (with vector embeddings), API keys, and projects.

Redis (Optional)

If you want caching, rate limiting, or budget enforcement:

docker run -d --name membrain-redis -p 6379:6379 redis:7-alpine

Then set REDIS_URL=redis://localhost:6379 in your .env.


Start the Server

Using the CLI

membrain

This starts the gateway and prints connection instructions:

Starting Membrain gateway on http://0.0.0.0:8000
  Anthropic proxy: http://localhost:8000/v1/messages
  OpenAI compat:   http://localhost:8000/v1/chat/completions

Using uvicorn directly

uvicorn membrain.main:app --host 0.0.0.0 --port 8000

Add --reload during development for auto-reload on code changes:

uvicorn membrain.main:app --host 0.0.0.0 --port 8000 --reload

Using Docker Compose

docker compose up

This starts the full stack: gateway (port 8100), dashboard (port 3100), Postgres, and Redis.

Verify it is running

curl http://localhost:8000/health

Expected response:

{"status": "ok"}

Interactive API docs are available at http://localhost:8000/docs (Swagger UI).
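
For scripted setups, you can poll /health until the gateway is ready before sending traffic. A minimal sketch using only the Python standard library and the endpoint shown above:

import time
import urllib.request

# Poll /health until the gateway answers with 200, then proceed.
def wait_for_gateway(url: str = "http://localhost:8000/health", timeout: float = 30.0) -> None:
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            with urllib.request.urlopen(url, timeout=2) as resp:
                if resp.status == 200:
                    return
        except OSError:
            pass  # server not up yet
        time.sleep(0.5)
    raise TimeoutError(f"Gateway at {url} did not become healthy in {timeout}s")

wait_for_gateway()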


First API Call

Send a chat completion request

Membrain exposes an OpenAI-compatible endpoint. Any application that speaks the OpenAI API format can point at Membrain with no code changes.

curl http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-4o-mini",
    "messages": [
      {"role": "user", "content": "What is the capital of France?"}
    ]
  }'

Expected response:

{
  "id": "chatcmpl-abc123",
  "object": "chat.completion",
  "created": 1709900000,
  "model": "gpt-4o-mini",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "The capital of France is Paris."
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 14,
    "completion_tokens": 8,
    "total_tokens": 22
  }
}

See PII detection in action

Send a request containing personal information. Membrain automatically detects and redacts PII before it reaches the AI provider:

curl http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-4o-mini",
    "messages": [
      {
        "role": "user",
        "content": "Summarize this customer record: John Smith, email john@example.com, SSN 123-45-6789, phone (555) 123-4567"
      }
    ]
  }'

Membrain will:

  1. Detect the email, SSN, and phone number
  2. Replace them with placeholders (e.g., [EMAIL_1], [SSN_1], [PHONE_1]) before sending to the provider
  3. Restore the original values in the response back to you
  4. Log the PII findings in the audit trail
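
To make the placeholder round-trip concrete, here is a minimal Python sketch of the detect-substitute-restore cycle. This only illustrates the mechanism described above; it is not Membrain's actual detector, which covers more categories and optionally uses an ML-based NER model (the [ml] extra):

import re

# Hypothetical patterns for illustration only.
PATTERNS = {
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+"),
    "SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "PHONE": re.compile(r"\(\d{3}\)\s*\d{3}-\d{4}"),
}

def redact(text: str) -> tuple[str, dict[str, str]]:
    """Replace PII matches with numbered placeholders; remember the originals."""
    mapping: dict[str, str] = {}
    for category, pattern in PATTERNS.items():
        for i, match in enumerate(pattern.findall(text), start=1):
            placeholder = f"[{category}_{i}]"
            mapping[placeholder] = match
            text = text.replace(match, placeholder)
    return text, mapping

def restore(text: str, mapping: dict[str, str]) -> str:
    """Put the original values back into the provider's response."""
    for placeholder, original in mapping.items():
        text = text.replace(placeholder, original)
    return text

redacted, mapping = redact("John Smith, email john@example.com, SSN 123-45-6789")
print(redacted)                     # John Smith, email [EMAIL_1], SSN [SSN_1]
print(restore(redacted, mapping))   # original values restored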

Use routing headers

Control how Membrain routes your request:

curl http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "X-Membrain-Tier: performance" \
  -H "X-Membrain-User-Id: user-42" \
  -H "X-Membrain-Project: my-app" \
  -d '{
    "model": "gpt-4o",
    "messages": [{"role": "user", "content": "Hello"}]
  }'

Available routing headers:

Header | Values | Description
--- | --- | ---
X-Membrain-Tier | economy, balanced, performance | Route to cheaper or faster models
X-Membrain-Private | true / false | Route only to local/private models (Ollama)
X-Membrain-Max-Cost | float (e.g., 0.01) | Maximum cost per 1K tokens
X-Membrain-User-Id | string | User ID for audit and rate limiting
X-Membrain-Project | string | Project slug for cost tracking
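
These headers also work from the OpenAI Python SDK (shown later in this guide) via its extra_headers request option:

from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="sk-your-openai-key")

# Routing headers from the table above, passed per request.
response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Hello"}],
    extra_headers={
        "X-Membrain-Tier": "performance",
        "X-Membrain-User-Id": "user-42",
        "X-Membrain-Project": "my-app",
    },
)
print(response.choices[0].message.content)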

Streaming

Enable streaming by setting "stream": true:

curl http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-4o-mini",
    "messages": [{"role": "user", "content": "Tell me a short story"}],
    "stream": true
  }'
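
The same request works from the OpenAI Python SDK: with stream=True the client returns an iterator of chunks, each carrying an incremental content delta:

from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="sk-your-openai-key")

stream = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": "Tell me a short story"}],
    stream=True,
)
for chunk in stream:
    # Some chunks carry no content delta (e.g., role or finish markers); skip those.
    if chunk.choices and chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)
print()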

Use with the OpenAI Python SDK

Point the OpenAI SDK at Membrain -- zero code changes required:

from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:8000/v1",
    api_key="sk-your-openai-key",  # or any string if auth is disabled
)

response = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": "Hello from Membrain!"}],
)
print(response.choices[0].message.content)

Use with the Membrain Python Client

The Membrain client SDK is a drop-in OpenAI replacement with built-in header support:

pip install clients/python

from membrain_client import MembrainClient

client = MembrainClient(
    base_url="http://localhost:8000",
    api_key="ck_live_...",  # if auth is enabled
)

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Hello"}],
)
print(response.choices[0].message.content)

Use as an Anthropic proxy

Membrain also proxies Anthropic's Messages API with PII protection:

curl http://localhost:8000/v1/messages \
  -H "Content-Type: application/json" \
  -H "x-api-key: $ANTHROPIC_API_KEY" \
  -H "anthropic-version: 2023-06-01" \
  -d '{
    "model": "claude-sonnet-4-20250514",
    "max_tokens": 1024,
    "messages": [{"role": "user", "content": "Hello, Claude!"}]
  }'
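
If you use the official Anthropic Python SDK, you can point it at Membrain through its base_url parameter. A sketch, assuming the same key and model as the curl example above:

from anthropic import Anthropic

# Point the Anthropic SDK at Membrain instead of api.anthropic.com.
client = Anthropic(
    base_url="http://localhost:8000",
    api_key="sk-ant-your-anthropic-key-here",
)

message = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=1024,
    messages=[{"role": "user", "content": "Hello, Claude!"}],
)
print(message.content[0].text)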

Use with Claude Code

Point Claude Code at Membrain to get PII protection and audit logging:

export ANTHROPIC_BASE_URL=http://localhost:8000
claude

Dashboard

Membrain includes a React dashboard for monitoring and management.

Running the dashboard

With Docker Compose: The dashboard is served automatically at http://localhost:3100.

In development mode:

cd dashboard
npm install
npm run dev

The Vite dev server starts at http://localhost:5173 and proxies API calls to the gateway at http://localhost:8001.

Dashboard tabs

The dashboard has 8 tabs:

Tab | What it shows
--- | ---
Chat | Interactive chat interface with model selector and PII detection badges
Overview | Aggregate stats: total requests, cost, cache hit rate, PII detections
PII Findings | All detected PII values with categories, filterable by type
Knowledge Search | Browse and search organizational knowledge entries
Audit Trail | Full request/response audit log with pagination and filters
Cost Breakdown | Per-provider, per-model cost and token usage
Cache | Cache hit rates (exact + semantic), estimated savings
Reports | Compliance reports, PII summaries, CSV/JSON export

Common Workflows

Add knowledge manually via API

Feed organizational context into Membrain so it can be injected into future requests:

curl -X POST http://localhost:8000/v1/knowledge \
  -H "Content-Type: application/json" \
  -d '{
    "content": "Our company vacation policy allows 20 days PTO per year. Unused days roll over up to 5 days.",
    "project": "hr-bot"
  }'

Search knowledge semantically:

curl "http://localhost:8000/v1/knowledge/search?q=how+many+vacation+days&limit=5"

Bulk ingest multiple documents:

curl -X POST http://localhost:8000/v1/knowledge/ingest \
  -H "Content-Type: application/json" \
  -d '{
    "documents": [
      {"content": "Engineering on-call rotation is weekly, starting Mondays.", "project": "eng"},
      {"content": "Our deployment process uses blue-green deploys on Kubernetes.", "project": "eng"},
      {"content": "Customer refunds must be approved by a manager for amounts over $500.", "project": "support"}
    ]
  }'
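
The knowledge endpoints are easy to script. A sketch using the requests library (an assumption; any HTTP client works), with payload shapes taken from the curl examples above:

import requests

BASE = "http://localhost:8000"

# Add a knowledge entry (same payload as the curl example above).
requests.post(
    f"{BASE}/v1/knowledge",
    json={
        "content": "Our company vacation policy allows 20 days PTO per year.",
        "project": "hr-bot",
    },
    timeout=10,
).raise_for_status()

# Search it semantically.
results = requests.get(
    f"{BASE}/v1/knowledge/search",
    params={"q": "how many vacation days", "limit": 5},
    timeout=10,
)
results.raise_for_status()
print(results.json())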

Set up rate limits per API key

When Postgres and Redis are configured, you can create projects and API keys with per-key rate limits and budgets. The admin API is protected by API key authentication.

Create a project:

curl -X POST http://localhost:8000/api/admin/projects \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer ck_live_your-admin-key" \
  -d '{
    "name": "my-app",
    "display_name": "My Application",
    "default_rate_limit_rpm": 100,
    "default_budget_daily_usd": 5.0,
    "default_budget_monthly_usd": 100.0
  }'

Create an API key with custom limits:

curl -X POST http://localhost:8000/api/admin/keys \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer ck_live_your-admin-key" \
  -d '{
    "name": "frontend-key",
    "project": "my-app",
    "rate_limit_rpm": 30,
    "budget_daily_usd": 2.0,
    "allowed_models": ["gpt-4o-mini", "gpt-4o"]
  }'

The response includes the raw API key (shown only once):

{
  "id": "a1b2c3d4-...",
  "raw_key": "ck_live_abc123...",
  "name": "frontend-key",
  "project": "my-app",
  "created_at": "2026-03-08T12:00:00Z"
}

Use the key in requests via either header format:

# Custom header
curl http://localhost:8000/v1/chat/completions \
  -H "X-Membrain-Api-Key: ck_live_abc123..." \
  -H "Content-Type: application/json" \
  -d '{"model": "gpt-4o-mini", "messages": [{"role": "user", "content": "Hello"}]}'

# Or standard Bearer token
curl http://localhost:8000/v1/chat/completions \
  -H "Authorization: Bearer ck_live_abc123..." \
  -H "Content-Type: application/json" \
  -d '{"model": "gpt-4o-mini", "messages": [{"role": "user", "content": "Hello"}]}'

When rate limited, you receive a 429 response with a Retry-After header; when the budget is exceeded, you receive a 402 response.
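
A client-side sketch of handling those responses: the helper below (hypothetical, built on the requests library) retries on 429 using the Retry-After header and fails fast on 402:

import time
import requests

def chat_with_backoff(payload: dict, api_key: str, max_retries: int = 3) -> dict:
    """POST a chat completion, retrying on 429 and failing fast on 402."""
    url = "http://localhost:8000/v1/chat/completions"
    headers = {"Authorization": f"Bearer {api_key}"}
    for _ in range(max_retries):
        resp = requests.post(url, json=payload, headers=headers, timeout=60)
        if resp.status_code == 429:
            # Rate limited: honor the Retry-After header, defaulting to 1 second.
            time.sleep(float(resp.headers.get("Retry-After", "1")))
            continue
        if resp.status_code == 402:
            raise RuntimeError("Budget exceeded for this API key")
        resp.raise_for_status()
        return resp.json()
    raise RuntimeError("Still rate limited after retries")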

Enable caching

Caching requires Redis. Once REDIS_URL is set, exact-match caching is enabled automatically.

For semantic caching (returns cached responses for semantically similar queries), you also need the knowledge system running (Postgres + embeddings):

# .env for full caching support
REDIS_URL=redis://localhost:6379
DATABASE_URL=postgresql+asyncpg://membrain:membrain@localhost:5432/membrain
EMBEDDING_BACKEND=local

Semantic caching uses a cosine similarity threshold of 0.95 by default, meaning only very similar queries hit the cache.
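
To see what that threshold means in practice, here is the cosine similarity computation on two embedding vectors. This is a plain-Python illustration of the formula, not Membrain's server-side scoring:

import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """cos(theta) = (a . b) / (|a| * |b|), in [-1, 1] for real vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Two nearly identical query embeddings score above the 0.95 default
# and would hit the semantic cache; dissimilar ones would not.
print(cosine_similarity([0.9, 0.1, 0.0], [0.88, 0.12, 0.01]))  # ~0.999 -> cache hit
print(cosine_similarity([0.9, 0.1, 0.0], [0.1, 0.9, 0.0]))     # ~0.22  -> miss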

Use local models with Ollama

For fully private, offline AI routing:

  1. Install and start Ollama:

ollama pull llama3.1
ollama serve

  2. Set the Ollama URL in .env (defaults to http://localhost:11434):

OLLAMA_URL=http://localhost:11434

  3. Route requests to local models using the privacy header:

curl http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "X-Membrain-Private: true" \
  -d '{
    "model": "llama3.1",
    "messages": [{"role": "user", "content": "Hello, local model!"}]
  }'

Monitor with Prometheus

Membrain exposes a Prometheus-compatible metrics endpoint:

curl http://localhost:8000/metrics

This returns counters, histograms, and gauges for requests, latency, tokens, and more. Add it to your Prometheus scrape config:

# prometheus.yml
scrape_configs:
  - job_name: 'membrain'
    static_configs:
      - targets: ['localhost:8000']
    metrics_path: '/metrics'

Next Steps

Now that you have Membrain running, the quick reference below summarizes the key endpoints to explore next.

Quick reference: key endpoints

Endpoint | Method | Description
--- | --- | ---
/v1/chat/completions | POST | OpenAI-compatible chat completions
/v1/messages | POST | Anthropic Messages API proxy
/v1/knowledge | POST | Add knowledge entry
/v1/knowledge/search | GET | Semantic knowledge search
/v1/knowledge/ingest | POST | Bulk knowledge ingestion
/api/admin/projects | POST | Create a project
/api/admin/keys | POST | Create an API key
/api/dashboard/overview | GET | Dashboard stats
/health | GET | Health check
/metrics | GET | Prometheus metrics
/docs | GET | Swagger UI