Membrain Quickstart Guide

Get the Membrain AI Safety Gateway running in under 5 minutes. This guide walks you through installation, configuration, your first API call, and key features.


Table of Contents

  1. Prerequisites
  2. Installation
  3. Configuration
  4. Database Setup (Optional)
  5. Start the Server
  6. First API Call
  7. Dashboard
  8. Common Workflows
  9. Next Steps

Prerequisites

  • Python 3.12+ (required)
  • PostgreSQL 16 with pgvector (optional -- enables persistence, knowledge search, API key auth)
  • Redis 7+ (optional -- enables caching, rate limiting, budget enforcement)

If you skip Postgres and Redis, Membrain runs entirely in-memory with no external dependencies.


Installation

git clone https://github.com/your-org/membrain.git
cd membrain
python -m venv .venv
source .venv/bin/activate

# Core install (minimal dependencies)
pip install -e .

# Full install (Postgres, Redis, caching, auth)
pip install -e ".[full]"

# Development install (full + pytest, ruff)
pip install -e ".[dev]"

Optional extras

# LiteLLM support (100+ model providers)
pip install -e ".[litellm]"

# Local sentence-transformers for knowledge embeddings
pip install -e ".[knowledge]"

# ML-based NER for PII detection (BERT model)
pip install -e ".[ml]"

Docker (all-in-one)

cp .env.example .env
# Edit .env with your API keys (see Configuration below)
docker compose up

This starts the gateway on http://localhost:8100, the dashboard on http://localhost:3100, plus Postgres and Redis automatically.


Configuration

Copy the example environment file and edit it:

cp .env.example .env

Minimal configuration

If you want to proxy requests to cloud AI providers, add their API keys to your .env:

# Cloud provider API keys (optional — only needed if routing to these providers)
OPENAI_API_KEY=sk-your-openai-key-here
# ANTHROPIC_API_KEY=sk-ant-your-anthropic-key-here

You can also run Membrain with local models only (Ollama), or with no provider keys at all if you only need auth, PII detection, or knowledge features. By default, Membrain runs with in-memory storage, no caching, and no auth.

For production or full-featured usage:

# Server
HOST=0.0.0.0
PORT=8000

# Cloud provider API keys (add the providers you use)
OPENAI_API_KEY=sk-your-openai-key-here
ANTHROPIC_API_KEY=sk-ant-your-anthropic-key-here

# PostgreSQL — enables persistence, knowledge, audit logs, API key auth
DATABASE_URL=postgresql+asyncpg://membrain:membrain@localhost:5432/membrain

# Redis — enables caching, rate limiting, budget enforcement
REDIS_URL=redis://localhost:6379

# Rate limiting (requests per minute per key; 0 = disabled)
RATE_LIMIT_RPM=60

# Budget enforcement (USD; 0 = disabled)
BUDGET_DAILY_LIMIT_USD=10.0
BUDGET_MONTHLY_LIMIT_USD=200.0

All configuration options

Variable | Default | Description
--- | --- | ---
HOST | 0.0.0.0 | Server bind address
PORT | 8000 | Server port
OPENAI_API_KEY | (none) | OpenAI API key
ANTHROPIC_API_KEY | (none) | Anthropic API key
GOOGLE_API_KEY | (none) | Google AI API key
DATABASE_URL | (none) | PostgreSQL connection string
REDIS_URL | (none) | Redis connection string
EMBEDDING_BACKEND | local | local (sentence-transformers) or openai
EMBEDDING_DIMENSION | 384 | 384 for local, 1536 for OpenAI embeddings
RATE_LIMIT_RPM | 60 | Requests per minute per key (0 = disabled)
BUDGET_DAILY_LIMIT_USD | 0.0 | Daily budget cap in USD (0 = disabled)
BUDGET_MONTHLY_LIMIT_USD | 0.0 | Monthly budget cap in USD (0 = disabled)
OLLAMA_URL | http://localhost:11434 | Ollama server for local models
PROXY_MODE | application | application, network, or hybrid
DEFAULT_PROVIDER | claude_cli | Default AI provider
DEFAULT_MODEL | sonnet | Default model name

Database Setup (Optional)

If you want persistence, knowledge search, audit logging, or API key authentication, you need PostgreSQL with the pgvector extension.

1. Create the database

# Using the pgvector Docker image (recommended)
docker run -d \
  --name membrain-postgres \
  -e POSTGRES_DB=membrain \
  -e POSTGRES_USER=membrain \
  -e POSTGRES_PASSWORD=membrain \
  -p 5432:5432 \
  pgvector/pgvector:pg16

# Or if you have Postgres installed locally, create the DB and enable pgvector:
# createdb membrain
# psql membrain -c "CREATE EXTENSION IF NOT EXISTS vector;"

2. Run migrations

Make sure DATABASE_URL is set in your .env, then run:

alembic upgrade head

This creates all tables: audit logs, knowledge entries (with vector embeddings), API keys, and projects.

Redis (Optional)

If you want caching, rate limiting, or budget enforcement:

docker run -d --name membrain-redis -p 6379:6379 redis:7-alpine

Then set REDIS_URL=redis://localhost:6379 in your .env.


Start the Server

Using the CLI

membrain

This starts the gateway and prints connection instructions:

Starting Membrain gateway on http://0.0.0.0:8000
  Anthropic proxy: http://localhost:8000/v1/messages
  OpenAI compat:   http://localhost:8000/v1/chat/completions

Using uvicorn directly

uvicorn membrain.main:app --host 0.0.0.0 --port 8000

Add --reload during development for auto-reload on code changes:

uvicorn membrain.main:app --host 0.0.0.0 --port 8000 --reload

Using Docker Compose

docker compose up

This starts the full stack: gateway (port 8100), dashboard (port 3100), Postgres, and Redis.

Verify it is running

curl http://localhost:8000/health

Expected response:

{"status": "ok"}

Interactive API docs are available at http://localhost:8000/docs (Swagger UI).
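
For scripted setups, you can poll /health until the gateway is ready before sending traffic. A minimal sketch using only the Python standard library and the endpoint shown above:

import time
import urllib.request

# Poll /health until the gateway answers with 200, then proceed.
def wait_for_gateway(url: str = "http://localhost:8000/health", timeout: float = 30.0) -> None:
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            with urllib.request.urlopen(url, timeout=2) as resp:
                if resp.status == 200:
                    return
        except OSError:
            pass  # server not up yet
        time.sleep(0.5)
    raise TimeoutError(f"Gateway at {url} did not become healthy in {timeout}s")

wait_for_gateway()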


First API Call

Send a chat completion request

Membrain exposes an OpenAI-compatible endpoint. Any application that speaks the OpenAI API format can point at Membrain with no code changes.

curl http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-4o-mini",
    "messages": [
      {"role": "user", "content": "What is the capital of France?"}
    ]
  }'

Expected response:

{
  "id": "chatcmpl-abc123",
  "object": "chat.completion",
  "created": 1709900000,
  "model": "gpt-4o-mini",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "The capital of France is Paris."
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 14,
    "completion_tokens": 8,
    "total_tokens": 22
  }
}

See PII detection in action

Send a request containing personal information. Membrain automatically detects and redacts PII before it reaches the AI provider:

curl http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-4o-mini",
    "messages": [
      {
        "role": "user",
        "content": "Summarize this customer record: John Smith, email john@example.com, SSN 123-45-6789, phone (555) 123-4567"
      }
    ]
  }'

Membrain will:

  1. Detect the email, SSN, and phone number
  2. Replace them with placeholders (e.g., [EMAIL_1], [SSN_1], [PHONE_1]) before sending to the provider
  3. Restore the original values in the response back to you
  4. Log the PII findings in the audit trail
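
To make the placeholder round-trip concrete, here is a minimal Python sketch of the detect-substitute-restore cycle. This only illustrates the mechanism described above; it is not Membrain's actual detector, which covers more categories and optionally uses an ML-based NER model (the [ml] extra):

import re

# Hypothetical patterns for illustration only.
PATTERNS = {
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+"),
    "SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "PHONE": re.compile(r"\(\d{3}\)\s*\d{3}-\d{4}"),
}

def redact(text: str) -> tuple[str, dict[str, str]]:
    """Replace PII matches with numbered placeholders; remember the originals."""
    mapping: dict[str, str] = {}
    for category, pattern in PATTERNS.items():
        for i, match in enumerate(pattern.findall(text), start=1):
            placeholder = f"[{category}_{i}]"
            mapping[placeholder] = match
            text = text.replace(match, placeholder)
    return text, mapping

def restore(text: str, mapping: dict[str, str]) -> str:
    """Put the original values back into the provider's response."""
    for placeholder, original in mapping.items():
        text = text.replace(placeholder, original)
    return text

redacted, mapping = redact("John Smith, email john@example.com, SSN 123-45-6789")
print(redacted)                     # John Smith, email [EMAIL_1], SSN [SSN_1]
print(restore(redacted, mapping))   # original values restored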

Use routing headers

Control how Membrain routes your request:

curl http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "X-Membrain-Tier: performance" \
  -H "X-Membrain-User-Id: user-42" \
  -H "X-Membrain-Project: my-app" \
  -d '{
    "model": "gpt-4o",
    "messages": [{"role": "user", "content": "Hello"}]
  }'

Available routing headers:

Header | Values | Description
--- | --- | ---
X-Membrain-Tier | economy, balanced, performance | Route to cheaper or faster models
X-Membrain-Private | true / false | Route only to local/private models (Ollama)
X-Membrain-Max-Cost | float (e.g., 0.01) | Maximum cost per 1K tokens
X-Membrain-User-Id | string | User ID for audit and rate limiting
X-Membrain-Project | string | Project slug for cost tracking
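
These headers also work from the OpenAI Python SDK (shown later in this guide) via its extra_headers request option:

from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="sk-your-openai-key")

# Routing headers from the table above, passed per request.
response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Hello"}],
    extra_headers={
        "X-Membrain-Tier": "performance",
        "X-Membrain-User-Id": "user-42",
        "X-Membrain-Project": "my-app",
    },
)
print(response.choices[0].message.content)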

Streaming

Enable streaming by setting "stream": true:

curl http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-4o-mini",
    "messages": [{"role": "user", "content": "Tell me a short story"}],
    "stream": true
  }'
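
The same request works from the OpenAI Python SDK: with stream=True the client returns an iterator of chunks, each carrying an incremental content delta:

from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="sk-your-openai-key")

stream = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": "Tell me a short story"}],
    stream=True,
)
for chunk in stream:
    # Some chunks carry no content delta (e.g., role or finish markers); skip those.
    if chunk.choices and chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)
print()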

Use with the OpenAI Python SDK

Point the OpenAI SDK at Membrain -- zero code changes required:

from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:8000/v1",
    api_key="sk-your-openai-key",  # or any string if auth is disabled
)

response = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": "Hello from Membrain!"}],
)
print(response.choices[0].message.content)

Use with the Membrain Python Client

The Membrain client SDK is a drop-in OpenAI replacement with built-in header support:

pip install clients/python

from membrain_client import MembrainClient

client = MembrainClient(
    base_url="http://localhost:8000",
    api_key="ck_live_...",  # if auth is enabled
)

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Hello"}],
)
print(response.choices[0].message.content)

Use as an Anthropic proxy

Membrain also proxies Anthropic's Messages API with PII protection:

curl http://localhost:8000/v1/messages \
  -H "Content-Type: application/json" \
  -H "x-api-key: $ANTHROPIC_API_KEY" \
  -H "anthropic-version: 2023-06-01" \
  -d '{
    "model": "claude-sonnet-4-20250514",
    "max_tokens": 1024,
    "messages": [{"role": "user", "content": "Hello, Claude!"}]
  }'
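
If you use the official Anthropic Python SDK, you can point it at Membrain through its base_url parameter. A sketch, assuming the same key and model as the curl example above:

from anthropic import Anthropic

# Point the Anthropic SDK at Membrain instead of api.anthropic.com.
client = Anthropic(
    base_url="http://localhost:8000",
    api_key="sk-ant-your-anthropic-key-here",
)

message = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=1024,
    messages=[{"role": "user", "content": "Hello, Claude!"}],
)
print(message.content[0].text)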

Use with Claude Code

Point Claude Code at Membrain to get PII protection and audit logging:

export ANTHROPIC_BASE_URL=http://localhost:8000
claude

Dashboard

Membrain includes a React dashboard for monitoring and management.

Running the dashboard

With Docker Compose: The dashboard is served automatically at http://localhost:3100.

In development mode:

cd dashboard
npm install
npm run dev

The Vite dev server starts at http://localhost:5173 and proxies API calls to the gateway at http://localhost:8001.

Dashboard tabs

The dashboard has 8 tabs:

Tab | What it shows
--- | ---
Chat | Interactive chat interface with model selector and PII detection badges
Overview | Aggregate stats: total requests, cost, cache hit rate, PII detections
PII Findings | All detected PII values with categories, filterable by type
Knowledge Search | Browse and search organizational knowledge entries
Audit Trail | Full request/response audit log with pagination and filters
Cost Breakdown | Per-provider, per-model cost and token usage
Cache | Cache hit rates (exact + semantic), estimated savings
Reports | Compliance reports, PII summaries, CSV/JSON export

Common Workflows

Add knowledge manually via API

Feed organizational context into Membrain so it can be injected into future requests:

curl -X POST http://localhost:8000/v1/knowledge \
  -H "Content-Type: application/json" \
  -d '{
    "content": "Our company vacation policy allows 20 days PTO per year. Unused days roll over up to 5 days.",
    "project": "hr-bot"
  }'

Search knowledge semantically:

curl "http://localhost:8000/v1/knowledge/search?q=how+many+vacation+days&limit=5"

Bulk ingest multiple documents:

curl -X POST http://localhost:8000/v1/knowledge/ingest \
  -H "Content-Type: application/json" \
  -d '{
    "documents": [
      {"content": "Engineering on-call rotation is weekly, starting Mondays.", "project": "eng"},
      {"content": "Our deployment process uses blue-green deploys on Kubernetes.", "project": "eng"},
      {"content": "Customer refunds must be approved by a manager for amounts over $500.", "project": "support"}
    ]
  }'
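
The knowledge endpoints are easy to script. A sketch using the requests library (an assumption; any HTTP client works), with payload shapes taken from the curl examples above:

import requests

BASE = "http://localhost:8000"

# Add a knowledge entry (same payload as the curl example above).
requests.post(
    f"{BASE}/v1/knowledge",
    json={
        "content": "Our company vacation policy allows 20 days PTO per year.",
        "project": "hr-bot",
    },
    timeout=10,
).raise_for_status()

# Search it semantically.
results = requests.get(
    f"{BASE}/v1/knowledge/search",
    params={"q": "how many vacation days", "limit": 5},
    timeout=10,
)
results.raise_for_status()
print(results.json())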

Set up rate limits per API key

When Postgres and Redis are configured, you can create projects and API keys with per-key rate limits and budgets. The admin API is protected by API key authentication.

Create a project:

curl -X POST http://localhost:8000/api/admin/projects \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer ck_live_your-admin-key" \
  -d '{
    "name": "my-app",
    "display_name": "My Application",
    "default_rate_limit_rpm": 100,
    "default_budget_daily_usd": 5.0,
    "default_budget_monthly_usd": 100.0
  }'

Create an API key with custom limits:

curl -X POST http://localhost:8000/api/admin/keys \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer ck_live_your-admin-key" \
  -d '{
    "name": "frontend-key",
    "project": "my-app",
    "rate_limit_rpm": 30,
    "budget_daily_usd": 2.0,
    "allowed_models": ["gpt-4o-mini", "gpt-4o"]
  }'

The response includes the raw API key (shown only once):

{
  "id": "a1b2c3d4-...",
  "raw_key": "ck_live_abc123...",
  "name": "frontend-key",
  "project": "my-app",
  "created_at": "2026-03-08T12:00:00Z"
}

Use the key in requests via either header format:

# Custom header
curl http://localhost:8000/v1/chat/completions \
  -H "X-Membrain-Api-Key: ck_live_abc123..." \
  -H "Content-Type: application/json" \
  -d '{"model": "gpt-4o-mini", "messages": [{"role": "user", "content": "Hello"}]}'

# Or standard Bearer token
curl http://localhost:8000/v1/chat/completions \
  -H "Authorization: Bearer ck_live_abc123..." \
  -H "Content-Type: application/json" \
  -d '{"model": "gpt-4o-mini", "messages": [{"role": "user", "content": "Hello"}]}'

When rate limited, you receive a 429 response with a Retry-After header; when the budget is exceeded, you receive a 402 response.
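
A client-side sketch of handling those responses: the helper below (hypothetical, built on the requests library) retries on 429 using the Retry-After header and fails fast on 402:

import time
import requests

def chat_with_backoff(payload: dict, api_key: str, max_retries: int = 3) -> dict:
    """POST a chat completion, retrying on 429 and failing fast on 402."""
    url = "http://localhost:8000/v1/chat/completions"
    headers = {"Authorization": f"Bearer {api_key}"}
    for _ in range(max_retries):
        resp = requests.post(url, json=payload, headers=headers, timeout=60)
        if resp.status_code == 429:
            # Rate limited: honor the Retry-After header, defaulting to 1 second.
            time.sleep(float(resp.headers.get("Retry-After", "1")))
            continue
        if resp.status_code == 402:
            raise RuntimeError("Budget exceeded for this API key")
        resp.raise_for_status()
        return resp.json()
    raise RuntimeError("Still rate limited after retries")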

Enable caching

Caching requires Redis. Once REDIS_URL is set, exact-match caching is enabled automatically.

For semantic caching (returns cached responses for semantically similar queries), you also need the knowledge system running (Postgres + embeddings):

# .env for full caching support
REDIS_URL=redis://localhost:6379
DATABASE_URL=postgresql+asyncpg://membrain:membrain@localhost:5432/membrain
EMBEDDING_BACKEND=local

Semantic caching uses a cosine similarity threshold of 0.95 by default, meaning only very similar queries hit the cache.
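
To see what that threshold means in practice, here is the cosine similarity computation on two embedding vectors. This is a plain-Python illustration of the formula, not Membrain's server-side scoring:

import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """cos(theta) = (a . b) / (|a| * |b|), in [-1, 1] for real vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Two nearly identical query embeddings score above the 0.95 default
# and would hit the semantic cache; dissimilar ones would not.
print(cosine_similarity([0.9, 0.1, 0.0], [0.88, 0.12, 0.01]))  # ~0.999 -> cache hit
print(cosine_similarity([0.9, 0.1, 0.0], [0.1, 0.9, 0.0]))     # ~0.22  -> miss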

Use local models with Ollama

For fully private, offline AI routing:

  1. Install and start Ollama:

ollama pull llama3.1
ollama serve

  2. Set the Ollama URL in .env (defaults to http://localhost:11434):

OLLAMA_URL=http://localhost:11434

  3. Route requests to local models using the privacy header:

curl http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "X-Membrain-Private: true" \
  -d '{
    "model": "llama3.1",
    "messages": [{"role": "user", "content": "Hello, local model!"}]
  }'

Monitor with Prometheus

Membrain exposes a Prometheus-compatible metrics endpoint:

curl http://localhost:8000/metrics

This returns counters, histograms, and gauges for requests, latency, tokens, and more. Add it to your Prometheus scrape config:

# prometheus.yml
scrape_configs:
  - job_name: 'membrain'
    static_configs:
      - targets: ['localhost:8000']
    metrics_path: '/metrics'

Next Steps

Now that you have Membrain running, the quick reference below summarizes the key endpoints to explore next.

Quick reference: key endpoints

Endpoint | Method | Description
--- | --- | ---
/v1/chat/completions | POST | OpenAI-compatible chat completions
/v1/messages | POST | Anthropic Messages API proxy
/v1/knowledge | POST | Add knowledge entry
/v1/knowledge/search | GET | Semantic knowledge search
/v1/knowledge/ingest | POST | Bulk knowledge ingestion
/api/admin/projects | POST | Create a project
/api/admin/keys | POST | Create an API key
/api/dashboard/overview | GET | Dashboard stats
/health | GET | Health check
/metrics | GET | Prometheus metrics
/docs | GET | Swagger UI