Membrain Quickstart Guide¶
Get the Membrain AI Safety Gateway running in under 5 minutes. This guide walks you through installation, configuration, your first API call, and the key features.
Table of Contents¶
- Prerequisites
- Installation
- Configuration
- Database Setup (Optional)
- Start the Server
- First API Call
- Dashboard
- Common Workflows
- Next Steps
Prerequisites¶
- Python 3.12+ (required)
- PostgreSQL 16 with pgvector (optional -- enables persistence, knowledge search, API key auth)
- Redis 7+ (optional -- enables caching, rate limiting, budget enforcement)
If you skip Postgres and Redis, Membrain runs entirely in-memory with no external dependencies.
Installation¶
From source (recommended for development)¶
git clone https://github.com/your-org/membrain.git
cd membrain
python -m venv .venv
source .venv/bin/activate
# Core install (minimal dependencies)
pip install -e .
# Full install (Postgres, Redis, caching, auth)
pip install -e ".[full]"
# Development install (full + pytest, ruff)
pip install -e ".[dev]"
Optional extras¶
# LiteLLM support (100+ model providers)
pip install -e ".[litellm]"
# Local sentence-transformers for knowledge embeddings
pip install -e ".[knowledge]"
# ML-based NER for PII detection (BERT model)
pip install -e ".[ml]"
Docker (all-in-one)¶
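Assuming the repository root ships a `docker-compose.yml` (the file name and location are assumptions — adjust if yours differs):

```bash
docker compose up -d
```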
This starts the gateway on http://localhost:8100, the dashboard on http://localhost:3100, plus Postgres and Redis automatically.
Configuration¶
Copy the example environment file and edit it:
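A likely starting point, assuming the repo includes a `.env.example` (adjust the filename if yours differs):

```bash
cp .env.example .env
```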
Minimal configuration¶
Create a .env file. If you want to proxy requests to cloud AI providers, add their API keys:
# Cloud provider API keys (optional — only needed if routing to these providers)
OPENAI_API_KEY=sk-your-openai-key-here
# ANTHROPIC_API_KEY=sk-ant-your-anthropic-key-here
You can also run Membrain with local models only (Ollama), or with no provider keys at all if you're only using the auth, PII detection, or knowledge features. By default, Membrain runs with in-memory storage, no caching, and no auth.
Recommended configuration¶
For production or full-featured usage:
# Server
HOST=0.0.0.0
PORT=8000
# Cloud provider API keys (add the providers you use)
OPENAI_API_KEY=sk-your-openai-key-here
ANTHROPIC_API_KEY=sk-ant-your-anthropic-key-here
# PostgreSQL — enables persistence, knowledge, audit logs, API key auth
DATABASE_URL=postgresql+asyncpg://membrain:membrain@localhost:5432/membrain
# Redis — enables caching, rate limiting, budget enforcement
REDIS_URL=redis://localhost:6379
# Rate limiting (requests per minute per key; 0 = disabled)
RATE_LIMIT_RPM=60
# Budget enforcement (USD; 0 = disabled)
BUDGET_DAILY_LIMIT_USD=10.0
BUDGET_MONTHLY_LIMIT_USD=200.0
All configuration options¶
| Variable | Default | Description |
|---|---|---|
| `HOST` | `0.0.0.0` | Server bind address |
| `PORT` | `8000` | Server port |
| `OPENAI_API_KEY` | (none) | OpenAI API key |
| `ANTHROPIC_API_KEY` | (none) | Anthropic API key |
| `GOOGLE_API_KEY` | (none) | Google AI API key |
| `DATABASE_URL` | (none) | PostgreSQL connection string |
| `REDIS_URL` | (none) | Redis connection string |
| `EMBEDDING_BACKEND` | `local` | `local` (sentence-transformers) or `openai` |
| `EMBEDDING_DIMENSION` | `384` | `384` for local, `1536` for OpenAI embeddings |
| `RATE_LIMIT_RPM` | `60` | Requests per minute per key (`0` = disabled) |
| `BUDGET_DAILY_LIMIT_USD` | `0.0` | Daily budget cap in USD (`0` = disabled) |
| `BUDGET_MONTHLY_LIMIT_USD` | `0.0` | Monthly budget cap in USD (`0` = disabled) |
| `OLLAMA_URL` | `http://localhost:11434` | Ollama server for local models |
| `PROXY_MODE` | `application` | `application`, `network`, or `hybrid` |
| `DEFAULT_PROVIDER` | `claude_cli` | Default AI provider |
| `DEFAULT_MODEL` | `sonnet` | Default model name |
Database Setup (Optional)¶
If you want persistence, knowledge search, audit logging, or API key authentication, you need PostgreSQL with the pgvector extension.
1. Create the database¶
# Using the pgvector Docker image (recommended)
docker run -d \
--name membrain-postgres \
-e POSTGRES_DB=membrain \
-e POSTGRES_USER=membrain \
-e POSTGRES_PASSWORD=membrain \
-p 5432:5432 \
pgvector/pgvector:pg16
# Or if you have Postgres installed locally, create the DB and enable pgvector:
# createdb membrain
# psql membrain -c "CREATE EXTENSION IF NOT EXISTS vector;"
2. Run migrations¶
Make sure DATABASE_URL is set in your .env, then run:
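The migration runner below is an assumption (the repo may use Alembic or a bundled CLI command — check the project README for the actual invocation):

```bash
# Assumption: Alembic-based migrations
alembic upgrade head
```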
This creates all tables: audit logs, knowledge entries (with vector embeddings), API keys, and projects.
Redis (Optional)¶
If you want caching, rate limiting, or budget enforcement:
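One way to get Redis running locally, using the official Docker image:

```bash
docker run -d --name membrain-redis -p 6379:6379 redis:7
```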
Then set REDIS_URL=redis://localhost:6379 in your .env.
Start the Server¶
Using the CLI¶
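The console-script name and subcommand below are assumptions — check the project README or `membrain --help` if your install differs:

```bash
membrain serve
```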
This starts the gateway and prints connection instructions:
Starting Membrain gateway on http://0.0.0.0:8000
Anthropic proxy: http://localhost:8000/v1/messages
OpenAI compat: http://localhost:8000/v1/chat/completions
Using uvicorn directly¶
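A sketch assuming the ASGI app lives at `membrain.main:app` (adjust the module path to your checkout):

```bash
uvicorn membrain.main:app --host 0.0.0.0 --port 8000
```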
Add `--reload` to that command during development for auto-reload on code changes.
Using Docker Compose¶
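From the repository root (assuming the stock `docker-compose.yml`):

```bash
docker compose up -d
```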
This starts the full stack: gateway (port 8100), dashboard (port 3100), Postgres, and Redis.
Verify it is running¶
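The gateway exposes a `/health` endpoint (see the quick reference at the end of this guide):

```bash
curl http://localhost:8000/health
```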
Expected response (illustrative — the exact payload may differ):
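```json
{"status": "ok"}
```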
Interactive API docs are available at http://localhost:8000/docs (Swagger UI).
First API Call¶
Send a chat completion request¶
Membrain exposes an OpenAI-compatible endpoint. Any application that uses the OpenAI SDK format can point at Membrain with no code changes.
curl http://localhost:8000/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{
"model": "gpt-4o-mini",
"messages": [
{"role": "user", "content": "What is the capital of France?"}
]
}'
Expected response:
{
"id": "chatcmpl-abc123",
"object": "chat.completion",
"created": 1709900000,
"model": "gpt-4o-mini",
"choices": [
{
"index": 0,
"message": {
"role": "assistant",
"content": "The capital of France is Paris."
},
"finish_reason": "stop"
}
],
"usage": {
"prompt_tokens": 14,
"completion_tokens": 8,
"total_tokens": 22
}
}
See PII detection in action¶
Send a request containing personal information. Membrain automatically detects and strips PII before it reaches the AI provider:
curl http://localhost:8000/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{
"model": "gpt-4o-mini",
"messages": [
{
"role": "user",
"content": "Summarize this customer record: John Smith, email john@example.com, SSN 123-45-6789, phone (555) 123-4567"
}
]
}'
Membrain will:
1. Detect the email, SSN, and phone number
2. Replace them with placeholders (e.g., [EMAIL_1], [SSN_1], [PHONE_1]) before sending to the provider
3. Restore the original values in the response back to you
4. Log the PII findings in the audit trail
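Illustratively, after step 2 the provider would see a sanitized message along these lines (placeholder names follow the pattern above):

```text
Summarize this customer record: John Smith, email [EMAIL_1], SSN [SSN_1], phone [PHONE_1]
```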
Use routing headers¶
Control how Membrain routes your request:
curl http://localhost:8000/v1/chat/completions \
-H "Content-Type: application/json" \
-H "X-Membrain-Tier: performance" \
-H "X-Membrain-User-Id: user-42" \
-H "X-Membrain-Project: my-app" \
-d '{
"model": "gpt-4o",
"messages": [{"role": "user", "content": "Hello"}]
}'
Available routing headers:
| Header | Values | Description |
|---|---|---|
| `X-Membrain-Tier` | `economy`, `balanced`, `performance` | Route to cheaper or faster models |
| `X-Membrain-Private` | `true` / `false` | Route only to local/private models (Ollama) |
| `X-Membrain-Max-Cost` | float (e.g., `0.01`) | Maximum cost per 1K tokens |
| `X-Membrain-User-Id` | string | User ID for audit and rate limiting |
| `X-Membrain-Project` | string | Project slug for cost tracking |
Streaming¶
Enable streaming by setting "stream": true:
curl http://localhost:8000/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{
"model": "gpt-4o-mini",
"messages": [{"role": "user", "content": "Tell me a short story"}],
"stream": true
}'
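The response arrives as server-sent events in the OpenAI-compatible chunk format, roughly like this (abridged, illustrative):

```text
data: {"id":"chatcmpl-abc123","object":"chat.completion.chunk","choices":[{"index":0,"delta":{"content":"Once"},"finish_reason":null}]}

data: {"id":"chatcmpl-abc123","object":"chat.completion.chunk","choices":[{"index":0,"delta":{"content":" upon"},"finish_reason":null}]}

data: [DONE]
```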
Use with the OpenAI Python SDK¶
Point the OpenAI SDK at Membrain -- zero code changes required:
from openai import OpenAI
client = OpenAI(
base_url="http://localhost:8000/v1",
api_key="sk-your-openai-key", # or any string if auth is disabled
)
response = client.chat.completions.create(
model="gpt-4o-mini",
messages=[{"role": "user", "content": "Hello from Membrain!"}],
)
print(response.choices[0].message.content)
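Routing headers (see the table above) can ride along with every SDK request via the OpenAI client's standard `default_headers` option:

```python
from openai import OpenAI

# Same client as above, with Membrain routing headers applied to all requests
client = OpenAI(
    base_url="http://localhost:8000/v1",
    api_key="sk-your-openai-key",
    default_headers={
        "X-Membrain-Tier": "economy",    # prefer cheaper models
        "X-Membrain-Project": "my-app",  # cost tracking
    },
)
```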
Use with the Membrain Python Client¶
The Membrain client SDK is a drop-in OpenAI replacement with built-in header support:
from membrain_client import MembrainClient
client = MembrainClient(
base_url="http://localhost:8000",
api_key="ck_live_...", # if auth is enabled
)
response = client.chat.completions.create(
model="gpt-4o",
messages=[{"role": "user", "content": "Hello"}],
)
print(response.choices[0].message.content)
Use as an Anthropic proxy¶
Membrain also proxies Anthropic's Messages API with PII protection:
curl http://localhost:8000/v1/messages \
-H "Content-Type: application/json" \
-H "x-api-key: $ANTHROPIC_API_KEY" \
-H "anthropic-version: 2023-06-01" \
-d '{
"model": "claude-sonnet-4-20250514",
"max_tokens": 1024,
"messages": [{"role": "user", "content": "Hello, Claude!"}]
}'
Use with Claude Code¶
Point Claude Code at Membrain to get PII protection and audit logging:
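One approach, assuming Claude Code picks up the standard `ANTHROPIC_BASE_URL` environment variable and the gateway is on port 8000:

```bash
export ANTHROPIC_BASE_URL=http://localhost:8000
claude
```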
Dashboard¶
Membrain includes a React dashboard for monitoring and management.
Running the dashboard¶
With Docker Compose, the dashboard is served automatically at http://localhost:3100.
In development mode:
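A sketch assuming an npm-based setup in a `dashboard/` directory (directory name and package manager are assumptions):

```bash
cd dashboard
npm install
npm run dev
```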
The Vite dev server starts at http://localhost:5173 and proxies API calls to the gateway at http://localhost:8001.
Dashboard tabs¶
The dashboard has 8 tabs:
| Tab | What it shows |
|---|---|
| Chat | Interactive chat interface with model selector and PII detection badges |
| Overview | Aggregate stats: total requests, cost, cache hit rate, PII detections |
| PII Findings | All detected PII values with categories, filterable by type |
| Knowledge Search | Browse and search organizational knowledge entries |
| Audit Trail | Full request/response audit log with pagination and filters |
| Cost Breakdown | Per-provider, per-model cost and token usage |
| Cache | Cache hit rates (exact + semantic), estimated savings |
| Reports | Compliance reports, PII summaries, CSV/JSON export |
Common Workflows¶
Add knowledge manually via API¶
Feed organizational context into Membrain so it can be injected into future requests:
curl -X POST http://localhost:8000/v1/knowledge \
-H "Content-Type: application/json" \
-d '{
"content": "Our company vacation policy allows 20 days PTO per year. Unused days roll over up to 5 days.",
"project": "hr-bot"
}'
Search knowledge semantically:
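A sketch of the search call; the query parameter name (`q`) is an assumption — check the API reference:

```bash
curl "http://localhost:8000/v1/knowledge/search?q=vacation%20policy&project=hr-bot"
```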
Bulk ingest multiple documents:
curl -X POST http://localhost:8000/v1/knowledge/ingest \
-H "Content-Type: application/json" \
-d '{
"documents": [
{"content": "Engineering on-call rotation is weekly, starting Mondays.", "project": "eng"},
{"content": "Our deployment process uses blue-green deploys on Kubernetes.", "project": "eng"},
{"content": "Customer refunds must be approved by a manager for amounts over $500.", "project": "support"}
]
}'
Set up rate limits per API key¶
When Postgres and Redis are configured, you can create projects and API keys with per-key rate limits and budgets. The admin API is protected by API key authentication.
Create a project:
curl -X POST http://localhost:8000/api/admin/projects \
-H "Content-Type: application/json" \
-H "Authorization: Bearer ck_live_your-admin-key" \
-d '{
"name": "my-app",
"display_name": "My Application",
"default_rate_limit_rpm": 100,
"default_budget_daily_usd": 5.0,
"default_budget_monthly_usd": 100.0
}'
Create an API key with custom limits:
curl -X POST http://localhost:8000/api/admin/keys \
-H "Content-Type: application/json" \
-H "Authorization: Bearer ck_live_your-admin-key" \
-d '{
"name": "frontend-key",
"project": "my-app",
"rate_limit_rpm": 30,
"budget_daily_usd": 2.0,
"allowed_models": ["gpt-4o-mini", "gpt-4o"]
}'
The response includes the raw API key (shown only once):
{
"id": "a1b2c3d4-...",
"raw_key": "ck_live_abc123...",
"name": "frontend-key",
"project": "my-app",
"created_at": "2026-03-08T12:00:00Z"
}
Use the key in requests via either header format:
# Custom header
curl http://localhost:8000/v1/chat/completions \
-H "X-Membrain-Api-Key: ck_live_abc123..." \
-H "Content-Type: application/json" \
-d '{"model": "gpt-4o-mini", "messages": [{"role": "user", "content": "Hello"}]}'
# Or standard Bearer token
curl http://localhost:8000/v1/chat/completions \
-H "Authorization: Bearer ck_live_abc123..." \
-H "Content-Type: application/json" \
-d '{"model": "gpt-4o-mini", "messages": [{"role": "user", "content": "Hello"}]}'
When rate limited, you receive a 429 response with a Retry-After header; when a budget is exceeded, you receive a 402 (Payment Required) response.
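On the client side, a minimal retry sketch using the OpenAI Python SDK, which raises `RateLimitError` on 429:

```python
import time

from openai import OpenAI, RateLimitError

client = OpenAI(base_url="http://localhost:8000/v1", api_key="ck_live_abc123...")

def chat_with_retry(messages, retries=3):
    for _ in range(retries):
        try:
            return client.chat.completions.create(
                model="gpt-4o-mini", messages=messages
            )
        except RateLimitError as e:
            # Honor Membrain's Retry-After header before trying again
            time.sleep(int(e.response.headers.get("Retry-After", "1")))
    raise RuntimeError("rate limited after retries")
```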
Enable caching¶
Caching requires Redis. Once REDIS_URL is set, exact-match caching is enabled automatically.
For semantic caching (returns cached responses for semantically similar queries), you also need the knowledge system running (Postgres + embeddings):
# .env for full caching support
REDIS_URL=redis://localhost:6379
DATABASE_URL=postgresql+asyncpg://membrain:membrain@localhost:5432/membrain
EMBEDDING_BACKEND=local
Semantic caching uses a cosine similarity threshold of 0.95 by default, meaning only very similar queries hit the cache.
Use local models with Ollama¶
For fully private, offline AI routing:
- Install and start Ollama:
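```bash
# See https://ollama.com for platform installers; then pull a model
# (llama3.1 matches the example request below) and start the server
ollama pull llama3.1
ollama serve
```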
- Set the Ollama URL in `.env` (defaults to `http://localhost:11434`, so this is only needed if Ollama runs elsewhere):
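```bash
# .env
OLLAMA_URL=http://localhost:11434
```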
- Route requests to local models using the privacy header:
curl http://localhost:8000/v1/chat/completions \
-H "Content-Type: application/json" \
-H "X-Membrain-Private: true" \
-d '{
"model": "llama3.1",
"messages": [{"role": "user", "content": "Hello, local model!"}]
}'
Monitor with Prometheus¶
Membrain exposes a Prometheus-compatible metrics endpoint:
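```bash
curl http://localhost:8000/metrics
```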
This returns counters, histograms, and gauges for requests, latency, tokens, and more. Add it to your Prometheus scrape config:
# prometheus.yml
scrape_configs:
- job_name: 'membrain'
static_configs:
- targets: ['localhost:8000']
metrics_path: '/metrics'
Next Steps¶
Now that you have Membrain running, explore these resources:
- Configuration Reference -- All settings and environment variables
- API Reference -- Complete endpoint documentation
- Providers -- Supported AI providers and routing
- Python SDK -- Drop-in OpenAI SDK replacement
- API Documentation -- Interactive Swagger UI (available when server is running)
Quick reference: key endpoints¶
| Endpoint | Method | Description |
|---|---|---|
| `/v1/chat/completions` | POST | OpenAI-compatible chat completions |
| `/v1/messages` | POST | Anthropic Messages API proxy |
| `/v1/knowledge` | POST | Add knowledge entry |
| `/v1/knowledge/search` | GET | Semantic knowledge search |
| `/v1/knowledge/ingest` | POST | Bulk knowledge ingestion |
| `/api/admin/projects` | POST | Create a project |
| `/api/admin/keys` | POST | Create an API key |
| `/api/dashboard/overview` | GET | Dashboard stats |
| `/health` | GET | Health check |
| `/metrics` | GET | Prometheus metrics |
| `/docs` | GET | Swagger UI |