Rate Limiting

Token-bucket rate limiting per client IP or API key, enabled with a short block in your policy file.

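Rate limiting protects your upstream LLM quota and prevents abuse. It uses a token-bucket algorithm: each request consumes a token, and tokens refill at a steady rate up to a fixed maximum. When a bucket is empty, requests are rejected until tokens refill.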
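The algorithm can be sketched in a few lines of Python. This is a minimal, single-threaded illustration, not the proxy's implementation: TokenBucket and its parameters are hypothetical names, and the sketch assumes each request costs one token and a new bucket starts full. The clock is injectable so the behavior can be checked without waiting.

```python
import time
from typing import Callable


class TokenBucket:
    """Hypothetical token bucket: holds at most `capacity` tokens,
    refilled continuously at `refill_per_second`."""

    def __init__(self, capacity: float, refill_per_second: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.capacity = capacity
        self.refill_per_second = refill_per_second
        self.clock = clock
        self.tokens = capacity  # assumption: a new bucket starts full
        self.last = clock()

    def allow(self) -> bool:
        """Add tokens for the time elapsed since the last call, then try to spend one."""
        now = self.clock()
        elapsed = now - self.last
        self.last = now
        # Refill, but never beyond the bucket's capacity.
        self.tokens = min(self.capacity, self.tokens + elapsed * self.refill_per_second)
        if self.tokens >= 1:
            self.tokens -= 1  # assumption: every request costs exactly one token
            return True
        return False
```

A proxy would keep one bucket per client key and would need a lock around allow() if requests are handled concurrently.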

Add a rate_limit block to config/policy.yaml:

rate_limit:
  requests_per_minute: 60   # token refill rate
  burst: 10                 # max tokens available at once
  key_by: ip                # ip | api_key
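With the example values, tokens refill at 60 / 60 = 1 per second and a bucket holds at most 10. The helpers below work through that arithmetic. They assume, as the config comments suggest, that requests_per_minute is the refill rate and burst is the bucket capacity, and that a bucket starts full; the function names are hypothetical.

```python
import math


def refill_per_second(requests_per_minute: float) -> float:
    """Convert the configured per-minute refill rate to tokens per second."""
    return requests_per_minute / 60.0


def max_requests_in_window(requests_per_minute: float, burst: int, seconds: float) -> int:
    """Most requests a client can get accepted in `seconds`, starting from a full bucket:
    the whole burst at once, plus one request per token refilled during the window."""
    return burst + math.floor(refill_per_second(requests_per_minute) * seconds)


def seconds_until_next_token(requests_per_minute: float) -> float:
    """Once the bucket is drained, the wait before one more request is allowed."""
    return 1.0 / refill_per_second(requests_per_minute)
```

For the example config, a client that has been idle can send 10 requests at once and then about one per second, for at most 70 accepted requests in its first 60 seconds. Raising burst tolerates spikier traffic without changing the long-run average, which requests_per_minute sets.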

Key strategies

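key_by: ip: One bucket per client IP. Behind a reverse proxy, the client address is read from X-Real-IP, which the reverse proxy (for example nginx or Caddy) should set to the connecting client's address ($remote_addr in nginx). X-Forwarded-For is intentionally ignored, because clients can inject arbitrary values into it to bypass rate limiting.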

key_by: api_key: One bucket per API key, taken from the x-llm-api-key header or an Authorization: Bearer header. Use this when your users authenticate with their own keys.
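The two key strategies can be sketched as functions that turn request headers into a bucket key. This illustrates the rules above under assumptions and is not the proxy's code: the function names, the behind_proxy flag, the fallback to the TCP peer address when no reverse proxy is in front, and giving x-llm-api-key precedence over Authorization: Bearer are all hypothetical.

```python
from typing import Dict, Mapping, Optional


def _lower_headers(headers: Mapping[str, str]) -> Dict[str, str]:
    # HTTP header names are case-insensitive, so normalize them once.
    return {name.lower(): value for name, value in headers.items()}


def ip_bucket_key(headers: Mapping[str, str], peer_addr: str, behind_proxy: bool) -> str:
    """key_by: ip. Trust X-Real-IP only when a reverse proxy sets it;
    otherwise use the TCP peer address. X-Forwarded-For is never read."""
    if behind_proxy:
        real_ip = _lower_headers(headers).get("x-real-ip", "").strip()
        if real_ip:
            return real_ip
    return peer_addr


def api_key_bucket_key(headers: Mapping[str, str]) -> Optional[str]:
    """key_by: api_key. Return the caller's key, or None if none was sent.
    Assumption: x-llm-api-key wins when both headers are present."""
    h = _lower_headers(headers)
    explicit = h.get("x-llm-api-key", "").strip()
    if explicit:
        return explicit
    scheme, _, token = h.get("authorization", "").partition(" ")
    if scheme.lower() == "bearer" and token.strip():
        return token.strip()
    return None
```

The behind_proxy check matters: if clients can reach the service directly, they can set X-Real-IP themselves, so the header is only meaningful when a reverse proxy overwrites it. How requests with no usable API key are handled is not covered on this page; the sketch simply returns None.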

When limits are exceeded

When a client's bucket is empty, the proxy rejects the request immediately with HTTP 429 Too Many Requests and this body:

{"error": "rate limit exceeded"}

The request is logged as action: rate_limited in the audit log.
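Because rejected requests are not queued, clients may want to retry after a delay. A minimal client-side sketch, assuming a hypothetical send callable that performs one request and returns (status, body), retries with exponential backoff. This page documents only the status code and body, so the sketch does not depend on any other response header; the function name and delay values are illustrative.

```python
import time
from typing import Callable, Tuple

# Hypothetical: performs one request and returns (HTTP status, body text).
Send = Callable[[], Tuple[int, str]]


def send_with_backoff(send: Send, max_attempts: int = 5, base_delay: float = 0.5,
                      sleep: Callable[[float], None] = time.sleep) -> Tuple[int, str]:
    """Call `send`, retrying on HTTP 429 with exponentially growing delays.

    Returns the first non-429 response, or the last 429 if every attempt
    was rate limited.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        status, body = send()
        if status != 429:
            return status, body
        if attempt < max_attempts - 1:
            # 0.5 s, 1 s, 2 s, ... with the default base_delay
            sleep(base_delay * 2 ** attempt)
    return status, body
```

Adding random jitter to each delay helps keep many clients from retrying in lockstep.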

Disabling rate limiting

Remove or comment out the entire rate_limit block. Without it, requests are not rate limited and no rate-limiting overhead is added.
