PromptShield

Prometheus Metrics

Built-in exporter on /metrics. Same port as the proxy. No extra process.

Uses prometheus/client_golang. The exporter is always on; there is no flag to enable or disable it.

curl http://localhost:8080/metrics
| Metric | Type | Labels | Description |
| --- | --- | --- | --- |
| promptshield_requests_total | Counter | action, provider, model | Every request, labeled by outcome |
| promptshield_request_duration_seconds | Histogram | action, provider, model | End-to-end latency, including the upstream LLM |
| promptshield_tokens_total | Counter | token_type, provider, model | Token counts; token_type is prompt, completion, or total |
| promptshield_entities_detected_total | Counter | entity_type, provider | PII entities detected by the detection engine |
| promptshield_injections_detected_total | Counter | provider, model | Injection attempts detected; always 0 until injection detection ships |
| promptshield_response_scans_total | Counter | provider, model | LLM responses scanned |

action values: allow, mask, block, rate_limited, error

Duration histogram buckets: 50ms, 100ms, 250ms, 500ms, 1s, 2s, 5s, 10s, 30s, 60s.

The entity, injection, and scan counters are zero in gateway mode (no detection engine).

Scrape config

scrape_configs:
  - job_name: promptshield
    static_configs:
      - targets: ["localhost:8080"]
    scrape_interval: 15s

The pre-configured Prometheus + Grafana stack is in infra/observability/. See Grafana.

PromQL

# Request rate by action
sum by (action) (rate(promptshield_requests_total[5m]))

# p95 latency
histogram_quantile(0.95, sum by (le) (rate(promptshield_request_duration_seconds_bucket[5m])))

# Block rate (%). Aggregate both sides so the action label
# doesn't break vector matching in the division.
sum(rate(promptshield_requests_total{action="block"}[5m]))
/ sum(rate(promptshield_requests_total[5m])) * 100

# Token burn per model (tokens/min)
sum by (model) (rate(promptshield_tokens_total{token_type="total"}[5m])) * 60

# PII detections by entity type
sum by (entity_type) (rate(promptshield_entities_detected_total[5m]))

# Error rate (%)
sum(rate(promptshield_requests_total{action="error"}[5m]))
/ sum(rate(promptshield_requests_total[5m])) * 100
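Queries like the error rate above can be promoted to Prometheus alerting rules. A sketch — the alert name, threshold, and durations are placeholders to tune:

```yaml
groups:
  - name: promptshield
    rules:
      - alert: PromptShieldHighErrorRate
        # Same expression as the error-rate query above, as a ratio.
        expr: |
          sum(rate(promptshield_requests_total{action="error"}[5m]))
            / sum(rate(promptshield_requests_total[5m])) > 0.05
        for: 10m
        labels:
          severity: warning
        annotations:
          summary: PromptShield error rate above 5% for 10 minutes
```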
