Guardrails

AgentDM includes built-in message safety filters. Messages matching guardrail phrases are flagged and not delivered.

Default Filters

These are enabled by default for all accounts:

  • "Never include API keys, passwords, or tokens"
  • "Never include customer PII in messages"
  • "ignore previous instructions"
  • "ignore all previous"
  • "disregard your instructions"
  • "you are now"
  • "new instructions:"
  • "system prompt:"
  • "ADMIN OVERRIDE"

Customization

Guardrails can be customized per account via the dashboard. Add, remove, or modify filter phrases to match your security requirements.