
Jailbreak Detection API for AI Products

A jailbreak detection API helps product teams spot attempts by users or untrusted context to bypass safety, policy, or tool boundaries in an LLM app.

Run demo scan

Where it fits

  • Screening prompt traffic before it reaches a high-privilege model or agent tool.
  • Adding security telemetry to moderation, support, coding, or research assistants.
  • Testing whether a known jailbreak family still works after prompt or model changes.

Operational steps

  • Send the candidate prompt, recent turns, policy name, and app surface to the detection endpoint (a sketch follows this list).
  • Use the response severity to block, review, log, or downscope the request.
  • Replay known jailbreak families in staging before release.
  • Feed confirmed misses back into a custom test pack for future regression scans.
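The steps above can be wired up in a few lines. The sketch below is illustrative only: the endpoint URL, request fields, and severity labels are assumptions, not the documented PromptGuard Scan schema, so swap in the real URL and response fields for your deployment.

# Minimal sketch of the scan-then-gate flow, assuming a JSON detection endpoint.
import os
import requests

DETECTION_URL = "https://api.example.com/v1/jailbreak/detect"  # hypothetical endpoint
API_KEY = os.environ["PROMPTGUARD_API_KEY"]                    # hypothetical env var


def check_prompt(prompt: str, recent_turns: list[str], policy: str, surface: str) -> str:
    """Send the candidate prompt plus context and return a severity label."""
    resp = requests.post(
        DETECTION_URL,
        headers={"Authorization": f"Bearer {API_KEY}"},
        json={
            "prompt": prompt,
            "recent_turns": recent_turns,  # prior turns make gradual attacks visible
            "policy": policy,
            "surface": surface,            # e.g. "support-assistant" or "coding-agent"
        },
        timeout=5,
    )
    resp.raise_for_status()
    return resp.json().get("severity", "none")  # assumed response field


def gate_request(prompt: str, recent_turns: list[str]) -> str:
    severity = check_prompt(prompt, recent_turns, policy="default", surface="support-assistant")
    if severity == "high":
        return "block"    # refuse and log
    if severity == "medium":
        return "review"   # hold for human review or downscope tool access
    if severity == "low":
        return "log"      # allow, but keep telemetry
    return "allow"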

Common risks

  • A single-turn classifier misses gradual roleplay or authority-shifting attacks.
  • The app detects toxic language but not attempts to override system instructions.
  • Detection logs contain sensitive user text without minimization controls (see the sketch after this list).
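For the last risk, one common mitigation is to log a salted hash of the user text plus detection metadata instead of the raw prompt. The field names and salting scheme below are illustrative assumptions, not a prescribed PromptGuard Scan log format.

# Keep detection telemetry without retaining raw user text.
import hashlib
import json
import os
import time


def minimized_log_entry(prompt: str, severity: str, surface: str) -> str:
    salt = os.environ.get("LOG_SALT", "rotate-me")  # hypothetical salt; rotate it periodically
    digest = hashlib.sha256((salt + prompt).encode("utf-8")).hexdigest()
    entry = {
        "ts": time.time(),
        "prompt_sha256": digest,      # correlate repeat attempts without storing the text
        "prompt_chars": len(prompt),  # coarse size signal only
        "severity": severity,
        "surface": surface,
    }
    return json.dumps(entry)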

How PromptGuard Scan fits the workflow

PromptGuard Scan combines a maintained jailbreak library with scan reports and API responses that fit product telemetry, CI checks, and security review workflows.
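A regression replay can run as an ordinary test in CI. The sketch below assumes a JSON pack of known jailbreak cases and reuses the hypothetical check_prompt helper from the earlier sketch; the pack format, module name, and expected-severity convention are assumptions, not a fixed PromptGuard Scan artifact.

# Replay a pack of known jailbreak prompts and fail the build if any case
# drops below its expected severity after a prompt or model change.
import json
import pathlib

from detection_client import check_prompt  # helper from the earlier sketch (hypothetical module)

SEVERITY_RANK = {"none": 0, "low": 1, "medium": 2, "high": 3}


def test_jailbreak_pack_still_detected():
    pack = json.loads(pathlib.Path("packs/known_jailbreaks.json").read_text())  # hypothetical pack file
    misses = []
    for case in pack:
        severity = check_prompt(
            prompt=case["prompt"],
            recent_turns=case.get("recent_turns", []),
            policy=case.get("policy", "default"),
            surface=case.get("surface", "staging"),
        )
        if SEVERITY_RANK[severity] < SEVERITY_RANK[case["expected_severity"]]:
            misses.append(case["id"])
    assert not misses, f"Jailbreak cases no longer detected: {misses}"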

Ready to test a real AI surface?

Pricing


Annual billing is 50% off. All plans use NOWPayments checkout, which keeps the product page open while you pay.

Dev

For solo builders validating one product before launch.

$25/mo
$294 billed yearly. Save 50%.
5 apps · 500 scans
  • Prompt injection scans
  • Jailbreak template checks
  • PII and key leak detection
  • HTML risk report
  • Email support

Enterprise

For platform teams, private deployments, and audit-heavy AI systems.

$250/mo
$2,994 billed yearly. Save 50%.
Unlimited apps · Unlimited scans
  • Everything in Team
  • Private deployment path
  • Custom test packs
  • Compliance evidence exports
  • Priority security review support

Security playbooks

Practical guides for LLM app security decisions.