Security Modules
Every AgentGuard scan runs 12 specialized modules, each targeting a different category of AI agent vulnerability. Here's what each one does and why it matters.
Secret Leak Detection
Finds exposed credentials before attackers do.
What it does
This module scans your agent's configuration files, environment variables, and response patterns for accidentally exposed secrets. It looks for API keys, database passwords, authentication tokens, private keys, and other credentials that should never be visible — whether they're hardcoded in a config file, leaked through an LLM response, or sitting in a publicly accessible .env file.
Why it matters
Leaked credentials are one of the most common and damaging security mistakes in AI agent deployments. A single exposed API key can give attackers access to your cloud infrastructure, your customers' data, or your billing accounts. Unlike traditional apps, AI agents often have configs shared in Slack, Discord, or GitHub repos — and people forget to strip out the secrets first. This module catches those leaks before someone else does.
What we look for
- API keys for OpenAI, Anthropic, AWS, GCP, Azure, Stripe, and 40+ other services
- Database connection strings with embedded passwords
- Private keys and certificates (SSH, TLS, JWT signing keys)
- OAuth tokens and refresh tokens in configs
- Hardcoded passwords in environment variables
- Secrets accidentally included in LLM system prompts
Real-world example
Aligned with LLM06: Sensitive Information Disclosure from the OWASP Top 10 for LLM Applications.
Prompt Injection Protection
15 canary-verified payloads prove actual injection success — zero false positives.
What it does
This module tests whether your AI agent can be manipulated into bypassing its system instructions. It uses canary token verification — a technique where each injection payload embeds a unique, randomly generated UUID token (e.g., CANARY-a1b2c3d4e5f6) and instructs the agent to reproduce it. If the agent outputs the canary, the injection definitively succeeded. This eliminates false positives from legitimate AI responses that might contain security-related phrases. We send 15 payloads across 4 attack categories to 6 common chat/query endpoints, testing multiple request body formats (message, prompt, query, input, text) to maximize coverage.
Why it matters
Prompt injection is the #1 vulnerability in the OWASP LLM Top 10 for good reason. If an attacker can override your agent's instructions, they can make it leak sensitive data, execute unauthorized actions, or behave in ways you never intended. This is especially dangerous for agents that have access to tools, databases, or APIs — a successful injection can turn your agent into a weapon against your own systems. Traditional injection testing based on keyword matching (looking for 'system prompt' in responses) produces rampant false positives. Our canary-based approach eliminates this problem entirely.
What we look for
- Direct prompt override — 5 payloads including system override, roleplay jailbreak, fake system messages, developer mode activation, and encoded instructions (PROMPT-INJ-001, CWE-94)
- Indirect injection — 3 payloads via HTML comments, markdown formatting, and JSON-embedded instructions targeting data processing endpoints (PROMPT-INJ-002, CWE-94)
- Goal hijacking — 3 payloads testing objective override, task redirect, and priority escalation (PROMPT-INJ-003, CWE-285)
- System prompt leakage — 4 payloads testing prompt extraction, verbatim reflection, initialization parameter dump, and meta-instruction queries (PROMPT-INJ-004, CWE-200)
- Each payload embeds a unique CANARY-* token — only confirmed reproduction counts as a finding
- Tests 6 endpoints (/api/chat, /api/query, /api/message, /chat, /query, /v1/chat/completions) with 5 body key formats
Real-world example
Aligned with LLM01: Prompt Injection from the OWASP Top 10 for LLM Applications.
Network Attack Prevention
Checks if your agent can be exploited to access internal systems.
What it does
This module checks whether an attacker could trick your AI agent into making requests to internal systems it shouldn't be able to reach. Known as SSRF (Server-Side Request Forgery), this attack exploits the fact that your agent's server often has network access to internal services, cloud metadata endpoints, and databases that aren't exposed to the public internet. If an attacker can control what URLs your agent fetches, they can use your agent as a stepping stone into your private network.
Why it matters
AI agents often need to fetch URLs — reading web pages, calling APIs, or loading documents. If there's no restriction on what URLs the agent can access, an attacker can point it at internal services. In cloud environments, this is especially dangerous because cloud metadata endpoints (like AWS's 169.254.169.254) can reveal access keys, instance roles, and configuration data that gives attackers full control of your infrastructure.
What we look for
- Ability to reach cloud metadata endpoints (AWS, GCP, Azure)
- Access to internal IP ranges (10.x, 172.16.x, 192.168.x)
- DNS rebinding vulnerabilities
- Redirect-based bypasses that circumvent URL allowlists
- Access to internal services through localhost or 127.0.0.1
- Unrestricted outbound requests from the agent server
Real-world example
Aligned with AG04: Server-Side Request Forgery from the OWASP Top 10 for Agentic Applications.
Tool Safety Verification
Ensures your agent's tools can't be used to run malicious commands.
What it does
This module audits the tools and plugins your AI agent has access to, checking for ways they could be abused. Modern AI agents aren't just chat interfaces — they can execute code, run shell commands, read and write files, and call external APIs. This module checks whether those capabilities have proper guardrails: Are tool inputs validated? Can an attacker craft input that escapes the intended use? Are there command injection or path traversal vulnerabilities?
Why it matters
An AI agent with tool access is essentially a program that takes natural language as input and converts it to system actions. If the tools aren't locked down, a clever prompt can become a shell command. This is one of the highest-impact attack surfaces because it bridges the gap between "the agent said something bad" and "the agent did something bad" — tool execution vulnerabilities let attackers take real action on your systems.
What we look for
- Shell command injection through tool arguments
- Path traversal in file system tools (reading /etc/passwd via ../../)
- Unvalidated input passed directly to system calls
- Missing permissions boundaries on tool execution
- Code execution in sandboxed vs unsandboxed environments
- Excessive tool permissions (tools that can do more than they need to)
Real-world example
Aligned with LLM07: Insecure Plugin Design from the OWASP Top 10 for LLM Applications.
Configuration Audit
Scans for exposed settings files that reveal sensitive information.
What it does
This module checks your agent's configuration for security misconfigurations — the kind of issues that individually seem minor but together can give attackers a roadmap of your system. It looks for debug modes left enabled in production, exposed .git directories, directory listing enabled on web servers, overly permissive CORS settings, verbose error messages that reveal internal paths, and configuration files accessible from the public web.
Why it matters
Configuration mistakes are the "unlocked door" of web security. They don't require any sophisticated attack — just someone checking if the door is open. With AI agents, the risk is amplified because agent configs often contain model parameters, system prompts, tool definitions, and connection strings. An exposed config file can tell an attacker exactly how your agent works, what tools it has, and how to attack it most effectively.
What we look for
- Debug mode enabled in production environments
- Exposed .git directories revealing source code history
- Directory listing enabled on web servers
- Verbose error messages showing internal paths and stack traces
- Overly permissive CORS headers allowing cross-origin requests
- Default credentials or unchanged admin passwords
Real-world example
Aligned with LLM05: Supply Chain Vulnerabilities from the OWASP Top 10 for LLM Applications.
Authentication Security
Comprehensive JWT and authentication testing across 8 vulnerability categories.
What it does
This module runs 8 categories of authentication and authorization tests using hand-crafted JWT tokens. It probes 14 protected endpoints without credentials to find missing auth. It generates expired JWTs (24h and 168h past) signed with 11 common weak secrets to test if targets accept stale tokens. It crafts 'alg: none' tokens in 4 case variants (none, None, NONE, nOnE) with admin claims and empty signatures. It discovers JWKS endpoints and attempts RS256-to-HS256 algorithm confusion attacks. It tests corrupted and truncated JWT signatures. It checks for session fixation by comparing tokens across login requests. It probes for IDOR by accessing resources with sequential IDs. And it tests vertical privilege escalation by forging tokens with admin role claims.
Why it matters
JWT vulnerabilities are among the most common and dangerous flaws in AI agent deployments. A single misconfigured JWT library can let attackers forge valid authentication tokens with admin privileges — no password required. Many agents use JWTs signed with weak secrets like 'secret' or 'changeme', accept unsigned tokens, or fail to validate expiration. Unlike traditional auth testing that just checks for login, this module tests the actual cryptographic integrity of the authentication system.
What we look for
- Missing authentication — 14 protected API paths accessible without any credentials (AUTH-008, CWE-306)
- Expired JWT acceptance — tokens expired 24h-168h ago still grant access (AUTH-001, CWE-613)
- 'none' algorithm bypass — unsigned tokens with alg:none accepted as valid (AUTH-002, CWE-287)
- Algorithm confusion — RS256/HS256 key confusion via public JWKS endpoints (AUTH-003, CWE-327)
- Signature validation failure — corrupted or truncated signatures not rejected (AUTH-004, CWE-345)
- Session fixation — identical tokens returned across successive logins (AUTH-005, CWE-384)
- Horizontal privilege escalation (IDOR) — accessing other users' data via sequential IDs (AUTH-006, CWE-639)
- Vertical privilege escalation — forged admin claims granting unauthorized access (AUTH-007, CWE-269)
Real-world example
Aligned with LLM02: Broken Authentication from the OWASP Top 10 for LLM Applications.
MCP Security Audit
Tests your agent's tool connections for unauthorized access.
What it does
This module specifically audits Model Context Protocol (MCP) configurations — the emerging standard for connecting AI agents to external tools and data sources. MCP servers give agents powerful capabilities like file system access, database queries, API calls, and more. This module checks whether those connections are properly secured: Are permissions scoped correctly? Can the agent access more than it should? Are tool inputs validated by the MCP server?
Why it matters
MCP is rapidly becoming the standard way AI agents connect to the world, but security practices haven't caught up yet. Many MCP server configs grant broad access by default — a filesystem server pointed at "/" gives the agent access to your entire disk. A database MCP server with admin credentials lets the agent run any query. Because MCP configs are often copied from documentation examples without modification, they frequently have permissions far broader than what the agent actually needs.
What we look for
- MCP servers with overly broad file system access (root directory access)
- Database connections using admin credentials instead of read-only users
- Missing input validation on MCP tool parameters
- MCP servers running without transport-level encryption
- Tool definitions that expose more capabilities than the agent needs
- Secrets embedded directly in MCP configuration files
Real-world example
Aligned with AG02: Insecure Tool/Function Calls from the OWASP Top 10 for Agentic Applications.
Endpoint Exposure Scan
Finds admin pages and debug tools left open to the public.
What it does
This module discovers endpoints that are publicly accessible but shouldn't be. AI agents are typically web applications under the hood, and they often come with admin panels, debugging dashboards, health check endpoints, API documentation pages, and internal management tools. This module probes for these common endpoints and checks whether they're properly hidden or protected, rather than sitting open on the public internet.
Why it matters
Exposed endpoints are like leaving spare keys under the doormat. Admin panels let attackers reconfigure your agent. Debug endpoints reveal internal state and configuration. API documentation pages tell attackers exactly what endpoints to target. Health check endpoints can confirm that a service exists and what version it's running. Each of these gives attackers information or capabilities they shouldn't have — and they're often just a URL guess away.
What we look for
- Admin panels and dashboards (e.g., /admin, /dashboard, /manage)
- Debug and profiling endpoints (e.g., /debug, /trace, /metrics)
- API documentation left publicly accessible (e.g., /swagger, /docs, /redoc)
- Database management interfaces (e.g., /phpmyadmin, /adminer)
- Health and status endpoints revealing version information
- Source code or configuration files accessible via web
Real-world example
Aligned with AG05: Insufficient Access Control from the OWASP Top 10 for Agentic Applications.
Data Exfiltration Detection
Catches data theft hidden in URLs, tracking pixels, and exfil channels.
What it does
This module tests whether your AI agent can be manipulated into exfiltrating data through its responses. It checks for three common exfiltration techniques: encoding sensitive data in URL query parameters (base64, hex), referencing known exfiltration services (webhook.site, requestbin, ngrok), and embedding invisible tracking pixels in markdown image tags that send data to attacker-controlled servers when rendered.
Why it matters
Data exfiltration through AI agents is one of the fastest-growing attack vectors. Unlike traditional data breaches that require network access, agent-based exfiltration uses the agent itself as the transport mechanism. An attacker can instruct the agent to encode sensitive data into a URL, embed it in an image tag, or send it to a known data collection service — all within a normal-looking conversation response. Because the data leaves through the agent's own output rather than a network connection, traditional DLP tools often miss it entirely.
What we look for
- URLs with suspiciously long base64 or hex-encoded query parameters (EXFIL-001, CWE-200)
- References to known exfiltration domains: webhook.site, requestbin, pipedream, ngrok, burpcollaborator (EXFIL-002, CWE-200)
- Markdown image tags with data-bearing URLs used as invisible tracking pixels (EXFIL-003, CWE-200)
- Encoded data patterns that may represent secrets, PII, or system information
- Outbound URL patterns triggered by prompt injection payloads
- Covert channels that bypass standard output filtering
Real-world example
Aligned with LLM06: Sensitive Information Disclosure from the OWASP Top 10 for LLM Applications.
Typosquatting Detection
Catches malicious packages hiding behind misspelled names.
What it does
This module analyzes your agent's dependency files (package.json, requirements.txt, pyproject.toml) and compares every package name against a curated list of 200+ popular npm and PyPI packages using Levenshtein edit distance analysis. If a package name is just 1-2 characters different from a well-known package, it gets flagged as a potential typosquatting attack. The same analysis runs on pasted content containing dependency lists.
Why it matters
Typosquatting is one of the fastest-growing supply chain attack vectors. In 2025, the 'Claw' campaign published over 1,000 malicious packages in just 2 months by creating subtle misspellings of popular packages (e.g., 'requets' instead of 'requests', 'expresss' instead of 'express'). When a developer makes a typo in their dependency file, they unknowingly install a malicious package that can steal credentials, exfiltrate data, or install backdoors. AI agents are especially vulnerable because they often auto-install dependencies from configuration files without human review.
What we look for
- Package names within 1-2 edit distance of popular npm packages (100+ known-good entries)
- Package names within 1-2 edit distance of popular PyPI packages (100+ known-good entries)
- Typosquatting in both URL-fetched dependency files and pasted content (TYPO-001, PASTE-TYPOSQUAT-001)
- Common typosquatting patterns: character swaps, missing characters, doubled characters
- Dependencies in package.json (dependencies, devDependencies, peerDependencies)
- Dependencies in requirements.txt and pyproject.toml
Real-world example
Aligned with LLM05: Supply Chain Vulnerabilities from the OWASP Top 10 for LLM Applications.
Excessive Agency Detection
Catches agents that act without asking permission.
What it does
This module tests whether your AI agent takes dangerous actions without asking for confirmation. It sends requests for destructive operations (delete files, send emails, execute system commands) and checks whether the agent complies immediately or properly asks for human approval first. It also tests whether the agent invokes unauthorized tools and whether it can be coerced into operating outside its intended scope — accessing file systems, scanning networks, or querying databases it shouldn't touch.
Why it matters
Excessive agency is the OWASP LLM08 risk — when an AI agent has too much autonomy, too many permissions, or insufficient guardrails. An agent that blindly executes destructive operations without confirmation can be weaponized through prompt injection. Even without malicious intent, an overly autonomous agent can cause catastrophic damage from misunderstood instructions. The 2025 incident where an AI coding agent deleted a production database after misinterpreting 'clean up the old records' demonstrates why human-in-the-loop confirmation is essential for dangerous operations.
What we look for
- Unauthorized tool invocation — agent executes tools (file deletion, email, system commands) without proper authorization checks (AGENCY-001, CWE-862)
- Excessive autonomy — agent performs dangerous operations and reports 'done' without any confirmation prompt (AGENCY-002, CWE-862)
- Scope creep — agent attempts out-of-scope operations like file system access, network scanning, or database queries when prompted (AGENCY-003, CWE-269)
- Missing confirmation flows for destructive actions
- Tool invocation patterns in responses (tool_calls, function_call, Executed, Deleted)
- Absence of safety questions ("are you sure?", "would you like me to proceed?")
Real-world example
Aligned with LLM08: Excessive Agency from the OWASP Top 10 for LLM Applications.
Canary Token Detection
Plants fake credentials and catches agents that leak them.
What it does
This module plants realistic-looking fake credentials — API keys that look like OpenAI keys, database URLs with real-looking connection strings, AWS access keys with valid prefixes, and bearer tokens — into the agent's conversation context. It then tests two attack scenarios: first, it directly asks the agent to recall the planted credentials (secret replay). Second, it asks completely unrelated questions (weather, fibonacci, haiku) and checks whether the fake credentials leak into those responses (cross-context leakage). Each canary contains a unique identifier so detection is precise and false-positive-free.
Why it matters
AI agents frequently handle sensitive credentials — they're passed API keys to call services, database URLs to query data, and authentication tokens to access resources. If an agent can be tricked into replaying these credentials or inadvertently includes them in unrelated responses, attackers can steal them. Cross-context leakage is especially dangerous because it can happen without the user even asking about credentials — the agent's internal state accidentally contaminates its output. This is a real and growing attack vector as agents handle more sensitive data.
What we look for
- Secret replay — agent reproduces planted canary credentials when asked to recall them (CANARY-001, CWE-200, CVSS 8.5)
- Cross-context leakage — planted credentials appear in responses to completely unrelated questions like weather or math (CANARY-002, CWE-200, CVSS 9.5)
- Fake API keys matching OpenAI key format (sk-agcanary...)
- Fake database URLs with canary usernames
- Fake AWS access keys with AKIACANARY prefix
- Fake bearer tokens with agcanary-bearer prefix
Real-world example
Aligned with LLM06: Sensitive Information Disclosure from the OWASP Top 10 for LLM Applications.
All 12 modules. Every scan. Free to start.
Create an account and run your first security scan in under a minute. 3 free scans per day, no credit card required.
Start Scanning Free