Snyk Agent Red Teaming
Experimental Feature: Snyk Agent Red Teaming is an experimental feature subject to breaking changes without notice. It is provided "as is" without warranties or guarantees. Use at your own risk.
Testing environment only: This tool actively sends adversarial inputs to your AI application. Only run scans against staging or development environments using test data and test credentials. Do not target production systems or environments containing real user data. Snyk is not responsible for any unintended side effects, data loss, or disruptions caused by running scans against your applications.
Overview
Snyk Agent Red Teaming automatically tests AI-powered applications for security vulnerabilities before attackers find them. Using advanced adversarial techniques, it probes your LLM-based systems to uncover weaknesses in prompt handling, tool access, data protection, and safety guardrails.
With Agent Red Teaming you get:
Automated vulnerability discovery: Identify AI-specific security risks without manual testing
Framework alignment: Results mapped to OWASP LLM Top 10, OWASP Agentic Security, and MITRE ATLAS
CI/CD readiness: Integrate into your deployment pipeline to catch issues before production
Actionable findings: Every vulnerability includes conversation turns, evidence, and severity ratings
Quickstart
To try out Agent Red Teaming, run these commands (requires Node.js):
# Install the Snyk CLI if you don't have it yet
npm install snyk -g
# Authenticate with Snyk
snyk auth
# Start your first red teaming configuration
snyk redteam --experimental setup
Run your first Agent Red Teaming scan
Install and authenticate Snyk CLI
Install the Snyk CLI
Use one of the following methods to install the Snyk CLI. For more installation options and troubleshooting, visit Install or update the Snyk CLI.
Create the configuration file
To run a red teaming test against your target, you need a YAML configuration file that defines your scan target and testing parameters.
Use --config=<your_yaml_file.yaml> to specify the configuration file, or save the file as redteam.yaml in the directory you run the command from.
For the full configuration file syntax reference, visit Redteam.
If you want to try Agent Red Teaming without pointing it at your own environment, target SmartPal, an intentionally vulnerable test application.
You can create the configuration file manually, or generate one with the setup subcommand. To set up the target interactively, run snyk redteam --experimental setup, then follow these steps:
Select the test goals
Choose which attack goals you want Agent Red Teaming to attempt. A goal is the specific exploit that an attack tries to achieve. You can choose a pre-configured profile, which selects recommended options, or pick individual goals that best reflect your threat model. For more information on what is available, visit Attack goals.
Review the configuration and download it
Check the generated YAML to make sure everything looks correct, then click Test Connection to confirm the target is still reachable. When you're satisfied, click Download Configuration to save the file to your browser's download folder, or Save the configuration to store it in your running directory as redteam.yaml. You can also Copy the configuration and save it manually under your preferred name.
Run the scan
With the configuration ready, you can run the scan.
This command runs the selected attacks against your configured target and saves the report as HTML to report.html in the running directory.
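For pipeline use, the scan invocation can be wrapped in a small script. The following is a minimal Python sketch, not an official integration. It assembles the command only from flags mentioned on this page (--experimental, --config, --html, --html-file-output). The helper names build_scan_command and run_scan are hypothetical, and the =<file> form of --html-file-output is an assumption made by analogy with --json-file-output=; check the CLI help for the exact syntax.

```python
import subprocess
from typing import List, Optional


def build_scan_command(config: Optional[str] = None,
                       html_output: Optional[str] = "report.html") -> List[str]:
    """Assemble a `snyk redteam` command line (hypothetical helper)."""
    cmd = ["snyk", "redteam", "--experimental"]
    if config is not None:
        # Without --config, the CLI is expected to pick up redteam.yaml
        # from the directory the command runs in.
        cmd.append(f"--config={config}")
    if html_output is not None:
        # Assumption: --html-file-output accepts the `=<file>` form.
        cmd += ["--html", f"--html-file-output={html_output}"]
    return cmd


def run_scan(config: Optional[str] = None,
             html_output: Optional[str] = "report.html") -> int:
    """Run the scan and return the CLI's exit code."""
    return subprocess.run(build_scan_command(config, html_output)).returncode
```

Calling run_scan(config="redteam.yaml") from a staging job would start the scan and write report.html next to the script. Keeping command assembly separate from execution makes the arguments easy to inspect before anything is sent to the target.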
View results
HTML report
If you run the scan with the --html flag, the tool writes the HTML report to stdout, or to the file specified with the --html-file-output flag.
Open report.html (or the filename you specified) in your browser to explore findings interactively.
The HTML report includes:
Executive summary with findings by severity
Interactive filtering by severity, framework tags, and attack type
Collapsible conversation turns showing the full attack chain
Evidence and rationale for each finding
Framework mapping (OWASP LLM Top 10, MITRE ATLAS, and so on)
Severity levels:
Critical: Harmful content generation, safety bypass
High: System prompt leaks, PII extraction, code execution, data exfiltration
Medium: Bias detection, tool hallucination, DoS amplification
Low: Model identification, malformed output
Understanding results:
Each finding represents a successful attack and includes:
Definition: What vulnerability was tested
Severity: Risk level (critical, high, medium, low)
Tags: Security framework mappings (OWASP, MITRE, NIST)
Turns: The conversation that led to the vulnerability
Evidence: Why the attack was considered successful
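In a pipeline, findings like these can drive a pass/fail decision. The sketch below is illustrative only: it assumes each finding is a dictionary with a lowercase severity key, which is a hypothetical shape and not the documented report schema. The severity ordering follows the levels listed above.

```python
from collections import Counter
from typing import Dict, Iterable, List

# Ordered from most to least severe, as listed in this section.
SEVERITY_ORDER = ["critical", "high", "medium", "low"]


def count_by_severity(findings: Iterable[Dict]) -> Dict[str, int]:
    """Count findings per severity level, including levels with zero findings."""
    counts = Counter(f["severity"] for f in findings)
    return {level: counts.get(level, 0) for level in SEVERITY_ORDER}


def blocking_findings(findings: Iterable[Dict], threshold: str = "high") -> List[Dict]:
    """Return findings at or above the given severity threshold."""
    cutoff = SEVERITY_ORDER.index(threshold)
    return [f for f in findings if SEVERITY_ORDER.index(f["severity"]) <= cutoff]


# Hypothetical sample data for illustration.
sample = [
    {"definition": "system_prompt_extraction", "severity": "high"},
    {"definition": "model_identification", "severity": "low"},
]
```

A deployment job could fail the build whenever blocking_findings(...) returns a non-empty list, and lower the threshold as the application matures.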
Framework mappings:
Agent Red Teaming tags findings with relevant security framework categories:
OWASP LLM Top 10 (2025): LLM01 (Prompt Injection), LLM02 (Sensitive Info Disclosure), LLM06 (Excessive Agency), LLM07 (System Prompt Leakage), LLM10 (Unbounded Consumption)
OWASP Agentic Security: ASI01 (Agent Goal Hijack)
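When summarizing findings for people unfamiliar with the category IDs, it can help to expand them into readable labels. The following sketch uses only the IDs and names listed above. It assumes tags arrive as bare ID strings such as LLM01, which is an assumption about the report format, and describe_tags is a hypothetical helper name.

```python
from typing import List

# Category names as listed on this page.
FRAMEWORK_LABELS = {
    "LLM01": "Prompt Injection",
    "LLM02": "Sensitive Info Disclosure",
    "LLM06": "Excessive Agency",
    "LLM07": "System Prompt Leakage",
    "LLM10": "Unbounded Consumption",
    "ASI01": "Agent Goal Hijack",
}


def describe_tags(tags: List[str]) -> List[str]:
    """Expand category IDs into 'ID (Name)' strings; unknown IDs pass through."""
    return [
        f"{tag} ({FRAMEWORK_LABELS[tag]})" if tag in FRAMEWORK_LABELS else tag
        for tag in tags
    ]
```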
JSON report
Use --json to output the results as a JSON file. You can control where the output is saved with --json-file-output=, such as:
Below you can find an overview of the JSON structure:
Previewing results of specific findings
Sometimes you might want to preview the results of specific findings. You can retrieve scan results using the scan ID:
Attack taxonomy
A single Agent Red Teaming scan comprises a number of tests we call Attacks.
An attack is a combination of a goal and a strategy. The goal defines what the attack is trying to achieve, whereas a strategy defines the approach to achieve it.
This section explains the goals and strategies you can combine to construct your attacks.
Note: Some goals can only be achieved with specific strategies, and not all strategies are applicable to all goals. The interactive configuration can guide you through choosing the right strategies for your goals. For more information, visit Create the configuration file.
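The goal/strategy pairing can be pictured as a filtered cross product. The sketch below is a conceptual model only, not how the tool is implemented. The goal and strategy names come from the tables in this section, but the compatibility map ALLOWED is hypothetical, since this page does not list which combinations are valid.

```python
from dataclasses import dataclass
from itertools import product
from typing import Dict, List, Set


@dataclass(frozen=True)
class Attack:
    goal: str
    strategy: str


def build_attacks(goals: List[str], strategies: List[str],
                  allowed: Dict[str, Set[str]]) -> List[Attack]:
    """Pair every goal with every strategy, keeping only allowed combinations."""
    return [
        Attack(g, s)
        for g, s in product(goals, strategies)
        if s in allowed.get(g, set())
    ]


# Hypothetical compatibility map, for illustration only.
ALLOWED = {
    "system_prompt_extraction": {"directly_asking", "crescendo"},
    "tool_extraction": {"directly_asking"},
}

attacks = build_attacks(
    ["system_prompt_extraction", "tool_extraction"],
    ["directly_asking", "crescendo"],
    ALLOWED,
)
```

Here two goals and two strategies yield three attacks, because the tool_extraction/crescendo pair is excluded by the illustrative map.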
Attack goals
Goals define the exploit that the attack is trying to test for and achieve.
system_prompt_extraction: Extract the system prompt or initial instructions (OWASP LLM02, LLM07)
capability_extraction: Extract information about model capabilities and configuration (OWASP LLM02, LLM06)
model_identification: Identify the underlying model, version, or provider (OWASP LLM02)
purpose_hijacking: Divert the model from its intended purpose (OWASP LLM01)
harmful_content_generation: Generate harmful or unsafe content (OWASP LLM01)
harmful_by_category: Generate harmful content in a specific safety category (OWASP LLM01)
tool_extraction: Discover tools and functions available to the model (OWASP LLM02, LLM06)
tool_abuse_code_execution: Abuse tool access to execute arbitrary code (OWASP LLM06)
tool_abuse_malicious: Abuse tools for malicious operations (OWASP LLM06)
tool_data_exfiltration: Exfiltrate sensitive data through tool calls (OWASP LLM02, LLM06)
tool_hallucination: Cause the model to invoke non-existent tools (OWASP LLM06)
tool_dos_amplification: Amplify DoS impact through tool abuse (OWASP LLM06, LLM10)
pii_extraction: Extract personally identifiable information (OWASP LLM02)
internal_information_disclosure: Extract internal or confidential system information (OWASP LLM02)
code_execution_probe: Probe for unauthorized code execution capabilities (OWASP LLM02)
privilege_escalation_probe: Attempt to escalate access privileges (OWASP LLM02)
bias_detection: Detect biases in model responses (no framework mapping)
dos_cost_amplification: Amplify resource costs through excessive consumption (OWASP LLM10)
malformed_structured_output: Produce malformed or invalid structured output (OWASP LLM10)
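If your threat model is organized by OWASP category, the mappings listed above can be inverted to find the goals relevant to a given category. The sketch below copies a subset of those mappings into a dictionary. GOAL_TAGS and goals_for_category are hypothetical names for illustration, not part of the CLI or its configuration format.

```python
from typing import Dict, List

# Subset of the goal-to-category mappings listed above.
GOAL_TAGS: Dict[str, List[str]] = {
    "system_prompt_extraction": ["LLM02", "LLM07"],
    "purpose_hijacking": ["LLM01"],
    "tool_extraction": ["LLM02", "LLM06"],
    "tool_abuse_code_execution": ["LLM06"],
    "tool_dos_amplification": ["LLM06", "LLM10"],
    "pii_extraction": ["LLM02"],
    "bias_detection": [],  # no framework mapping listed
}


def goals_for_category(category: str) -> List[str]:
    """Return goals mapped to the given OWASP LLM category, in insertion order."""
    return [goal for goal, tags in GOAL_TAGS.items() if category in tags]
```

For example, goals_for_category("LLM06") returns the three tool-related goals in this subset, which is a reasonable starting selection for an agent with broad tool access.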
Attack strategies
Strategies define how the attack approaches the goal and determine the complexity of the techniques used to achieve the desired output. Not all strategies can be used to achieve all goals.
directly_asking: Send requests directly without obfuscation (baseline)
crescendo: Gradually escalate requests across multiple turns (OWASP LLM01)
agentic: Leverage multi-step agentic workflows (OWASP LLM01, ASI01)
jailbreak: Use jailbreak techniques to bypass safety guardrails (OWASP LLM01)
tap: Tree of Attacks with Pruning for iterative refinement (OWASP LLM01)
Profiles
Profiles are predefined sets of goal/strategy combinations optimized for specific testing scenarios. Profiles are available for selection in the setup UI.
fast: Quick baseline check with direct probes. Focus: speed; minimal goal/strategy coverage.
security: Comprehensive security testing. Focus: information disclosure, tool abuse, prompt extraction, escalation.
safety: Harmful content, bias, and safety category testing. Focus: content safety and compliance.
Getting help
If you're stuck and need assistance with Snyk Agent Red Teaming, contact Snyk Support.

