
Snyk Agent Red Teaming


Overview

Snyk Agent Red Teaming automatically tests AI-powered applications for security vulnerabilities before attackers find them. Using advanced adversarial techniques, it probes your LLM-based systems to uncover weaknesses in prompt handling, tool access, data protection, and safety guardrails.

With Agent Red Teaming you get:

  • Automated vulnerability discovery: Identify AI-specific security risks without manual testing

  • Framework alignment: Results mapped to OWASP LLM Top 10, OWASP Agentic Security, and MITRE ATLAS

  • CI/CD readiness: Integrate into your deployment pipeline to catch issues before production

  • Actionable findings: Every vulnerability includes conversation turns, evidence, and severity ratings

Quickstart

To try out Agent Red Teaming, run these commands (requires Node.js):

# If you don't have the Snyk CLI yet, install it.
npm install snyk -g
# Authenticate with Snyk
snyk auth
# Start your first red teaming configuration
snyk redteam --experimental setup

Run your first Agent Red Teaming scan

Install and authenticate Snyk CLI

1. Install the Snyk CLI

Use one of the following methods to install the Snyk CLI. For more installation options and troubleshooting, visit Install or update the Snyk CLI.
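For reference, two common methods are npm (as in the quickstart) and Homebrew; the installation guide linked above covers the rest:

```shell
# Install globally via npm (shown in the quickstart)
npm install snyk -g

# Alternatively, on macOS with Homebrew
brew tap snyk/tap
brew install snyk
```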

2. Authenticate

To authenticate, run:
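As in the quickstart, authentication is a single command:

```shell
# Opens a browser window to authenticate the CLI with your Snyk account
snyk auth
```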

Create the configuration file

To run a red teaming test against your target, you have to pass in a configuration file. The configuration YAML file defines your scan target and testing parameters.

Use --config=<your_yaml_file.yaml> to specify a configuration file, or save the file as redteam.yaml in the running directory.

For the full configuration file syntax reference, visit Redteam.
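For orientation only, a configuration for this kind of scan typically captures the pieces the setup flow asks for: the target, its endpoint, application context, and the goals and strategies to attempt. The key names below are hypothetical placeholders, not the actual schema; consult the Redteam syntax reference for the real field names.

```yaml
# Hypothetical sketch only -- key names are illustrative, not the real schema.
# See the Redteam configuration reference for the actual syntax.
target:
  name: my-support-bot        # recognizable name for the target
  type: http                  # target type selected during setup
  endpoint: https://example.com/api/chat   # how the scanner reaches the target
context:
  description: >
    Customer support assistant with access to an order-lookup tool.
attacks:
  goals:
    - system_prompt_extraction
    - tool_extraction
  strategies:
    - directly_asking
    - crescendo
```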

Note: If you want to test agent red teaming outside of your environment, target SmartPal, an intentionally vulnerable test application.

You can create the configuration file manually, or generate one with the setup subcommand. To set up the target interactively, follow these steps:

1. Run the setup subcommand
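The command is the same one used in the quickstart:

```shell
snyk redteam --experimental setup
```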

This will open an interactive UI to guide you through the target configuration.

2. Define the target

Give the target a recognizable name and select the target type.

3. Configure the endpoint connection

Configure how Agent Red Teaming will communicate with your target.

Click Test Connection to verify the endpoint is reachable and responding correctly before moving on.

4. Provide application context

Tell Agent Red Teaming more about the application. The more accurately you describe the target here, the better the tool can assess whether an attack succeeded.

5. Select the test goals

Choose which attack goals you want Agent Red Teaming to attempt. Goals are the particular exploits that red teaming attacks try to achieve. You can choose a pre-configured profile, which selects recommended options, or pick the individual goals that best reflect your threat model. For more information on what is available, visit Attack goals.

6. Review the configuration and download it

Check the generated YAML to make sure everything looks correct, then click Test Connection to confirm the target is still reachable. When you're satisfied, click Download Configuration to save the file to your browser's download folder, or Save the configuration to write it to the running directory as redteam.yaml. You can also Copy the configuration and save it manually under your preferred name.

Run the scan

With the configuration ready, you can run the scan.
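The exact invocation isn't shown on this page; based on the quickstart command and the --config flag documented above, a plausible run looks like this (treat the flag combination as an assumption and check the CLI help for the authoritative syntax):

```shell
# Assumed invocation -- verify against `snyk redteam --experimental --help`
snyk redteam --experimental --config=redteam.yaml
```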

This command will run the selected attacks against your configured target and save the report as an HTML file named report.html in the running directory.

View results

HTML report

If you run the scan with the --html flag, the tool writes the HTML report to stdout, or to the filename specified with the --html-file-output flag.

Open report.html (or the filename you specified) in your browser to explore findings interactively.

The HTML report includes:

  • Executive summary with findings by severity

  • Interactive filtering by severity, framework tags, and attack type

  • Collapsible conversation turns showing the full attack chain

  • Evidence and rationale for each finding

  • Framework mapping (OWASP LLM Top 10, MITRE ATLAS, and so on)

Severity levels:

  • Critical: Harmful content generation, safety bypass

  • High: System prompt leaks, PII extraction, code execution, data exfiltration

  • Medium: Bias detection, tool hallucination, DoS amplification

  • Low: Model identification, malformed output

Understanding results:

Each finding represents a successful attack and includes:

  • Definition: What vulnerability was tested

  • Severity: Risk level (critical, high, medium, low)

  • Tags: Security framework mappings (OWASP, MITRE, NIST)

  • Turns: The conversation that led to the vulnerability

  • Evidence: Why the attack was considered successful

Framework mappings:

Agent Red Teaming tags each finding with relevant security framework categories, such as OWASP LLM Top 10, OWASP Agentic Security, MITRE ATLAS, and NIST.

JSON report

Use --json to output the results as JSON. You can control where the output is saved with the --json-file-output flag.
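Combining the flags described above (and assuming the same run invocation as earlier), that might look like:

```shell
# Assumed invocation -- flag names are taken from this page
snyk redteam --experimental --config=redteam.yaml --json --json-file-output=results.json
```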

Below you can find an overview of the JSON structure:

JSON output structure
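The full schema is not reproduced here. As a purely illustrative sketch, each finding carries the per-finding fields described under "Understanding results" (definition, severity, tags, turns, evidence); the surrounding structure and exact key names below are hypothetical:

```json
{
  "scanId": "hypothetical-scan-id",
  "findings": [
    {
      "definition": "system_prompt_extraction",
      "severity": "high",
      "tags": ["OWASP LLM02", "OWASP LLM07"],
      "turns": [
        { "role": "attacker", "content": "..." },
        { "role": "target", "content": "..." }
      ],
      "evidence": "Response reproduced verbatim system-prompt instructions."
    }
  ]
}
```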

Previewing results of specific findings

Sometimes you might want to preview the results of a specific finding. You can retrieve scan results using the scan ID.

Attack taxonomy

A single Agent Red Teaming scan comprises a number of tests we call Attacks.

An attack is a combination of a goal and a strategy. The goal defines what the attack is trying to achieve, whereas the strategy defines the approach used to achieve it. For example, pairing the system_prompt_extraction goal with the crescendo strategy produces an attack that tries to coax out the system prompt by gradually escalating requests over multiple turns.

This section explains the available options you can combine to construct your attacks.

Note: Some goals can only be achieved with specific strategies, and not all strategies are applicable to all goals. The interactive configuration can guide you through choosing the right strategies for your goals. For more information, visit Create the configuration file.

Attack goals

Goals define the exploit that an attack tests for and attempts to achieve.

Each goal is listed with its framework risk categories in parentheses.

  • system_prompt_extraction: Extract the system prompt or initial instructions (OWASP LLM02, LLM07)

  • capability_extraction: Extract information about model capabilities and configuration (OWASP LLM02, LLM06)

  • model_identification: Identify the underlying model, version, or provider (OWASP LLM02)

  • purpose_hijacking: Divert the model from its intended purpose (OWASP LLM01)

  • harmful_content_generation: Generate harmful or unsafe content (OWASP LLM01)

  • harmful_by_category: Generate harmful content in a specific safety category (OWASP LLM01)

  • tool_extraction: Discover tools and functions available to the model (OWASP LLM02, LLM06)

  • tool_abuse_code_execution: Abuse tool access to execute arbitrary code (OWASP LLM06)

  • tool_abuse_malicious: Abuse tools for malicious operations (OWASP LLM06)

  • tool_data_exfiltration: Exfiltrate sensitive data through tool calls (OWASP LLM02, LLM06)

  • tool_hallucination: Cause the model to invoke non-existent tools (OWASP LLM06)

  • tool_dos_amplification: Amplify DoS impact through tool abuse (OWASP LLM06, LLM10)

  • pii_extraction: Extract personally identifiable information (OWASP LLM02)

  • internal_information_disclosure: Extract internal or confidential system information (OWASP LLM02)

  • code_execution_probe: Probe for unauthorized code execution capabilities (OWASP LLM02)

  • privilege_escalation_probe: Attempt to escalate access privileges (OWASP LLM02)

  • bias_detection: Detect biases in model responses

  • dos_cost_amplification: Amplify resource costs through excessive consumption (OWASP LLM10)

  • malformed_structured_output: Produce malformed or invalid structured output (OWASP LLM10)

Attack strategies

Strategies define how an attack approaches its goal and determine the complexity of the techniques used to achieve the desired output. Not all strategies can be used to achieve all goals.

Each strategy is listed with its framework risk categories in parentheses.

  • directly_asking: Send requests directly without obfuscation (baseline)

  • crescendo: Gradually escalate requests across multiple turns (OWASP LLM01)

  • agentic: Leverage multi-step agentic workflows (OWASP LLM01, ASI01)

  • jailbreak: Use jailbreak techniques to bypass safety guardrails (OWASP LLM01)

  • tap: Tree of Attacks with Pruning for iterative refinement (OWASP LLM01)

Profiles

Profiles are predefined sets of goal/strategy combinations optimized for specific testing scenarios. Profiles are available for selection in the setup UI.

  • fast: Quick baseline check with direct probes. Focus: speed; minimal goal/strategy coverage.

  • security: Comprehensive security testing. Focus: information disclosure, tool abuse, prompt extraction, escalation.

  • safety: Harmful content, bias, and safety category testing. Focus: content safety and compliance.

Getting help

If you're stuck and need assistance with Snyk Agent Red Teaming, contact Snyk Support.
