What is LLM security testing?
LLM security testing is the process of testing LLM-powered features like RAG search and AI agents to see whether an attacker can force unsafe behavior, especially prompt injection, data leakage, or unauthorized tool actions. In practice, LLM security testing is a focused subset of AI security testing that looks at how retrieval sources, memory, and integrations expand the attack surface compared to normal web apps. If you’re building or buying an AI feature, this page will help you understand what to test and how to scope it before it becomes an incident.
LLM Security Testing TL;DR
LLM security testing evaluates whether LLM apps can be manipulated to leak sensitive data, bypass rules, or misuse tools, especially in RAG systems and AI agents. LLM penetration testing focuses on prompt injection and indirect injection, retrieval-based data exposure, and authorization failures across tenants and roles. It also tests unsafe tool and API actions, insecure output handling where model output becomes downstream behavior, and abuse controls like rate limits and token or cost caps. A good engagement produces reproducible evidence, prioritized fixes that engineers can implement, and an optional retest to verify remediation. If your LLM feature touches private data or automates actions, treat it like a real attack surface, not a UI enhancement.
Table of contents
What Is LLM Security Testing?
LLM security testing is the practice of assessing how an LLM-enabled feature behaves under adversarial input and hostile data conditions, with the goal of preventing real impact in production. Unlike a traditional web app pentest that primarily targets endpoints, parameters, and known vulnerability classes, LLM security testing evaluates the full AI system: prompts and guardrails, memory and session state, retrieval sources (RAG), tool and API integrations, and the authorization rules that should constrain all of it. The objective is outcome-based proof, can an attacker extract data, override constraints, or trigger actions they should not be able to trigger, and then a fix plan that measurably reduces that risk.
Why RAG and AI Agents Change the Risk
RAG and AI agents raise the stakes because they connect the model to data and actions. In a basic chatbot, the worst case is often a bad answer. In a RAG system, the model can retrieve private documents and summarize them back to the user, which makes retrieval authorization and tenant filtering a primary security control, not a nice-to-have. In an agent, the model can call tools and APIs to take actions, which means a prompt injection can become an operational incident if your tool layer doesn’t enforce least privilege and approvals. LLM security testing focuses on these two high-impact surfaces because they expand the blast radius: RAG expands what the model can see, and agents expand what the model can do.
Common LLM Security Testing Findings

Prompt Injection and Indirect Injection
Prompt injection is when an attacker uses text to steer the model into ignoring your rules or prioritizing attacker instructions. Indirect prompt injection is the more dangerous version for RAG, where the attacker plants instructions in content the model retrieves, like a knowledge base article, ticket, PDF, or web page. LLM penetration testing validates whether your system can resist both direct chat-based injection and retrieval-based injection without leaking data or triggering unsafe actions.
Sensitive Data Leakage (RAG and Memory)
Most real leaks come from system design, not “the model is leaking secrets.” Data leaks happen when retrieval pulls sensitive documents into context, memory retains prior sensitive content, or tool outputs include private data that gets echoed back. LLM security testing attempts to force cross-user or cross-tenant exposure, extract system prompts, and coax out sensitive fields that should be masked or access-controlled.
Tool and API Abuse (Agents)
When an AI agent can call tools, the model becomes an interface to business operations. Attackers will try to coerce the agent into exporting data, modifying permissions, sending messages, creating tickets, triggering workflows, or abusing internal endpoints. LLM penetration testing checks whether tools enforce authorization independently of the model, whether parameters are constrained, and whether high-risk actions require explicit user confirmation or human approval.
Authorization and Tenant Boundaries
RAG and agents often fail at boundaries: who is allowed to retrieve which documents, which tools can be used by which roles, and what data is accessible across tenants. If those checks only exist in the UI, the AI layer can bypass them. LLM security testing validates authorization at the data and tool layers, and it tries to prove whether a low-privilege user can access high-privilege data or actions through the AI feature.
Insecure Output Handling
LLM output is untrusted input. If your app renders model output into HTML or Markdown, stores it in systems that later get executed, or feeds it into downstream automation without validation, you can reintroduce classic vulnerabilities through the AI layer. LLM security testing traces where model output goes and attempts to turn “generated text” into XSS, SSRF, injection, or unsafe downstream behavior.
Unbounded Consumption (Cost and DoS)
LLM apps can be attacked by forcing expensive behavior: long prompts, repeated tool calls, oversized retrieval, or patterns that maximize token usage. Even without “breaking in,” an attacker can degrade service and run up costs. LLM penetration testing evaluates rate limits, token and tool budgets, caching strategy, timeouts, and abuse detection so you can keep the system stable under hostile usage.
LLM Penetration Testing Methodology
LLM penetration testing works best when it’s repeatable and outcome-driven. We start by mapping the LLM feature end to end: where prompts live, what memory is retained, what retrieval sources feed RAG, what tools and APIs the agent can call, and which identities and permission scopes those tools run under. Then we run targeted attack scenarios that mirror real attacker goals, prompt injection (direct and indirect), forced data leakage through retrieval and memory, cross-tenant access attempts, and tool misuse. Finally, we validate controls outside the model, authorization enforcement at the tool and data layers, parameter constraints and approvals for high-risk actions, output handling safety, and abuse controls like rate limits and token and cost caps. The deliverable is reproducible evidence, a prioritized fix plan, and an optional retest to confirm remediation.
Scope Checklist (Copy and Paste)
Use this to scope LLM security testing quickly, especially for RAG and AI agents.
- LLM features in scope (chat, RAG search, agent workflows) and environments (staging vs production)
- Model provider and model version(s)
- User roles and tenant model (who should see what)
- Retrieval sources (knowledge base, tickets, docs, file shares, web) and how content is ingested and updated
- Retrieval authorization and filtering rules (tenant and role enforcement)
- Memory behavior (session memory, long-term memory, retention)
- Tool and API integrations (full list), identities used, and permission scopes
- High-risk actions and approval requirements (human confirmation, step-up auth)
- Output destinations (rendered UI, emails, tickets, logs, automations) and how output is sanitized
- Rate limits, token budgets, tool-call limits, timeouts, and cost controls
- Logging and audit trails for prompts, retrieval hits, tool calls, and authorization decisions
Want LLM Security Testing?
Artifice Security performs LLM security testing and LLM penetration testing for RAG systems and tool-using AI agents. If you want a scoped test plan and quote, contact us and book a meeting to go over your scope and we’ll talk you through how we handle methodology and answer any of your technical questions.
Artifice Security is Denver’s leading AI security and penetration testing company with all employees in the U.S. and we only use senior penetration testers.
or
LLM Security Testing FAQ
A normal pentest focuses on the app and its endpoints, authentication, authorization, and common vulnerability classes. LLM security testing focuses on how the AI feature can be steered through prompts or retrieved content to leak data, bypass constraints, or misuse tools. If your system uses RAG or agents, the biggest risks often live in retrieval filtering and tool permissions, not in classic web input validation.
–
Red teaming often emphasizes adversarial prompting and whether the model violates policies. LLM security testing validates the whole system around the model, retrieval, memory, tools, identities, and authorization, and it proves real impact with evidence. In practice, the best approach combines both, but system-level controls matter more than clever prompts.
–
Yes. If retrieval pulls sensitive content into context and filtering is weak, prompt injection can steer the model into revealing it. The highest-risk failures are cross-tenant retrieval and overbroad data access, because the model can only leak what the system gives it.
–
You test them like privileged integrations: least privilege, tight scopes, parameter validation, and approvals for high-risk actions. In testing, you use controlled accounts and environments when possible, and you stop once you’ve proven impact. The tool layer must enforce authorization independently of the model.
–
Start with tenant and role boundaries in retrieval and tools, because those failures create immediate data exposure risk. Next, lock down high-risk tool actions with approvals and strong parameter constraints. Then reduce sensitive data in context and improve logging, detection, and rate limits.
–
It depends on how many AI features, retrieval sources, and tools are in scope. A single LLM feature with limited data access can take a few days. A RAG system plus a tool-using agent typically takes longer because the risk concentrates in integrations, permissions, and data flows.
–
They help, but they don’t replace system controls. Prompt-only defenses fail because the model still processes untrusted text, especially through indirect injection. Reliable defenses live in authorization, retrieval controls, constrained tools, safe output handling, and monitoring.
–
Yes, and it’s one of the fastest ways to confirm you actually reduced risk. A retest focuses on the specific failure paths we proved during testing and validates that the updated controls hold under the same attack scenarios and all of our testing includes retesting with our pricing.
–
Sources: LLM & RAG Security Risks
These resources cover AI security risks, common data leakage paths, and practical best practices for securing enterprise LLM deployments.
Prompt Injection & Model Manipulation
OWASP Top 10 for Large Language Model Applications
https://owasp.org/www-project-top-10-for-large-language-model-applications/
OWASP AI Testing Guide
https://owasp.org/www-project-ai-testing-guide
OWASP LLM01: Prompt Injection
https://genai.owasp.org/llmrisk/llm01-prompt-injection/
MITRE ATLAS — Adversarial Threat Landscape for AI Systems
https://atlas.mitre.org/
Sensitive Data Exposure & Information Disclosure
OWASP LLM02: Sensitive Information Disclosure
https://genai.owasp.org/llmrisk/llm02-sensitive-information-disclosure/
NIST AI Risk Management Framework (AI RMF 1.0)
https://nvlpubs.nist.gov/nistpubs/ai/NIST.AI.100-1.pdf
IBM — AI Security Risks & Data Privacy
https://www.ibm.com/topics/ai-security
Retrieval-Augmented Generation (RAG) & Data Exposure Risks
NVIDIA — Securing Retrieval-Augmented Generation Pipelines
https://developer.nvidia.com/blog/securing-retrieval-augmented-generation-rag-applications/
Microsoft — AI Red Team Guidance & RAG Security Considerations
https://learn.microsoft.com/security/ai/red-teaming-llms
Google Cloud — Secure AI & Data Access Patterns
https://cloud.google.com/architecture/ai-ml/security-best-practices
System Prompt Exposure & Guardrail Bypass Risks
OpenAI — Safety & Security Considerations for LLM Deployment
https://platform.openai.com/docs/guides/safety-best-practices
Anthropic — Prompt Security & Model Safety Guidance
https://docs.anthropic.com/en/docs/safety
Integration & Workflow Abuse Risks
ENISA — Securing Machine Learning Algorithms
https://www.enisa.europa.eu/publications/securing-machine-learning-algorithms
CISA — AI and Cybersecurity Risk Considerations
https://www.cisa.gov/ai

