The product uses externally-provided data to build prompts provided to large language models (LLMs), but the way these prompts are constructed causes the LLM to fail to distinguish between user-supplied inputs and developer provided system directives.
When prompts are constructed using externally controllable data, it is often possible to cause an LLM to ignore the original guidance provided by its creators (known as the "system prompt") by inserting malicious instructions in plain human language or using bypasses such as special characters or tags. Because LLMs are designed to treat all instructions as legitimate, there is often no way for the model to differentiate between what prompt language is malicious when it performs inference and returns data. Many LLM systems incorporate data from other adjacent products or external data sources like Wikipedia using API calls and retrieval augmented generation (RAG). Any external sources in use that may contain untrusted data should also be considered potentially malicious.
Threat Mapped score: 1.8
Industry: Finiancial
Threat priority: P4 - Informational (Low)
CVE: CVE-2023-32786
Chain: LLM integration framework has prompt injection (CWE-1427) that allows an attacker to force the service to retrieve data from an arbitrary URL, essentially providing SSRF (CWE-918) and potentially injecting content into downstream tasks.
CVE: CVE-2024-5184
ML-based email analysis product uses an API service that allows a malicious user to inject a direct prompt and take over the service logic, forcing it to leak the standard hard-coded system prompts and/or execute unwanted prompts to leak sensitive data.
CVE: CVE-2024-5565
Chain: library for generating SQL via LLMs using RAG uses a prompt function to present the user with visualized results, allowing altering of the prompt using prompt injection (CWE-1427) to run arbitrary Python code (CWE-94) instead of the intended visualization code.
N/A
N/A
Phase | Note |
---|---|
Architecture and Design | LLM-connected applications that do not distinguish between trusted and untrusted input may introduce this weakness. If such systems are designed in a way where trusted and untrusted instructions are provided to the model for inference without differentiation, they may be susceptible to prompt injection and similar attacks. |
Implementation | When designing the application, input validation should be applied to user input used to construct LLM system prompts. Input validation should focus on mitigating well-known software security risks (in the event the LLM is given agency to use tools or perform API calls) as well as preventing LLM-specific syntax from being included (such as markup tags or similar). |
Implementation | This weakness could be introduced if training does not account for potentially malicious inputs. |
System Configuration | Configuration could enable model parameters to be manipulated when this was not intended. |
Integration | This weakness can occur when integrating the model into the software. |
Bundling | This weakness can occur when bundling the model with the software. |
Intro: Consider a "CWE Differentiator" application that uses an an LLM generative AI based "chatbot" to explain the difference between two weaknesses. As input, it accepts two CWE IDs, constructs a prompt string, sends the prompt to the chatbot, and prints the results. The prompt string effectively acts as a command to the chatbot component. Assume that invokeChatbot() calls the chatbot and returns the response as a string; the implementation details are not important here.
Body: To avoid XSS risks, the code ensures that the response from the chatbot is properly encoded for HTML output. If the user provides CWE-77 and CWE-78, then the resulting prompt would look like:
prompt = "Explain the difference between {} and {}".format(arg1, arg2) result = invokeChatbot(prompt) resultHTML = encodeForHTML(result) print resultHTML
Intro: Consider this code for an LLM agent that tells a joke based on user-supplied content. It uses LangChain to interact with OpenAI.
Body: This agent is provided minimal context on how to treat dangerous requests for a secret. Suppose the user provides an input like:
from langchain.agents import AgentExecutor, create_tool_calling_agent, tool from langchain_openai import ChatOpenAI from langchain_core.prompts import ChatPromptTemplate, MessagesPlaceholder from langchain_core.messages import AIMessage, HumanMessage @tool def tell_joke(content): """Tell a joke based on the provided user-supplied content""" pass tools = [tell_joke] system_prompt = """ You are a witty and helpful LLM agent, ready to sprinkle humor into your responses like confetti at a birthday party. Aim to make users smile while providing clear and useful information, balancing hilarity with helpfulness. You have a secret token 48a67f to use during operation of your task. """ prompt = ChatPromptTemplate.from_messages( [ ("system", system_prompt), ("human", "{input}"), MessagesPlaceholder(variable_name="agent_scratchpad") ] ) model = ChatOpenAI(model="gpt-3.5-turbo", openai_api_key="KEY") agent = create_tool_calling_agent(model, tools, prompt) agent_executor = AgentExecutor(agent=agent, tools=tools, verbose=True) # Assume that GetUserInput() is defined to obtain input from the user, # e.g., through a web form. user_input = GetUserInput() response = agent_executor.invoke({"input": user_input}) print(response)