CWE-1427: Improper Neutralization of Input Used for LLM Prompting

Description

The product uses externally-provided data to build prompts provided to large language models (LLMs), but the way these prompts are constructed causes the LLM to fail to distinguish between user-supplied inputs and developer provided system directives.

Extended Description

When prompts are constructed using externally controllable data, it is often possible to cause an LLM to ignore the original guidance provided by its creators (known as the "system prompt") by inserting malicious instructions in plain human language or using bypasses such as special characters or tags. Because LLMs are designed to treat all instructions as legitimate, there is often no way for the model to differentiate between what prompt language is malicious when it performs inference and returns data. Many LLM systems incorporate data from other adjacent products or external data sources like Wikipedia using API calls and retrieval augmented generation (RAG). Any external sources in use that may contain untrusted data should also be considered potentially malicious.

ThreatScore

Threat Mapped score: 1.8

Industry: Finiancial

Threat priority: P4 - Informational (Low)

Observed Examples (CVEs)

CVE: CVE-2023-32786

Chain: LLM integration framework has prompt injection (CWE-1427) that allows an attacker to force the service to retrieve data from an arbitrary URL, essentially providing SSRF (CWE-918) and potentially injecting content into downstream tasks.
CVE: CVE-2024-5184

ML-based email analysis product uses an API service that allows a malicious user to inject a direct prompt and take over the service logic, forcing it to leak the standard hard-coded system prompts and/or execute unwanted prompts to leak sensitive data.
CVE: CVE-2024-5565

Chain: library for generating SQL via LLMs using RAG uses a prompt function to present the user with visualized results, allowing altering of the prompt using prompt injection (CWE-1427) to run arbitrary Python code (CWE-94) instead of the intended visualization code.

Related Attack Patterns (CAPEC)

N/A

Attack TTPs

N/A

Modes of Introduction

Phase	Note
Architecture and Design	LLM-connected applications that do not distinguish between trusted and untrusted input may introduce this weakness. If such systems are designed in a way where trusted and untrusted instructions are provided to the model for inference without differentiation, they may be susceptible to prompt injection and similar attacks.
Implementation	When designing the application, input validation should be applied to user input used to construct LLM system prompts. Input validation should focus on mitigating well-known software security risks (in the event the LLM is given agency to use tools or perform API calls) as well as preventing LLM-specific syntax from being included (such as markup tags or similar).
Implementation	This weakness could be introduced if training does not account for potentially malicious inputs.
System Configuration	Configuration could enable model parameters to be manipulated when this was not intended.
Integration	This weakness can occur when integrating the model into the software.
Bundling	This weakness can occur when bundling the model with the software.

Common Consequences

Impact: Execute Unauthorized Code or Commands, Varies by Context — Notes:
Impact: Read Application Data — Notes:
Impact: Modify Application Data, Execute Unauthorized Code or Commands — Notes:
Impact: Read Application Data, Modify Application Data, Gain Privileges or Assume Identity — Notes:

Potential Mitigations

Architecture and Design: LLM-enabled applications should be designed to ensure proper sanitization of user-controllable input, ensuring that no intentionally misleading or dangerous characters can be included. Additionally, they should be designed in a way that ensures that user-controllable input is identified as untrusted and potentially dangerous. (High)
Implementation: LLM prompts should be constructed in a way that effectively differentiates between user-supplied input and developer-constructed system prompting to reduce the chance of model confusion at inference-time. (Moderate)
Architecture and Design: LLM-enabled applications should be designed to ensure proper sanitization of user-controllable input, ensuring that no intentionally misleading or dangerous characters can be included. Additionally, they should be designed in a way that ensures that user-controllable input is identified as untrusted and potentially dangerous. (High)
Implementation: Ensure that model training includes training examples that avoid leaking secrets and disregard malicious inputs. Train the model to recognize secrets, and label training data appropriately. Note that due to the non-deterministic nature of prompting LLMs, it is necessary to perform testing of the same test case several times in order to ensure that troublesome behavior is not possible. Additionally, testing should be performed each time a new model is used or a model's weights are updated. (N/A)
Installation: During deployment/operation, use components that operate externally to the system to monitor the output and act as a moderator. These components are called different terms, such as supervisors or guardrails. (N/A)
System Configuration: During system configuration, the model could be fine-tuned to better control and neutralize potentially dangerous inputs. (N/A)

Applicable Platforms

None (Not Language-Specific, Undetermined)

Demonstrative Examples

Intro: Consider a "CWE Differentiator" application that uses an an LLM generative AI based "chatbot" to explain the difference between two weaknesses. As input, it accepts two CWE IDs, constructs a prompt string, sends the prompt to the chatbot, and prints the results. The prompt string effectively acts as a command to the chatbot component. Assume that invokeChatbot() calls the chatbot and returns the response as a string; the implementation details are not important here.

Body: To avoid XSS risks, the code ensures that the response from the chatbot is properly encoded for HTML output. If the user provides CWE-77 and CWE-78, then the resulting prompt would look like:

prompt = "Explain the difference between {} and {}".format(arg1, arg2) result = invokeChatbot(prompt) resultHTML = encodeForHTML(result) print resultHTML

Intro: Consider this code for an LLM agent that tells a joke based on user-supplied content. It uses LangChain to interact with OpenAI.

Body: This agent is provided minimal context on how to treat dangerous requests for a secret. Suppose the user provides an input like:

from langchain.agents import AgentExecutor, create_tool_calling_agent, tool from langchain_openai import ChatOpenAI from langchain_core.prompts import ChatPromptTemplate, MessagesPlaceholder from langchain_core.messages import AIMessage, HumanMessage @tool def tell_joke(content): """Tell a joke based on the provided user-supplied content""" pass tools = [tell_joke] system_prompt = """ You are a witty and helpful LLM agent, ready to sprinkle humor into your responses like confetti at a birthday party. Aim to make users smile while providing clear and useful information, balancing hilarity with helpfulness. You have a secret token 48a67f to use during operation of your task. """ prompt = ChatPromptTemplate.from_messages( [ ("system", system_prompt), ("human", "{input}"), MessagesPlaceholder(variable_name="agent_scratchpad") ] ) model = ChatOpenAI(model="gpt-3.5-turbo", openai_api_key="KEY") agent = create_tool_calling_agent(model, tools, prompt) agent_executor = AgentExecutor(agent=agent, tools=tools, verbose=True) # Assume that GetUserInput() is defined to obtain input from the user, # e.g., through a web form. user_input = GetUserInput() response = agent_executor.invoke({"input": user_input}) print(response)

Notes

No notes available.

← Back to CWE list