Skip to content

Prompt Injection Attacks on Large Language Models

The use of Large Language Models (LLMs) has become increasingly popular in a wide range of applications, including integrated development environments (IDEs) and search engines. However, it is not without its risks. LLMs can be modulated by natural language instructions, which can be exploited by an adversary to perform a targeted attack known as Prompt Injection (PI). Recent studies have shown that it is difficult to mitigate PI attacks, and that current LLMs are instruction-following.

In this article, we look at how augmenting LLMs with retrieval and API calling capabilities can make them vulnerable to indirect PI attacks. We explore the new threat landscape of Application-Integrated LLMs and discuss a variety of new attack vectors. We also demonstrate the practical viability of these attacks within synthetic applications.

Our results show that current mitigation techniques are not sufficient to protect LLMs against these attacks. We suggest that more research is needed to evaluate existing defense strategies for LLMs, as well as the development of new techniques to defend against them.

In summary, Prompt Injection attacks on large language models are a serious threat. They can be used to misalign LLMs to produce malicious content or override instructions and filtering schemes. LLMs that are augmented with retrieval and API calling capabilities are particularly vulnerable to indirect PI attacks. Current mitigation techniques are not sufficient to protect against these attacks, and more research is needed to evaluate existing defense strategies and develop new techniques.

Leave a Reply

Your email address will not be published. Required fields are marked *