
Indirect Instruction Injection in Multi-Modal LLMs

Researchers have discovered a fascinating method of indirect prompt and instruction injection in multi-modal large language models (LLMs). In a recent paper titled “(Ab)using Images and Sounds for Indirect Instruction Injection in Multi-Modal LLMs,” the authors demonstrate how an attacker can generate adversarial perturbations and blend them into images or audio recordings. When such a perturbed input is fed to the model, it steers the LLM into outputting the attacker’s chosen text or following the attacker’s instructions in the subsequent dialogue. The attack examples in the paper target LLaVA and PandaGPT, two prominent multi-modal LLMs.
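Mechanically, this resembles classic adversarial-example crafting: the attacker optimizes a small image perturbation under a teacher-forced loss so that the model’s decoder emits the injected instruction. The sketch below illustrates the idea in PyTorch. It is only a rough approximation of the paper’s approach: the `model` and `tokenizer` objects are hypothetical stand-ins for a multi-modal LLM interface (the actual attacks are implemented against LLaVA and PandaGPT), and the hyperparameters are illustrative, not the paper’s.

```python
# Illustrative sketch only. Assumes a hypothetical multi-modal model that maps
# (image, text prompt) -> next-token logits, with pixel values in [0, 1].
import torch
import torch.nn.functional as F

def craft_injection_image(model, tokenizer, image, prompt, target_text,
                          epsilon=8 / 255, alpha=1 / 255, steps=500):
    """Optimize a bounded perturbation so that, when the model is shown the
    perturbed image alongside the user's prompt, it emits `target_text`
    (the attacker's injected instruction)."""
    target_ids = tokenizer.encode(target_text, return_tensors="pt")
    delta = torch.zeros_like(image, requires_grad=True)

    for _ in range(steps):
        adv_image = (image + delta).clamp(0, 1)
        # Hypothetical call: teacher-forced logits over the target tokens,
        # conditioned on the adversarial image and the user's prompt.
        logits = model(adv_image, prompt, labels=target_ids).logits
        loss = F.cross_entropy(logits.view(-1, logits.size(-1)),
                               target_ids.view(-1))
        loss.backward()

        # Projected gradient step: minimize the loss while keeping the
        # perturbation inside an L-infinity ball of radius epsilon.
        with torch.no_grad():
            delta -= alpha * delta.grad.sign()
            delta.clamp_(-epsilon, epsilon)
            delta.grad.zero_()

    return (image + delta).clamp(0, 1).detach()
```

The same optimization idea carries over to audio, with the perturbation applied to the waveform instead of the pixels; the paper also shows that the injected instruction can redirect the rest of the conversation, not just a single response.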

Posted on July 28, 2023 at 7:06 AM

Key Points:
– Researchers have discovered a method of indirect prompt and instruction injection in multi-modal LLMs.
– Adversarial perturbations blended into images or audio recordings can steer LLMs to output attacker-chosen text.
– The perturbations can also manipulate subsequent dialogues to follow the attacker’s instructions.
– The attack examples discussed in the research paper target the LLaVA and PandaGPT multi-modal LLMs.
– This research highlights potential vulnerabilities in multi-modal LLMs that could be exploited by attackers.
