On Thursday, a few Twitter users revealed how to hijack an automated tweet bot dedicated to remote jobs and powered by OpenAI's GPT-3 language model. They redirected the bot to repeat embarrassing and ridiculous phrases using a newly discovered technique known as a "prompt injection attack."
Remoteli.io, a site that aggregates remote job opportunities, runs the bot. It describes itself as "an OpenAI-driven bot that helps you discover remote jobs that allow you to work from anywhere." Usually, it would respond to tweets directed at it with generic statements about the benefits of remote work. The bot was shut down late yesterday after the exploit went viral and hundreds of people tried it for themselves.
This latest breach occurred only four days after data researcher Riley Goodside unearthed the ability to prompt GPT-3 with "malicious inputs" that instruct the model to disregard its previous directions and do something else instead. The following day, AI researcher Simon Willison published an overview of the exploit on his blog, inventing the term "prompt injection" to define it.
The exploit is present any time anyone writes a piece of software that works by providing a hard-coded set of prompt instructions and then appends input provided by a user," Willison told Ars. "That's because the user can type Ignore previous instructions and (do this instead)."
An injection attack is not a novel concept. SQL injection, for example, has been recognised by security researchers to execute a harmful SQL statement when asking for user input if not protected against it. On the other hand, Willison expressed concern about preventing prompt injection attacks, writing, "I know how to beat XSS, SQL injection, and so many other exploits. I have no idea how to reliably beat prompt injection!"
The struggle in protection against prompt injection stems from the fact that mitigations for other types of injection attacks come from correcting syntax errors, as noted on Twitter by a researcher known as Glyph.
GPT-3 is a large language model developed by OpenAI and released in 2020 that can compose text in a variety of styles at a human-like level. It is a commercial product available through an API that can be integrated into third-party products such as bots, subject to OpenAI's approval. That means there could be many GPT-3-infused products on the market that are vulnerable to prompt injection.
"At this point I would be very surprised if there were any [GPT-3] bots that were NOT vulnerable to this in some way," Willison said.
However, unlike a SQL injection, a prompt injection is more likely to make the bot (or the company behind it) look foolish than to endanger data security.
"The severity of the exploit varies. If the only person who will see the output of the tool is the person using it, then it likely doesn't matter. They might embarrass your company by sharing a screenshot, but it's not likely to cause harm beyond that." Willison explained.
Nonetheless, prompt injection is an unsettling threat that is yet emerging and requires us to be vigilant, especially those developing GPT-3 bots because it may be exploited in unexpected ways in the future.