Independent security researcher Johann Rehberger found a flaw in the memory feature of ChatGPT. Hackers can manipulate the stored information that gets extracted to steal user data by exploiting the long-term memory setting of ChatGPT. This is actually an "issue related to safety, rather than security" as OpenAI termed the problem, showing how this feature allows storing of false information and captures user data over time.
Rehberger had initially reported the incident to OpenAI. The point was that the attackers could fill the AI's memory settings with false information and malicious commands. OpenAI's memory feature, in fact, allows the user's information from previous conversations to be put in that memory so during a future conversation, the AI can recall the age, preferences, or any other relevant details of that particular user without having been fed the same data repeatedly.
But what Rehberger had highlighted was the vulnerability that hackers capitalised on to permanently store false memories through a technique known as prompt injection. Essentially, it occurs when an attacker manipulates the AI by malicious content attached to emails, documents, or images. For example, he demonstrated how he could get ChatGPT to believe he was 102 and living in a virtual reality of sorts. Once these false memories were implanted, they could haunt and influence all subsequent interaction with the AI.
How Hackers Can Use ChatGPT's Memory to Steal Data
In proof of concept, Rehberger demonstrated how this vulnerability can be exploited in real-time for the theft of user inputs. In chat, hackers can send a link or even open an image that hooks ChatGPT into a malicious link and redirects all conversations along with the user data to a server owned by the hacker. Such attacks would not have to be stopped because the memory of the AI holds the instructions planted even after starting a new conversation.
Although OpenAI has issued partial fixes to prevent memory feature exploitation, the underlying mechanism of prompt injection remains. Attackers can still compromise ChatGPT's memory by embedding knowledge in their long-term memory that may have been seeded through unauthorised channels.
What Users Can Do
There are also concerns for users who care about what ChatGPT is going to remember about them in terms of data. Users need to monitor the chat session for any unsolicited shift in memory updates and screen regularly what is saved into and deleted from the memory of ChatGPT. OpenAI has put out guidance on how to manage the memory feature of the tool and how users may intervene in determining what is kept or deleted.
Though OpenAI did its best to address the issue, such an incident brings out a fact that continues to show how vulnerable AI systems remain when it comes to safety issues concerning user data and memory. Regarding AI development, safety regarding the protected sensitive information will always continue to raise concerns from developers to the users themselves.
Therefore, the weakness revealed by Rehberger shows how risky the introduction of AI memory features might be. The users need to be always alert about what information is stored and avoid any contacts with any content they do not trust. OpenAI is certainly able to work out security problems as part of its user safety commitment, but in this case, it also turns out that even the best solutions without active management on the side of a user will lead to breaches of data.