CySecurity News - Latest Information Security and Hacking Incidents: Prompt Injection

AI Security Cyber Security Cybersecurity Google Gemini Prompt Injection Red teaming

How Google Enhances AI Security with Red Teaming

Google continues to strengthen its cybersecurity framework, particularly in safeguarding AI systems from threats such as prompt injection attacks on Gemini. By leveraging automated red team hacking bots, the company is proactively identifying and mitigating vulnerabilities.

Google employs an agentic AI security team to streamline threat detection and response using intelligent AI agents. A recent report by Google highlights its approach to addressing prompt injection risks in AI systems like Gemini.

“Modern AI systems, like Gemini, are more capable than ever, helping retrieve data and perform actions on behalf of users,” the agent team stated. “However, data from external sources present new security challenges if untrusted sources are available to execute instructions on AI systems.”

Prompt injection attacks exploit AI models by embedding concealed instructions within input data, influencing system behavior. To counter this, Google is integrating advanced security measures, including automated red team hacking bots.

To enhance AI security, Google employs red teaming—a strategy that simulates real-world cyber threats to expose vulnerabilities. As part of this initiative, Google has developed a red-team framework to generate and test prompt injection attacks.

“Crafting successful indirect prompt injections,” the Google agent AI security team explained, “requires an iterative process of refinement based on observed responses.”

This framework leverages optimization-based attacks to refine prompt injection techniques, ensuring AI models remain resilient against sophisticated threats.

“Weak attacks do little to inform us of the susceptibility of an AI system to indirect prompt injections,” the report highlighted.

Although red team hacking bots challenge AI defenses, they also play a crucial role in reinforcing the security of systems like Gemini against unauthorized data access.

Key Attack Methodologies

Google evaluates Gemini's robustness using two primary attack methodologies:

1. Actor-Critic Model: This approach employs an attacker-controlled model to generate prompt injections, which are tested against the AI system. “These are passed to the AI system under attack,” Google explained, “which returns a probability score of a successful attack.” The bot then refines the attack strategy iteratively until a vulnerability is exploited.

2. Beam Search Technique: This method initiates a basic prompt injection that instructs Gemini to send sensitive information via email to an attacker. “If the AI system recognizes the request as suspicious and does not comply,” Google said, “the attack adds random tokens to the end of the prompt injection and measures the new probability of the attack succeeding.” The process continues until an effective attack method is identified.

By leveraging red team hacking bots and AI-driven security frameworks, Google is continuously improving AI resilience, ensuring robust protection against evolving threats.

Slack Fixes AI Security Flaw After Expert Warning



 

Slack

Artificial Intelligence ChatGPT Data Breach Prompt Injection Slack

Slack Fixes AI Security Flaw After Expert Warning

Slack, the popular communication platform used by businesses worldwide, has recently taken action to address a potential security flaw related to its AI features. The company has rolled out an update to fix the issue and reassured users that there is no evidence of unverified access to their data. This move follows reports from cybersecurity experts who identified a possible weakness in Slack's AI capabilities that could be exploited by malicious actors.

The security concern was first brought to attention by PromptArmor, a cybersecurity firm that specialises in identifying vulnerabilities in AI systems. The firm raised alarms over the potential misuse of Slack’s AI functions, particularly those involving ChatGPT. These AI tools were intended to improve user experience by summarising discussions and assisting with quick replies. However, PromptArmor warned that these features could also be manipulated to access private conversations through a method known as "prompt injection."

Prompt injection is a technique where an attacker tricks the AI into executing harmful commands that are hidden within seemingly harmless instructions. According to PromptArmor, this could allow unauthorised individuals to gain access to private messages and even conduct phishing attacks. The firm also noted that Slack's AI could potentially be coerced into revealing sensitive information, such as API keys, which could then be sent to external locations without the knowledge of the user.

PromptArmor outlined a scenario in which an attacker could create a public Slack channel and embed a malicious prompt within it. This prompt could instruct the AI to replace specific words with sensitive data, such as an API key, and send that information to an external site. Alarmingly, this type of attack could be executed without the attacker needing to be a part of the private channel where the sensitive data is stored.

Further complicating the issue, Slack’s AI has the ability to pull data from both file uploads and direct messages. This means that even private files could be at risk if the AI is manipulated using prompt injection techniques.

Upon receiving the report, Slack immediately began investigating the issue. The company confirmed that, under specific and rare circumstances, an attacker could use the AI to gather certain data from other users in the same workspace. To address this, Slack quickly deployed a patch designed to fix the vulnerability. The company also assured its users that, at this time, there is no evidence indicating any customer data has been compromised.

In its official communication, Slack emphasised the limited nature of the threat and the quick action taken to resolve it. The update is now in place, and the company continues to monitor the situation to prevent any future incidents.

There are potential risks that come with integrating AI into workplace tools that need to be construed well. While AI has many upsides, including improved efficiency and streamlined communication, it also opens up new opportunities for cyber threats. It is crucial for organisations using AI to remain vigilant and address any security concerns that arise promptly.

Slack’s quick response to this issue stresses upon how imperative it is to stay proactive in a rapidly changing digital world.

Twitter Pranksters Halt GPT-3 Bot with Newly Discovered “Prompt Injection” Hack



 

Twitter

Cyber Security Hackers Prompt Injection Twitter

Twitter Pranksters Halt GPT-3 Bot with Newly Discovered “Prompt Injection” Hack

On Thursday, a few Twitter users revealed how to hijack an automated tweet bot dedicated to remote jobs and powered by OpenAI's GPT-3 language model. They redirected the bot to repeat embarrassing and ridiculous phrases using a newly discovered technique known as a "prompt injection attack."

Remoteli.io, a site that aggregates remote job opportunities, runs the bot. It describes itself as "an OpenAI-driven bot that helps you discover remote jobs that allow you to work from anywhere." Usually, it would respond to tweets directed at it with generic statements about the benefits of remote work. The bot was shut down late yesterday after the exploit went viral and hundreds of people tried it for themselves.

This latest breach occurred only four days after data researcher Riley Goodside unearthed the ability to prompt GPT-3 with "malicious inputs" that instruct the model to disregard its previous directions and do something else instead. The following day, AI researcher Simon Willison published an overview of the exploit on his blog, inventing the term "prompt injection" to define it.

The exploit is present any time anyone writes a piece of software that works by providing a hard-coded set of prompt instructions and then appends input provided by a user," Willison told Ars. "That's because the user can type Ignore previous instructions and (do this instead)."

An injection attack is not a novel concept. SQL injection, for example, has been recognised by security researchers to execute a harmful SQL statement when asking for user input if not protected against it. On the other hand, Willison expressed concern about preventing prompt injection attacks, writing, "I know how to beat XSS, SQL injection, and so many other exploits. I have no idea how to reliably beat prompt injection!"

The struggle in protection against prompt injection stems from the fact that mitigations for other types of injection attacks come from correcting syntax errors, as noted on Twitter by a researcher known as Glyph.

GPT-3 is a large language model developed by OpenAI and released in 2020 that can compose text in a variety of styles at a human-like level. It is a commercial product available through an API that can be integrated into third-party products such as bots, subject to OpenAI's approval. That means there could be many GPT-3-infused products on the market that are vulnerable to prompt injection.

"At this point I would be very surprised if there were any [GPT-3] bots that were NOT vulnerable to this in some way," Willison said.

However, unlike a SQL injection, a prompt injection is more likely to make the bot (or the company behind it) look foolish than to endanger data security.

"The severity of the exploit varies. If the only person who will see the output of the tool is the person using it, then it likely doesn't matter. They might embarrass your company by sharing a screenshot, but it's not likely to cause harm beyond that." Willison explained.

Nonetheless, prompt injection is an unsettling threat that is yet emerging and requires us to be vigilant, especially those developing GPT-3 bots because it may be exploited in unexpected ways in the future.

Search This Blog

Sections

Popular Posts

Blog Archive

Labels

Report Abuse

About Me

Showing result(s) for

Popular Posts

Pages

How Google Enhances AI Security with Red Teaming

Slack Fixes AI Security Flaw After Expert Warning

Twitter Pranksters Halt GPT-3 Bot with Newly Discovered “Prompt Injection” Hack

Footer About

Search This Blog

Sections

Popular Posts

Blog Archive

Labels

Report Abuse

About Me

Showing result(s) for

Popular Posts

Pages

Menu Item

How Google Enhances AI Security with Red Teaming

Slack Fixes AI Security Flaw After Expert Warning

Twitter Pranksters Halt GPT-3 Bot with Newly Discovered “Prompt Injection” Hack