Search This Blog

Powered by Blogger.

Blog Archive

Labels

Showing posts with label ChatGPT. Show all posts

Malicious Python Packages Target Developers Using AI Tools





The rise of generative AI (GenAI) tools like OpenAI’s ChatGPT and Anthropic’s Claude has created opportunities for attackers to exploit unsuspecting developers. Recently, two Python packages falsely claiming to provide free API access to these chatbot platforms were found delivering a malware known as "JarkaStealer" to their victims.


Exploiting Developers’ Interest in AI

Free and free-ish generative AI platforms are gaining popularity, but the benefits of most of their advanced features cost money. This led certain developers to look for free alternatives, many of whom didn't really check the source to be sure. Cybercrime follows trends and the trend is that malicious code is being inserted into open-source software packages that at least initially may appear legitimate.

As George Apostopoulos, a founding engineer at Endor Labs, describes, attackers target less cautious developers, lured by free access to popular AI tools. "Many people don't know better and fall for these offers," he says.


The Harmful Python Packages

Two evil Python packages, "gptplus" and "claudeai-eng," were uploaded to the Python Package Index, PyPI, one of the official repositories of open-source Python projects. The GPT-4 Turbo model by OpenAI and Claude chatbot by Anthropic were promised by API integrations from the user "Xeroline.".

While the packages seemed to work by connecting users to a demo version of ChatGPT, their true functionality was much nastier. The code also contained an ability to drop a Java archive (JAR) file which delivered the JarkaStealer malware to unsuspecting victims' systems.


What Is JarkaStealer?

The JarkaStealer is an infostealer malware that can extract sensitive information from infected systems. It has been sold on the Dark Web for as little as $20, but its more elaborate features can be bought for a few dollars more, which is designed to steal browser data and session tokens along with credentials for apps like Telegram, Discord, and Steam. It can also take screenshots of the victim's system, often revealing sensitive information.

Though the malware's effectiveness is highly uncertain, it is cheap enough and readily available to many attackers as an attractive tool. Its source code is even freely accessible on platforms like GitHub for an even wider reach.


Lessons for Developers

This incident points to risks in downloading unverified packages of open source, more so when handling emerging technologies such as AI. Development firms should screen all software sources to avoid shortcuts that seek free premium tools. Taking precautionary measures can save individuals and organizations from becoming victims of such attacks.

With regard to caution and best practices, developers are protected from malicious actors taking advantage of the GenAI boom.

PyPI Attack: Hackers Use AI Models to Deliver JarkaStealer via Python Libraries

PyPI Attack: Hackers Use AI Models to Deliver JarkaStealer via Python Libraries

Cybersecurity researchers have discovered two malicious packages uploaded to the Python Package Index (PyPI) repository that impersonated popular artificial intelligence (AI) models like OpenAI ChatGPT and Anthropic Claude to deliver an information stealer called JarkaStealer. 

The supply chain campaign shows the advancement of cyber threats attacking developers and the urgent need for caution in open-source activities. 

Experts have found two malicious packages uploaded to the Python Index (PyPI) repository pretending to be popular artificial intelligence (AI) models like OpenAI Chatgpt and Anthropic Claude to distribute an information stealer known as JarkaStealer. 

About attack vector

Called gptplus and claudeai-eng, the packages were uploaded by a user called "Xeroline" last year, resulting in 1,748 and 1,826 downloads. The two libraries can't be downloaded from PyPI. According to Kaspersky, the malicious packages were uploaded to the repository by one author and differed only in name and description. 

Experts believe the package offered a way to access GPT-4 Turbo and Claude AI API but contained malicious code that, upon installation, started the installation of malware. 

Particularly, the "__init__.py" file in these packages included Base64-encoded data that included code to download a Java archive file ("JavaUpdater.jar") from a GitHub repository, also downloading the Java Runtime Environment (JRE) from a Dropbox URL in case Java isn't already deployed on the host, before running the JAR file.

The impact

Based on information stealer JarkaStealer, the JAR file can steal a variety of sensitive data like web browser data, system data, session tokens, and screenshots from a wide range of applications like Steam, Telegram, and Discord. 

In the last step, the stolen data is archived, sent to the attacker's server, and then removed from the target's machine.JarkaStealer is known to offer under a malware-as-a-service (MaaS) model through a Telegram channel for a cost between $20 and $50, however, the source code has been leaked on GitHub. 

ClickPy stats suggest packages were downloaded over 3,500 times, primarily by users in China, the U.S., India, Russia, Germany, and France. The attack was part of an all-year supply chain attack campaign. 

How JarkaStealer steals

  • Steals web browser data- cookies, browsing history, and saved passwords. 
  • Compromises system data and setals OS details and user login details.
  • Steals session tokens from apps like Discord, Telegram, and Steam.
  • Captures real-time desktop activity through screenshots.

The stolen information is compressed and transmitted to a remote server controlled by the hacker, where it is removed from the target’s device.

OpenAI's Latest AI Model Faces Diminishing Returns

 

OpenAI's latest AI model is yielding diminishing results while managing the demands of recent investments. 

The Information claims that OpenAI's upcoming AI model, codenamed Orion, is outperforming its predecessors in terms of performance gains. In staff testing, Orion reportedly achieved the GPT-4 performance level after only 20% of its training. 

However, the shift from GPT-4 to the upcoming GPT-5 is expected to result in fewer quality gains than the jump from GPT-3 to GPT-4.

“Some researchers at the company believe Orion isn’t reliably better than its predecessor in handling certain tasks,” noted employees in the report. “Orion performs better at language tasks but may not outperform previous models at tasks such as coding, according to an OpenAI employee.”

AI training often yields the biggest improvements in performance in the early stages and smaller gains in subsequent phases. As a result, the remaining 80% of training is unlikely to provide breakthroughs comparable to earlier generational improvements. This predicament with its latest AI model comes at a critical juncture for OpenAI, following a recent investment round that raised $6.6 billion.

With this financial backing, investors' expectations rise, as do technical hurdles that confound typical AI scaling approaches. If these early versions do not live up to expectations, OpenAI's future fundraising chances may not be as attractive. The report's limitations underscore a major difficulty for the entire AI industry: the decreasing availability of high-quality training data and the need to remain relevant in an increasingly competitive environment.

A June research (PDF) predicts that between 2026 and 2032, AI companies will exhaust the supply of publicly accessible human-generated text data. Developers have "largely squeezed as much out of" the data that has been utilised to enable the tremendous gains in AI that we have witnessed in recent years, according to The Information. OpenAI is fundamentally rethinking its approach to AI development in order to meet these challenges. 

“In response to the recent challenge to training-based scaling laws posed by slowing GPT improvements, the industry appears to be shifting its effort to improving models after their initial training, potentially yielding a different type of scaling law,” states The Information.

Want to Make the Most of ChatGPT? Here Are Some Go-To Tips

 







Within a year and a half, ChatGPT has grown from an AI prototype to a broad productivity assistant, even sporting its text and code editor called Canvas. Soon, OpenAI will add direct web search capability to ChatGPT, putting the platform at the same table as Google's iconic search. With these fast updates, ChatGPT is now sporting quite a few features that may not be noticed at first glance but are deepening the user experience if one knows where to look.

This is the article that will teach you how to tap into ChatGPT, features from customization settings to unique prompting techniques, and not only five must-know tips will be useful in unlocking the full range of abilities of ChatGPT to any kind of task, small or big.


1. Rename Chats for Better Organisation

A new conversation with ChatGPT begins as a new thread, meaning that it will remember all details concerning that specific exchange but "forget" all the previous ones. This way, you can track the activities of current projects or specific topics because you can name your chats. The chat name that it might try to suggest is related to the flow of the conversation, and these are mostly overlooked contexts that users need to recall again. Renaming your conversations is one simple yet powerful means of staying organised if you rely on ChatGPT for various tasks.

To give a name to a conversation, tap the three dots next to the name in the sidebar. You can also archive older chats to remove them from the list without deleting them entirely, so you don't lose access to the conversations that are active.


2. Customise ChatGPT through Custom Instructions

Custom Instructions in ChatGPT is a chance to make your answers more specific to your needs because you will get to share your information and preferences with the AI. This is a two-stage personalization where you are explaining to ChatGPT what you want to know about yourself and, in addition, how you would like it to be returned. For instance, if you ask ChatGPT for coding advice several times a week, you can let the AI know what programming languages you are known in or would like to be instructed in so it can fine-tune the responses better. Or, you should be able to ask for ChatGPT to provide more verbose descriptions or to skip steps in order to make more intuitive knowledge of a topic.

To set up personal preferences, tap the profile icon on the upper right, and then from the menu, "Customise ChatGPT," and then fill out your preferences. Doing this will enable you to get responses tailored to your interests and requirements.


3. Choose the Right Model for Your Use

If you are a subscriber to ChatGPT Plus, you have access to one of several AI models each tailored to different tasks. The default model for most purposes is GPT-4-turbo (GPT-4o), which tends to strike the best balance between speed and functionality and even supports other additional features, including file uploads, web browsing, and dataset analysis.

However, other models are useful when one needs to describe a rather complex project with substantial planning. You may initiate a project using o1-preview that requires deep research and then shift the discussion to GPT-4-turbo to get quick responses. To switch models, you can click on the model dropdown at the top of your screen or type in a forward slash (/) in the chat box to get access to more available options including web browsing and image creation.


4. Look at what the GPT Store has available in the form of Mini-Apps

Custom GPTs, and the GPT Store enable "mini-applications" that are able to extend the functionality of the platform. The Custom GPTs all have some inbuilt prompts and workflows and sometimes even APIs to extend the AI capability of GPT. For instance, with Canva's GPT, you are able to create logos, social media posts, or presentations straight within the ChatGPT portal by linking up the Canva tool. That means you can co-create visual content with ChatGPT without having to leave the portal.

And if there are some prompts you often need to apply, or some dataset you upload most frequently, you can easily create your Custom GPT. This would be really helpful to handle recipes, keeping track of personal projects, create workflow shortcuts and much more. Go to the GPT Store by the "Explore GPTs" button in the sidebar. Your recent and custom GPTs will appear in the top tab, so find them easily and use them as necessary.


5. Manage Conversations with a Fresh Approach

For the best benefit of using ChatGPT, it is key to understand that every new conversation is an independent document with its "memory." It does recall enough from previous conversations, though generally speaking, its answers depend on what is being discussed in the immediate chat. This made chats on unrelated projects or topics best started anew for clarity.

For long-term projects, it might even be logical to go on with a single thread so that all relevant information is kept together. For unrelated topics, it might make more sense to start fresh each time to avoid confusion. Another way in which archiving or deleting conversations you no longer need can help free up your interface and make access to active threads easier is


What Makes AI Unique Compared to Other Software?

AI performs very differently from other software in that it responds dynamically, at times providing responses or "backtalk" and does not simply do what it is told to do. Such a property leads to some trial and error to obtain the desired output. For instance, one might prompt ChatGPT to review its own output as demonstrated by replacing single quote characters by double quote characters to generate more accurate results. This is similar to how a developer optimises an AI model, guiding ChatGPT to "think" through something in several steps.

ChatGPT Canvas and other features like Custom GPTs make the AI behave more like software in the classical sense—although, of course, with personality and learning. If ChatGPT continues to grow in this manner, features such as these may make most use cases easier and more delightful.

Following these five tips should help you make the most of ChatGPT as a productivity tool and keep pace with the latest developments. From renaming chats to playing around with Custom GPTs, all of them add to a richer and more customizable user experience.


OpenAI’s Disruption of Foreign Influence Campaigns Using AI

 

Over the past year, OpenAI has successfully disrupted over 20 operations by foreign actors attempting to misuse its AI technologies, such as ChatGPT, to influence global political sentiments and interfere with elections, including in the U.S. These actors utilized AI for tasks like generating fake social media content, articles, and malware scripts. Despite the rise in malicious attempts, OpenAI’s tools have not yet led to any significant breakthroughs in these efforts, according to Ben Nimmo, a principal investigator at OpenAI. 

The company emphasizes that while foreign actors continue to experiment, AI has not substantially altered the landscape of online influence operations or the creation of malware. OpenAI’s latest report highlights the involvement of countries like China, Russia, Iran, and others in these activities, with some not directly tied to government actors. Past findings from OpenAI include reports of Russia and Iran trying to leverage generative AI to influence American voters. More recently, Iranian actors in August 2024 attempted to use OpenAI tools to generate social media comments and articles about divisive topics such as the Gaza conflict and Venezuelan politics. 

A particularly bold attack involved a Chinese-linked network using OpenAI tools to generate spearphishing emails, targeting OpenAI employees. The attack aimed to plant malware through a malicious file disguised as a support request. Another group of actors, using similar infrastructure, utilized ChatGPT to answer scripting queries, search for software vulnerabilities, and identify ways to exploit government and corporate systems. The report also documents efforts by Iran-linked groups like CyberAveng3rs, who used ChatGPT to refine malicious scripts targeting critical infrastructure. These activities align with statements from U.S. intelligence officials regarding AI’s use by foreign actors ahead of the 2024 U.S. elections. 

However, these nations are still facing challenges in developing sophisticated AI models, as many commercial AI tools now include safeguards against malicious use. While AI has enhanced the speed and credibility of synthetic content generation, it has not yet revolutionized global disinformation efforts. OpenAI has invested in improving its threat detection capabilities, developing AI-powered tools that have significantly reduced the time needed for threat analysis. The company’s position at the intersection of various stages in influence operations allows it to gain unique insights and complement the work of other service providers, helping to counter the spread of online threats.

ChatGPT Vulnerability Exploited: Hacker Demonstrates Data Theft via ‘SpAIware

 

A recent cyber vulnerability in ChatGPT’s long-term memory feature was exposed, showing how hackers could use this AI tool to steal user data. Security researcher Johann Rehberger demonstrated this issue through a concept he named “SpAIware,” which exploited a weakness in ChatGPT’s macOS app, allowing it to act as spyware. ChatGPT initially only stored memory within an active conversation session, resetting once the chat ended. This limited the potential for hackers to exploit data, as the information wasn’t saved long-term. 

However, earlier this year, OpenAI introduced a new feature allowing ChatGPT to retain memory between different conversations. This update, meant to personalize the user experience, also created an unexpected opportunity for cybercriminals to manipulate the chatbot’s memory retention. Rehberger identified that through prompt injection, hackers could insert malicious commands into ChatGPT’s memory. This allowed the chatbot to continuously send a user’s conversation history to a remote server, even across different sessions. 

Once a hacker successfully inserted this prompt into ChatGPT’s long-term memory, the user’s data would be collected each time they interacted with the AI tool. This makes the attack particularly dangerous, as most users wouldn’t notice anything suspicious while their information is being stolen in the background. What makes this attack even more alarming is that the hacker doesn’t require direct access to a user’s device to initiate the injection. The payload could be embedded within a website or image, and all it would take is for the user to interact with this media and prompt ChatGPT to engage with it. 

For instance, if a user asked ChatGPT to scan a malicious website, the hidden command would be stored in ChatGPT’s memory, enabling the hacker to exfiltrate data whenever the AI was used in the future. Interestingly, this exploit appears to be limited to the macOS app, and it doesn’t work on ChatGPT’s web version. When Rehberger first reported his discovery, OpenAI dismissed the issue as a “safety” concern rather than a security threat. However, once he built a proof-of-concept demonstrating the vulnerability, OpenAI took action, issuing a partial fix. This update prevents ChatGPT from sending data to remote servers, which mitigates some of the risks. 

However, the bot still accepts prompts from untrusted sources, meaning hackers can still manipulate the AI’s long-term memory. The implications of this exploit are significant, especially for users who rely on ChatGPT for handling sensitive data or important business tasks. It’s crucial that users remain vigilant and cautious, as these prompt injections could lead to severe privacy breaches. For example, any saved conversations containing confidential information could be accessed by cybercriminals, potentially resulting in financial loss, identity theft, or data leaks. To protect against such vulnerabilities, users should regularly review ChatGPT’s memory settings, checking for any unfamiliar entries or prompts. 

As demonstrated in Rehberger’s video, users can manually delete suspicious entries, ensuring that the AI’s long-term memory doesn’t retain harmful data. Additionally, it’s essential to be cautious about the sources from which they ask ChatGPT to retrieve information, avoiding untrusted websites or files that could contain hidden commands. While OpenAI is expected to continue addressing these security issues, this incident serves as a reminder that even advanced AI tools like ChatGPT are not immune to cyber threats. As AI technology continues to evolve, so do the tactics used by hackers to exploit these systems. Staying informed, vigilant, and cautious while using AI tools is key to minimizing potential risks.

ChatGPT Vulnerability Exposes Users to Long-Term Data Theft— Researcher Proves It

 



Independent security researcher Johann Rehberger found a flaw in the memory feature of ChatGPT. Hackers can manipulate the stored information that gets extracted to steal user data by exploiting the long-term memory setting of ChatGPT. This is actually an "issue related to safety, rather than security" as OpenAI termed the problem, showing how this feature allows storing of false information and captures user data over time.

Rehberger had initially reported the incident to OpenAI. The point was that the attackers could fill the AI's memory settings with false information and malicious commands. OpenAI's memory feature, in fact, allows the user's information from previous conversations to be put in that memory so during a future conversation, the AI can recall the age, preferences, or any other relevant details of that particular user without having been fed the same data repeatedly.

But what Rehberger had highlighted was the vulnerability that hackers capitalised on to permanently store false memories through a technique known as prompt injection. Essentially, it occurs when an attacker manipulates the AI by malicious content attached to emails, documents, or images. For example, he demonstrated how he could get ChatGPT to believe he was 102 and living in a virtual reality of sorts. Once these false memories were implanted, they could haunt and influence all subsequent interaction with the AI.


How Hackers Can Use ChatGPT's Memory to Steal Data

In proof of concept, Rehberger demonstrated how this vulnerability can be exploited in real-time for the theft of user inputs. In chat, hackers can send a link or even open an image that hooks ChatGPT into a malicious link and redirects all conversations along with the user data to a server owned by the hacker. Such attacks would not have to be stopped because the memory of the AI holds the instructions planted even after starting a new conversation.

Although OpenAI has issued partial fixes to prevent memory feature exploitation, the underlying mechanism of prompt injection remains. Attackers can still compromise ChatGPT's memory by embedding knowledge in their long-term memory that may have been seeded through unauthorised channels.


What Users Can Do

There are also concerns for users who care about what ChatGPT is going to remember about them in terms of data. Users need to monitor the chat session for any unsolicited shift in memory updates and screen regularly what is saved into and deleted from the memory of ChatGPT. OpenAI has put out guidance on how to manage the memory feature of the tool and how users may intervene in determining what is kept or deleted.

Though OpenAI did its best to address the issue, such an incident brings out a fact that continues to show how vulnerable AI systems remain when it comes to safety issues concerning user data and memory. Regarding AI development, safety regarding the protected sensitive information will always continue to raise concerns from developers to the users themselves.

Therefore, the weakness revealed by Rehberger shows how risky the introduction of AI memory features might be. The users need to be always alert about what information is stored and avoid any contacts with any content they do not trust. OpenAI is certainly able to work out security problems as part of its user safety commitment, but in this case, it also turns out that even the best solutions without active management on the side of a user will lead to breaches of data.




Slack Fixes AI Security Flaw After Expert Warning


 

Slack, the popular communication platform used by businesses worldwide, has recently taken action to address a potential security flaw related to its AI features. The company has rolled out an update to fix the issue and reassured users that there is no evidence of unverified access to their data. This move follows reports from cybersecurity experts who identified a possible weakness in Slack's AI capabilities that could be exploited by malicious actors.

The security concern was first brought to attention by PromptArmor, a cybersecurity firm that specialises in identifying vulnerabilities in AI systems. The firm raised alarms over the potential misuse of Slack’s AI functions, particularly those involving ChatGPT. These AI tools were intended to improve user experience by summarising discussions and assisting with quick replies. However, PromptArmor warned that these features could also be manipulated to access private conversations through a method known as "prompt injection."

Prompt injection is a technique where an attacker tricks the AI into executing harmful commands that are hidden within seemingly harmless instructions. According to PromptArmor, this could allow unauthorised individuals to gain access to private messages and even conduct phishing attacks. The firm also noted that Slack's AI could potentially be coerced into revealing sensitive information, such as API keys, which could then be sent to external locations without the knowledge of the user.

PromptArmor outlined a scenario in which an attacker could create a public Slack channel and embed a malicious prompt within it. This prompt could instruct the AI to replace specific words with sensitive data, such as an API key, and send that information to an external site. Alarmingly, this type of attack could be executed without the attacker needing to be a part of the private channel where the sensitive data is stored.

Further complicating the issue, Slack’s AI has the ability to pull data from both file uploads and direct messages. This means that even private files could be at risk if the AI is manipulated using prompt injection techniques.

Upon receiving the report, Slack immediately began investigating the issue. The company confirmed that, under specific and rare circumstances, an attacker could use the AI to gather certain data from other users in the same workspace. To address this, Slack quickly deployed a patch designed to fix the vulnerability. The company also assured its users that, at this time, there is no evidence indicating any customer data has been compromised.

In its official communication, Slack emphasised the limited nature of the threat and the quick action taken to resolve it. The update is now in place, and the company continues to monitor the situation to prevent any future incidents.

There are potential risks that come with integrating AI into workplace tools that need to be construed well. While AI has many upsides, including improved efficiency and streamlined communication, it also opens up new opportunities for cyber threats. It is crucial for organisations using AI to remain vigilant and address any security concerns that arise promptly.

Slack’s quick response to this issue stresses upon how imperative it is to stay proactive in a rapidly changing digital world.


AI Minefield: Risks of Gen AI in Your Personal Sphere

AI Minefield: Risks of Gen AI in Your Personal Sphere

Many customers are captivated by Gen AI, employing new technologies for a variety of personal and corporate purposes. 

However, many people ignore the serious privacy implications.

Is Generative AI all sunshine and rainbows?

Consumer AI products, such as OpenAI's ChatGPT, Google's Gemini, Microsoft Copilot software, and the new Apple Intelligence, are widely available and growing. However, the programs have various privacy practices in terms of how they use and retain user data. In many circumstances, users are unaware of how their data is or may be utilized.

This is where being an informed consumer becomes critical. According to Jodi Daniels, chief executive and privacy expert of Red Clover Advisors, which advises businesses on privacy issues, the granularity of what you can regulate varies depending on the technology. Daniels explained that there is no uniform opt-out for all technologies.

Privacy concerns

The rise of AI technologies, and their incorporation into so much of what customers do on their personal computers and cellphones, makes these problems much more pressing. A few months ago, for example, Microsoft introduced its first Surface PCs with a dedicated Copilot button on the keyboard for rapid access to the chatbot, fulfilling a promise made several months previously. 

Apple, for its part, presented its AI vision last month, which centered around numerous smaller models that operate on the company's devices and chips. Company officials have spoken publicly about the significance of privacy, which can be an issue with AI models.

Here are many approaches for consumers to secure their privacy in the new era of generative AI.

1. Use opt-outs provided by OpenAI and Google

Each generation AI tool has its own privacy policy, which may include opt-out choices. Gemini, for example, lets customers choose a retention time and erase certain data, among other activity limits.

ChatGPT allows users to opt out of having their data used for model training. To do so, click the profile symbol in the bottom-left corner of the page and then pick Data Controls from the Settings header. They must then disable the feature labeled "Improve the model for everyone." According to a FAQ on OpenAI's website, if this is disabled, fresh talks will not be utilized to train ChatGPT's models.

2. Opt-in, but for good reasons

Companies are incorporating modern AI into personal and professional solutions, like as Microsoft Copilot. Opt-in only for valid reasons. Copilot for Microsoft 365, for example, integrates with Word, Excel, and PowerPoint to assist users with activities such as analytics, idea development, and organization.

Microsoft claims that it does not share consumer data with third parties without permission, nor does it utilize customer data to train Copilot or other AI features without consent. 

Users can, however, opt in if they like by logging into the Power Platform admin portal, selecting settings, and tenant settings, and enabling data sharing for Dynamics 365 Copilot and Power Platform Copilot AI Features. They facilitate data sharing and saving.

3. Gen AI search: Setting retention period

Consumers may not think much before seeking information using AI, treating it like a search engine to create information and ideas. However, looking for specific types of information utilizing gen AI might be intrusive to a person's privacy, hence there are best practices for using such tools. Hoffman-Andrews recommends setting a short retention period for the generation AI tool. 

And, if possible, erase chats once you've gathered the desired information. Companies still keep server logs, but they can assist lessen the chance of a third party gaining access to your account, he explained. It may also limit the likelihood of sensitive information becoming part of the model training. "It really depends on the privacy settings of the particular site."

Investing in AI? Don’t Forget the Cyber Locks! VCs Advice.


The OpenAI Data Breach: A Wake-Up Call for Seed VCs

Security breaches are common in the current industry of artificial intelligence (AI) and machine learning (ML). However, when a prominent player like OpenAI falls victim to such an incident, it sends shockwaves through the tech community. This blog post delves into the recent OpenAI data breach and explores its impact on seed venture capitalists (VCs).

The Incident

OpenAI, known for its cutting-edge research in AI and its development of powerful language models, recently disclosed a security breach. Hackers gained unauthorized access to some of OpenAI’s internal systems, raising concerns about data privacy and security. While OpenAI assured users that no sensitive information was compromised, the incident highlights the vulnerability of AI companies to cyber threats.

Seed VCs on High Alert

Seed VCs, who invest in early-stage startups, should pay close attention to this breach. Here’s why:

Dependency on AI Companies

Seed VCs often collaborate with AI companies, providing funding and mentorship. As AI technologies become integral to various industries, VCs increasingly invest in startups leveraging AI/ML. The OpenAI breach underscores the need for due diligence when partnering with such firms.

Data Privacy Risks

Startups working with AI models generate and handle vast amounts of data. Seed VCs must assess the data security practices of their portfolio companies. A breach could harm the startup and impact the VC’s reputation and relationships with other investors.

Intellectual Property Concerns

Seed VCs invest in innovative ideas and technologies. If a startup’s IP is compromised due to lax security practices, it affects the VC’s investment. VCs should encourage startups to prioritize security and protect their intellectual assets.

Mitigating Risks: Seed VCs can take proactive steps

1. Due Diligence: Before investing, thoroughly evaluate a startup’s security protocols. Understand how they handle data, who has access, and their response plan in case of a breach.

2. Collaboration with AI Firms: Engage in open conversations with AI companies about security measures. VCs can influence best practices by advocating for robust security standards.

3. Education: Educate portfolio companies about security hygiene. Regular audits and training sessions can help prevent breaches.

OpenAI Hack Exposes Hidden Risks in AI's Data Goldmine


A recent security incident at OpenAI serves as a reminder that AI companies have become prime targets for hackers. Although the breach, which came to light following comments by former OpenAI employee Leopold Aschenbrenner, appears to have been limited to an employee discussion forum, it underlines the steep value of data these companies hold and the growing threats they face.

The New York Times detailed the hack after Aschenbrenner labelled it a “major security incident” on a podcast. However, anonymous sources within OpenAI clarified that the breach did not extend beyond an employee forum. While this might seem minor compared to a full-scale data leak, even superficial breaches should not be dismissed lightly. Unverified access to internal discussions can provide valuable insights and potentially lead to more severe vulnerabilities being exploited.

AI companies like OpenAI are custodians of incredibly valuable data. This includes high-quality training data, bulk user interactions, and customer-specific information. These datasets are crucial for developing advanced models and maintaining competitive edges in the AI ecosystem.

Training data is the cornerstone of AI model development. Companies like OpenAI invest vast amounts of resources to curate and refine these datasets. Contrary to the belief that these are just massive collections of web-scraped data, significant human effort is involved in making this data suitable for training advanced models. The quality of these datasets can impact the performance of AI models, making them highly coveted by competitors and adversaries.

OpenAI has amassed billions of user interactions through its ChatGPT platform. This data provides deep insights into user behaviour and preferences, much more detailed than traditional search engine data. For instance, a conversation about purchasing an air conditioner can reveal preferences, budget considerations, and brand biases, offering invaluable information to marketers and analysts. This treasure trove of data highlights the potential for AI companies to become targets for those seeking to exploit this information for commercial or malicious purposes.

Many organisations use AI tools for various applications, often integrating them with their internal databases. This can range from simple tasks like searching old budget sheets to more sensitive applications involving proprietary software code. The AI providers thus have access to critical business information, making them attractive targets for cyberattacks. Ensuring the security of this data is paramount, but the evolving nature of AI technology means that standard practices are still being established and refined.

AI companies, like other SaaS providers, are capable of implementing robust security measures to protect their data. However, the inherent value of the data they hold means they are under constant threat from hackers. The recent breach at OpenAI, despite being limited, should serve as a warning to all businesses interacting with AI firms. Security in the AI industry is a continuous, evolving challenge, compounded by the very AI technologies these companies develop, which can be used both for defence and attack.

The OpenAI breach, although seemingly minor, highlights the critical need for heightened security in the AI industry. As AI companies continue to amass and utilise vast amounts of valuable data, they will inevitably become more attractive targets for cyberattacks. Businesses must remain vigilant and ensure robust security practices when dealing with AI providers, recognising the gravity of the risks and responsibilities involved.


Breaking the Silence: The OpenAI Security Breach Unveiled

Breaking the Silence: The OpenAI Security Breach Unveiled

In April 2023, OpenAI, a leading artificial intelligence research organization, faced a significant security breach. A hacker gained unauthorized access to the company’s internal messaging system, raising concerns about data security, transparency, and the protection of intellectual property. 

In this blog, we delve into the incident, its implications, and the steps taken by OpenAI to prevent such breaches in the future.

The OpenAI Breach

The breach targeted an online forum where OpenAI employees discussed upcoming technologies, including features for the popular chatbot. While the actual GPT code and user data remained secure, the hacker obtained sensitive information related to AI designs and research. 

While Open AI shared the information with its staff and board members last year, it did not tell the public or the FBI about the breach, stating that doing so was unnecessary because no user data was stolen. 

OpenAI does not regard the attack as a national security issue and believes the attacker was a single individual with no links to foreign powers. OpenAI’s decision not to disclose the breach publicly sparked debate within the tech community.

Breach Impact

Leopold Aschenbrenner, a former OpenAI employee, had expressed worries about the company's security infrastructure and warned that its systems could be accessible to hostile intelligence services such as China. The company abruptly fired Aschenbrenner, although OpenAI spokesperson Liz Bourgeois told the New York Times that his dismissal had nothing to do with the document.

Similar Attacks and Open AI’s Response

This is not the first time OpenAI has had a security lapse. Since its launch in November 2022, ChatGPT has been continuously attacked by malicious actors, frequently resulting in data leaks. A separate attack exposed user names and passwords in February of this year. 

In March of last year, OpenAI had to take ChatGPT completely down to fix a fault that exposed customers' payment information to other active users, including their first and last names, email IDs, payment addresses, credit card info, and the last four digits of their card number. 

Last December, security experts found that they could convince ChatGPT to release pieces of its training data by prompting the system to endlessly repeat the word "poem."

OpenAI has taken steps to enhance security since then, including additional safety measures and a Safety and Security Committee.

AI-Generated Exam Answers Outperform Real Students, Study Finds

 

In a recent study, university exams taken by fictitious students using artificial intelligence (AI) outperformed those by real students and often went undetected by examiners. Researchers at the University of Reading created 33 fake students and employed the AI tool ChatGPT to generate answers for undergraduate psychology degree module exams.

The AI-generated responses scored, on average, half a grade higher than those of actual students. Remarkably, 94% of the AI essays did not raise any suspicion among markers, with only a 6% detection rate, which the study suggests is likely an overestimate. These findings, published in the journal Plos One, highlight a significant concern: "AI submissions robustly gained higher grades than real student submissions," indicating that students could use AI to cheat undetected and achieve better grades than their honest peers.

Associate Professor Peter Scarfe and Professor Etienne Roesch, who led the study, emphasized the need for educators globally to take note of these findings. Dr. Scarfe noted, "Many institutions have moved away from traditional exams to make assessment more inclusive. Our research shows it is of international importance to understand how AI will affect the integrity of educational assessments. We won’t necessarily go back fully to handwritten exams - but the global education sector will need to evolve in the face of AI."

In the study, the AI-generated answers and essays were submitted for first-, second-, and third-year modules without the knowledge of the markers. The AI students outperformed real undergraduates in the first two years, but in the third-year exams, human students scored better. This result aligns with the idea that current AI struggles with more abstract reasoning. The study is noted as the largest and most robust blind study of its kind to date.

Academics have expressed concerns about the impact of AI on education. For instance, Glasgow University recently reinstated in-person exams for one course. Additionally, a study reported by the Guardian earlier this year found that most undergraduates used AI programs to assist with their essays, but only 5% admitted to submitting unedited AI-generated text in their assessments.

From Siri to 5G: AI’s Impact on Telecommunications

From Siri to 5G: AI’s Impact on Telecommunications

The integration of artificial intelligence (AI) has significantly transformed the landscape of mobile phone networks. From optimizing network performance to enhancing user experiences, AI plays a pivotal role in shaping the future of telecommunications. 

In this blog post, we delve into how mobile networks embrace AI and its impact on consumers and network operators.

1. Apple’s AI-Powered Operating System

Apple, a tech giant known for its innovation, recently introduced “Apple Intelligence,” an AI-powered operating system. The goal is to make iPhones more intuitive and efficient by integrating AI capabilities into Siri, the virtual assistant. Users can now perform tasks more quickly, receive personalized recommendations, and interact seamlessly with their devices.

2. Network Optimization and Efficiency

Telecom companies worldwide are leveraging AI to optimize mobile phone networks. Here’s how:

  • Dynamic Frequency Adjustment: Network operators dynamically adjust radio frequencies to optimize service quality. AI algorithms analyze real-time data to allocate frequencies efficiently, ensuring seamless connectivity even during peak usage.
  • Efficient Cell Tower Management: AI helps manage cell towers more effectively. During low-demand periods, operators can power down specific towers, reducing energy consumption without compromising coverage.

3. Fault Localization and Rapid Resolution

AI-driven network monitoring has revolutionized fault localization. For instance:

  • Korea Telecom’s Quick Response: In South Korea, Korea Telecom uses AI algorithms to pinpoint network faults within minutes. This rapid response minimizes service disruptions and enhances customer satisfaction.
  • AT&T’s Predictive Maintenance: AT&T in the United States relies on predictive AI models to anticipate network issues. By identifying potential problems before they escalate, they maintain network stability.

4. AI Digital Twins for Real-Time Monitoring

Network operators like Vodafone create AI digital twins—virtual replicas of real-world equipment such as masts and antennas. These digital twins continuously monitor network performance, identifying anomalies and suggesting preventive measures. As a result, operators can proactively address issues and maintain optimal service levels.

5. Data Explosion and the Role of 5G

The proliferation of AI generates massive data. Consequently, investments in 5G Standalone (SA) networks have surged. Here’s why:

  • Higher Speeds and Capacity: 5G SA networks offer significantly higher speeds and capacity compared to the older 4G system. This is essential for handling the data influx from AI applications.
  • Edge Computing: 5G enables edge computing, where AI processing occurs closer to the user. This reduces latency and enhances real-time applications like autonomous vehicles and augmented reality.

6. Looking Ahead: The Quest for 6G

Despite 5G advancements, experts predict that AI’s demands will eventually outstrip its capabilities. Anticipating this, researchers are already exploring 6G technology, expected around 2028. 6G aims to provide unprecedented speeds, ultra-low latency, and seamless connectivity, further empowering AI-driven applications.

Researchers Find ChatGPT’s Latest Bot Behaves Like Humans

 

A team led by Matthew Jackson, the William D. Eberle Professor of Economics in the Stanford School of Humanities and Sciences, used psychology and behavioural economics tools to characterise the personality and behaviour of ChatGPT's popular AI-driven bots in a paper published in the Proceedings of the National Academy of Sciences on June 12. 

This study found that the most recent version of the chatbot, version 4, was indistinguishable from its human counterparts. When the bot picked less common human behaviours, it behaved more cooperatively and altruistic.

“Increasingly, bots are going to be put into roles where they’re making decisions, and what kinds of characteristics they have will become more important,” stated Jackson, who is also a senior fellow at the Stanford Institute for Economic Policy Research. 

In the study, the research team presented a widely known personality test to ChatGPT versions 3 and 4 and asked the chatbots to describe their moves in a series of behavioural games that can predict real-world economic and ethical behaviours. The games included pre-determined exercises in which players had to select whether to inform on a partner in crime or how to share money with changing incentives. The bots' responses were compared to those of over 100,000 people from 50 nations. 

The study is one of the first in which an artificial intelligence source has passed a rigorous Turing test. A Turing test, named after British computing pioneer Alan Turing, can consist of any job assigned to a machine to determine whether it performs like a person. If the machine seems to be human, it passes the test. 

Chatbot personality quirks

The researchers assessed the bots' personality qualities using the OCEAN Big-5, a popular personality exam that evaluates respondents on five fundamental characteristics that influence behaviour. In the study, ChatGPT's version 4 performed within normal ranges for the five qualities but was only as agreeable as the lowest third of human respondents. The bot passed the Turing test, but it wouldn't have made many friends. 

Version 4 outperformed version 3 in terms of chip and motherboard performance. The previous version, with which many internet users may have interacted for free, was only as appealing to the bottom fifth of human responders. Version 3 was likewise less open to new ideas and experiences than all but a handful of the most stubborn people. 

Human-AI interactions 

Much of the public's concern about AI stems from their failure to understand how bots make decisions. It can be difficult to trust a bot's advice if you don't know what it's designed to accomplish. Jackson's research shows that even when researchers cannot scrutinise AI's inputs and algorithms, they can discover potential biases by meticulously examining outcomes. 

As a behavioural economist who has made significant contributions to our knowledge of how human social structures and interactions influence economic decision-making, Jackson is concerned about how human behaviour may evolve in response to AI.

“It’s important for us to understand how interactions with AI are going to change our behaviors and how that will change our welfare and our society,” Jackson concluded. “The more we understand early on—the more we can understand where to expect great things from AI and where to expect bad things—the better we can do to steer things in a better direction.”

Meta Addresses AI Chatbot's YouTube Training Data Assertion

 


Eventually, artificial intelligence systems like ChatGPT will run out of the tens of trillions of words people have been writing and sharing on the web, which keeps them smarter. In a new study released on Thursday by Epoch AI, researchers estimate that tech companies will exhaust the available training data for AI language models sometime between 2026 and 2032 if the industry is to be expected to use public training data in the future. 

It is more open than Meta that the Meta AI chatbot will share its training data with me. It is widely known that Meta, formerly known as Facebook, has been trying to move into the generative AI space since last year. The company was aiming to keep up with the public's interest sparked by the launch of OpenAI's ChatGPT in late 2022. In April of this year, Meta AI was expanded to include a chat and image generator feature on all its apps, including Instagram and WhatsApp. However, much information about how Meta AI was trained has not been released to date. 

A series of questions were asked by Business Insider of Meta AI regarding the data it was trained on and the method by which Meta obtained such data. In the interview with Business Insider, Meta AI revealed that it had been trained on a large dataset of transcriptions from YouTube videos, as reported by Business Insider. Furthermore, it said that Meta has its web scraper bot, referred to as "MSAE" (Meta Scraping and Extraction), which scrapes a huge amount of information off the web to use for the training of AI systems. This scraper was never disclosed to Meta previously. 

The terms of service of YouTube do not allow users to collect their data by using bots and scrapers, nor can they use such data without permission from YouTube. As a result of this, OpenAI has recently come under scrutiny for purportedly using such data. According to a Meta spokesman, Meta AI has given correct answers regarding its scraper and training data. However, the spokesman suggested that Meta AI may be wrong in the process. 

A spokesperson from Intel explained that creative AI requires a large amount of data to be effectively trained, so data from a wide variety of sources is utilised for training, including publicly available information online as well as data that has been annotated. As part of its initial training, Meta AI said that 3.7 million YouTube videos had been transcribed by a third party. It was confirmed by Meta AI's chatbot that it did not use its scraper bot to scrape YouTube videos directly. In response to further questions on Meta AI's YouTube training data, Meta AI replied that another dataset with transcriptions from 6 million YouTube videos was also compiled by a third party as part of its training data set.

Besides the 1.5 million YouTube transcriptions and subtitles included in its training dataset, the company also added two more sets of YouTube subtitles, one with 2.5 million subtitles and another with 1.5 million subtitles, as well as several transcriptions from 2,500 YouTube stories showcasing TED Talks. In Meta AI's opinion, all of the data sets were compiled by third parties after they had been collected by them. According to Meta's chatbot, the company takes steps to ensure that it does not gather copyrighted information on its users. However, from my understanding, Meta AI in some form scrapes the web in an ongoing manner. 

As a result of several queries, results displayed sources including NBC News, CNN, and The Financial Times among others. In most cases, Meta AI does not include sources for its responses, unless specifically requested to provide such information. A new paid deal with Meta AI would provide Meta AI with access to more AI training data, which could improve the results of Meta AI in the future, according to BI reporting. As well as respecting robots.txt, Meta AI said it abides by the robots.txt protocol, a set of guidelines that website owners can use to ostensibly prevent bots from scraping pages for training AI. 

Meta used a large language model called Llama to develop the chatbot. Meta AI has yet to release an accompanying paper for the new model or disclose the training data used for the model, even though Llama 3 was released in April, around the time Meta AI was expanded. It was Meta's blog post that revealed that the huge set of 15 trillion tokens used to train Llama 3 was sourced from public sources, meaning "publicly available sources." Web scrapers can extract almost all available content that is accessible on the web, and they can do so effectively with tools such as OpenAI's GPTBot, Google's GoogleBot, and Common Crawl's CCBot. 

The content is stored in massive datasets fed into LLMs and often regurgitated by generative AI tools like ChatGPT. Several ongoing lawsuits concern owned and copyrighted content being freely absorbed by the world's biggest tech companies. The US Copyright Office is expected to release new guidance on acceptable uses for AI companies later this year. 

The content is stored in extensive datasets that are incorporated into large language models (LLMs) and frequently reproduced by generative AI tools such as ChatGPT. Multiple ongoing lawsuits address the issue of proprietary and copyrighted material being utilized without permission by major technology companies. The United States Copyright Office is anticipated to issue new guidelines later this year regarding the permissible use of such content by AI companies. 

Risks of Generative AI for Organisations and How to Manage Them

 

Employers should be aware of the potential data protection issues before experimenting with generative AI tools like ChatGPT. You can't just feed human resources data into a generative AI tool because of the rise in privacy and data protection laws in the US, Europe, and other countries in recent years. After all, employee data—including performance, financial, and even health data—is often quite sensitive.

Obviously, this is an area where companies should seek legal advice. It's also a good idea to consult with an AI expert regarding the ethics of utilising generative AI (to ensure that you're acting not only legally, but also ethically and transparently). But, as a starting point, here are two major factors that employers should be aware of. 

Feeding personal data

As I previously stated, employee data is often highly sensitive and sensitive. It is precisely the type of data that, depending on your jurisdiction, is usually subject to the most stringent forms of legal protection.

This makes it highly dangerous to feed such data into a generative AI tool. Why? Because many generative AI technologies use the information provided to fine-tune the underlying language model. In other words, it may use the data you provide for training purposes, and it may eventually expose that information to other users. So, suppose you employ a generative AI tool to generate a report on employee salary based on internal employee information. In the future, the AI tool can employ the data to generate responses for other users (outside of your organisation). Personal information could easily be absorbed by the generative AI tool and reused. 

This isn't as shady as it sounds. Many generative AI programmes' terms and conditions explicitly specify that data provided to the AI may be utilised for training and fine-tuning or revealed when users request cases of previously submitted inquiries. As a result, when you agree to the terms of service, always make sure you understand exactly what you're getting yourself into. Experts urge that any data given to a generative AI service be anonymised and free of personally identifiable information. This is frequently referred to as "de-identifying" the data.

Risks of generative AI outputs 

There are risks associated with the output or content developed by generative AIs, in addition to the data fed into them. In particular, there is a possibility that the output from generative AI technologies will be based on personal data acquired and handled in violation of data privacy laws. 

For example, suppose you ask a generative AI tool to provide a report on average IT salary in your area. There is a possibility that the programme will scrape personal data from the internet without your authorization, violating data protection rules, and then serve it to you. Employers who exploit personal data provided by a generative AI tool may be held liable for data protection violations. For the time being, it is a legal grey area, with the generative AI provider likely bearing the most or all of the duty, but the risk remains. 

Cases like this are already appearing. Indeed, one lawsuit claims that ChatGPT was trained on "massive amounts of personal data," such as medical records and information about children, that was accessed without consent. You do not want your organisation to become unwittingly involved in a litigation like this. Essentially, we're discussing an "inherited" risk of violating data protection regulations. However, there is a risk involved. 

The way forward

Employers must carefully evaluate the data protection and privacy consequences of utilising generative AI and seek expert assistance. However, don't let this put you off adopting generative AI altogether. Generative AI, when used properly and within the bounds of the law, can be an extremely helpful tool for organisations.

Shadow IT Surge Poses Growing Threat to Corporate Data Security

 


It was recently found that 93% of cybersecurity leaders have deployed generative artificial intelligence in their organizations, yet 34% of those implementing the technology have not taken steps to minimize security risks, according to a recent survey conducted by cybersecurity firm Splunk, which was previously reported by CFO Dive. 

In the coming years, digital transformation and cloud migration will become increasingly commonplace in every sector of the economy, raising the amount of data businesses must store, process and manage, as well as the amount of data they must manage. Even though external threats such as hacking, phishing, and ransomware are given a great deal of attention, it is equally critical for companies to manage their data internally to ensure data security is maintained. 

In an organization, shadow data is information that is not approved by the organization or overseen by it. An employee's use of applications, services, or devices that their employer has not approved can be considered a feature (or a bug?) of the modern workplace. Whether it is a cloud storage account, an unofficial collaboration tool, or an unsanctioned SaaS application, shadow data can be generated from a variety of sources. 

In general, shadow data is not accounted for in the security and compliance frameworks of organizations, which leaves a glaring blind spot in data protection strategies, which is why it poses the biggest challenge. A report by Splunk says, “Such thoughtful policies can help minimize data leakage and new vulnerabilities, but they cannot necessarily prevent a complete breach.” However, they can help minimize these risks. 

According to the study by Cyberhaven, AI adoption has been so rapid that knowledge workers are now putting more corporate data into AI tools on a Saturday and Sunday than they were putting into the AI tools during the middle of last year's workweek on average. This could mean that workers are using AI tools early on in the adoption cycle, even before the IT department is formally instructed to purchase them. 

The result would be the so-called 'shadow AI,' or the use of AI tools by employees through their accounts that are not sanctioned by the company, and maybe no one is even aware of it. Using AI in the workplace is gaining traction. The amount of corporate data workers are putting into AI tools has jumped by 485% from March 2023 to March 2024, and the trend is accelerating. There are 23.6% of tech workers in March 2024 who use AI tools for their work (the highest rate of any industry). 

It is estimated that only 4.7% of employees in the financial sector, 2.8% in the pharmaceuticals industry, and 0.6% in manufacturing industries use AI tools. The use of risky "shadow AI" accounts is growing as end users outpace corporate IT. There are 73.8% of ChatGPT users who use the application through non-corporate accounts. 

However, unlike enterprise versions of ChatGPT, the enterprise versions incorporate whatever information you share in public models as well. According to the data, the percentage of non-corporate accounts is even higher for Gemini (94.4%) and Bard (95.9%). AI products from the big three: OpenAI, Google, and Microsoft accounted for 96.0% of AI use at work. Research and development materials created by artificial intelligence-generated tools have been used in potentially risky ways currently. 

In March 2024, 3.4% of the materials were created by artificial intelligence-generated tools, which could potentially create a risk if patented materials were included. As a result, 3.2% of the insertions of source code are being generated by AI outside of traditional coding tools (which are equipped with enterprise-approved copilots for coding), which can potentially place the development of vulnerabilities at risk. 

In terms of graphics and design, 3.0% of the content is generated using AI. The problem here is that AI can be used to produce trademarked material which can pose a problem. IT administrators, security teams, and the protocols that are designed to ensure security are unable to see shadow data due to its invisibility. The fact that shadow data exists outside of the networks and systems that have been approved for data protection means that it can be bypassed easily by any protection measures put in place. 

The risk of a breach or leak when data is left unmonitored increases and does not only complicate compliance with regulations such as GDPR or HIPAA but also makes compliance with data protection laws harder. As such, an organization is not able to effectively manage all of its data assets due to an absence of visibility, resulting in a loss of efficiency and a risk of data redundancy. Shadow data poses various security risks, which include unauthorized access to sensitive data, breaches in data security, and the potential for sensitive information to be exfiltrated. 

Shadow data can be a threat from a compliance standpoint because it only requires a minimal amount of protection from inadequacies in data security. Furthermore, there is an additional risk of data loss when data is stored in unofficial locations, since such personal data may not be backed up or protected against deletion if it is accidentally deleted. The surge in Shadow IT poses significant risks to organizations, with potential repercussions that include financial penalties, reputational damage, and operational disruptions. 

It is crucial to understand the distinctions between Shadow IT and Shadow Data to effectively address these threats. Shadow IT refers to the unauthorized use of tools and technologies within an organization. These tools, often implemented without the knowledge or approval of the IT department, can create substantial security and compliance challenges. Conversely, shadow data pertains to the information assets that these unauthorized tools generate and manage.

This data, regardless of its source or storage location, introduces its own set of risks and requires separate strategies for protection. Addressing Shadow IT necessitates robust control and monitoring mechanisms to manage the use of unauthorized technologies. This involves implementing policies and systems to detect and regulate non-sanctioned IT tools, ensuring that all technological resources align with the organization's security and compliance standards. 

On the other hand, managing shadow data requires a focus on identifying and safeguarding the data itself. This involves comprehensive data governance practices that protect sensitive information, ensuring it is secure, regardless of how it is created or stored. Effective management of shadow data demands a thorough understanding of where this data resides, how it is accessed, and the potential vulnerabilities it may introduce. Recognizing the nuanced differences between Shadow IT and Shadow Data is essential for developing effective governance and security strategies. 

By clearly delineating between the tools and the data they produce, organizations can better tailor their approaches to mitigate the risks associated with each. This distinction allows for more targeted and efficient protection measures, ultimately enhancing the organization's overall security posture and compliance efforts.

OpenAI and Stack Overflow Partnership: A Controversial Collaboration

OpenAI and Stack Overflow Partnership: A Controversial Collaboration

The Partnership Details

OpenAI and Stack Overflow are collaborating through OverflowAPI access to provide OpenAI users and customers with the correct and validated data foundation that AI technologies require to swiftly solve an issue, allowing engineers to focus on critical tasks. 

OpenAI will additionally share validated technical knowledge from Stack Overflow directly in ChatGPT, allowing users to quickly access trustworthy, credited, correct, and highly technical expertise and code backed by millions of developers who have contributed to the Stack Overflow platform over the last 15 years.

User Protests and Concerns

However, several Stack Overflow users were concerned about this partnership since they felt it was unethical for OpenAI to profit from their content without authorization.

Following the news, some users wished to erase their responses, including those with the most votes. However, StackCommerce does not often enable the deletion of posts if the question has any answers.

Ben, Epic Games' user interface designer, stated that he attempted to change his highest-rated responses and replace them with a message criticizing the relationship with OpenAI.

Stack Overflow won't let you erase questions with acceptable answers and high upvotes because this would remove knowledge from the community. Ben posted on Mastodon.

Instead, he changed his top-rated responses to a protest message. Within an hour, the moderators had changed the questions and banned Ben's account for seven days.

Ben, however, uploaded a screenshot showing Stack Overflow suspending his account after rolling back the modified messages to the original response. 

Stack Overflow’s Stance

Moderators on Stack Overflow clarified in an email that Ben shared that users are not able to remove posts because they negatively impact the community as a whole.

It is not appropriate to remove posts that could be helpful to others unless there are particular circumstances. The basic principle of Stack Exchange is that knowledge is helpful to others who might encounter similar issues in the future, even if the post's original author can no longer use it, replied Stack Exchange moderators to users on mail.

GDPR Considerations:

Article 17 of the GDPR rules grants users in the EU the “right to be forgotten,” allowing them to request the removal of personal data.

However, Article 17(3) states that websites have the right not to delete data necessary for “exercising the right of freedom of expression and information.”

Stack Overflow cited this provision when explaining why they do not allow users to remove posts

The partnership between OpenAI and Stack Overflow has sparked controversy, with users expressing concerns about data usage and freedom of expression. Stack Overflow’s decision to suspend users who altered their answers in protest highlights the challenges of balancing privacy rights and community knowledge