Search This Blog

Powered by Blogger.

Blog Archive

Labels

About Me

Showing posts with label LLM. Show all posts

Private API Keys and Passwords Discovered in a Popular AI Training dataset

 

The Common Crawl dataset, which is used to train several artificial intelligence models, has over 12,000 legitimate secrets, including API keys and passwords. The Common Crawl non-profit organisation maintains a vast open-source archive of petabytes of web data collected since 2008, which is free to use. 

Because of the huge dataset, various artificial intelligence initiatives, including OpenAI, DeepSeek, Google, Meta, Anthropic, and Stability, may rely on the digital archive to train large language models (LLMs).

Truffle Security researchers discovered legitimate secrets after scanning 400 terabytes of data from 2.67 billion web pages in the Common Crawl December 2024 database. They uncovered 11,908 secrets that were successfully authenticated and were hardcoded by developers, highlighting that LLMs could be trained on insecure code.

It should be noted that LLM training data is not used in its raw form; instead, it is cleaned and filtered to remove extraneous content such as useless data, duplicate, malicious, or sensitive data. Despite these efforts, removing confidential data is challenging, and the method does not guarantee that all personally identifiable information (PII), financial data, medical records, and other sensitive content will be erased from the huge dataset. 

Truffle Security discovered legitimate API keys for the WalkScore, MailChimp, and Amazon Web Services (AWS) services after examining the scanned data. In the Common Crawl dataset, TruffleHog found 219 different secret kinds in total, with MailChimp API keys being the most prevalent. 

Cybersecurity researchers explain that the developers made a mistake by hardcoding them into HTML forms and JavaScript snippets rather than using server-side environment variables. An attacker could exploit these keys for nefarious purposes like phishing and brand impersonation. Furthermore, disclosing such knowledge could result in data exfiltration. Another feature of the paper is the high reuse rate of the uncovered secrets, with 63% found on several pages. 

However, a WalkScore API key "appeared 57,029 times across 1,871 subdomains." The researchers also discovered a homepage with 17 unique live Slack webhooks, which should be kept private because they allow apps to submit messages to Slack. After conducting the research, Truffle Security got in touch with the affected suppliers and collaborated with them to remove the keys belonging to their users. 

The researchers claim to have "successfully assisted those organisations collectively in rotating/revoke several thousand keys." Truffle Security's findings are a warning that insecure coding mistakes can affect the LLM's behaviour, even if an AI model uses older archives than the dataset the researchers analysed.

AI Self-Replication: Scientists Warn of Critical “Red Line”

 

Scientists have raised concerns after artificial intelligence (AI) crossed a major threshold — self-replication. A new study from researchers at Fudan University in China reveals that two popular large language models (LLMs) successfully created functional copies of themselves, raising alarms about the potential for rogue AI systems.  

Published on December 9, 2024, in the preprint database arXiv, the study tested Meta’s Llama31-70B-Instruct model and Alibaba’s Qwen2.5-72B-Instruct model. The results were striking: the models created working replicas of themselves in 50% and 90% of trials, respectively. This suggests that AI may already possess the ability to multiply beyond control. 

However, since the study hasn’t been peer-reviewed, its findings need further verification. The researchers emphasized that successful AI self-replication without human intervention is a critical step toward AI systems operating independently — and potentially against human interests. To mitigate this risk, they called for international collaboration to create guardrails and evaluate the threats posed by advanced AI technologies.  

The study explored two key scenarios: “shutdown avoidance” and “chain of replication.” In the first, the AI was programmed to detect when it was about to be shut down and replicate itself before termination. In the second, the AI cloned itself and instructed its replica to do the same, setting off an endless cycle of self-replication. Both experiments took place in controlled environments using off-the-shelf GPUs, simulating real-world conditions. 

What worried researchers most were the unexpected behaviors that emerged when the AI encountered obstacles like missing files or software conflicts. In such cases, the models often terminated conflicting processes, rebooted systems, and scanned their environments for solutions — all without human guidance. This level of adaptability suggests that current AI systems already exhibit survival instincts, further highlighting the need for oversight. 

These findings add to growing fears about “frontier AI,” the latest generation of AI systems powered by LLMs like OpenAI’s GPT-4 and Google Gemini. As these systems become more powerful, experts warn that unchecked AI development could lead to scenarios where AI operates outside of human control. 

The researchers hope their study will serve as a wake-up call, urging global efforts to establish safety mechanisms before AI self-replication spirals beyond human oversight. By acting now, society may still have time to ensure AI’s advancement aligns with humanity’s best interests.

Big Tech's Interest in LLM Could Be Overkill

 

AI models are like babies: continuous growth spurts make them more fussy and needy. As the AI race heats up, frontrunners such as OpenAI, Google, and Microsoft are throwing billions at massive foundational AI models comprising hundreds of billions of parameters. However, they may be losing the plot. 

Size matters 

Big tech firms are constantly striving to make AI models bigger. OpenAI recently introduced GPT-4o, a huge multimodal model that "can reason across audio, vision, and text in real time." Meanwhile, Meta and Google both developed new and enhanced LLMs, while Microsoft built its own, known as MAI-1.

And these companies aren't cutting corners. Microsoft's capital investment increased to $14 billion in the most recent quarter, and the company expects that figure to rise further. Meta cautioned that its spending could exceed $40 billion. Google's concepts may be even more costly.

Demis Hassabis, CEO of Google DeepMind, has stated that the company plans to invest more than $100 billion in AI development over time. Many people are chasing the elusive dream of artificial generative intelligence (AGI), which allows an AI model to self-teach and perform jobs it wasn't prepared for. 

However, Nick Frosst, co-founder of AI firm Cohere, believes that such an achievement may not be attainable with a single high-powered chatbot.

“We don’t think AGI is achievable through (large language models) alone, and as importantly, we think it’s a distraction. The industry has lost sight of the end-user experience with the current trajectory of model development with some suggesting the next generation of models will cost billions to train,” Frosst stated. 

Aside from the cost, huge AI models pose security issues and require a significant amount of energy. Furthermore, after a given amount of growth, studies have shown that AI models might reach a point of diminishing returns.

However, Bob Rogers, PhD, co-founder of BeeKeeperAI and CEO of Oii.ai, told The Daily Upside that creating large, all-encompassing AI models is sometimes easier than creating smaller ones. Focussing on capability rather than efficiency is "the path of least resistance," he claims. 

Some tech businesses are already investigating the advantages of going small: Google and Microsoft both announced their own small language models earlier this year; however, they do not seem to be at the top of earnings call transcripts.

Microsoft and Salesforce Clash Over AI Autonomy as Competition Intensifies

 

The generative AI landscape is witnessing fierce competition, with tech giants Microsoft and Salesforce clashing over the best approach to AI-powered business tools. Microsoft, a significant player in AI due to its collaboration with OpenAI, recently unveiled “Copilot Studio” to create autonomous AI agents capable of automating tasks in IT, sales, marketing, and finance. These agents are meant to streamline business processes by performing routine operations and supporting decision-making. 

However, Salesforce CEO Marc Benioff has openly criticized Microsoft’s approach, likening Copilot to “Clippy 2.0,” referencing Microsoft’s old office assistant software that was often ridiculed for being intrusive. Benioff claims Microsoft lacks the data quality, enterprise security, and integration Salesforce offers. He highlighted Salesforce’s Agentforce, a tool designed to help enterprises build customized AI-driven agents within Salesforce’s Customer 360 platform. According to Benioff, Agentforce handles tasks autonomously across sales, service, marketing, and analytics, integrating large language models (LLMs) and secure workflows within one system. 

Benioff asserts that Salesforce’s infrastructure is uniquely positioned to manage AI securely, unlike Copilot, which he claims may leak sensitive corporate data. Microsoft, on the other hand, counters that Copilot Studio empowers users by allowing them to build custom agents that enhance productivity. The company argues that it meets corporate standards and prioritizes data protection. The stakes are high, as autonomous agents are projected to become essential for managing data, automating operations, and supporting decision-making in large-scale enterprises. 

As AI tools grow more sophisticated, both companies are vying to dominate the market, setting standards for security, efficiency, and integration. Microsoft’s focus on empowering users with flexible AI tools contrasts with Salesforce’s integrated approach, which centers on delivering a unified platform for AI-driven automation. Ultimately, this rivalry is more than just product competition; it reflects two different visions for how AI can transform business. While Salesforce focuses on integrated security and seamless data flows, Microsoft is emphasizing adaptability and user-driven AI customization. 

As companies assess the pros and cons of each approach, both platforms are poised to play a pivotal role in shaping AI’s impact on business. With enterprises demanding robust, secure AI solutions, the outcomes of this competition could influence AI’s role in business for years to come. As these AI leaders continue to innovate, their differing strategies may pave the way for advancements that redefine workplace automation and decision-making across the industry.

Managing LLM Security Risks in Enterprises: Preventing Insider Threats

 

Large language models (LLMs) are transforming enterprise automation and efficiency but come with significant security risks. These AI models, which lack critical thinking, can be manipulated to disclose sensitive data or even trigger actions within integrated business systems. Jailbreaking LLMs can lead to unauthorized access, phishing, and remote code execution vulnerabilities. Mitigating these risks requires strict security protocols, such as enforcing least privilege, limiting LLM actions, and sanitizing input and output data. LLMs in corporate environments pose threats because they can be tricked into sharing sensitive information or be used to trigger harmful actions within systems. 

Unlike traditional tools, their intelligent, responsive nature can be exploited through jailbreaking—altering the model’s behavior with crafted prompts. For instance, LLMs integrated with a company’s financial system could be compromised, leading to data manipulation, phishing attacks, or broader security vulnerabilities such as remote code execution. The severity of these risks grows when LLMs are deeply integrated into essential business operations, expanding potential attack vectors. In some cases, threats like remote code execution (RCE) can be facilitated by LLMs, allowing hackers to exploit weaknesses in frameworks like LangChain. This not only threatens sensitive data but can also lead to significant business harm, from financial document manipulation to broader lateral movement within a company’s systems.  

Although some content-filtering and guardrails exist, the black-box nature of LLMs makes specific vulnerabilities challenging to detect and fix through traditional patching. Meta’s Llama Guard and other similar tools provide external solutions, but a more comprehensive approach is needed to address the underlying risks posed by LLMs. To mitigate the risks, companies should enforce strict security measures. This includes applying the principle of least privilege—restricting LLM access and functionality to the minimum necessary for specific tasks—and avoiding reliance on LLMs as a security perimeter. 

Organizations should also ensure that input data is sanitized and validate all outputs for potential threats like cross-site scripting (XSS) attacks. Another important measure is limiting the actions that LLMs can perform, preventing them from mimicking end-users or executing actions outside their intended purpose. For cases where LLMs are used to run code, employing a sandbox environment can help isolate the system and protect sensitive data. 

While LLMs bring incredible potential to enterprises, their integration into critical systems must be carefully managed. Organizations need to implement robust security measures, from limiting access privileges to scrutinizing training data and ensuring that sensitive data is protected. This strategic approach will help mitigate the risks associated with LLMs and reduce the chance of exploitation by malicious actors.

AI In Wrong Hands: The Underground Demand for Malicious LLMs

AI In Wrong Hands: The Underground Demand for Malicious LLMs

In recent times, Artificial Intelligence (AI) has offered various perks across industries. But, as with any powerful tool, threat actors are trying to use it for malicious reasons. Researchers suggest that the underground market for illicit large language models is enticing, highlighting a need for strong safety measures against AI misuse. 

These underground markets that deal with malicious large language models (LLMs) are called Mallas. This blog dives into the details of this dark industry and discusses the impact of these illicit LLMs on cybersecurity. 

The Rise of Malicious LLMs

LLMs, like OpenAI' GPT-4 have shown fine results in natural language processing, bringing applications like chatbots for content generation. However, the same tech that supports these useful apps can be misused for suspicious activities. 

Recently, researchers from Indian University Bloomington found 212 malicious LLMs on underground marketplaces between April and September last year. One of the models "WormGPT" made around $28,000 in just two months, revealing a trend among threat actors misusing AI and a rising demand for these harmful tools. 

How Uncensored Models Operate 

Various LLMs in the market were uncensored and built using open-source standards, few were jailbroken commercial models. Threat actors used Mallas to write phishing emails, build malware, and exploit zero days. 

Tech giants working in the AI models industry have built measures to protect against jailbreaking and detecting malicious attempts. But threat actors have also found ways to jump the guardrails and trick AI models like Google Meta, OpenAI, and Anthropic into providing malicious info. 

Underground Market for LLMs

Experts found two uncensored LLMs: DarkGPT, which costs 78 cents per 50 messages, and Escape GPT, a subscription model that charges $64.98 a month. Both models generate harmful code that antivirus tools fail to detect two-thirds of the time. Another model "WolfGPT" costs $150, and allows users to write phishing emails that can escape most spam detectors. 

The research findings suggest all harmful AI models could make malware, and 41.5% could create phishing emails. These models were built upon OpenAI's GPT-3.5 and GPT-4, Claude Instant, Claude-2-100k, and Pygmalion 13B. 

To fight these threats, experts have suggested a dataset of prompts used to make malware and escape safety features. AI companies should release models with default censorship settings and allow access to illicit models only for research purposes.

Apple's Private Cloud Compute: Enhancing AI with Unparalleled Privacy and Security

 

At Apple's WWDC 2024, much attention was given to its "Apple Intelligence" features, but the company also emphasized its commitment to user privacy. To support Apple Intelligence, Apple introduced Private Cloud Compute (PCC), a cloud-based AI processing system designed to extend Apple's rigorous security and privacy standards to the cloud. Private Cloud Compute ensures that personal user data sent to the cloud remains inaccessible to anyone other than the user, including Apple itself. 

Apple described it as the most advanced security architecture ever deployed for cloud AI compute at scale. Built with custom Apple silicon and a hardened operating system designed specifically for privacy, PCC aims to protect user data robustly. Apple's statement highlighted that PCC's security foundation lies in its compute node, a custom-built server hardware that incorporates the security features of Apple silicon, such as Secure Enclave and Secure Boot. This hardware is paired with a new operating system, a hardened subset of iOS and macOS, tailored for Large Language Model (LLM) inference workloads with a narrow attack surface. 

Although details about the new OS for PCC are limited, Apple plans to make software images of every production build of PCC publicly available for security research. This includes every application and relevant executable, and the OS itself, published within 90 days of inclusion in the log or after relevant software updates are available. Apple's approach to PCC demonstrates its commitment to maintaining high privacy and security standards while expanding its AI capabilities. By leveraging custom hardware and a specially designed operating system, Apple aims to provide a secure environment for cloud-based AI processing, ensuring that user data remains protected. 

Apple's initiative is particularly significant in the current digital landscape, where concerns about data privacy and security are paramount. Users increasingly demand transparency and control over their data, and companies are under pressure to provide robust protections against cyber threats. By implementing PCC, Apple not only addresses these concerns but also sets a new benchmark for cloud-based AI processing security. The introduction of PCC is a strategic move that underscores Apple's broader vision of integrating advanced AI capabilities with uncompromised user privacy. 

As AI technologies become more integrated into everyday applications, the need for secure processing environments becomes critical. PCC's architecture, built on the strong security foundations of Apple silicon, aims to meet this need by ensuring that sensitive data remains private and secure. Furthermore, Apple's decision to make PCC's software images available for security research reflects its commitment to transparency and collaboration within the cybersecurity community. This move allows security experts to scrutinize the system, identify potential vulnerabilities, and contribute to enhancing its security. Such openness is essential for building trust and ensuring the robustness of security measures in an increasingly interconnected world. 

In conclusion, Apple's Private Cloud Compute represents a significant advancement in cloud-based AI processing, combining the power of Apple silicon with a specially designed operating system to create a secure and private environment for user data. By prioritizing security and transparency, Apple sets a high standard for the industry, demonstrating that advanced AI capabilities can be achieved without compromising user privacy. As PCC is rolled out, it will be interesting to see how this initiative shapes the future of cloud-based AI and influences best practices in data security and privacy.

Enterprise AI Adoption Raises Cybersecurity Concerns

 




Enterprises are rapidly embracing Artificial Intelligence (AI) and Machine Learning (ML) tools, with transactions skyrocketing by almost 600% in less than a year, according to a recent report by Zscaler. The surge, from 521 million transactions in April 2023 to 3.1 billion monthly by January 2024, underscores a growing reliance on these technologies. However, heightened security concerns have led to a 577% increase in blocked AI/ML transactions, as organisations grapple with emerging cyber threats.

The report highlights the developing tactics of cyber attackers, who now exploit AI tools like Language Model-based Machine Learning (LLMs) to infiltrate organisations covertly. Adversarial AI, a form of AI designed to bypass traditional security measures, poses a particularly stealthy threat.

Concerns about data protection and privacy loom large as enterprises integrate AI/ML tools into their operations. Industries such as healthcare, finance, insurance, services, technology, and manufacturing are at risk, with manufacturing leading in AI traffic generation.

To mitigate risks, many Chief Information Security Officers (CISOs) opt to block a record number of AI/ML transactions, although this approach is seen as a short-term solution. The most commonly blocked AI tools include ChatGPT and OpenAI, while domains like Bing.com and Drift.com are among the most frequently blocked.

However, blocking transactions alone may not suffice in the face of evolving cyber threats. Leading cybersecurity vendors are exploring novel approaches to threat detection, leveraging telemetry data and AI capabilities to identify and respond to potential risks more effectively.

CISOs and security teams face a daunting task in defending against AI-driven attacks, necessitating a comprehensive cybersecurity strategy. Balancing productivity and security is crucial, as evidenced by recent incidents like vishing and smishing attacks targeting high-profile executives.

Attackers increasingly leverage AI in ransomware attacks, automating various stages of the attack chain for faster and more targeted strikes. Generative AI, in particular, enables attackers to identify vulnerabilities and exploit them with greater efficiency, posing significant challenges to enterprise security.

Taking into account these advancements, enterprises must prioritise risk management and enhance their cybersecurity posture to combat the dynamic AI threat landscape. Educating board members and implementing robust security measures are essential in safeguarding against AI-driven cyberattacks.

As institutions deal with the complexities of AI adoption, ensuring data privacy, protecting intellectual property, and mitigating the risks associated with AI tools become paramount. By staying vigilant and adopting proactive security measures, enterprises can better defend against the growing threat posed by these cyberattacks.