These underground markets that deal with malicious large language models (LLMs) are called Mallas. This blog dives into the details of this dark industry and discusses the impact of these illicit LLMs on cybersecurity.
LLMs like OpenAI's GPT-4 have shown impressive results in natural language processing, powering applications such as chatbots and content generation. However, the same technology that supports these useful applications can also be misused for malicious activities.
Recently, researchers from Indiana University Bloomington found 212 malicious LLMs for sale on underground marketplaces between April and September last year. One of the models, WormGPT, made around $28,000 in just two months, revealing a growing trend of threat actors misusing AI and a rising demand for these harmful tools.
Many of the LLMs on these markets were uncensored models built on open-source foundations, while a few were jailbroken commercial models. Threat actors used Mallas to write phishing emails, build malware, and exploit zero-day vulnerabilities.
Tech giants in the AI industry have built safeguards to protect against jailbreaking and to detect malicious attempts. But threat actors have found ways to bypass these guardrails and trick models from Google, Meta, OpenAI, and Anthropic into providing malicious information.
Experts found two uncensored LLMs: DarkGPT, which costs 78 cents per 50 messages, and EscapeGPT, a subscription model that charges $64.98 a month. Both generate harmful code that antivirus tools fail to detect roughly two-thirds of the time. Another model, WolfGPT, costs $150 and lets users write phishing emails that can evade most spam detectors.
The research findings suggest that all of the harmful AI models examined could produce malware, and 41.5% could create phishing emails. These models were built on OpenAI's GPT-3.5 and GPT-4, Claude Instant, Claude-2-100k, and Pygmalion 13B.
To fight these threats, the researchers suggest building a dataset of the prompts used to create malware and bypass safety features. They also recommend that AI companies release models with censorship settings enabled by default and allow access to uncensored models only for legitimate research purposes.
Enterprises are rapidly embracing Artificial Intelligence (AI) and Machine Learning (ML) tools, with transactions skyrocketing by almost 600% in less than a year, according to a recent report by Zscaler. The surge, from 521 million transactions in April 2023 to 3.1 billion monthly by January 2024, underscores a growing reliance on these technologies. However, heightened security concerns have led to a 577% increase in blocked AI/ML transactions, as organisations grapple with emerging cyber threats.
The report highlights the evolving tactics of cyber attackers, who now exploit AI tools such as large language models (LLMs) to infiltrate organisations covertly. Adversarial AI, designed to bypass traditional security measures, poses a particularly stealthy threat.
Concerns about data protection and privacy loom large as enterprises integrate AI/ML tools into their operations. Industries such as healthcare, finance, insurance, services, technology, and manufacturing are at risk, with manufacturing leading in AI traffic generation.
To mitigate risks, many Chief Information Security Officers (CISOs) opt to block a record number of AI/ML transactions, although this approach is seen as a short-term solution. The most commonly blocked AI tools include ChatGPT and OpenAI, while domains like Bing.com and Drift.com are among the most frequently blocked.
However, blocking transactions alone may not suffice in the face of evolving cyber threats. Leading cybersecurity vendors are exploring novel approaches to threat detection, leveraging telemetry data and AI capabilities to identify and respond to potential risks more effectively.
CISOs and security teams face a daunting task in defending against AI-driven attacks, necessitating a comprehensive cybersecurity strategy. Balancing productivity and security is crucial, as evidenced by recent incidents like vishing and smishing attacks targeting high-profile executives.
Attackers increasingly leverage AI in ransomware attacks, automating various stages of the attack chain for faster and more targeted strikes. Generative AI, in particular, enables attackers to identify vulnerabilities and exploit them with greater efficiency, posing significant challenges to enterprise security.
Taking into account these advancements, enterprises must prioritise risk management and enhance their cybersecurity posture to combat the dynamic AI threat landscape. Educating board members and implementing robust security measures are essential in safeguarding against AI-driven cyberattacks.
As institutions deal with the complexities of AI adoption, ensuring data privacy, protecting intellectual property, and mitigating the risks associated with AI tools become paramount. By staying vigilant and adopting proactive security measures, enterprises can better defend against the growing threat posed by these cyberattacks.
A comprehensive study conducted by the Amazon Web Services (AWS) AI Lab has surfaced a disconcerting reality about internet content: an extensive 57.1% of all sentences on the web have been translated into two or more languages, and the culprit behind this linguistic convolution is large language model (LLM)-powered AI.
The crux of the issue lies in what researchers term "lower-resource languages": languages for which little data is available to effectively train AI models. The domino effect begins with AI generating vast quantities of substandard English content. AI-powered translation tools then exacerbate the degradation as they translate that material into other languages. The motive behind this cascade of content manipulation is profit, capturing clickbait-driven ad revenue. The outcome is entire regions of the internet flooded with deteriorating AI-generated copies, creating a sprawling landscape of misinformation.
The AWS researchers express profound concern, emphasising that machine-generated, multi-way parallel translations not only dominate the translated content in lower-resource languages but also constitute a substantial fraction of the overall web content in those languages. This amplifies the scale of the issue, underscoring its potential to significantly affect diverse online communities.
The challenges posed by AI-generated content are not isolated incidents. Tech giants like Google and Amazon have grappled with the ramifications of AI-generated material affecting their search algorithms, news platforms, and product listings. The issues are multifaceted, encompassing not only the degradation of content quality but also violations of ethical use policies.
While the English-language web has been experiencing a gradual infiltration of AI-generated content, the study highlights that non-English speakers are facing a more immediate and critical problem. Beyond being a mere inconvenience, the prevalence of AI-generated gibberish raises a formidable barrier to the effective training of AI models in lower-resource languages. This is a significant setback for the scientific community, as the inundation of nonsensical translations hinders the acquisition of high-quality data necessary for training advanced language models.
The pervasive issue of AI-generated content poses a substantial threat to the usability of the web, transcending linguistic and geographical boundaries. Striking a balance between technological advancement and content reliability is imperative for maintaining the internet as a trustworthy and informative space for users globally. Addressing this challenge requires a collaborative effort from researchers, industry stakeholders, and policymakers to safeguard the integrity of online information. Otherwise, the one-stop digital world we all count on to disseminate information risks being degraded beyond use.
A user can also gain access to one such ‘evil’ version of OpenAI’s ChatGPT. While these AI versions may not necessarily be legal in some parts of the world, access can be pricey.
Gaining access to the evil chatbot versions can be tricky. To do so, a user must find the right web forum with the right users, who may be marketing a private and powerful large language model (LLM). One can get in touch with these users over encrypted messaging services like Telegram, where they might ask for a few hundred dollars in cryptocurrency for access to an LLM.
After gaining access, users can do things that are prohibited in ChatGPT and Google’s Bard, such as asking the AI how to make pipe bombs or cook meth, engaging in discussions about any illegal or morally questionable subject under the sun, or even using it to run phishing schemes and other cybercrimes.
“We’ve got folks who are building LLMs that are designed to write more convincing phishing email scams or allowing them to code new types of malware because they’re trained off of the code from previously available malware[…]Both of these things make the attacks more potent, because they’re trained off of the knowledge of the attacks that came before them,” says Dominic Sellitto, a cybersecurity and digital privacy researcher at the University of Buffalo.
These models are becoming more prevalent, more powerful, and more challenging to regulate. They also herald the opening of a new front in the war on cybercrime, one that cuts far beyond text generators like ChatGPT and into the domains of audio, video, and graphics.
“We’re blurring the boundaries in many ways between what is artificially generated and what isn’t[…]The same goes for the written text, and the same goes for images and everything in between,” explained Sellitto.
Phishing emails, which demand that a user immediately provide financial information to the Social Security Administration or their bank in order to resolve a fictitious crisis, cost American consumers close to $8.8 billion annually. The emails may contain seemingly innocuous links that actually download malware or viruses, allowing hackers to harvest sensitive data directly from the victim's computer.
Fortunately, these phishing emails have traditionally been easy to detect. If they have not already landed in a user’s spam folder, they can usually be identified by their language: informal, grammatically incorrect wording that no legitimate financial firm would use.
With ChatGPT, however, it is becoming difficult to spot such errors in phishing emails, a consequence of the generative AI boom.
“The technology hasn’t always been available on digital black markets[…]It primarily started when ChatGPT became mainstream. There were some basic text generation tools that might have used machine learning but nothing impressive,” explains Daniel Kelley, a former black hat computer hacker and cybersecurity consultant.
According to Kelley, these LLMs come in a variety of forms, including BlackHatGPT, WolfGPT, and EvilGPT. He claimed that many of these models, despite their nefarious names, are actually just instances of AI jailbreaks, a term for the deft manipulation of existing LLMs such as ChatGPT to achieve desired results. These jailbroken models are then wrapped in a customized user interface, creating the impression of an entirely distinct chatbot rather than a repackaged ChatGPT.
However, this does not make such AI models any less harmful. In fact, Kelley points to one model as both genuine and genuinely malicious: according to a description of WormGPT on a forum promoting the model, it is an LLM made especially for cybercrime that "lets you do all sorts of illegal stuff and easily sell it online in the future."
Both Kelley and Sellitto agree that WormGPT could be used in business email compromise (BEC) attacks, a kind of phishing in which attackers impersonate a higher-up or other authority figure to steal information from employees. The language the model generates is remarkably clean, with precise grammar and sentence structure that makes the emails considerably more difficult to spot at first glance.
It is also worth noting that, with easy access to the internet, virtually anyone can download these notorious AI models, making them easy to disseminate. It is similar to a service that offers same-day shipping on firearms and ski masks, except that these firearms and ski masks are targeted at and built for criminals.
ChatGPT is a large language model (LLM) from OpenAI that can generate text, translate languages, write different kinds of creative content, and answer your questions in an informative way. It is still under development, but it has already been used for a variety of purposes, including creative writing, code generation, and research.
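To make the capabilities described above concrete, here is a minimal sketch of querying an OpenAI chat model programmatically through the official Python client. The article discusses the ChatGPT product itself, so the client library usage, model name, and prompt below are illustrative assumptions rather than anything described in it.

```python
# A minimal sketch, not from the article: querying an OpenAI chat model via the
# official Python client (openai >= 1.0). Model name and prompt are illustrative.
from openai import OpenAI

client = OpenAI()  # reads the OPENAI_API_KEY environment variable

response = client.chat.completions.create(
    model="gpt-4",  # any available chat model could be substituted
    messages=[
        {"role": "system", "content": "You are a concise research assistant."},
        {"role": "user", "content": "Explain in two sentences what a large language model is."},
    ],
)

print(response.choices[0].message.content)
```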
However, ChatGPT also poses some security and privacy risks. These risks are highlighted in the following articles:
Overall, ChatGPT is a powerful tool with a number of potential benefits. However, it is important to be aware of the security and privacy risks associated with using it. Users should carefully consider the instructions they give to ChatGPT and only use trusted plugins. They should also be careful about what websites and web applications they authorize ChatGPT to access.
Here are some additional tips for using ChatGPT safely:
The size of the language models in the LLaMA collection ranges from 7 billion to 65 billion parameters. In contrast, the GPT-3 model from OpenAI, which served as the basis for ChatGPT, has 175 billion parameters.
Meta can potentially release its LLaMA model and its weights as open source, since it trained the models on openly available datasets such as Common Crawl, Wikipedia, and C4. This would mark a breakthrough in a field where Big Tech competitors in the AI race have traditionally kept their most potent AI technology to themselves.
Regarding this, project member Guillaume Lample tweeted: "Unlike Chinchilla, PaLM, or GPT-3, we only use datasets publicly available, making our work compatible with open-sourcing and reproducible, while most existing models rely on data which is either not publicly available or undocumented."
Meta refers to its LLaMA models as "foundational models," meaning the company intends them to serve as the basis for future, more sophisticated AI models built on the technology, much as OpenAI built ChatGPT on top of GPT-3. The company anticipates using LLaMA to further applications like "question answering, natural language understanding or reading comprehension, understanding capabilities and limitations of present language models" and to aid in natural language research.
While the top-of-the-line LLaMA model (LLaMA-65B, with 65 billion parameters) competes head-to-head with comparable offerings from rival AI labs DeepMind, Google, and OpenAI, arguably the most intriguing development is the LLaMA-13B model, which can reportedly outperform GPT-3 while running on a single GPU when measured across eight common "common sense reasoning" benchmarks such as BoolQ and PIQA. LLaMA-13B thus opens the door to ChatGPT-like performance on consumer-level hardware in the near future, in contrast to the data center requirements of GPT-3 derivatives.
In AI, parameter size is significant. A parameter is a variable that a machine-learning model uses to make predictions or classify data. The size of a language model's parameter set significantly affects how well it performs, with larger models typically able to handle more challenging tasks and generate more coherent output. However, more parameters take up more space and consume more computing resources. A model is significantly more efficient if it can deliver the same results as another model with fewer parameters.
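As a rough, hypothetical illustration of this trade-off (not part of the study or Meta's release), the sketch below counts a PyTorch model's trainable parameters and estimates its memory footprint; the toy model and the `parameter_report` helper are invented for this example.

```python
import torch.nn as nn

def parameter_report(model: nn.Module, bytes_per_param: int = 2) -> str:
    """Count trainable parameters and estimate memory at a given precision.

    bytes_per_param=2 corresponds to fp16/bf16 weights; use 4 for fp32.
    """
    n_params = sum(p.numel() for p in model.parameters() if p.requires_grad)
    approx_gb = n_params * bytes_per_param / 1e9
    return f"{n_params:,} parameters, roughly {approx_gb:.2f} GB at {bytes_per_param} bytes/param"

# A deliberately tiny stand-in model; real LLaMA models span 7B to 65B parameters.
toy_model = nn.Sequential(
    nn.Embedding(32_000, 512),  # vocabulary size -> hidden size
    nn.Linear(512, 2048),
    nn.GELU(),
    nn.Linear(2048, 512),
)

print(parameter_report(toy_model))
# For scale: a 13-billion-parameter model needs about 13e9 * 2 bytes ≈ 26 GB in fp16,
# which is why single-GPU use in practice often relies on quantisation or offloading.
```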
"I'm now thinking that we will be running language models with a sizable portion of the capabilities of ChatGPT on our own (top of the range) mobile phones and laptops within a year or two," according to Simon Willison, an independent AI researcher in an Mastodon thread analyzing and monitoring the impact of Meta’s new AI models.
Currently, a simplified version of LLaMA is available on GitHub. The full code and weights (the "learned" training data in a neural network) can be obtained by filling out a form provided by Meta. Meta has not yet announced a wider release of the model and weights.
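For readers who want to experiment, here is a minimal sketch of loading and prompting a LLaMA checkpoint, assuming the weights obtained through Meta's form have already been converted to the Hugging Face `transformers` format; that conversion step, the local path, and the generation settings are assumptions beyond what the article describes.

```python
# A minimal sketch, assuming weights converted to Hugging Face format in a local folder.
# The path below is hypothetical; Meta distributes the original weights via a request form.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_dir = "./llama-7b-hf"  # hypothetical local path to converted weights

tokenizer = AutoTokenizer.from_pretrained(model_dir)
model = AutoModelForCausalLM.from_pretrained(
    model_dir,
    torch_dtype=torch.float16,  # halves memory use compared with fp32
    device_map="auto",          # spreads layers across available GPU/CPU memory
)

prompt = "Large language models are"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=50)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```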