In recent times, Artificial Intelligence (AI) has offered various perks across industries. But, as with any powerful tool, threat actors are trying to use it for malicious reasons. Researchers suggest that the underground market for illicit large language models is enticing, highlighting a need for strong safety measures against AI misuse.
These underground markets that deal with malicious large language models (LLMs) are called Mallas. This blog dives into the details of this dark industry and discusses the impact of these illicit LLMs on cybersecurity.
The Rise of Malicious LLMs
LLMs, like OpenAI' GPT-4 have shown fine results in natural language processing, bringing applications like chatbots for content generation. However, the same tech that supports these useful apps can be misused for suspicious activities.
Recently, researchers from Indian University Bloomington found 212 malicious LLMs on underground marketplaces between April and September last year. One of the models "WormGPT" made around $28,000 in just two months, revealing a trend among threat actors misusing AI and a rising demand for these harmful tools.
How Uncensored Models Operate
Various LLMs in the market were uncensored and built using open-source standards, few were jailbroken commercial models. Threat actors used Mallas to write phishing emails, build malware, and exploit zero days.
Tech giants working in the AI models industry have built measures to protect against jailbreaking and detecting malicious attempts. But threat actors have also found ways to jump the guardrails and trick AI models like Google Meta, OpenAI, and Anthropic into providing malicious info.
Underground Market for LLMs
Experts found two uncensored LLMs: DarkGPT, which costs 78 cents per 50 messages, and Escape GPT, a subscription model that charges $64.98 a month. Both models generate harmful code that antivirus tools fail to detect two-thirds of the time. Another model "WolfGPT" costs $150, and allows users to write phishing emails that can escape most spam detectors.
The research findings suggest all harmful AI models could make malware, and 41.5% could create phishing emails. These models were built upon OpenAI's GPT-3.5 and GPT-4, Claude Instant, Claude-2-100k, and Pygmalion 13B.
To fight these threats, experts have suggested a dataset of prompts used to make malware and escape safety features. AI companies should release models with default censorship settings and allow access to illicit models only for research purposes.