One of the less discussed consequences of ChatGPT’s rise is the privacy risk it poses. Only yesterday, Google launched Bard, its own conversational AI, and others will undoubtedly follow; technology firms developing AI have well and truly entered a race.
The problem is the technology underpinning these tools, which is built on vast amounts of users’ personal data.
300 Billion Words, How Many Are Yours?
ChatGPT is underpinned by a large language model, which requires enormous amounts of data to function and improve. The more data the model is trained on, the better it gets at detecting patterns, anticipating what will come next, and generating plausible text.
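To make that idea concrete, here is a minimal sketch in Python. It is emphatically not how ChatGPT works internally (which relies on a large neural network rather than a lookup table), but it illustrates the same principle: the model learns which words tend to follow which from its training text, and the more text it sees, the more reliable its predictions become.

```python
from collections import Counter, defaultdict

# A toy "training corpus" standing in for scraped web text.
# Real models ingest hundreds of billions of words; three
# sentences are enough to show the principle.
corpus = (
    "the cat sat on the mat . "
    "the cat sat on the rug . "
    "the dog chased the cat ."
).split()

# Learn the simplest possible pattern: which word follows which,
# and how often.
follows = defaultdict(Counter)
for current, nxt in zip(corpus, corpus[1:]):
    follows[current][nxt] += 1

def predict_next(word: str) -> str:
    """Return the word most often seen after `word` in training."""
    return follows[word].most_common(1)[0][0]

print(predict_next("the"))  # -> 'cat' (seen 3 times after 'the')
print(predict_next("sat"))  # -> 'on'  (always followed 'sat')
```

Scale the corpus from three sentences to hundreds of billions of words and swap the frequency table for a neural network, and you have the intuition behind ChatGPT; it is also why everything in the training text, personal details included, shapes what the model can reproduce.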
OpenAI, the developer of ChatGPT, fed the model some 300 billion words systematically scraped from the internet – books, articles, websites, and posts – which inevitably includes online users’ personal information, gathered without their consent.
If you have ever written a blog post or product review, or commented on an article online, there is a good chance that information was consumed by ChatGPT.
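To see how easily public text ends up in such a corpus, consider this illustrative sketch of a web scraper. The URL is hypothetical, and this is not OpenAI’s actual data pipeline (which drew on large pre-assembled web crawls); it simply shows that anything rendered on a public page, personal details included, can be harvested in a few lines of code.

```python
import requests
from bs4 import BeautifulSoup

# Hypothetical public page; any blog post or comment thread
# would be fetched the same way.
URL = "https://example.com/some-blog-post"

# Download the page exactly as a crawler building a corpus would.
html = requests.get(URL, timeout=10).text

# Discard the markup and keep only the visible text, including any
# names, opinions, or personal details the author happened to share.
text = BeautifulSoup(html, "html.parser").get_text(separator=" ", strip=True)

print(text[:200])  # the first few hundred characters of harvested text
```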
What Is the Issue?
The data used to train ChatGPT is problematic for several reasons.
First, none of us were ever asked whether OpenAI could use our data; it was collected without consent. This is a clear violation of privacy, especially when the data is sensitive and can be used to identify us, our family members, or our location.
Even when data is publicly available, its use can breach what privacy scholars call contextual integrity – a cornerstone idea in discussions of privacy law. It requires that information about people not be revealed outside the context in which it was originally produced.
Moreover, OpenAI offers no procedure for individuals to check whether the company stores their personal information, or to request that it be deleted. The European General Data Protection Regulation (GDPR) guarantees this right, although whether ChatGPT complies with GDPR requirements is still under debate.
This “right to be forgotten” is particularly important in cases where the information is inaccurate or misleading, which appears to be a regular occurrence with ChatGPT.
Furthermore, the scraped data ChatGPT was trained on may be confidential or protected by copyright. When prompted, for instance, the tool reproduced the opening passages of Joseph Heller’s copyrighted novel Catch-22.
Finally, OpenAI did not pay for the data it scraped from the internet. The individuals, website owners, and businesses that produced it were not compensated. This is particularly noteworthy given that OpenAI was recently valued at US$29 billion, more than double its 2021 valuation.
OpenAI has also recently announced ChatGPT Plus, a paid subscription plan that offers customers ongoing access to the tool, faster response times, and priority access to new features. The plan is expected to contribute US$1 billion in revenue by 2024.
None of this would have been possible without ‘our’ data, acquired and used without our consent.
Time to Consider the Issue?
According to some professionals and experts, ChatGPT is a “tipping point for AI” – the realisation of technological advances that could revolutionise the way we work, learn, write, and even think.
Despite its potential benefits, we must remember that OpenAI is a private, for-profit company whose interests and commercial imperatives will not always align with broader societal needs.
The privacy risks associated with ChatGPT should serve as a warning. As users of a growing number of AI technologies, we need to be extremely careful about what information we share with such tools.