OpenAI's GPTBot Faces Media Backlash in France Over Data Collection Fears

 


Tension between the press and the giants of the artificial intelligence industry has reached a new level. According to reports, several news outlets and publishers have in recent weeks blocked the OpenAI bot that crawls websites to collect content for training the company's AI models, including the well-known ChatGPT conversational agent.

According to new data published by Originality.AI, a service that uses artificial intelligence (AI) to detect AI-generated content, nearly 20% of the world's top 1,000 websites are blocking crawler bots that collect web data for AI purposes. Several news outlets, including The New York Times, CNN, Reuters, and the Australian Broadcasting Corporation (ABC), have reportedly blocked OpenAI's tool, limiting the company's ability to access their content in the future.

ChatGPT, developed by OpenAI, is one of the best-known and most widely used AI chatbots. GPTBot, the company's web crawler, scans webpages to gather data for improving its AI models. The Verge was the first to report that The New York Times had blocked GPTBot on its website.

According to the Guardian, other major news websites, including CNN, Reuters, the Chicago Tribune, ABC, and Australian Community Media (ACM) brands such as the Canberra Times and the Newcastle Herald, also appear to have refused the crawler access. OpenAI uses GPTBot to scrape publicly accessible data online, including copyrighted material, in an effort to improve ChatGPT's accuracy.
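
Publishers usually refuse a crawler through their site's robots.txt file, and OpenAI documents `GPTBot` as the crawler's user-agent token. The sketch below is a minimal illustration, not any publisher's actual configuration: it parses a hypothetical robots.txt with Python's standard `urllib.robotparser` and checks whether GPTBot may fetch a page. The rules, the `is_allowed` helper, the `SomeOtherBot` name, and the example.com URL are all assumptions made for the example.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: block GPTBot everywhere, leave other crawlers alone.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Allow: /
"""


def is_allowed(user_agent: str, url: str, robots_txt: str = ROBOTS_TXT) -> bool:
    """Return True if the given robots.txt rules let user_agent fetch url."""
    parser = RobotFileParser()
    # parse() takes the file as a list of lines, so no network access is needed.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    page = "https://example.com/news/article.html"  # placeholder URL
    print("GPTBot allowed:", is_allowed("GPTBot", page))
    print("Other crawler allowed:", is_allowed("SomeOtherBot", page))
```

A robots.txt rule is a request rather than an enforcement mechanism: it only works when the crawler chooses to honor it.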

The chatbot relies on a deep-learning language model to process and generate text. In a blog post, OpenAI stated that allowing GPTBot to access a website can help AI models improve their performance, general capabilities, and safety.

According to an announcement the company made on 8 August, the tool would automatically collect data from across the internet to train its GPT-4 and GPT-5 models. In the same blog post, OpenAI stated that the system would filter out sources that sit behind a paywall, sources that violate OpenAI's policies, and sources known to gather personally identifiable information.

Personally identifiable information is any information that can be linked directly to a specific individual. In OpenAI's first clash with regulators, in March, the Italian data regulator Garante temporarily shut down ChatGPT in Italy, accusing the company of flouting European privacy regulations.

ChatGPT returned to Italy after OpenAI introduced additional privacy measures for its users. In April of this year, the European Data Protection Board, which represents all EU data enforcement authorities, set up a task force to ensure that these rules are applied consistently across all EU countries.

France's national data protection watchdog, the Commission Nationale de l'Informatique et des Libertés (CNIL), also published an action plan in May addressing privacy concerns related to artificial intelligence, particularly generative applications like ChatGPT.

GPTBot: How Does It Work?


GPTBot begins by identifying potential sources of data. It does this by crawling the web and looking for websites that contain relevant information. Once it has identified a potential source, GPTBot extracts information from that website.

The information is then compiled into a database and used to train AI models. The tool can extract several types of information, including text, images, and even code. GPTBot can extract text from websites, articles, books, and other documents.

From images, GPTBot can extract information about the objects depicted or create a textual description of the image. It can also extract code from websites, blogs, GitHub repositories, and other sources.
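
The three kinds of extraction described above (text, image descriptions, and code) can be illustrated with a small sketch. This is not OpenAI's implementation, and nothing here reflects how GPTBot is actually built: the `PageExtractor` class, the `extract` helper, and the sample HTML are hypothetical, and an image's `alt` attribute is used as a simple stand-in for a textual description. It relies only on Python's standard `html.parser` module.

```python
from html.parser import HTMLParser


class PageExtractor(HTMLParser):
    """Hypothetical extractor: collects prose, image alt text, and code."""

    def __init__(self):
        super().__init__()
        self.text = []        # visible prose fragments
        self.image_alts = []  # alt text of <img> tags
        self.code = []        # contents of <pre> blocks
        self._in_pre = False
        self._skip = 0        # nesting depth inside <script>/<style>

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip += 1
        elif tag == "pre":
            self._in_pre = True
            self.code.append("")
        elif tag == "img":
            alt = dict(attrs).get("alt")
            if alt:
                self.image_alts.append(alt)

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip:
            self._skip -= 1
        elif tag == "pre":
            self._in_pre = False

    def handle_data(self, data):
        if self._skip:
            return  # ignore script and style contents
        if self._in_pre:
            self.code[-1] += data  # data may arrive in several chunks
        elif data.strip():
            self.text.append(data.strip())


def extract(html: str) -> dict:
    """Parse an HTML string and return the three kinds of extracted content."""
    parser = PageExtractor()
    parser.feed(html)
    parser.close()
    return {"text": parser.text, "image_alts": parser.image_alts, "code": parser.code}


SAMPLE = """<html><body>
<h1>Example post</h1>
<p>A short paragraph.</p>
<img src="cat.png" alt="A cat on a sofa">
<pre><code>print("hello")</code></pre>
<script>var tracking = 1;</script>
</body></html>"""

if __name__ == "__main__":
    for kind, items in extract(SAMPLE).items():
        print(kind, items)
```

A production crawler would also need fetching, deduplication, and policy filtering, all of which are left out of this sketch.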

Several generative AI tools, including OpenAI's ChatGPT, rely on data from websites to train models that improve over time. Not long ago, Elon Musk blocked OpenAI from scraping data from the platform while it was still called Twitter.