techbykushwah : OpenAI’s new GPTBot will crawl your website for data, but you can say no

OpenAI has launched a new web crawler called GPTBot, likely built to enhance its GPT-4 large language model and possibly to gather training data for GPT-5. The crawler will access data from various websites, except those that are behind paywalls or that opt out of the process.

Reportedly, the idea is to use only sources that are freely available, that comply with OpenAI’s policies, and that do not collect any personal information from users. By allowing GPTBot to crawl their websites, publishers will be contributing their data to OpenAI’s existing and future models, which power its AI chatbots. That may raise privacy and security concerns, but publishers would also be contributing to the overall advancement of AI.

However, if publishers are not comfortable with sharing their data with an AI system, OpenAI offers a simple way to opt out. They just need to add a short rule to the robots.txt file on their website’s server. The rule can be found in the official documentation for the bot. Publishers can also specify which parts of their website will be accessible and which ones will not.

However, the launch of GPTBot is not without concerns. On the one hand, ChatGPT, which is unaware of events that happened after most of its training data was cut off (September 2021), needs more data to grow. On the other hand, websites do not benefit from GPTBot crawling them. Unlike Google, which drives traffic to a website after crawling it by showing search results to billions of users, ChatGPT only summarises data from across the web without giving any citations, so it is hard to trace the source of the information it provides.
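For readers who want to see what the robots.txt opt-out looks like in practice, here is a minimal sketch. It assumes the rule format OpenAI has documented for the bot (a `User-agent: GPTBot` line followed by `Disallow` or `Allow` paths); check the official documentation before relying on it. The site `https://example.com`, the directory names, and the helper `is_allowed` are placeholders made up for illustration. The script uses Python's standard `urllib.robotparser` to check how a rule-following crawler would read each rule set.

```python
from urllib.robotparser import RobotFileParser

# Rule set that blocks GPTBot from the whole site.
BLOCK_ALL = """\
User-agent: GPTBot
Disallow: /
"""

# Rule set that opens one directory and closes another.
# The directory names are placeholders, not real paths.
PARTIAL = """\
User-agent: GPTBot
Allow: /directory-1/
Disallow: /directory-2/
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if robots_txt lets user_agent fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    site = "https://example.com"  # hypothetical site
    print(is_allowed(BLOCK_ALL, "GPTBot", f"{site}/article"))
    print(is_allowed(BLOCK_ALL, "Googlebot", f"{site}/article"))
    print(is_allowed(PARTIAL, "GPTBot", f"{site}/directory-1/post"))
    print(is_allowed(PARTIAL, "GPTBot", f"{site}/directory-2/post"))
```

Note that the blocking rule names GPTBot only, so other crawlers such as Google's are not affected by it. A robots.txt file is also a request rather than an enforcement mechanism: it works only for crawlers that choose to honour it.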

kushwahdinesh9249@gmail.com

Tuesday, August 8, 2023
