Artificial intelligence is starting to face a serious problem. The creators of AI tools prefer to stay silent about it.

Media companies are negotiating licensing deals with AI developers and providers such as OpenAI (ChatGPT), which are hungry for the data needed to train their language models and algorithms. It is now clear that nearly 90 percent of the world's leading news outlets, such as the New York Times and CNN, have put up digital blocks that stop AI bots from processing their content. On the other hand, outlets known for promoting right-wing views in the United States, such as NewsMax or Breitbart, typically allow bots to access their content and train on it.

Such actions are already leading to a very dangerous situation.

Chatbots with Internet access, which are often paid products, cannot draw on reputable sources, because those sources do not want to give away their content for free and some even sue the creators of AI. The New York Times, for example, has sued over numerous alleged copyright violations. Bots are instead given access by websites whose content often polarizes the public. As a result, on some topics we may receive AI-generated content that is not based on objective sources.

Data collected by Originality AI shows that almost all large, well-known websites block the web crawlers operated by AI developers, including titles like The Washington Post, The Guardian, and The Atlantic.

The most blocked crawler is GPTBot from OpenAI. In second place are the crawlers used by Google and its Gemini AI.
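
Blocks of this kind are typically declared in a site's robots.txt file, which well-behaved crawlers consult before fetching pages. The following is a minimal sketch of how such a rule is evaluated, using Python's standard urllib.robotparser module. The robots.txt content, the news.example domain, the SomeOtherBot name and the is_allowed helper are all hypothetical and are not taken from any of the sites named above.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for an imaginary news site: it shuts out
# OpenAI's GPTBot entirely while leaving the site open to other bots.
SAMPLE_ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Allow: /
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if robots_txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() takes the file's lines, so no network request is made here.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    url = "https://news.example/politics/story"  # illustrative URL only
    for agent in ("GPTBot", "SomeOtherBot"):
        verdict = "allowed" if is_allowed(SAMPLE_ROBOTS_TXT, agent, url) else "blocked"
        print(f"{agent}: {verdict}")
```

Compliance with robots.txt is voluntary, so a rule like this only keeps out crawlers that choose to honor it.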

AI will have a problem with objectivity

As John Gillham, founder and CEO of Originality AI, notes, if one group of websites blocks access to AI and another group grants it, the algorithms will only deepen their biases when language models are trained. “The AI models then reflect the biases of their training data,” Gillham points out.

US conservative leaders (as well as Elon Musk) have expressed concerns that ChatGPT and other mainstream AI tools exhibit a liberal or left-wing political bias. Senator Marsha Blackburn recited an AI-generated poem praising President Biden as proof, claiming that it was impossible to get ChatGPT to produce a similar poem about Trump. Right-wing media may now see the decision by their ideological opponents to block AI web crawlers as a unique opportunity to redress the balance.

David Rozado, a data scientist in New Zealand who developed an AI model called RightWingGPT to investigate perceived bias in ChatGPT, tells WIRED that this seems like a plausible strategy. “From a technical perspective, yes, a media company allowing its content to be integrated into AI training data should have some impact on the subsequent performance of that model,” he says.

Of course, AI bots had already collected a huge body of material and trained on it back when it was not yet possible to block their access. So when they draw on that offline data, they usually have no problem providing objective and impartial answers. The situation gets worse when we ask about current events. The bots then have to rely on network access, and so their capabilities are limited.

ChatGPT, for example, cannot provide information from the CNN article because the website blocks access to it

In such a case, we must either settle for the answer we have received or ask for a reference to a specific source. But then, if the website blocks access, there is a problem: the AI will say it cannot do the job and will, for example, send you to a search engine to enter your query there. This defeats the purpose, because paid chatbots persuade customers to subscribe partly on the promise that they have access to the Internet. But what is the point of having that access if they cannot use it?

Providers must respond, and they do

The current problems faced by AI tools do not mean that ChatGPT or Google Gemini will inevitably become propaganda machines for one political side or another. When we ask ChatGPT what it thinks about Breitbart, we get the answer: “The site is often controversial and a topic of debate due to its reliability and objectivity. In 2018, Wikipedia decided to ban the use of Breitbart as a source of facts. My answers are based on publicly available information and do not reflect personal opinions.”

However, it is worth remembering not to trust a chatbot blindly, because it does not realize when it may be saying something untrue (the problem known as hallucinations). Additionally, Google has noted that certain topics, because of restrictions on information sources, will no longer be handled by its AI service.

The tech giant mentioned this limitation in a blog post about supporting the elections in India, stressing its caution on such an important issue. The restrictions apply to election-related queries, and Google says it takes its responsibility to provide high-quality information seriously. The restrictions appear to apply not only to India, but also to election-related matters in the United States and other countries.

When asked who will run for President of the United States this year, Google Gemini points to the search engine

Providers of AI tools will shift responsibility for verifying information onto users. The user must ask additional questions and examine information sources more carefully. Otherwise, they may build their knowledge on material that may be incorrect. Google does not want to bear responsibility for possibly misleading users in the context of elections.

Constraints that are difficult to overcome

If websites block AI bots to stop them from using content without paying licensing fees, and AI creators are not willing to pay for copyrighted material, then in the near future the capabilities of AI tools may become limited.

How limited? Chatbots will be able, for example, to provide a recipe for a popular cake or a definition of what basketball is, but they will lose the ability to answer queries related to current events, politics, business or science. They will then become tools for easily answering simple questions, but nothing else.

A lot may also change when Google adds AI functionality to its search engine and starts answering questions using AI. Websites that block AI bots may then see a significant drop in referral traffic from the search engine, which for many of them is the main source of customers and users. They may feel forced to share content with AI bots so that Google does not ignore their domain. However, this would border on unfair competition and is likely to provoke a reaction from regulators, especially in Europe.

There is another scenario.

Creators of tools like ChatGPT or Gemini will simply start paying for access to content by partnering with the right companies. However, there is another problem here: if there are many such partnerships, OpenAI or Google may start passing these costs on to customers. AI tools may then become significantly more expensive. While today the ChatGPT Plus price of around $20 per month (about PLN 80) is attractive to many people, it is hard to say whether generative AI tools would be viewed the same way if the price rose to, say, $200 or $400.

Author: Grzegorz Kobra, journalist at Business Insider Polska

