The AI company Perplexity is complaining their bots can't bypass Cloudflare's firewall

Davriellelouna@lemmy.world · edited 2 days ago

snooggums@lemmy.world · 2 days ago

But a user initiated operation isn’t the same as a bot.

Oh fuck off with that AI company propaganda.

The AI companies already overwhelmed sites to get training data and are repeating their shitty scraping practices when users interact with their AI. It’s the same fucking thing.

Web crawlers for search engines don’t scrape pages every time a user searches like AI does. Both web crawlers and scrapers are bots, and how a human initiates their operation, scheduled or not, doesn’t matter as much as the fact that they do things very differently and only one of the two respects robots.txt.

FauxLiving@lemmy.world · 2 days ago

There’s no difference in server load between a user looking at a page and a user using an AI tool to summarize the page.

The AI companies already overwhelmed sites to get training data and are repeating their shitty scraping practices when users interact with their AI. It’s the same fucking thing.

You either didn’t read the article or are deliberately making bad faith arguments. The entire point of the article is that the traffic that they’re referring to is initiated by a user, just like when you type an address into your browser’s address bar.

This traffic, initiated by a user, creates the same server load as that same user loading the page in a browser.

Yes, mass scraping of web pages creates a bunch of server load. This was the case before AI was even a thing.

This situation is like Cloudflare presenting a captcha in order to load each individual image, CSS or JavaScript asset into a web browser because bot traffic pretends to be a browser.

I don’t think it’s too hard to understand that a bot pretending to be a browser and a human-operated browser are two completely different things, and classifying them as the same (and captchaing them) would be a classification error.

This is exactly the same kind of error. Even if you personally believe that users using AI tools should be blocked, not everyone has the same opinion. If Cloudflare can’t distinguish between bot requests and human requests then their customers can’t opt out and allow their users to use AI tools even if they want to.

ubergeek@lemmy.today · 23 hours ago

There’s no difference in server load between a user looking at a page and a user using an AI tool to summarize the page.

There is, in scale.

snooggums@lemmy.world · 2 days ago

There is no difference between emptying a glass of water and draining a swimming pool either if you ignore the total volume of water.

FauxLiving@lemmy.world · edited 2 days ago

I, too, can make any argument sound silly if I want to argue in bad faith.

A user cannot physically generate as much traffic as a bot.

Just like a glass of water cannot physically contain as much water as a swimming pool, so pretending the two are equal is ignorant in both cases.

snooggums@lemmy.world · 2 days ago

A user cannot physically generate as much traffic as a bot.

You are so close to getting it!

FauxLiving@lemmy.world · 2 days ago

And you’re not even close.

snooggums@lemmy.world · edited 2 days ago

The AI doesn’t just do a web search and display a page, it grabs the search results and scrapes multiple pages far faster than a person could.

It doesn’t matter whether a human initiated it when the load on the website is far, far higher and more intrusive in a shorter period of time with AI compared to a human doing a web search and reading the content themselves.

FauxLiving@lemmy.world · 2 days ago

It creates web requests faster than a human could. It does not create web requests as fast as possible like a crawler does.

Websites can handle a lot of human user traffic, even if some human users are making 5x the requests of other users due to using automation tools (like LLM summarization).

A website cannot handle a single bot which can, by itself, generate tens of millions of times as much traffic as a human.

Cloudflare’s method of detecting bots is to attempt to fingerprint the browser and user behavior to detect automations which are usually run in environments that can’t render the content. They did this because, until now, users did not use automation tools so detecting and blocking automation tools was a way to get most of the bots.

Now, users do use automation tools, so this method of classification is dated and misclassifies human-generated traffic.

Perplexity Says Cloudflare Is Blocking Legitimate AI Assistants