Are AI Companies Stealing Our Data?

Most readers read for free. A small group from the TelecomTalk community keeps this going. Support only if our work adds value for you.

Highlights

  • If you use AI (artificial intelligence) chatbots or services, one question may have crossed your mind: where does all the data come from?
  • For example, when you run a deep research query on a platform such as Perplexity, where does the information come from?
  • Perplexity sources it from what is available on the internet: websites and more.


If you use AI (artificial intelligence) chatbots or services, one question may have crossed your mind: where does all the data come from? For example, when you run a deep research query on a platform such as Perplexity, where does the information come from? Perplexity sources it from what is available on the internet: websites and more. In fact, Cloudflare, an internet infrastructure company, published a blog post saying that Perplexity is using stealth, undeclared crawlers to evade websites' no-crawl directives. Let's not get too technical here. Let me explain everything simply.





To give answers, AI models need data, and they need the ability to interpret that data and present it to users. While most publicly available content is there for everyone to read, interpret, comment on, and share, it was not meant to be read by machines. Writers and creators aren't just worried that their work is being used to train AI models that then make money from it; they also worry that they themselves could be replaced.

Cloudflare ran an experiment. It created a website and instructed crawlers not to index it. Since the website was never indexed, and Perplexity's crawlers were also blocked from crawling it, there should have been no way for Perplexity to access any data from this new domain. However, when asked questions about it, Perplexity still produced results about the website.
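For readers who want to see what a no-crawl directive looks like in practice, here is a minimal sketch in Python. It assumes the directive is a robots.txt file, and the file contents, the "ExampleBot" name, and the example.com URL are all hypothetical, not the ones Cloudflare used. It shows the check a well-behaved crawler is expected to make before fetching a page.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt that tells every crawler to stay away from the whole site.
# This is an illustration only, not the file from Cloudflare's experiment.
ROBOTS_TXT = """\
User-agent: *
Disallow: /
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if robots.txt permits this user agent to fetch the URL."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    # "ExampleBot" is a made-up crawler name used for illustration.
    print(is_allowed(ROBOTS_TXT, "ExampleBot", "https://example.com/article"))
```

Note that this check is voluntary. Nothing in robots.txt technically stops a crawler that chooses to skip it, which is why the result of the experiment matters.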


According to Cloudflare, this revealed that Perplexity used not only its declared crawlers but also undeclared crawlers operating from addresses not listed in Perplexity's official IP ranges. This shows that AI companies such as Perplexity may crawl your data even when you specifically direct them not to. This is a blatant breach of privacy and does not equate to fair use.
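To illustrate what "not listed in the official IP ranges" means, here is a rough sketch of how a site owner might check whether a request comes from a crawler's published addresses. The ranges below are reserved documentation blocks used as placeholders, not Perplexity's real list, and the function name is made up for this example.

```python
import ipaddress

# Placeholder ranges (reserved for documentation), standing in for a crawler's
# published IP list. These are NOT Perplexity's actual addresses.
DECLARED_RANGES = ["192.0.2.0/24", "198.51.100.0/24"]


def is_declared(ip: str, ranges=DECLARED_RANGES) -> bool:
    """Return True if the address falls inside any declared crawler range."""
    addr = ipaddress.ip_address(ip)
    return any(addr in ipaddress.ip_network(r) for r in ranges)


if __name__ == "__main__":
    print(is_declared("192.0.2.10"))   # inside a declared range
    print(is_declared("203.0.113.5"))  # outside every declared range
```

A request that behaves like a crawler but fails a check of this kind is what the article calls an undeclared crawler.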


Reported By

Editor in Chief

Tanay is someone with whom you can chill and talk about technology and life. A fitness enthusiast and cricketer, he loves to read and write.
