Signal's Meredith Whittaker: AI is fundamentally 'a surveillance technology' | TechCrunch
techcrunch.com
Why is it that so many companies that rely on monetizing the data of their users seem to be extremely hot on AI? If you ask Signal president Meredith
NotAPenguin
41Y

The article doesn’t explain how that’s the case at all.

Aren’t all the big AI models trained on publicly available data?

I see it more like this: your address is public in the sense that if I knocked on every door and looked through every window, I would eventually find where you live. But I probably wouldn’t be able to quickly search for where you live, because it was never meant to be public knowledge.

AI takes everything and makes it easily searchable for itself, even if it was never meant to be.


Hot Saucerman
1Y

Books3 is the definition of “not publicly available” because it’s all pirated material downloaded from the private torrent tracker Bibliotik.

Books3 is literally why several AI groups are being sued by various authors like Sarah Silverman and George R.R. Martin.

Books3 was always illicitly obtained material, which calls into question whether an LLM trained on it could really fall under Fair Use. (It most likely does, but it’s still a legal question that hasn’t been answered yet.)

Books3 Link: https://huggingface.co/datasets/the_pile_books3

Books3 Description from Link:

This dataset is Shawn Presser’s work and is part of EleutherAi/The Pile dataset.

This dataset contains all of bibliotik in plain .txt form, aka 197,000 books processed in exactly the same way as did for bookcorpusopen (a.k.a. books1). seems to be similar to OpenAI’s mysterious “books2” dataset referenced in their papers. Unfortunately OpenAI will not give details, so we know very little about any differences. People suspect it’s “all of libgen”, but it’s purely conjecture.


