Machine translators have made it easier than ever to create error-plagued Wikipedia articles in obscure languages. What happens when AI models get trained on junk pages?

Don’t blame Wikipedia for that wtf

@[email protected]

Yes! I mean, blame those who post AI-generated translations as if they were their own, or blame the AI scrapers that use those poorly generated pages for training, but it makes no sense to blame Wikipedia when all they have done is exist and offer a platform for knowledge sharing.

In fact, this problem is hardly exclusive to Wikipedia. Every platform with crowdsourced content is to some degree susceptible to AI poisoning, which ultimately ends up feeding other AIs; the loop exists on all platforms. Though I understand wanting to highlight the particular risk to endangered languages: they have less content available, so AI models train on a smaller dataset, which makes them worse and more sensitive to bad data.

@[email protected]
creator

If you build the infrastructure for a certain thing to happen, you’re responsible for the thing. For the same reason we hold Facebook accountable for the rise of the far-right, we should hold Wikipedia accountable for this stuff. Infrastructure is never neutral.

That is a completely unfair comparison. For starters, Facebook is a for-profit advertising company and Wikipedia is a community-driven encyclopedia, so they should be judged by different standards.

Second, both admins and users can edit Wikipedia when there’s a problem. Everyone is “responsible” for fixing it, or at the very least equally at fault.

Next, the content in question. Facebook was (rightfully) given hell for hosting gore, CSAM, adult porn, etc. Things that are immoral, illegal, or outright dangerous. The offending content on Wikipedia is bad translations.

Lastly, the bigger issue is always enforcement against said content. Facebook was made aware of the problem users/pages/uploads and slacked off on doing anything. These Wikipedia pages have very low traffic and weren’t getting reported. And even with reports, Wikipedia then has to consult with people who speak the rare language.

They’re similar problems of vastly different scales.

Not exactly the same. I don’t blame Facebook for the rise; it’s just a place to post and share… I blame the algorithm that Facebook created and keeps updating to enhance and expand those bubbles, pushing users to outrage and dividing them into bubbles that empower and embrace conspiracies, right/alt-right, and other extreme viewpoints. Same thing with X/Twitter.

Wikipedia doesn’t have any such algorithm. They don’t have a team dedicated to pushing people to those extremes (or anything at all).

Wait, languages are vulnerable now?

eleijeep

Vulnerable to going extinct.

If you read the article, it briefly touches on how the “doom spiral” could affect the trajectory of a language that is not widely spoken. It’s not a great article though: it repeats the same thing for several pages, points the finger at Wikipedia instead of the content-generation farms, and then fails to properly conclude the argument for its presumed hypothesis.

davel [he/him]

The languages that are vulnerable are vulnerable https://en.wikipedia.org/wiki/Endangered_language


This is the official technology community of Lemmy.ml for all news related to creation and use of technology, and to facilitate civil, meaningful discussion around it.

