Some image generators produce more problematic stereotypes than others, but all fail at diversity - AlgorithmWatch
algorithmwatch.org
Automated image generators are often accused of spreading harmful stereotypes, but studies usually only look at MidJourney. Other tools make serious efforts to increase diversity in their output, but effective remedies remain elusive.

I wonder if this is because AI is trained on data that ‘is’ and therefore has no concept of how it ‘should be’. Maybe it is an effective mirror of society…

Fluke McHappenstance
English
261Y

Actually it is trained on data ‘the trainers have’. This is different from ‘trained on data that “is”’ or any other idealized view of data.

Data that ‘the trainers have’ is always an incomplete view of anything, and adding meaningfully to datasets is always very difficult.

I may have oversimplified my statement. Of course an objective description of reality is impossible. A curse on all social sciences and statistics.

My post was more of a showerthought… even if the data is incomplete, whatever THAT data implies will also be the stereotype the AI learns. Misrepresentation of minorities in sample data is absolutely nothing new. But even if the data WAS complete, it would probably still be very biased. I think we often don’t notice structural discrimination, and AI would simply reproduce it and confront us with it. In that sense I think it is a very interesting way to get a sort of ‘outside look’ at our own society, and that is very useful.

@[email protected]
English
31Y

Actually it is trained on data ‘the trainers have’.

“The first rule of Tautology Club is the first rule of Tautology Club.”

That’s 100% a real issue. Fortunately for all these clickbait articles, most people don’t really grasp how these things are trained or how input data affects them during training.

@[email protected]
4
1Y

And even if we could give the training algorithm a perfectly diverse dataset, who gets to decide what that means? You could probably poll a million anthropologists from across the world and observe trends, but no certain consensus. What if anthropologists in underdeveloped nations skew in a different direction than those in what we consider rich countries? What if their country was a colonizer in the past or has gone through a violent revolution?

How do we decide who qualifies as an anthropologist? Is a doctorate required, or is a college degree with numerous publications sufficient?

I don’t think we’ll ever see a perfectly neutral solution to this problem. At best, we can come equipped with the knowledge that these tools may carry some biases, as when you analyze texts from the past. You make the best of what you have and strive to improve.
