This is the official technology community of Lemmy.ml for all news related to creation and use of technology, and to facilitate civil, meaningful discussion around it.
If there’s any PII in Slack (which in itself is wrong), you cannot use this data for training, since the people whose data is being used have not given their consent. Simple as that.
That’s not true at all. If you obfuscate the PII it stops being PII. This is an extremely common trick companies use to circumvent these laws.
You could say it’s to “circumvent” the law or you could say it’s to comply with the law. As long as the PII is gone, what’s the problem?
There isn’t necessarily a problem, but it definitely circumvents at least the spirit, if not the letter, of the law by not allowing data subjects to give fully informed consent.
LLMs have shown time and time again that simple crafted attacks can unmask the training data verbatim.
It is impossible for them to contain more than random fragments; the models are too small for the training data to be compressed enough to fit. Even the fragments that have been found are not exact: the AI is “lossy” and hallucinates.
The examples that have been found are examples of overfitting, a flaw in training where the same data gets fed into the training process hundreds or thousands of times over. This is something that modern AI training goes to great lengths to avoid.
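To make the overfitting point concrete, here is a minimal sketch of one way repeated data can be filtered out before training: exact deduplication on normalized text. The function name `dedupe` and the normalization rule are assumptions for illustration, not a description of any particular vendor's pipeline, and real pipelines typically also need near-duplicate detection.

```python
import hashlib

def dedupe(docs):
    """Keep the first occurrence of each document, comparing on normalized text."""
    seen = set()
    unique = []
    for doc in docs:
        # Assumption: lowercasing and collapsing whitespace is enough to make
        # trivially different copies hash to the same key.
        normalized = " ".join(doc.lower().split())
        key = hashlib.sha256(normalized.encode("utf-8")).hexdigest()
        if key not in seen:
            seen.add(key)
            unique.append(doc)
    return unique
```

This only catches copies that are identical after normalization; a message quoted with small edits would still pass through.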
How do you anonymise without supervision? And obfuscation isn’t anonymisation…
Legally, obfuscation can be anonymization, depending on how it’s done.
Depending on the data structures, there are many methods to anonymize without supervision. None of them are perfect, but they don’t have to be; they just have to be legally defensible.
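As a rough illustration of what unsupervised obfuscation can look like, here is a minimal pattern-based redaction sketch. The two regular expressions and the `redact` function are hypothetical and deliberately simplistic. Whether output like this counts as anonymized in the legal sense is exactly the question being argued in this thread, and the sketch does not settle it.

```python
import re

# Hypothetical patterns for illustration only; real PII detection needs far more.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
PHONE_RE = re.compile(r"\+?\d[\d\s().-]{7,}\d")

def redact(text: str) -> str:
    """Replace e-mail addresses and phone-like numbers with fixed placeholders."""
    text = EMAIL_RE.sub("[EMAIL]", text)
    text = PHONE_RE.sub("[PHONE]", text)
    return text

print(redact("Mail alice@example.com or call +1 (555) 123-4567 today"))
```

Note what this does not catch: names, addresses, and anything identifying that only emerges from context. That gap is the "none of them are perfect" part.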
Maybe it’s “simple as that” if you’re just expressing an opinion, but what’s the legal basis for it?
The entire GDPR. You can’t repurpose user data after the fact, and that covers the purpose of usage as well as the parties the data has been shared with. All these cookie banners have to state clearly “we’re using this data from you and we’re sharing it with these partners”.
I’m pretty sure that hardly any company lists Slack in their cookie banners or ToS. Thus, sharing any personal data with Slack is forbidden. Usually that was overlooked, because it’s somewhat dubious whether Slack can be seen as actually “using” the data by just hosting whatever someone posts in a private message, but this announcement makes it very clear that they intend to use this data.
The GDPR says that information that has been anonymized, for example through statistical analysis, is fine. LLM training is essentially a form of statistical analysis. There’s hardly anything in law that is “simple.”
It’s not even the training. It’s the extraction of the raw data.
You now store PII that the clients can’t delete anymore (which in itself is a violation) and then do “something” with it. Whether it’s for AI or word counting doesn’t matter. You store PII that is no longer under your clients’ control, and you store it without the persons whose information could be used to identify them ever having been informed.
Also, whether AI training legally counts as anonymization is still up for debate, as far as I know.
Assuming it is PII when you store it. This is a complicated discussion that will absolutely come down to what Slack can defend to a regulator.
They could try to pass it off as a legitimate interest, but it would likely be struck down as ultimately disfavouring the individual and favouring the business. Probably.