AggressivelyPassive

If there’s any PII in Slack (which in itself is wrong), you cannot use this data for training, since the people whose data is being used have not given their consent. Simple as that.

@[email protected]

That’s not true at all. If you obfuscate the PII it stops being PII. This is an extremely common trick companies use to circumvent these laws.

FaceDeer

You could say it’s to “circumvent” the law or you could say it’s to comply with the law. As long as the PII is gone what’s the problem?

@[email protected]

There isn’t necessarily a problem but it is definitely circumventing at least the spirit if not the letter of the law by not allowing data subjects to provide fully informed consent.

Lemongrab

LLMs have shown time and time again that simple crafted attacks can unmask the training data verbatim.

FaceDeer

It is impossible for them to contain more than random fragments; the models are too small for the training data to be compressed enough to fit. Even the fragments that have been found are not exact: the AI is “lossy” and hallucinates.

The examples that have been found are examples of overfitting, a flaw in training where the same data gets fed into the training process hundreds or thousands of times over. This is something that modern AI training goes to great lengths to avoid.

@[email protected]

How do you anonymise without supervision? And obfuscation isn’t anonymisation…

@[email protected]

Legally, obfuscation can be anonymization, depending on how it’s done.

Depending on the data structures, there are many methods to anonymize without supervision. None of them are perfect, but they don’t have to be - just legally defensible.

FaceDeer

Maybe it’s “simple as that” if you’re just expressing an opinion, but what’s the legal basis for it?

AggressivelyPassive

The entire GDPR. You can’t repurpose user data after the fact, and that covers not only the purpose of usage but also the parties the data is shared with. All these cookie banners have to state clearly “we’re using this data from you and we’re sharing it with these partners”.

I’m pretty sure that hardly any company lists Slack in their cookie banners or ToS. Thus, sharing any personal data with Slack is forbidden. Usually, that was overlooked, because it’s somewhat dubious whether Slack can be seen as actually “using” the data by just hosting whatever someone posts in a private message, but this announcement makes it very clear that they intend to use this data.

FaceDeer

The GDPR says that information that has been anonymized, for example through statistical analysis, is fine. LLM training is essentially a form of statistical analysis. There’s hardly anything in law that is “simple.”

AggressivelyPassive

It’s not even the training. It’s the extraction of the raw data.

You now store PII that the clients can’t delete anymore (which in itself is a violation) and then do “something” with it. Whether it’s for AI or word counting doesn’t matter. You store PII that is not under the control of your clients anymore, and you store PII without the P whose I could be used to I them having ever been informed.

Also, whether AI training legally counts as anonymization is still up for debate, as far as I know.

@[email protected]

Assuming it is PII when you store it. This is a complicated discussion that will absolutely come down to what Slack can defend to a regulator.

@[email protected]

They could try to pass it off as a legitimate interest, but it would likely be struck down as ultimately disfavouring the individual and favouring the business. Probably.
