
The scam is that they undermine the actually viable platforms by offering something that is literally too good to be true. Then, when all their competitors are dead, their store will go to shit and you won’t have an alternative. When that time comes, you will wish you’d spent some money on a real store rather than played for free on theirs. See enshittification.


This is very common among big tech companies, and we should start treating it as what it is: a scam.


Let’s remove the context of AI altogether.

Yeah, sure, if you do that then you can say anything. But the context is crucial. Imagine you could prove in court that I went down to the public library with a list that read “Books I want to read for the express purpose of mimicking, and that I get nothing else out of”, and that your book was on that list. Imagine you had me on tape saying that, for me, writing is not a creative expression of myself, but rather that I am always trying to find the words the authors I have studied would use. Now that’s getting closer to the context of AI. I don’t know why you think you would need me to sell verbatim copies of your book to have a good case against me. Just a few passages should suffice given my shady and well-documented intentions.

Well that’s basically what LLMs look like to me.


But what an LLM does meets your listed definition of transformative as well

No, it doesn’t. Sometimes the output is used in completely different ways, but sometimes it is a direct substitute. The most obvious example is when it writes code that the user intends to incorporate into their own work. That output is not transformative by this definition: it serves the same purpose as the original works and adds no new value, except stripping away the copyright, of course.

everything it outputs is completely original

[citation needed]

that you can’t use to reconstitute the original work

Who cares? That has never been the basis for copyright infringement. For example, as far as I know I can’t make and sell a doll that looks like Mickey Mouse from Steamboat Willie. By that standard it should be considered transformative work. A doll has nothing to do with the cartoon. It provides a completely different sort of value. It is not even close to being a direct copy, nor could it be used to reconstitute the original. And yet, as far as I know, I am not allowed to do it, and even if I am, I won’t risk going to court against Disney to find out. The fear alone has made sure that we mere mortals cannot copy and transform even the smallest parts of copyrighted works owned by big companies.

I would find it hard to believe that if there is a Supreme Court ruling which finds digitalizing copyrighted material in a database is fair use and not derivative work

Which case are you citing? Context matters. LLMs aren’t just a database. They are also a frontend for extracting the data from those databases, one that is being heavily marketed and sold to people who might otherwise have bought the original works instead.

The lossy compression is also irrelevant; otherwise literally every pirated movie or series release would be legal. How lossy is it, even? How would you measure it? I’ve seen GitHub Copilot spit out verbatim copies of code. I’m pretty sure that if I ask ChatGPT to recite a very well-known poem, it will also produce a verbatim copy. So at least some works are included completely losslessly. Which ones? No one knows, and that’s a big problem.
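
To make “how would you measure it” concrete, here’s a rough sketch in Python of the only kind of check you can run from the outside: compare the output against one specific original and see how much comes back verbatim. The strings are placeholders I made up; a real measurement would need the actual training corpus, which outsiders don’t have.

```python
# Rough sketch: how much of a known original shows up verbatim in a
# model's output? The strings below are placeholders; a real measurement
# would need the training corpus itself, which outsiders don't have.
from difflib import SequenceMatcher

def verbatim_fraction(original: str, output: str) -> float:
    """Fraction of the original covered by the longest verbatim match."""
    if not original:
        return 0.0
    matcher = SequenceMatcher(None, original, output)
    match = matcher.find_longest_match(0, len(original), 0, len(output))
    return match.size / len(original)

original = "Two roads diverged in a yellow wood,"                    # public-domain line
output = "Sure! Frost wrote: Two roads diverged in a yellow wood,"   # pretend model reply

print(f"verbatim: {verbatim_fraction(original, output):.0%}")  # -> verbatim: 100%
```

The catch is that this only works when you already know which original to compare against and have a copy of it. For an arbitrary LLM answer you have neither, which is exactly the problem.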


“Transformative” in this context does not simply mean “not identical to the source material”; it has to serve a different purpose and provide additional value that cannot be derived from the original.

The summary they talk about in the article is a bad example for a lawsuit because it is indeed transformative: a summary provides a different sort of value than the original work. However, if the same LLM writes a book based on the books used as training data, it is definitely not an open-and-shut case whether that is transformative.


Not a lawyer, so I can’t be sure. To my understanding, a summary of a work is not a violation of copyright because the summary is transformative (it serves a completely different purpose from the original work). But you probably can’t copy someone else’s summary, because now you are making a derivative that serves the same purpose as the original.

So here are the issues with LLMs in this regard:

  • LLMs have been shown to produce verbatim or almost-verbatim copies of their training data
  • LLMs can’t figure out where their output came from, so they can’t tell their user whether the output closely matches any existing work and, if it does, what license it is distributed under (see the sketch after this list)
  • You can argue that by its nature, an LLM is only ever producing derivative works of its training data, even if they are not the verbatim or almost-verbatim copies I already mentioned
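
To illustrate the second point, here’s a rough sketch in Python of the post-hoc attribution check a user would need. The two-entry “corpus” is a made-up stand-in; the real training data is neither this small nor public, which is why nobody can actually run this.

```python
# Rough sketch of a post-hoc attribution check: does any 6-word chunk of
# the model's output appear verbatim in a known work? The "corpus" is a
# made-up stand-in; the real training data is neither small nor public.
def ngrams(text: str, n: int = 6) -> set[str]:
    words = text.lower().split()
    return {" ".join(words[i:i + n]) for i in range(len(words) - n + 1)}

corpus = {
    "some_gpl_project/util.py": "def clamp(value, low, high): return max(low, min(high, value))",
    "a_poem.txt": "two roads diverged in a yellow wood and sorry i could not travel both",
}

def possible_sources(output: str, n: int = 6) -> list[str]:
    out_grams = ngrams(output, n)
    return [name for name, text in corpus.items() if ngrams(text, n) & out_grams]

reply = "Here's a helper: def clamp(value, low, high): return max(low, min(high, value))"
print(possible_sources(reply))  # -> ['some_gpl_project/util.py']
```

Even this toy version only tells you that something matched, not what license it is under, and it scales terribly. The model itself gives you nothing of the sort.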

Oh no, rich assholes who continuously lobby for strict copyright and patent laws in order to suffocate competition might find themselves restricted by them for once. Quick, find me the world’s smallest violin!

No, if you want AI to emerge, argue in favor of relaxing copyright law in all cases, not specifically to allow AI to copyright-launder other people’s works.


You are treating publicly available information as free from copyright, which is not the case. Wikipedia content is covered by the Creative Commons Attribution-ShareAlike 4.0 license. Images might be covered by different licenses. Online articles about the book are also covered by copyright unless explicitly stated otherwise.


Selectively breaking copyright law specifically to allow AI models also favors the rich, unfortunately. These models will make a very small group of rich people even richer while putting out of work the millions of creators whose works were stolen to train them.