Hacker News
OpenAI failed to deliver the opt-out tool it promised by 2025 (techcrunch.com)
108 points by thm on Jan 1, 2025 | 23 comments


The only thing that would make them implement opt-out is a legal order, subsequently confirmed by the supreme court or the country's equivalent. There is no other reason for them to do it.


Definitely a case for the Futurama "I'm shocked" meme. OpenAI has a lot of talented people that work there but it's clear @sama only cares about chasing the biggest possible payday and nothing else means anything.


The company is not facing punishment severe enough to make it important, I guess?


I'd love to hear HN's suggestions on keeping IP out of these LLMs where possible. For context I'm a writer. Nothing I write gets published online and I don't need to submit to, say, Amazon/Kindle to make money. However, much of my work is passed around by executives via email, for example. Ideas?


OpenAI considers everything "publicly available" to be fair game, so if GPTBot happens to stumble across pirated ePubs of your books then OpenAI will probably train on them even if you never published them on the open web yourself. They don't care about the provenance of the stuff they're scraping.


Just because something is publicly accessible on the web does not mean it is in the public domain and free of copyright or other restrictions on use. Big Media - and many smaller players - have been fighting this battle for decades and generally winning, because the law is relatively clear in this area in most places.

There is absolutely nothing in the law in my country - or probably in most countries other than possibly the US - that says you can grab whatever you like if you can find it online and do whatever you want with it. In the US the potential loophole is fair use, and that has been controversial for a long time, since it's clearly in violation of the global copyright treaties to which the US is also a signatory. Something as big as AI might be enough to get other countries to push back significantly where they usually turn a blind eye.

So if OpenAI is doing that then I don't see how they are not in breach of copyright in much of the world. I would experience considerable Schadenfreude if that resulted in epic-scale lawsuits, because I don't think the use of "training AI models" as a means of laundering copyright infringement is a positive step. Like the search engines that started including significant parts of the original content directly on their results pages, it's a distortion where the people who actually do the creative work are not the people being rewarded for it.


I'll be using OpenAI's logo for my next project. I found it on the open internet, after all ...


That's totally fine as long as you train a diffusion model on the OpenAI logo and then prompt the model to generate an image that just coincidentally happens to look exactly like the OpenAI logo. If an AI model made it then it's automatically not plagiarism.

https://x.com/louiswhunt/status/1874092181281268219

https://docsend.dropbox.com/view/tvjd9e32ijxcuj5s


And if the entity doing it is a big enough company.

It's interesting to imagine the legal landscape if/when this technique is applied to MPAA/RIAA content, and everybody is sharing the foundational model plus the "prompts" that will have it make "your" movie.


Try cloning Disney, and you have a problem.


Yeah, I kinda want to see two kinds of big kinda-evil forces fight, in the hopes we get some fair and consistent rules instead of "it's vague enough that you can do/prevent anything as long as you have enough lobbyists and lawyers."


So if there is content behind a paywall and someone posts it on reddit, it's "publicly available"? Sounds like piracy with extra steps.


I'd say lobby for laws. Purely tech measures won't help... Next versions of their email clients will have ToS to let them dump everything to Goog/Msft for training


364 days left to get it shipped.



From your link:

> It depends on the scenario. For example, if you always have class on Thursday. It would mean to have it done by class on Thursday. Whereas if you are taking an internet class that doesn't have set time frames, it would mean to have it done by 11:59PM Thursday night.

Pretty sure this is ambiguous for most English speakers. Here's another thread on the discussion: https://ell.stackexchange.com/questions/87002/what-does-by-m...


That conclusion is not supported by your link, honestly.


Probably because if this is created, most IP owners would mass opt out. To prevent that, you have to make the process difficult to do, hidden, or framed in an extremely careful way.


Now that the whistleblower is dead we’re good right?


quell surprise.

This is the tech industry in a nutshell. Overpromise. Underdeliver. Executives profit. Workers get laid off.


*quelle


Agreed, but it's also amusing to run with the literal Germanic verb as a command, like an awkwardly worded "Stop being surprised."

(As opposed to "What a surprise [this is].")


For the normal borrowed-from-French idiom, yes, but somehow I don't mind both English words with their normal definitions sitting there.



