• 1 Post
  • 36 Comments
Joined 2Y ago
Cake day: Jun 13, 2023


That’s been changing for me lately. All of a sudden YouTube is throwing me curve-balls and it’s great.


Even if you’re poking at a black box and reporting that “it acts funny when I poke it this way,” in my opinion a reporter should send along a script or at least explicit instructions on how to repro.

I take the report more seriously since it demonstrates you have an understanding of the issue or exploit. It also saves my time, and it’s likely a trivial effort for the reporter since they have the context and knowledge of the issue loaded up and ready to go.
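For what it’s worth, the script doesn’t need to be fancy. Even a black-box report can ship something like this sketch (the endpoint and payload are made-up placeholders, not from any real report):

```typescript
// Hypothetical black-box repro sketch; TARGET and the payload are
// placeholders for whatever the reporter actually poked at.
const TARGET = "https://service.example/api/parse";

async function repro(): Promise<void> {
  // Send the input that triggers the misbehavior.
  const res = await fetch(TARGET, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ input: "\u0000AAAA" }),
  });
  // Spell out expected vs. observed so triage is trivial.
  console.log("expected 400 Bad Request, observed:", res.status);
}

repro().catch(console.error);
```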


Agree that people like to inflate the severity of bugs they report. It’s better for prestige and bounty payouts. But this one is a little more nuanced.

“While I didn’t really intend the module to be used for any security related checks, I’m very curious how an untrusted input could end up being passed into ip.isPrivate or ip.isPublic [functions] and then used for verifying where the network connection came from.”

It’s interesting that it would be hard to make a case that there was a “vulnerability” in the ip package. But it seems like this package’s entire purpose is input validation, so it’s kind of weird the dev thinks otherwise.

Recurring incidents like these raise the question of how one strikes a balance. Relentlessly reporting theoretical vulnerabilities can leave open-source developers, many of whom are volunteers, exhausted from triaging noise.

Researchers need to provide proofs of concept: actual, functional exploits.
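For the ip package case that bar is low. Assuming the report in question is the hex-notation bypass that was publicly disclosed against the npm ip package (CVE-2023-42282), a functional PoC is only a few lines; a sketch, not a confirmed repro:

```typescript
// Sketch of the publicly reported ip bypass (CVE-2023-42282), assuming
// esModuleInterop. "0x7f.0.0.1" is 127.0.0.1 with a hexadecimal first
// octet: the OS resolver accepts it, but vulnerable versions of the
// package classified it as public, so an isPublic() gate could be
// bypassed for SSRF.
import ip from "ip";

const candidate = "0x7f.0.0.1"; // resolves to 127.0.0.1

console.log(ip.isPublic(candidate));  // vulnerable versions: true
console.log(ip.isPrivate(candidate)); // vulnerable versions: false
```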


Title is confusing. OpenAI is using News Corp content to train their models. NC isn’t using the model to write articles. Still a garbage in garbage out scenario though.




For me it was mostly interesting to hear about their techniques and how they dealt with the earthquakes. Never really thought landslides would be such an issue.



It does work fine, but it’s yet another avenue of data collection for them and another reduction of choice by a monopoly.



Often you don’t even need more property. Just use existing rail systems; there’s so much unused or barely used rail in this country.


Companies have been doing this since time immemorial. I guess because it’s TikTok influencers it’s easier to rage about?


For a half second I thought the location scouting was game footage.


Every time I formulated a response while reading the article, the author came to the same conclusion. Somehow, though, the headline is anti-AI rather than addressing the unrealistic expectations of management…

I’d rather spend my decompression time browsing Lemmy and reading articles with terrible conclusions than doing bullshit work.



Are you suggesting it’s never ethical to kill? Nothing is black and white, especially when it comes to ethics.


Legibility wasn’t the issue, but I appreciate your transcript anyway.



The one I remember most was while working for a games retailer. The Pokémon Game Boy games caused mayhem: not enough product and so many angry parents.



unintelligible nonsense Big Tech unintelligible nonsense


“You can’t just run a cipher on a copyrighted work and say ‘it’s not the same, so I didn’t copy it’.”

Yes I can. I can download a Web page, encrypt it on my machine, and I’m not distributing said work.

“And ‘distributing’ is separate from violating copyright. It’s not distriburight, it’s copyright. Copying a work without authorization for private use is still violating copyright.”

That’s just false.


“What do you think happens to data when it’s scraped? Copying the data is a fundamental requirement for using it in training. These models are trained in big datacenters where the original work is split up and tokenized and used over and over again.”

Tokenizing and calculating vectors or whatever is not the same thing as distributing copies of said work.
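To make that concrete, here’s a toy sketch of what tokenizing means: text in, integer IDs out. (Real models use learned subword vocabularies like BPE rather than this word lookup.)

```typescript
// Toy tokenizer: maps each distinct word to an integer ID. The point
// is that the model consumes ID sequences and derived vectors, not a
// byte-for-byte copy of the original text.
const vocab = new Map<string, number>();

function tokenize(text: string): number[] {
  return text
    .toLowerCase()
    .split(/\s+/)
    .filter((w) => w.length > 0)
    .map((word) => {
      if (!vocab.has(word)) vocab.set(word, vocab.size);
      return vocab.get(word)!;
    });
}

console.log(tokenize("Call me Ishmael some years ago"));
// [0, 1, 2, 3, 4, 5]
```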

“The difference between you training a model and you reading a book (put online by its author in clear text, to avoid the obvious issue of actual piracy for human use) is that you reading on a website is the intention of the copyright holder and you as a person have a fundamental right to remember things and be inspired.”

Copyright holders can’t say what I do with their work, nor what I do with the knowledge of their book. They can only say how I copy and distribute it. I don’t need consent to burn an author’s book, create fan art around it, or quote characters in my blog. I do need their consent to copy and distribute their works directly.

“You don’t however have a right to copy and use the text for other purposes, whether that’s making a t-shirt with a memorable line, printing it out to give to someone else, or tokenizing it to train a computer algorithm.”

And at some point the resolution of said words is so fine that it becomes uncopyrightable. You can’t copyright most phrases or words.


Insofar as my computer observes the data on my hard disk. But I suspect you know what I meant.


“If the source is literally a piracy website that serves up applications on how to remove DRM from ebooks, it’s absolutely piracy. You can’t just deny the source and be like ‘it’s not piracy!’”

“They didn’t go out and buy copies of thousands of books.”

And if they went to a library and scanned all the books?

“I don’t, I was making a point about how absurdly large the language models have to be, which is to say, if they have to have that much data on top of thousands of pirated books, it means they fundamentally cannot make the models work without also scraping the internet for data, which is surveillance.”

I mean, it’s just not surveillance, by definition. There’s no observation, just data ingestion. You’re deliberately trying to conflate the words to associate a negative behavior with LLM training to make your argument.

I really don’t get why LLMs get everybody all riled up. People have been running Web crawlers since the dawn of the Web.
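The basic crawler hasn’t changed much since then either; a toy sketch (illustrative only; a real crawler would parse HTML properly and honor robots.txt):

```typescript
// Minimal breadth-first crawler sketch: fetch a page, pull out links,
// repeat. Illustrative only; link extraction via regex is crude.
async function crawl(start: string, limit = 10): Promise<string[]> {
  const seen = new Set<string>();
  const queue = [start];
  while (queue.length > 0 && seen.size < limit) {
    const url = queue.shift()!;
    if (seen.has(url)) continue;
    seen.add(url);
    const html = await (await fetch(url)).text();
    for (const m of html.matchAll(/href="(https?:\/\/[^"]+)"/g)) {
      queue.push(m[1]);
    }
  }
  return [...seen];
}

crawl("https://example.com").then(console.log).catch(console.error);
```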


“It’s fundamentally a surveillance technology, because the technology fundamentally cannot function without that large dataset of language to begin with. It needs massive amounts of data that have to be surveilled to be achieved, because unless you’re Reddit or Facebook, your own site probably doesn’t contain enough data to fill out the needs of the LLM. Thus you need to scrape the internet for more data in hopes of filling it out.”

I very much disagree with the characterization that training an LLM on a book is pirating said book. We might see copyright owners release their materials in the future under licenses that disallow this, which is their right (though it’s not clear to me that any copy is being made). In my opinion there’s not a lot of difference between me training an LLM on said book and me using the story as inspiration for my own book. I suspect we’ll never agree on that one.

Pretty amusing that you think scraping published data somehow constitutes surveillance, though.


That’s like saying smartphones are fundamentally a surveillance technology. There’s truth to it, but it’s not inherent to the technology. It’s a deliberate act by the people using the tech, one we allow for whatever reason.


No mention of TrueNAS (formerly FreeNAS), so I’ll throw that one out there.


Much of science is broke/underfunded, and home computers collectively have a shitload of idle computing power on traditional processors and GPUs, which are harder to get and more expensive in data centers. The idea of distributed computing for science is sound.

I was mostly disillusioned by the lack of feeling of participation or accomplishment.


We never really got to see the results of our machines’ work, so it felt pointless. It also didn’t help that the UX was all terrible.


“It’s just another in a long list of things that some grown-ass adults act like is somehow beyond them because that’s easier than trying.”

That’s the funny part. It’s not really easier. They have to go through life depending on others to do trivial things for them.

Imagine taking your car to a mechanic for low tire pressure rather than learning how to use the pumps at a gas station.


“Ultimately I feel we fixate on every kid knowing computers at some enthusiast level for no reason.”

Calling a level of knowledge “enthusiast” is super subjective, and I think the author is arguing that the bar should be higher. Being able to “use” a computer (in my opinion, and I think the author’s) should include things like connecting it to a network, reading error messages, following basic instructions, and knowing what basic hardware components do.

“Cars are a great example because most people take their car into a tire shop instead of doing it themselves.”

Drivers should know how to deal with a flat and check their oil. A lot of people don’t, but they should.

“Most people buy food instead of growing or butchering it themselves.”

People should know how to cook a decent meal from ingredients. A lot of people don’t, but they should.

All the users who say shit like “make it work” about tools they use every day of their lives are under-educated, IMO. They should want to learn more about those tools and develop their skills further to make their daily lives easier; I don’t really get why people don’t.


Kind of implies people really aren’t using full PCs anymore, but I wonder if there’s just a lot of new users onboarded via mobile (plus dual users). Curious what the raw numbers are rather than percentages.
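As a made-up illustration of why the raw numbers matter: desktop share can crater while desktop use doesn’t drop at all.

```typescript
// Invented numbers: desktop stays flat while mobile grows, yet the
// desktop *percentage* falls from 60% to 25%.
const before = { desktop: 1.5e9, mobile: 1.0e9 };
const after = { desktop: 1.5e9, mobile: 4.5e9 };

const desktopShare = (d: { desktop: number; mobile: number }) =>
  (100 * d.desktop) / (d.desktop + d.mobile);

console.log(desktopShare(before).toFixed(0) + "%"); // "60%"
console.log(desktopShare(after).toFixed(0) + "%");  // "25%"
```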