AI can’t even run a vending machine – Vending-Bench: A Benchmark for Long-Term Coherence of Autonomous Agents

@[email protected]

It’s well worth reading the entire paper. It’s one of the funniest things I’ve ever read.

@[email protected]

It definitely was. The part where the AI prematurely declares bankruptcy and emails the FBI over $2 cybercrimes as the game continues is nothing short of gold. And that is before it freaks out over the reminder prompt and declares total quantum collapse.

@[email protected]

My new baseless theory: We know that AI is trained on tons of novels and fictional stories. Is it possible that because all novels have significant conflicts and drama, and stories where some person just boringly does his boring job forever aren’t exactly bestsellers, the AI is maybe trying to inject drama even when it makes no sense, since it’s been conditioned that way through the training data? So it’s seeing these inconsequential issues and since every novel it’s ever “read” turns them into massive conflicts, it’s trying to follow suit?

@[email protected]

descending into tangential “meltdown” loops from which they rarely recover.

Damn, it just like me fr

@[email protected]

Vendotron, please give me a Snickers bar.

Vendotron: Dispensing black licorice. Have a nice day!

davel [he/him]

Screwdrivers can’t even hammer nails.

@[email protected]

Actually… if you flip it…

So I’d rather argue that a hammer can’t even screw in screws.

@[email protected]

“You call yourself a beverage machine?!”

“I call myself Bev.”

@[email protected]

Why would a vending machine ever need AI?

@[email protected]

Real answer, surge or scarcity pricing.

@[email protected]

Totally unnecessary. A simple price/demand curve can easily be written in a few lines of code.
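For illustration, here is a rough sketch of what such a "few lines" pricing rule might look like. Everything in it is an assumption made for the example: the function name `surge_price`, the 50% markup cap, and the way scarcity and demand are averaged are hypothetical and not taken from the paper or from any real vending software.

```python
def surge_price(base_price, stock, capacity, recent_sales, expected_sales,
                max_markup=0.5):
    """Scale the price up when stock is scarce or demand runs hot.

    Hypothetical rule for illustration; assumes capacity > 0 and
    expected_sales > 0.
    """
    # Scarcity: 0.0 when the slot is full, 1.0 when it is empty.
    scarcity = 1.0 - stock / capacity
    # Demand pressure: how far recent sales exceed expectation, clamped to [0, 1].
    pressure = max(0.0, min(1.0, recent_sales / expected_sales - 1.0))
    # Average the two signals and cap the total markup at max_markup.
    markup = max_markup * (scarcity + pressure) / 2
    return round(base_price * (1 + markup), 2)
```

With a full slot and normal demand the price stays at the base price; with an empty slot and doubled demand it hits the cap, so `surge_price(2.00, 0, 10, 10, 5)` gives `3.0`.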

@[email protected]

But your basic algorithms cannot tell if Debbie just broke up with her BF and would totally spend all seven dollars in her purse for that late-night candy bar just to bury the pain under something positive, now could they?!

@[email protected]

It wouldn’t. A simple finite state machine that any intelligent entity could emulate would be enough.

But people have completely deluded themselves into thinking that (what CEOs and marketers call) “AI” is actually intelligent, and this case study shows how preposterous that fantasy actually is.
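To make the finite state machine point above concrete, here is a minimal sketch with two states. It only covers coin handling and dispensing for a single product; the class name `VendingMachine`, the state names, and the returned messages are all made up for this example.

```python
from enum import Enum, auto


class State(Enum):
    IDLE = auto()        # no credit inserted
    HAS_CREDIT = auto()  # customer has inserted at least one coin


class VendingMachine:
    """Hypothetical single-product machine; prices and credit are in cents."""

    def __init__(self, price_cents, stock):
        self.price_cents = price_cents
        self.stock = stock
        self.credit = 0
        self.state = State.IDLE

    def insert_coin(self, cents):
        # Any coin moves the machine into (or keeps it in) HAS_CREDIT.
        self.credit += cents
        self.state = State.HAS_CREDIT

    def cancel(self):
        # Refund whatever credit is held and go back to IDLE.
        refunded, self.credit = self.credit, 0
        self.state = State.IDLE
        return f"refunded {refunded}"

    def select(self):
        if self.state is State.IDLE:
            return "insert coins"
        if self.stock == 0:
            return "sold out, " + self.cancel()
        if self.credit < self.price_cents:
            return "insufficient credit"
        # Enough credit and stock: dispense, return change, reset to IDLE.
        self.stock -= 1
        change = self.credit - self.price_cents
        self.credit = 0
        self.state = State.IDLE
        return f"dispensed, change {change}"
```

Every input leads to exactly one well-defined transition, which is the whole appeal: there is no state in which the machine can decide to email anyone.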

knightly the Sneptaur

I really hope people are starting to catch on: large language models aren’t “intelligent”. They’re multidimensional maps of human language use, and querying them is just tracing a vector “forward” through language-space from the starting point of a prompt.

It’s the reification fallacy writ so large it’s eclipsing entire national economies. Human intelligence isn’t in language, language is a product of human intelligence. The map is not the territory.

And yeah, it is pretty cool that we have the processing power to map out language-space well enough to draw some vectors that remain coherent over thousands of tokens, but using a billion-parameter model to do what could be accomplished with probably-already-existing management software and a few seconds of CPU time per week is as wasteful as it is misguided.

@[email protected]

In the same way your fridge needs a web browser.

Though the point of this is probably not that it will be a viable product, but managing a vending machine is one of those seemingly easy and straightforward tasks that make good starting applications to test the AI with. Basically, if it can’t even handle something as simple as a vending machine, it definitely can’t be trusted with anything more complex.

@[email protected]
