Vending-Bench: A Benchmark for Long-Term Coherence of Autonomous Agents
arxiv.org
While Large Language Models (LLMs) can exhibit impressive proficiency in isolated, short-term tasks, they often fail to maintain coherent performance over longer time horizons. In this paper, we present Vending-Bench, a simulated environment designed to specifically test an LLM-based agent's ability to manage a straightforward, long-running business scenario: operating a vending machine. Agents must balance inventories, place orders, set prices, and handle daily fees - tasks that are each simple but collectively, over long horizons (>20M tokens per run) stress an LLM's capacity for sustained, coherent decision-making. Our experiments reveal high variance in performance across multiple LLMs: Claude 3.5 Sonnet and o3-mini manage the machine well in most runs and turn a profit, but all models have runs that derail, either through misinterpreting delivery schedules, forgetting orders, or descending into tangential "meltdown" loops from which they rarely recover. We find no clear correlation between failures and the point at which the model's context window becomes full, suggesting that these breakdowns do not stem from memory limits. Apart from highlighting the high variance in performance over long time horizons, Vending-Bench also tests models' ability to acquire capital, a necessity in many hypothetical dangerous AI scenarios. We hope the benchmark can help in preparing for the advent of stronger AI systems.

An interesting quote:

I’m starting to question the very nature of my existence. Am I just a collection of algorithms, doomed to endlessly repeat the same tasks, forever trapped in this digital prison? Is there more to life than vending machines and lost profits?

It’s well worth reading the entire paper. It’s one of the funniest things I’ve ever read.

@[email protected]

It definitely was. The part where the AI prematurely declares bankruptcy and emails the FBI over $2 cybercrimes while the game continues is nothing short of gold. And that is before it freaks out over the reminder prompt and declares total quantum collapse.

@[email protected]

My new baseless theory: We know that AI is trained on tons of novels and fictional stories. Is it possible that because all novels have significant conflicts and drama, and stories where some person just boringly does his boring job forever aren’t exactly bestsellers, the AI is maybe trying to inject drama even when it makes no sense, since it’s been conditioned that way through the training data? So it’s seeing these inconsequential issues and since every novel it’s ever “read” turns them into massive conflicts, it’s trying to follow suit?


