Most people in the field know that models usually fall apart after a few hundred steps because small errors just keep adding up until the whole process is ruined. The paper proposes a system called MAKER which uses a strategy they call massively decomposed agentic processes. Instead of asking one big model to do everything they break the entire task down into the smallest possible tiny pieces so each microagent only has to worry about one single move.
For their main test they used a twenty disk version of the Towers of Hanoi puzzle which actually requires over a million individual moves to finish. They found that even small models can be super reliable if you set them up correctly. One of the main tricks they used is a voting system where multiple agents solve the same tiny subtask and the system only moves forward once one answer gets a specific number of votes more than the others. This acts like a safety net that catches random mistakes before they can mess up the rest of the chain.
Another interesting part of their approach is red flagging which is basically just throwing away any response that looks suspicious or weird. If a model starts rambling for too long or messes up the formatting they just discard that attempt and try again because those kinds of behaviors usually mean the model is confused and likely to make a logic error. By combining this extreme level of task breakdown with constant voting and quick discarding of bad samples they managed to complete the entire million step process with zero errors.
And it turns out that you do not even need the most expensive or smartest models to do this since relatively small ones performed just as well for these tiny steps. Scaling up AI reliability might be more about how we organize the work rather than just making the models bigger and bigger. They even did some extra tests with difficult math problems like large digit multiplication and found that the same recursive decomposition and voting logic worked there as well.

This is the official technology community of Lemmy.ml for all news related to creation and use of technology, and to facilitate civil, meaningful discussion around it.
Ask in DM before posting product reviews or ads. All such posts otherwise are subject to removal.
Rules:
1: All Lemmy rules apply
2: Do not post low effort posts
3: NEVER post naziped*gore stuff
4: Always post article URLs or their archived version URLs as sources, NOT screenshots. Help the blind users.
5: personal rants of Big Tech CEOs like Elon Musk are unwelcome (does not include posts about their companies affecting wide range of people)
6: no advertisement posts unless verified as legitimate and non-exploitative/non-consumerist
7: crypto related posts, unless essential, are disallowed