Most scientific materials compress reasoning, presenting conclusions while omitting the derivational chains that justify them. This compression hinders verification, since explicit, step-wise justifications are absent, and inhibits cross-domain links by collapsing the very pathways that establish the logical and causal connections between concepts. We introduce a scalable framework that decompresses scientific reasoning, constructing a verifiable Long Chain-of-Thought (LCoT) knowledge base and projecting it into an emergent encyclopedia, SciencePedia. Our pipeline operationalizes an endpoint-driven, reductionist strategy: a Socratic agent, guided by a curriculum of around 200 courses, generates approximately 3 million first-principles questions. To ensure high fidelity, multiple independent solver models generate LCoTs, which are then rigorously filtered by prompt sanitization and cross-model answer consensus, retaining only those with verifiable endpoints. This verified corpus powers the Brainstorm Search Engine, which performs inverse knowledge search -- retrieving diverse, first-principles derivations that culminate in a target concept. This engine, in turn, feeds the Plato synthesizer, which narrates these verified chains into coherent articles. The initial SciencePedia comprises approximately 200,000 fine-grained entries spanning mathematics, physics, chemistry, biology, engineering, and computation. In evaluations across six disciplines, Plato-synthesized articles (conditioned on retrieved LCoTs) exhibit substantially higher knowledge-point density and significantly lower factual error rates than an equally-prompted baseline without retrieval (as judged by an external LLM). Built on this verifiable LCoT knowledge base, this reasoning-centric approach enables trustworthy, cross-domain scientific synthesis at scale and establishes the foundation for an ever-expanding encyclopedia.

This paper comes up with a really clever architectural solution to LLM hallucinations, especially for complex, technical topics. The core idea is that all our knowledge, from textbooks to wikis, is “radically compressed”: it gives you the conclusions but hides all the step-by-step reasoning that justifies them. They call this vast, unrecorded network of derivations the “intellectual dark matter” of knowledge. Being trained on this compressed, conclusion-oriented data is one reason LLMs fail so often: when you ask them to explain something deeply, they just confidently hallucinate plausible-sounding “dark matter”.

The solution the paper demonstrates is to use a massive pipeline to “decompress” all of the steps and make the answer verifiable. It starts with a “Socrates agent” that uses a curriculum of about 200 university courses to automatically generate around 3 million first-principles questions. Then comes the clever part, which is basically a CI/CD pipeline for knowledge. To stop hallucinations, they run every single question through multiple different LLMs. If these models don’t independently arrive at the exact same verifiable endpoint, like a final number or formula, the entire question-and-answer pair is thrown in the trash. This rigorous cross-model consensus filters out the junk and leaves them with a clean and verified dataset of Long Chains-of-Thought (LCoTs).
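
To make that filtering step concrete, here’s a minimal sketch (in Python) of what cross-model endpoint consensus could look like. The `normalize_endpoint` helper, the record format, and the `min_agree` threshold are my own illustrative assumptions, not the paper’s actual implementation:

```python
from collections import Counter

def normalize_endpoint(answer: str) -> str:
    """Canonicalize a final answer (a number or formula string) so trivially
    different spellings of the same endpoint compare equal. A real pipeline
    would parse numbers and symbolic math; this is just a placeholder."""
    return " ".join(answer.strip().lower().split())

def consensus_filter(question: str, solver_outputs: list[dict], min_agree: int = 3) -> dict | None:
    """Keep a question only if enough independent solver models reach the same endpoint.

    solver_outputs: [{"model": ..., "chain": <LCoT text>, "endpoint": <final answer>}, ...]
    Returns the retained record (question plus agreeing chains), or None to discard it.
    """
    if not solver_outputs:
        return None
    votes = Counter(normalize_endpoint(o["endpoint"]) for o in solver_outputs)
    best_endpoint, count = votes.most_common(1)[0]
    if count < min_agree:
        return None  # no cross-model consensus: the whole Q&A pair is discarded
    agreeing = [o for o in solver_outputs if normalize_endpoint(o["endpoint"]) == best_endpoint]
    return {"question": question, "endpoint": best_endpoint, "chains": agreeing}
```

The intuition is that a flawed derivation is unlikely to land on the exact same final number or formula as several independent models, so agreement on the endpoint acts as a cheap proxy for the correctness of the whole chain.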

The first benefit of having such a clean knowledge base is a “Brainstorm Search Engine” that performs “inverse knowledge search”. Instead of just searching for a definition, you input a concept and the engine retrieves all the diverse, verified derivational chains that lead to that concept. This allows you to explore a concept’s origins and see all the non-trivial, cross-disciplinary connections that are normally hidden. The second and biggest benefit is the “Plato” synthesizer, which is how they solve hallucinations. Instead of just generating an article from scratch, it first queries the Brainstorm engine to retrieve all the relevant, pre-verified LCoT “reasoning scaffolds”. Its only job is then to narrate and synthesize those verified chains into a coherent article.
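
As a rough mental model of the “inverse knowledge search” part (again a toy sketch, not the paper’s engine — the class name, the keyword lookup, and the example query are assumptions; the real system presumably uses proper concept extraction and embedding-based retrieval):

```python
from collections import defaultdict

class InverseKnowledgeIndex:
    """Toy index from a target concept to the verified chains that derive it."""

    def __init__(self):
        self._chains_by_concept = defaultdict(list)

    def add_chain(self, chain_id: str, derived_concepts: list[str], chain_text: str) -> None:
        # Index each verified LCoT under every concept it culminates in,
        # not under every concept it merely mentions along the way.
        for concept in derived_concepts:
            self._chains_by_concept[concept.lower()].append({"id": chain_id, "text": chain_text})

    def search(self, target_concept: str) -> list[dict]:
        # "Inverse" search: which first-principles derivations end at this concept?
        return self._chains_by_concept.get(target_concept.lower(), [])
```

A synthesizer in the Plato role would then be prompted with something like `index.search("shannon entropy")` as its reasoning scaffold, so the article it writes only has to narrate chains that were already verified upstream.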

The results are pretty impressive. The articles generated this way have significantly higher knowledge-point density and, most importantly, substantially lower factual error rates, reducing hallucinations by about 50% compared to a baseline LLM. They used this framework to automatically generate “SciencePedia,” an encyclopedia with an initial 200,000 entries, solving the “cold start” problem that plagues human-curated wikis. The whole “verify-then-synthesize” architecture feels like it could pave the way for AI systems that are able to produce verifiable results and are therefore trustworthy.

Arthur Besse

standard Cole Phelps L.A. Noire (X) Doubt meme

☆ Yσɠƚԋσʂ ☆

Which aspect of their approach do you doubt?

Arthur Besse

the leap from “lower factual error rates than an equally-prompted baseline without retrieval (as judged by an external LLM)” to “enables trustworthy, cross-domain scientific synthesis at scale and establishes the foundation for an ever-expanding encyclopedia”

☆ Yσɠƚԋσʂ ☆

I mean if you have a verifiable set of steps that build from the answer down to first principles, that does seem to enable trustworthiness, specifically because it makes it possible for a human to follow the chain and verify it as well. This is basically what underpins the scientific method and how we compensate for the biases and hallucinations that humans have: you have a reproducible set of steps that can be explained and followed. And what they’re building is very useful because it lets you apply this method to many problems where it would’ve been simply too much effort to do manually.
