Large language models (LLMs) often fail to learn effective long chain-of-thought (Long CoT) reasoning by imitating humans or non-Long-CoT LLMs. To explain this, we propose that effective and learnable Long CoT trajectories exhibit stable, molecule-like structures in a unified view, formed by three interaction types: Deep-Reasoning (covalent-like), Self-Reflection (hydrogen-bond-like), and Self-Exploration (van-der-Waals-like). Analysis of distilled trajectories reveals that these structures emerge from Long CoT fine-tuning, not keyword imitation. We introduce Effective Semantic Isomers and show that only bonds promoting fast entropy convergence support stable Long CoT learning, while structural competition impairs training. Building on these findings, we present Mole-Syn, a distribution-transfer-graph method that guides the synthesis of effective Long CoT structures, boosting performance and RL stability across benchmarks.

This paper is honestly one of the most creative takes on LLM reasoning I’ve seen in a while. The team at ByteDance basically argues that we should view Long Chain-of-Thought as a macromolecular structure with internal forces that hold the logic together. They found that when we try to teach a model to reason by simply distilling keywords from a teacher, it fails because it’s like trying to build a protein by looking at a photo of it rather than understanding the atomic bonds.

Their Molecular Structure of Thought hypothesis breaks reasoning down into three bond types that behave like their chemical counterparts. Deep reasoning acts like covalent bonds, forming the rigid primary backbone where each logical step must strictly justify the next. Self-reflection functions like hydrogen bonds, creating folding patterns where the model looks back a hundred steps to audit an earlier premise, which keeps it from hallucinating. Finally, self-exploration acts like van der Waals forces: low-commitment bridges that let the model probe different ideas without getting locked into a rigid path too early.
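To make the taxonomy concrete, here's a minimal sketch of how you might tag segments of a reasoning trace with the three bond types. The marker lists and the keyword heuristic are my own illustration, not the paper's actual classifier.

```python
# Hypothetical sketch: tag reasoning-trace segments with the three
# "bond types" from the molecular analogy. The marker sets below are
# illustrative cue words, not taken from the paper.

REFLECTION_MARKERS = {"wait", "let me re-check", "actually", "hmm"}
EXPLORATION_MARKERS = {"alternatively", "what if", "another approach"}

def tag_bond_type(segment: str) -> str:
    """Classify a segment as deep-reasoning (covalent-like),
    self-reflection (hydrogen-bond-like), or self-exploration
    (van-der-Waals-like)."""
    lower = segment.lower()
    if any(m in lower for m in REFLECTION_MARKERS):
        return "self-reflection"    # looks back to audit an earlier step
    if any(m in lower for m in EXPLORATION_MARKERS):
        return "self-exploration"   # low-commitment branch probing
    return "deep-reasoning"         # default: forward logical step

trace = [
    "Compute the discriminant: b^2 - 4ac = 25 - 24 = 1.",
    "Wait, let me re-check the sign of c.",
    "Alternatively, what if we factor directly?",
]
profile = [tag_bond_type(s) for s in trace]
print(profile)  # ['deep-reasoning', 'self-reflection', 'self-exploration']
```

The resulting "bond profile" per trace is the kind of distribution the paper argues matters more than the surface keywords themselves.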

They found that most synthetic reasoning data is actually trash because it lacks this bond distribution. They showed that models don't learn the keywords themselves but the characteristic reasoning behaviors those keywords represent. In one experiment, they replaced keywords like "wait" with arbitrary synonyms or removed them entirely, and the models still learned the reasoning structure just fine. Building these stable thought molecules is what creates the basis for Long CoT, as opposed to mimicking a specific vibe or prompt format.
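The keyword-substitution ablation described above is easy to picture in code. This is a rough sketch of the idea, not the paper's pipeline; the replacement token is arbitrary by design, since the point is that the cue word carries no meaning on its own.

```python
import re

# Hypothetical sketch of the keyword-substitution ablation: swap a
# reflection cue like "wait" for an arbitrary token while leaving the
# reflect-then-correct structure of the trace untouched.

def swap_keyword(trace: str, keyword: str = "wait",
                 replacement: str = "flibber") -> str:
    """Replace a cue word with a meaningless token; the underlying
    reasoning behavior in the trace is unchanged."""
    return re.sub(rf"\b{re.escape(keyword)}\b", replacement,
                  trace, flags=re.IGNORECASE)

trace = "So x = 3. Wait, I dropped a sign; wait, x should be -3."
print(swap_keyword(trace))
# So x = 3. flibber, I dropped a sign; flibber, x should be -3.
```

Training on the swapped traces still teaches the model to pause and audit its work, which is the paper's evidence that structure, not vocabulary, is what transfers.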

They built Mole-Syn to address this. Instead of just copying teacher outputs, it uses a distribution transfer graph that walks through four behavioral states to synthesize traces with the correct bond profile from the start. This makes reinforcement learning much more stable because the model starts with a balanced skeleton instead of a bunch of fragmented logic. The paper challenges the whole "more data is better" mindset, arguing that it's the geometry of the information flow that really matters.
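One way to picture a "distribution transfer graph" is as a small Markov walk over behavioral states that produces a skeleton for a trace before any text is generated. Everything here is my guess at the shape of the idea: the state names reuse the bond taxonomy plus a terminal state, and the transition probabilities are made up for illustration.

```python
import random

# Hypothetical sketch of a distribution-transfer-graph walk: a Markov
# chain over four behavioral states. State names and transition
# weights are illustrative, not values from the paper.

STATES = ["deep-reasoning", "self-reflection",
          "self-exploration", "conclusion"]

TRANSITIONS = {
    # weights align with STATES order
    "deep-reasoning":   [0.55, 0.20, 0.15, 0.10],
    "self-reflection":  [0.70, 0.10, 0.15, 0.05],
    "self-exploration": [0.60, 0.25, 0.10, 0.05],
    "conclusion":       [0.00, 0.00, 0.00, 1.00],  # absorbing state
}

def sample_skeleton(max_steps: int = 30, seed: int = 0) -> list:
    """Walk the graph to produce a behavioral skeleton that a
    generator could then fill in with actual reasoning text."""
    rng = random.Random(seed)
    state, path = "deep-reasoning", ["deep-reasoning"]
    for _ in range(max_steps):
        if state == "conclusion":
            break
        state = rng.choices(STATES, TRANSITIONS[state])[0]
        path.append(state)
    return path

skeleton = sample_skeleton()
print(skeleton)
```

Sampling skeletons first and filling them in afterwards is what would let the synthesized data match a target bond distribution by construction, rather than hoping the teacher's outputs happen to have it.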
