Agentic crafting requires LLMs to operate in real-world environments over multiple turns by taking actions, observing outcomes, and iteratively refining artifacts. Despite its importance, the open-source community lacks a principled, end-to-end ecosystem to streamline agent development. We introduce the Agentic Learning Ecosystem (ALE), a foundational infrastructure that optimizes the production pipeline for agentic models. ALE consists of three components: ROLL, a post-training framework for weight optimization; ROCK, a sandbox environment manager for trajectory generation; and iFlow CLI, an agent framework for efficient context engineering. We release ROME, an open-source agent grounded by ALE and trained on over one million trajectories. Our approach includes data composition protocols for synthesizing complex behaviors and a novel policy optimization algorithm, Interaction-Perceptive Agentic Policy Optimization (IPA), which assigns credit over semantic interaction chunks rather than individual tokens to improve long-horizon training stability. Empirically, we evaluate ROME within a structured setting and introduce Terminal Bench Pro, a benchmark with improved scale and contamination control. ROME demonstrates strong performance across benchmarks such as SWE-bench Verified and Terminal Bench, demonstrating the effectiveness of ALE.

The core thesis of the paper is that the AI community needs to stop treating autonomous agents as just another text-generation problem and start building comprehensive infrastructure for closed-loop learning. The authors argue that reliable agentic behavior requires a full-stack ecosystem that unifies data synthesis, sandboxed execution, and specialized reinforcement learning. To that end, they introduce the Agentic Learning Ecosystem, which consists of an RL framework called ROLL, a sandbox manager named ROCK, and an agent interface known as iFlow CLI. In their view, isolating models in static training environments is a dead end for complex real-world workflows.
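
To make the closed-loop idea concrete, here is a minimal sketch of how such an ecosystem wires together. The class and function names below (`Trajectory`, `run_episode`, `training_loop`, and the `agent`, `sandbox`, and `trainer` objects) are hypothetical stand-ins for the roles the post ascribes to iFlow CLI, ROCK, and ROLL respectively; they are not the actual APIs of those projects.

```python
# Illustrative closed-loop agentic learning pipeline (hypothetical interfaces,
# not the real ROLL / ROCK / iFlow CLI APIs).

from dataclasses import dataclass, field


@dataclass
class Trajectory:
    """One agent episode: alternating actions and environment observations."""
    steps: list = field(default_factory=list)  # (action, observation) pairs
    reward: float = 0.0                        # outcome-level reward (e.g. tests pass)


def run_episode(agent, sandbox, task, max_turns=32):
    """Roll out one task in an isolated sandbox (the role ROCK fills)."""
    traj = Trajectory()
    observation = sandbox.reset(task)
    for _ in range(max_turns):
        # Context engineering lives inside the agent (iFlow CLI's role).
        action = agent.act(observation)
        observation, done = sandbox.step(action)
        traj.steps.append((action, observation))
        if done:
            break
    traj.reward = sandbox.evaluate(task)  # e.g. did the patch pass the test suite?
    return traj


def training_loop(agent, sandbox, trainer, tasks, iterations=1000):
    """Closed loop: generate trajectories, then update weights (ROLL's role)."""
    for _ in range(iterations):
        batch = [run_episode(agent, sandbox, t) for t in tasks]
        trainer.update(agent, batch)  # RL update driven by outcome rewards
```

The point of the sketch is the loop itself: trajectory generation and weight updates share one pipeline, rather than data collection happening offline against a static corpus.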

The team developed an open-source model named ROME using a tightly integrated training pipeline with reproducible execution environments. This allowed a relatively small 30-billion-parameter model to rival or beat massive proprietary models exceeding 100 billion parameters on difficult software-engineering benchmarks.

A big part of their argument rests on the idea that credit assignment in reinforcement learning for agents needs to change. They propose a novel algorithm, Interaction-Perceptive Agentic Policy Optimization (IPA), which shifts credit assignment from individual text tokens to broader semantic interaction chunks. This chunk-level optimization stabilizes training over long horizons and prevents the policy collapse often seen in complex tool-use scenarios.
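
A minimal sketch of what chunk-level credit assignment could look like, in the spirit of IPA but not the paper's actual implementation: assume each trajectory is segmented into semantic interaction chunks (e.g. one tool call plus its observation), compute one advantage per chunk, and broadcast that value onto every token in the chunk. The function and variable names here are illustrative.

```python
import torch


def chunk_advantages(chunk_rewards, chunk_ids, gamma=1.0):
    """Broadcast one advantage per interaction chunk onto its tokens.

    chunk_rewards: (num_chunks,) scalar reward per chunk
    chunk_ids:     (num_tokens,) index of the chunk each token belongs to
    """
    # Discounted return-to-go over chunks, not tokens, so a late failure
    # penalizes whole interactions rather than arbitrary token positions.
    returns = torch.zeros_like(chunk_rewards)
    running = 0.0
    for i in reversed(range(len(chunk_rewards))):
        running = chunk_rewards[i] + gamma * running
        returns[i] = running
    advantages = returns - returns.mean()  # simple mean baseline; the paper may differ
    return advantages[chunk_ids]           # one value repeated across each chunk


# Example: 3 chunks, 8 tokens; only the final chunk receives the outcome reward.
rewards = torch.tensor([0.0, 0.0, 1.0])
ids = torch.tensor([0, 0, 0, 1, 1, 2, 2, 2])
token_adv = chunk_advantages(rewards, ids)  # identical advantage within each chunk
```

Because every token inside a chunk shares one advantage, the gradient no longer rewards or punishes arbitrary sub-sequences of a tool call, which is the stability argument the authors make for long-horizon training.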

We're increasingly seeing priorities shift away from raw data scale and toward systematic infrastructure as the actual bedrock of next-generation models.
