Introducing DroPE: a simple method for extending a pretrained language model's usable context without long-context fine-tuning.

This paper is one of the more interesting takes on context extension I have seen in a while because it challenges the assumption that we need explicit positional encodings during inference. The authors make the case that embeddings like RoPE act more like scaffolding during construction than like a permanent load-bearing wall. The idea is that these embeddings are crucial for getting the model to converge and learn language structure initially, but they eventually turn into a hard constraint that prevents the model from generalizing to sequence lengths it has never seen before.

The methodology is surprisingly straightforward: they take a pretrained model, drop the positional embeddings entirely, and then run a very quick recalibration phase. This essentially converts the architecture into a NoPE (No Positional Embedding) model, where the attention mechanism has to rely on the positional information it learned implicitly. It turns out that once you remove the explicit constraints of RoPE, the model can extrapolate to context windows significantly longer than its training data without the perplexity explosions we usually see.
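To make the recipe concrete, here is a minimal PyTorch sketch of that idea. Everything in it, the toy Attention module, the rope helper, and the drop_rope_and_recalibrate loop, is my own illustration of the general shape of the approach, not the authors' code; the paper's actual recalibration schedule, data, and hyperparameters will differ.

```python
# Hypothetical sketch of a DroPE-style recipe: pretrain with RoPE, then flip
# attention to NoPE and briefly recalibrate. Illustrative only.
import torch
import torch.nn as nn
import torch.nn.functional as F

def rope(x, base=10000.0):
    """Apply standard rotary position embeddings to (batch, heads, seq, dim)."""
    b, h, t, d = x.shape
    half = d // 2
    freqs = base ** (-torch.arange(0, half, device=x.device).float() / half)
    angles = torch.arange(t, device=x.device).float()[:, None] * freqs[None, :]
    cos, sin = angles.cos(), angles.sin()
    x1, x2 = x[..., :half], x[..., half:]
    return torch.cat([x1 * cos - x2 * sin, x1 * sin + x2 * cos], dim=-1)

class Attention(nn.Module):
    def __init__(self, dim, heads, use_rope=True):
        super().__init__()
        self.heads, self.use_rope = heads, use_rope
        self.qkv = nn.Linear(dim, 3 * dim)
        self.out = nn.Linear(dim, dim)

    def forward(self, x):
        b, t, d = x.shape
        q, k, v = self.qkv(x).chunk(3, dim=-1)
        q, k, v = (z.view(b, t, self.heads, -1).transpose(1, 2) for z in (q, k, v))
        if self.use_rope:            # explicit positions, used during pretraining
            q, k = rope(q), rope(k)  # skipped entirely after the "drop" step
        y = F.scaled_dot_product_attention(q, k, v, is_causal=True)
        return self.out(y.transpose(1, 2).reshape(b, t, d))

def drop_rope_and_recalibrate(model, batches, lr=1e-5, steps=100):
    """Turn every attention layer into NoPE, then run a short tune-up pass."""
    for m in model.modules():
        if isinstance(m, Attention):
            m.use_rope = False       # rely on implicitly learned positioning only
    opt = torch.optim.AdamW(model.parameters(), lr=lr)
    for _, (tokens, targets) in zip(range(steps), batches):
        logits = model(tokens)                       # (batch, seq, vocab)
        loss = F.cross_entropy(logits.flatten(0, 1), targets.flatten())
        loss.backward(); opt.step(); opt.zero_grad()
    return model
```

The point of the sketch is how small the change is: a flipped use_rope flag plus a short recalibration loop, rather than a new architecture or a long-context fine-tuning run.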

It is pretty wild to see this outperform techniques like YaRN on benchmarks like Needle In A Haystack while using a fraction of the compute. I think this suggests that Transformers are much better at inferring relative positions from semantic cues than we give them credit for. If this holds up, it means we might be wasting a lot of resources trying to engineer complex interpolation methods when the answer was just to take the training wheels off once the model knows how to ride.
