QwQ-32B: Embracing the Power of Reinforcement Learning
qwenlm.github.io
Scaling Reinforcement Learning (RL) has the potential to enhance model performance beyond conventional pretraining and post-training methods. Recent studies have demonstrated that RL can significantly improve the reasoning capabilities of models. For instance, DeepSeek R1 has achieved state-of-the-art performance by integrating cold-start data and multi-stage training, enabling deep thinking and complex reasoning. Our research explores the scalability of RL and its impact on enhancing the intelligence of large language models.
☆ Yσɠƚԋσʂ ☆
creator

Ultimately what matters is whether it gets the correct answer or not. It’s interesting that yours wasn’t able to do the strawberry test while mine did it with a very short thinking cycle.
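The "strawberry test" here presumably refers to the widely used informal prompt that asks a model how many times the letter r appears in the word "strawberry". The ground truth is trivial to compute outside the model; the helper name `count_letter` below is hypothetical and only illustrates the expected answer.

```python
def count_letter(word: str, letter: str) -> int:
    # Case-insensitive count of one character in a word.
    return word.lower().count(letter.lower())

print(count_letter("strawberry", "r"))  # 3
```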

@[email protected]

Ultimately what matters is whether it gets the correct answer or not.

That’s… not true at all. It had the right answer to most of the questions I asked it, just as fast as R1, and yet it kept saying “but wait! maybe I’m wrong”. It’s a huge red flag when the CoT is just throwing a thousand monkeys at a problem.

It did manage to complete the strawberry problem once I adjusted top_p/top_k, but I had been using those same values with other models I’ve tested and never had a CoT go that far off kilter before. And that’s considering even the 7B DeepSeek model got the correct answer using a quarter of the VRAM.
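The top_p/top_k settings mentioned above are sampling parameters: top-k keeps only the k most likely next tokens, and top-p (nucleus sampling) keeps the smallest set of tokens whose cumulative probability reaches p. Below is a minimal stdlib-only Python sketch of how they truncate a next-token distribution, assuming top-k is applied before top-p and the surviving probabilities are renormalized. The function name `filter_logits` is hypothetical, and real inference engines differ in details such as ordering and temperature handling.

```python
import math

def filter_logits(logits, top_k=0, top_p=1.0):
    """Hypothetical helper: top-k then top-p filtering of raw logits.

    Returns a probability list (same length as logits) where filtered-out
    tokens get 0.0 and the survivors are renormalized to sum to 1.
    top_k=0 disables top-k; top_p=1.0 disables top-p.
    """
    # Softmax with max-subtraction for numerical stability.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    probs = [e / total for e in exps]

    # Token ids sorted from most to least likely.
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)

    # Top-k: keep only the k most likely tokens.
    if top_k > 0:
        order = order[:top_k]

    # Top-p: walk the (top-k renormalized) distribution until the
    # cumulative probability reaches top_p.
    k_mass = sum(probs[i] for i in order)
    kept, cumulative = [], 0.0
    for i in order:
        kept.append(i)
        cumulative += probs[i] / k_mass
        if cumulative >= top_p:
            break

    # Renormalize over surviving tokens; everything else is zeroed.
    mass = sum(probs[i] for i in kept)
    out = [0.0] * len(probs)
    for i in kept:
        out[i] = probs[i] / mass
    return out

logits = [2.0, 1.0, 0.5, -1.0]
print(filter_logits(logits, top_k=2))
print(filter_logits(logits, top_p=0.5))
```

With these example logits, `top_p=0.5` keeps only the most likely token, because it alone holds about 61% of the probability mass. Narrowing either setting makes sampling less random, which is one plausible reason adjusting them changed how the CoT behaved.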

☆ Yσɠƚԋσʂ ☆
creator

It’s true for me. I generally don’t read through the think part. I make the query, do something else, and then come back to see what the actual output is. Overall, I find it gives me way better answers than I got with the version of R1 I was able to get running locally. Turns out the settings do matter, though.


