QwQ-32B is a 32 billion parameter language model that achieves comparable performance to DeepSeek-R1 with 671 billion parameters, using reinforcement learning for scaling

☆ Yσɠƚԋσʂ ☆

Ultimately what matters is whether it gets the correct answer or not. It’s interesting that yours wasn’t able to do the strawberry test while mine did it with a very short thinking cycle.

@[email protected]

Ultimately what matters is whether it gets the correct answer or not.

That’s… not true at all. It had the right answer to most of the questions I asked it just as fast as R1, and yet it kept saying “but wait! maybe I’m wrong”. It’s a huge red flag when the CoT is just trying to 1000 monkeys a problem.

While it did manage to complete the strawberry problem when I adjusted the top_p/top_k, I was using the same values I’d used with other models I’ve tested, and I’d never had a CoT go that off-kilter before. And this is considering that even the 7B DeepSeek model was able to get the correct answer with 1/4 of the VRAM.

☆ Yσɠƚԋσʂ ☆

It’s true for me. I generally don’t read through the think part. I make the query, do something else, and then come back to see what the actual output is. Overall, I find it gives me way better answers than I got with the version of R1 I was able to get running locally. Turns out the settings do matter though.

QwQ-32B: Embracing the Power of Reinforcement Learning
