Chinese AI startup DeepSeek has launched DeepSeek-V3, a 671-billion-parameter model that posts strong benchmark results and rivals top proprietary systems. Trained for a reported $5.57 million, it presents a compelling alternative to far more expensive models such as GPT-4. With open-source availability on GitHub, the model sets the stage for democratizing AI innovation.

DeepSeek V3 is a big deal for a number of reasons.

At a reported $5.57 million to train, it cost a fraction of what OpenAI, Google, or Anthropic spend on their models, which often runs into the hundreds of millions of dollars.

It undercuts the AI-as-a-service business model that OpenAI and Google have been pursuing, and it makes state-of-the-art language models accessible to smaller companies, research institutions, and even individuals.

The code is publicly available, allowing anyone to use, study, modify, and build upon it. Companies can integrate it into their products without paying for usage, making it financially attractive. The open-source nature fosters collaboration and rapid innovation.

The model goes head-to-head with, and often outperforms, models like GPT-4o and Claude-3.5-Sonnet on various benchmarks. It excels in areas that are traditionally challenging for AI, like advanced mathematics and code generation. Its 128K-token context window means it can process very long documents in one pass. Meanwhile, it generates text at 60 tokens per second, twice as fast as GPT-4o.

The Mixture-of-Experts (MoE) approach used by the model is key to its performance. While the model has 671 billion parameters in total, it activates only 37 billion of them for each token, which keeps the compute per token low. Compared to Meta’s Llama 3.1 405B, a dense model that uses all 405 billion parameters for every token, DeepSeek V3 activates roughly 11 times fewer parameters per token yet performs better.
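
The parameter figures above can be checked with a little arithmetic. This is a minimal sketch in Python, assuming "efficiency" here means parameters activated per token; the function names are illustrative and not taken from DeepSeek's code.

```python
# Parameter counts quoted in the post, in billions.
TOTAL_PARAMS_B = 671    # DeepSeek V3, all experts combined
ACTIVE_PARAMS_B = 37    # DeepSeek V3, activated for each token
LLAMA_DENSE_B = 405     # Llama 3.1 405B, dense: every parameter is active


def active_fraction(active_b: float, total_b: float) -> float:
    """Share of a model's parameters that take part in one token's forward pass."""
    return active_b / total_b


def active_ratio(dense_b: float, moe_active_b: float) -> float:
    """How many times more parameters the dense model touches per token."""
    return dense_b / moe_active_b


if __name__ == "__main__":
    share = active_fraction(ACTIVE_PARAMS_B, TOTAL_PARAMS_B)
    ratio = active_ratio(LLAMA_DENSE_B, ACTIVE_PARAMS_B)
    print(f"active share of DeepSeek V3: {share:.1%}")
    print(f"Llama 3.1 405B vs. DeepSeek V3 active params: {ratio:.1f}x")
```

Running it prints an active share of 5.5% and a ratio of 10.9x. That is where an "over 10 times" figure comes from if it is read as a per-token parameter count rather than a measured speed or cost.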

DeepSeek V3 can be seen as a significant technological achievement by China in the face of US attempts to limit its AI progress. China once again demonstrates that resourcefulness can overcome limitations.

smpl

Bla bla bla…

| Model | #Total Params | #Activated Params | Context Length |
|---|---|---|---|
| DeepSeek-V3-Base | 671B | 37B | 128K |
| DeepSeek-V3 | 671B | 37B | 128K |