‘DeepSeek DID NOT build OpenAI for USD 5M’: Bernstein dismisses the Chinese company’s claims

by superadmin


DeepSeek, a new AI company seen as a potential rival to OpenAI, has drawn enormous attention on social media and sent global stock markets into a frenzy. However, a recent Bernstein report cautioned that, while the company’s achievements are impressive, its claim of having built an AI system comparable to OpenAI’s for just $5 million does not hold up.
The report said that the claim is misleading and does not reflect the bigger picture.
Bernstein stated, “We believe that DeepSeek DID NOT ‘build OpenAI for USD 5M’; the models look fantastic but we don’t think they are miracles; and the resulting Twitter-verse panic over the weekend seems overblown.”
The report said that DeepSeek has developed two main AI models: DeepSeek-V3 and DeepSeek-R1. V3, a large language model, uses a mixture-of-experts (MoE) architecture, which routes work across many smaller expert networks to achieve high performance while using fewer computing resources than a traditional dense model of the same size.
The V3 model has 671 billion parameters, of which only 37 billion are active for any given token, and it incorporates innovations such as multi-head latent attention (MHLA) to reduce memory usage and mixed-precision training with FP8 computation for greater efficiency.
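The routing idea behind a mixture-of-experts layer can be sketched in a few lines. This is an illustrative toy, not DeepSeek’s implementation: a small gate scores every expert for a token, only the top-k experts actually run, and their outputs are mixed, which is why the compute per token tracks the active parameters (37 billion) rather than the full parameter count (671 billion).

```python
import numpy as np

def moe_layer(x, expert_weights, gate_weights, top_k=2):
    """Route one token to its top-k experts and mix their outputs."""
    scores = x @ gate_weights                  # one gate score per expert
    top = np.argsort(scores)[-top_k:]          # indices of the k best experts
    probs = np.exp(scores[top] - scores[top].max())
    probs /= probs.sum()                       # softmax over the chosen experts
    # Only the selected experts run, so compute scales with k, not num_experts.
    return sum(p * (x @ expert_weights[e]) for p, e in zip(probs, top))

rng = np.random.default_rng(0)
d, num_experts = 8, 4
x = rng.normal(size=d)                          # a single token embedding
experts = rng.normal(size=(num_experts, d, d))  # toy expert weight matrices
gate = rng.normal(size=(d, num_experts))        # toy gating weights
y = moe_layer(x, experts, gate, top_k=2)
print(y.shape)  # (8,)
```

Here only 2 of the 4 toy experts execute per token; scaled up, the same design lets a model carry far more parameters than it pays for in per-token compute.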
Was it really $5 million?
Training the V3 model involved a cluster of 2,048 NVIDIA H800 GPUs over a roughly two-month period, adding up to around 2.8 million GPU hours.
While some estimates put the training cost at approximately $5 million, the report highlighted that this figure covers only the computational resources, leaving out the significant costs of research, experimentation, and other development expenses.
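The compute-only figure can be checked with back-of-the-envelope arithmetic. A minimal sketch, assuming a rental rate of roughly $2 per H800 GPU-hour (an assumption commonly used in public estimates; the article itself states no rate):

```python
# Back-of-the-envelope compute-only cost for training a model on
# 2,048 GPUs for ~two months, at an ASSUMED ~$2/GPU-hour rental rate.
gpus = 2048
wall_clock_hours = 24 * 60               # ~two months of round-the-clock training
cluster_gpu_hours = gpus * wall_clock_hours   # ≈ 2.95M GPU hours if fully busy
rate_usd_per_gpu_hour = 2.0              # assumption, not from the article

compute_cost = cluster_gpu_hours * rate_usd_per_gpu_hour
print(f"{cluster_gpu_hours / 1e6:.2f}M GPU hours "
      f"-> ${compute_cost / 1e6:.1f}M compute-only")
# -> 2.95M GPU hours -> $5.9M compute-only
```

Under those assumptions the compute bill lands in the same ballpark as the headline $5 million; the point of the report is that salaries, research, failed runs, and infrastructure sit on top of that number.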
The DeepSeek R1 model builds upon the V3’s foundation by using Reinforcement Learning (RL) and other techniques to enhance reasoning capabilities. The R1 model has performed competitively with OpenAI’s models in reasoning tasks. However, Bernstein pointed out that the additional resources required to develop the R1 model were substantial, though not detailed in DeepSeek’s research paper.
Commenting on the hype, Bernstein noted that DeepSeek’s models are impressive.
For example, the V3 model performs as well as or better than other large models in language, coding, and mathematics while consuming a fraction of the computational resources. Pre-training the V3 model required just 2.7 million GPU hours, or only 9 per cent of the compute resources needed for some leading models.
In conclusion, while DeepSeek’s advancements are noteworthy, the report urged caution in the face of exaggerated claims. The company’s work may be groundbreaking, but the notion of creating an OpenAI competitor for just $5 million appears to be false.





