Commentary: Answering the Nine Crucial Questions Raised by DeepSeek


The sudden rise of DeepSeek has sparked heated discussions about the resilience of Chinese artificial intelligence (AI) companies under resource constraints and a series of key questions. These include whether the Hangzhou-based startup behind the model will need more computational power as its AI applications expand, what innovations it has made in its research and development and whether its lower-cost model training methods will lead to stricter U.S. export restrictions.

- DIGEST HUB
- DeepSeek, a Chinese AI company, trained its model cost-efficiently on 2,048 Nvidia H800 GPUs, significantly reducing training costs compared with competitors like Meta’s Llama 3.
- Despite efficient methods, growing AI applications may increase DeepSeek's demand for computational power and could prompt a reconsideration of the AI investment paradigm from ‘computational power’ to ‘algorithm efficiency.’
- DeepSeek's ascent has raised concerns that the U.S. may tighten export restrictions, and could reshape open-source software ecosystems by lowering the barriers to using AI technology.
DeepSeek, a Hangzhou-based AI startup, has generated significant discussion about the resilience of Chinese AI firms under resource constraints and the implications of its innovative, cost-effective model training for U.S. export policies. As its AI applications grow, the company's need for computational power will likely increase, even though its cost-efficient training methods currently work well with limited resources [para. 1][para. 2].
DeepSeek's AI model, DeepSeek-V3, trained using 2,048 Nvidia H800 GPUs, achieved a lower training cost of $5.576 million compared to foreign peers like Meta's Llama 3, which required much greater computational resources. However, as its models become more popular, the demand for AI inference is expected to necessitate additional computational infrastructure [para. 3][para. 4].
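The headline figure can be sanity-checked with simple GPU-hour accounting. The sketch below assumes the roughly 2.788 million H800 GPU-hours and $2-per-GPU-hour rental rate cited in DeepSeek's technical report; it is an illustration, not an audited cost breakdown:

```python
# Back-of-the-envelope check of DeepSeek-V3's reported training cost.
# Assumption: ~2.788M total H800 GPU-hours at a $2/GPU-hour rental rate,
# the figures cited in DeepSeek's technical report.
GPU_HOURS = 2_788_000          # total H800 GPU-hours across all training stages
RATE_USD_PER_GPU_HOUR = 2.0    # assumed rental price per GPU-hour

cost = GPU_HOURS * RATE_USD_PER_GPU_HOUR
days_on_cluster = GPU_HOURS / 2_048 / 24   # wall-clock days on 2,048 GPUs

print(f"Estimated training cost: ${cost / 1e6:.3f} million")   # $5.576 million
print(f"Implied cluster time: about {days_on_cluster:.0f} days")
```

The arithmetic reproduces the $5.576 million headline number; as the article notes later, hardware purchases, R&D, and other costs are excluded from it.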
Globally, tech giants continue to significantly invest in AI, with Meta Platforms Inc. anticipating $60-65 billion in capital expenditure mostly for AI infrastructure, and OpenAI, in partnership with Oracle and SoftBank, planning a $500 billion investment for AI development in the U.S. This indicates that the current AI computational power growth paradigm, driven largely by the exploration of AGI applications, remains unchanged [para. 4][para. 5][para. 6].
The evolving AI market may shift focus from a "computational power arms race" to improving "algorithm efficiency," spurring innovation in model development. Open-source licensing will facilitate wider innovation, potentially altering the dynamics of traditional large-model companies. Enterprise-scale commercial deployment of AI is expected by 2025 [para. 7][para. 8].
DeepSeek's effective model training using less advanced Nvidia chips, such as the A100 and H800, could dampen demand for Nvidia's high-end GPUs, reshaping the GPU market landscape. Its approach demonstrates that Chinese companies can competitively use less advanced chips to rival Western-developed models in consumer sectors [para. 9].
A January report by SemiAnalysis noted that DeepSeek's reported $5.576 million training cost excludes spending on research, development, and hardware, which could have reached $500 million, and estimated that total cost of ownership might reach $2.57 billion over four years. Even so, the company's efficient training methods and innovative architectures, which reduce its models' inference costs, remain notable [para. 10][para. 11].
DeepSeek's innovations in model architecture and inference efficiency, employing technologies such as Mixture of Experts and Multi-Head Latent Attention, have notably enhanced its performance. Particularly remarkable is its R1-Zero model's use of pure reinforcement learning without supervised fine-tuning, reportedly responding about twice as fast as ChatGPT and delivering superior results on computational tasks [para. 12][para. 13].
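Mixture of Experts, one of the technologies mentioned above, activates only a few specialist sub-networks per input rather than the whole model, which is a key source of such compute savings. Below is a minimal illustrative sketch of top-k expert routing in plain Python, with toy experts and hand-set gate scores; it is not DeepSeek's actual implementation:

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def moe_forward(x, experts, gate_scores, k=2):
    """Route input x to the top-k experts by gate score and combine
    their outputs, weighted by the renormalized scores."""
    top = sorted(range(len(experts)), key=lambda i: gate_scores[i], reverse=True)[:k]
    weights = softmax([gate_scores[i] for i in top])
    return sum(w * experts[i](x) for w, i in zip(weights, top))

# Toy experts: each stands in for a feed-forward sub-network.
experts = [lambda x, a=a: a * x for a in (1.0, 2.0, 3.0, 4.0)]
gate_scores = [0.1, 0.7, 0.05, 0.9]   # in practice, produced by a learned router
out = moe_forward(1.0, experts, gate_scores, k=2)
print(out)
```

In a real model the router's scores are learned and each expert is a full feed-forward network, but the principle is the same: per token, only k of the experts run, so compute scales with k rather than with the total number of experts.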
The company is also poised to accelerate the development of AI on end devices through its cost-efficient models, encouraging businesses to integrate AI applications directly into devices, though this evolution will proceed gradually due to ecosystem challenges [para. 14][para. 15].
Following its global impact, there are concerns that the U.S. might tighten technology export restrictions, potentially limiting high-end AI chip exports, restricting open-source large models, and blocking access to significant datasets for Chinese companies [para. 16][para. 17].
Lastly, while traditionally open-source models lagged behind closed-source ones in performance, DeepSeek's success with open-source models could democratize AI technology development, encouraging other firms to reconsider their business strategies [para. 18].
- DeepSeek
- DeepSeek is a Hangzhou-based AI company known for its cost-efficient model training using fewer resources. Its 671-billion-parameter DeepSeek-V3 model was trained on 2,048 Nvidia H800 GPUs, costing $5.576 million. The firm innovates with technologies like Mixture of Experts and Multi-Head Latent Attention. While poised to influence the AI and chip markets, its rise may prompt stricter U.S. tech export restrictions.
- Nvidia Corp.
- Nvidia Corp. holds a 90% market share in the global GPU market as of Q3 2024, with the H100 GPU being a key product. DeepSeek's achievement using lower-end Nvidia chips like the A100 and H800 suggests potential impacts on the demand for higher-end Nvidia GPUs in fields like cloud computing and sovereign AI, despite competition from Chinese companies in the large model market using these less advanced GPUs.
- Meta Platforms Inc.
- Meta Platforms Inc. is investing heavily in AI infrastructure, with plans for capital expenditure of between $60 billion and $65 billion in 2025. The investment aims to support the development of next-generation models such as Llama 4. Meta's commitment highlights the ongoing exploration of cutting-edge AI applications demanding substantial computational power.
- Oracle Corp.
- Oracle Corp. is mentioned as a partner with OpenAI and SoftBank Group Corp. in the Stargate Project. They plan to invest $500 billion over the next four years to build new AI infrastructure in the U.S., reflecting a significant commitment to the development of cutting-edge AI applications requiring substantial computational power.
- SoftBank Group Corp.
- SoftBank Group Corp. is mentioned in the article as a partner in the Stargate Project, alongside Oracle Corp. and OpenAI. The project plans to invest $500 billion over the next four years to build new AI infrastructure in the U.S. This initiative highlights SoftBank's involvement in developing cutting-edge AI applications which require massive computational power.
- Microsoft Corp.
- Microsoft Corp., a supporter of OpenAI, has recently made the distilled DeepSeek-R1 models available for use on its Copilot+ PCs. This move suggests Microsoft's interest in leveraging cost-effective AI models to enhance computing capabilities and user experience on its devices.
- Google LLC
- The article mentions Google LLC in relation to the projected growth in capital expenditures among major U.S. tech companies. It indicates that these investments are aimed at developing next-generation AI models like GPT-5 and Llama 4, with a 19.6% expected growth in capital expenditures this year.
- Amazon.com Inc.
- The article mentions that Amazon.com Inc. is one of the five major U.S. tech companies whose combined capital expenditures are expected to grow by 19.6% this year, with a significant portion allocated for developing next-generation AI models like GPT-5 and Llama 4.
- Apple Inc.
- The article briefly mentions Apple Inc., stating that the development of smart hardware, such as AI-powered end devices, is gradual. Although improvements in model capabilities are part of the process, the anticipation for AI-powered devices like smartphones by 2025 should remain measured, highlighting the ongoing challenges in ecosystem coordination.
- ByteDance Ltd.
- The article mentions ByteDance Ltd. as one of the major global companies focused on developing closed-source models. It suggests that while closed-source models generally outperform open-source ones in capabilities, the emergence of open-source competitors like DeepSeek could prompt companies like ByteDance to reconsider their business models.
- Baidu Inc.
- The article mentions that Baidu Inc., alongside ByteDance Ltd., is focusing on developing closed-source AI models, which generally outperform open-source models in capabilities. However, with the rise of DeepSeek, which offers open-source models that rival advanced closed-source models, there might be a shift in business models and technological strategy among AI companies.
- Alibaba Group Holding Ltd.
- Alibaba Group Holding Ltd. is focusing on developing open-source AI models. This aligns with the global trend where some major companies are prioritizing open-source development, although traditionally, closed-source models have been regarded as more capable. The open-source approach by companies like Alibaba can help lower the entry barriers for AI technology usage and promote widespread adoption, benefiting from decreasing marginal costs associated with open-source models.
- December 2024:
- DeepSeek published a technical report stating its DeepSeek-V3 model training costs and details.
- End of 2024:
- The U.S. released an export control policy restricting the transfer of models trained in third-party countries to China.
- January 2025:
- A report by SemiAnalysis detailed DeepSeek's GPU investments and estimated total costs over the next four years.
- January 21, 2025:
- OpenAI, in partnership with Oracle Corp. and SoftBank Group Corp., launched the Stargate Project.
- January 24, 2025:
- Mark Zuckerberg announced Meta Platforms Inc. expects significant capital expenditure in 2025 towards building AI infrastructure.