By Kenton Chow

In January 2025, Chinese AI company DeepSeek announced the release of its R1 Large Language Model (LLM), which massively disrupted global markets. DeepSeek's self-reported training cost for R1 was under $6 million, a tiny fraction of the billions America's Silicon Valley companies spend to build their AI models.

The result: tech stocks across the globe suffered massive dips, and US-based Nvidia lost nearly $593 billion in market value, a record one-day loss for any publicly traded company. Even so, a spokesperson for Nvidia called DeepSeek's R1 model an "excellent AI advancement."

The tech sector continues to experience turmoil as companies clamor to scale while reducing costs. The most relevant question is how DeepSeek will impact the different constituencies within the AI ecosystem. There will be positives and negatives for the various parties, so let's dive into the impact.

What is DeepSeek?

  • DeepSeek is a Chinese AI company founded in July 2023 and backed by quant fund High-Flyer.
  • DeepSeek-V3 is an open-source LLM that's free to use and modify.
  • DeepSeek researchers started with a base model, which they refer to as DeepSeek-V3-Base.

DeepSeek may have developed its base model using methods pioneered by OpenAI, a private US-based AI research company, and others. Specifically, DeepSeek may have used prompts to query OpenAI's GPT-4 or ChatGPT and then used the responses to train its own model, effectively mimicking OpenAI's own approach of leveraging existing models for improvement, a technique known as distillation.
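
A toy sketch of that distillation idea, with placeholder teacher and student functions rather than any real API, might look like this:

```python
# Minimal sketch of model distillation as described above: a "teacher"
# model's responses become training data for a "student" model. The
# teacher function and prompt list below are hypothetical placeholders,
# not any real model or API.
def teacher_model(prompt: str) -> str:
    return f"teacher answer to: {prompt}"      # stands in for a large LLM

prompts = ["Explain photosynthesis.", "What is 17 * 24?"]

# Collect (prompt, response) pairs from the teacher...
distillation_data = [(p, teacher_model(p)) for p in prompts]

# ...then fine-tune the student on those pairs (training loop elided).
for prompt, target in distillation_data:
    print(f"train student: {prompt!r} -> {target!r}")
```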

When serving DeepSeek-V3/R1 inference, the optimization objectives are higher throughput and lower latency.

The DeepSeek model employs cross-node Expert Parallelism (EP) to pursue both objectives; a simplified routing sketch follows the list below.

  • First, EP significantly scales up batch sizes, enhancing GPU computation efficiency and boosting throughput.
  • Second, EP distributes experts across GPUs, with each GPU processing only a small subset of experts (reducing memory-access demands), lowering computing latency.
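
As a toy, single-process illustration of EP routing, the sketch below shards hypothetical experts across simulated GPUs; the expert counts and the random "router" are assumptions for illustration, not DeepSeek's actual implementation:

```python
# Toy sketch of expert-parallel (EP) routing in a Mixture-of-Experts layer.
import numpy as np

NUM_EXPERTS = 16      # total experts in the MoE layer (hypothetical)
NUM_GPUS = 4          # experts are sharded evenly across devices
TOP_K = 2             # experts activated per token
BATCH = 8             # tokens in this batch

rng = np.random.default_rng(0)

# Router: pick the top-k experts per token (random scores stand in for
# a learned gating network).
scores = rng.random((BATCH, NUM_EXPERTS))
topk = np.argsort(scores, axis=1)[:, -TOP_K:]

# Each GPU owns a slice of experts, so any one device only loads
# NUM_EXPERTS / NUM_GPUS experts' weights (lower memory pressure).
experts_per_gpu = NUM_EXPERTS // NUM_GPUS

# Group token work by owning GPU -- in a real system this is the
# all-to-all dispatch across nodes.
dispatch = {gpu: [] for gpu in range(NUM_GPUS)}
for token in range(BATCH):
    for expert in topk[token]:
        dispatch[int(expert) // experts_per_gpu].append((token, int(expert)))

for gpu, work in dispatch.items():
    print(f"GPU {gpu}: processes {len(work)} token-expert pairs")
```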

Training vs Inference

To add further context to DeepSeek’s impact, it’s critical to understand the difference between deep learning Training and Inference.

While deep learning can be defined in many ways, a straightforward definition is that it is what we commonly call "machine learning" with models (typically neural networks) structured as "deep" graphs of multiple layers. Deep learning learns the features and patterns that best represent data, and it works hierarchically: the early layers learn generic, low-level features such as edges, while the deeper layers learn more abstract, data-specific features. Deep learning can be applied to many tasks, including image classification, text classification, speech recognition, and time-series prediction. Both segments, Training and Inference, are very large, with many believing Inference is the most significant potential addressable market.
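
To make the "stack of layers" idea concrete, here is a minimal sketch, assuming hypothetical layer sizes and random weights, of a forward pass through a small deep network:

```python
# Minimal sketch of a "deep" model: a stack of layers, each transforming
# the previous layer's output. Layer sizes are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)

def relu(x):
    return np.maximum(0, x)

# Three weight matrices = three layers; early layers see raw input
# (generic features), deeper layers build on them (specific features).
layer_sizes = [64, 32, 16, 10]   # input dim -> hidden dims -> class scores
weights = [rng.normal(0, 0.1, (m, n))
           for m, n in zip(layer_sizes, layer_sizes[1:])]

def forward(x):
    for w in weights[:-1]:
        x = relu(x @ w)          # hidden layers apply a nonlinearity
    return x @ weights[-1]       # final layer emits raw class scores

scores = forward(rng.normal(size=(5, 64)))   # batch of 5 samples
print(scores.shape)                          # (5, 10): one score per class
```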

Training

Training is the phase in which a network tries to learn from the data.

In training, each layer of the network is initialized with random weights, and the model runs a forward pass through the data, predicting class labels and scores using those weights. The class scores are then compared against the actual labels, and an error is computed via a loss function. This error is propagated back through the network, and the weights are updated accordingly by a weight-update algorithm. A complete pass through all of the training samples is called an epoch. Performing only a single weight update after going through every sample is computationally wasteful, so in practice the data is divided into batches, and the weights are updated after each batch. This method takes less time to converge, so fewer epochs are needed, which is what produces the machine "learning."
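
A toy version of that loop, with hypothetical shapes and hyperparameters and a simple logistic model standing in for a deep network, might look like this:

```python
# Toy training loop illustrating the steps above: random initial weights,
# a forward pass, a loss, a backward pass, and per-batch weight updates
# over several epochs. All data and hyperparameters are hypothetical.
import numpy as np

rng = np.random.default_rng(0)

# Fake dataset: 256 samples, 8 features, binary labels.
X = rng.normal(size=(256, 8))
y = (X.sum(axis=1) > 0).astype(float)

w = rng.normal(0, 0.1, size=8)   # weights start random
b = 0.0
lr, batch_size, epochs = 0.1, 32, 5

for epoch in range(epochs):                  # one epoch = full pass over data
    for i in range(0, len(X), batch_size):   # update per batch, not per epoch
        xb, yb = X[i:i + batch_size], y[i:i + batch_size]
        p = 1 / (1 + np.exp(-(xb @ w + b)))  # forward pass (logistic model)
        err = p - yb                         # gradient of cross-entropy loss
        w -= lr * (xb.T @ err) / len(xb)     # backward pass: update weights
        b -= lr * err.mean()
        loss = -np.mean(yb * np.log(p + 1e-9) +
                        (1 - yb) * np.log(1 - p + 1e-9))
    print(f"epoch {epoch}: last-batch loss {loss:.3f}")
```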

Training can be sped up using a GPU (or multiple GPUs in parallel) as they’re much faster than CPUs for vector and matrix manipulations.

This is where Nvidia’s key strength has been derived from utilizing multiple GPUs to drive a dramatic improvement in performance. This makes what was traditionally called machine learning and “trains” the models faster, enabling repetitive tasks to be performed highly efficiently.

Inference

Reasoning models generate high volumes of tokens at Inference time, so outputs take longer to produce. Speed is therefore essential: fast responses keep users engaged, while lagging responses create latency and frustration.

Inference is different: it is the phase in which a trained model is used to infer or predict values for new samples, using a forward pass similar to the one performed in training.

Unlike training, it doesn’t include a backward pass through the network to compute the error and update weights. Usually, a model is deployed to predict real-world data at the production phase. Traditional GPU processing can be utilized to speed up your predictions, but the GPU architecture requires a lot of processing overhead necessary for efficient training processing, but it is more costly for Inference workloads. That is why Inference-tuned processors, like Groq’s Language Processing Unit (LPU), are a more cost-effective solution. That platform is designed explicitly for Inference workloads and makes batch prediction way more efficient than predicting a single image by batching multiple samples and executing the batch in a single process. This is a better solution for enterprises with many users that require millions of hits per second by massively reducing latency and process system overhead, hence reducing costs.

DeepSeek's R1 is a unique reasoning AI model because it shows you can significantly improve LLM reasoning with pure Reinforcement Learning (RL), with no labeled data needed, as in DeepSeek-R1-Zero. DeepSeek-R1 then improved readability, providing the clarity and precision in results that users require.
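
To illustrate how RL can work without labeled reasoning data, here is a highly simplified sketch in the spirit of rule-based accuracy and format rewards; the tag names, answer format, and weights are assumptions for illustration, not DeepSeek's actual reward functions:

```python
# Simplified sketch of rule-based rewards for pure-RL reasoning training.
import re

def reward(completion: str, reference_answer: str) -> float:
    score = 0.0
    # Format reward: the model is asked to show its reasoning inside
    # <think>...</think> tags before giving a final answer.
    if re.search(r"<think>.*?</think>", completion, re.DOTALL):
        score += 0.5
    # Accuracy reward: compare the extracted final answer to a known
    # ground truth (e.g., a math result) -- no human labels of the
    # reasoning itself are needed.
    match = re.search(r"answer:\s*(\S+)", completion, re.IGNORECASE)
    if match and match.group(1) == reference_answer:
        score += 1.0
    return score

print(reward("<think>2+2 is 4</think> answer: 4", "4"))   # 1.5
print(reward("answer: 5", "4"))                           # 0.0
```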

Because DeepSeek-V3 incorporates a more efficient way of processing large amounts of data during the Inference stage, using roughly half as much memory while improving throughput, it will accelerate the development of new AI models. In summary, it will accelerate both Training and Inference by lowering costs for end users of AI models.

Impact on the AI Ecosystem Constituents

AI Model Developers 

DeepSeek is a significant leap for AI model developers. The model architecture has a more positive impact on Inference than on Training, for four reasons:

1) Mixture-of-Experts (MoE) architecture: only a fraction of the model is activated per token during Inference (37B of 671B parameters; see the back-of-the-envelope sketch after this list)

2) Distillation of reasoning capabilities: transferring long-chain reasoning capabilities from larger models into smaller, more efficient versions.

3) Reinforcement learning and post-training optimization: AI models use automated evaluation functions to assess their reasoning and outputs, iteratively improving performance

4) Multi-head latent attention: reduces memory overhead (a reported 5-13% memory reduction) and improves memory utilization, enabling more efficient Inference
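
As a back-of-the-envelope illustration of point 1, using the publicly reported DeepSeek-V3 parameter counts (the FLOPs estimate is a rough sketch, not a benchmark):

```python
# Why MoE helps Inference: only the routed experts' parameters are
# activated per token. Parameter counts are the publicly reported
# DeepSeek-V3 figures; the compute estimate is a rough illustration.
total_params = 671e9      # total parameters in DeepSeek-V3
active_params = 37e9      # parameters activated per token

fraction = active_params / total_params
print(f"Active fraction per token: {fraction:.1%}")   # ~5.5%

# A dense model does roughly 2 * params FLOPs per token; the MoE model
# only pays for the active subset.
dense_flops = 2 * total_params
moe_flops = 2 * active_params
savings = 1 - moe_flops / dense_flops
print(f"Per-token compute avoided vs. a dense model: {savings:.1%}")  # ~94.5%
```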

Another benefit, one that positively impacts both Training and Inference, is smarter data usage: open-source models are far less costly than proprietary models from developers such as OpenAI and Anthropic. This gives model developers broader access to build more sophisticated AI solutions for users, accelerating innovation.

Infrastructure Manufacturers

Overall, DeepSeek will make software and hardware more efficient and less costly, which will enable model developers to develop on faster infrastructure at lower cost.

However, it will put pricing pressure on infrastructure manufacturers such as Nvidia. This may result in short-term volatility and a pause for chip and infrastructure manufacturers, as evident in the recent sell-off of Nvidia stock on the perception that lower-cost open-source models like DeepSeek will reduce hyperscaler demand for high-performance, highly priced AI chips.

In contrast, in Nvidia’s latest quarterly earnings report, the company had nothing but good news, with over 70% growth in revenue, yet the company’s market cap slipped below $3 trillion. “Nothing wrong with the fundamentals,” according to Venture Partner Sasha Ostojic of Playground Global, who spent a decade leading Nvidia’s GPU division, in an interview with CNBC.

Even with the product transition from the older generation to the new Blackwell generation, which may cause a short-term slowing of Nvidia's growth, the hyperscalers raised their capital-expenditure forecasts by approximately 20% after the DeepSeek announcement. This will likely bode well for infrastructure-manufacturer demand over the long term.

Data Center Operators

Data center operators will continue to benefit from the sheer volume of computational infrastructure capacity required, whether for Training or Inference workloads. This is why the real estate play of building data centers has been an accelerating segment of the AI ecosystem. Growing demand for AI data center infrastructure calls for best-in-class power, security, and space to deliver the full impact of AI.

Energy Providers

Power is a critical aspect of data centers being built at breakneck speed. As much as investment is pouring into infrastructure manufacturers and data centers, power is the commodity that fuels it all. This is why the hyperscalers are moving to verticalize their power sources, as with Google purchasing small nuclear reactors. Energy providers will benefit as DeepSeek helps accelerate end-user demand for AI compute cycles.

End Users

End users are the big winners here. DeepSeek lowers the cost of AI models for end users, further democratizing AI usage for tasks and inquiries. We are just scratching the surface, and open-source models like DeepSeek will help pave the way for accelerated adoption.

Other Factors and Risks

DeepSeek is an open-source AI model under Chinese control, which has several layers of geopolitical implications. There will be disputes about intellectual property (IP) rights and the enforcement of IP law globally. For example, there are current claims that DeepSeek committed IP theft by distilling OpenAI's models.

Every developed country is in the AI race, and the US Commerce Department is appropriately taking a cautious view toward export licensing of key AI software and hardware technologies. Tariffs are also a top-of-mind concern because a lack of clarity is creating uncertainty. This uncertainty will continue to affect model developers, infrastructure manufacturers, and end users.

Conclusion

There is a broad range of perspectives on how DeepSeek will affect the AI ecosystem, but it is clear that it has already had a profound impact, felt differently by different players in the AI space. Given all of the variables, DeepSeek offers many opportunities that could significantly benefit the AI ecosystem, though not without significant risks and uncertainties.

Kenton Chow

Kenton Chow joined FLG in 2015 and has over 25 years of experience as a CFO, COO, and board member of various private and public technology companies, including software, fintech, biometric identification, big data analytics, IoT, e-commerce, security, retail, networking, and computer systems. Kenton (Ken) adds value to companies by leveraging…