What Really Has Changed?
An explanation for normies on the events shaking the world of AI
A Chinese-built large language model called DeepSeek-R1 launched last week and has created pandemonium in the markets and the AI community. For those who didn’t spend most of their past weekend reading up on these developments, below is a summary of what is going on and how I think the implications may play out. As always, this is not aimed at the very technical or very online. And if you want a quick takeaway, just read the first three paragraphs below and then skip ahead to the last section.
What happened
Spun out of a hedge fund, DeepSeek, a Chinese AI company, released an AI model (DeepSeek-R1) that outperformed major rivals like ChatGPT despite being built on a shoestring budget. Experts estimate that it cost around $6 million to rent the hardware needed to train the model, a fraction of what US-based AI labs spend. Part of the buzz around DeepSeek is that it succeeded in making this model despite US export controls that limit Chinese firms’ access to the best computer chips designed for AI processing.
This marks a potentially transformative moment in the artificial intelligence sector. With its development of cost-effective and high-performing models, such as R1, the company has disrupted the traditional AI ecosystem dominated by Western tech giants like OpenAI, Google, and Meta. By prioritizing efficiency and affordability, the company has challenged the notion that cutting-edge AI requires massive compute resources and exorbitant costs. It demonstrates that with the right optimization strategies, smaller teams can rival the performance of larger players.
Moreover, DeepSeek’s commitment to open-source development has amplified its influence. Open-source models lower the barriers to entry for smaller players and accelerate innovation across industries. This, however, comes at a cost: the erosion of margins for AI companies reliant on proprietary models. The economic pressure this creates could reshape the financial structures of AI development, shifting the focus from monetizing models to deriving value through applications and services built on top of them.
Geopolitical Impact: China’s advantage
If AI models turn into commodities, then companies worldwide will build their applications on those models. If China has the best one, it will control the base layer of global AI applications.
What gives China its edge is its skill at commoditizing goods and services. As the cost of deploying and utilizing advanced AI systems decreases, AI’s value increasingly shifts from proprietary models to infrastructure, data, and applications. This plays directly into China’s strengths. With its robust manufacturing ecosystem, lower labor costs, and well-established supply chains for hardware and components, China is uniquely positioned to capitalize on the commoditization trend.
It can undercut competitors on price while expanding its influence in global markets. This mirrors broader trends in industries like electronics and solar energy, where China has achieved dominance through scale and cost efficiency.
For the West, where AI companies often rely on high margins from proprietary technologies, this commoditization poses a challenge. It undermines the economic model that has historically fueled AI research and development, forcing companies to rethink their strategies to remain competitive. China’s ability to leverage cost structures and rapidly scale commoditized solutions could make it the dominant force in shaping AI’s global adoption and standards.
Market Reactions and Economic Implications
The current market turbulence reflects investor concerns about the future profitability of companies that have heavily invested in AI infrastructure. The realization that high-performing AI models can be developed at a fraction of the previously assumed costs challenges the anticipated return on investment for these firms. Consequently, there is a growing apprehension that the substantial capital expenditures by major tech companies may not yield the expected competitive advantages, leading to potential reevaluations of their investment strategies.
Implications
There is a popular refrain in the AI world that none of these AI companies truly have a moat (or defensibility) against competitors. The DeepSeek news has turbocharged those views. Plus, we now have to worry that China will be the place that leads AI innovation. There is a lot to think through; below are some interesting takes from very smart people.
Get ready to hear the phrase ‘Jevons paradox’ a lot:
If building AI models becomes cheaper, then it’s not so much that people will buy fewer chips and build fewer data centers, but rather that they will keep buying more, since they can now do more with their purchases.
This is the stance the big tech companies are taking. Last night, Microsoft CEO Satya Nadella tweeted a reference to the paradox.
The Jevons paradox is an economic principle that states that increased efficiency in using a resource can lead to increased overall consumption of that resource.
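To make that concrete, here is a toy back-of-the-envelope sketch in Python. Every number in it is made up, and the constant-elasticity demand curve is a textbook simplification rather than anyone’s forecast; the point is just that “10x cheaper per unit” can mean more total spend, not less:

```python
# Toy illustration of the Jevons paradox. All numbers are hypothetical.
def total_spend(cost_per_unit, demand_elasticity,
                baseline_cost=1.0, baseline_demand=100.0):
    """Spend = price x quantity, where quantity responds to price via a
    constant-elasticity demand curve (a textbook simplification)."""
    demand = baseline_demand * (cost_per_unit / baseline_cost) ** (-demand_elasticity)
    return cost_per_unit * demand

# A 10x efficiency gain means each unit of AI work is 10x cheaper.
before = total_spend(cost_per_unit=1.0, demand_elasticity=1.5)
after = total_spend(cost_per_unit=0.1, demand_elasticity=1.5)

print(f"Spend at old prices:  {before:,.0f}")  # 100
print(f"Spend at 10x cheaper: {after:,.0f}")   # ~316: MORE total spend
# If demand were less responsive (elasticity < 1), total spend would fall
# instead -- and that is exactly the open question for chips and data centers.
```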
Writer M.G. Siegler questions this stance:
Already, we’re seemingly getting a memo sent around to bring up Jevons paradox — the notion that an increase in efficiency leads to an increase in consumption. Translation: DeepSeek is great because it will lift all boats as AI can scale faster and further. Sure, at the highest level that’s undoubtedly true! But the details matter. Microsoft is about $80B deep in the weeds this year. The CapEx numbers that Nadella has been so busy touting may have just become an albatross around their neck. And if that’s the case, he’s lucky (and perhaps prescient) to offload OpenAI’s ‘Stargate’ Project spend to Oracle and others.
Meanwhile, where all of this leaves NVIDIA — the current king of the hill thanks to being the key company at the center of all of this spend — is either catastrophic or ultimately okay. The revenue ramp was already slowing, as it must given the law of large numbers, but if all of Big Tech decides to slash their CapEx at once… NVIDIA’s stock price may suffer a heart attack. And again, that would ripple through the entire market. But in the Jevons paradox equation above, NVIDIA would ultimately be fine as they’d undoubtedly remain a key provider of the underlying technology now being used at greater scale (albeit with lower individual entity spend).
The others in Big Tech will make a similar case: that all of the data centers built will still be needed as AI goes global, but the jury remains out in terms of technology depreciation in this world and future. One big question: as we shift from a world of pre-training to a world of inference, are the same servers and chips going to be just as useful/good versus racks built specifically for that purpose? And DeepSeek’s breakthroughs just made all of that even more complicated. And while yes, just as in past booms, the build outs ended up being crucial for the future, those that spent on those build-outs usually didn’t fare as well…
The real problem is that it won’t be so simple to simply pull back spend. Beyond a lot of it already being committed, there’s obviously still a very real risk that DeepSeek is just a blip on the radar and not the bomb that blows up everything. So none of Big Tech can really afford, quite literally, to let their feet off the gas just yet. Instead, they’re going to have to try to recreate and study the methods the group used to create their models and see how replicable and scalable it is.
Source: AI, Uh, Finds a Way
Speaking of NVIDIA: since we all own a ton of it, the great Ben Thompson wrote this about the company in his newsletter this morning.
I own Nvidia! Am I screwed?
There are real challenges this news presents to the Nvidia story. Nvidia has two big moats:
CUDA is the language of choice for anyone programming these models, and CUDA only works on Nvidia chips.
Nvidia has a massive lead in terms of its ability to combine multiple chips together into one large virtual GPU.
These two moats work together. I noted above that if DeepSeek had access to H100s they probably would have used a larger cluster to train their model, simply because that would have been the easier option; the fact they didn’t, and were bandwidth constrained, drove a lot of their decisions in terms of both model architecture and their training infrastructure. Just look at the U.S. labs: they haven’t spent much time on optimization because Nvidia has been aggressively shipping ever more capable systems that accommodate their needs. The route of least resistance has simply been to pay Nvidia. DeepSeek, however, just demonstrated that another route is available: heavy optimization can produce remarkable results on weaker hardware and with lower memory bandwidth; simply paying Nvidia more isn’t the only way to make better models.
That noted, there are three factors still in Nvidia’s favor. First, how capable might DeepSeek’s approach be if applied to H100s, or upcoming GB100s? Just because they found a more efficient way to use compute doesn’t mean that more compute wouldn’t be useful. Second, lower inference costs should, in the long run, drive greater usage.
Third, reasoning models like R1 derive their superior performance from using more compute. To the extent that increasing the power and capabilities of AI depend on more compute is the extent that Nvidia stands to benefit!
Still, it’s not all rosy. At a minimum DeepSeek’s efficiency and broad availability cast significant doubt on the most optimistic Nvidia growth story, at least in the near term. The payoffs from both model and infrastructure optimization also suggest there are significant gains to be had from exploring alternative approaches to inference in particular. For example, it might be much more plausible to run inference on a standalone AMD GPU, completely sidestepping AMD’s inferior chip-to-chip communications capability. Reasoning models also increase the payoff for inference-only chips that are even more specialized than Nvidia’s GPUs.
In short, Nvidia isn’t going anywhere; the Nvidia stock, however, is suddenly facing a lot more uncertainty that hasn’t been priced in. And that, by extension, is going to drag everyone down.
Source: DeepSeek FAQ
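A quick aside on Thompson’s third point, since it is the crux of the bull case: ‘reasoning’ models convert extra inference compute into better answers. One simple, well-studied way that works is sampling an answer several times and taking a majority vote. The sketch below is a toy simulation; the `ask_model` stub and its 60% accuracy are hypothetical, purely to show the mechanic:

```python
import random
from collections import Counter

def ask_model(question: str) -> str:
    """Hypothetical stand-in for one sampled answer from a reasoning model.
    Imagine a model that gets a hard question right 60% of the time."""
    return "correct" if random.random() < 0.6 else "wrong"

def ask_with_voting(question: str, samples: int) -> str:
    """Spend more inference compute: sample several answers, majority-vote."""
    answers = [ask_model(question) for _ in range(samples)]
    return Counter(answers).most_common(1)[0][0]

random.seed(0)
trials = 10_000
for samples in (1, 5, 25):  # more samples per question = more compute
    wins = sum(ask_with_voting("a hard question", samples) == "correct"
               for _ in range(trials))
    print(f"{samples:>2} samples -> {wins / trials:.0%} accuracy")
# Accuracy climbs as compute per question rises, which is why, if reasoning
# models win out, demand for inference hardware could grow rather than shrink.
```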
Did it really cost $6m?
Many people are also questioning the true cost of the model DeepSeek put out; the company says it trained the model for $6m. However, their own whitepaper, which is where the number comes from, says: “Note that the aforementioned costs include only the official training of DeepSeek-V3, excluding the costs associated with prior research and ablation experiments on architectures, algorithms, or data.”
These are big numbers to leave out, and the $6m figure should not be taken at face value, because no company can build a model end to end for that amount. The “prior research and ablation experiments on architectures, algorithms, or data” likely cost tens if not hundreds of millions of dollars.
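For what it’s worth, the headline number itself is simple rental arithmetic. The inputs below come from DeepSeek’s own whitepaper as I read it, so treat them as the company’s claims rather than independently verified facts:

```python
# The $6m headline is rental arithmetic from DeepSeek's V3 whitepaper.
h800_gpu_hours = 2_788_000  # total GPU-hours reported for the final V3 training run
rental_rate_usd = 2.00      # assumed market rate per H800 GPU-hour, per the paper

final_run_cost = h800_gpu_hours * rental_rate_usd
print(f"Final training run: ${final_run_cost / 1e6:.2f}M")  # ~$5.58M

# Everything the paper excludes -- salaries, prior research, failed runs,
# and buying the hardware itself -- is unknown, so any "true total" is a guess.
```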
There is also the question of the true number of NVIDIA chips DeepSeek used, with some saying the figure is closer to 50,000 NVIDIA Hopper GPUs, not the 10,000 cheaper, less capable A100s. Admitting they used those chips would be admitting they obtained them despite the sanctions imposed on China.
While these caveats are important to factor into any analysis, what DeepSeek has done is create an inflection point. Their way of training the model shows true innovation that will have major implications for every company in this space. If you can now run high-end AI models on less powerful devices, it opens up a new range of possibilities, allowing more to be done for less money. This could be why Apple’s stock has reacted positively so far to this development. You can see a future where you run high-end AI models on Apple devices, perhaps even some future iPhones.
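If you want a feel for what that future looks like, here is a minimal sketch of running one of the small distilled checkpoints DeepSeek published alongside R1 on a Mac, using Hugging Face’s transformers library. The exact model ID, its memory needs, and the “mps” (Apple-silicon GPU) backend are assumptions to verify on your own machine; this is an illustration, not a recommendation:

```python
# Minimal sketch of "high-end AI on a consumer device": a small distilled
# DeepSeek-R1 checkpoint running locally via Hugging Face transformers.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B"  # small distilled variant
device = "mps" if torch.backends.mps.is_available() else "cpu"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.float16  # half precision to fit in laptop memory
).to(device)

prompt = "In one sentence, what is the Jevons paradox?"
inputs = tokenizer(prompt, return_tensors="pt").to(device)
outputs = model.generate(**inputs, max_new_tokens=100)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```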
One caveat to note: DeepSeek’s model used advanced techniques like data-efficient training, rigorous parameter tuning, and extensive manual intervention to adapt to the constraints of less powerful hardware. While this approach drastically reduced costs, it also introduces significant scalability challenges. The reliance on bespoke optimizations means that reproducing this method, or scaling it to train larger models and handle diverse tasks, would require a high level of expertise and potentially an unsustainable amount of manual effort. In contrast, US labs like OpenAI and Google have developed highly automated pipelines that rely on expansive infrastructure and powerful hardware, enabling them to scale up model training efficiently. How much this matters remains to be seen, but it should not be discounted.
So… Where Does This Leave Us?
I’ve rarely seen so many thoughtful posts and articles come out about a tech development so quickly. It is all very head-spinning. But if I take a step back, I cannot shake the suspicion that not much has truly changed for the long haul, except that we may now be racing toward what once seemed impossible at a faster pace. Companies like OpenAI and Microsoft have always spent billions without a clear short-term return because they’re betting on building a “godlike” intelligence, or what they call artificial superintelligence (ASI), capable of reaping an enormous share of global economic value. This latest breakthrough could speed that process along (if it is indeed possible at all).
Now, in the short term, will there be significant impacts on the stocks of infrastructure companies like NVIDIA? Maybe. I would not be surprised if the stock trends down quite a bit over the next couple of quarters, but if you were bullish on AI last month, this shouldn’t shake your belief in its impact. And if that is the case, then AI will need all the fuel, infrastructure, and compute it can get. In fact, this development is going to accelerate that need in the long term. So I personally would not be selling any NVIDIA today.
As for smaller AI companies and startups, this only speeds them along the path they were already on. For now they may benefit from the lowered costs of training models, but the further commoditization of the model layer will turn them into something like “CPG” (consumer-packaged goods) companies more quickly. The savings from spending less on compute will go toward branding and distribution, much as the large CPG companies operate today. They are going to be selling a commodity, and that comes with a host of new challenges. However, I believe this was the case even before this past week’s developments; it’s just going to happen faster now.
One of the bigger issues this new model has raised is the prospect of a China-based open-source model becoming the default foundation on which global companies build their AI applications, and that is something the US needs to contend with. We have our own open-source models (like Meta’s Llama), but those are not as cheap as DeepSeek’s, and that is an issue. We have already seen what commoditization in manufacturing has done to our ability to produce vital equipment and goods. If China does the same in AI, the consequences could be serious.
So a lot has changed, but maybe not that much has; that is my current take. Of course, that is not what is going to drive views and attention, so we are going to hear a lot about this development for a while. I say let’s take a deep breath and see what the next few months bring before completely altering our view of where things are headed in the long run. It promises to be wild (and likely painful from a stock-price standpoint for some) as we sort through all of it.
More by me: faraazahmed.com