- This was an interesting article and gave me a new perspective from a source I believe to be reliable. That said, it is important to note that Dario Amodei is a co-founder and the CEO of Anthropic, a company in direct competition with DeepSeek.
"Export controls serve a vital purpose: keeping democratic nations at the forefront of AI development."
Three Dynamics of AI Development
- Scaling Laws: "all else equal, scaling up the training of AI systems leads to smoothly better results on a range of cognitive tasks, across the board"
- Essentially, more data and more training consistently lead to stronger models. For example, a $10M model might solve 40% of coding tasks, a $100M model might solve 60%, and so on.
- "It's worth noting that the scaling curve analysis is a bit oversimplified, because models are somewhat differentiated and have different strengths and weaknesses; the scaling curve numbers are a crude average that ignores a lot of details."
- Shifting the Curve: Improvements to a model's architecture (for example, a tweak to the Transformer architecture) can shift the scaling law curve or help a model run more efficiently on existing hardware. Essentially, innovation in hardware or model architecture shifts the curve: "if the innovation is a 2x 'compute multiplier' (CM), then it allows you to get 40% on a coding task for $5M instead of $10M; or 60% for $50M instead of $100M" (see the sketch after this list).
- "Every frontier AI company regularly discovers many of these CMs: frequently small ones (~1.2x), sometimes medium-sized ones (~2x), and every once in a while very large ones (~10x). Because the value of having a more intelligent system is so high, this shifting of the curve typically causes companies to spend more, not less, on training models: the gains in cost efficiency end up entirely devoted to training smarter models, limited only by the company's financial resources."
- Essentially, innovations make a given level of AI capability cheaper, but overall more money is still spent, because AI is not a single product of fixed quality: the savings are reinvested in training smarter models.
- Incidentally, in AP Microeconomics and AP Macroeconomics terms, this is essentially a technology shift, which shifts the Production Possibilities Curve outward and the Short-Run Aggregate Supply Curve rightward.
- Shifting the Paradigm: Occasionally, the underlying thing that is being scaled also changes. For example, "from 2020-2023, the main thing being scaled was pretrained models: models trained on increasing amounts of internet text with a tiny bit of other training on top. In 2024, the idea of using Reinforcement Learning (RL) to train models to generate chains of thought has become a new focus of scaling"
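To make the scaling-law and compute-multiplier dynamics above concrete, here is a minimal Python sketch, assuming a toy log-linear curve calibrated to the $10M -> 40%, $100M -> 60% example. The function name, the +20-points-per-10x slope, and all numbers are illustrative assumptions, not figures from the essay.

```python
# Toy illustration (my own assumptions, not from the essay): a
# log-linear scaling curve where each 10x increase in training spend
# adds ~20 points on a hypothetical coding benchmark.
import math

def benchmark_score(spend_usd: float, compute_multiplier: float = 1.0) -> float:
    """Score (%) as a smooth function of effective training compute.

    A compute multiplier (CM) makes every dollar count for CM dollars,
    which shifts the whole curve left rather than changing its shape.
    """
    effective_spend = spend_usd * compute_multiplier
    # Calibration: 40% at $10M, +20 points per 10x spend.
    score = 40 + 20 * math.log10(effective_spend / 10_000_000)
    return max(0.0, min(100.0, score))

for spend in (5e6, 10e6, 50e6, 100e6):
    base = benchmark_score(spend)
    with_cm = benchmark_score(spend, compute_multiplier=2.0)
    print(f"${spend/1e6:>5.0f}M  baseline: {base:4.1f}%   with 2x CM: {with_cm:4.1f}%")
```

Note that a 2x CM gets you 40% at $5M and 60% at $50M, matching the quote: the curve shifts, and the rational response is to keep spending and pocket the capability gain instead of the savings.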
DeepSeek-V3
"DeepSeek-V3 was the real innovation and what should have made people take notice a month ago (we certainly did). As a pretrained model, it appears to come close to the performance of state of the art US models on some important tasks, while costing substantially less to train"
However, "DeepSeek does not do for $6M what costs US AI companies billions"; rather, "DeepSeek produced a model close to the performance of US models 7-10 months older, for a good deal less cost (but not anywhere near the ratios people have suggested)". Essentially, DeepSeek's efficiency gains fit the trend of scaling laws and what US companies have been achieving for the last few years, and "DeepSeek-V3 is not a unique breakthrough or something that fundamentally changes the economics of LLMs; it's an expected point on an ongoing cost reduction curve."
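As a back-of-envelope check on "an expected point on an ongoing cost reduction curve": if the cost curve shifts roughly 4x per year (the essay's rough estimate, and itself an assumption here), then matching models 7-10 months older should come with a sizable expected discount from trend alone. The snippet below is my own arithmetic under that assumed rate.

```python
# My own back-of-envelope arithmetic, assuming a ~4x/year
# cost-reduction trend (the rate itself is a rough estimate).
for months in (7, 10):
    expected_reduction = 4 ** (months / 12)  # compound the yearly 4x
    print(f"{months} months on a ~4x/year curve -> ~{expected_reduction:.1f}x cheaper")
```

Under that assumption, a roughly 2-3x cost advantage over models 7-10 months older is what the trend predicts, which is why the headline ratios people quoted looked out of line.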
"What's different this time is that the company that was first to demonstrate the expected cost reductions was Chinese."
DeepSeek-R1
R1 was the model that triggered an explosion of public attention, but it is apparently much less interesting from an innovation or engineering perspective, especially compared to V3. "It adds the second phase of training - reinforcement learning, described in #3 in the previous section - and essentially replicates what OpenAI has done with o1"
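For intuition about what this "second phase" means, here is a minimal, hypothetical REINFORCE-style sketch (my own toy, not DeepSeek's or OpenAI's actual pipeline): start from an imperfect "pretrained" policy and reinforce sampled outputs that earn an outcome-based reward, such as a verifiably correct final answer.

```python
# Hypothetical toy (my own, not the real R1/o1 pipeline): a policy
# picks 1 of 4 candidate answers per problem; index 0 is correct.
# RL nudges the policy toward answers that earn reward.
import numpy as np

rng = np.random.default_rng(0)
logits = rng.normal(size=4)  # stand-in for an imperfect pretrained policy

def softmax(x):
    z = np.exp(x - x.max())
    return z / z.sum()

lr = 0.5
for step in range(200):
    probs = softmax(logits)
    action = rng.choice(4, p=probs)        # sample a "response"
    reward = 1.0 if action == 0 else 0.0   # outcome-based reward
    baseline = probs[0]                    # expected reward, reduces variance
    grad = -probs                          # d/dlogits of log P(action)
    grad[action] += 1.0                    # = onehot(action) - probs
    logits += lr * (reward - baseline) * grad

print("P(correct answer) after RL:", softmax(logits)[0])
```

The real systems operate over long token sequences (chains of thought) with learned or verifiable rewards, but the core loop is the same idea: sample, score the outcome, reinforce what scored well.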
Export Controls
"To the extent that US labs haven't already discovered them, the efficiency innovations DeepSeek developed will soon be applied by both US and Chinese labs to train multi-billion dollar models. These will perform better than the multi-billion models they were previously planning to train - but they'll still spend multi-billions. That number will continue going up, until we reach AI that is smarter than almost all humans at almost all things."
- This ties back to the curve-shifting dynamic described earlier: any money saved through efficiency gains is reinvested in training newer, smarter models.
In the US, many companies will have the millions of chips required for AI research and development. If China gets the same, we'll live in a "bipolar world" where both countries see rapid advances in science and technology, but it will still likely be unbalanced. "Even if the US and China were at parity in AI systems, it seems likely that China could direct more talent, capital, and focus to military applications of the technology. Combined with its large industrial base and military-strategic advantages, this could help China take a commanding lead on the global stage, not just for AI but for everything."
"If China can't get millions of chips, we'll (at least temporarily) live in a unipolar world, where only the US and its allies have these models. It's unclear whether the unipolar world will last, but there's at least the possibility that, because AI systems can eventually help make even smarter AI systems, a temporary lead could be parlayed into a durable advantage. Thus, in this world, the US and its allies might take a commanding and long-lasting lead on the global stage."
- Amodei champions well-enforced export controls on outgoing chips to help maintain this unipolar world: "It appears that a substantial fraction of DeepSeek's AI chip fleet consists of chips that haven't been banned (but should be); chips that were shipped before they were banned; and some that seem very likely to have been smuggled. This shows that the export controls are actually working and adapting: loopholes are being closed; otherwise, they would likely have a full fleet of top-of-the-line H100s. If we can close them fast enough, we may be able to prevent China from getting millions of chips, increasing the likelihood of a unipolar world with the US ahead."
"Given my focus on export controls and US national security, I want to be clear on one thing. I don't see DeepSeek themselves as adversaries and the point isn't to target them in particular. In interviews they've done, they seem like smart, curious researchers who just want to make useful technology."
"But they're beholden to an authoritarian government that has committed human rights violations, has behaved aggressively on the world stage, and will be far more unfettered in these actions if they're able to match the US in AI. Export controls are one of our most powerful tools for preventing this, and the idea that the technology getting more powerful, having more bang for the buck, is a reason to lift our export controls makes no sense at all."
Also wrote about Dario Amodei in my notes on "Dario Amodei - Lex Fridman Podcast".