Links for 2025-01-28

Alexander Kruel

Jan 28, 2025

AI:

Towards General-Purpose Model-Free Reinforcement Learning https://arxiv.org/abs/2501.16142
Eagle 2: Building Post-Training Data Strategies from Scratch for Frontier Vision-Language Models https://arxiv.org/abs/2501.14818
Qwen2.5-Max: Exploring the Intelligence of Large-scale MoE Model https://qwenlm.github.io/blog/qwen2.5-max/
Mixture-of-Mamba: Enhancing Multi-Modal State-Space Models with Modality-Aware Sparsity — Mixture-of-Mamba reaches similar loss with just half of the FLOPs https://arxiv.org/abs/2501.16295
Demis Hassabis on how AI can revolutionize scientific discovery, and why we could be 5-10 years away from AGI. https://www.youtube.com/live/ICv03VysLaE?si=r44hKGCPDZKlVWAK&t=1097

AI politics:

Dario Amodei says 2026-2027 is the critical window in AI: whoever is ahead then will have models that start getting better than humans at everything, including AI design and using AI to make better AI. He therefore argues that export controls that keep DeepSeek from catching up with US companies are worth continuing https://youtu.be/uvMolVW_2v0?si=SootHOAoaRGugtYK&t=864
President Trump says China's DeepSeek AI model is a "wake-up call" for American companies, but that the development of faster and cheaper AI methods is a good thing https://www.youtube.com/live/AitXub2TE5s?si=IWjs-xpm6xAocfaE&t=3218
Trump To Tariff Chips Made In Taiwan, Targeting TSMC https://uk.pcmag.com/computers-electronics/156458/trump-to-tariff-chips-made-in-taiwan-targeting-tsmc
Former NSA chief revolves through OpenAI's door https://responsiblestatecraft.org/former-nsa-chief-revolves-through-openai-s-door/
ChatGPT Gov is designed to streamline government agencies’ access to OpenAI’s frontier models. https://openai.com/global-affairs/introducing-chatgpt-gov/

https://x.com/sama/status/1884066337103962416

Yet another safety researcher has left OpenAI. https://x.com/sjgadler/status/1883928200029602236

DeepSeek R1 and Compute Spending

Epistemic status: Directionally correct. Hashing out robust and directly comparable estimates would require more effort than I have time for.

Is the increased efficiency of DeepSeek R1’s training and inference an unexpected breakthrough that calls into question massive spending on computing power? No. The cost of training and running large language models (LLMs) has been declining for years, largely due to rapid improvements in algorithms.

For instance, a recent experiment showed that GPT-2 could be trained in about 24 hours for only $672 on an 8×H100 GPU node—significantly less than the roughly $50,000 it cost to train in 2019. The cost of running LLMs has also dropped dramatically: in 2021, GPT-3 (then the only model achieving an MMLU score of 42) cost $60 per million tokens, whereas today, Llama 3.2 3B achieves the same score at $0.06 per million tokens, a 1,000-fold reduction in just three years. Even more striking, the cost of GPT-4-level intelligence has fallen 1,000× in the last 1.5 years.

Such algorithmic progress has been underway for decades. Take, for example, one family of algorithms designed to solve the maximum subarray problem: between 1970 and today, the time required to solve this problem for n = 10^6 inputs has declined by a factor of about 10^12—a reduction of 99.9999999999%. With today’s algorithms, an average 1994 desktop computer would have beaten the world chess champion.
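To make the maximum subarray example concrete, here is a minimal Python sketch. It assumes the family of algorithms in question runs from a cubic-time brute force to a linear-time method such as Kadane's algorithm; the text above does not name the algorithms, so treat both functions as illustrative. Counting only the growth in operations (and ignoring hardware and constant factors), moving from n^3 to n steps at n = 10^6 gives a factor of 10^12.

```python
from typing import Sequence


def max_subarray_cubic(xs: Sequence[float]) -> float:
    """Brute force: try every (start, end) pair and re-sum each slice, O(n^3)."""
    best = xs[0]
    for i in range(len(xs)):
        for j in range(i, len(xs)):
            best = max(best, sum(xs[i:j + 1]))
    return best


def max_subarray_linear(xs: Sequence[float]) -> float:
    """Kadane's algorithm: a single pass over the input, O(n)."""
    best = current = xs[0]
    for x in xs[1:]:
        # Either extend the running subarray or start a new one at x.
        current = max(x, current + x)
        best = max(best, current)
    return best


data = [-2, 1, -3, 4, -1, 2, 1, -5, 4]
# Both approaches agree on the answer (the subarray 4, -1, 2, 1).
assert max_subarray_cubic(data) == max_subarray_linear(data) == 6

# Rough operation-count ratio for n = 10^6, ignoring constant factors.
n = 10**6
speedup = n**3 // n  # 10^12
```

Both functions assume a non-empty input. The brute force is only practical for tiny inputs, which is the point of the comparison.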

Owing to advances like these, the level of compute required to achieve a given AI performance level has been estimated to halve roughly every eight months. In this context, DeepSeek’s R1 delivering a 27× improvement is a natural progression rather than a paradigm shift, especially since optimization tends to proceed much faster than average in very new technologies such as reasoning models.
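As a rough sense of scale, the eight-month halving time can be turned into numbers with a few lines of Python. This is a back-of-the-envelope sketch that assumes a smooth exponential trend; the function names are hypothetical, and only the eight-month and 27× figures come from the text.

```python
import math

# From the text: compute needed for a given performance level halves
# roughly every eight months.
HALVING_MONTHS = 8.0


def months_of_progress(improvement: float, halving_months: float = HALVING_MONTHS) -> float:
    """Months of average-rate progress that would yield the given efficiency gain."""
    return math.log2(improvement) * halving_months


def gain_after(months: float, halving_months: float = HALVING_MONTHS) -> float:
    """Efficiency gain accumulated after the given number of months."""
    return 2.0 ** (months / halving_months)


r1_equivalent = months_of_progress(27.0)  # about 38 months
yearly_gain = gain_after(12.0)            # about 2.8x per year
```

At the average rate, a 27× gain corresponds to a little over three years of progress, which is why the faster-than-average optimization of very new technologies carries weight in the argument.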

Despite this, the demand for computing power has grown at a phenomenal rate. Experts estimate that the computational power required for AI doubles every 100 days, which compounds to growth of roughly 12.6× per year (an increase of about 1,155%).
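Converting a doubling time into an annual growth rate is a one-line calculation. The sketch below assumes steady exponential growth and a 365-day year; the function name is hypothetical.

```python
def annual_growth_factor(doubling_days: float, days_per_year: float = 365.0) -> float:
    """Growth factor over one year for a quantity that doubles every `doubling_days`."""
    return 2.0 ** (days_per_year / doubling_days)


factor = annual_growth_factor(100.0)       # about 12.6x per year
percent_increase = (factor - 1.0) * 100.0  # about 1,155%
```

A 100-day doubling time means 3.65 doublings per year, so demand grows far faster than the roughly 2.8× yearly efficiency gain implied by an eight-month halving time.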

In summary, anyone closely following AI developments has already factored in algorithmic and hardware optimizations. Thus, it was expected that a model like R1, emerging a few months after o1, would be cheaper to train and run. A more telling comparison would be between R1 and the forthcoming o3-mini.

P.S. While DeepSeek R1’s training cost is remarkably low, it’s important to consider the broader research and development context. The combined R&D budget for DeepSeek V3 and R1 is estimated to be around $100 million, underscoring the substantial resources behind this achievement.

https://x.com/sebkrier/status/1884270969486991499

Miscellaneous:

Images show China building huge fusion research facility, analysts say https://www.reuters.com/world/china/images-show-china-building-huge-fusion-research-facility-analysts-say-2025-01-28/ [no paywall: https://archive.is/pKovf]

Ukraine:

Russia's Ryazan oil refinery halts operations after drone strikes, sources say. Ryazan oil refinery processed 13.1 million metric tons (262,000 barrels per day), or almost 5% of Russia's total refining throughput in 2024. https://www.reuters.com/world/europe/russias-ryzan-oil-refinery-halts-operations-after-drone-strikes-sources-say-2025-01-27/ [no paywall: https://archive.is/TRbVb]
US and Germany foiled Russian plot to assassinate CEO of arms manufacturer sending weapons to Ukraine https://edition.cnn.com/2024/07/11/politics/us-germany-foiled-russian-assassination-plot/index.html

M Flood

Jan 29, 2025

Completely agree with your analysis of DeepSeek R1 and Compute Spending. What DeepSeek's achievement indicates is that there is as yet no moat for AI labs: everything they accomplish can, a few months or years later, be replicated. It's a common story across technology: it takes lots of effort and experimentation to get the next advance, but making the newly discovered advance more efficient is easier. In Peter Thiel's terms, it's more effortful to go from 0 to 1 than from 1 to N.

I'm expecting this quick catch-up trend to start changing in the next 2-3 years as labs start being able to compound their gains: using AI to research AI. Then whoever has the most compute will begin to pull ahead of the pack, able to run bigger experiments, and more of them, in parallel to get to the next advance, and the ones after that. That's where the export controls are going to be invaluable.

Axis of Ordinary
