Links for 2025-01-21

Alexander Kruel

Jan 21, 2025

DeepSeek-R1: Base → RL → Finetune → RL → Finetune → RL

DeepSeek-R1 is here. Performance on par with OpenAI-o1. Fully open-source.

Technical report: https://github.com/deepseek-ai/DeepSeek-R1/blob/main/DeepSeek_R1.pdf

Try it today for free: http://chat.deepseek.com

The coolest part of DeepSeek-R1: Thinking time, self-reflection, and exploration behaviors are emergent properties.

The model learns on its own, through RL, that thinking longer makes it smarter.

“This moment is not only an "aha moment" for the model but also for the researchers observing its behavior.”

Also: R1 distilled into Qwen 1.5B beats Sonnet and GPT-4o on some math benchmarks.

AI:

Kimi k1.5: Scaling Reinforcement Learning with LLMs — An o1-level multi-modal model https://github.com/MoonshotAI/Kimi-k1.5/blob/main/Kimi_k1.5.pdf
Evolutionary search works outside of formally verifiable domains: Google presents Evolving Deeper LLM Thinking. The authors aim to develop a method that enables LLMs to improve their reasoning by leveraging inference-time compute without requiring explicit problem formalization. This approach is particularly useful in scenarios where a solution evaluator is available, but the problem cannot easily be described formally. This process mirrors divergent thinking (exploring many ideas) and convergent thinking (selecting the best ideas), akin to intelligent problem-solving behavior. https://arxiv.org/abs/2501.09891
Self-playing Adversarial Language Game Enhances LLM Reasoning https://arxiv.org/abs/2404.10642
Inference Magazine — a new publication on AI progress. https://inferencemagazine.substack.com/
Five Recent AI Tutoring Studies https://www.lesswrong.com/posts/bs3yj8vLDKNnoa95m/five-recent-ai-tutoring-studies
DeepMind Expects Clinical Trials for AI-Designed Drugs This Year https://www.bloomberg.com/news/articles/2025-01-21/deepmind-expects-clinical-trials-for-ai-designed-drugs-this-year [no paywall: https://archive.is/EmZWZ]
Anthropic CEO Says AI Could Surpass Human Intelligence by 2027 — he wouldn't be surprised if Anthropic had more than 1 million chips powering its AI technology in 2026. https://www.wsj.com/livecoverage/stock-market-today-dow-sp500-nasdaq-live-01-21-2025/card/anthropic-ceo-says-ai-could-surpass-human-intelligence-by-2027-9tka9tjLKLalkXX8IgKA [no paywall: https://archive.is/n6Vya]
Diving into the Underlying Rules or Abstractions in o3's 34 ARC-AGI Failures https://substack.com/home/post/p-154931348
Infrastructure for AI Agents https://arxiv.org/abs/2501.10114
“In chain-of-thought, we're collapsing that beautiful many-dimensional vector and all its semantic meaning down into a single token after every forward pass. Why should LLMs have to squeeze their thoughts through a narrow channel of tokens? Why not use a continuous latent space for reasoning?” https://www.lesswrong.com/posts/D2Aa25eaEhdBNeEEy/worries-about-latent-reasoning-in-llms
GameFactory: Creating New Games with Generative Interactive Videos https://arxiv.org/abs/2501.08325
Do generative video models learn physical principles from watching videos? — “While we expect rapid advances ahead, our work demonstrates that visual realism does not imply physical understanding.” https://arxiv.org/abs/2501.09038
Inference Scaling and the Log-x Chart https://www.tobyord.com/writing/inference-scaling-and-the-log-x-chart
“We made a mistake in not being more transparent about OpenAI's involvement.” OpenAI was, it seems, the major funder of the FrontierMath benchmark and apparently had some of the problems and solutions during training (it's not clear how many, and they say they didn't use them for training). https://www.lesswrong.com/posts/8ZgLYwBmB3vLavjKE/broader-implications-of-the-frontiermath-debacle

https://x.com/sama/status/1881258443669172470

Just assume that by end of 2025, at minimum, we'll have multimodal agentic phd AIs doing complex tasks for you on your computer at o1-mini speeds. I highly recommend people update their worldviews and company plans to assume this. This is not hype.

— Will Bryk, https://x.com/WilliamBryk/status/1881397292034654439

In the very big scheme of things and of course notwithstanding a lot of hard work in the past and near future, solving AI has turned out to be very conceptually simple and very computationally cheap.

— Miles Brundage, https://x.com/Miles_Brundage/status/1881377489169305704

https://x.com/polynoamial/status/1881039073558806617

https://x.com/nabla_theta/status/1881152615180046371

Brains and AI:

Humans and AI systems end up representing some stuff in remarkably similar ways https://www.biorxiv.org/content/10.1101/2024.12.26.629294v1
In some ways, the brain and language models process language in surprisingly similar ways https://www.nature.com/articles/s41467-024-46631-y
Predicting Human Brain States with Transformer https://arxiv.org/abs/2412.19814
Can AI Models Show Us How People Learn? Impossible Languages Point a Way. https://www.quantamagazine.org/can-ai-models-show-us-how-people-learn-impossible-languages-point-a-way-20250113/

Tech:

This MIT spinout wants to spool hair-thin fibers into patients’ brains https://techcrunch.com/2025/01/16/this-mit-spinout-wants-to-spool-hair-thin-fibers-into-patients-brains/
How to trick the immune system into attacking tumours https://www.nature.com/articles/d41586-025-00126-y [no paywall: https://archive.is/C8Fmt]
MIT’s Latest Bug Robot Is a Super Flyer. It Could One Day Help Bees Pollinate Crops. https://singularityhub.com/2025/01/17/mits-latest-bug-robot-is-a-super-flyer-it-could-one-day-help-bees-pollinate-crops/
A BCI controlling virtual fingers is used to fly a high-performance virtual quadcopter. BCIs could allow people with paralysis to eventually play multiplayer video games with gamers who use video game controllers. https://www.nature.com/articles/s41591-024-03341-8

Math:

1. A Lottery Drawing Included Four Consecutive Numbers. What Are the Odds? https://www.nytimes.com/2024/12/19/us/lottery-mega-millions-four-numbers.html [no paywall: https://archive.is/viwjX]

2. OpenAI o1 and DeepSeek DeepThink get this right, while Gemini 2.0 Flash Thinking Experimental and QwQ-32B-Preview both get it wrong.

As always, repeat the original game many times. Focus only on the subset for which the host reveals the ordinary goat. This subset isn't special with respect to where the car is. What happens if you don't switch? In 2/3 of the cases, you don't win the car. This is what you want.

o1 solution: https://chatgpt.com/share/678eb743-2e18-800c-8112-16fa8195637d
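
The "repeat the game many times and look only at the relevant subset" argument can be checked with a short simulation. This is only a sketch under assumptions that are not stated in the post: the classic three-door rules, two distinguishable goats (labelled "ordinary goat" and "other goat" here), and a host who always opens an unchosen goat door, choosing at random when both unchosen doors hide goats. The function name and parameters are illustrative.

```python
import random


def simulate(trials=200_000, seed=0):
    """Repeat the game and keep only runs where the host reveals the ordinary goat.

    Assumed rules (not taken from the post): one car and two distinguishable
    goats; the host always opens an unchosen door hiding a goat, picking
    uniformly at random when both unchosen doors hide goats.
    """
    rng = random.Random(seed)
    kept = 0             # games in which the host revealed the ordinary goat
    no_car_if_stay = 0   # of those, games where the first pick was not the car
    for _ in range(trials):
        doors = ["car", "ordinary goat", "other goat"]
        rng.shuffle(doors)
        pick = rng.randrange(3)
        # Doors the host is allowed to open: not the pick, not the car.
        options = [i for i in range(3) if i != pick and doors[i] != "car"]
        opened = rng.choice(options)
        if doors[opened] != "ordinary goat":
            continue  # outside the subset we are conditioning on
        kept += 1
        if doors[pick] != "car":
            no_car_if_stay += 1
    return kept, no_car_if_stay / kept


kept, frac = simulate()
print(f"games kept: {kept}, share without the car when staying: {frac:.3f}")
```

Under these assumptions the printed share should land close to 2/3, matching the argument above: conditioning on which goat was revealed does not change where the car is likely to be.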

3. The Wednesday Sleeping Beauty Problem https://www.umsu.de/blog/2025/813

4. Logits, log-odds, and loss for parallel circuits https://www.lesswrong.com/posts/xFA2kstHifF9F2Fnm/logits-log-odds-and-loss-for-parallel-circuits

5. Probability theory as a physical theory points to superdeterminism https://arxiv.org/abs/1811.10992

Short SF:

The Gentle Romance, a story about living through the transition to utopia. https://press.asimov.com/articles/gentle-romance
Why-chain https://aleph.se/andart2/fiction/why-chain/

Politics:

IQ and Taleb's Wild Ride https://hereticalinsights.substack.com/p/iq-and-talebs-wild-ride
Chinese shipbuilder Hengli Heavy Industry is looking to hire 30,000 workers, building to 50,000, as it launches a new shipyard, completed in just 153 days on Changxing Island, spanning over 2m square metres of building space across 17 large workshops. https://www.lloydslist.com/LL1152261/Hengli-Heavy-looks-to-hire-30000-for-new-China-shipyard

Axis of Ordinary
