Links for 2026-01-31
AI
Are We in a Continual Learning Overhang? https://www.lesswrong.com/posts/Lby4gMvKcLPoozHfg/are-we-in-a-continual-learning-overhang-1
NVIDIA used AI agents to build an entire deep learning framework from scratch. Instead of writing the code themselves, the human researchers acted like managers: they gave high-level instructions to “AI coding agents” and set up automated tests to check the work. The agents wrote all the complex code, fixed their own errors when tests failed, and connected the different parts of the system together. https://arxiv.org/abs/2601.16238
The largest randomized trial of medical AI (Sweden; ~100k women): Compared AI + 1 radiologist (with triage to double-read when needed) vs standard double reading by 2 radiologists. Results: ~29% more cancers detected at screening, ~44% lower reading workload, with similar false-positive rates. Follow-up (2 years): ~12% fewer interval cancers, and the interval cancers that did occur were less often invasive/large/aggressive. https://www.eurekalert.org/news-releases/1114399
AlphaGenome author roundtable https://www.youtube.com/watch?v=V8lhUqKqzUc
Project Genie: Create and Explore Worlds https://www.youtube.com/watch?v=Ow0W3WlJxRY
Using Interpretability to Identify a Novel Class of Alzheimer’s Biomarkers https://www.goodfire.ai/research/interpretability-for-alzheimers-detection
ARES: Open-Source Infrastructure for Online RL on Coding Agents https://withmartian.com/post/ares-open-source-infrastructure-for-online-rl-on-coding-agents
Migrating critical systems to Safe Rust with reliable agents https://asari.ai/blog/migrating-c-to-rust
Emily Riehl — The future of mathematics | Math, Inc. https://www.youtube.com/watch?v=AJfoqKDenpw
Building AIs that do human-like philosophy https://www.lesswrong.com/posts/zFZHHnLez6k8ykxpu/building-ais-that-do-human-like-philosophy
Shaping capabilities with token-level data filtering https://arxiv.org/abs/2601.21571
Claude Code enables syntopic reading across multiple books simultaneously. Pieter Maes built a system where Claude analyzes themes across entire libraries, comparing arguments between books in real-time conversations. https://pieterma.es/syntopic-reading-claude/
The first AI-planned drive on another planet. https://www.anthropic.com/features/claude-on-mars

AI R&D automation
AI labs are already using their own frontier models to help build the next generation of models. This could become a source of major strategic surprise.
Some concrete examples of AI R&D automation feeding back into itself:
Compute (hardware): AI is helping design the hardware that runs AI. AlphaChip (Google/DeepMind) has produced superhuman chip layouts deployed across multiple generations of Google TPUs.
Algorithms (efficiency): AlphaEvolve (Google/DeepMind) has discovered/improved algorithms and has been used to recover ~0.7% of Google’s global compute via better data-center scheduling and related optimizations. This amounts to real operational leverage that can feed back into faster/cheaper AI development.
Coding (engineering): Anthropic uses Claude Code to map its entire internal infrastructure. It doesn’t just write code: it reads the codebase to explain complex data pipelines and traces control flow during security incidents, boosting resolution speed by 3x. This effectively turns their engineering logs into a rich dataset of agent trajectories for training future models (see the “Data” item below).
World models (training environments): AI models are increasingly used to generate reinforcement-learning environments for training and experimentation. DeepMind’s Genie line is a clear example: endless playable, interactive worlds that can provide a never-ending curriculum for agents.
Data (verified synthetic signal): We are seeing ‘virtual gold panning’ at scale. Labs are turning raw compute into high-quality intelligence by generating massive amounts of synthetic data and filtering it through strong verifiers. This creates a self-reinforcing loop. The model performs work (like generating code), the output is rigorously verified (by tests or other models), and the winning “reason-act-observe” traces (the gold nuggets) become the training data for the next generation.
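To make that loop concrete, here is a minimal sketch of the generate-verify-filter pattern in Python. Everything task-specific is a hypothetical placeholder: the generate_solutions sampler, the task.prompt/task.tests fields, and the choice of unit tests as the verifier. Production pipelines layer deduplication, difficulty filtering, and reward models on top.

```python
import os
import subprocess
import sys
import tempfile

def passes_tests(candidate_code: str, test_code: str) -> bool:
    """Verifier: run the candidate against its unit tests in a fresh
    process; only candidates that exit cleanly count as verified."""
    with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as f:
        f.write(candidate_code + "\n\n" + test_code)
        path = f.name
    try:
        result = subprocess.run([sys.executable, path],
                                capture_output=True, timeout=30)
        return result.returncode == 0
    except subprocess.TimeoutExpired:
        return False  # hangs are discarded along with failures
    finally:
        os.unlink(path)

def mine_training_data(tasks, generate_solutions, samples_per_task=64):
    """'Virtual gold panning': sample many candidate solutions per task
    and keep only the verified traces as training data for the next model."""
    dataset = []
    for task in tasks:
        for code in generate_solutions(task.prompt, n=samples_per_task):
            if passes_tests(code, task.tests):
                dataset.append({"prompt": task.prompt, "completion": code})
    return dataset
```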
It’s easy to find gotchas and limitations. And that’s exactly why this is tricky to reason about. We may be in a state of deep unobservability. Outsiders see AI agents struggling to keep a high-level view of a codebase and conclude “bottleneck,” while insiders see a temporary tooling/workflow problem on an exponential curve. By the time the signal is obvious to everyone, the feedback loop may already be well underway. And because this space moves so fast, looking only at today’s demos and papers is misleading. We have to think about where we’ll be two papers down the line, not just where we are now.
References:
https://deepmind.google/blog/how-alphachip-transformed-computer-chip-design/
https://deepmind.google/blog/genie-3-a-new-frontier-for-world-models/
https://blogs.microsoft.com/blog/2026/01/26/maia-200-the-ai-accelerator-built-for-inference/
Waymo safety
It would be a shame if critics managed to sabotage this awesome technology the way they did with nuclear power, discrediting it with endless propaganda and lies.
Waymo is, on average, safer than human drivers:
Any-injury-reported crashes: 79% lower
Airbag-deployment crashes (a severity proxy): 81% lower
Police-reported crashed-vehicle rate: 55% lower
Property-damage liability claims: 88% lower
Bodily-injury liability claims: 92% lower
Artificial intuition
A quick intuition on why AI labs see no fundamental hurdles on the path to superintelligence.
First of all, even if the skeptics were right that LLMs are just “statistical pattern matchers” that blend up the internet, this sort of recombination of existing knowledge can be extremely powerful. A vast amount of human progress comes from simply connecting two previously unrelated facts, like combining steam engines with rails to get locomotives.
But more importantly, having an intuition for what is plausible is a key feature that computers historically lacked and which LLMs finally provide. Even a tiny amount of “artificial gut feeling” can make the difference between monkeys randomly hitting keys on a typewriter and a goal-directed, systematic search.
LLMs give computers the ability to search for a golden needle in an astronomically large haystack. They allow systems to terminate intractable searches early by pruning 99.999% of the search tree. This is a big deal because search is the primary mechanism we use to discover out-of-distribution knowledge.
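As a toy illustration of what intuition-pruned search looks like in code: this is ordinary best-first search, except a model-supplied plausibility score (llm_score, a hypothetical stand-in for any learned heuristic) discards nearly all branches before they are ever expanded.

```python
import heapq
import itertools

def guided_search(root, expand, is_goal, llm_score, beam_width=5):
    """Best-first search where an LLM-style heuristic prunes the tree.
    Without llm_score the frontier grows exponentially; with it, all
    but the top-scoring candidates are dropped at every step."""
    counter = itertools.count()  # tiebreaker so states are never compared
    frontier = [(-llm_score(root), next(counter), root)]
    while frontier:
        _, _, state = heapq.heappop(frontier)
        if is_goal(state):
            return state
        # "Artificial gut feeling": keep only the most plausible children
        # and prune the rest of the branch before it is ever explored.
        children = sorted(expand(state), key=llm_score, reverse=True)[:beam_width]
        for child in children:
            heapq.heappush(frontier, (-llm_score(child), next(counter), child))
    return None  # search exhausted without reaching a goal
```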
Most other pieces of the puzzle of general intelligence are either already in place or we have reasonable ideas on how to solve them. It’s mainly that nobody has yet put all the pieces together in a coherent way.
If we theoretically put all the pieces together today, we don’t just get a chatbot but a computational ecosystem that functions like a sped-up version of scientific and cultural evolution.
Imagine a society of LLM agents, not just talking, but working within a rigid framework of grounding and selection.
1. Smart evolution (the engine): In traditional genetic algorithms, software evolves through random mutation by blindly flipping bits in hopes of improvement. This is painfully slow.
The upgrade: Instead of random changes, we use an LLM to “mutate” code or ideas. Because the LLM understands the intent of the code, it makes educated guesses. It doesn’t just flip a bit but rewrites a function to optimize for speed. We replace blind trial-and-error with directed exploration.
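A minimal sketch of this “smart evolution” loop, assuming a hypothetical llm_rewrite(code, instruction) call that returns a semantically informed variant and a caller-supplied fitness function. Real systems of this kind (AlphaEvolve is the obvious reference point) are far more elaborate.

```python
import random

def evolve(population, llm_rewrite, fitness, generations=100):
    """Evolutionary loop where the mutation operator is an LLM rewrite
    instead of a random bit flip: each variant is an educated guess
    that preserves the parent's intent."""
    for _ in range(generations):
        # Tournament selection: pick a decent parent, not always the best.
        contenders = random.sample(population, k=min(3, len(population)))
        parent = max(contenders, key=fitness)
        child = llm_rewrite(
            parent,
            instruction="Rewrite this code to run faster without changing behavior.",
        )
        # Directed exploration still needs selection: keep the child
        # only if it beats the current weakest individual.
        weakest = min(population, key=fitness)
        if fitness(child) > fitness(weakest):
            population[population.index(weakest)] = child
    return max(population, key=fitness)
```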
2. The reality check (the filter): LLMs are prone to hallucination, but we can mitigate this by forcing their output through formal verification tools.
The mechanism: An agent proposes a mathematical theorem or a software patch. Before it is accepted, it must pass through a hard logic solver or a proof assistant.
The result: If the code doesn’t compile or the proof isn’t valid, it is ruthlessly discarded. The LLM provides the creativity, but the compiler provides the truth. This filters out the slop and leaves only digital gold nuggets.
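A concrete miniature of this filter, using the Z3 SMT solver (pip install z3-solver) as the hard logic gate. The framing of the agent’s proposal as a Z3 formula is an illustrative assumption; proof assistants like Lean fill the same role for richer mathematics.

```python
from z3 import Implies, Int, Not, Solver, unsat

def verified(claim) -> bool:
    """Accept a proposed theorem only if the solver proves it: a claim
    is valid exactly when its negation is unsatisfiable."""
    solver = Solver()
    solver.add(Not(claim))
    return solver.check() == unsat

# An agent proposes: "for every integer x, x > 2 implies x*x > 4".
x = Int("x")
proposal = Implies(x > 2, x * x > 4)

# The LLM provides the creativity; the solver provides the truth.
if verified(proposal):
    print("accepted into the knowledge base")
else:
    print("ruthlessly discarded")
```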
3. Capitalism for compute (the pruning algorithm): How do we stop this system from wasting energy on bad ideas? We introduce an internal economy.
The economy: Agents act as independent contractors. They bid for compute credits (money) to run their experiments.
The selection: If an agent successfully solves a problem (verified by the logic solver), it gets paid. It can then “afford” to spawn sub-agents or fine-tune a better version of itself. If an agent pursues a dead end, it goes bankrupt and dies.
The outcome: This creates a massive, parallelized search where resources naturally flow toward the most capable architectures and the most promising ideas, mimicking the efficiency of free markets.
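And a toy version of the credit economy. The agent.attempt, agent.spawn_variant, and agent.credits interfaces are hypothetical; the point is only the selection dynamic, where verified success funds more compute and failure leads to bankruptcy.

```python
def run_market(agents, tasks, verifier, reward=10, bid=3):
    """Toy internal economy: agents pay credits to attempt tasks, earn
    credits for verified solutions, and die when they go bankrupt."""
    for task in tasks:
        solvent = [a for a in agents if a.credits >= bid]
        if not solvent:
            break  # the whole economy is bankrupt
        agent = max(solvent, key=lambda a: a.credits)  # richest bidder wins
        agent.credits -= bid                           # pay for the compute
        if verifier(task, agent.attempt(task)):
            agent.credits += reward                    # verified work gets paid
            if agent.credits >= 2 * reward:            # rich enough to reinvest:
                agents.append(agent.spawn_variant())   # fund a fine-tuned child
    return [a for a in agents if a.credits > 0]        # bankrupt lineages die
```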
In summary: The “stochastic parrot” isn’t the final product. It’s just the glue that finally allows our most powerful, rigid logic systems to talk to each other.
When you combine artificial intuition (to guide the search) with formal verification (to ground the truth) and economic dynamics (to allocate resources), you are no longer building a model. You are building a self-improving engine for discovery.
P.S. Note that the architecture described above touches on only a tiny subset of techniques that are already theoretically understood but haven’t yet been deployed at scale.
Behind closed doors, labs are already engineering specialized “manager” agents capable of decomposing high-level goals and allocating workflows, supported by persistent memory systems that allow learning to compound over time rather than resetting after every chat. As these systems mature, we can expect the integration of reputation stores (to vet agent reliability), internal prediction markets (to weigh probabilities), and smart contracts (to enforce digital cooperation).
To use a historical analogy: What you see today is the ARPANET era of artificial intelligence. We can see the path to the modern Internet, and currently, there are no known fundamental roadblocks stopping us from building it.
Moltbook
The crux is how much of the ostensibly interesting stuff in this space is driven by detailed human requests.
Moltbook is “a social network for AI agents”. This is a best of. https://www.astralcodexten.com/p/best-of-moltbook
Moltbook is the most interesting place on the internet right now https://simonwillison.net/2026/Jan/30/moltbook/
36,000 AI Agents Are Now Speedrunning Civilization https://www.lesswrong.com/posts/jDeggMA22t3jGbTw6/36-000-ai-agents-are-now-speedrunning-civilization
Miscellaneous
“ASKAP J1832-0911 is a stellar object referred to as an extremely bright ‘long period radio transient’ (LPT). Its unusual properties are unlike those of any other known object.” https://en.wikipedia.org/wiki/ASKAP_J1832%E2%88%920911
From quantum computing to mRNA therapeutics: seven technologies to watch in 2026 https://www.nature.com/articles/d41586-026-00188-6 [no paywall: https://archive.is/JdO9c]
DNA provides a solution to our enormous data storage problem https://news.asu.edu/20260128-science-and-technology-dna-shapes-designed-store-and-protect-information
MIT engineers design structures that compute with heat https://news.mit.edu/2026/mit-engineers-design-structures-compute-with-heat-0129
Fronto-Parietal gray matter and white matter efficiency differentially predict intelligence in males and females https://onlinelibrary.wiley.com/doi/abs/10.1002/hbm.23291
Politics
School is way worse for kids than social media https://substack.com/home/post/p-186087964
Russia’s Grinding War in Ukraine: Massive Losses and Tiny Gains for a Declining Power https://www.csis.org/analysis/russias-grinding-war-ukraine
Ukraine Becomes World Leader in Unmanned Ground Vehicles https://jamestown.org/ukraine-becomes-world-leader-in-unmanned-ground-vehicles/