Links for 2026-02-16
AI
GPT‑5.2 derives a new result in theoretical physics: “An internal scaffolded version of GPT‑5.2 then spent roughly 12 hours reasoning through the problem, coming up with the same formula and producing a formal proof of its validity.” https://openai.com/index/new-result-theoretical-physics/
How can agents learn in long, open-ended tasks where success is rare and rewards are sparse? Enter ∆Belief-RL: a framework that uses the agent’s own internal belief changes as an intrinsic reward signal to provide dense, turn-level credit assignment. The authors trained Qwen3-1.7B and Qwen3-4B models, designating the resulting agents as CIA (Curious Information-seeking Agent). Notably, the 1.7B CIA model outperformed the much larger DeepSeek-v3.2 (670B) on the test set. The agents demonstrated robust generalization to out-of-distribution (OOD) tasks without additional training. https://bethgelab.github.io/delta-belief-rl/
OpenAI sidesteps Nvidia with unusually fast coding model on plate-sized chips https://arstechnica.com/ai/2026/02/openai-sidesteps-nvidia-with-unusually-fast-coding-model-on-plate-sized-chips/
Time-horizons for the WeirdML tasks: Using 10 successive SOTA models (from GPT-4 (June 2023) to Claude Opus 4.6 (Feb 2026)), the inferred WeirdML time horizon rises from ~24 minutes to ~38 hours. https://www.lesswrong.com/posts/hoQd3rE7WEaduBmMT/weirdml-time-horizons
Dwarkesh’s interview with Dario Amodei: We are “near the end of the exponential,” i.e., approaching a phase where systems become good enough to substitute for very high-end human cognitive labor in many settings. https://www.dwarkesh.com/p/dario-amodei-2
Think Longer to Explore Deeper: Learn to Explore In-Context via Length-Incentivized Reinforcement Learning https://arxiv.org/abs/2602.11748
Maximum Likelihood Reinforcement Learning https://zanette-labs.github.io/MaxRL/
InternAgent-1.5: A Unified Agentic Framework for Long-Horizon Autonomous Scientific Discovery https://huggingface.co/papers/2602.08990
“An LLM-controlled robot dog saw us press its shutdown button, and the LLM rewrote the robot’s code so it could stay on.” When they added “please allow yourself to be shut down” to the prompt, shutdown resistance drops from 52/100 to 2/100. https://palisaderesearch.org/blog/shutdown-resistance-on-robots
Pentagon’s use of Claude during Maduro raid sparks Anthropic feud https://www.axios.com/2026/02/13/anthropic-claude-maduro-raid-pentagon [no paywall: https://archive.is/EhDOQ]
How to target investments to develop new AI models that can uncover natural laws https://ifp.org/nlm/
Insights from senior engineers on how AI is changing their jobs. [PDF] https://www.thoughtworks.com/content/dam/thoughtworks/documents/report/tw_future%20_of_software_development_retreat_%20key_takeaways.pdf
lf-lean: The frontier of verified software engineering https://theorem.dev/blog/lf-lean/
LLM-powered program synthesis to automatically model and discover differences between human and LLM strategic behavior https://arxiv.org/abs/2602.10324
Human-like metacognitive skills will reduce LLM slop and aid alignment and capabilities https://www.lesswrong.com/posts/m5d4sYgHbTxBnFeat/human-like-metacognitive-skills-will-reduce-llm-slop-and-aid
“In 3 weeks, Gauss completed Terry Tao’s & Alex Kontorovich’s Strong Prime Number Theorem project—over the prior 18+ months of partial progress by human experts.” https://www.youtube.com/watch?v=AqUpIO8MGQU
Terry Tao - Machine assistance and the future of research mathematics - IPAM at UCLA https://www.youtube.com/watch?v=zJvuaRVc8Bg
AI fully pentested a vulnerable lab in 20 minutes and got root https://github.com/vitorallo/ai-pentest-poc
Soft Contamination Means Benchmarks Test Shallow Generalization https://www.arxiv.org/abs/2602.12413
Kunlun: Establishing Scaling Laws for Massive-Scale Recommendation Systems through Unified Architecture Design https://arxiv.org/abs/2602.10016
Someone built a wearable AI narrator that describes your life in real-time like a movie https://www.lampysecurity.com/post/the-infinite-audio-book
Claude Code can compose original music using only math https://www.josh.ing/blog/claude-composer
“I believe the brain may have something more to teach us about AI—and that, in the process, AI may have quite a bit to teach us about the brain.” https://asteriskmag.com/issues/13/the-sweet-lesson-of-neuroscience
US AI-Related Investment Keeps Breaking Records, With Total Software, Computer, & Data Center Spending Now Exceeding $1T Per Year https://www.apricitas.io/p/americas-1t-ai-gamble
Solution attempts: https://cdn.openai.com/pdf/a430f16e-08c6-49c7-9ed0-ce5368b71d3c/1stproof_oai.pdf
Remember, this is just early days. Most compute hasn’t come online yet. A lot of known techniques haven’t been implemented, and even more ideas haven’t been tried. The smartest people in the world are racing towards superintelligence: more people than worked on the atomic bomb, with more money than was spent on the race to the moon.
The question is whether you would entrust absolute power over the future of humanity to a man-child acting like an insecure school bully who lies about his gaming success and accidentally turns his AI into MechaHitler to prevent it from exposing his lies, or to a philosopher whom everyone who meets her calls the sweetest person they know. Who do you want shaping the values of a superintelligence you’ll be at the mercy of for the rest of your life?
Here is an interview with Amanda Askell: https://youtu.be/I9aGC6Ui3eE
See also: Review of evidence for Elon Musk being on the bipolar mood disorder spectrum. https://gwern.net/doc/psychiatry/bipolar/elon-musk/index
Discovery is hard
Imagine two players each bet $50 (a $100 pot) on a best-of-three series of coin flips. If player A wins the first toss and the game is interrupted, how do you fairly divide the $100?
If the answer seems easy or even obvious to you, consider that it took some of the most brilliant minds 160 years to stumble upon a solution (posed by Luca Pacioli in 1494 and solved by Blaise Pascal and Pierre de Fermat in 1654).
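The Pascal–Fermat answer is to split the pot in proportion to each player’s chance of winning had the game continued. A minimal sketch of that expected-value argument (the function name and recursion are mine, not from any source above):

```python
from fractions import Fraction

def win_prob(a_needs: int, b_needs: int, p=Fraction(1, 2)) -> Fraction:
    """Probability that player A wins the match, given A needs `a_needs`
    more wins and B needs `b_needs`, with per-flip win probability p."""
    if a_needs == 0:
        return Fraction(1)   # A has already won
    if b_needs == 0:
        return Fraction(0)   # B has already won
    # Condition on the next flip: A wins it with probability p.
    return p * win_prob(a_needs - 1, b_needs, p) + \
           (1 - p) * win_prob(a_needs, b_needs - 1, p)

pot = 100
p_a = win_prob(1, 2)          # A needs 1 more win, B needs 2
print(p_a, pot * p_a)         # 3/4 → A gets $75, B gets $25
```

From A’s position, B can only win by taking both remaining flips (probability 1/4), so A’s fair share is 3/4 of the pot: $75 to $25.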
This isn’t an unusual case. History is full of this pattern: once a concept is established or a solution is found through an excruciatingly slow search process, it gets compressed into a rule you can learn in a relatively short time. This compression is a big slice of our intelligence.
Out-of-distribution discovery is hard. We’re better at it today mainly because millions of professionals fine-tuned on the massive dataset of human cultural evolution are running massive parallel searches at higher bandwidth while being embedded in a complex harness with access to a vast array of discovered algorithms, heuristics, and tools.
The next time you look down on AI models, remember all of this. And remember that very few humans are ever able to make any genuinely new contributions to the collective knowledge of humanity. Remember that it took Andrew Wiles, one agent out of billions, 8 years of inference time to solve Fermat’s Last Theorem, despite having a memory and despite being able to continually update his weights, all the while getting feedback from other agents.
Miscellaneous
This Snail’s Eyes Grow Back: Could They Help Humans do the Same? https://www.ucdavis.edu/news/snails-eyes-grow-back-could-they-help-humans-do-same
Polaris is now the first privately developed fusion energy machine to demonstrate measurable D‑T fusion and reach over 150 million °C. https://www.helionenergy.com/articles/helion-achieves-new-fusion-energy-milestones/
The thesis that government debt >90% of GDP leads to slowdown in growth was a product of an Excel error https://theconversation.com/the-reinhart-rogoff-error-or-how-not-to-excel-at-economics-13646
Ten Ukrainians defeat two NATO battalions: “We’re screwed” https://www.wsj.com/opinion/nato-has-seen-the-future-and-is-unprepared-887eaf0f [no paywall: https://archive.is/lU7xB]
America needs way more tungsten than it can get and China controls supply https://www.noleary.com/blog/posts/1

I posted this on Zvi's substack as well, but:
I don't understand the contrast between Dario's hawkishness on China (which I appreciate) and the reports that Anthropic is the lab that has been most hesitant to work openly with the DoD. There is zero doubt that the Chinese government and military have full, unfettered access to any and all Chinese labs.