Links for 2025-11-27

Nov 27, 2025

DeepSeekMath-V2: Towards Self-Verifiable Mathematical Reasoning

The authors build a system that can both generate mathematical proofs and critically evaluate them. Starting from a large base model, they first train a dedicated proof verifier that scores solutions on a three-point scale (0 / 0.5 / 1) according to expert rubrics, identifying concrete issues in a proof rather than just checking a final answer. Using this verifier as a reward model, they then train a proof generator that is incentivized to produce high-scoring, rigorously justified proofs. With scaled test-time compute (large numbers of samples and refinement rounds), the resulting model attains gold-medal–level performance on IMO 2025 and CMO 2024, and scores 118/120 on Putnam 2024, surpassing the best human contestant.
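The test-time scaling behind these competition scores (many samples, repeated verification, several refinement rounds) can be sketched as a best-of-n loop. This is a minimal illustration under stated assumptions, not DeepSeek's implementation. `generate`, `verify`, and `refine` are hypothetical callables standing in for the model. Each verifier pass is assumed to return a grade on the 0 / 0.5 / 1 scale plus a list of issues. The sample, pass, and round counts are arbitrary placeholders.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Callable, List, Tuple

# Hypothetical model interfaces (assumptions, not DeepSeek's API):
Generate = Callable[[str], str]                         # problem -> proof
Verify = Callable[[str, str], Tuple[float, List[str]]]  # -> (score, issues)
Refine = Callable[[str, str, List[str]], str]           # -> revised proof


@dataclass
class Candidate:
    proof: str
    score: float       # mean verifier score over several passes
    issues: List[str]  # distinct issues raised across those passes


def assess(problem: str, proof: str, verify: Verify, passes: int) -> Candidate:
    """Run several independent verification passes and aggregate them."""
    scores: List[float] = []
    issues: List[str] = []
    for _ in range(passes):
        score, found = verify(problem, proof)
        scores.append(score)
        for issue in found:
            if issue not in issues:
                issues.append(issue)
    return Candidate(proof, mean(scores), issues)


def solve(problem: str, generate: Generate, verify: Verify, refine: Refine,
          n_samples: int = 8, passes: int = 4, max_rounds: int = 3) -> Candidate:
    """Best-of-n sampling, then verifier-guided refinement of the top proof."""
    pool = [assess(problem, generate(problem), verify, passes)
            for _ in range(n_samples)]
    best = max(pool, key=lambda c: c.score)
    for _ in range(max_rounds):
        if best.score == 1.0 and not best.issues:
            break  # no pass found a problem, so stop spending compute
        revised = assess(problem, refine(problem, best.proof, best.issues),
                         verify, passes)
        if revised.score >= best.score:
            best = revised
    return best


# Toy stubs: a "proof" counts as complete once it ends with "QED".
def toy_generate(problem: str) -> str:
    return "sketch"


def toy_verify(problem: str, proof: str) -> Tuple[float, List[str]]:
    return (1.0, []) if proof.endswith("QED") else (0.5, ["missing conclusion"])


def toy_refine(problem: str, proof: str, issues: List[str]) -> str:
    return proof + " ... QED"


result = solve("toy problem", toy_generate, toy_verify, toy_refine)
print(result.score, result.proof)  # 1.0 sketch ... QED
```

The design choice that matters here is averaging several verification passes per candidate. A single verifier call can miss or invent a flaw, and repeated passes make the ranking of candidates less noisy.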

Instead of only rewarding correct final answers, the training process explicitly rewards self-criticism. The generator is prompted to produce both a solution and a self-evaluation in the same format as the verifier. It gets higher reward not just for correct proofs, but also for honestly recognizing remaining gaps and aligning its self-score with the verifier’s score. This effectively trains the model to “doubt itself” and refine its reasoning, similar to how a mathematician iteratively improves a proof until no further issues can be found.
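A reward that pays for both a good proof and an honest self-assessment can be pictured as a weighted sum. One term is the verifier's grade, and the other is how closely the generator's self-score matches that grade. This is a hypothetical sketch, not the paper's exact formula. The function name `generator_reward`, the linear form, and the 0.75 / 0.25 weights are all assumptions chosen for illustration.

```python
VALID_SCORES = (0.0, 0.5, 1.0)


def generator_reward(verifier_score: float, self_score: float,
                     w_proof: float = 0.75, w_self: float = 0.25) -> float:
    """Hypothetical reward mixing proof quality with honest self-assessment.

    verifier_score: grade the verifier gave the proof (0, 0.5 or 1).
    self_score:     grade the generator gave its own proof, same scale.
    w_proof/w_self: illustrative weights, not values taken from the paper.
    """
    if verifier_score not in VALID_SCORES or self_score not in VALID_SCORES:
        raise ValueError("scores must be 0, 0.5 or 1")
    # 1.0 when the self-score matches the verifier, 0.0 when maximally off.
    alignment = 1.0 - abs(self_score - verifier_score)
    return w_proof * verifier_score + w_self * alignment


for verifier, self_grade in [(1, 1), (1, 0.5), (0, 0), (0, 1)]:
    print(verifier, self_grade, generator_reward(verifier, self_grade))
# 1 1 1.0
# 1 0.5 0.875
# 0 0 0.25
# 0 1 0.0
```

With these weights, honestly grading a broken proof 0 earns 0.25, while claiming that the same proof is perfect earns nothing. Overclaiming therefore never pays, and a verified-correct proof still earns the most.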

To scale training data without relying on humans, the authors introduce an automated labeling pipeline. Each candidate proof is subjected to multiple, independent verification passes. For any analysis that claims errors, a meta-verifier (a second model trained to judge the quality of verifier feedback) checks whether those alleged defects really exist and whether the assigned score makes sense. If several independent analyses, validated by meta-verification, agree on the lowest score, that score is assigned to the proof. If no valid issues can be confirmed across many verification attempts, the proof is labeled correct. Otherwise, ambiguous cases are discarded or (in earlier iterations) sent to experts. In the final iterations, this pipeline fully replaces human annotation.
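The labeling rule described above can be sketched as a small aggregation function. This is a minimal sketch under assumptions. `verify` and `meta_ok` are hypothetical stand-ins for the verifier and the meta-verifier, and `Analysis` is an illustrative record type. The thresholds `n_passes` and `min_agree` are placeholders, not values from the paper.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Analysis:
    score: float       # verifier's grade: 0, 0.5 or 1
    issues: List[str]  # defects the verifier claims to have found


# Hypothetical stand-ins for the two models (not a real API):
Verifier = Callable[[str], Analysis]            # proof -> one analysis
MetaVerifier = Callable[[str, Analysis], bool]  # is this analysis valid?


def auto_label(proof: str, verify: Verifier, meta_ok: MetaVerifier,
               n_passes: int = 16, min_agree: int = 3) -> Optional[float]:
    """Label a proof from repeated verification; None means ambiguous.

    n_passes and min_agree are illustrative thresholds, not the paper's.
    """
    analyses = [verify(proof) for _ in range(n_passes)]
    # Keep only analyses that allege defects AND survive meta-verification.
    confirmed = [a for a in analyses if a.issues and meta_ok(proof, a)]
    if not confirmed:
        return 1.0  # no alleged defect held up in any pass: label correct
    lowest = min(a.score for a in confirmed)
    if sum(1 for a in confirmed if a.score == lowest) >= min_agree:
        return lowest  # enough validated analyses agree on the lowest score
    return None  # ambiguous: discard, or (early on) route to a human expert


# Usage with a toy verifier that always flags the same gap.
def always_flags(proof: str) -> Analysis:
    return Analysis(0.0, ["step 2 is unjustified"])


print(auto_label("toy proof", always_flags, lambda p, a: True))   # 0.0
print(auto_label("toy proof", always_flags, lambda p, a: False))  # 1.0
```

Returning `None` rather than guessing keeps ambiguous proofs out of the training set. This mirrors the paragraph's point that unclear cases are dropped instead of being forced into a label.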

The meta-verifier thus acts as a “grader for the grader”: it ensures that the verifier is not hallucinating errors or misrepresenting the solution, checks that cited defects actually appear in the proof, and enforces consistency between the described defects and the final numeric score. Combined with reinforcement learning and iterative refinement, this creates a feedback loop in which generation and verification continually improve each other, pushing towards genuinely self-verifiable mathematical reasoning.
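The meta-verifier is said to enforce two checks: cited defects must actually appear in the proof, and the score must match the defects described. Both can be illustrated with a rule-based toy. This is only a stand-in under stated assumptions. The real meta-verifier is a trained model that also judges whether each explanation is sound. By contrast, `toy_meta_check` and the `Defect` record are hypothetical names that only test verbatim quotes and score consistency.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Defect:
    quote: str        # passage of the proof the verifier says is wrong
    explanation: str  # why the verifier thinks it is wrong (not checked here)


def toy_meta_check(proof: str, defects: List[Defect], score: float) -> bool:
    """Rule-based stand-in for a learned meta-verifier (an assumption).

    Accepts a verifier analysis only if every cited passage really occurs in
    the proof and the numeric score is consistent with the defects listed.
    """
    if score not in (0.0, 0.5, 1.0):
        return False
    # 1. No hallucinated errors: each quoted passage must appear verbatim.
    if any(d.quote not in proof for d in defects):
        return False
    # 2. Score/defect consistency: a perfect score with listed defects, or a
    #    failing score with nothing to back it up, is rejected.
    if score == 1.0 and defects:
        return False
    if score < 1.0 and not defects:
        return False
    return True


proof = "Let n = 2k. Then n^2 = 4k^2, which is even."
real = [Defect("which is even", "should note that 4k^2 = 2(2k^2)")]
fake = [Defect("n is prime", "the proof never says this")]
print(toy_meta_check(proof, real, 0.5))  # True
print(toy_meta_check(proof, fake, 0.0))  # False: hallucinated quote
print(toy_meta_check(proof, real, 1.0))  # False: defects but perfect score
```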

Paper: https://github.com/deepseek-ai/DeepSeek-Math-V2

More:

“This is my first fully automated, LLM-generated and auto-formalized proof of a new mathematical theorem.” https://x.com/nasqret/status/1992928119632593003
How GPT-5 helped mathematician Ernest Ryu solve a 40-year-old open problem https://openai.com/index/gpt-5-mathematical-discovery/

Opus 4.5

Anthropic releases Claude Opus 4.5

Anthropic asked 18 staff members to estimate the productivity boost they get from using Opus 4.5 + Claude Code:

50% (9/18) reported a productivity improvement of at least 100% (!)
The mean reported productivity improvement was 220% (!)
11% (2/18) characterized the model as a “near-complete entry-level researcher replacement” (with caveats)
Most would prefer losing access to Opus 4.5 to losing access to Claude Code (i.e., the harness remains more important than the model)

On SWE-bench Verified at medium effort, Opus 4.5 matches Sonnet 4.5’s best score while using 76% fewer output tokens.

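Anthropic also tests its models on the notoriously difficult take-home exam it gives prospective performance-engineering candidates: “Within our prescribed 2-hour time limit, Claude Opus 4.5 scored higher than any human candidate ever.”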

Racing straight towards ASI

It’s pretty clear now that we’re at a point of no return with AI.

AI has become too big to fail for the Trump administration at this point. AI might have accounted for half the GDP growth of the entire country in the first six months of the year.
The labs see no walls and are ASI-pilled. They’re racing each other to release new SOTA models.
China is racing to build its own semiconductor ecosystem. There are over 5,000 AI companies in China and over 450,000 registered smart robot firms.

The course of history is now pretty much set in stone, barring unexpected developments.

More:

OpenAI, Anthropic, and Google got access to petabytes of proprietary data. The data is coming from the 17 National Laboratories, which have been hoarding experimental data for decades. The US Government’s new Genesis Mission is officially building autonomous scientific agents. https://www.whitehouse.gov/presidential-actions/2025/11/launching-the-genesis-mission/
How the U.S. Economy Became Hooked on AI Spending https://www.wsj.com/tech/ai/how-the-u-s-economy-became-hooked-on-ai-spending-4b6bc7ff [no paywall: https://archive.is/4COnU]
Amazon to invest up to $50 billion to expand AI and supercomputing infrastructure for US government agencies https://www.aboutamazon.com/news/company-news/amazon-ai-investment-us-federal-agencies
Within two years, the largest data centers may use more power than major cities. Microsoft’s Fairwater Wisconsin campus will have a peak electrical load of 3.3 GW when its fourth building comes online in late 2027. (Los Angeles used 2.4 GW on average in 2023.) https://epochai.substack.com/p/microsofts-fairwater-datacenter-will
AI startup you haven’t heard much about raises money to build a 2GW datacenter https://lumalabs.ai/blog/news/series-c
How Alibaba overcame Beijing’s crackdown to become an AI giant https://www.cnbc.com/2025/11/24/how-alibaba-overcame-beijings-crackdown-to-become-an-ai-giant.html
Ilya Sutskever’s ASI timeline: somewhere between 2030 and 2045 https://www.dwarkesh.com/p/ilya-sutskever-2

AI agents

OmniScientist: Toward a Co-evolving Ecosystem of Human and AI Scientists https://arxiv.org/abs/2511.16931
InfCode: Adversarial Iterative Refinement of Tests and Patches for Reliable Software Issue Resolution https://arxiv.org/abs/2511.16004
Effective harnesses for long-running agents https://www.anthropic.com/engineering/effective-harnesses-for-long-running-agents
Latent Collaboration in Multi-Agent Systems https://arxiv.org/abs/2511.20639
General Agentic Memory Via Deep Research https://arxiv.org/abs/2511.18423
Multi-Agent Evolve: LLM Self-Improve through Co-evolution https://arxiv.org/abs/2510.23595
Matrix: Peer-to-Peer Multi-Agent Synthetic Data Generation Framework https://arxiv.org/abs/2511.21686

AI research and safety

CLaRa: Bridging Retrieval and Generation with Continuous Latent Reasoning https://arxiv.org/abs/2511.18659
MIT scientists debut a generative AI model that could create molecules addressing hard-to-treat diseases https://news.mit.edu/2025/mit-scientists-debut-generative-ai-model-that-could-create-molecules-addressing-hard-to-treat-diseases-1125
Despite benchmarks ostensibly measuring different skills (coding, math, reasoning, etc.), model performance is dominated by a single underlying dimension. Models that are good at one task tend to be good at almost all others. https://epochai.substack.com/p/benchmark-scores-general-capability
What One Cannot, Two Can: Two-Layer Transformers Provably Represent Induction Heads on Any-Order Markov Chains https://arxiv.org/abs/2508.07208
Reasoning Models Sometimes Output Illegible Chains of Thought https://www.lesswrong.com/posts/GKyyYCs8n2goDcAe2/reasoning-models-sometimes-output-illegible-chains-of
Subliminal Learning Across Models https://www.lesswrong.com/posts/CRn9XtGoMtjnb5ygr/subliminal-learning-across-models
Evaluating Select Global Technical Options for Countering a Rogue AI https://www.rand.org/pubs/perspectives/PEA4361-1.html
Why AI Safety Won’t Make America Lose The Race With China https://www.astralcodexten.com/p/why-ai-safety-wont-make-america-lose
A critical disconnect is emerging between the exponential acceleration of technology and the static limits of human cognition and biology. AI could function as a necessary “time machine,” bridging this gap by enabling us to process information and adapt at the speed of the future we are creating. https://www.aipolicyperspectives.com/p/time-machines

More AI news

Gemini 3 Pro Is a Vast Intelligence With No Spine https://www.lesswrong.com/posts/REWPGibonsu3C5xhb/gemini-3-pro-is-a-vast-intelligence-with-no-spine
ChatGPT 5.1 Codex Max https://www.lesswrong.com/posts/YMFYQpsY2MGbXKPtS/chatgpt-5-1-codex-max
Documentary: The Thinking Game takes you on a journey into the heart of DeepMind, capturing a team striving to unravel the mysteries of intelligence and life itself. https://www.youtube.com/watch?v=d95J8yzvjbQ
Sundar Pichai: Gemini 3, Vibe Coding and Google’s Full Stack Strategy https://www.youtube.com/watch?v=iFqDyWFuw1c
MIT study finds AI can already replace 11.7% of U.S. workforce https://www.cnbc.com/2025/11/26/mit-study-finds-ai-can-already-replace-11point7percent-of-us-workforce.html

The “Offline” IQ quiz, a test created by a Mensa member, has never been on the public internet and is not in any AI training data. https://trackingai.org/home

Computer science

Algorithmic pricing can harm consumers through “reasonable” autonomous behaviors that current antitrust frameworks are ill-equipped to handle. https://www.quantamagazine.org/the-game-theory-of-how-algorithms-can-drive-up-prices-20251022/
“…if you think Bitcoin, and SSL, and all the other protocols based on Shor-breakable cryptography, are almost certainly safe for the next 5 years … then I submit that your confidence is also unwarranted.” https://scottaaronson.blog/?p=9344
A Technical Introduction to Solomonoff Induction without K-Complexity https://www.lesswrong.com/posts/HSDumToH57nSRdLST/a-technical-introduction-to-solomonoff-induction-without-k

Neuroscience

Mind-reading devices can now predict preconscious thoughts: is it time to worry? https://www.nature.com/articles/d41586-025-03714-0 [no paywall: https://archive.is/EYl7w]
Brain has five ‘eras’, scientists say – with adult mode not starting until early 30s https://www.theguardian.com/science/2025/nov/25/brain-human-cognitive-development-life-stages-cambridge-study
Neuroscience of human social instincts: a sketch https://www.lesswrong.com/posts/kYvbHCDeMTCTE9TAj/neuroscience-of-human-social-instincts-a-sketch

Science

While the statistic that human beings share roughly 99.9% of their DNA sequence is technically correct, it is frequently “incorrectly interpreted” by the public and policymakers to imply that biological differences between human populations are negligible or nonexistent. https://www.aporiamagazine.com/p/are-human-populations-999-identical
Organic geochemical evidence for life in Archean rocks identified by pyrolysis–GC–MS and supervised machine learning https://www.pnas.org/doi/10.1073/pnas.2514534122
University of Tennessee physicists have published a new study detailing three key discoveries that help explain the creation of heavy elements like gold during stellar events. The research provides the first measurement of neutron energies for beta-delayed two-neutron emission and challenges traditional models of how exotic nuclei decay. https://physics.utk.edu/one-experiment-three-discoveries/

Technology

Texas A&M researchers pioneer cryopreservation method to prevent organ cracking https://stories.tamu.edu/news/2025/09/17/texas-am-researchers-pioneer-cryopreservation-method-to-prevent-organ-cracking/
Surface-Only Superconductor Is the Strangest of Its Kind https://tu-dresden.de/tu-dresden/exzellenz/news/neuer-oberflaechensupraleiter-der-merkwuerdigste-seiner-art?set_language=en
NATO is dangerously unaware that its military edge is slipping https://www.lesswrong.com/posts/XN54FwHu2aBQbqcdM/nato-is-dangerously-unaware-of-its-military-vulnerability
How good are Chinese CPUs? Benchmarking the Loongson 3A6000 https://lemire.me/blog/2025/11/23/how-good-are-chinese-cpus-benchmarking-the-loongson-3a6000/

Ukraine

EU warmongers want the war to continue till the last Ukrainian.

— Kirill Dmitriev

The honest answer should be, “yes, of course!” We’re not losing any soldiers, while the Russians are dropping like flies. Meanwhile, NATO has been expanded by two countries since 2022. It’s a great bang for our buck.

Due to Russia’s low fertility rate, every Russian death means the loss of someone who will never return. That’s one fewer person to threaten us. Each Russian death shifts the balance of power in Europe in our favor. This is a historic opportunity to weaken a long-term enemy without putting our own lives at risk.

Note here that we’re not forcing anyone to die. If the Ukrainians don’t like it, they can surrender. And if the Russians don’t want to keep dying, they can leave Ukraine.

But Ukrainians won’t surrender. This is something people like Viktor Orbán cannot fathom. Why would anyone fight and die to prevent Russian occupation? He would surrender his nation and its people on day one of a Russian invasion.

The answer can be guessed from Alexander Dugin’s plan for Ukraine, which he outlined on his Telegram channel: banning the Ukrainian language and completely erasing Ukrainian culture, with reeducation camps for everyone.

What Dugin’s plan would mean in practice could be witnessed from February 2022 onward in Ukraine’s occupied territories, where Russia forcibly mobilized hundreds of thousands of Ukrainian men and sent them in meat waves against their own people. In other words, Ukrainians have no real choice. The alternative to fighting and dying is not peaceful subjugation: they must either die fighting or have their children brainwashed to fight and die for Russia in its future wars.

P.S. Alexander Dugin is the architect of everything that has happened over the past few decades. In his 1997 book, Foundations of Geopolitics, he wrote that after dismembering Georgia and causing the UK to leave the EU, Russia should annex Ukraine and sow instability within the United States by undermining its internal political processes and supporting isolationist tendencies.

Russophobia

The idea of a European ‘Russophobia’ is an absolutely ridiculous lie that Putin is telling the Russian people.

In 2018, more Germans considered Russia a reliable partner than the United States. We tried everything to befriend Russia. Through Nord Stream, we kept building economic interdependence between Germany and Russia. German defense companies, such as Rheinmetall, built a training center for the Russian army. Since the 1970s, German politicians have done everything to improve relations with Russia. The state government of Mecklenburg-Vorpommern even went as far as collaborating with a former East German state security (Stasi) officer and longtime Putin confidant to found a shell foundation posing as a climate project in order to evade American sanctions against Russia.

The largest protests in Germany were directed against America, specifically against the Vietnam and Iraq wars. Hatred of America was clearly greater. Russia could easily have exploited this. Instead, it squandered this advantage. There’s no way to sugarcoat it.

Countries such as Finland and Sweden were initially reluctant to join NATO, but Russia’s actions prompted them to reconsider.

Starting the biggest war since World War II for no good reason has consequences. And, yes, there really was no good reason. Nobody cared about Ukraine. Europe didn’t even care about the annexation of Crimea. No one would have supported Ukraine. In fact, the West made sure Ukraine destroyed its most powerful weapons. This was overseen by people like Obama, who mocked Mitt Romney for calling Russia a threat. With Zelenskyy, Ukraine elected a peace dove who had beaten a hardliner. And yet, Russia invaded. For many people (though not the most treasonous Russophiles), this destroyed any trust in Russia.

Putin has done more to harm Russia than even Russia’s worst enemies could have dreamed of.

The irony here is that Donald Trump is trying to undo the massive advantage Russia handed America by promoting a pro-Russian party in Germany and trying to re-establish economic relations with Russia. All of this runs against the interests of a country that was attacked because it voluntarily sought to join America’s sphere of influence.

P.S. To be clear, I would love to have good relations with both Russia and China. It’s a shame that we have to put up with an absolute assclown like Donald Trump. But there is just no way for me to justify having relations with a country run by a Koran-kissing lunatic that attacks a white, pro-Western, Christian nation. And I have seen little reason to trust the Chinese so far. It’s a comically bad situation where America is run by DC villains while the rest of the world is stuck in a medieval imperial mindset.

Axis of Ordinary
