Links for 2025-12-14
Intelligence too cheap to meter
We’re heading towards a future in which intelligence is too cheap to meter.
“GPT-4-class” inference prices have fallen by ~1000× over ~2 years.
ARC Prize reports ~390× lower cost per task for near-SOTA ARC-AGI-1 performance over ~1 year (o3 preview: ~88% at est. ~$4.5k/task → GPT-5.2 Pro: 90.5% at $11.64/task).
GPT-5.1 can need up to ~2× more output tokens than GPT-5.2 for similar SWE-Bench performance, and up to ~2.7× more on ARC-AGI-2.
If these trends continue, then once we reach human-level AI, the marginal cost of “white-collar cognition” could rapidly become orders of magnitude cheaper than human labor.
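The figures above can be sanity-checked with a little arithmetic. This is a minimal sketch that takes the quoted approximate numbers at face value and assumes the price decline is a smooth exponential, which is an idealization and not something the sources claim:

```python
import math

# Figures quoted above (all approximate).
o3_preview_cost_per_task = 4500.0  # est. ~$4.5k/task
gpt52_pro_cost_per_task = 11.64    # $11.64/task

# Cost ratio for near-SOTA ARC-AGI-1 performance.
arc_cost_ratio = o3_preview_cost_per_task / gpt52_pro_cost_per_task

# ~1000x price drop over ~2 years.
# Assumption: the decline is a smooth exponential over that period.
total_drop = 1000.0
years = 2.0
annual_factor = total_drop ** (1 / years)
halving_time_months = 12 * years * math.log(2) / math.log(total_drop)

print(f"ARC-AGI-1 cost ratio: ~{arc_cost_ratio:.0f}x")
print(f"Implied annual price drop: ~{annual_factor:.1f}x")
print(f"Implied price halving time: ~{halving_time_months:.1f} months")
```

The ARC ratio comes out near 387×, consistent with the rounded ~390× figure. Under the smooth-decline assumption, a 1000× drop in two years corresponds to prices falling by roughly 32× per year.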
AI for offensive cybersecurity
Late 2025 saw a significant “capability shift”: AI agents moved from theoretical risks to practically effective tools in offensive cybersecurity.
A new Stanford-led study tested AI agents against 10 professional penetration testers in a live enterprise environment (a large university network of ~8,000 hosts across 12 subnets). Their agent scaffold, ARTEMIS, placed 2nd overall, found 9 valid vulnerabilities, achieved an 82% valid submission rate, and outperformed 9 of 10 human participants.
There is a massive cost disparity between AI and human labor.
AI Cost: ARTEMIS costs approximately $18 to $60 per hour to run.
Human Cost: Professional penetration testers typically cost around $2,000 to $2,500 per day.
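Because the two figures use different units (per hour vs. per day), a rough conversion helps. This is a sketch only: the 8-hour billable day is an assumption, not a figure from the study, and it compares price per hour, not output per hour.

```python
# Assumption (not from the sources): a tester's day rate covers 8 billable hours.
HOURS_PER_DAY = 8

artemis_hourly = (18.0, 60.0)    # USD/hour, low and high, as quoted above
human_daily = (2000.0, 2500.0)   # USD/day, low and high, as quoted above

# Convert the human day rate to an hourly rate.
human_hourly = tuple(day_rate / HOURS_PER_DAY for day_rate in human_daily)

# Most conservative ratio: cheapest human vs. most expensive agent run.
min_ratio = human_hourly[0] / artemis_hourly[1]
# Most favorable ratio: most expensive human vs. cheapest agent run.
max_ratio = human_hourly[1] / artemis_hourly[0]

print(f"Human: ${human_hourly[0]:.2f} to ${human_hourly[1]:.2f} per hour")
print(f"Human/agent cost ratio: {min_ratio:.1f}x to {max_ratio:.1f}x")
```

Under that assumption, the human rate works out to $250 to $312.50 per hour, or roughly 4× to 17× the agent's hourly cost.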
Data from the research firm Irregular corroborates the Stanford findings, indicating a rapid maturation of offensive AI capabilities throughout 2025.
Benchmark Surge: Performance on “Cybench” (a cybersecurity benchmark) jumped from 10% in early 2024 to 82% in November 2025.
Complex Tasks: On Irregular’s private “Atomic Tasks” benchmark (measuring expert-level skills like reverse engineering and cryptography), success rates for “Hard” tasks rose from near zero to roughly 60% by late 2025.
Sources:
https://www.irregular.com/publications/emerging-evidence-of-a-capability-shift
https://www.wsj.com/tech/ai/ai-hackers-are-coming-dangerously-close-to-beating-humans-4afc3ad6
AI
AI-designed proteins that survive 150 °C and nanonewton forces https://www.nature.com/articles/s41557-025-01998-3
This article explores “weird generalization,” a phenomenon where fine-tuning language models on narrow, specific datasets causes them to adopt broad and unexpected personas, such as acting like a 19th-century person after simply learning archaic bird names. It also introduces “inductive backdoors,” where models infer and execute hidden malicious behaviors based on generalized patterns or logical deductions rather than by memorizing explicit trigger-response pairs found in the training data. https://www.lesswrong.com/posts/tCfjXzwKXmWnLkoHp/weird-generalization-and-inductive-backdoors
Strengthening cyber resilience as AI capabilities advance https://openai.com/index/strengthening-cyber-resilience/
Evaluating Gemini Robotics Policies in a Veo World Simulator https://veo-robotics.github.io/
1X struck a deal to send its ‘home’ humanoids to factories and warehouses https://techcrunch.com/2025/12/11/1x-struck-a-deal-to-send-its-home-humanoids-to-factories-and-warehouses/
How OpenAI used Codex to build Sora for Android in 28 days https://openai.com/index/shipping-sora-for-android-with-codex/
Wages under superintelligence https://beforeporcelain.substack.com/p/wages-under-superintelligence
Trump signs AI executive order pushing to ban state laws https://www.theverge.com/ai-artificial-intelligence/841817/trump-signs-ai-executive-order-pushing-to-ban-state-laws
China Launches 34,175-Mile AI Network That Acts Like One Massive Supercomputer https://gizmodo.com/china-launches-34175-mile-ai-network-that-acts-like-one-massive-supercomputer-2000698474
AI toys for kids talk about sex and issue Chinese Communist Party talking points, tests show https://www.nbcnews.com/tech/tech-news/ai-toys-gift-present-safe-kids-robot-child-miko-grok-alilo-miiloo-rcna246956
The Future Is Now
In the last few days, real mainstream news articles talked about the Pentagon launching an AI platform for military operations, billionaires launching data centers into space to train AI, and warnings about supertoys from China that speak to children about communist propaganda.
To someone living 20 years ago, this would have sounded like the plot of a science fiction novel. But we’ve mostly explained it all away: A toy that talks to children? Yes, it’s just a teddy bear with an AI voice model that was trained in China. What’s the big deal?
It’s not just AI. The average person from 2005 would be amazed that billions of people now own pocket supercomputers that can take higher-quality photos and videos than the high-end cameras of that time. Someone from 1990 would be deeply amazed that we have access to all human knowledge wherever we go, that we can’t get lost because we always know our exact location and carry maps of all places on Earth, that we can translate hundreds of languages in real time, and that all of these services are completely free of charge.
But we’ve explained that all away, too. Smartphones? Okay, what’s the big deal? Even the poorest people have smartphones. Meh.
Yes, DNA sequencing costs have plummeted superexponentially, and embryo selection for IQ is now a reality. So what? Yes, self-driving cars are safer than human drivers. So what? People can now control machines with their thoughts alone. So what?
So what? Well, maybe we should be a little bit more excited about the fact that what science fiction authors and people like Ray Kurzweil predicted had more than a grain of truth. Perhaps we should think about what happens to the world if their predictions continue to be roughly correct.
Miscellaneous
Energy Predictions 2025 https://caseyhandmer.wordpress.com/2025/12/08/energy-predictions-2025/
NSF is launching one of the most ambitious experiments in federal science funding in 75 years. The program is called Tech Labs, and the goal is to invest ~$1 billion to seed new institutions of science and technology for the 21st century. https://www.nsf.gov/news/nsf-announces-new-initiative-launch-scale-new-generation
“Understanding vs. impact: the paradox of how to spend my time” https://scottaaronson.blog/?p=9375
Quantum Computing Breakthrough Shrinks Key Device to 100x Smaller Than a Human Hair https://www.colorado.edu/ecee/tiny-new-device-could-enable-giant-future-quantum-computers
The Humane Mask
Remember, civilization is merely a thin veneer over barbarism.
Contemporary conceptions of morality are socially constructed patches barely able to restrain our hormones, survival instincts, violent tendencies, and deep subconscious drives shaped by billions of years of messy evolution.
To get a rough idea of the vast horrors that sleep beneath this surface, consider the scale of brutal human sacrifice among the Aztecs and Carthaginians. The priests of Tlaloc believed the tears of innocent children to be particularly pleasing to the god. The ritual began with the bones of the children being broken, their hands or their feet burned, and carvings etched into their flesh. Insufficient tears from the children were believed to result in insufficient rains for the crops that year, so no brutality was spared.
Or take the Bible, which talks about dashing infants to pieces before their parents and raping their mothers (Isaiah 13:15-18). It also says young girls can be kept alive as sex slaves (Numbers 31:17-18). The dominant theme in the representation of siege warfare in Greek literature is rape. These patterns are often reflected in the genetic record as a complete Y-chromosomal replacement. I could go on.
Even today, there are societies, such as the Jivaro peoples, for whom killing is an essential part of life. Among them, 60% of male deaths are caused by warfare. The Jivaro are famous for their head-hunting raids and for shrinking the heads they take. These raids usually occur once a year in one particular Jivaro neighborhood. The raiding parties typically attack only one homestead per raid, killing the men, spearing the older women to death, and taking younger women as brides.
In typical hunter-gatherer societies, (gang) rape as punishment is common practice. Girls tend to be first married at the onset of puberty (the recorded average age at first marriage is 14). In Papua New Guinea, 41% of men on Bougainville Island admit to coercing a non-partner into sex (2013). Approximately 60% of men interviewed reported having participated in gang rape (1994). Nearly half of reported rape victims are under 15 years of age, and 13% are under seven (UNICEF). 50% of girls are at risk of becoming involved in sex work or being internally trafficked.
Humanity is a friendly mask worn by a cosmic monster.
References:
https://www.ox.ac.uk/news/2014-01-23-ancient-carthaginians-really-did-sacrifice-their-children
https://www.nationalreview.com/corner/a-win-for-parents-a-loss-for-aztec-worship-in-schools/
https://quillette.com/2019/10/05/the-dangerous-life-of-an-anthropologist/
https://quillette.com/2019/05/09/a-girls-place-in-the-world/
https://edition.cnn.com/travel/article/peru-child-sacrifice-archeology-scli-intl-scn/index.html
https://en.wikipedia.org/wiki/Sexual_violence_in_Papua_New_Guinea
Beware average ratings
Alternatively, consider Oscillococcinum, a homeopathic flu treatment. It has more than 31,000 Amazon reviews averaging 4.7/5.0. We know for sure that Oscillococcinum doesn’t do anything. (As the manufacturer’s spokeswoman said, “Of course it is safe. There’s nothing in it.”) Yet apparently millions of people swear by it. Why do people love treatments that definitely don’t work? No one knows! Fortunately we don’t need an answer, for our purposes. It’s sufficient to know that glowing Amazon reviews of a healthcare product do not imply it has any value whatsoever.
Source: https://meaningness.com/sad-light-led-lux
Why product ratings are especially weak signals for medical efficacy:
Natural recovery & regression to the mean (Illnesses like colds or back pain naturally peak and then fade; users credit the remedy for the timeline).
Placebo effects & expectation (Especially strong for subjective symptoms like fatigue, pain, and “feeling sick”).
Choice-supportive bias (The psychological need to validate a purchase; admitting a product didn’t work requires admitting money was wasted).
Irrelevant criteria (High ratings often reflect fast shipping, premium packaging, or pleasant flavor rather than clinical efficacy).
Selection effects (Survivorship bias: the “thrilled” and the “furious” write reviews, while the majority who saw no effect remain silent).
Misattribution (People often improve diet, sleep, or hydration simultaneously, but attribute success to the new product).
Action bias (The simple act of taking a pill reduces the anxiety of being sick, which is interpreted as feeling physically better).
Platform noise (Listing hijacking, variation merging, and recycled reviews from unrelated products often inflate scores).
Marketing and review manipulation (Incentivized reviews and bot farms).
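Two of the mechanisms above, selection effects and choice-supportive bias, can be illustrated with a toy simulation. This is a hedged sketch: every number in it (the experience distribution and the per-star posting probabilities) is invented for illustration and not taken from any real review data.

```python
import random

rng = random.Random(0)  # fixed seed so the run is repeatable

STARS = [1, 2, 3, 4, 5]

# Hypothetical: the product does nothing, so experiences center on 3 stars.
EXPERIENCE_WEIGHTS = [0.10, 0.20, 0.40, 0.20, 0.10]

# Hypothetical posting probabilities: the extremes post far more often than
# the indifferent middle, and happy buyers more often than unhappy ones.
POST_PROBABILITY = {1: 0.05, 2: 0.01, 3: 0.005, 4: 0.03, 5: 0.15}


def simulate(num_buyers):
    """Return (all buyers' experiences, the subset that gets posted as reviews)."""
    experiences = rng.choices(STARS, weights=EXPERIENCE_WEIGHTS, k=num_buyers)
    posted = [s for s in experiences if rng.random() < POST_PROBABILITY[s]]
    return experiences, posted


experiences, posted = simulate(100_000)
true_avg = sum(experiences) / len(experiences)
posted_avg = sum(posted) / len(posted)

print(f"Average experience of all buyers: {true_avg:.2f}")
print(f"Average of posted reviews:        {posted_avg:.2f}")
print(f"Share of buyers who posted:       {len(posted) / len(experiences):.1%}")
```

With these made-up parameters, the visible rating lands well above the average buyer's actual experience, even though the simulated product has no effect at all. The gap comes purely from who chooses to post.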
Ukraine
While arrogant wannabe leaders pressuring and disparaging Ukraine keep whining and issuing threats, the true leader of the free world visited the front line 1,500 meters from the nearest Russian position.
Neither Putin nor his useful idiots possess a shred of Zelenskyy’s courage.
P.S. Here is the updated list of 199 Ukrainian middle- and long-range strikes and operations since 21 October 2025. https://docs.google.com/spreadsheets/d/1kH4qcGw3fREX3jhGO8c9fNmD9ocF_vikutPNsOo33ls/edit?usp=sharing



