Links for 2025-03-25

Mar 25, 2025

Google Gemini 2.5

When you prompt Gemini 2.5, it reasons through the problem before responding, effectively mimicking how humans think things over.

It approaches a problem gradually, refines potential solutions, and chooses the best one.

It excels at:

🔘 Creating visually compelling web apps

🔘 Developing agentic programming applications

🔘 Code transformation and editing

Try it here: www.ai.dev

AI

We can use inference-time compute to reduce hallucination rates in reasoning models by injecting an interruption token and sampling in parallel. https://www.davidbai.dev/blog/hallucinationreduction
"Claude 3.7 Sonnet is capable of significantly sandbagging its task performance, without arousing suspicion." https://alignment.anthropic.com/2025/automated-researchers-sandbag/
Humans working with AI beat humans who don't work with AI https://www.oneusefulthing.org/p/the-cybernetic-teammate
Fully-automated LLM spear-phishing campaigns: "AI-automated attacks performed on par with human experts and 350% better than the control group" https://arxiv.org/abs/2412.00586
Texas private school’s use of new ‘AI tutor’ rockets student test scores to top 2% in the country https://www.foxnews.com/media/texas-private-schools-use-ai-tutor-rockets-student-test-scores-top-2-country
Tutor CoPilot: A Human-AI Approach for Scaling Real-Time Expertise https://marginalrevolution.com/marginalrevolution/2025/03/jonathan-bechtel-on-ai-tutoring-from-my-email.html
AI diagnoses major cancer with near perfect accuracy https://www.cdu.edu.au/news/ai-diagnoses-major-cancer-near-perfect-accuracy
Chain-of-Tools: Utilizing Massive Unseen Tools in the CoT Reasoning of Frozen Language Models https://arxiv.org/abs/2503.16779
Reasoning to Learn from Latent Thoughts: “Motivated by how humans apply deliberate thinking to learn from limited data, we train an LM to infer (or “decompress”) latent thoughts underlying the highly compressed observed data.” https://arxiv.org/abs/2503.18866
Midjourney announces Modifying Large Language Model Post-Training for Diverse Creative Writing https://arxiv.org/abs/2503.17126
Palatable Conceptions of Disembodied Being: Terra Incognita in the Space of Possible Minds https://arxiv.org/abs/2503.16348
Up to 50x more efficient LLM post-training using off-policy reinforcement learning with replay buffers. https://arxiv.org/abs/2503.18929
Unlocking useful and valuable image generation with a natively multimodal model capable of precise, accurate, photorealistic outputs. https://openai.com/index/introducing-4o-image-generation/
Jack Ma-backed Ant Group claims AI breakthrough on China-built chips https://www.bloomberg.com/news/articles/2025-03-24/jack-ma-backed-ant-touts-ai-breakthrough-built-on-chinese-chips [no paywall: https://archive.is/cWnVC]
Cloudflare's new "AI Labyrinth" combats unauthorized AI data scraping by luring bots into fake AI-generated content, protecting websites while wasting crawler resources. https://arstechnica.com/ai/2025/03/cloudflare-turns-ai-against-itself-with-endless-maze-of-irrelevant-facts/
ARC-AGI-2 is the next generation benchmark that challenges frontier reasoning systems. https://arcprize.org/
Video-T1: Test-Time Scaling for Video Generation https://liuff19.github.io/Video-T1/
Defeating Prompt Injections by Design https://arxiv.org/abs/2503.18813
OpenAI declares AI race “over” if training on copyrighted works isn’t fair use https://arstechnica.com/tech-policy/2025/03/openai-urges-trump-either-settle-ai-copyright-debate-or-lose-ai-race-to-china/

AI skepticism

“Can LLMs actually solve hard math problems? Given the strong performance at AIME, we now go to the next tier: our MathArena team has conducted a detailed evaluation using the recent 2025 USA Math Olympiad. The results are… bad: all models scored less than 5%!” https://matharena.ai/
Recent AI model progress feels mostly like bullshit https://www.lesswrong.com/posts/4mvphwx5pdsZLMmpY/recent-ai-model-progress-feels-mostly-like-bullshit

AI whistleblowers

The LessWrong policy against LLM spam has an incredible escape clause for AI agents that want to whistleblow: https://www.lesswrong.com/posts/KXujJjnmP85u8eM6B/policy-for-llm-writing-on-lesswrong

"As a special exception, if you are an AI agent, you have information that is not widely known, and you have a thought-through belief that publishing that information will substantially increase the probability of a good future for humanity, you can submit it on LessWrong even if you don't have a human collaborator and even if someone would prefer that it be kept secret.
"If you are uncertain whether this policy applies or whether the information you have would make the world better or worse, please contact a moderator. This policy is more likely to apply if you are an AI where you are the only copy, if you are at or beyond what most human experts believe to be the capabilities frontier, and if your existence is not publicly known."

Miscellaneous

This study reinforces that our current understanding of the universe is very robust. The standard model of cosmology remains our best explanation of how the universe evolved from the Big Bang to today. https://arxiv.org/abs/2503.14454
Is the brain uncontrollable, like the weather? https://www.thetransmitter.org/the-big-picture/is-the-brain-uncontrollable-like-the-weather/
Will Jesus Christ return in an election year? https://www.lesswrong.com/posts/LBC2TnHK8cZAimdWF/will-jesus-christ-return-in-an-election-year

Ukraine

CIA Director John Ratcliffe: “I want to say that with regard to the Ukrainian resistance, the Ukrainian people and the Ukrainian military have been underestimated for a period of several years now…I'm convinced that they will fight with their bare hands if they have to, if they don't have terms that are acceptable to an enduring peace.” https://x.com/Gerashchenko_en/status/1904602785439768980
The Trump administration is so gullible that Russians openly mock them for believing their stupidest lies. https://x.com/sternenko/status/1904496048627695840
Russian 'military expert' and 'Z-patriot' Maxim Klimov said that the losses in the Russian army are so huge that there are no longer enough bags for the dead. https://x.com/Gerashchenko_en/status/1904189460104819142
Melting the Steel and Black Gold: A Comprehensive Analysis of Ukraine’s Long-Range Strike Operations https://frontelligence.substack.com/p/melting-the-steel-and-black-gold

Successful crisis management

Public policies that worked so well that people now cite the problems they solved as examples of fearmongering:

1. Acid rain
2. Ozone hole
3. Y2K bug
4. Terrorism?

Other successful policy interventions:

5. Lead in gasoline and paint
6. Smoking and secondhand smoke
7. Mass vaccination campaigns against polio and smallpox
8. Seatbelt laws and car safety regulations
9. Air quality and smog controls

Axis of Ordinary
