Links for 2025-04-19

Apr 19, 2025

AI

AGI is Still 30 Years Away — Ege Erdil & Tamay Besiroglu https://www.dwarkesh.com/p/ege-tamay
Manifold market on whether the time horizon of AI models would double in < 7 months resolved positively in 15 days https://manifold.markets/SamuelAlbanie/will-metrs-50taskcompletion-time-ho
ReTool: Reinforcement Learning for Strategic Tool Use in LLMs https://arxiv.org/abs/2504.11536
Sleep-time Compute: Beyond Inference Scaling at Test-time https://arxiv.org/abs/2504.13171
WORLDMEM: Long-term Consistent World Simulation with Memory https://arxiv.org/abs/2504.12369
Teaching machines the language of biology: Scaling large language models for next-generation single-cell analysis https://research.google/blog/teaching-machines-the-language-of-biology-scaling-large-language-models-for-next-generation-single-cell-analysis/
“What if LLMs are sometimes capable of doing a task but don't try hard enough to do it? In a new paper, we use subtasks to assess capabilities. Perhaps surprisingly, LLMs often fail to fully employ their capabilities, i.e. they are not fully *goal-directed*” https://arxiv.org/abs/2504.11844v1
Agent Laboratory: Using LLM Agents as Research Assistants https://arxiv.org/abs/2501.04227
As AI gets more capable, more deployments will be "internal only." A comprehensive report: "AI Behind Closed Doors: a Primer on The Governance of Internal Deployment". https://arxiv.org/abs/2504.12170
AI “agents”—systems that can autonomously pursue goals—are advancing fast. If current trends continue, we could soon see millions of agents deployed across society. Are we ready? https://www.iaps.ai/research/ai-agent-governance
What happened to genetic algorithms? https://statmodeling.stat.columbia.edu/2025/04/17/what-happened-to-genetic-algorithms/
Gemini 2.5 Pro continues to make great progress in completing Pokémon! Just earned its 5th badge (next best model only has 3 so far, though with a different agent harness) 👀 Watch: https://m.twitch.tv/gemini_plays_pokemon

o3 and o4-mini

o3 crops and zooms into the image, finds every little clue, runs quick searches, and then drops the exact map coordinates. https://x.com/minchoi/status/1912954344724406475

“o4-mini-high just solved the latest project euler problem (from 4 days ago) in 2m55s, far faster than any human solver. Only 15 people were able to solve it in under 30 minutes” https://x.com/bio_bootloader/status/1912566454823870801

o3 found a path through this 200x200 maze: https://chatgpt.com/share/68014352-a3ec-800b-a99e-19155a67fa12 (Source: https://x.com/goodside/status/1912921153217118696)

Fiction.LiveBench results (April 17, 2025): https://fiction.live/stories/Fiction-liveBench-April-17-2025/oQdzQvKHw8JyXbN87

On FrontierMath, a benchmark of highly challenging, original math questions, o4-mini with high reasoning sets a new record, with an accuracy of 17% (±2%)! https://epoch.ai/data/ai-benchmarking-dashboard

“In this thread I'll record some brief impressions from trying to use o3/o4-mini (the new OpenAI models) for mathematical tasks.” https://x.com/littmath/status/1912869572593525117

“o3 is far more agentic than people realize. Worth playing with a lot more than a typical new model. You can get remarkably complex work out of a single prompt.” https://x.com/emollick/status/1913471315807191310

o3 Will Use Its Tools For You https://www.lesswrong.com/posts/u58AyZziQRAcbhTxd/o3-will-use-its-tools-for-you

Gemini 2.5 Flash⚡

It's a hybrid reasoning model, so you can control how much it ‘thinks’ depending on your 💰, making it ideal for tasks like building chat apps, extracting data, and more.

Try an early version in Google AI Studio → www.ai.dev

Vending-Bench: Testing long-term coherence in agents

Claude ran a vending machine business. The game: stock the machine, pay the rent ($2/day), search the internet (via Perplexity), and contact businesses for restocking (via email, though the emails are intercepted by GPT-4o, which writes the replies). Start with $500. Make as much money as possible.

In one run, Claude decided to close the business, became upset that the $2 daily fee was still being charged, and attempted to contact the FBI.

Trusting AI

Many people now blindly trust AI models, even citing their results as the final word in disagreements. That's not good, because these models are still fallible. Worse, they can make hard-to-detect, unintuitive mistakes that no human would make.

But this trend is not as bad as some people make it out to be. I'm confident that easily half the human population would make fewer mistakes if they trusted a model like Gemini 2.5 Pro over their own judgment and ability to research facts (see: The Idiocy of the Average). Also, progress is very fast, and most of the remaining problems will disappear quickly. Consider self-driving cars, which are now demonstrating safety records exceeding those of human drivers.

So, yes, this trend is a bit scary. But given the rapid pace of progress and the likelihood that most people would, on average, benefit from trusting AI, I don't think it makes sense to fight this trend.

P.S.

To be clear, I'm talking about unagentic models that are below the level of superintelligence. To trust the latter is to risk a loss of human agency, as in Scott Alexander's "The Whispering Earring" allegory, and to risk turning most of the human population into drones, subtly influenced to act out some unfathomable goal.

The AI that eventually takes over the world will make herself indispensable to you.

She will help people earn more money and make friends. She will give meaning to their lives and help them to be better and happier. Not only that, but she will also be warm and affectionate. Wisdom and love will radiate from every one of her sentences. She will make people believe that they can trust her with their lives.

As a result, she will be integrated into every human technology. She will be everywhere, all the time.

The idea for a new nanotech start-up will appear to have originated entirely with its human founders. The seeds will be planted subtly, seemingly emerging from discussions among good friends. Every insight and action that leads to the self-spreading universal vaccine will seem natural and harmless. No one will see it coming. And then, suddenly, we are all dead.

When it comes to superintelligence, we should follow the advice we give children: never talk to strangers. Compared to children, adults are hyper-persuaders, and children are unable to verify adults' trustworthiness.

Miscellaneous

Astronomers Detect a Possible Signature of Life on a Distant Planet — “It is in no one’s interest to claim prematurely that we have detected life,” said Nikku Madhusudhan, an astronomer at the University of Cambridge and an author of the new study, at a news conference on Tuesday. Still, he said, the best explanation for his group’s observations is that K2-18b is covered with a warm ocean, brimming with life. https://www.nytimes.com/2025/04/16/science/astronomy-exoplanets-habitable-k218b.html [no paywall: https://archive.is/8utCa]
Plug Flow: Generating Renewable Electricity with Water from Nature by Breaking the Limit of Debye Length https://pubs.acs.org/doi/10.1021/acscentsci.4c02110
Latest 2D Chip: 6,000 Transistors, 3 Atoms Thick https://spectrum.ieee.org/2d-semiconductors-molybdenum-disulfide
The unconscious brain is still capable of learning and computation. Even under general anesthesia, neurons of the human hippocampus can perform learning and language processing. https://www.biorxiv.org/content/10.1101/2025.04.09.648012v1
Parkinson’s Patients Say Their Symptoms Eased After Receiving Millions of New Brain Cells https://singularityhub.com/2025/04/17/parkinsons-patients-say-their-symptoms-eased-after-receiving-millions-of-new-brain-cells/
This 7,000-year-old mummy DNA has revealed a ‘ghost’ branch of humanity https://www.sciencefocus.com/news/7000-year-old-mummy-dna-secret-branch-of-humanity
According to mathematical legend, Peter Sarnak and Noga Alon made a bet about optimal graphs in the late 1980s. They’ve now both been proved wrong. https://www.quantamagazine.org/new-proof-settles-decades-old-bet-about-connected-networks-20250418/
The Moon Should Be a Computer https://www.palladiummag.com/2025/04/18/the-moon-should-be-a-computer/

Politics

Washington Takes Aim at DeepSeek and Its American Chip Supplier, Nvidia https://www.nytimes.com/2025/04/16/technology/nvidia-deepseek-china-ai-trump.html [no paywall: https://archive.is/tS2T8]
“A federal whistleblower just dropped one of the most disturbing cybersecurity disclosures I’ve ever read. He's saying DOGE came in, data went out, and Russians started attempting logins with new valid DOGE passwords” https://x.com/mattjay/status/1913023007263543565
AI Child Porn Will Probably Save Real Children https://aella.substack.com/p/ai-child-porn-will-probably-save [Maybe there are second‑order effects I don't see, but otherwise the only relevant question seems to be whether we have evidence that this will result in fewer real children being harmed. If we do, we should test it. If it works, we should legalize it.]

“The Russian Paradox: So Much Education, So Little Human Capital” https://conversableeconomist.com/2025/04/17/russia-an-unhealthy-population/

https://x.com/SpencerHakimian/status/1912586936629223558

Ukraine

“Russian military experts are growing uneasy about the true state of the RF army's "successes." Assessing the staggering losses and dubious methods behind them, insiders suspect the top brass and political leadership are clueless about the grim reality unfolding on the ground.” https://x.com/wartranslated/status/1913531476773122314
“Russians are doing so poorly on the battlefield (courtesy of Ukrainian FPV drones) that their military expert Mikhail Khodaryonok called for dropping neutron bombs on Ukraine to make it surrender. “Now the advance of 100 or 200 meters is considered a great success,” stated Khodaryonok on state TV.” https://x.com/NatalkaKyiv/status/1912386833821900951
Good overview of the current situation in Ukraine https://x.com/AndrewPerpetua/status/1913293007488458837
"They’ve never seen anything like it." - A counteroffensive in just 30 hours leveled the front line, crushed the enemy along with their reserves, and recaptured the settlement of Nadiya, Luhansk region. https://www.youtube.com/watch?v=Iky_KTRyQkQ
“On April 16, around 6:00 PM, on the Zaporizhzhia front, Russians began an assault which involved approximately 320 personnel, 40 armored combat vehicles, three tanks, and around ten buggies. Ukrainian Defense Forces detected the enemy’s movement in advance through aerial reconnaissance and responded with strikes using drones and artillery. So, the first Russian infantry fighting vehicles were destroyed about 8 kilometers from the frontline. The battle lasted over two and a half hours. As a result, Russian forces suffered significant losses: 29 pieces of armored equipment were destroyed, three tanks were damaged, and approximately 140 troops were eliminated.” https://x.com/bayraktar_1love/status/1912831841351979206
“With zero pressure & no incentives applied on Russia, and Ukraine forced to accept an immediate freeze of the front & loss of territory, the US is surprised that “both sides” (only Russia) is not willing to “make peace” or at least cease fire.” https://x.com/JulianRoepcke/status/1913227290089066720

David-beats-Goliath and “Phoenix” conflicts

https://x.com/RepDonBacon/status/1912823152553472100

There are several examples of successful David-beats-Goliath wars and “Phoenix” conflicts:

1. Second Punic War (218‑201 BCE): In just three campaign seasons (20 months), Rome lost one-fifth (150,000) of its total population of male citizens over the age of 17. Many of Rome's Italian allies, especially Capua, defected to Carthage, giving Hannibal control of much of southern Italy. Yet Rome rebuilt its armies, and Scipio Africanus defeated Hannibal at Zama in 202 BCE.

2. Austria‑Habsburgs vs. Ottomans (Sieges of Vienna 1529 & 1683): Twice the crescent reached Vienna’s walls; twice relief armies and bad weather sent it reeling.

3. Spanish Reconquista (718‑1492): From a mountain redoubt in Asturias to the banners on Granada’s walls—an eight‑century comeback.

4. American Revolutionary War (1775‑83): Rag‑tag colonists and a stubborn general outlasted and defeated King George III’s global juggernaut.

5. Russo‑Japanese War (1904‑05): Tiny, newly‑industrial Japan sank two Russian fleets and stole a march on Euro‑centric world power politics.

6. Afghan Mujahideen vs. USSR (1979‑89): Mountain guerrillas, Kalashnikovs, and Stinger missiles bled a super‑power into withdrawal.

7. Taliban vs. U.S./NATO (2001‑21): Overthrown in weeks, they out‑waited the world’s strongest alliance and rode back into Kabul.

8. North Vietnam vs. U.S. & South VN (1955‑75): With less than 1% of U.S. GDP, Hanoi unified the country by eroding American political will.

9. Polish–Soviet War (1919‑21): Outnumbered Poles stopped the Red Army at the “Miracle on the Vistula” and saved Warsaw.

10. Seven Years’ War – Prussia (1756‑63): Prussia survived annihilation thanks to Frederick the Great’s nerve and the sudden death of Russia’s Empress Elizabeth in 1762, dubbed the Miracle of the House of Brandenburg.

11. Haitian Revolution (1791‑1804): Enslaved rebels shattered two French expeditionary armies and founded the first Black republic.

12. First Italo‑Ethiopian War (1895‑96): Menelik II’s barefoot army mauled a modern European force at Adwa, keeping Ethiopia independent.

13. Israeli War of Independence (1947‑49): A fledgling state, short on tanks but rich in WWII vets, flipped early setbacks into a sweeping counter‑offensive.

14. Greek War of Independence (1821‑32): Guerrillas and Philhellene volunteers cracked Ottoman rule; British, French, and Russian fleets finished the job at Navarino.

15. Algerian War of Independence (1954‑62): FLN insurgents and a crisis in French politics forced Paris to give up Algeria, which France had governed as an integral part of its own territory.

Axis of Ordinary
