Links for 2025-08-05

Aug 05, 2025

Open models by OpenAI

Advanced open-weight reasoning models to customize for any use case and run anywhere: https://openai.com/open-models/

Gemini 2.5 Deep Think

It doesn't just answer, it brainstorms using parallel thinking and reinforcement learning techniques.

Deep Think can mirror how people tackle complex tasks: by exploring multiple approaches at once.

It generates parallel streams of thought, before comparing, contrasting, and refining these ideas to arrive at better answers.

Genie 3

What if you could not only watch a generated video, but explore it too? 🌐

Genie 3 is Google's groundbreaking world model that creates interactive, playable environments from a single text prompt.

From photorealistic landscapes to fantasy realms, the possibilities are endless.

🔘 Real-time capabilities: Generates dynamic worlds at 720p and 24 FPS, with each frame created in response to user actions.

🔘 Long-horizon consistency: Environments created remain largely consistent over several minutes, with visual memory extending as far as 1 minute in the past. This ability is critical to enable AI agents to learn about the world.

🔘 Promptable world events: Beyond navigation, users can insert text prompts to alter the world in real time, such as changing the weather ⛅ or introducing new characters 👤

🔘 Accelerating agent research: To explore the potential for agent training, Google placed its SIMA agent in a Genie 3 world with a goal. The agent acts, and Genie 3 simulates a response in the world without knowing the objective. This is key for building more capable embodied agents.💡

World models are a key stepping stone on the path to AGI, promising unlimited rich simulations for training AI agents. Genie 3 represents a significant leap forward in making this a reality.

See also:

Google solved environmental consistency with Genie 3, and this was an emergent capability. https://x.com/agrimgupta92/status/1952735045527208016
OpenAI released Harmony along with the new gpt-oss models. It's a new chat template with several interesting features. https://github.com/openai/harmony

Epoch AI: A fourth problem on FrontierMath Tier 4 has been solved by AI! Written by Dan Romik, it had won our prize for the best submission in the number theory category. https://x.com/EpochAIResearch/status/1951432847148888520

The build-out of AI infrastructure is so vast that, over the past six months, it has contributed more to the growth of the US economy than consumer spending.

As a percentage of GDP, spending on AI infrastructure has already exceeded spending on telecoms and internet infrastructure during the dot-com boom, and it’s still growing.

One reason for the ongoing strength of the US economy is that this level of spending is acting as a kind of private-sector stimulus programme, larger than the EU's flagship research programme spanning six years.

And this is just the beginning. Europe is not even part of the ongoing industrial revolution and arms race between the US and China.

AI

AI Breakthrough at NJIT Unlocks 'New' Materials to Replace Lithium-Ion Batteries https://news.njit.edu/ai-breakthrough-njit-unlocks-new-materials-replace-lithium-ion-batteries
AI helps chemists develop tougher plastics https://news.mit.edu/2025/ai-helps-chemists-develop-tougher-plastics-0805
FastCSP: Accelerated Molecular Crystal Structure Prediction with Universal Model for Atoms https://ai.meta.com/research/publications/fastcsp-accelerated-molecular-crystal-structure-prediction-with-universal-model-for-atoms/
Claude Opus 4.1 advances state-of-the-art coding performance to 74.5% on SWE-bench Verified https://www.anthropic.com/news/claude-opus-4-1
Language models develop computational circuits that TRANSFER length generalization across related tasks. https://arxiv.org/abs/2506.09251
WebShaper: Agentically Data Synthesizing via Information-Seeking Formalization https://arxiv.org/abs/2507.15061
China's ByteDance is exploring diffusion LLMs too: a large-scale language model based on discrete-state diffusion and specializing in code generation achieves an inference speed of 2,146 tokens/s, a 5.4x improvement over autoregressive models of comparable size. https://seed.bytedance.com/en/seed_diffusion
Quantifying the algorithmic improvement from reasoning models https://epochai.substack.com/p/quantifying-the-algorithmic-improvement
Once again, the future is proving more bizarre than the science fiction of the past imagined it: “The method is loosely analogous to giving the model a vaccine—by giving the model a dose of ‘evil,’ for instance, we make it more resilient to encountering ‘evil’ training data.” https://www.anthropic.com/research/persona-vectors
Seed-Prover: Deep and Broad Reasoning for Automated Theorem Proving https://arxiv.org/abs/2507.23726
NuminaMath-LEAN, a large-scale dataset of 100K mathematical competition problems formalized in Lean 4, with more than 20K human annotations. https://huggingface.co/datasets/AI-MO/NuminaMath-LEAN
OpenAI’s ChatGPT to hit 700 million weekly users, up 4x from last year https://www.cnbc.com/2025/08/04/openai-chatgpt-700-million-users.html
ChatGPT speeds up work for North Carolina public servants (e.g., cutting some tasks from 20 minutes to 20 seconds) https://www.nctreasurer.gov/news/press-releases/2025/08/01/state-treasurer-briner-openai-report-shows-many-benefits-offers-great-promise
A Survey of Self-Evolving Agents: On Path to Artificial Super Intelligence https://arxiv.org/abs/2507.21046
Unverified claim: “Our AI co-scientist achieves 92.4% GPQA Diamond accuracy. Surpassing every major AI system while solving the calibration problem.” https://autopoiesis.science/blog/92-4-gpqa-diamond
“We are a software company that builds RL environments and sells them to the leading AI labs. Our RL environments simulate real-world work scenarios.” https://www.mechanize.work/
How quick and big would a software intelligence explosion be? https://www.forethought.org/research/how-quick-and-big-would-a-software-intelligence-explosion-be

ChatGPT is rewiring how humans speak to each other

The appearance of large language models caused a drastic shift in the vocabulary of academic writing and talks.

Since ChatGPT's release, there has been a measurable and abrupt increase in the use of words it preferentially generates, such as delve, comprehend, boast, swift, and meticulous.

The changes showed up in SPONTANEOUS conversations, not scripts or prepared thoughts. Random people chatting on podcasts started using ChatGPT's favorite words without realizing it.

Delve into the meticulously designed studies:

Study 1: Empirical evidence of Large Language Model's influence on human spoken communication https://arxiv.org/abs/2409.01754

Study 2: Delving into LLM-assisted writing in biomedical publications through excess vocabulary https://www.science.org/doi/10.1126/sciadv.adt3813
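
The kind of measurement behind these findings can be illustrated with a before/after comparison of word rates. This is only a minimal sketch: the two toy transcripts and the function name are made up for illustration, matching is on exact word forms, and none of it reproduces the studies' actual pipelines.

```python
import re
from collections import Counter

# Words the post lists as preferentially generated by ChatGPT.
MARKER_WORDS = {"delve", "comprehend", "boast", "swift", "meticulous"}


def rate_per_million(texts, words=MARKER_WORDS):
    """Return occurrences of `words` per million tokens across `texts`.

    Exact-form matching only: "delves" or "meticulously" are not counted.
    """
    tokens = [t for text in texts for t in re.findall(r"[a-z']+", text.lower())]
    if not tokens:
        return 0.0
    counts = Counter(tokens)
    hits = sum(counts[w] for w in words)
    return hits / len(tokens) * 1_000_000


# Hypothetical toy transcripts, invented for this example.
before = ["we look into the data and try to understand it"]
after = ["we delve into the data with a swift and meticulous review"]

print(rate_per_million(before), rate_per_million(after))
```

On real data, the same comparison would be run over large corpora dated before and after the release, with a control set of words to rule out general vocabulary drift.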

Imagine you're a kindergarten kid, and then a 3,000-year-old person shows up who is the result of a million-year eugenics program designed to breed intelligence, strength, and persuasion.

In this scenario, the power imbalance between you and this person is roughly a millionth of the disparity between you and an artificial superintelligence.

Miscellaneous

Surprising finding could pave way for universal cancer vaccine https://news.ufl.edu/2025/07/universal-cancer-vaccine/
Miniature neutrino detector promises to test laws of physics https://www.nature.com/articles/d41586-025-02404-1 [no paywall: https://archive.is/grsgy]
Suddenly, Trait-Based Embryo Selection https://www.astralcodexten.com/p/suddenly-trait-based-embryo-selection
Japanese chipmaker Rapidus begins test production of 2nm circuits — company commits to single-wafer processing ahead of 2027 mass production target https://www.tomshardware.com/tech-industry/semiconductors/japanese-chipmaker-rapidus-begins-test-production-of-2nm-circuits-company-commits-to-single-wafer-processing-ahead-of-2027-mass-production-target
Why Do Victims of Massacres Go Quietly to Their Deaths? https://www.benlandautaylor.com/p/why-do-victims-of-massacres-go-quietly

Politics

Interview with Steven Byrnes on Brain-like AGI, Foom & Doom, and Solving Technical Alignment https://www.lesswrong.com/posts/zecxwyATrN8ZbinoC/interview-with-steven-byrnes-on-brain-like-agi-foom-and-doom
Suppose we want the future to go better. What should we do? https://www.forethought.org/research/better-futures
‘By mid-March, corpses littered the street like newspapers’ https://news.harvard.edu/gazette/story/2025/08/by-mid-march-corpses-littered-the-street-like-newspapers/

Axis of Ordinary
