Links for 2024-11-10
AI:
LLMs Look Increasingly Like General Reasoners: “I believe the new evidence should update all of us toward LLMs scaling straight to AGI, and therefore toward timelines being relatively short.” https://www.lesswrong.com/posts/wN4oWB4xhiiHJF9bS/llms-look-increasingly-like-general-reasoners
“VLMs can act as generative universal value functions. We discovered a simple method to generate zero-shot and few-shot values for 300+ robot tasks and 50+ datasets using SOTA VLMs like Gemini” https://generative-value-learning.github.io/
Geometry-Informed Neural Networks are evolving! Beyond faster training and improved shapes, GINNs surprised us with an emergent property – a structured latent space. https://arturs-berzins.github.io/GINN/
Dextrous Code Generation: Generating Robot Policy Code for High-Precision and Contact-Rich Manipulation Tasks https://dex-code-gen.github.io/dex-code-gen/
Groq CEO Jonathan Ross says generative AI is so powerful because it benefits from increasing compute to find solutions buried in search trees that otherwise wouldn't be found and which give AIs their intuition https://youtu.be/KhLXVRiZBdo?si=WOv5ZzclpGJlTJEr&t=575
Why AI Could Eat Quantum Computing’s Lunch https://www.technologyreview.com/2024/11/07/1106730/why-ai-could-eat-quantum-computings-lunch/ [no paywall: https://archive.is/R6kaY]
Why could a coding model trained on just 2.5T tokens compete with top-tier models like DeepSeekCoder (10T tokens) and QwenCoder (15T tokens)? https://opencoder-llm.github.io/
Debate May Help AI Models Converge on Truth https://www.quantamagazine.org/debate-may-help-ai-models-converge-on-truth-20241108/
“I don't see signs of these capability increases stopping or slowing down, and if they do continue I expect the impact on society to start accelerating as they exceed what an increasing fraction of humans can do. I think we could see serious changes in the next 2-5 years.” https://www.lesswrong.com/posts/CNA8ksMwcuXHPjXRt/personal-ai-planning
New math benchmark:
Epoch has built FrontierMath, a new, hard math benchmark written by 60 leading mathematicians:
Even with extended thinking time (10,000 tokens), Python access, and the ability to run experiments, current top models solve fewer than 2% of the problems correctly.
FrontierMath has three key design principles: 1) All problems are new and unpublished, preventing data contamination, 2) Solutions are automatically verifiable, enabling efficient evaluation, 3) Problems are "guessproof" with low chance of solving without proper reasoning.
What do experts think? Fields Medalists Terence Tao (2006), Timothy Gowers (1998), Richard Borcherds (1998), and IMO coach Evan Chen unanimously described the problems as exceptionally challenging, requiring deep domain expertise.
Learn more: https://epochai.org/frontiermath
Prediction market: Will an AI achieve >85% performance on the FrontierMath benchmark before 2028? https://manifold.markets/MatthewBarnett/will-an-ai-achieve-85-performance-o
Thread on why FrontierMath might be the closest to a definitive line delineating the boundary of “AGI” we’ll get: https://x.com/MatthewJBar/status/1855406544420053374
Health:
Three people with severely impaired vision who received stem-cell transplants have experienced substantial improvements in their sight https://www.nature.com/articles/d41586-024-03656-z
A ‘Crazy’ Idea for Treating Autoimmune Diseases Might Actually Work https://www.theatlantic.com/health/archive/2024/11/lupus-car-t-immune-reset-autoimmune-disease/680521/ [no paywall: https://archive.is/keEBa]
Political forecasting:
Polling by asking people about their neighbors: When does this work? Should people be doing more of it? And the connection to that French dude who bet on Trump https://statmodeling.stat.columbia.edu/2024/11/09/polling-by-asking-people-about-their-neighbors-when-does-this-work/
“…one election—or even several national elections—does not supply enough data to distinguish the performance of probabilistic forecasts that are so similar to each other.” https://statmodeling.stat.columbia.edu/2024/11/10/prediction-markets-in-2024-and-poll-aggregation-in-2008/
Miscellaneous:
“For decades, astrophysicists have been saying dark matter can't be mostly black holes. But this may be wrong: new calculations suggest up to 100% of dark matter could be black holes, about as heavy as asteroids, that formed in the very early universe.” https://mathstodon.xyz/@johncarlosbaez/113454218813011168
China’s Libertarian Medical City? https://marginalrevolution.com/marginalrevolution/2024/11/chinas-libertarian-medical-city.html
Ukraine:
"A recent poll indicates that 74.2% of South Koreans oppose providing lethal weapons to Ukraine, while only 20.5% are in favour. Any suggestion of a more significant military commitment could deal a critical blow to the incumbent administration" https://rusi.org/explore-our-research/publications/commentary/south-korea-navigates-moscow-pyongyang-axis-amid-domestic-constraints
North Korean troops: “It is likely not going to be a one-time shipment of 10,000 soldiers,” he said. “It is more likely going to be a way to regularly pull in thousands, perhaps up to 15,000 men a month.” https://www.nytimes.com/2024/11/10/us/politics/russia-north-korea-troops-ukraine.html [no paywall: https://archive.is/NoUUo]
"Ukraine is struggling to replace battlefield losses with conscription, barely hitting two-thirds of its target. Russia, meanwhile, is replacing its losses by recruitment with lucrative contracts, without needing to revert to mass mobilisation. A senior Ukrainian military commander admits that there has been a collapse in morale in some of the worst sections of the front. A source in the general staff suggests that nearly a fifth of soldiers have gone AWOL from their positions." https://www.economist.com/europe/2024/11/07/why-volodymyr-zelensky-may-welcome-donald-trumps-victory [no paywall: https://archive.is/sdgiH]
The Russian military executed yet another injured and disarmed Ukrainian soldier. https://x.com/IAPonomarenko/status/1855320620357255631
“Well-known Russian war correspondent Alexander Kharchenko recounts with horror his experience traveling on occupied roads in Donetsk region, which shows the fierce resistance from the Ukrainian Armed Forces. The path is littered with dozens of burned “Bukhankas” and other vehicles, driving the Russian soldiers to the brink of madness. The chance of being killed by a drone approaches 100%, leading to a shift in their mindset toward something almost esoteric.” https://x.com/wartranslated/status/1855233488883990642
«During two days of assaults, 28 units of enemy equipment and more than 100 orks from the 810th brigade were destroyed and about a hundred more WIA» https://x.com/GloOouD/status/1855267145053053393
Kupyansk direction: Fields are littered with Russian casualties. Paratroopers of the 77th Separate Airmobile Brigade destroyed over a hundred Russian soldiers near the village of Kruhliakivka, Kharkiv region. https://x.com/NOELreports/status/1855539459560767934
Footage of a Russian attack being repelled in the Kursk region. https://x.com/bayraktar_1love/status/1855627258955710663
“Putin’s crony bloggers are highlighting the catastrophic drop in birth rates in Russia. In 2014, almost 2 million babies were born; in 2023, that figure was nearly 700,000 lower, and it’s dropping even further in 2024.” https://x.com/wartranslated/status/1855578573995725053
Seems to me that people are increasingly skeptical that LLMs will lead to AGI.