Links for 2024-10-04

Oct 04, 2024

AI:

Reinforcement Learning with Execution Feedback (RLEF).

Trains LLMs to use inference-time feedback via large-scale RL. It makes even the 8B Llama 3.1 beat GPT-4 on CodeContests, and the 70B reaches SOTA.

Paper: https://arxiv.org/abs/2410.02089

Author summary:

LLMs for code should do much better if they can iterate on tests -- but they don't. Our new work (RLEF) addresses this with execution feedback at RL *training time* to use execution feedback at *inference time*.
Notably, RLEF models are very sample efficient for inference. Competitive programming questions are often approached by sampling a large number of candidate programs; we can reach SOTA with just up to 3 samples.
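
To make "iterate on tests" concrete, here is a minimal sketch of the inference-time loop only: generate a program, run it on public tests, and feed the failures back for the next attempt. It is not the paper's implementation and does not cover the RL training. The names `run_public_tests`, `solve_with_feedback`, and `stub_model` are hypothetical, the stub stands in for a real LLM call, and `max_attempts=3` simply mirrors the "up to 3 samples" figure quoted above.

```python
from typing import Callable, List, Optional, Tuple

# One public test case: (input arguments, expected return value).
TestCase = Tuple[tuple, object]


def run_public_tests(source: str, tests: List[TestCase]) -> List[str]:
    """Run candidate source that defines solve() and return failure messages."""
    namespace: dict = {}
    try:
        # Illustration only: untrusted model output should run in a sandbox.
        exec(source, namespace)
        solve = namespace["solve"]
    except Exception as err:
        return [f"program failed to load: {err!r}"]
    failures = []
    for args, expected in tests:
        try:
            got = solve(*args)
        except Exception as err:
            failures.append(f"solve{args} raised {err!r}")
            continue
        if got != expected:
            failures.append(f"solve{args} returned {got!r}, expected {expected!r}")
    return failures


def solve_with_feedback(
    generate: Callable[[List[str]], str],
    tests: List[TestCase],
    max_attempts: int = 3,
) -> Optional[str]:
    """Ask the generator for a program, feeding back test failures each round."""
    feedback: List[str] = []
    for _ in range(max_attempts):
        candidate = generate(feedback)
        feedback = run_public_tests(candidate, tests)
        if not feedback:
            return candidate  # all public tests pass
    return None  # attempt budget exhausted


def stub_model(feedback: List[str]) -> str:
    """Hypothetical stand-in for an LLM: buggy first, fixed once it sees feedback."""
    if not feedback:
        return "def solve(a, b):\n    return a - b\n"
    return "def solve(a, b):\n    return a + b\n"


tests = [((1, 2), 3), ((5, 5), 10)]
result = solve_with_feedback(stub_model, tests)
```

In this toy run the first program fails both tests, the failure messages go back to the generator, and the second program passes. The point of RLEF, per the summary above, is to train the model so that it actually makes use of that feedback.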

VinePPO: Unlocking RL Potential For LLM Reasoning Through Refined Credit Assignment https://arxiv.org/abs/2410.01679
The Perfect Blend: Redefining RLHF with Mixture of Judges — CGPO consistently outperforms state-of-the-art RLHF algorithms like PPO and DPO across various tasks including general chat, STEM, instruction following, math, coding, and knowledge. https://arxiv.org/abs/2409.20370
Meta just entered the movie business with Meta Movie Gen: A Cast of SotA Media Foundation Models. Text-to-video, text-to-image, image-to-video, precise video editing, sound effects and background music, all synced. These models can reason about object motion, subject-object interactions, and camera motion, and they can learn plausible motions for a wide variety of concepts. Meta reports that they outperform prior state-of-the-art systems, including commercial ones such as Runway Gen3, LumaLabs, and OpenAI Sora. https://ai.meta.com/research/movie-gen/
Apple just released Depth Pro: Sharp Monocular Metric Depth in Less Than a Second https://arxiv.org/abs/2410.02073
AI-generated images can teach robots how to act https://arxiv.org/abs/2407.07875
BindCraft: one-shot design of functional protein binders https://www.biorxiv.org/content/10.1101/2024.09.30.615802v1
How AI is improving simulations with smarter sampling techniques https://news.mit.edu/2024/how-ai-improving-simulations-smarter-sampling-techniques-1002
OpenAI's Ilge Akkaya says the new o1 model series is "trained to think" and this new paradigm can brainstorm about problems, for example with doctors to generate novel ideas for cancer research https://www.youtube.com/watch?v=jPluSXJpdrA
Sam Altman: "if we can make an AI system that is materially better than all of OpenAI at doing AI research, that does feel like an important discontinuity... the model is going to get so good so fast... plan for the model to get rapidly smarter" https://www.youtube.com/live/-cq3O4t0qQc?si=m0gpVob2O9ORDpiI&t=781
The U.S. Department of Commerce announced there will be an open competition for $100 million in funding for AI autonomous experimentation into semiconductor materials. https://www.commerce.gov/news/press-releases/2024/10/biden-harris-administration-invest-100-million-accelerate-rd-and-ai
OpenAI raised $6.6 billion from investors, making the ChatGPT maker one of the world’s most valuable private companies. https://openai.com/index/scale-the-benefits-of-ai/
OpenAI asks investors to avoid five AI startups including Sutskever's SSI, sources say https://www.reuters.com/technology/openai-tells-investor-not-invest-five-ai-startups-including-sutskevers-ssi-2024-10-02/ [no paywall: https://archive.is/6LpaA]
NVIDIA CEO Jensen Huang says a trillion dollars is being spent on data centers to enable the next, biggest wave of AI to revolutionize business productivity https://www.youtube.com/watch?v=FGr2BvqQn9o
NVIDIA CEO Jensen Huang says their next-generation GPU Blackwell is in full production and demand is "insane" as they move to a new AI chip generation with 2-3x performance gains every year https://x.com/tsarnick/status/1842012753160249372
‘In awe’: scientists impressed by latest ChatGPT model o1 https://www.nature.com/articles/d41586-024-03169-9
“ChatGPT's new canvas interface is a game changer. Just used it to create a tesseract/hypercube visualizer with ThreeJS.” https://x.com/bilawalsidhu/status/1841906953083068452
Google Is Working on Reasoning AI, Chasing OpenAI’s Efforts https://finance.yahoo.com/news/google-working-reasoning-ai-chasing-110027962.html
Tim Brooks, the Sora research lead at OpenAI, is joining DeepMind to work on video generation and world simulators. https://x.com/_tim_brooks/status/1841982327431561528
Man tricks OpenAI’s voice bot into duet of The Beatles’ “Eleanor Rigby” https://arstechnica.com/information-technology/2024/09/man-tricks-openais-voice-bot-into-duet-of-the-beatles-eleanor-rigby/
Tyler Cowen asked ChatGPT o1-preview about the fiscal theory of the price level https://marginalrevolution.com/marginalrevolution/2024/10/ask-chatgpt-o1-preview-fiscal-theory-of-the-price-level-edition.html
2-hour podcast created with the help of various AI tools https://x.com/karpathy/status/1841594123381571863
o1-engineer: A command-line tool designed to assist developers in managing and interacting with their projects efficiently. https://github.com/Doriandarko/o1-engineer
Hidden traces of humanity: what AI images reveal about our world https://www.nplusonemag.com/issue-48/essays/eat-poop-you-cat/

https://x.com/adcock_brett/status/1842062876736999565

AI-Human Collaboration:

Mathematicians used the language models Claude-3.5-Sonnet, Gemini-1.5-pro, GPT-4o, and the reasoning model o1-mini to collaborate on a paper on network information flows and lattice theory. AI assisted with the initial conjectures, some proofs, and most applications.

In summary, although many incorrect proofs were generated, Claude-3.5/GPT-4o conjectured a new theorem, while o1-mini came up with an entirely new, clever, correct proof, more elegant than a human proof.

Thread: https://x.com/robertghrist/status/1841462507543949581
Paper: https://arxiv.org/abs/2410.00315

AI Risks:

“A Narrow Path is our best attempt at charting a course through the filter of machine intelligence.” https://www.narrowpath.co/
Scott Aaronson: “I am not and have never been a Yudkowskyan … but still, given the empirical shock of the past four years, I’m now firmly, 100% in the camp that we need to approach AI with humility for the magnitude of civilizational transition that’s about to occur, and for our massive error bars about what exactly that transition will entail. We can’t just “leave it to the free market” any more than we could’ve left the development of thermonuclear weapons to the free market.” https://scottaaronson.blog/?p=8367

NeuroTechnology:

Artificial intelligence and human expertise meet to generate a map of all the connections in the fly brain. The resource is already being used by experimentalists and theoreticians to further our understanding of neural circuits in the fly and beyond. https://www.nature.com/immersive/d42859-024-00053-4/index.html
Thermodynamic Bayesian Inference https://arxiv.org/abs/2410.01793
Single cortical neurons as deep artificial neural networks https://www.cell.com/neuron/fulltext/S0896-6273(21)00501-8
Meta CTO Andrew Bosworth says smart glasses will replace TVs in years, not decades, and learning to use wrist-based neural interfaces will enable devices to be controlled with your mind https://www.youtube.com/watch?v=iYYGwINT590

Miscellaneous:

Negative Time is Real, Physicists Confirm. Kind Of. https://www.youtube.com/watch?v=ErLHm-1c6I4
“Given that many literal eugenic practices (like sperm selection or selection based on preimplantation genetic diagnosis) are lawful, the blanket ban on “eugenics” can only mean something else is meant…” https://www.craigwilly.com/p/jennifer-doudna-on-eugenics

Politics:

8 Scientists, a Billion Dollars, and the Moonshot Agency ARIA Trying to Make Britain Great Again https://www.wired.com/story/aria-moonshot-darpa-uk-britain-great-again/ [no paywall: https://archive.is/dFQkR]
“The 2008 Russia-Georgia war is one of the most misrepresented events in post-Soviet history. Today, Georgian Dream (GD) is twisting facts about the war to serve their political agenda ahead of the parliamentary elections. We need to get the facts straight.” https://x.com/terjehelland/status/1841911226726506595

Matt Yglesias on the fundamental problem with nuclear regulation: https://www.slowboring.com/p/noah-smith-is-too-down-on-nuclear

Suppose I had a design for a cost-effective nuclear reactor, and I said I should be allowed to build it, because electricity is good and air pollution is bad. The regulator is going to look at it and say, “Well, that reactor seems awfully cheap to build, why not add a bunch more features to make the radiation levels even lower?” And then I will say, “That would be hideously expensive in a way that is net bad for public health, because it leads to more burning of fossil fuels and worse air pollution.” But the regulator comes back and says, “We’re not using a cost-benefit framework, we’re using ALARA.” And I say, “That doesn’t make sense, coal ash is radioactive — you are creating more radiation by raising my costs.” And the regulator says, “I don’t regulate coal plants, I regulate you — ALARA!”

https://x.com/mmjukic/status/1841544332517769559

https://x.com/ryanburge/status/1841834053823066573

Israel:

Iran managed to hit several Israeli Air Force bases with ballistic missiles. Breathtaking and revealing videos from Amman show outgoing Arrow 3 kill vehicles followed by incoming Iranian ballistic missiles.

The first satellite image of part of the Nevatim air base shows a damaged hangar and other impacts. A later satellite image of the full base revealed 32 impact points at Nevatim, with a small degree of clustering. Iran landed multiple hits in the area of the F-35 hangars, with one possible direct hit, but without much damage.

Check also this round-up of verified Iran ballistic missile strike videos.

Meanwhile, the Israeli Air Force struck targets in Syria in the vicinity of the Russian Khmeimim airbase. Another massive Israeli strike in Beirut targeted senior Hezbollah leader Hashem Safi a-Din, who was Hassan Nasrallah's likely successor.

Ukraine:

Yet another Russian war crime https://x.com/EuromaidanPR/status/1841918165913371008
“'They don’t spare people, and their men are forced to move through those paths. And in the last place where we were working, there’s a crossroads completely littered with bodies, and they keep coming, because they have orders,' he said. 'There’s already a mass of them. Everything is black with corpses.'” https://www.washingtonpost.com/world/2024/10/02/ukraine-russia-advance-pokrovsk-vuhledar/ [no paywall: https://archive.is/b8eFf]
The first promo video of the Unmanned Systems Forces of the Armed Forces of Ukraine. https://www.youtube.com/watch?v=2Ch5pOwuDlc
A bunker is not a fun place to be when a thermite equipped drone flies through the embrasure. https://x.com/OAlexanderDK/status/1841885474123645214
Undated video circulating on Russian TG channels today, purporting to show Russia hitting the Ukrainian town of Vovchansk with an ODAB-9000 bomb. https://x.com/Mike_Eckel/status/1841488958573973620
Vovchansk, Kharkiv region. Completely wiped off the map, mainly due to Russian glide bombs. https://x.com/RALee85/status/1841928120707776870
Video of a Russian ODAB-1500 strike on Vovchansk. https://x.com/RALee85/status/1841985973418352818
A recent Russian attack involving 17 MT-LB armored vehicles and troops on the Vovchansk Aggregate Plant was successfully repelled by Ukrainian forces. https://x.com/NOELreports/status/1842199692752384161
Russia is suspected of deliberately leaking chemical waste into a river, with deadly consequences for wildlife https://www.theguardian.com/world/2024/oct/01/ukraine-seim-river-poisoning-chernihiv-ecocide-

Axis of Ordinary
