Links for 2024-08-02

Alexander Kruel

Aug 02, 2024

Google released an experimental updated version of Gemini 1.5 Pro that is #1 on the Chatbot Arena: “This model is a significant improvement over earlier versions of Gemini 1.5 Pro (it cracks into 1300+ elo score territory).” Try it here: https://aistudio.google.com/app/
Method prevents an AI model from being overconfident about wrong answers https://news.mit.edu/2024/thermometer-prevents-ai-model-overconfidence-about-wrong-answers-0731
“Sparse Autoencoders act like a microscope for AI internals. They're a powerful tool for interpretability, but training costs limit Research. Announcing Gemma Scope: An open suite of SAEs on every layer & sublayer of Gemma 2 2B & 9B!” https://deepmind.google/discover/blog/gemma-scope-helping-the-safety-community-shed-light-on-the-inner-workings-of-language-models/
Diffusion Augmented Agents: A Framework for Efficient Exploration and Transfer Learning https://arxiv.org/abs/2407.20798
Open-World Exploration in Minecraft — Odyssey is a new framework that equips large language model-based agents with advanced skills for exploring Minecraft. https://github.com/zju-vipa/Odyssey?tab=readme-ov-file
Meta-Rewarding Language Models: Self-Improving Alignment with LLM-as-a-Meta-Judge https://arxiv.org/abs/2407.19594
Large Language Monkeys: Scaling Inference Compute with Repeated Sampling — “We scale inference compute through repeated sampling: we let models make hundreds or thousands of attempts when solving a problem, rather than just one. Notably, with DeepSeek-Coder-V2-Instruct and 250 attempts, we solve 56% of issues from SWE-bench Lite, outperforming the single-attempt SOTA of 43%.” https://scalyresearch.stanford.edu/pubs/large_language_monkeys/
“Which is better, running a 70B model once, or a 7B model 10 times? Our findings reveal that the repeated use of smaller models can yield consistent improvements, with gains of up to 15% across five tasks.” https://arxiv.org/abs/2404.00725
“Not Diamond sets new SOTA standards on major benchmarks like GPQA, Arena Hard, MMLU, and HumanEval. We achieve this by ensembling every other model into a meta-model that learns when to call each LLM. Routing opens a new frontier for the performance and generalizability of LLMs.” https://www.notdiamond.ai/
Claude Engineer — An advanced CLI that uses Anthropic's Claude 3 and 3.5 models to assist with software development tasks. https://github.com/Doriandarko/claude-engineer
LangGraph Studio: The first agent IDE https://www.youtube.com/watch?v=pLPJoFvq4_M
“We discover a systematic way to scale up robot data, tackling the most painful pain point in robotics. The idea is simple: human collects demonstration on a real robot, and we multiply that data 1000x or more in simulation.” https://x.com/DrJimFan/status/1818302152982343983
“By making programs differentiable, we inherently introduce probability distributions over their execution, providing a means to quantify the uncertainty associated with program outputs.” https://arxiv.org/abs/2403.14606
Alphabet invests another $5 billion in self-driving startup Waymo https://arstechnica.com/cars/2024/07/waymo-will-get-another-5-billion-investment-from-alphabet/
How AI is changing warfare: “Tamir Hayman, a general who led Israeli military intelligence until 2021, points to two big breakthroughs. The “fundamental leap”, he says, eight or nine years ago, was in speech-to-text software that enabled voice intercepts to be searched for keywords. The other was in computer vision. Project Spotter, in Britain’s defence ministry, is already using neural networks for the “automated detection and identification of objects” in satellite images, allowing places to be “automatically monitored 24/7 for changes in activity”.” https://www.economist.com/briefing/2024/06/20/how-ai-is-changing-warfare [no paywall: https://archive.is/yw8Yz]
New ‘game-changing’ discovery for light-driven artificial intelligence https://www.ox.ac.uk/news/2024-08-01-new-game-changing-discovery-light-driven-artificial-intelligence
Sam Altman: "I have no idea how we may one day generate revenue. We have made a soft promise to investors that once we've built this sort of generally intelligent system, basically, we will ask it to figure out a way to generate an investment return for you. It sounds like an episode of Silicon Valley. It really does. I get it, you can laugh. It's all right. But it is what I actually believe is going to happen." https://www.youtube.com/watch?v=gjQUCpeJG1Y

https://x.com/adcock_brett/status/1819191267785581049

https://x.com/emollick/status/1818464075484991587

https://x.com/XiXiDu/status/1819363564517290020

Health:

Bacteria 'melts' head and neck cancer in revolutionary discovery https://www.kcl.ac.uk/news/bacteria-melts-head-and-neck-cancer-in-revolutionary-discovery
One dose of a new nasal spray treatment clears toxic tau proteins from brain cells, improving memory. https://www.utmb.edu/news/article/utmb-news/2024/07/03/new-breakthrough-in-alzheimer-s-research--utmb-researchers-develop-nasal-spray-treatment-for-alzheimer-s-disease
Weight-loss drugs like Ozempic, Mounjaro, and Wegovy are causing people to spend less on groceries and choose healthier options. A new study shows that users buy 52% fewer snacks and confectionery items, 47% fewer baked goods, and 28% fewer sugary drinks. https://nypost.com/2024/07/27/lifestyle/weight-loss-drugs-eat-into-grocery-basket/

Physics:

Is nature really as strange as quantum theory says - or are there simpler explanations? Neutron measurements at TU Wien prove: It doesn't work without the strange properties of quantum theory. https://www.tuwien.at/en/phy/ati/news/neutronen-auf-klassisch-unerklaerlichen-bahnen-1
New work suggests that when black holes die, they turn into white holes, that myriads of tiny white holes could be passing through the Earth at any time, and that these objects are an ideal candidate for the dark matter that cosmologists believe fills the universe but have never directly observed. https://arxiv.org/abs/2407.09584

Miscellaneous:

Space is a latent sequence: A theory of the hippocampus https://www.science.org/doi/10.1126/sciadv.adm8470
Probability is just...really weird https://www.youtube.com/watch?v=zczGnnM05TQ
How computers work: "These videos explain how computers work from scratch. Starting from the basics we build every component step by step. With the help of animations we build the Scott's CPU. Scott's CPU is a 8 bit CPU perfect for educational purpose and for understanding the inner working of a computer. Let me lead you in this journey." https://www.youtube.com/playlist?list=PLnAxReCloSeTJc8ZGogzjtCtXl_eE6yzA
List of biotech founders and drug hunters who were unlikely to succeed (and yet they did) https://www.ladanuzhna.xyz/writing/list-of-biotech-founders
Romae Industriae: What were the binding constraints on a Roman Industrial Revolution? https://www.maximum-progress.com/p/romae-industriae

Politics:

Iran's Supreme Leader Ayatollah Khamenei, at an emergency meeting of Iran's Supreme National Security Council, ordered a direct attack on Israel in response to the killing of Hamas political leader Haniyeh. https://www.nytimes.com/2024/07/31/world/middleeast/iran-orders-attack-israel.html [no paywall: https://archive.is/TwlXu]
The Senate's version of the 2025 NDAA doesn't include the 'Countering CCP Drones Act,' which would have banned DJI drone sales in the U.S. This decision came after opposition from over 6,000 public safety agencies and hundreds of thousands of drone pilots. https://www.tomshardware.com/tech-industry/dji-drone-ban-dropped-by-the-us-senate-the-senate-draft-of-2025-ndaa-does-not-include-the-countering-ccp-drones-act-that-would-kill-dji-business-in-america

Ukraine:

“Moscow has taken 1,246 square kilometers (481 square miles) of Ukrainian territory since the start of the year, well above the 584 square kilometers seized over the entirety of 2023.” https://www.barrons.com/news/russia-seizes-almost-200-km2-of-ukraine-in-july-afp-count-a221e70b
“Russian forces continue their advance towards Pokrovsk. The map shows Russian forces are less than 17km from the city. Deep State says the situation is critical and continues to deteriorate. They says Russia has significantly increased its use of UMPK glide bombs in the direction over the past few days, and is using motorcycles to scout Ukrainian lines.” https://x.com/RALee85/status/1819123258446549204
“The elephant in the room: Ukraine's delayed and insufficient mobilization is the primary cause of Ukraine's current challenges at the front.” https://x.com/joni_askola/status/1819298725996429758
“Georgian commander Irakli Kurtsikidze, despite having a severe contusion provided medical aid to his fighters before the arrival of the evacuation team. He one by one dragged his comrades to the planned evacuation point and comforted them in 3 different languages.” https://x.com/Bodbe6/status/1818684355213987941

1 Comment

M Flood

A Flood of Ideas

Aug 3

I really, really hoped that Health 2 would be the Alzheimer's disease breakthrough we have been waiting for. And I still do, but I lost a lot of hope when I read the words "mouse model." There are good models of human diseases created in transgenic mice - understanding of disease from these models has led to life-saving, suffering-reducing breakthroughs. But mouse models of Alzheimer's disease show every sign of being utterly worthless.

Derek Lowe - Just How Worthless Are The Standard Alzheimer's Models? (https://www.science.org/content/blog-post/just-how-worthless-are-standard-alzheimer-s-models)

Lowe's post is 8 years old, but unless something has greatly changed in the transgenic lab mice since then, I'd save your pharma investment dollars until at least Phase III drug trials.

And spend some quality time with your Alzheimer's-afflicted relations.

Axis of Ordinary