Links for 2024-07-07
AI:
AI Mathematical Olympiad: It appears that the winning program correctly answered 29/50 of the private test questions. — “Maybe what's even more impressive about this competition, beside the level of math these models are already capable of is how ressource contraint the participants were actually, having to run inference in a short amont of time on T4 which only let us imagine how powerful these models will become in the coming months.” https://x.com/Thom_Wolf/status/1809895886899585164
Learning Formal Mathematics From Intrinsic Motivation https://arxiv.org/abs/2407.00695
“This means the relationship between changes in underlying model capabilities and changes in real world impact can be unintuitive. If stepwise accuracy goes from 99% to 99.99%, a 200 step task goes from failing most of the time to succeeding almost always” https://x.com/RatOrthodox/status/1809055334536786130 (Paper: Rethinking AI agent benchmarking and evaluation https://www.aisnakeoil.com/p/new-paper-ai-agents-that-matter)
OmniJARVIS: Unified Vision-Language-Action Tokenization Enables Open-World Instruction Following Agents https://omnijarvis.github.io/
Introducing ReSearch: An iterative self-reflection algorithm that enhances LLM's self-restraint abilities. Encouraging abstention when uncertain. Producing accurate, informative content when confident. https://arxiv.org/abs/2405.13022
Natural Language Embedded Programs for Hybrid Language Symbolic Reasoning https://arxiv.org/abs/2309.10814
Diffusion Forcing combines the strength of full-sequence diffusion models and next-token models, acting as either or a mix at sampling time for different applications without retraining. https://boyuan.space/diffusion-forcing/
“This is an exciting time, when machine learning is enabling new, large-scale tests of longstanding questions in evolutionary science.” https://www.essex.ac.uk/news/2024/07/01/ai-powered-study-explores-under-researched-female-evolution
Improving retrieval with LLM-as-a-judge https://blog.vespa.ai/improving-retrieval-with-llm-as-a-judge/
“This is an interim report on reverse-engineering Othello-GPT, an 8-layer transformer trained to take sequences of Othello moves and predict legal moves. We find evidence that Othello-GPT learns to compute the board state using many independent decision rules that are localized to small parts of the board.” https://www.lesswrong.com/posts/gcpNuEZnxAPayaKBY/othellogpt-learned-a-bag-of-heuristics-1
Gradually, then Suddenly: What often matters is when technologies pass certain thresholds of capability. https://www.oneusefulthing.org/p/gradually-then-suddenly-upon-the
Musings on LLM Scale https://www.lesswrong.com/posts/T3tDQfkAjFsScHL3C/musings-on-llm-scale-jul-2024
“…when attempting to solve coding problems that existed on LeetCode before 2021…ChatGPT’s ability to produce functional code for “easy” coding problems dropped from 89 percent to 52 percent…And its ability to generate functional code for “hard” problems dropped from 40 percent to 0.66” https://spectrum.ieee.org/chatgpt-for-coding
llama.ttf is a font that embeds a (small) large language model. The font itself can do automatic text generation. https://fuglede.github.io/llama.ttf/
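A quick check of the stepwise-accuracy quote above (99% vs. 99.99% over a 200-step task). This is a minimal sketch, assuming every step succeeds independently with the same probability and the task fails if any single step fails; the function name is made up for illustration and the quote itself does not spell out these assumptions.

```python
def task_success_rate(step_accuracy: float, steps: int) -> float:
    """Chance that all `steps` succeed, assuming independent steps
    with identical per-step accuracy (an assumption, not from the quote)."""
    return step_accuracy ** steps


if __name__ == "__main__":
    for accuracy in (0.99, 0.9999):
        rate = task_success_rate(accuracy, 200)
        print(f"per-step {accuracy:.2%} -> 200-step task succeeds {rate:.1%} of the time")
```

Under those assumptions the 200-step task succeeds about 13% of the time at 99% per-step accuracy and about 98% of the time at 99.99%, which matches "failing most of the time" versus "succeeding almost always".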
Engineering:
New Multi-Material “Laser” 3D Printer Can Create Complex Devices With Just a Single Machine https://engineering.missouri.edu/2024/no-assembly-required/
Desalinating Water Is Becoming “Absurdly Cheap” https://humanprogress.org/desalinating-water-is-becoming-absurdly-cheap/
Open-TeleVision: Teleoperation with Immersive Active Visual Feedback https://robot-tv.github.io/
“Britain should reclaim an area the size of Wales from Dogger Bank, the area of the North Sea where the sea is only 15-40m deep. We could do it for less than £100bn.” https://model-thinking.com/p/a-new-atlantis
“Queen Mary's Dolls' House is a doll's house built in the early 1920s, completed in 1924, for Queen Mary, the wife of King George V. It was designed by architect Sir Edwin Lutyens, with contributions from many notable artists and craftsmen of the period, including a library of miniature books containing original stories written by authors including Sir Arthur Conan Doyle and A. A. Milne.” https://en.m.wikipedia.org/wiki/Queen_Mary%27s_Dolls%27_House
Cosmology:
Evidence grows for deconfined quark matter in neutron-star cores https://physicsworld.com/a/evidence-grows-for-deconfined-quark-matter-in-neutron-star-cores/
The forgotten priest who predicted black holes – in 1783 https://www.bbc.com/future/article/20240626-the-priest-who-predicted-black-holes-in-1783
Intelligence:
“Our results imply that being genetically predisposed to be smarter causes left-wing beliefs.” https://www.sciencedirect.com/science/article/abs/pii/S0160289624000254
“Richard Lynn published a posthumous papers on race differences in schizophrenia. The worldwide pattern in races living in Western countries appear to follow their relative intelligence levels. Highest in Blacks, elevated in various other groups, Amerindians/Hispanics, Aboriginies, Maori, MENAP and so on. East Asians seem to be slightly lower, but little data.” https://x.com/KirkegaardEmil/status/1809021899130687541
Science:
“Aristotle was the first to notice honeybees dancing. In 1927 Karl von Frisch decoded the waggle. How it works was "explained" by MV Srinivasan AM FRS in the 1990s. Except Laura Luebbert found his papers are junk.” https://arxiv.org/abs/2405.12998
“…we show that inattentionally blind participants can successfully report the location, color and shape of the stimuli they deny noticing.” https://www.biorxiv.org/content/10.1101/2024.05.18.593967v1
Cave art study shows that visual storytelling is at least 51,000 years old https://www.nature.com/articles/s41586-024-07541-7
“The Ayta Magbukon in particular were found to possess the highest level of Denisovan ancestry in the world (between 3-9%), which is about ~30%–40% higher than the amount observed among Australo-Papuans, suggesting that distinct Islander Denisovan populations existed in the Philippines, which admixed with modern humans after their arrival.” https://en.wikipedia.org/wiki/Aeta_people
Miscellaneous:
BB(5) is now known to equal 47176870, thanks to a collaboratively-made Coq proof that decides the halting problem for all 5-state Turing machines by case analysis of ~180 million equivalence classes, which `coqc` can check in ~10 hours of wall-clock time. https://www.quantamagazine.org/amateur-mathematicians-find-fifth-busy-beaver-turing-machine-20240702/ (See also: “Basically, if and when artificial superintelligences take over the world, they can worry about the value of BB(6). And then God can worry about the value of BB(7).” https://scottaaronson.blog/?p=8088)
Thieves are stealing steel from sunken WWII warships all over the Pacific, which are considered tombs. They do it for the quality of the steel (pre-nuclear testing…valuable for medical applications) https://www.military.com/history/thieves-are-stripping-sunken-world-war-ii-shipwrecks-of-their-valuable-steel.html
Ukraine:
"There's no one left to finish off there." An episode of an enemy assault on Terny was repelled using strike drones and artillery before the enemy could approach infantry positions. https://x.com/NOELreports/status/1808564182058484060
Donetsk Oblast: a Russian mechanized assault force burns after a failed assault on the town of Hlyboke. https://x.com/Osinttechnical/status/1808918871367430283
Russian logistics at Hlyboke wiped out https://x.com/moklasen/status/1809636908470808745
A large ammunition depot was hit by Ukrainian drones in the Voronezh region of Russia last night and is still burning and exploding. https://x.com/bayraktar_1love/status/1809829329955619128
‘Those attacking are normally quickly spotted by drones above and the Russians leave their dead and wounded on the battlefield, Lt Col Bayev says. “Their main task is simply meat assaults and our total exhaustion.”’ https://www.bbc.com/news/articles/c80xjne8ryxo
Russia’s elite marines sent into death trap in Ukraine https://www.youtube.com/watch?v=RO_dPIYzbkE
How many Russian soldiers have been killed in Ukraine? https://www.economist.com/graphic-detail/2024/07/05/how-many-russian-soldiers-have-been-killed-in-ukraine [no paywall: https://archive.is/K8se7]
Russian war correspondent Romanov was on site when a Ukrainian UAV hit a Russian oil refinery. https://x.com/Tendar/status/1809521523939532803
“As of the end of June, 🇷🇺 occupies a total of 17.57% (+0.01%) of Ukraine, including Crimea and the areas of Donetsk and Luhansk occupied before 2022. This represents a net gain by 🇷🇺 of approximately 50km². This is the 8th consecutive month of net 🇷🇺 advances.” https://x.com/War_Mapper/status/1808591988884983868
Russia's Looming Serious Tank Shortage - Tank Count Using Latest Bought Satellite Imagery https://www.youtube.com/watch?v=xWCEZUQtUwE
TRUE Reasons for Russia's INCOMPETENCE https://www.youtube.com/watch?v=wRVGNcLVn6g

AI 13 is weird: "To explore these limitations in more detail, his team sought to test GPT-3.5’s ability to address 728 coding problems from the LeetCode testing platform in five programming languages: C, C++, Java, JavaScript, and Python."
Why did they test GPT-3.5 rather than -4 or -4o? I understand academic publishing can take a long time, but that's ridiculous. It's like saying AlexNet isn't super at recognizing and comprehending 2022 pandemic memes.