Links for 2024-06-28
AI:
Finding GPT-4’s mistakes with GPT-4 — CriticGPT, a model based on GPT-4, writes critiques of ChatGPT responses to help human trainers spot mistakes during RLHF https://openai.com/index/finding-gpt4s-mistakes-with-gpt-4/
Meta Large Language Model Compiler: Foundation Models of Compiler Optimization — LLM Compiler achieves state-of-the-art results on code size optimization and disassembly. This work shows that AI is learning to optimize code and can assist compiler experts in identifying opportunities to apply optimizations. https://ai.meta.com/research/publications/meta-large-language-model-compiler-foundation-models-of-compiler-optimization/
“LangGraph helps you build reliable agents that actually work. Today, we've launched LangGraph Cloud, our new infrastructure to run fault-tolerant LangGraph agents at scale.” https://www.youtube.com/watch?v=l4sMKF1dTDM
LLM-based validators that *automatically improve* in response to human feedback. https://www.youtube.com/watch?v=3gCTa0Li4ew
ICAL: Continual Learning of Multimodal Agents by Transforming Trajectories into Actionable Insights https://ical-learning.github.io/
Dreamitate: Real-World Visuomotor Policy Learning via Video Generation https://dreamitate.cs.columbia.edu/
Can LLMs truly reason over loooong context? NoCha asks LLMs to verify claims about *NEW* fictional books. LLMs that solve needle-in-the-haystack (~100%) struggle on NoCha! None of the 11 tested LLMs reach human performance (97%). The best, GPT-4o, gets only 55.8%. https://novelchallenge.github.io/
Flow of Reasoning: Efficient Training of LLM Policy with Divergent Thinking https://yu-fangxu.github.io/FoR.github.io/
600% Boost: Scientists Develop Game-Changing AI Chip With Impressive Energy Efficiency https://today.oregonstate.edu/news/new-computer-chips-show-promise-reducing-energy-footprint-artificial-intelligence
Study Finds Self-Driving Cars Are Actually Safer Than Humans in Many (But Not All) Situations https://singularityhub.com/2024/06/24/study-finds-self-driving-cars-are-actually-safer-than-humans-in-most-situations/
The A.I. Boom Has an Unlikely Early Winner: Wonky Consultants https://www.nytimes.com/2024/06/26/technology/ai-consultants.html [no paywall: https://archive.is/BAjal]
“We found that 94% of our AI submissions were undetected. The grades awarded to our AI submissions were on average half a grade boundary higher than that achieved by real students.” https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0305354
Interview with Carl Shulman on the economy and national security after AGI: • why economists get AGI mostly wrong • how output might double in 3 months • how incomes could grow 100x or more • the major risks created by military pressure to move fast https://80000hours.org/podcast/episodes/carl-shulman-economy-agi/
Amazon is reportedly working on its own AI chatbot that might be smarter than ChatGPT https://www.techradar.com/computing/artificial-intelligence/amazon-is-reportedly-working-on-its-own-ai-chatbot-that-might-be-smarter-than-chatgpt
Protegenneurotechnoengineering:
Protein design for growing semiconductors https://www.biorxiv.org/content/10.1101/2024.06.24.600095v1
Programmable RNA-guided enzymes for next-generation genome editing. https://arcinstitute.org/news/blog/bridge (Summary by Claude: Genome Design: The Bridge to Our Biological Future https://x.com/patrickc/status/1805996143228375263)
“esmGFP isn't groundbreaking on its own, but that's not really the point. It's more of a proof of concept for the model behind it. What's actually impressive is the all-in-one approach that ESM3 took — combining structure, sequence, and function in a model with 98 billion parameters. And Evolutionary Scale did this with only about 15 employees, which is insane! The clinical and research impact remains to be seen, but there's definite potential here for massively speeding up existing workflows.” https://www.abhishaike.com/p/a-primer-on-gfp-and-esmgfp
Rat Neurons Repair Mouse Brains That Lack a Sense of Smell https://sitn.hms.harvard.edu/flash/2024/researchers-create-interspecies-brain-chimeras-from-mice-and-rats/
Detecting Genetically Engineered Viruses With Metagenomic Sequencing https://www.lesswrong.com/posts/iaPhjYhhp7PP6BWp9/detecting-genetically-engineered-viruses-with-metagenomic
Scientists use computational modeling to guide a difficult chemical synthesis https://news.mit.edu/2024/scientists-use-computational-modeling-for-difficult-chemical-synthesis-0627
Incredible New Technique Measures Forces As Small as a Virus With Unprecedented Precision https://phys.org/news/2024-06-advances-nanoscale-doors-unprecedented-biological.html
Miscellaneous:
Ancient Sanskrit carved in stone in Egypt! Over 2000 years old! https://www.smithsonianmag.com/history/hidden-ancient-egyptian-port-reveals-180984485/
JWST’s ‘Little Red Dots’ Offer Astronomers the Universe’s Weirdest Puzzle https://www.scientificamerican.com/article/jwsts-little-red-dots-offer-astronomers-the-universes-weirdest-puzzle/ [read without registration: https://archive.is/GHAi0]
Spaced repetition for teaching two-year-olds how to read https://chrislakin.blog/p/spaced-repetition-for-teaching-two
"Among 14,545 households in rural Burkina Faso … women who were given free access to medical contraception for three years did not have lower birth rates; we can reject even modest effects." [PDF] https://www.nber.org/system/files/working_papers/w32427/w32427.pdf
It’s over for Biden:
Ukraine:
“Russian military blogger Kirill Fedorov published an interview with a Russian military man, who told how Ukrainians in the occupied territories are kidnapped and tortured for using Ukrainian language” https://x.com/den_kazansky/status/1805541835294597520
“Russian senator, former ambassador to NATO and former head of the Russian space agency Rogozin boasts of burning Ukrainian books and openly calls for genocide.” https://x.com/yarotrof/status/1806378392192770221


I would like to see the evaluation from the NoCha paper (the seventh AI link) rerun on the NoCha dataset using advanced prompting (CoT) and multi-agent techniques (actor-critic, for example).