Links for 2023-03-16
OpenAI co-founder on company’s past approach to openly sharing research: ‘We were wrong’ https://www.theverge.com/2023/3/15/23640180/openai-gpt-4-launch-closed-research-ilya-sutskever-interview
The New York Times: GPT-4 Is Exciting and Scary https://archive.is/UDPRi
“The GPT-4 safety card gets pretty wild in some places. Like how ARC used GPT-4 to simulate a rogue AI on the internet trying to replicate itself.” https://twitter.com/Algon_33/status/1635769668156768258
Why Not Just Outsource Alignment Research To An AI? https://www.lesswrong.com/posts/3gAccKDW6nRKFumpP/why-not-just-outsource-alignment-research-to-an-ai
“In this post, I experimentally probe the relationship between intelligence and coherence in animals, people, human organizations, and machine learning models. The results suggest that as entities become smarter, they tend to become less, rather than more, coherent. This suggests that superhuman pursuit of a misaligned goal is not a likely outcome of creating AGI.” https://www.lesswrong.com/posts/SQfcNuzPWscEj4X5E/the-hot-mess-theory-of-ai-misalignment-more-intelligent
“The ways in which these systems are way worse than human also scare me. Lack of long term memory. Lack of learning directly and efficiently from problem solving. Because they seem pretty solvable, and once you combine systems already superhuman in some ways with these features...” https://twitter.com/JeffLadish/status/1635898384707117056
High-throughput Generative Inference of Large Language Models with a Single GPU https://arxiv.org/abs/2303.06865
“We trained an optimized BERT model to match the results from the original paper in ~9 GPU hours for a cost of about $20…if you train for longer, you can get better accuracy than the original papers. You can even beat the original BERT-large with a BERT-base after a couple dozen GPU hours.” https://www.mosaicml.com/blog/mosaicbert
“The unreasonable effectiveness of few-shot learning for machine translation...with only 5 examples of high-quality translation data shown at inference, a transformer decoder-only model trained solely with self-supervised learning, is able to match specialized supervised state-of-the-art models as well as more general commercial translation systems...the resulting models are two orders of magnitude smaller than state-of-the-art language models.” https://arxiv.org/abs/2302.01398
“…we show that careful design of deep RNNs using standard signal propagation arguments can recover the impressive performance of deep SSMs on long-range reasoning tasks, while also matching their training speed.” https://arxiv.org/abs/2303.06349
Meet In the Middle (MIM): A New Pretraining Paradigm. MIM (2.7B) outperforms CodeGen 16B, InCoder 6.7B, PaLM 540B, LLaMA 65B, and FIM 2.7B on code generation tasks. https://arxiv.org/abs/2303.07295
GPT 4: Full Breakdown (14 Crazy Details You May Have Missed) - Last One is Extra Wild https://www.youtube.com/watch?v=2AdkSYWB6LY
“Here are some incredible things people are already doing with GPT-4” https://twitter.com/LinusEkenstam/status/1635754587775967233
GPT-4 takes Bryan Caplan's midterm and gets an A https://matthewbarnett.substack.com/p/gpt-4-takes-bryan-caplans-midterm
“After working for the past few months with key partners like @NotionHQ, @Quora, and @DuckDuckGo, we’ve been able to carefully test out our systems in the wild. We are now opening up access to Claude, our AI assistant, to power businesses at scale.” https://twitter.com/AnthropicAI/status/1635679544521920512
‘Let 1,000 Flowers Bloom’: A.I. Funding Frenzy Escalates [The New York Times] https://archive.is/CT2WT
Bumblebees Learn To Solve Puzzles by Watching Other Bees https://scitechdaily.com/bumblebees-learn-to-solve-puzzles-by-watching-other-bees/
First U.S. demonstration of hydrogen production from nuclear energy https://www.energy.gov/ne/articles/nine-mile-point-begins-clean-hydrogen-production
“We asked the British public which punishments should apply to a range of crimes. The results? Britain really is tough on crime. 48% say someone should go to jail for racist abuse on social media. 58% for bike theft. 67% for sexist abuse in person.” https://www.spectator.co.uk/article/voters-agree-with-lee-anderson-about-cracking-down-on-crime/
I wish all the smart people who are busy trying to trip up the latest AI model with trick questions would at least once try to do the same with ordinary people outside their academic filter bubbles.
Seriously, go outside and talk to normal people. Ask them to maximize the sum of the digits on a 24-hour clock.
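(For reference, the clock puzzle has a small brute-force answer. A quick sketch in Python, assuming the usual HH:MM display:)

```python
# Brute-force the puzzle: over all valid 24-hour clock times HH:MM,
# find the time whose four digits sum to the largest value.
best_time, best_sum = None, -1
for hour in range(24):
    for minute in range(60):
        digit_sum = sum(int(d) for d in f"{hour:02d}{minute:02d}")
        if digit_sum > best_sum:
            best_time, best_sum = f"{hour:02d}:{minute:02d}", digit_sum

print(best_time, best_sum)  # prints: 19:59 24
```

(If the clock also shows seconds, the same loop extends to HH:MM:SS and the answer changes accordingly.)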
Matthew Barnett:
2024: no AI will get a perfect score on the Putnam exam
2025: no AI will pass a long, informed, adversarial Turing test
2028: no team of humanoid robots will win the RoboCup challenge
Christian Szegedy:
2024: solve some famous mathematical conjectures. (Too soon, IMO)
2028: write 1000s of lines of coherent code for some complex real-world task specified in natural language, in a large, real-life code base, e.g. a whole app for a new fitness watch. (My estimate: 2032.)
cory.eth:
A human-sized robot cannot sew clothing or knit at 4.5-star Etsy review level by 2028
michael vassar:
Write a mystery novel which half of readers can solve and half can’t by April 2024
Level 5 self driving for April 2025
Move a humanoid robot with human-like precision and flexibility of action by April 2028.
ACT | Gail Det'wish:
End-to-end, with no human “driver”:
2024: Generate a 60-minute movie with a coherent plot (no obvious inconsistencies, follows a narrative thread to an end)
2025: Fabricate an identity that is elected to public office
2028: Correctly fetch and deliver all items on a grocery list
Jonathan Figdor:
Clean a hotel room including bathroom at a reasonable cost.
@unreprecords:
Basic repairs on an old decrepit house. E.g. adjust rusted/bent door hinges to stop the door from jamming. AI will destroy civilization before it can manage that task unassisted.
@hao_dev_777:
Low-tier gruntwork-type jobs. “Whatever it is that some guy in a yellow vest is doing when he is reaching his arms down a sewer manhole”-type jobs.
Vikram Shakti:
Successfully lead a root-cause investigation, including report authoring, of a hull crack event in a prototype UUV. I’d take this easily at 9-1 for 2028.
Scarlet Falcon:
2028: produce a feature film about the rise of a chess-boxing grandmaster who grew up as the son of a Manhattan lawyer, with novel chess games presented and no notable inconsistencies in object permanence throughout the city nor in physics (including rain, snow, and a tornado).
Alex Wilkins:
Make me laugh.
Josh Triplett:
1-year: given a git clone and a semantic description of a change without obvious precedent, generate a correct git PR 99% indistinguishable from expert human.