Links for 2024-04-04
AI:
Google presents Mixture-of-Depths: Dynamically allocating compute in transformer-based language models — “Not only do models trained in this way learn to dynamically allocate compute, they do so efficiently. These models match baseline performance for equivalent FLOPS and wall-clock times to train, but require a fraction of the FLOPs per forward pass, and can be upwards of 50% faster to step during post-training sampling.” https://arxiv.org/abs/2404.02258
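The core idea is a small learned router per block: every token gets a scalar score, only the top-k tokens (a fixed compute budget) pass through that block's attention and MLP, and the rest skip it via the residual stream. A minimal PyTorch sketch of that routing, not the paper's code; the sigmoid gating and the 12.5% capacity default are illustrative assumptions:

```python
import torch
import torch.nn as nn

class MoDBlock(nn.Module):
    """Wraps a transformer block so only a fixed fraction of tokens is processed by it."""
    def __init__(self, d_model: int, block: nn.Module, capacity: float = 0.125):
        super().__init__()
        self.block = block                    # any [B, T, D] -> [B, T, D] residual update
        self.router = nn.Linear(d_model, 1)   # scalar routing score per token
        self.capacity = capacity              # fraction of tokens that get full compute

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        B, T, D = x.shape
        k = max(1, int(T * self.capacity))
        scores = self.router(x).squeeze(-1)            # [B, T]
        top = scores.topk(k, dim=-1).indices           # [B, k] tokens routed into the block
        idx = top.unsqueeze(-1).expand(-1, -1, D)      # [B, k, D] gather/scatter index
        routed = torch.gather(x, 1, idx)               # attention/MLP see only k tokens,
        update = self.block(routed)                    # so per-step FLOPs shrink with k
        gate = torch.sigmoid(torch.gather(scores, 1, top)).unsqueeze(-1)
        out = x.clone()                                # unrouted tokens pass through unchanged
        out.scatter_(1, idx, routed + gate * update)   # routed tokens: residual + gated update
        return out
```

The paper also has to make the top-k decision causal for autoregressive sampling (e.g., with a small auxiliary predictor), which this sketch ignores.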
“This paper demonstrates how LLMs can be used to create an extremely effective system to resolve references of various types, by showing how reference resolution can be converted into a language modeling problem, despite involving forms of entities like those on screen that are not traditionally conducive to being reduced to a text-only modality…our smallest model achieving performance comparable to that of GPT-4, and our larger models substantially outperforming it.” https://arxiv.org/abs/2403.20329
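A hypothetical illustration of the reformulation (the serialization format here is made up, not the paper's): on-screen entities are flattened into indexed text lines, and the model answers, as ordinary next-token prediction, which indices the user's utterance refers to.

```python
# Hypothetical prompt construction; entity fields and wording are illustrative only.
def build_prompt(screen_entities, utterance):
    lines = [f"{i}. {e['type']}: {e['text']}" for i, e in enumerate(screen_entities, 1)]
    return (
        "Entities on screen:\n" + "\n".join(lines) + "\n"
        f"User: {utterance}\n"
        "Which entity numbers does the user refer to?"
    )

prompt = build_prompt(
    [{"type": "business", "text": "Joe's Pizza, (555) 0123"},
     {"type": "business", "text": "Luigi's Pasta, (555) 0456"}],
    "call the second one",
)
# A model fine-tuned on such pairs completes with the referenced index, here "2".
```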
Gecko: Versatile Text Embeddings Distilled from Large Language Models — “Gecko with 768 embedding dimensions achieves an average score of 66.31, competing with 7x larger models and 5x higher dimensional embeddings.” https://arxiv.org/abs/2403.20327
“Firms such as OpenAI and Anthropic are working to find enough information to train next-generation artificial-intelligence models” https://www.wsj.com/tech/ai/ai-training-data-synthetic-openai-anthropic-9230f8d8 [no paywall: https://archive.is/R06ay]
Comment by Vladimir Nesov on there being no data bottleneck: https://www.lesswrong.com/posts/xahmJmH6BtqzPP3jD/self-play-by-analogy?commentId=LtRhbWyPiWqv7PWjs
There is no data bottleneck (for data that's not necessarily high quality), because data can be repeated in training: about 4 times without much difference compared to unique data, and up to about 16 times while still significantly improving the model. This was notably used in Galactica (see Figure 6), published Nov 2022; then there was a systematic study of scaling laws for repeated data in May 2023; recently, repeated data was applied in StarCoder 2 (Feb 2024).
A Chinchilla optimal model uses a model size proportional to dataset size, meaning compute is proportional to data squared. If you repeat data 16 times, this means finding a use for 256 times more compute. A filtered and deduplicated CommonCrawl text dataset RedPajama-Data-v2 has 30 trillion tokens. If repeated 16 times with a Chinchilla optimal monolithic Transformer, it would use about 7e28 FLOPs of compute. This scales with data squared, if there is more data to be found, which there certainly is, even if not OOMs more. Assuming BF16 training with 30% utilization, this would require 3.2e10 H100-hours, which assuming $2/hour takes about $65 billion. Anchoring to the rumored 2e25 FLOPs GPT-4 run at $100 million instead, this gives $350 billion. Both numbers are likely currently outside commercial feasibility, if smaller models fail to demonstrate sufficiently impressive feats. And there's still that further quadratic scaling of needed compute with more data than 30 trillion tokens. (Though Microscaling in Blackwell might reduce the cost of effective compute more than otherwise could be expected this soon.)
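For concreteness, the figures in that paragraph can be reproduced in a few lines. This is a rough sketch under the standard Chinchilla assumptions (about 20 tokens per parameter, training compute of roughly 6 * params * tokens) and an assumed ~2e15 FLOP/s per H100 at 30% utilization, which appears to be the throughput the estimate uses:

```python
# Back-of-the-envelope reproduction of the compute estimate in the quoted comment.
# Assumptions: Chinchilla-optimal ~ 20 tokens per parameter, compute ~ 6 * N * D,
# and an H100 treated as ~2e15 FLOP/s (the sparse BF16 peak; using the dense peak
# would roughly double the GPU-hours).

tokens = 30e12 * 16                  # RedPajama-Data-v2, repeated 16 times
params = tokens / 20                 # Chinchilla-optimal model size
flops = 6 * params * tokens          # ~6.9e28 FLOPs, i.e. "about 7e28"

h100_flops_per_s = 2e15 * 0.30       # effective throughput per GPU at 30% utilization
gpu_hours = flops / h100_flops_per_s / 3600
print(f"{flops:.1e} FLOPs, {gpu_hours:.1e} H100-hours, ${gpu_hours * 2 / 1e9:.0f}B at $2/hr")
# -> ~6.9e28 FLOPs, ~3.2e10 H100-hours, ~$64B

# Anchoring to a rumored 2e25-FLOP GPT-4 run at $100 million instead:
print(f"${flops / 2e25 * 100e6 / 1e9:.0f}B")   # -> ~$346B, i.e. "about $350 billion"
```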
One more AI link: AI-generated sad girl with piano performs the text of the MIT License https://x.com/goodside/status/1775713487529922702 (Created with https://suno.ai/)
If Suno v3 is this good, imagine v10. We will finally learn what music composed by superhuman beings sounds like. We will hear angels singing in our lifetime.
Miscellaneous:
The first patient to receive a kidney transplanted from a genetically modified pig has fared so well that he was discharged from the hospital on Wednesday, just two weeks after the groundbreaking surgery. https://www.nytimes.com/2024/04/03/health/pig-kidney-transplant-slayman.html [no paywall: https://archive.is/2psXh]
And yet quantum computing continues to progress: “In December we saw QuEra announce a small net gain from error-detection in neutral atoms, and accuracy that increased with the use of larger error-correcting codes. Today, a collaboration between Microsoft and Quantinuum has announced what might be the first demonstration of error-corrected two-qubit entangling gates with substantially lower error than the same gates applied to the bare physical qubits.” https://scottaaronson.blog/?p=7916
Radios, how do they work? A brief introduction to antennas, superheterodyne receivers, and signal modulation schemes. https://lcamtuf.substack.com/p/radios-how-do-they-work
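Two of the ideas in the article translate directly into a few lines of numpy: amplitude modulation, and the superheterodyne trick of mixing the received signal with a local oscillator so everything downstream can work at one fixed intermediate frequency. A toy sketch with made-up frequencies, not anything taken from the article:

```python
import numpy as np

fs = 1_000_000                                 # sample rate, Hz
t = np.arange(0, 0.01, 1 / fs)
audio = np.sin(2 * np.pi * 1_000 * t)          # 1 kHz "voice" tone
carrier_f, lo_f = 200_000, 155_000             # RF carrier and local oscillator

am = (1 + 0.5 * audio) * np.cos(2 * np.pi * carrier_f * t)   # AM: audio rides the envelope

# Mixing multiplies signals, producing sum and difference frequencies; the difference
# (200 kHz - 155 kHz = 45 kHz) is the intermediate frequency the rest of the receiver uses.
mixed = am * np.cos(2 * np.pi * lo_f * t)

# Crude envelope detection: rectify, then low-pass with a moving average.
envelope = np.convolve(np.abs(mixed), np.ones(200) / 200, mode="same")
recovered = envelope - envelope.mean()          # roughly proportional to the 1 kHz tone
```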
Cryptography:
The startup Zama has released open-source libraries for fully homomorphic encryption. Homomorphic encryption is a family of encryption schemes and protocols that allow computing on encrypted data without first decrypting it. https://github.com/zama-ai
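The property itself is easy to demonstrate with a toy additively homomorphic scheme. A sketch using textbook Paillier (not Zama's libraries, and with demo-sized primes far too small for real use): multiplying two ciphertexts yields a ciphertext of the sum, so a server can add values it cannot read.

```python
import math
import random

p, q = 293, 433                     # toy primes; real keys use ~1024-bit primes
n = p * q
n2 = n * n
g = n + 1
lam = math.lcm(p - 1, q - 1)
# mu = L(g^lam mod n^2)^-1 mod n, with L(u) = (u - 1) // n
mu = pow((pow(g, lam, n2) - 1) // n, -1, n)

def encrypt(m: int) -> int:
    r = random.randrange(1, n)
    while math.gcd(r, n) != 1:      # random r coprime with n
        r = random.randrange(1, n)
    return (pow(g, m, n2) * pow(r, n, n2)) % n2

def decrypt(c: int) -> int:
    return ((pow(c, lam, n2) - 1) // n * mu) % n

a, b = 17, 25
c = (encrypt(a) * encrypt(b)) % n2   # computed without ever seeing a or b
assert decrypt(c) == a + b           # homomorphic addition: decrypts to 42
```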
Summary of the state of post-quantum cryptography. https://bughunters.google.com/blog/5108747984306176/google-s-threat-model-for-post-quantum-cryptography
“Microsoft developed 46 hypotheses to investigate, including some scenarios [of an] adversary possessing a theoretical quantum computing capability to break public-key cryptography” [PDF] https://www.cisa.gov/sites/default/files/2024-04/CSRB_Review_of_the_Summer_2023_MEO_Intrusion_Final_508c.pdf
Politics:
Despite the pandemic's initial hit, the United States is experiencing a roaring 2020s, with record highs in net worth, the stock market, and housing prices. https://awealthofcommonsense.com/2024/03/are-we-living-in-the-roaring-20s/
Ukraine:
“Ukraine is at great risk of its front lines collapsing…There’s nothing that can help Ukraine now because there are no serious technologies able to compensate Ukraine for the large mass of troops Russia is likely to hurl at us. We don’t have those technologies, and the West doesn’t have them as well in sufficient numbers…” https://www.politico.eu/article/ukraine-great-risk-front-line-collapse-war-russia/
Russia preparing to mobilize additional 300,000 troops by June, Kyiv says https://kyivindependent.com/ukraine-war-latest-russia-preparing-to-mobilize-additional-300-000-troops-by-june-kyiv-says/
“Russia has updated its Iranian munitions so that they now fly faster (up to 300km/h), higher, and with new wing coating that makes shooting them down much more difficult.” https://x.com/olliecarroll/status/1775762666448990434
Another day of what must be unsustainable losses for Russia https://x.com/AndrewPerpetua/status/1775810525546303654
“Video of UGVs from Ukraine’s 63rd Mechanized Brigade dropping mines/munitions near Russian positions before detonating them remotely.” https://x.com/RALee85/status/1775759317011963969
“The Russian T-72B3 tank, model 2022 or even 2023, came under attack from drones. The electronic warfare system, apparently a Harpy, did not help him at all. Most likely jamming the wrong frequencies” https://x.com/ian_matveev/status/1775825389555753051