Links for 2023-10-15
“Our new Sparse Universal Transformer is both parameter-efficient and computation-efficient compared to the Transformer, and it's better at compositional generalization!” https://arxiv.org/abs/2310.07096