Are you familiar with reverse residual connections or looping in language models?
Excited to share my Looped-GPT blog post and codebase!
https://github.com/sanyalsunny111/Looped-GPT
TL;DR: looping during pre-training improves generalization.
Plot shows GPT-2 LMs pre-trained on 15.73B OpenWebText (OWT) tokens.
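For anyone wondering what "looping" means here: the rough idea is to reapply the same transformer block several times (sharing weights across depth) instead of stacking distinct layers. Below is a minimal PyTorch sketch of that idea; the class name `LoopedBlock` and the `n_loops` parameter are my own illustrative choices, not the actual Looped-GPT code, so see the repo for the real implementation.

```python
# Minimal sketch of "looping": reapplying one weight-shared transformer
# block several times instead of stacking distinct layers.
# LoopedBlock and n_loops are illustrative names, not the Looped-GPT API.
import torch
import torch.nn as nn

class LoopedBlock(nn.Module):
    def __init__(self, d_model=768, n_head=12, n_loops=4):
        super().__init__()
        self.n_loops = n_loops                      # how many times the same block is reused
        self.ln1 = nn.LayerNorm(d_model)
        self.attn = nn.MultiheadAttention(d_model, n_head, batch_first=True)
        self.ln2 = nn.LayerNorm(d_model)
        self.mlp = nn.Sequential(
            nn.Linear(d_model, 4 * d_model),
            nn.GELU(),
            nn.Linear(4 * d_model, d_model),
        )

    def forward(self, x):
        T = x.size(1)
        # Causal mask: True entries are blocked, so tokens cannot attend to the future.
        mask = torch.triu(torch.ones(T, T, dtype=torch.bool, device=x.device), diagonal=1)
        for _ in range(self.n_loops):               # same parameters, applied repeatedly
            h = self.ln1(x)
            a, _ = self.attn(h, h, h, attn_mask=mask, need_weights=False)
            x = x + a
            x = x + self.mlp(self.ln2(x))
        return x

x = torch.randn(2, 16, 768)                         # (batch, seq_len, hidden)
print(LoopedBlock()(x).shape)                       # torch.Size([2, 16, 768])
```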
P.S. This is my first post here; I have ~4 followers and zero expectations for reach.