A prerequisite for using a pre-trained model as is, without fine-tuning. TL;DR: Self-supervised learning is being leveraged at scale using transformers, not only for text but lately also for images (CLIP, ALIGN), to solve traditionally supervised tasks (e.g. classification), either as is or with subsequent fine-tuning. …