Few-shot Adaptation Works with UnpredicTable Data
Abstract
Prior work on language models (LMs) shows that training on a large number of diverse tasks improves few-shot learning (FSL) performance on new tasks. We take this to the extreme, automatically extracting 413,299 tasks from internet tables, orders of magnitude more than the next-largest public datasets. Finetuning on the resulting dataset leads to improved FSL performance on Natural Language Processing (NLP) tasks, but not proportionally to dataset scale. In fact, we find that narrow subsets of our dataset sometimes outperform more diverse datasets. For example, finetuning on software documentation from support.google.com raises FSL performance by a mean of +7.5% on 52 downstream tasks, which beats training on 40 human-curated NLP datasets (+6.7%). Finetuning on various narrow datasets leads to similar broad improvements across test tasks, suggesting that the gains are not from domain adaptation but from adapting to FSL in general. We do not observe clear patterns among the datasets that lead to FSL gains, leaving open questions about why certain data helps with FSL.
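To make the table-to-task extraction concrete, below is a minimal sketch of the general idea: each table row becomes one example, one column is held out as the answer, and the remaining columns form the input; several rows are concatenated into a few-shot prompt. This is an illustration under assumed conventions, not the paper's actual pipeline; the function name `table_to_task`, the prompt format, and the toy table are all hypothetical.

```python
# Sketch: turning one web table into a few-shot task. Each row is an
# example; `target_col` is held out as the answer; other columns form
# the input. Illustrative only -- not the paper's extraction code.
import random

def table_to_task(rows, target_col, n_shots=3, seed=0):
    """Build a few-shot prompt from a table (list of {column: value} dicts)."""
    rng = random.Random(seed)
    rng.shuffle(rows)
    shots, query = rows[:n_shots], rows[n_shots]

    def fmt(row, with_answer):
        inputs = " | ".join(f"{c}: {v}" for c, v in row.items() if c != target_col)
        answer = row[target_col] if with_answer else ""
        return f"{inputs}\n{target_col}: {answer}"

    prompt = "\n\n".join(fmt(r, True) for r in shots)
    return prompt + "\n\n" + fmt(query, False), query[target_col]

# Example: a tiny software-documentation-style table (hypothetical data).
table = [
    {"Setting": "Autoplay", "Default": "On"},
    {"Setting": "Captions", "Default": "Off"},
    {"Setting": "Dark mode", "Default": "Off"},
    {"Setting": "Notifications", "Default": "On"},
]
prompt, answer = table_to_task(table, target_col="Default", n_shots=3)
print(prompt)  # few-shot prompt ending in an unanswered query row
print(answer)  # gold answer for the held-out row
```

Under this framing, finetuning on many such prompts trains the model on the FSL format itself, which is consistent with the abstract's observation that gains transfer broadly rather than reflecting domain adaptation.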