CFE-CMStatistics 2024
A1189
Title: Can large language models solve compositional tasks: A study of out-of-distribution generalization
Authors: Yiqiao Zhong - UW Madison (United States) [presenting]
Abstract: Large language models (LLMs) such as GPT-4 sometimes appear to be creative, solving novel tasks from only a few demonstrations in the prompt. These tasks require the pre-trained models to generalize to distributions different from those of the training data, which is known as out-of-distribution (OOD) generalization. For example, in "symbolized language reasoning", where names/labels are replaced by arbitrary symbols, the model can infer the names/labels without any fine-tuning. The focus is on a pervasive structure within LLMs known as induction heads. Through experiments on a variety of LLMs, it is demonstrated empirically that compositional structure is crucial for Transformers to learn the rules underlying the training instances and to generalize to OOD data. Further, the "common bridge representation hypothesis" is proposed, in which a key intermediate subspace in the embedding space connects components of early layers with those of later layers, serving as a mechanism of composition.
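
For illustration, the following minimal Python sketch constructs a symbolized in-context prompt of the kind described above; the sentences, labels, and symbol choices are illustrative assumptions rather than material from the study.

# Minimal sketch (not from the submission) of a "symbolized language reasoning"
# prompt: the label words of an in-context classification task are replaced by
# arbitrary symbols, so the model must infer the symbol-to-label rule from the
# demonstrations alone. Sentences, labels, and symbols are illustrative.
demonstrations = [
    ("The movie was wonderful.", "positive"),
    ("I hated every minute of it.", "negative"),
    ("An absolute delight to watch.", "positive"),
    ("Terribly boring and slow.", "negative"),
]
query = "A charming, heartfelt film."

# Replace each label word with an arbitrary symbol.
symbol_map = {"positive": "@", "negative": "#"}

blocks = [f"Input: {text}\nLabel: {symbol_map[label]}" for text, label in demonstrations]
prompt = "\n\n".join(blocks) + f"\n\nInput: {query}\nLabel:"
print(prompt)

# A model that generalizes out of distribution would complete the prompt with "@",
# even though "@" carries no sentiment meaning in the pre-training data.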