A0189
Title: PETapter: A masked-language-modeling classification head for modular fine-tuning of (large) language models
Authors: Jonas Rieger - TU Dortmund University (Germany) [presenting]
Abstract: A significant portion of applications for large language models involves classification tasks for documents, sequences, sentences, or even single entities. For such tasks, pretrained encoder-only models like RoBERTa and DeBERTa serve as powerful tools. These models typically undergo supervised fine-tuning in which an additional linear layer, known as a classification head, is added to the transformer architecture and trained on datasets of varying sizes. This fine-tuning may incorporate a technique called parameter-efficient fine-tuning (PEFT), which freezes large parts of the base model to reduce computational demand. Additionally, few-shot learning methods such as pattern-exploiting training (PET) enable adaptation from only a few training examples. PETapter fuses these two promising research directions, leveraging the strengths of both to achieve effective training and performant predictions with just a few training samples. It employs PEFT methods for fine-tuning the word embeddings and a PET-like masked-language-modeling objective for the final classification of text elements. In a benchmark study across various datasets, we demonstrate that PETapter is computationally more efficient than full fine-tuning via PET while maintaining comparable performance with just 100 training examples. Furthermore, it surpasses classical PEFT methods that are combined with traditional classification heads.
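The core idea can be illustrated with a minimal sketch: combine a parameter-efficient adapter with a PET-style pattern and verbalizer, so that classification is carried out through the pretrained masked-language-modeling head rather than a newly initialized linear classification head. The snippet below is an illustrative assumption, not the authors' PETapter implementation: it uses LoRA via the peft library as the PEFT method, roberta-base as the backbone, and a hypothetical two-class topic verbalizer.

# Minimal sketch (assumptions: LoRA as the PEFT method, roberta-base backbone,
# a toy pattern/verbalizer) of PEFT + PET-style MLM classification.
import torch
from transformers import AutoTokenizer, AutoModelForMaskedLM
from peft import LoraConfig, get_peft_model

model_name = "roberta-base"                      # encoder-only MLM backbone
tokenizer = AutoTokenizer.from_pretrained(model_name)
backbone = AutoModelForMaskedLM.from_pretrained(model_name)

# Freeze the backbone and inject a small number of trainable LoRA parameters (PEFT).
lora_cfg = LoraConfig(r=8, lora_alpha=16, target_modules=["query", "value"])
model = get_peft_model(backbone, lora_cfg)

# PET-style pattern and verbalizer: the class label is predicted by filling a mask token.
pattern = "This news article is about {mask}. {text}"
verbalizer = {"sports": "sports", "politics": "politics"}   # label -> single verbalizer word (toy example)
label_token_ids = {
    lab: tokenizer.convert_tokens_to_ids(tokenizer.tokenize(" " + word))[0]
    for lab, word in verbalizer.items()
}

def classify(text: str) -> str:
    """Score each label by the MLM logit of its verbalizer token at the mask position."""
    prompt = pattern.format(mask=tokenizer.mask_token, text=text)
    enc = tokenizer(prompt, return_tensors="pt", truncation=True)
    mask_pos = (enc["input_ids"] == tokenizer.mask_token_id).nonzero(as_tuple=True)[1]
    with torch.no_grad():
        logits = model(**enc).logits[0, mask_pos[0]]     # vocabulary logits at the mask
    scores = {lab: logits[tid].item() for lab, tid in label_token_ids.items()}
    return max(scores, key=scores.get)

print(classify("The midfielder scored twice in the final."))

Because the label decision flows through the pretrained MLM head while the backbone stays frozen, only the small adapter matrices would be updated during training, which is what makes this combination attractive in few-shot settings.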