EcoSta 2024, Submission A0574
Title: Encoding recurrence into transformers
Authors: Feiqing Huang - University of Hong Kong (Hong Kong)
Kexin Lu - University of Hong Kong (Hong Kong)
Yuxi Cai - The University of Hong Kong (Hong Kong) [presenting]
Zhen Qin - Huawei Noah's Ark Lab (China)
Yanwen Fang - University of Hong Kong (China)
Guangjian Tian - Huawei Noah's Ark Lab (China)
Guodong Li - University of Hong Kong (Hong Kong)
Abstract: The purpose is to decompose, with negligible loss, an RNN layer into a sequence of simple RNNs, each of which can be further rewritten as a lightweight positional encoding matrix of a self-attention, named the recurrence encoding matrix (REM). The recurrent dynamics introduced by the RNN layer can thus be encapsulated in the positional encodings of a multi-head self-attention, which makes it possible to seamlessly incorporate these recurrent dynamics into a transformer, yielding a new module: self-attention with recurrence (RSA). The proposed module leverages the recurrent inductive bias of the REMs to achieve better sample efficiency than the corresponding baseline transformer, while self-attention models the remaining non-recurrent signals. The relative proportions of the two components are controlled by a data-driven gating mechanism, and the effectiveness of RSA modules is demonstrated on time series forecasting tasks.
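The abstract does not give implementation details, but the gated combination of a recurrence encoding matrix with standard self-attention can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: a single head, a strictly lower-triangular REM parameterized by one learnable decay, a scalar sigmoid gate, and the hypothetical names build_rem and RSAHead; it is not the authors' reference implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


def build_rem(seq_len: int, lam: torch.Tensor) -> torch.Tensor:
    """Hypothetical regular REM: entry (i, j) is lam**(i - j) for i > j, else 0.

    The idea (assumed here) is that a simple RNN's recurrent dynamics reduce to
    powers of its decay parameter, arranged as a causal positional-encoding matrix.
    """
    idx = torch.arange(seq_len, device=lam.device)
    diff = (idx.unsqueeze(1) - idx.unsqueeze(0)).float()      # i - j
    causal = diff > 0                                          # strictly lower triangle
    # clamp keeps the discarded branch of torch.where numerically harmless
    return torch.where(causal, lam ** diff.clamp(min=1.0), torch.zeros_like(diff))


class RSAHead(nn.Module):
    """One self-attention head with recurrence (illustrative sketch, not the paper's code)."""

    def __init__(self, d_model: int, d_head: int):
        super().__init__()
        self.q = nn.Linear(d_model, d_head)
        self.k = nn.Linear(d_model, d_head)
        self.v = nn.Linear(d_model, d_head)
        self.gate_logit = nn.Parameter(torch.zeros(1))   # data-driven gate, sigmoid-squashed
        self.lam_logit = nn.Parameter(torch.zeros(1))    # REM decay, kept in (0, 1) via sigmoid

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq_len, d_model); causal masking of the softmax is omitted for brevity
        seq_len = x.size(1)
        q, k, v = self.q(x), self.k(x), self.v(x)
        attn = F.softmax(q @ k.transpose(-2, -1) / q.size(-1) ** 0.5, dim=-1)
        rem = build_rem(seq_len, torch.sigmoid(self.lam_logit))  # recurrent component
        mu = torch.sigmoid(self.gate_logit)                      # proportion given to the REM
        mix = (1.0 - mu) * attn + mu * rem                       # gated combination of components
        return mix @ v


# Usage sketch: a batch of 4 series of length 32 with 16-dimensional features.
if __name__ == "__main__":
    head = RSAHead(d_model=16, d_head=8)
    out = head(torch.randn(4, 32, 16))
    print(out.shape)  # torch.Size([4, 32, 8])
```

In this reading, the gate mu plays the role of the data-driven mechanism mentioned in the abstract: it shifts weight toward the REM when the recurrent inductive bias helps and toward ordinary softmax attention for the remaining non-recurrent signals.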