Adaptive Two-Sided Laplace Transforms as a Replacement for Self-Attention


[Submitted on 1 Jun 2025]


Abstract: We propose an innovative, learnable two-sided short-time Laplace transform (STLT) mechanism to supplant traditional self-attention in transformer-based LLMs. Our STLT introduces trainable parameters for each Laplace node, enabling end-to-end learning of decay rates, oscillatory frequencies, and the window bandwidth T. This flexibility allows the model to dynamically adapt token relevance half-lives and frequency responses during training. By selecting S learnable nodes and leveraging fast recursive convolution, we achieve an effective complexity of O(N·S) in time and memory, where N is the sequence length. We further incorporate an efficient FFT-based computation of the relevance matrix and an adaptive node allocation mechanism that dynamically adjusts the number of active Laplace nodes. Empirical results on language modeling (WikiText-103, Project Gutenberg), machine translation (WMT'14 En-De), and long-document question answering (NarrativeQA) demonstrate that our learnable STLT achieves perplexities and task scores on par with or better than existing efficient transformers, while naturally extending to context lengths exceeding 100k tokens, limited only by available hardware. Ablation studies confirm the importance of the learnable parameters and of adaptive node allocation. The proposed approach combines interpretability, through explicit decay and frequency parameters, with scalability and robustness, offering a pathway toward ultra-long-sequence language modeling without the computational bottleneck of self-attention.
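The abstract describes the core mechanism at a high level: S Laplace nodes, each with a learnable decay rate and oscillatory frequency, whose two-sided exponential kernels take the place of the attention relevance matrix. The snippet below is a minimal PyTorch sketch of that idea and not the authors' implementation: the class name `TwoSidedLaplaceMixer`, the log-parameterized decay rates, and the dense O(S·N²) kernel evaluation are assumptions for illustration; the paper's fast recursive-convolution/FFT evaluation, the window bandwidth T, and the adaptive node allocation are omitted for brevity.

```python
import torch
import torch.nn as nn


class TwoSidedLaplaceMixer(nn.Module):
    """Hypothetical sketch of a learnable two-sided Laplace mixing layer.

    Each of the S nodes has a learnable decay rate sigma_k and frequency
    omega_k; token relevance falls off as exp(-sigma_k * |i - j|),
    modulated by cos(omega_k * (i - j)), in both directions (two-sided).
    """

    def __init__(self, d_model: int, num_nodes: int = 8):
        super().__init__()
        # Decay rates are stored in log space so they stay positive.
        self.log_sigma = nn.Parameter(torch.zeros(num_nodes))
        # Oscillatory frequencies, initialized over a small range.
        self.omega = nn.Parameter(torch.linspace(0.0, 3.0, num_nodes))
        # Linear map that combines the per-node responses.
        self.mix = nn.Linear(num_nodes * d_model, d_model)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq_len, d_model)
        B, N, D = x.shape
        pos = torch.arange(N, device=x.device, dtype=x.dtype)
        lag = pos[:, None] - pos[None, :]            # (N, N) signed token offsets
        sigma = self.log_sigma.exp()                 # (S,) positive decay rates
        # Two-sided kernel per node: symmetric exponential decay times a cosine.
        kernel = torch.exp(-sigma[:, None, None] * lag.abs()) \
            * torch.cos(self.omega[:, None, None] * lag)        # (S, N, N)
        # Relevance-weighted aggregation per node. This dense form is
        # O(S * N^2); the paper's recursive/FFT evaluation would reduce it
        # to roughly O(N * S).
        mixed = torch.einsum('snm,bmd->bsnd', kernel, x)        # (B, S, N, D)
        mixed = mixed.permute(0, 2, 1, 3).reshape(B, N, -1)     # concat node responses
        return self.mix(mixed)


# Usage sketch: mix a batch of token embeddings with 4 Laplace nodes.
layer = TwoSidedLaplaceMixer(d_model=64, num_nodes=4)
out = layer(torch.randn(2, 128, 64))
print(out.shape)  # torch.Size([2, 128, 64])
```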

Submission history

From: Andrew Kiruluta
[v1] Sun, 1 Jun 2025 00:32:24 UTC (201 KB)
