Topological properties of attention for speech processing
We apply Topological Data Analysis (TDA) methods to speech classification problems and to introspection of a pre-trained speech model. To this end, we introduce a number of topological and algebraic features derived from the transformer's attention maps and embeddings. We empirically show that a simple linear classifier built on top of such features outperforms a fine-tuned classification head. In particular, we achieve an improvement of up to 9% accuracy and 5% EER on four common datasets with the HuBERT model. On the CREMA-D dataset, the proposed feature set establishes a new state of the art by reaching an accuracy of 80.155. Last but not least, we show that topological features are capable of revealing the functional roles of speech transformer heads. For example, we find heads that can precisely distinguish between emotions (sad/happy) and between sample sources (natural/generated), or recognize one of two voices, all without any downstream fine-tuning. To do so, we introduce a ranking function that separates the topological representations produced by a single head. These results demonstrate that TDA is a promising research direction for speech analysis, especially for tasks that require structural prediction.

This webpage provides supplementary and additional material for our article "Topological properties of attention for speech processing", which was accepted to INTERSPEECH 2023. Due to strict constraints on the size of submissions, in our paper we had to omit, or mention only briefly, some of the results we achieved; here we can present them in their entirety without any fear of page count.
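To give a feel for the pipeline, below is a minimal, illustrative sketch of how topological features can be read off attention maps: it extracts per-head attention maps from a pre-trained HuBERT model, binarizes each map at a fixed threshold, and counts the connected components of the resulting attention graph. Note that this is not the exact code or feature set from our repository; the checkpoint name, the threshold of 0.1, and the helper component_counts are our choices for illustration, and the features used in the paper are considerably richer.

```python
# Illustrative sketch: one simple topological feature per (layer, head)
# of HuBERT -- the number of connected components of the attention graph
# obtained by thresholding the attention map.
import torch
import networkx as nx
from transformers import HubertModel, Wav2Vec2FeatureExtractor

MODEL_NAME = "facebook/hubert-base-ls960"  # any pre-trained HuBERT checkpoint
feature_extractor = Wav2Vec2FeatureExtractor.from_pretrained(MODEL_NAME)
model = HubertModel.from_pretrained(MODEL_NAME).eval()

def component_counts(waveform, sampling_rate=16000, threshold=0.1):
    """Return one scalar per (layer, head): the number of connected
    components of the attention graph binarized at `threshold`."""
    inputs = feature_extractor(waveform, sampling_rate=sampling_rate,
                               return_tensors="pt")
    with torch.no_grad():
        outputs = model(**inputs, output_attentions=True)
    features = []
    for layer_attn in outputs.attentions:      # shape (1, n_heads, seq, seq)
        for head_attn in layer_attn[0]:        # shape (seq, seq)
            # Symmetrize so the thresholded map defines an undirected graph.
            sym = ((head_attn + head_attn.T) / 2) >= threshold
            graph = nx.from_numpy_array(sym.numpy())
            features.append(nx.number_connected_components(graph))
    return features
```

A simple linear classifier (e.g., scikit-learn's LogisticRegression) fitted on vectors of such per-head features is the kind of model we compare against a fine-tuned classification head; the same per-head values can also feed the head-ranking procedure mentioned above.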
Our article
Our article "Topological properties of attention for speech processing" is currently available at arXiv:2211.17223.
Our repository
The GitHub repository with code that allows reproducing the experiments from our paper, or trying them on a completely new task or dataset, is available at github.com/ArGintum/topohubert. Right now it is under construction.
In our work we used several publicly available datasets. Here we would like to thank the authors of those datasets and provide quick links to the websites of their projects.
We used several approaches to studying the topology and spectral properties of the attention maps of the HuBERT model and performed various experiments, but due to tight size constraints we were unable to include them all in our paper. This section presents a compilation of paragraphs, maps, and charts that did not fit into the article or that, under different circumstances, would have made up its appendix.
In our work we extensively use methods of Topological Data Analysis (TDA), a modern field that has developed from numerous works in algebraic topology and computational geometry over the last two decades. We realize that, due to the novelty of TDA, our reader may not be familiar with it; however, the short format did not allow us to include proper explanations in the paper.
Here we attempt to provide the information necessary for a better understanding of the topology-related part of our paper: a glossary, an analysis of the features used in our models, and references to other works applying TDA methods to Transformer models (not only for speech).
      We are the research team behind the article "Topological properties of attention for speech processing" and this webpage: Eduard Tulchinskii, Kristian Kuznetsov, Laida Kushnareva, Daniil Cherniavskii, Serguei Barannikov, Irina Piontkovskaya, Sergey Nikolenko, and Evgeny Burnaev.
Right now you can contact us via e-mail at Eduard.Tulchinskiy@skoltech.ru