TopoHuBERT project

Topological properties of attention for speech processing

We apply Topological Data Analysis (TDA) methods to speech classification problems and to the introspection of a pre-trained speech model. To this end, we introduce a number of topological and algebraic features derived from the transformer's attention maps and embeddings. We empirically show that a simple linear classifier built on top of such features outperforms a fine-tuned classification head. In particular, we achieve improvements of up to 9% accuracy and 5% EER on four common datasets with the HuBERT model. On the CREMA-D dataset, the proposed feature set establishes a new state of the art, reaching 80.155% accuracy. Last but not least, we show that topological features can reveal the functional roles of speech transformer heads. For example, we find heads that can reliably distinguish between emotions (sad vs. happy) or between sample sources (natural vs. generated), or recognize one of two voices, all without any downstream fine-tuning. To do so, we introduce a ranking function that separates the topological representations of a single head. The results demonstrate that TDA is a promising research direction for speech analysis, especially for tasks that require structural prediction.
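To give a concrete sense of what "topological features derived from attention maps" can look like, here is a minimal, dependency-free sketch: an attention map is thresholded into an undirected graph, and simple topological invariants of that graph (edge count, number of connected components, number of independent cycles) form a feature vector. The function name, the threshold value, and the toy attention matrix below are our own illustrative choices, not the paper's exact feature set.

```python
import itertools

def attention_graph_features(attn, threshold=0.1):
    """Topological features of the graph obtained by thresholding an
    attention map: [edge count, Betti-0 (components), Betti-1 (cycles)].
    Illustrative sketch only, not the paper's exact configuration."""
    n = len(attn)
    # Keep an undirected edge (i, j) if attention in either direction
    # exceeds the threshold.
    edges = [(i, j) for i, j in itertools.combinations(range(n), 2)
             if max(attn[i][j], attn[j][i]) >= threshold]
    # Union-find to count connected components.
    parent = list(range(n))
    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x
    components = n
    for i, j in edges:
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[ri] = rj
            components -= 1
    # For an undirected graph, Betti-1 = E - V + (number of components).
    cycles = len(edges) - n + components
    return [len(edges), components, cycles]

# Toy 4-token attention map (rows sum to 1).
attn = [[0.70, 0.20, 0.05, 0.05],
        [0.20, 0.60, 0.10, 0.10],
        [0.05, 0.10, 0.80, 0.05],
        [0.05, 0.10, 0.05, 0.80]]
print(attention_graph_features(attn, threshold=0.1))  # → [3, 1, 0]
```

Such per-head feature vectors, concatenated over heads and layers, are the kind of input a simple linear classifier can be trained on.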

This webpage provides supplementary and additional material for our article "Topological properties of attention for speech processing", which was accepted to INTERSPEECH 2023. Due to strict constraints on submission length, we had to omit some of our results or mention them only briefly in the paper; here we can present them in their entirety without any concern for page count.

Article and Source Code


Our article

Our article "Topological properties of attention for speech processing" is currently available at arXiv:2211.17223.

Our repository

A Github repository with code that allows you to reproduce the experiments from our paper, or to try them on a completely new task or dataset, is available at github.com/ArGintum/topohubert.

Right now it is under construction.

 

Used datasets

In our work we used several publicly available datasets. Here we would like to thank the authors of these datasets and provide quick links to the websites of their projects.

Appendices and Supplementary Material

We used several approaches to study the topological and spectral properties of the attention maps of the HuBERT model and performed various experiments, but due to tight size constraints we were unable to include them all in our paper. This section presents a compilation of the paragraphs, maps, and charts that did not fit into the article or that, under different circumstances, would have made up its appendix.

About Topological Data Analysis

In our work we make extensive use of Topological Data Analysis (TDA), a modern field that has grown out of numerous works in algebraic topology and computational geometry over the last two decades. Because TDA is relatively new, the reader may not be familiar with it, but the short paper format did not allow us to include proper explanations.

Here we attempt to provide the information necessary for a better understanding of the topology-related parts of our paper: a glossary, an analysis of the features used in our models, and references to other works applying TDA methods to Transformer models (not only for speech).


TDA Glossary

Glossary of TDA terminology used in our paper and a brief introduction to Topological Data Analysis


Introduction to TDA

Here we provide examples of the implementation of the different TDA methods used in our work:


1) Computing topological features of a graph/point cloud

2) Computing the Representation Topology Divergence (RTD)
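As a toy illustration of item 1, the following dependency-free sketch computes the 0-dimensional persistence barcode of a point cloud under the Vietoris-Rips filtration: every point is born at scale 0, and two connected components merge (one bar dies) at the length of the corresponding minimum-spanning-tree edge, found here with Kruskal's algorithm. This is our own minimal example; practical pipelines would typically use a library such as GUDHI or Ripser, and higher-dimensional features require a full boundary-matrix reduction.

```python
import math
import itertools

def h0_barcode(points):
    """0-dimensional persistence intervals (birth, death) of a point
    cloud under the Rips filtration. Deaths are the MST edge lengths;
    one component survives forever as an infinite bar. Sketch only."""
    n = len(points)
    def dist(p, q):
        return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))
    # All pairwise edges, sorted by length (Kruskal's algorithm).
    edges = sorted((dist(points[i], points[j]), i, j)
                   for i, j in itertools.combinations(range(n), 2))
    parent = list(range(n))
    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x
    deaths = []
    for w, i, j in edges:
        ri, rj = find(i), find(j)
        if ri != rj:          # two components merge: one bar dies at w
            parent[ri] = rj
            deaths.append(w)
    return [(0.0, d) for d in deaths] + [(0.0, math.inf)]

# Two well-separated pairs of points on the real line.
print(h0_barcode([(0.0,), (1.0,), (10.0,), (11.0,)]))
# → [(0.0, 1.0), (0.0, 1.0), (0.0, 9.0), (0.0, inf)]
```

The long bar (0, 9.0) reflects the fact that the cloud splits into two clusters that only merge at scale 9, while the infinite bar records the single component that remains. RTD (item 2) compares such barcodes between two representations of the same data and is substantially more involved, so we refer the reader to our repository for its implementation.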


TDA for Transformers

This research is neither the first nor (hopefully) the last to apply TDA methods to transformer models. Here is a brief overview of previous works in the field that influenced our study.

E-Mail

Contact information

      We are the research team behind the article "Topological properties of attention for speech processing" and this webpage:
Eduard Tulchinskii, Kristian Kuznetsov, Laida Kushnareva, Daniil Cherniavskii, Serguei Barannikov, Irina Piontkovskaya, Sergey Nikolenko, and Evgeny Burnaev.

Right now you can contact us via e-mail at  

   

Designed in Notepad and hosted by Github. (C) TopoHuBERT team (Eduard Tulchinskii et al.), 2023.