Although Topological Data Analysis and Transformer neural networks may seem unrelated, a number of works lie at the intersection of these two topics. Recently, the Persformer [5] was introduced, the first Transformer architecture that accepts persistence diagrams as input. Its authors report that the Persformer significantly outperforms previous topological neural network architectures on classical synthetic and graph benchmark datasets; moreover, it satisfies a universal approximation theorem.
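To make the diagram-as-input idea concrete, below is a minimal sketch in PyTorch of a Transformer that consumes a persistence diagram as a set of (birth, death) tokens. This is an illustrative assumption, not the actual Persformer [5] architecture: all layer sizes, the classification head, and the pooling scheme are hypothetical choices. Since a diagram is a set, no positional encodings are used, which keeps the encoder permutation-equivariant; masked mean pooling then yields a permutation-invariant representation.

```python
# A minimal sketch of a set-Transformer over persistence diagrams.
# NOT the Persformer of [5]; hyperparameters and layers are illustrative.
import torch
import torch.nn as nn

class DiagramEncoder(nn.Module):
    def __init__(self, d_model: int = 64, n_heads: int = 4,
                 n_layers: int = 2, n_classes: int = 2):
        super().__init__()
        self.embed = nn.Linear(2, d_model)  # (birth, death) -> token
        layer = nn.TransformerEncoderLayer(d_model, n_heads,
                                           dim_feedforward=128,
                                           batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, n_layers)
        self.head = nn.Linear(d_model, n_classes)

    def forward(self, diagrams: torch.Tensor,
                pad_mask: torch.Tensor) -> torch.Tensor:
        # diagrams: (batch, n_points, 2); pad_mask: (batch, n_points),
        # True marks padded points. No positional encodings: diagrams
        # are sets, so the encoder stays permutation-equivariant.
        tokens = self.embed(diagrams)
        encoded = self.encoder(tokens, src_key_padding_mask=pad_mask)
        # Masked mean pooling makes the output permutation-invariant.
        keep = (~pad_mask).unsqueeze(-1).float()
        pooled = (encoded * keep).sum(1) / keep.sum(1).clamp(min=1.0)
        return self.head(pooled)

# Example: a batch of two diagrams padded to 8 points each.
model = DiagramEncoder()
pts = torch.rand(2, 8, 2)
mask = torch.zeros(2, 8, dtype=torch.bool)
logits = model(pts, mask)  # shape (2, n_classes)
```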
In addition, a number of works have explored BERT [2] from a geometric and topological point of view [3, 1, 4]. All of them treat attention maps as adjacency matrices of weighted graphs and compute topological statistics from them; a minimal sketch of such a pipeline is given below. [3, 1] used various summary statistics of persistence barcodes as features, while [4] used persistence images. [3] studied how these topological features relate to the naturalness of text, i.e., whether a text was artificially generated or written by a human, and [1] examined how they correlate with linguistic phenomena such as grammatical correctness.
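As an illustration of this attention-map pipeline (a simplified sketch, not the exact feature sets of [3, 1]), the snippet below symmetrizes one attention map, converts attention weights into distances, computes persistence barcodes with the `ripser` package, and extracts a few common barcode statistics. The 1 - attention distance and the specific statistics are assumptions chosen for illustration.

```python
# A minimal sketch: from one attention map to barcode summary statistics.
# Assumes numpy and ripser are installed; attention values lie in [0, 1].
# The distance transform and feature choices are illustrative, not those
# of [3, 1].
import numpy as np
from ripser import ripser

def barcode_features(attention: np.ndarray, maxdim: int = 1) -> dict:
    """Summary statistics of persistence barcodes of one attention map.

    attention: (seq_len, seq_len) matrix of attention weights in [0, 1].
    """
    # Attention maps are generally not symmetric, so symmetrize first.
    sym = np.maximum(attention, attention.T)
    # High attention = strong connection = small distance (an assumption).
    dist = 1.0 - sym
    np.fill_diagonal(dist, 0.0)
    dgms = ripser(dist, distance_matrix=True, maxdim=maxdim)["dgms"]
    feats = {}
    for dim, dgm in enumerate(dgms):
        lengths = dgm[:, 1] - dgm[:, 0]
        lengths = lengths[np.isfinite(lengths)]  # drop the infinite H0 bar
        feats[f"h{dim}_num_bars"] = int(len(lengths))
        feats[f"h{dim}_total_persistence"] = float(lengths.sum())
        feats[f"h{dim}_max_persistence"] = (
            float(lengths.max()) if len(lengths) else 0.0
        )
    return feats

# Example: features of a random 16-token attention map.
rng = np.random.default_rng(0)
attn = rng.random((16, 16))
print(barcode_features(attn))
```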
 