The ASVspoof dataset contains samples from 19 speech synthesis models, named A01-A19. Data from models A01-A06 are included in both the train and validation splits; data from A07-A19 appear in the test split only.
In Section 4.3 of our article we show that individual heads are quite good at separating individual speech synthesis models, but due to lack of space we included results for only four of the models. Here we present complete results for all 19 models. As in the article, for each model we found the head that best separates that model's speech from bonafide speech (as described in Section 4.3), plotted the distributions of Hm and sym0, and found the threshold classifier that is optimal in terms of Equal Error Rate (EER). All of these results are given in the table below.
Histograms for synthetic speech are shown in red, and those for bonafide speech in blue. The dashed line marks the threshold of the optimal classifier. Please note that we number the layers and heads of HuBERT starting from 1. On a mobile phone, swipe left-to-right to see all the pictures.
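For readers who want to reproduce the threshold line on similar data, here is a minimal sketch of how an EER-optimal threshold can be found for a single one-dimensional feature such as Hm or sym0. It is an illustration under stated assumptions, not the code used to produce the results on this page: the function name `eer_threshold`, the `spoof_is_higher` flag, and the toy scores are all hypothetical, and whether synthetic speech scores higher or lower than bonafide speech has to be checked per head and feature.

```python
import numpy as np

def eer_threshold(bonafide_scores, spoof_scores, spoof_is_higher=True):
    """Return (threshold, eer) for a one-dimensional threshold classifier.

    Assumed decision rule (hypothetical, for illustration only):
      spoof_is_higher=True  -> score >  threshold is labelled synthetic
      spoof_is_higher=False -> score <  threshold is labelled synthetic
    """
    bonafide = np.asarray(bonafide_scores, dtype=float)
    spoof = np.asarray(spoof_scores, dtype=float)

    # Flip the sign so that, internally, synthetic speech always scores higher.
    sign = 1.0 if spoof_is_higher else -1.0
    bonafide, spoof = sign * bonafide, sign * spoof

    # Every observed score is a candidate threshold (np.unique also sorts them).
    candidates = np.unique(np.concatenate([bonafide, spoof]))

    # False acceptance rate: synthetic samples at or below the threshold.
    far = np.array([(spoof <= t).mean() for t in candidates])
    # False rejection rate: bonafide samples above the threshold.
    frr = np.array([(bonafide > t).mean() for t in candidates])

    # The EER point is where the two error rates are (closest to) equal.
    best = np.argmin(np.abs(far - frr))
    eer = (far[best] + frr[best]) / 2.0
    return sign * candidates[best], eer

# Toy scores, invented purely for demonstration.
threshold, eer = eer_threshold([0.1, 0.2, 0.3, 0.4], [0.35, 0.6, 0.7, 0.8])
print(f"threshold={threshold:.2f}, EER={eer:.2%}")
```

On finite samples the two error rates rarely coincide exactly, so the sketch takes the candidate where they are closest and reports their average as the EER.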
   
Designed in Notepad and hosted by GitHub. (C) TopoHuBERT team, 2023.