🏆 Speech Anti-Spoofing Arena

Spectra-0 is live on the Speech Anti-Spoofing Arena — 🔓 Unpublished / Proprietary tier (listed but unranked, no paper). Every result is sha-pinned and independently reproduced at scoring level (reproduce --scoring).

Dataset	Arena EER %	Trials
LibriSeVoc	0.00	18,487
CD-ADD	0.05	20,786
ASVspoof2019_LA	0.51	71,237
InTheWild	1.10	31,779
SONAR	2.20	3,948
CFAD	3.42	62,999
ASVspoof2021_DF	5.30	611,829
ASVspoof2021_LA	6.58	181,566
CVoiceFake_small	6.77	138,136
ASVspoof5	14.82	680,774

Arena EER uses the toolkit's deterministic eval pipeline (pre-emphasis 0.97, first 64,600 samples of each file), so values may differ slightly from the original numbers in the model card below.
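
The deterministic preprocessing described above can be sketched as follows. This is a minimal NumPy illustration, not the arena toolkit's code: the function names are hypothetical, the order (pre-emphasis first, then windowing) follows the model card's quickstart, and tiling clips shorter than 64,600 samples is an assumption borrowed from the quickstart's pad_random.

```python
import numpy as np

TARGET_LEN = 64_600    # window length quoted above
PREEMPH_COEFF = 0.97   # pre-emphasis coefficient quoted above


def preemphasis(x: np.ndarray, coeff: float = PREEMPH_COEFF) -> np.ndarray:
    """First-order pre-emphasis: y[0] = x[0], y[t] = x[t] - coeff * x[t-1]."""
    y = x.astype(np.float32, copy=True)
    y[1:] -= coeff * x[:-1]
    return y


def first_window(x: np.ndarray, max_len: int = TARGET_LEN) -> np.ndarray:
    """Always keep the first max_len samples, so repeated runs see identical input.

    Assumption: clips shorter than max_len are tiled and trimmed.
    """
    if x.ndim != 1 or x.size == 0:
        raise ValueError("expected a non-empty 1-D waveform")
    if x.size >= max_len:
        return x[:max_len]
    repeats = max_len // x.size + 1
    return np.tile(x, repeats)[:max_len]


def eval_preprocess(x: np.ndarray) -> np.ndarray:
    """Hypothetical helper: pre-emphasis followed by the deterministic window."""
    return first_window(preemphasis(x))


# Synthetic 5-second clip at 16 kHz, only to show the shapes involved.
rng = np.random.default_rng(0)
waveform = rng.standard_normal(16_000 * 5).astype(np.float32)
window = eval_preprocess(waveform)
print(window.shape, window.dtype)
```

Unlike a random crop, this window does not depend on a random seed, which is one reason arena scores and self-reported scores need not match exactly.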

Model Card: Spectra-0 (anti-spoofing / bonafide vs spoof)

Spectra-0 is a model for speech spoofing detection (binary classification: bonafide vs spoof) from raw audio waveforms. Architecture: SSL encoder (Wav2Vec2) → MLP projection → ECAPA-TDNN 2-class classifier.

Input: float32 waveform of shape (batch, num_samples), sampled at 16 kHz (the quickstart below resamples other rates to 16 kHz).
Output: logits of shape (batch, 2), where index 0 = spoof, index 1 = bonafide.
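
To make the output layout concrete, here is a small sketch that turns logits of shape (batch, 2) into class probabilities and labels. The logit values are made up for illustration, and NumPy is used only to keep the example dependency-light. Note that this argmax rule is not the same as the threshold-based classify() described later, which works on the raw bonafide logit.

```python
import numpy as np

SPOOF, BONAFIDE = 0, 1  # class indices as stated above


def softmax(logits: np.ndarray) -> np.ndarray:
    """Numerically stable softmax over the last axis."""
    shifted = logits - logits.max(axis=-1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=-1, keepdims=True)


# Made-up logits for a batch of two utterances.
logits = np.array([[2.0, -1.0], [-0.5, 3.0]], dtype=np.float32)

probs = softmax(logits)                 # (batch, 2), rows sum to 1
p_bonafide = probs[:, BONAFIDE]         # probability of the bonafide class
labels = np.where(logits.argmax(axis=-1) == BONAFIDE, "bonafide", "spoof")

print(p_bonafide, labels)
```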

On first run, the model will automatically download the SSL encoder facebook/wav2vec2-xls-r-300m via transformers.

Evaluation Results

Equal Error Rate (EER, %) per benchmark; lower is better.

Model	ASVspoof19 LA	ASVspoof21 LA	ASVspoof21 DF	ASVspoof5	ADD2022	In-the-Wild
Res2TCNGuard	7.487	19.130	19.883	37.620	49.538	49.246
AASIST3	27.585	37.407	33.099	41.001	47.192	39.626
XSLS	0.231	7.714	4.220	17.688	33.951	7.453
TCM-ADD	0.152	6.655	3.444	19.505	35.252	7.767
DF Arena 1B	43.793	40.137	42.994	35.333	42.139	17.598
Spectra-0	0.181	6.475	5.410	14.426	14.716	1.026

Quickstart

Clone from Hugging Face

This repository is hosted on Hugging Face Hub: https://huggingface.co/MTUCI/spectra_0.

git lfs install
git clone https://huggingface.co/MTUCI/spectra_0
cd spectra_0

Install dependencies

pip install -U torch torchaudio transformers huggingface_hub safetensors soundfile

Single-file inference (example preprocessing)

import random
import torch
import torchaudio
import soundfile as sf

from model import spectra_0


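def pad_random(x: torch.Tensor, max_len: int = 64600) -> torch.Tensor:
    """Return exactly max_len samples: a random crop of a long clip, or a tiled short clip."""
    # x: (num_samples,) or (1, num_samples)
    if x.ndim > 1:
        x = x.squeeze()
    x_len = x.shape[0]
    if x_len == 0:
        raise ValueError("cannot pad an empty waveform")
    if x_len >= max_len:
        # random.randint is inclusive on both ends, so the crop always fits.
        start = random.randint(0, x_len - max_len)
        return x[start:start + max_len]
    # Repeat the clip enough times to cover max_len, then trim.
    num_repeats = max_len // x_len + 1
    return x.repeat(num_repeats)[:max_len]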


def load_audio_mono(path: str) -> torch.Tensor:
    audio, sr = sf.read(path, dtype="float32")
    audio = torch.from_numpy(audio)
    if audio.ndim > 1:
        # (num_samples, channels) -> mono
        audio = audio.mean(dim=1)
    if sr != 16000:
        audio = torchaudio.functional.resample(audio, sr, 16000)
    return audio


device = "cuda" if torch.cuda.is_available() else "cpu"
model = spectra_0.from_pretrained(pretrained_model_name_or_path=".").eval().to(device)

audio = load_audio_mono("path/to/audio.wav")
audio = torchaudio.functional.preemphasis(audio.unsqueeze(0))  # (1, T)
audio = pad_random(audio.squeeze(0), 64600).unsqueeze(0)       # (1, 64600)

with torch.inference_mode():
    logits = model(audio.to(device))  # (1, 2)
    score_spoof = logits[0, 0].item()
    score_bonafide = logits[0, 1].item()

print({"score_bonafide": score_bonafide, "score_spoof": score_spoof})

Threshold-based classification (and how to tune it)

In model.py, the Spectra0Model class provides classify() with a default threshold chosen as an “optimal” value for the original setting:

Default threshold: -1.0625009, applied to the bonafide logit (logit_bonafide = logits[:, 1]).
Note: this threshold may not be optimal on a different dataset or domain. Tune it on your own labeled data, using either the EER (Equal Error Rate) operating point or a target FAR/FRR.

Example:

with torch.inference_mode():
    pred = model.classify(audio.to(device), threshold=-1.0625009)  # 1=bonafide, 0=spoof
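
For checking how a given threshold behaves on labeled data, the sketch below applies a threshold to bonafide logits and reports FAR and FRR. It is an illustration only: the function names are hypothetical, the scores are made up, and the strict ">" comparison is an assumption about what classify() does internally.

```python
import numpy as np

DEFAULT_THRESHOLD = -1.0625009  # default quoted in the model card


def classify_scores(bonafide_logits, threshold: float = DEFAULT_THRESHOLD) -> np.ndarray:
    """Return 1 (bonafide) where the bonafide logit exceeds the threshold, else 0 (spoof).

    Assumption: strict '>' comparison; the real classify() may differ at equality.
    """
    return (np.asarray(bonafide_logits, dtype=np.float64) > threshold).astype(np.int64)


def far_frr(bonafide_logits, labels, threshold: float = DEFAULT_THRESHOLD):
    """FAR = share of spoof trials accepted; FRR = share of bonafide trials rejected."""
    labels = np.asarray(labels)
    pred = classify_scores(bonafide_logits, threshold)
    is_spoof, is_bona = labels == 0, labels == 1
    if not is_spoof.any() or not is_bona.any():
        raise ValueError("need at least one trial of each class")
    far = float((pred[is_spoof] == 1).mean())
    frr = float((pred[is_bona] == 0).mean())
    return far, frr


# Made-up bonafide logits and ground-truth labels (1 = bonafide, 0 = spoof).
scores = [-3.0, -2.0, -0.5, -1.5, 1.5, 2.5]
labels = [0, 0, 0, 1, 1, 1]

far, frr = far_frr(scores, labels)
print(f"FAR={far:.3f} FRR={frr:.3f}")
```

Raising the threshold lowers FAR at the cost of a higher FRR, so the right value depends on which error is more costly in the target application.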

Tuning the threshold via EER (typical workflow)

Run the model on a labeled set and collect the bonafide scores (logits[:, 1]) for both classes.
Compute the EER and the threshold at which it is reached, then pass that threshold to classify().
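
The workflow above can be sketched with a small EER routine. This is one common way to compute the EER (pick the candidate threshold where FAR and FRR are closest and average them), not necessarily the procedure used for the reported numbers. The function name compute_eer and the example scores are illustrative, and a trial is treated as accepted when its score is greater than or equal to the threshold.

```python
import numpy as np


def compute_eer(bonafide_scores, spoof_scores):
    """Return (eer, threshold) for scores where higher means more bonafide."""
    bona = np.sort(np.asarray(bonafide_scores, dtype=np.float64))
    spoof = np.sort(np.asarray(spoof_scores, dtype=np.float64))
    if bona.size == 0 or spoof.size == 0:
        raise ValueError("need at least one score per class")

    # Every observed score is a candidate threshold.
    thresholds = np.unique(np.concatenate([bona, spoof]))

    # FRR: share of bonafide scores strictly below the threshold (rejected).
    frr = np.searchsorted(bona, thresholds, side="left") / bona.size
    # FAR: share of spoof scores at or above the threshold (accepted).
    far = 1.0 - np.searchsorted(spoof, thresholds, side="left") / spoof.size

    # Operating point where the two error rates are closest.
    idx = int(np.argmin(np.abs(far - frr)))
    eer = (far[idx] + frr[idx]) / 2.0
    return float(eer), float(thresholds[idx])


# Made-up bonafide logits (logits[:, 1]) collected on a labeled set.
bonafide_scores = [0.5, 1.5, 2.5, -1.5]
spoof_scores = [-3.0, -2.0, -0.5, -2.5]

eer, threshold = compute_eer(bonafide_scores, spoof_scores)
print(f"EER={eer:.2%} at threshold={threshold}")
```

On real data the returned threshold is only as good as the labeled set it was tuned on, so it is worth re-estimating it whenever the recording conditions or attack types change.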

Limitations and notes

This is a pre-release model.
Significantly stronger models are planned for Q3–Q4 2026 — stay tuned.

License

MIT (see the license field in the model repo header).

Contacts

TG channel: https://t.me/korallll_ai
Email: kborodin.research@gmail.com (k.n.borodin@mtuci.ru is deprecated)
Website: https://lab260.ru/

Benchmarks on Papers with code

@misc{wang2020asvspoof2019largescalepublic,
      title={ASVspoof 2019: A large-scale public database of synthesized, converted and replayed speech}, 
      author={Xin Wang and Junichi Yamagishi and Massimiliano Todisco and Hector Delgado and Andreas Nautsch and Nicholas Evans and Md Sahidullah and Ville Vestman and Tomi Kinnunen and Kong Aik Lee and Lauri Juvela and Paavo Alku and Yu-Huai Peng and Hsin-Te Hwang and Yu Tsao and Hsin-Min Wang and Sebastien Le Maguer and Markus Becker and Fergus Henderson and Rob Clark and Yu Zhang and Quan Wang and Ye Jia and Kai Onuma and Koji Mushika and Takashi Kaneda and Yuan Jiang and Li-Juan Liu and Yi-Chiao Wu and Wen-Chin Huang and Tomoki Toda and Kou Tanaka and Hirokazu Kameoka and Ingmar Steiner and Driss Matrouf and Jean-Francois Bonastre and Avashna Govender and Srikanth Ronanki and Jing-Xuan Zhang and Zhen-Hua Ling},
      year={2020},
      eprint={1911.01601},
      archivePrefix={arXiv},
      primaryClass={eess.AS},
      url={https://arxiv.org/abs/1911.01601}, 
}
@article{210900535,
      title={ASVspoof 2021: Automatic Speaker Verification Spoofing and …},
      author={},
      year={2021},
      eprint={2109.00535},
      archivePrefix={arXiv},
}
@misc{wang2024asvspoof5crowdsourcedspeech,
      title={ASVspoof 5: Crowdsourced Speech Data, Deepfakes, and Adversarial Attacks at Scale}, 
      author={Xin Wang and Hector Delgado and Hemlata Tak and Jee-weon Jung and Hye-jin Shim and Massimiliano Todisco and Ivan Kukanov and Xuechen Liu and Md Sahidullah and Tomi Kinnunen and Nicholas Evans and Kong Aik Lee and Junichi Yamagishi},
      year={2024},
      eprint={2408.08739},
      archivePrefix={arXiv},
      primaryClass={eess.AS},
      url={https://arxiv.org/abs/2408.08739}, 
}
@misc{yi2024add2022audiodeep,
      title={ADD 2022: the First Audio Deep Synthesis Detection Challenge}, 
      author={Jiangyan Yi and Ruibo Fu and Jianhua Tao and Shuai Nie and Haoxin Ma and Chenglong Wang and Tao Wang and Zhengkun Tian and Xiaohui Zhang and Ye Bai and Cunhang Fan and Shan Liang and Shiming Wang and Shuai Zhang and Xinrui Yan and Le Xu and Zhengqi Wen and Haizhou Li and Zheng Lian and Bin Liu},
      year={2024},
      eprint={2202.08433},
      archivePrefix={arXiv},
      primaryClass={cs.SD},
      url={https://arxiv.org/abs/2202.08433}, 
}
@article{220316263,
      title={Does Audio Deepfake Detection Generalize?},
      author={Nicolas M. Müller et al.},
      year={2022},
      eprint={2203.16263},
      archivePrefix={arXiv},
}

Safetensors

Model size: 0.3B params
Tensor type: F32

Papers for lab260/spectra_0

ASVspoof 5: Crowdsourced Speech Data, Deepfakes, and Adversarial Attacks at Scale (arXiv 2408.08739, published Aug 16, 2024)
ADD 2022: the First Audio Deep Synthesis Detection Challenge (arXiv 2202.08433, published Jul 2, 2024)
Does Audio Deepfake Detection Generalize? (arXiv 2203.16263, published Mar 30, 2022)
ASVspoof 2021: Automatic Speaker Verification Spoofing and Countermeasures Challenge Evaluation Plan (arXiv 2109.00535, published Sep 1, 2021)
ASVspoof 2019: A large-scale public database of synthesized, converted and replayed speech (arXiv 1911.01601, published Nov 5, 2019)

Evaluation results (Equal Error Rate, self-reported)

Dataset	EER
ASVspoof19_LA	0.181
ASVspoof21_LA	6.475
ASVspoof21_DF	5.410
ASVspoof5	14.426
ADD2022	14.716
In-the-Wild	1.026