Real-time, Robust and Adaptive Universal Adversarial Attacks Against Speaker Recognition Systems


Voice user interface (VUI) has become increasingly popular in recent years. Speaker recognition system, as one of the most common VUIs, has emerged as an important technique to facilitate security-required applications and services. In this paper, we propose to design, for the first time, a real-time, robust, and adaptive universal adversarial attack against the state-of-the-art deep neural network (DNN) based speaker recognition systems in the white-box scenario. By developing an audio-agnostic universal perturbation, we can make the DNN-based speaker recognition systems to misidentify the speaker as the adversary-desired target label, with using a single perturbation that can be applied on arbitrary enrolled speaker’s voice. In addition, we improve the robustness of our attack by modeling the sound distortions caused by the physical over-the-air propagation through estimating room impulse response (RIR). Moreover, we propose to adaptively adjust the magnitude of perturbations according to each individual utterance via spectral gating. This can further improve the imperceptibility of the adversarial perturbations with minor increase of attack generation time. Experiments on a public dataset of 109 English speakers demonstrate the effectiveness and robustness of the proposed attack. Our attack method achieves average 90% attack success rate on both X-vector and d-vector speaker recognition systems. Meanwhile, our method achieves 100 × speedup on attack launching time, as compared to the conventional non-universal attacks.

In Journal of Signal Processing Systems