Robust Detection of Machine-induced Audio Attacks in Intelligent Audio Systems with Microphone Array


This paper builds a holistic solution for detecting machine-induced audio attacks by leveraging the multi-channel microphone arrays found on modern intelligent audio systems. We use magnitude and phase spectrograms of multi-channel audio to extract spatial information, and leverage a deep learning model to detect the fundamental differences between human speech and adversarial audio produced by playback machines. Moreover, we adopt an unsupervised domain adaptation training framework to further improve the model’s generalizability to new acoustic environments.
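To illustrate the kind of multi-channel input described above, here is a minimal sketch using NumPy. It computes a Hann-windowed short-time Fourier transform for each microphone channel and stacks the log-magnitude and phase spectrograms into one feature tensor. The function names `stft` and `multichannel_features`, the frame length of 512 samples, the hop of 256 samples, and the log-magnitude scaling are all hypothetical choices made for this example. They are not the paper's stated preprocessing, which may differ.

```python
import numpy as np


def stft(x: np.ndarray, n_fft: int = 512, hop: int = 256) -> np.ndarray:
    """Hann-windowed short-time Fourier transform of a 1-D signal.

    Hypothetical helper for illustration; assumes len(x) >= n_fft.
    Returns a complex array of shape (n_fft // 2 + 1, n_frames).
    """
    window = np.hanning(n_fft)
    n_frames = 1 + (len(x) - n_fft) // hop
    # Each row of `frames` is one windowed segment of the signal.
    frames = np.stack(
        [x[i * hop : i * hop + n_fft] * window for i in range(n_frames)]
    )
    # One-sided FFT per frame, transposed to (frequency, time).
    return np.fft.rfft(frames, axis=1).T


def multichannel_features(
    audio: np.ndarray, n_fft: int = 512, hop: int = 256
) -> np.ndarray:
    """Stack per-channel log-magnitude and phase spectrograms.

    audio: shape (n_channels, n_samples), one row per microphone.
    Returns shape (2 * n_channels, n_fft // 2 + 1, n_frames): the
    log-magnitude of every channel first, then the phase of every channel.
    """
    specs = np.stack([stft(ch, n_fft, hop) for ch in audio])  # (C, F, T), complex
    magnitude = np.log1p(np.abs(specs))  # compress the dynamic range
    phase = np.angle(specs)  # in [-pi, pi]; keeps inter-microphone timing cues
    return np.concatenate([magnitude, phase], axis=0)
```

For example, one second of 4-channel audio sampled at 16 kHz, such as `multichannel_features(np.random.default_rng(0).standard_normal((4, 16000)))`, yields a tensor of shape `(8, 257, 61)`. The phase channels are kept because, for closely spaced microphones, per-channel magnitudes tend to be similar. The arrival-time differences between microphones, which carry spatial information about the sound source, show up mainly as phase differences.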

In 2021 ACM SIGSAC Conference on Computer and Communications Security