Audio-domain Position-independent Backdoor Attack via Unnoticeable Triggers


In this work, we explore the severity of audio-domain backdoor attacks and demonstrate their feasibility under practical scenarios of voice user interfaces, where an adversary injects (plays) an unnoticeable audio trigger into live speech to launch the attack. To realize such attacks, we consider jointly optimizing the audio trigger and the target model in the training phase, deriving a position-independent, unnoticeable, and robust audio trigger. We design new data poisoning techniques and penalty-based algorithms that inject the trigger into randomly generated temporal positions in the audio input during training, rendering the trigger resilient to any temporal position variations.

In The 28th Annual International Conference On Mobile Computing And Networking