NoiseRobustAudioCodec-demo

Noise robustness remains a critical challenge in the development of neural audio codecs, particularly for real-world speech communication scenarios. This paper presents a novel training strategy, progressive probabilistic top-K sampling, designed to enhance the noise robustness of audio codecs while training exclusively on clean speech data. Unlike traditional residual vector quantization (RVQ) methods, which always select the closest codebook vector, our approach samples probabilistically from the top-K closest candidates, simulating noise at the code level and enabling the model to handle unseen noisy conditions. Additionally, we propose a progressive training strategy that introduces this sampling gradually, starting at the final quantizer of the RVQ structure and moving toward the first. Experimental results on one of the most advanced audio codecs demonstrate improved noise robustness, with PESQ increasing from 2.399 to 2.466 for decoded noisy speech, while maintaining high-quality performance for clean speech.
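To make the idea concrete, here is a minimal NumPy sketch of top-K sampling inside an RVQ encoder. It is an illustration under stated assumptions, not the paper's implementation: the softmax-over-negative-distance sampling distribution, the `temperature` parameter, the linear schedule, and all function names (`top_k_sample_index`, `rvq_encode`, `first_noisy_stage_at`) are hypothetical choices made for this example.

```python
import numpy as np


def top_k_sample_index(residual, codebook, k, rng, temperature=1.0):
    """Sample a codebook index from the k entries nearest to `residual`."""
    # Squared Euclidean distance from the residual to every codebook vector.
    dists = np.sum((codebook - residual) ** 2, axis=1)
    # Indices of the k closest entries (order within the k is arbitrary).
    candidates = np.argpartition(dists, k - 1)[:k]
    # Assumption: softmax over negative distances, so closer entries are likelier.
    logits = -dists[candidates] / temperature
    logits -= logits.max()  # numerical stability
    probs = np.exp(logits)
    probs /= probs.sum()
    return int(rng.choice(candidates, p=probs))


def rvq_encode(x, codebooks, k=1, first_noisy_stage=None, rng=None):
    """Encode `x` with RVQ; stages >= first_noisy_stage use top-K sampling."""
    if rng is None:
        rng = np.random.default_rng()
    if first_noisy_stage is None:
        first_noisy_stage = len(codebooks)  # plain closest-vector RVQ
    residual = np.asarray(x, dtype=float).copy()
    codes = []
    for stage, codebook in enumerate(codebooks):
        if stage >= first_noisy_stage and k > 1:
            idx = top_k_sample_index(residual, codebook, k, rng)
        else:
            idx = int(np.argmin(np.sum((codebook - residual) ** 2, axis=1)))
        codes.append(idx)
        # Each quantizer encodes what the previous ones left over.
        residual = residual - codebook[idx]
    return codes


def first_noisy_stage_at(step, total_steps, num_stages):
    """Hypothetical linear schedule: start at the last stage, end at stage 0."""
    progress = min(step / total_steps, 1.0)
    return max(num_stages - 1 - int(progress * num_stages), 0)
```

During training, one might call `rvq_encode(x, codebooks, k=K, first_noisy_stage=first_noisy_stage_at(step, total_steps, len(codebooks)))`, so that sampling first perturbs only the final quantizer and later reaches the first one. At inference, the default arguments reduce to ordinary closest-vector RVQ.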

Input Noisy	Closest	Closest*	Proposed	Proposed†

Groundtruth	Closest	Closest*	Proposed	Proposed†

Enhancing Noise Robustness in Audio Codecs via Progressive Probabilistic Top-K Sampling in Residual Vector Quantization Using Only Clean Speech