background
logo
ArxivPaperAI

Rethinking the Output Architecture for Sound Source Localization

Author:
Linfeng Feng, Xiao-Lei Zhang, Xuelong Li
Keyword:
Electrical Engineering and Systems Science, Audio and Speech Processing, Audio and Speech Processing (eess.AS)
journal:
--
date:
2023-11-21 00:00:00
Abstract
Sound source localization (SSL) involves estimating the direction of arrival (DOA) of a sound signal. The output space of the DOA estimation is continuous, suggesting that regression may be the most appropriate formulation for DOA. However, in practice, converting the DOA estimation into a classification problem often results in better performance than the regression formulation, since that classification problems are generally easier to model, and are more robust in handling noise and uncertainty than regression problems. In the classification formulation of DOA, the output space is discretized into several intervals, each of which is treated as a class. These classes exhibit strong inter-class correlation, with their mutual-similarity increasing when they approach each other and being ordered. However, this property is not sufficiently explored. To exploit these property, we propose a soft label distribution, named Unbiased Label Distribution (ULD), for eliminating the quantization error of the training target and further taking the inter-class similarity into strong consideration. We further introduce two loss functions, named the Negative Log Absolute Error (NLAE) loss function and {Mean Squared Error loss function without activation (MSE(wo))}, for the soft label family. Finally, we design a new decoding method to map the predicted distribution to sound source locations, called Weighted Adjacent Decoding (WAD). It uses the weighted sum of the probabilities of the peak classes and their adjacent classes in the predicted distribution for decoding. Experimental results show that the proposed method achieves the state-of-the-art performance, and the WAD decoding method is able to even breakthrough the quantization error limits of existing decoding methods.
PDF: Rethinking the Output Architecture for Sound Source Localization.pdf
Empowered by ChatGPT