
Concurrent Speaker Detection: A multi-microphone Transformer-Based Approach

Author:
Amit Eliav, Sharon Gannot
Keyword:
Electrical Engineering and Systems Science, Audio and Speech Processing (eess.AS)
journal:
--
date:
2024-03-11 00:00:00
Abstract
We present a deep-learning approach to Concurrent Speaker Detection (CSD) based on a modified transformer model. The model is designed for multi-microphone data but also works in the single-microphone case. It classifies each audio segment into one of three classes: 1) no speech activity (noise only), 2) exactly one active speaker, and 3) more than one active speaker. We incorporate a Cost-Sensitive (CS) loss and confidence calibration into the training procedure. The approach is evaluated on three real-world databases, AMI, AliMeeting, and CHiME 5, and demonstrates an improvement over existing approaches.
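
As a rough illustration of the three-class setup and of what a cost-sensitive loss can look like, the sketch below uses a class-weighted cross-entropy, which is one common form of cost-sensitive training and not necessarily the formulation used in the paper. The class names, the weights, and the function names are all assumptions made for this example.

```python
import math

# Class indices mirror the three classes in the abstract.
# The constant names are illustrative, not taken from the paper.
NOISE, SINGLE_SPEAKER, CONCURRENT_SPEAKERS = 0, 1, 2


def softmax(logits):
    """Convert raw scores to probabilities (numerically stable)."""
    peak = max(logits)
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cost_sensitive_ce(logits, target, class_weights):
    """Class-weighted cross-entropy for one audio segment.

    Assumption: errors on a class are penalised in proportion to a
    per-class weight. The paper's actual CS loss may differ.
    """
    probs = softmax(logits)
    return -class_weights[target] * math.log(probs[target])


def classify(logits):
    """Return the index of the highest-scoring class."""
    return max(range(len(logits)), key=lambda i: logits[i])


# Hypothetical weights: mistakes on concurrent speech cost twice as much.
weights = [1.0, 1.0, 2.0]
loss = cost_sensitive_ce([0.0, 0.0, 0.0], CONCURRENT_SPEAKERS, weights)
label = classify([0.1, 2.0, -1.0])
```

With equal scores for all three classes, each probability is 1/3, so the loss for a concurrent-speech segment is twice the unweighted cross-entropy. Raising a class weight in this way pushes training to avoid errors on that class at the expense of the others.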
PDF: Concurrent Speaker Detection: A multi-microphone Transformer-Based Approach.pdf