Concurrent Speaker Detection: A multi-microphone Transformer-Based Approach

Author:

Amit Eliav, Sharon Gannot

Keyword:

Electrical Engineering and Systems Science, Audio and Speech Processing (eess.AS)

journal:

date:

2024-03-11 00:00:00

Abstract

We present a deep-learning approach to Concurrent Speaker Detection (CSD) based on a modified transformer model. The model is designed for multi-microphone data but also works in the single-microphone case. It classifies each audio segment into one of three classes: 1) no speech activity (noise only), 2) a single active speaker, and 3) more than one active speaker. We incorporate a Cost-Sensitive (CS) loss and confidence calibration into the training procedure. The approach is evaluated on three real-world databases, AMI, AliMeeting, and CHiME 5, and demonstrates an improvement over existing approaches.
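As an illustration of the three-class setup and of what a cost-sensitive loss can look like, the following is a minimal sketch, not the authors' implementation. It assumes one common formulation: cross-entropy plus a weighted expected misclassification cost. The cost matrix values, the weight `alpha`, and all function names are hypothetical and chosen only for the example; the abstract does not specify them.

```python
import math

# Class indices follow the three classes named in the abstract.
NOISE, SINGLE, OVERLAP = 0, 1, 2

# Hypothetical cost matrix, COST[true][predicted]. The values are
# illustrative only: confusing noise with overlapped speech is
# penalized more than confusing adjacent classes.
COST = [
    [0.0, 1.0, 2.0],
    [1.0, 0.0, 1.0],
    [2.0, 1.0, 0.0],
]


def softmax(logits):
    """Numerically stable softmax over a list of class logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cost_sensitive_loss(logits, true_class, cost=COST, alpha=1.0):
    """Cross-entropy plus alpha times the expected misclassification cost.

    This is an assumed formulation, not necessarily the one in the paper.
    """
    probs = softmax(logits)
    cross_entropy = -math.log(probs[true_class])
    expected_cost = sum(c * p for c, p in zip(cost[true_class], probs))
    return cross_entropy + alpha * expected_cost


def classify(logits):
    """Return the index of the most probable class for one audio segment."""
    probs = softmax(logits)
    return max(range(len(probs)), key=probs.__getitem__)
```

With this matrix, a noise-only segment scored as overlapped speech incurs a larger loss than the same segment scored as a single speaker, even though the cross-entropy term is identical in both cases.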

PDF: Concurrent Speaker Detection: A multi-microphone Transformer-Based Approach.pdf