Paper
20 March 2013
A speaker change detection method based on coarse searching
Xue-yuan Zhang, Qian-hua He, Yan-xiong Li, Jun He
Proceedings Volume 8768, International Conference on Graphic and Image Processing (ICGIP 2012); 87681S (2013) https://doi.org/10.1117/12.2010844
Event: 2012 International Conference on Graphic and Image Processing, 2012, Singapore, Singapore
Abstract
The conventional speaker change detection (SCD) method based on the Bayesian Information Criterion (BIC) is widely used. However, its performance depends on the choice of penalty factor, and it is computationally expensive. The two-step SCD method is less time-consuming but generates more detection errors. The performance of the conventional method is limited by its use of two adjacent data windows. We propose a strategy that inserts an interval between the two fixed-size data windows in each analysis window. The dissimilarity value between the data windows is regarded as the probability of a speaker identity change within the interval area. The analysis window is then slid along the audio with a large step to locate the areas where speaker change points may appear. Afterwards, only these areas are examined to locate the change points precisely; areas where a speaker change point is unlikely to appear are discarded. The proposed method is computationally efficient and more robust to noise and to the choice of penalty factor than the conventional method. Evaluated on a corpus of China Central Television (CCTV) news, the proposed method achieves a 74.18% reduction in calculation time and a 22.24% improvement in F1-measure compared with the conventional approach.
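The coarse-then-fine idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes one-dimensional features (so each Gaussian's covariance reduces to a variance), uses the standard Delta-BIC form as the dissimilarity value, and the names `delta_bic`, `coarse_search`, `merge_regions`, `fine_search` and the values of `win`, `gap`, and `step` are hypothetical choices made only for illustration.

```python
import math
import random


def log_var(x):
    """Log of the maximum-likelihood variance of a 1-D sequence."""
    n = len(x)
    mean = sum(x) / n
    var = sum((v - mean) ** 2 for v in x) / n
    return math.log(max(var, 1e-12))  # floor avoids log(0)


def delta_bic(left, right, penalty=1.0):
    """Delta-BIC between two windows; a value > 0 suggests different sources."""
    n1, n2 = len(left), len(right)
    n = n1 + n2
    # One Gaussian for the pooled data versus one Gaussian per window.
    gain = 0.5 * (n * log_var(left + right)
                  - n1 * log_var(left) - n2 * log_var(right))
    # A 1-D Gaussian has 2 free parameters (mean and variance).
    return gain - penalty * 0.5 * 2 * math.log(n)


def coarse_search(feats, win=100, gap=50, step=50, penalty=1.0):
    """Slide a window-gap-window analysis window by a large step and
    return the gap regions whose flanking windows look dissimilar."""
    regions = []
    span = 2 * win + gap
    for start in range(0, len(feats) - span + 1, step):
        left = feats[start:start + win]
        right = feats[start + win + gap:start + span]
        if delta_bic(left, right, penalty) > 0:
            regions.append((start + win, start + win + gap))
    return regions


def merge_regions(regions):
    """Merge overlapping or touching candidate regions."""
    merged = []
    for s, e in sorted(regions):
        if merged and s <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], e))
        else:
            merged.append((s, e))
    return merged


def fine_search(feats, region, win=100, penalty=1.0):
    """Inside one candidate region, return the frame with the largest
    positive Delta-BIC between two adjacent windows, or None."""
    best_t, best_score = None, 0.0
    lo = max(region[0], win)
    hi = min(region[1], len(feats) - win)
    for t in range(lo, hi + 1):
        score = delta_bic(feats[t - win:t], feats[t:t + win], penalty)
        if score > best_score:
            best_t, best_score = t, score
    return best_t


# Synthetic demo: two "speakers" with different feature means, change at frame 500.
random.seed(0)
feats = ([random.gauss(0.0, 1.0) for _ in range(500)]
         + [random.gauss(5.0, 1.0) for _ in range(500)])
candidates = merge_regions(coarse_search(feats))
changes = [fine_search(feats, r) for r in candidates]
print(candidates, changes)
```

In this sketch the expensive frame-by-frame comparison runs only inside the candidate regions returned by the coarse pass, which is where the saving in computation would come from. Real systems typically use multi-dimensional features such as MFCCs with full covariance matrices; the scalar variance here is a simplification.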
© (2013) COPYRIGHT Society of Photo-Optical Instrumentation Engineers (SPIE). Downloading of the abstract is permitted for personal use only.
Xue-yuan Zhang, Qian-hua He, Yan-xiong Li, and Jun He "A speaker change detection method based on coarse searching", Proc. SPIE 8768, International Conference on Graphic and Image Processing (ICGIP 2012), 87681S (20 March 2013); https://doi.org/10.1117/12.2010844
KEYWORDS
Single crystal X-ray diffraction
Lithium
Error analysis
Televisions
Tolerancing
Feature extraction
Acoustics

