This post describes the implementation of our paper “Multi-class spectral clustering with overlaps for speaker diarization”, accepted for publication at IEEE SLT 2021.

The code consists of two parts: an overlap detector, and our modified spectral clustering method for overlap-aware diarization.

We implemented the diarization recipe in Kaldi and modified scikit-learn’s spectral clustering class to support our method. The entire code to reproduce our results is available at:

Installation and setup

Since the recipe is implemented using Kaldi, you first need to install Kaldi by following the instructions at:

In the Kaldi installation, miniconda is not installed by default. To install it, go to tools/ and run:


Install a modified version of scikit-learn in the miniconda Python installation:

$HOME/miniconda3/bin/python -m pip install git+


The recipe containing the overlap detector and the spectral clustering for AMI can be found at egs/ami/s5c. The recipe also contains examples of different clustering methods, namely AHC, VBx, and spectral clustering, to reproduce the single-speaker baselines in the paper.

The script does not contain stages for training an x-vector extractor, since we used the extractor from the CHiME-6 baseline system.

The key stages in the script are as follows.

  • --stage 8: Trains the overlap detector.
  • --stage 9: Performs decoding with a trained overlap detector.
  • --stage 10: Performs spectral clustering informed by the output from stage 9.

Where to find the clustering implementation?

We use scikit-learn’s spectral clustering class to implement our modified clustering method. In the default scikit-learn class, the argument assign_labels can take two values:

  1. kmeans: This performs the conventional spectral clustering using the Ng-Jordan-Weiss method.
  2. discretize: This implements the clustering described in this paper; it is the variant we modify for our implementation.
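To make the two options concrete, here is a minimal sketch that runs the stock (unmodified) scikit-learn SpectralClustering class with each value of assign_labels. The toy affinity matrix and the variable names are made up for illustration; in the recipe, the affinities come from x-vector similarities instead.

```python
import numpy as np
from sklearn.cluster import SpectralClustering

# Toy affinity matrix (an assumption for illustration): two groups of 4
# segments each, strongly similar within a group (1.0) and only weakly
# similar across groups (0.01).
n_per_group = 4
affinity = np.full((2 * n_per_group, 2 * n_per_group), 0.01)
affinity[:n_per_group, :n_per_group] = 1.0
affinity[n_per_group:, n_per_group:] = 1.0

labels_by_strategy = {}
for strategy in ("kmeans", "discretize"):
    clusterer = SpectralClustering(
        n_clusters=2,
        affinity="precomputed",   # we pass the affinity matrix directly
        assign_labels=strategy,   # how the spectral embedding is turned into labels
        random_state=0,
    )
    labels_by_strategy[strategy] = clusterer.fit_predict(affinity)
    print(strategy, labels_by_strategy[strategy])
```

On data this cleanly separated, both strategies should recover the same two groups, although the cluster IDs themselves are arbitrary and may be swapped between runs or strategies.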

We modified the discretize() function here by adding an argument that specifies the overlap vector. This vector is used in L141-148 to assign a second label to the overlapping segments.
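The idea of assigning a second label can be illustrated with a small standalone sketch. This is not the code from the modified discretize() function: the function name assign_labels_with_overlap, the scores matrix, and the "take the second-highest score" rule are all assumptions chosen to show how a binary overlap vector could drive a second assignment.

```python
import numpy as np


def assign_labels_with_overlap(scores, overlap):
    """Pick one label per segment, plus a second label where overlap is flagged.

    scores:  (n_segments, n_speakers) array of per-speaker scores; a
             hypothetical stand-in for the relaxed assignment matrix.
    overlap: (n_segments,) binary vector, 1 where the overlap detector fired.
    Returns a list with one tuple of speaker indices per segment.
    """
    scores = np.asarray(scores, dtype=float)
    overlap = np.asarray(overlap, dtype=bool)
    # For each segment, speaker indices sorted by descending score.
    ranked = np.argsort(-scores, axis=1)
    labels = []
    for i in range(scores.shape[0]):
        if overlap[i] and scores.shape[1] > 1:
            # Overlapping segment: keep the two best-scoring speakers.
            labels.append((int(ranked[i, 0]), int(ranked[i, 1])))
        else:
            # Single-speaker segment: keep only the best-scoring speaker.
            labels.append((int(ranked[i, 0]),))
    return labels


scores = np.array([[0.9, 0.1, 0.0],
                   [0.2, 0.7, 0.1],
                   [0.5, 0.1, 0.4],
                   [0.1, 0.3, 0.6]])
overlap = np.array([0, 0, 1, 1])  # last two segments flagged as overlapping
print(assign_labels_with_overlap(scores, overlap))
# [(0,), (1,), (0, 2), (2, 1)]
```

In this sketch, the overlap vector only decides how many labels a segment receives; which speakers are chosen still comes entirely from the scores.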

Pre-trained model

For all our clustering-based diarization experiments, we used the x-vector extractor provided with the CHiME-6 baseline system, which is available here. To use it, download the extractor with wget, extract it with tar -xvzf, and copy the contents to your exp directory.


If you find this code useful, consider citing our paper:

  title={Multi-class spectral clustering with overlaps for speaker diarization},
  author={Desh Raj and Zili Huang and Sanjeev Khudanpur},
  booktitle={IEEE Spoken Language Technology Workshop},


For any questions about using the code, you can contact me at