You can also browse my Google Scholar profile.

2024

  • Faster Speech-LLaMA inference with multi-token prediction
    Desh Raj, Gil Keren, Junteng Jia, Jay Mahadeokar, Ozlem Kalinli
    Submitted to IEEE ICASSP 2025
    Paper

  • M-BEST-RQ: A multi-channel speech foundation model for smart glasses
    Yufeng Yang, Desh Raj, Ju Lin, Niko Moritz, Junteng Jia, Gil Keren, Egor Lakomkin, Yiteng Huang, Jacob Donley, Jay Mahadeokar, Ozlem Kalinli
    Submitted to IEEE ICASSP 2025
    Paper

  • ConEC: Earnings Call Dataset with Real-world Contexts for Benchmarking Contextual Speech Recognition
    Ruizhe Huang, Mahsa Yarmohammadi, Jan Trmal, Jing Liu, Desh Raj, Leibny Paola Garcia, Alexei V Ivanov, Patrick Ehlen, Mingzhi Yu, Dan Povey, Sanjeev Khudanpur
    LREC 2024
    Paper

  • Listening to multi-talker conversations: Modular and end-to-end perspectives
    Desh Raj
    PhD Thesis, Johns Hopkins University
    Thesis Slides Video

  • On speaker attribution with SURT
    Desh Raj, Matthew Wiesner, Matthew Maciejewski, Paola Garcia, Daniel Povey, Sanjeev Khudanpur
    Speaker Odyssey 2024
    Paper Slides

  • Updated corpora and benchmarks for long-form speech recognition
    Jennifer Drexler Fox, Desh Raj, Natalie Delworth, Quinn McNamara, Corey Miller, Migüel Jetté
    IEEE ICASSP 2024
    Paper Code

  • Training Early-Exit Architectures for Automatic Speech Recognition: Fine-Tuning Pre-Trained Models or Training from Scratch
    George August Wright, Umberto Cappellazzo, Salah Zaiem, Desh Raj, Lucas Ondel Yang, Daniele Falavigna, Alessio Brutti
    IEEE ICASSP 2024 Workshop on Self-supervision in Audio, Speech, and Beyond (SASB)
    Paper

2023

  • Learning from flawed data: Weakly supervised automatic speech recognition
    Dongji Gao, Hainan Xu, Desh Raj, Leibny Paola Garcia Perera, Daniel Povey, Sanjeev Khudanpur
    IEEE ASRU 2023
    Paper Code

  • SURT 2.0: Advances in transducer-based multi-talker speech recognition
    Desh Raj, Daniel Povey, Sanjeev Khudanpur
    IEEE/ACM Transactions on Audio, Speech, and Language Processing (TASLP)
    Paper ArXiv Poster Webpage

  • The CHiME-7 DASR challenge: Distant meeting transcription with multiple devices in diverse scenarios
    Samuele Cornell, Matthew Wiesner, Shinji Watanabe, Desh Raj, Xuankai Chang, Paola Garcia, Matthew Maciejewski, Yoshiki Masuyama, Zhong-Qiu Wang, Stefano Squartini, Sanjeev Khudanpur
    CHiME Workshop at InterSpeech 2023
    Paper Website

  • GPU-accelerated guided source separation for meeting transcription
    Desh Raj, Daniel Povey, Sanjeev Khudanpur
    InterSpeech 2023
    Paper ArXiv Poster Code

  • Anchored speech recognition using neural transducers
    Desh Raj, Junteng Jia, Jay Mahadeokar, Chunyang Wu, Niko Moritz, Xiaohui Zhang, Ozlem Kalinli
    IEEE ICASSP 2023
    Paper Slides Video

  • Adapting self-supervised models to multi-talker speech recognition using speaker embeddings
    Zili Huang, Desh Raj, Paola Garcia, Sanjeev Khudanpur
    IEEE ICASSP 2023
    Paper Code

2022

  • Low-Latency speech separation guided diarization for telephone conversations
    Giovanni Morrone, Samuele Cornell, Desh Raj, Luca Serafini, Enrico Zovato, Alessio Brutti, Stefano Squartini
    IEEE Spoken Language Technology (SLT) Workshop 2022
    Paper

  • Continuous streaming multi-talker ASR with dual-path transducers
    Desh Raj, Liang Lu, Zhuo Chen, Yashesh Gaur, Jinyu Li
    IEEE ICASSP 2022
    Paper Slides Poster Video

  • Injecting text and cross-lingual supervision in few-shot learning from self-supervised models
    Matthew Wiesner, Desh Raj, Sanjeev Khudanpur
    IEEE ICASSP 2022
    Paper Code Poster Video (Matthew)

2021

  • Joint speaker diarization and speech recognition based on region proposal networks
    Zili Huang, Marc Delcroix, Leibny Paola Garcia, Shinji Watanabe, Desh Raj, Sanjeev Khudanpur
    Computer Speech & Language, Vol. 72
    Paper

  • Reformulating DOVER-Lap label mapping as a graph partitioning problem
    Desh Raj, Sanjeev Khudanpur
    INTERSPEECH 2021
    Paper Code Report Slides Video

  • Auxiliary loss function for target speech extraction and recognition with weak supervision based on speaker characteristics
    Katerina Zmolikova, Marc Delcroix, Desh Raj, Shinji Watanabe, Jan Černocký
    INTERSPEECH 2021
    Paper

  • Target-speaker voice activity detection with improved i-vector estimation for unknown number of speaker
    Mao-Kui He, Desh Raj, Zili Huang, Jun Du, Zhuo Chen, Shinji Watanabe
    INTERSPEECH 2021
    Paper

  • Training hybrid models on noisy transliterated transcripts for code-switched speech recognition
    Matthew Wiesner, Mousmita Sarma, Ashish Arora, Desh Raj, Dongji Gao, Ruizhe Huang, Supreet Preet, Moris Johnson, Zikra Iqbal, Nagendra Goel, Jan Trmal, Leibny García-Perera, Sanjeev Khudanpur
    INTERSPEECH 2021
    Paper Code

  • The Hitachi-JHU DIHARD III system: Competitive end-to-end neural diarization and x-vector clustering systems combined by DOVER-Lap
    Shota Horiguchi, Nelson Yalta, Paola Garcia, Yuki Takashima, Yawen Xue, Desh Raj, Zili Huang, Yusuke Fujita, Shinji Watanabe, Sanjeev Khudanpur
    Third DIHARD Speech Diarization Challenge
    Paper

  • Multi-class spectral clustering with overlaps for speaker diarization
    Desh Raj, Zili Huang, Sanjeev Khudanpur
    IEEE Spoken Language Technology (SLT) Workshop 2021
    Paper Code Slides

  • DOVER-Lap: A method for combining overlap-aware diarization outputs
    Desh Raj, Paola Garcia, Zili Huang, Shinji Watanabe, Daniel Povey, Andreas Stolcke, Sanjeev Khudanpur
    IEEE Spoken Language Technology (SLT) Workshop 2021
    Paper Code Slides

  • Integration of speech separation, diarization, and recognition for multi-speaker meetings: System description, comparison, and analysis
    Desh Raj, Pavel Denisov, Zhuo Chen, Hakan Erdogan, Zili Huang, Maokui He, Shinji Watanabe, Jun Du, Takuya Yoshioka, Yi Luo, Naoyuki Kanda, Jinyu Li, Scott Wisdom, John R. Hershey
    IEEE Spoken Language Technology (SLT) Workshop 2021
    Paper Code Slides

  • Sequential multi-frame neural beamforming for speech separation and enhancement
    Zhong-Qiu Wang, Hakan Erdogan, Scott Wisdom, Kevin Wilson, Desh Raj, Shinji Watanabe, Zhuo Chen, John R. Hershey
    IEEE Spoken Language Technology (SLT) Workshop 2021
    Paper

2020

  • Frustratingly easy noise-aware training of acoustic models
    Desh Raj, Jesus Villalba, Daniel Povey, Sanjeev Khudanpur
    ArXiv, 2020
    Paper Code

  • The JHU multi-microphone multi-speaker ASR system for the CHiME-6 challenge
    Ashish Arora*, Desh Raj*, Aswin Shanmugam Subramanian*, Ke Li*, Bar Benyair, Matthew Maciejewski, Piotr Zelasko, Paola Garcia, Shinji Watanabe, Sanjeev Khudanpur
    The 6th CHiME Workshop (at ICASSP 2020)
    Paper Video Slides

2019

  • Probing the information encoded in x-vectors
    Desh Raj, David Snyder, Daniel Povey, Sanjeev Khudanpur
    IEEE Workshop on Automatic Speech Recognition and Understanding (ASRU) 2019
    Paper Code Poster

  • Using ASR methods for OCR
    Ashish Arora, Chun Chieh Chang, Babak Rekabdar, Daniel Povey, David Etter, Desh Raj, Hossein Hadian, Jan Trmal, Paola Garcia, Shinji Watanabe, Vimal Manohar, Yiwen Shao, Sanjeev Khudanpur
    International Conference on Document Analysis and Recognition (ICDAR) 2019
    Preprint Paper Code Blog

2018

  • Uncertain fuzzy self-organization based clustering: interval type-2 approach to adaptive resonance theory
    Shakaiba Majheed, Aditya Gupta, Desh Raj, Frank Chung-hoon Rhee
    Information Sciences, 2018
    Paper

2017

  • Learning local and global contexts using a convolutional recurrent neural network for relation classification in biomedical text
    Desh Raj, Sunil Kumar Sahu, Ashish Anand
    Proceedings of the 21st Conference on Computational Natural Language Learning (CoNLL) 2017
    Paper Poster Code

  • Analysis of data generated from multidimensional type-1 and type-2 fuzzy membership functions
    Desh Raj, Aditya Gupta, Bhuvnesh Garg, Kenil Tanna, Frank Chung-hoon Rhee
    IEEE Transactions on Fuzzy Systems, 2017
    Paper