Publications

2025

4 publications

Can Speech LLMs think while listening?

Ian Shih, Desh Raj, Chunyang Wu, Wei Zhou, SK Bong, Yashesh Gaur, Jay Mahadeokar, Ozlem Kalinli, Mike Seltzer

Paper Slides

Faster Speech-LLaMA inference with multi-token prediction

Desh Raj, Gil Keren, Junteng Jia, Jay Mahadeokar, Ozlem Kalinli

IEEE ICASSP 2025

Paper

M-BEST-RQ: A multi-channel speech foundation model for smart glasses

Yufeng Yang, Desh Raj, Ju Lin, Niko Moritz, Junteng Jia, Gil Keren, Egor Lakomkin, Yiteng Huang, Jacob Donley, Jay Mahadeokar, Ozlem Kalinli

IEEE ICASSP 2025

Paper

Speech-N-LlaMA: Improving Speech LLMs with multi-pass training

Amit Kumar Singh Yadav, Gil Keren, Desh Raj, Wei Zhou, Junteng Jia, Ke Li, Ying Xu, Chunyang Wu, Jay Mahadeokar, Ozlem Kalinli

IEEE ICASSP 2025

Paper

2024

5 publications

ConEC: Earnings Call Dataset with Real-world Contexts for Benchmarking Contextual Speech Recognition

Ruizhe Huang, Mahsa Yarmohammadi, Jan Trmal, Jing Liu, Desh Raj, Leibny Paola Garcia, Alexei V Ivanov, Patrick Ehlen, Mingzhi Yu, Dan Povey, Sanjeev Khudanpur

LREC 2024

Paper

Listening to multi-talker conversations: Modular and end-to-end perspectives

Desh Raj

PhD Thesis, Johns Hopkins University

Thesis Slides Video

On speaker attribution with SURT

Desh Raj, Matthew Wiesner, Matthew Maciejewski, Paola Garcia, Daniel Povey, Sanjeev Khudanpur

Speaker Odyssey 2024

Paper Slides

Updated corpora and benchmarks for long-form speech recognition

Jennifer Drexler Fox, Desh Raj, Natalie Delworth, Quinn McNamara, Corey Miller, Migüel Jetté

IEEE ICASSP 2024

Paper Code

Training Early-Exit Architectures for Automatic Speech Recognition: Fine-Tuning Pre-Trained Models or Training from Scratch

George August Wright, Umberto Cappellazzo, Salah Zaiem, Desh Raj, Lucas Ondel Yang, Daniele Falavigna, Alessio Brutti

IEEE ICASSP 2024 Workshop on Self-supervision in Audio, Speech, and Beyond (SASB)

Paper

2023

6 publications

Learning from flawed data: Weakly supervised automatic speech recognition

Dongji Gao, Hainan Xu, Desh Raj, Leibny Paola Garcia Perera, Daniel Povey, Sanjeev Khudanpur

IEEE ASRU 2023

Paper Code

SURT 2.0: Advances in transducer-based multi-talker speech recognition

Desh Raj, Daniel Povey, Sanjeev Khudanpur

IEEE/ACM Transactions on Audio, Speech, and Language Processing (TASLP)

Paper ArXiv Poster Webpage

The CHiME-7 DASR challenge: Distant meeting transcription with multiple devices in diverse scenarios

Samuele Cornell, Matthew Wiesner, Shinji Watanabe, Desh Raj, Xuankai Chang, Paola Garcia, Matthew Maciejewski, Yoshiki Masuyama, Zhong-Qiu Wang, Stefano Squartini, Sanjeev Khudanpur

CHiME Workshop at INTERSPEECH 2023

Paper Website

GPU-accelerated guided source separation for meeting transcription

Desh Raj, Daniel Povey, Sanjeev Khudanpur

INTERSPEECH 2023

Paper ArXiv Poster Code

Anchored speech recognition using neural transducers

Desh Raj, Junteng Jia, Jay Mahadeokar, Chunyang Wu, Niko Moritz, Xiaohui Zhang, Ozlem Kalinli

IEEE ICASSP 2023

Paper Slides Video

Adapting self-supervised models to multi-talker speech recognition using speaker embeddings

Zili Huang, Desh Raj, Paola Garcia, Sanjeev Khudanpur

IEEE ICASSP 2023

Paper Code

2022

3 publications

Low-latency speech separation guided diarization for telephone conversations

Giovanni Morrone, Samuele Cornell, Desh Raj, Luca Serafini, Enrico Zovato, Alessio Brutti, Stefano Squartini

IEEE Spoken Language Technology (SLT) Workshop 2022

Paper

Continuous streaming multi-talker ASR with dual-path transducers

Desh Raj, Liang Lu, Zhuo Chen, Yashesh Gaur, Jinyu Li

IEEE ICASSP 2022

Paper Slides Poster Video

Injecting text and cross-lingual supervision in few-shot learning from self-supervised models

Matthew Wiesner, Desh Raj, Sanjeev Khudanpur

IEEE ICASSP 2022

Paper Code Poster Video (Matthew)

2021

10 publications

Joint speaker diarization and speech recognition based on region proposal networks

Zili Huang, Marc Delcroix, Leibny Paola Garcia, Shinji Watanabe, Desh Raj, Sanjeev Khudanpur

Computer Speech & Language, Vol. 72

Paper

Reformulating DOVER-Lap label mapping as a graph partitioning problem

Desh Raj, Sanjeev Khudanpur

INTERSPEECH 2021

Paper Code Report Slides Video

Auxiliary loss function for target speech extraction and recognition with weak supervision based on speaker characteristics

Katerina Zmolikova, Marc Delcroix, Desh Raj, Shinji Watanabe, Jan Černocký

INTERSPEECH 2021

Paper

Multi-class spectral clustering with overlaps for speaker diarization

Desh Raj, Zili Huang, Sanjeev Khudanpur

IEEE Spoken Language Technology (SLT) Workshop 2021

Paper Code Slides

DOVER-Lap: A method for combining overlap-aware diarization outputs

Desh Raj, Paola Garcia, Zili Huang, Shinji Watanabe, Daniel Povey, Andreas Stolcke, Sanjeev Khudanpur

IEEE Spoken Language Technology (SLT) Workshop 2021

Paper Code Slides

Integration of speech separation, diarization, and recognition for multi-speaker meetings: System description, comparison, and analysis

Desh Raj, Pavel Denisov, Zhuo Chen, Hakan Erdogan, Zili Huang, Maokui He, Shinji Watanabe, Jun Du, Takuya Yoshioka, Yi Luo, Naoyuki Kanda, Jinyu Li, Scott Wisdom, John R. Hershey

IEEE Spoken Language Technology (SLT) Workshop 2021

Paper Code Slides

Sequential multi-frame neural beamforming for speech separation and enhancement

Zhong-Qiu Wang, Hakan Erdogan, Scott Wisdom, Kevin Wilson, Desh Raj, Shinji Watanabe, Zhuo Chen, John R. Hershey

IEEE Spoken Language Technology (SLT) Workshop 2021

Paper

2020

2 publications

Frustratingly easy noise-aware training of acoustic models

Desh Raj, Jesus Villalba, Daniel Povey, Sanjeev Khudanpur

ArXiv, 2020

Paper Code

The JHU multi-microphone multi-speaker ASR system for the CHiME-6 challenge

Ashish Arora*, Desh Raj*, Aswin Shanmugam Subramanian*, Ke Li*, Bar Benyair, Matthew Maciejewski, Piotr Zelasko, Paola Garcia, Shinji Watanabe, Sanjeev Khudanpur

The 6th CHiME Workshop (at ICASSP 2020)

Paper Video Slides

2019

2 publications

Probing the information encoded in x-vectors

Desh Raj, David Snyder, Daniel Povey, Sanjeev Khudanpur

IEEE Workshop on Automatic Speech Recognition and Understanding (ASRU) 2019

Paper Code Poster

Using ASR methods for OCR

Ashish Arora, Chun Chieh Chang, Babak Rekabdar, Daniel Povey, David Etter, Desh Raj, Hossein Hadian, Jan Trmal, Paola Garcia, Shinji Watanabe, Vimal Manohar, Yiwen Shao, Sanjeev Khudanpur

International Conference on Document Analysis and Recognition (ICDAR) 2019

Preprint Paper Code Blog

2018

1 publication

Uncertain fuzzy self-organization based clustering: interval type-2 approach to adaptive resonance theory

Shakaiba Majeed, Aditya Gupta, Desh Raj, Frank Chung-hoon Rhee

Information Sciences, 2018

Paper

2017

2 publications

Learning local and global contexts using a convolutional recurrent neural network for relation classification in biomedical text

Desh Raj, Sunil Kumar Sahu, Ashish Anand

Proceedings of the 21st Conference on Computational Natural Language Learning (CoNLL) 2017

Paper Poster Code

Analysis of data generated from multidimensional type-1 and type-2 fuzzy membership functions

Desh Raj, Aditya Gupta, Bhuvnesh Garg, Kenil Tanna, Frank Chung-hoon Rhee

IEEE Transactions on Fuzzy Systems, 2017

Paper