Thoughts on machine learning, speech processing, and research. 57 articles

Transducers at InterSpeech 2023

Neural transducers are the most popular ASR modeling paradigm in both academia and industry. Since I could not attend InterSpeech 2023 in person, I decided...
conference transducer

GBO notes: Approximation algorithms

This note is a brief introduction to approximation algorithms. Basically, the “Intro to Algorithms” courses are concerned with problems which are solvable in poly-time (i.e.,...
gbo algorithms

GBO notes: MVDR beamforming

In a previous note, we described the process of mask estimation using complex angular central GMMs that are used in guided source separation (GSS). Mask...
gbo mvdr

GBO notes: Mask estimation for GSS

Guided source separation (GSS) is an unsupervised algorithm for target speech extraction, first proposed in the Paderborn submission to the CHiME-5 challenge. Given a noisy...
gbo gss

GBO notes: Spectral clustering

In this note, I will review a popular clustering algorithm called spectral clustering. We will discuss its connection to the min-cut problem in graph partitioning,...
gbo spectral clustering

GBO notes: i-vectors and x-vectors

In this note, we will review the two most popular speaker embedding extraction methods, namely i-vectors and x-vectors. But first, it would be useful to...
gbo i-vectors

GBO notes: Expectation Maximization

In this note, we will describe how to estimate the parameters of GMM and HMM models using the expectation-maximization method. The equations and discussion are heavily...
gbo expectation maximization

Some interesting papers from ASRU 2019

The IEEE Workshop on Automatic Speech Recognition and Understanding (ASRU) 2019 ended last week, and here is a list (with very brief summaries) of some...
conference speech processing

Highlights from SANE 2019

I attended the Speech and Audio in the Northeast (SANE) 2019 conference at Columbia University last Thursday, and in this post, I will try to...
conference speech processing

ACL 2019: Notes from an ASR perspective

I did not attend ACL 2019 in Florence, Italy. I did, however, go through several videos (all the videos of oral presentations are available here),...
speech recognition natural language processing conference summary

Iterative Scaling and Coordinate Descent

Recently, I was reading a paper on language model adaptation, which used an optimization technique called Generalized Iterative Scaling (GIS). Having no idea what the...
machine-learning optimization maxent

Experiments with Subword Modeling

Think about tasks such as machine translation (MT), automatic speech recognition (ASR), or handwriting recognition (HWR). While these appear very distinct, on abstraction they share...
subword machine-learning speech-recognition

Award-winning classic papers in ML and NLP

I was trying to find a consolidated list of papers in machine learning (ICML, NIPS, AAAI, SIGIR) and natural language processing (ACL, EMNLP, NAACL) published...
machine-learning natural-language-processing

Transfer Learning in NLP

Transfer learning is undoubtedly the new (well, relatively anyway) hot thing in deep learning right now. In vision, it has been in practice for some...
natural language processing transfer learning

How to Obtain Sentence Vectors

In several of my previous posts, I have discussed methods for obtaining word embeddings, such as SVD, word2vec, or GloVe. In this post, I will...
representation learning

Online Learning of Word Embeddings

Word vectors have become the building blocks for all natural language processing systems. I have earlier written an overview of popular algorithms for learning word...
online learning representation learning

Irony Detection in Tweets

There was a SemEval 2018 Shared Task on “irony detection in tweets” that ended recently. As a fun personal project, I thought of giving it...
natural language processing representation learning

Unsupervised Approaches for NMT

Translation is one of those tasks in language where the arrival of deep learning systems, and in particular sequence-to-sequence, has been something like a boon....
deep learning natural language processing machine translation

Beyond Euclidean Embeddings

Representation learning, as the name suggests, seeks to learn representations for structures such as images, videos, words, sentences, graphs, etc., which may then be used...
representation learning

Deep Learning for Multimodal Systems

When I was browsing through research groups for my grad school applications, I came across some interesting applications of new deep learning methods in a...
deep learning multimodal

Trends in Semantic Parsing - Part 2

In Part 1 of this two-part series, I discussed some supervised approaches for the objective. In this part, we will look at some unsupervised or...
natural language processing semantic parsing

The Best Papers at ICLR 2017

The International Conference on Learning Representations (ICLR) has evolved into the deep learning conference over the last few years, and with its open review system,...
deep learning conference summary

The Last 3 Years in Text Classification

While working on my undergrad thesis on relation classification of biomedical text using deep learning methods, I quickly hacked together models in TensorFlow that combined...
natural language processing deep learning text classification

Understanding Word Vectors

This article is a formal representation of my understanding of vector semantics, from course notes and reading reference papers and chapters from Jurafsky’s SLP book....
deep learning natural language processing representation learning

Trends in Semantic Parsing - Part 1

In this article, I will try to round up some (mostly neural) approaches for semantic parsing and semantic role labeling (SRL). This is not an...
natural language processing semantic parsing

Metrics for NLG Evaluation

Simple natural language processing tasks such as sentiment analysis, or even more complex ones like semantic parsing, are easy to evaluate since the evaluation simply...
machine translation natural language processing natural language generation