About

The Free Spoken Digit Dataset (FSDD) is an open data set of audio recordings of several speakers saying the digits 0 to 9, with 50 recordings of each digit per speaker.

The data set can be thought of as an audio version of the popular MNIST data set of hand-written digits. However, the recordings vary in length, which makes them more challenging to work with than the fixed-size images of MNIST.
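One common way to handle recordings of different lengths is to zero-pad them to the longest recording in a batch while keeping track of the true lengths. The sketch below uses random tensors as stand-ins for real recordings; the lengths and variable names are illustrative assumptions, not values taken from FSDD.

```python
import torch
from torch.nn.utils.rnn import pad_sequence

# Stand-ins for three mono recordings of different lengths (in samples).
# Real FSDD recordings would be loaded from audio files instead.
recordings = [torch.randn(n) for n in (3000, 4500, 2800)]

# Remember each true length so the padding can be ignored later.
lengths = torch.tensor([len(r) for r in recordings])

# Zero-pad on the right to the longest recording: shape (batch, max_length).
padded = pad_sequence(recordings, batch_first=True)
```

Keeping the lengths alongside the padded tensor lets a recurrent model skip the padded positions, for example via torch.nn.utils.rnn.pack_padded_sequence.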

Recurrent neural networks are a common approach to this task, and they can be implemented in PyTorch. TorchFSDD aims to provide an interface to FSDD for such models: a torch.utils.data.Dataset wrapper that is ready to be used with a torch.utils.data.DataLoader.
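As a rough illustration of the Dataset and DataLoader pattern, the sketch below defines a hypothetical ToyDigitDataset (not TorchFSDD's actual class) that yields variable-length waveforms with digit labels, and batches it with a padding collate function. The synthetic data, the class name and the collate_padded helper are all assumptions made for illustration.

```python
import torch
from torch.nn.utils.rnn import pad_sequence
from torch.utils.data import DataLoader, Dataset


class ToyDigitDataset(Dataset):
    """Hypothetical stand-in for a spoken-digit dataset (synthetic data)."""

    def __init__(self, n_items=20, seed=0):
        gen = torch.Generator().manual_seed(seed)
        # Random lengths between 2000 and 4999 samples mimic varying durations.
        lens = torch.randint(2000, 5000, (n_items,), generator=gen).tolist()
        self.waves = [torch.randn(n, generator=gen) for n in lens]
        self.labels = [i % 10 for i in range(n_items)]  # digits 0 to 9

    def __len__(self):
        return len(self.waves)

    def __getitem__(self, idx):
        return self.waves[idx], self.labels[idx]


def collate_padded(batch):
    """Zero-pad the waveforms in a batch; return them with lengths and labels."""
    waves, labels = zip(*batch)
    lengths = torch.tensor([len(w) for w in waves])
    padded = pad_sequence(list(waves), batch_first=True)
    return padded, lengths, torch.tensor(labels)


# Shuffling is off by default, so the first batch holds items 0 to 3.
loader = DataLoader(ToyDigitDataset(), batch_size=4, collate_fn=collate_padded)
padded, lengths, labels = next(iter(loader))
```

A custom collate function is needed here because the default one tries to stack samples into a single tensor, which fails when the waveforms differ in length.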
