Preparing Japanese Audio Datasets for TensorFlow
Note: This post is a personal reflection, not a tutorial.
The datasets, packaged for use with TensorFlow, are available here.
To set up these datasets, we will follow this guide:
JSUT
https://sites.google.com/site/shinnosuketakamichi/publication/jsut
JSUT is a Japanese speech dataset consisting of about 10h of speech from a single female speaker. The transcript was designed to cover commonly used words.
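As a rough sketch of how transcripts might be paired with audio files before building a TensorFlow input pipeline: the snippet below assumes each transcript line has the form `utterance_id:text` and that the matching audio is named `utterance_id.wav` in a `wav` directory. That layout is an assumption, not something stated in this post, and the function name `parse_transcript`, the directory name, and the sample line are all hypothetical.

```python
from pathlib import Path


def parse_transcript(lines, wav_dir):
    """Pair each `utterance_id:text` line with its assumed WAV path."""
    pairs = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        # Split on the first colon only, in case the sentence contains one.
        utt_id, _, text = line.partition(":")
        pairs.append((Path(wav_dir) / f"{utt_id}.wav", text))
    return pairs


# Hypothetical sample standing in for a real transcript file.
sample = ["UTT_0001:こんにちは。", ""]
pairs = parse_transcript(sample, "wav")
```

The resulting list of (path, text) pairs is the kind of input that could then be handed to whatever audio-decoding step the pipeline uses.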
Common Voice Version 6
TensorFlow Datasets only has version 1 of this dataset, which does not include Japanese.
Version 6 has 5h of Japanese speech in total, 3h of it validated.
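To pick out clips and their sentences, a minimal sketch follows. It assumes the release ships a tab-separated metadata file (for example a `validated.tsv`) with `path` and `sentence` columns. The file name, the column names, the helper `read_validated`, and the sample rows are assumptions for illustration rather than details taken from this post.

```python
import csv
import io


def read_validated(tsv_file):
    """Yield (clip path, sentence) for every row of a tab-separated file."""
    # QUOTE_NONE keeps quotation marks inside sentences as literal text.
    reader = csv.DictReader(tsv_file, delimiter="\t", quoting=csv.QUOTE_NONE)
    for row in reader:
        yield row["path"], row["sentence"]


# Hypothetical two-row sample standing in for the real metadata file.
sample_tsv = io.StringIO(
    "client_id\tpath\tsentence\n"
    "abc\tclip_0001.mp3\tおはようございます。\n"
    "def\tclip_0002.mp3\tありがとう。\n"
)
clips = list(read_validated(sample_tsv))
```

Reading through `csv.DictReader` rather than splitting lines by hand keeps the code tolerant of extra columns that are not needed here.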
Tatoeba Japanese
Japanese sentences that have audio on Tatoeba.
Consists of about 1h of Japanese speech across 1525 sentences.
The zip file can be downloaded here.
JVS Corpus
More information is available on the dataset's webpage.
A Japanese speech dataset.
About 30h of speech from 100 speakers.
Contains some of the same sentences as JSUT.
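For a sense of scale, the approximate figures quoted in this post can be added up. The short calculation below uses only those rounded numbers, counting just the validated portion of Common Voice, so the results are rough estimates. The variable names are made up for illustration.

```python
# Approximate hours of speech per dataset, as quoted in this post.
hours = {
    "JSUT": 10,
    "Common Voice 6 (validated Japanese)": 3,
    "Tatoeba Japanese": 1,
    "JVS": 30,
}
total_hours = sum(hours.values())
print(f"Total: about {total_hours}h")

# Average Tatoeba clip length: about 1h spread over 1525 sentences.
tatoeba_avg_seconds = 1 * 3600 / 1525
print(f"Tatoeba average clip: about {tatoeba_avg_seconds:.1f}s")
```

Because JVS shares some sentences with JSUT, the total reflects hours of audio, not hours of unique text.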