Blizzard Challenge 2015 Rules

DATABASE ACCESS

REGISTRATION FEE

  • A registration fee of 800 USD or 700 EUR is payable by all participants to offset the costs of running the challenge, including paying local assistants and listeners. The details of the Blizzard 2015 registration fees are as follows:

1. All payments are made in the name of "International Institute of Information Technology" Hyderabad.

2. Please download the PDF file (Registration fee details) for instructions on paying by wire transfer in INR, USD or EUR.

3. If you need an invoice to make this wire transfer, please email your name, affiliation and complete address to blizzard@festvox.org. We will send an invoice as soon as possible.

4. The fee must be paid by May 12, 2015.

EXPERT LISTENERS

  • Each participant should try to recruit at least a few volunteer listeners for each of the evaluation tests. Native speakers are preferable, where possible. The organisers would also appreciate assistance in advertising the Challenge as widely as possible (e.g., to your students or colleagues).

MATERIALS PROVIDED

All participants will have access to the following materials (subject to signing the license):

  • Indian languages: About 4 hours of speech data in each of three Indian languages (Hindi, Tamil and Telugu), and about 2 hours of speech data in each of three other Indian languages (Marathi, Bengali and Malayalam), recorded by native professional speakers in high-quality studio environments. Text is provided in UTF-8 format. No other information, such as segment labels, is provided.

These speech databases are provided by the following group of institutions: IIT-Madras, IIIT-Hyderabad, SSNCE, CDAC Trivandrum, CDAC Mumbai, CDAC Kolkata.

THE CHALLENGES

This year there are two parts to the Blizzard Challenge: the Hub tasks, and Spoke tasks on Indian language data.

  • It is not permissible for a single participant to submit multiple entries to any task, because the listening test will become unmanageable. This rule may be relaxed in the event of a small number of participants.
  • Participants involved in joint projects or consortia who wish to submit multiple systems (e.g., an individual entry and a joint system) should contact the organisers in advance to agree this. We will try to accommodate all reasonable requests, provided the listening test remains manageable.
  • It is strongly encouraged to participate in all tasks and not to "cherry pick".
  • For all tasks, synthetic speech may be submitted at any sampling rate (but always at 16 bits per sample). Waveforms will not be downsampled for the listening test.
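The 16-bit requirement above can be checked before submission. The following is a minimal sketch using Python's standard `wave` module, assuming the synthetic speech is stored as uncompressed PCM WAV files; the function name `check_wav` is hypothetical and is not part of any official Blizzard tooling.

```python
import wave


def check_wav(path, required_sample_width_bytes=2):
    """Return (sample_rate_hz, ok) for a PCM WAV file.

    ok is True when the file uses 16 bits (2 bytes) per sample.
    Any sampling rate is accepted, so it is only reported, not checked.
    """
    with wave.open(path, "rb") as wav:
        sample_rate = wav.getframerate()
        sample_width = wav.getsampwidth()  # bytes per sample
    return sample_rate, sample_width == required_sample_width_bytes
```

Calling `check_wav("example.wav")` on each file (the file name here is a placeholder) gives the sampling rate and a flag showing whether the sample width is as required.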

Hub task

  • Build one voice in each language from the provided speech data (wav/ directory), sampled at 16 kHz, and the corresponding text in UTF-8 format (train.done.data). Any other information that may be included in the distributions, such as segment labels, phone set and Roman transliteration available from the sample Festvox voice builds (e.g., the .lab, .sl and .slehmm files), is not officially provided or endorsed as a part of this challenge, although it may be used by participants if they wish. In all cases test material will be provided as UTF-8 text only (in a similar format to train.done.data). The subtasks are numbered as follows:
    • 2015-IH1.1 Bengali
    • 2015-IH1.2 Hindi
    • 2015-IH1.3 Malayalam
    • 2015-IH1.4 Marathi
    • 2015-IH1.5 Tamil
    • 2015-IH1.6 Telugu
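As a rough illustration of reading the training text, the sketch below assumes that train.done.data follows the usual Festvox prompt-file layout, one entry per line of the form `( utterance_id "text" )`. That layout is an assumption and is not stated in these rules, so check it against the distributed files. The function names are hypothetical.

```python
import re

# Assumed Festvox-style entry: ( utterance_id "text" )
_ENTRY = re.compile(r'^\(\s*(\S+)\s+"(.*)"\s*\)\s*$')


def parse_done_data(lines):
    """Map utterance id -> text for an iterable of prompt-file lines."""
    entries = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        match = _ENTRY.match(line)
        if match is None:
            raise ValueError(f"Unrecognised line: {line!r}")
        utt_id, text = match.groups()
        entries[utt_id] = text
    return entries


def load_done_data(path):
    """Read a prompt file as UTF-8 and parse it."""
    with open(path, encoding="utf-8") as handle:
        return parse_done_data(handle)
```

Opening the file with an explicit `encoding="utf-8"` avoids depending on the platform default, which matters for Indic scripts.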

Spoke task

The purpose of this task is to build a multilingual synthesis system, i.e., Indian language + English.

Training: Indian language (e.g., Telugu) uttered by speaker A. Note that the training data provided for the Indian language may not contain any English words at all.

Test: Telugu with speaker A's voice and English with speaker A's voice.

Example test sentences:

"యూఈఏ నిర్దేశించిన 286 పరుగుల లక్ష్యాన్ని మరో 12 బంతులు మిగులుండగానే ఛేదించింది. 48 ఓవర్లలో 6 వికెట్లు కోల్పోయి 286 పరుగులు చేసింది. http://telugu.oneindia.com"

We are not providing language tags in the test sentences. As the text is in UTF-8, it is easy to identify the language from the Unicode code points (the idea is to simulate the way text appears on web pages, without much additional information).

    • 2015-IH2.1 Bengali
    • 2015-IH2.2 Hindi
    • 2015-IH2.3 Malayalam
    • 2015-IH2.4 Marathi
    • 2015-IH2.5 Tamil
    • 2015-IH2.6 Telugu
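The code-point-based identification mentioned for the Spoke task can be sketched as follows. This is an illustrative sketch only, not an official tool: the ranges are the standard Unicode blocks for each script, the function names are hypothetical, and the handling of digits and punctuation (attached to the neighbouring run) is an assumption. Note that code points identify the script rather than the language, so Hindi and Marathi both map to Devanagari.

```python
# Standard Unicode block ranges for the scripts used in the challenge.
SCRIPT_RANGES = {
    "Devanagari": (0x0900, 0x097F),  # used by both Hindi and Marathi
    "Bengali": (0x0980, 0x09FF),
    "Tamil": (0x0B80, 0x0BFF),
    "Telugu": (0x0C00, 0x0C7F),
    "Malayalam": (0x0D00, 0x0D7F),
}


def script_of(char):
    """Return the script name for one character, or None if neutral."""
    code_point = ord(char)
    for name, (low, high) in SCRIPT_RANGES.items():
        if low <= code_point <= high:
            return name
    if char.isascii() and char.isalpha():
        return "Latin"
    return None  # digits, punctuation, whitespace, other scripts


def split_by_script(text):
    """Split text into (script, chunk) runs.

    Neutral characters are attached to the run they occur in; a run
    that starts with neutral characters takes the first script seen.
    """
    runs = []  # each item is [script, chunk]
    for char in text:
        script = script_of(char)
        if runs and (script is None or runs[-1][0] in (None, script)):
            if runs[-1][0] is None:
                runs[-1][0] = script
            runs[-1][1] += char
        else:
            runs.append([script, char])
    return [(script, chunk) for script, chunk in runs]
```

Applied to the example test sentence, the Telugu text and the trailing URL fall into separate runs, which a system could then route to its Telugu and English front ends. How digits such as "286" should be read is a separate design decision that this sketch does not address.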

USE OF EXTERNAL DATA

  • "External data" is defined as data, of any type, that is not part of the provided database.
  • You are allowed to use external data in any way you wish, subject to any exclusions given in these rules.
  • Use of external data is entirely optional.
  • You must use the provided audio files
  • You must not use any additional speech data from the same speakers
  • You may exclude any parts of the provided databases if you wish.
  • Use of any provided segmentations, transcriptions or labels is optional.
  • If you are in any doubt about how to apply these rules, please contact the organizers immediately.

SYNTHESISING THE TEST EXAMPLES

  • The exact nature of the test set will not be revealed in advance, but it is likely to include sentences, paragraphs and possibly longer texts from a similar domain to the provided corpus, as well as texts from other domains. Formal listening tests will be conducted to evaluate the synthetic speech submitted.

RETENTION OF SUBMITTED SYNTHETIC SPEECH SAMPLES

  • Any examples that you submit for evaluation will be retained by the Blizzard organisers for future use.
  • You must include in your submission of the test sentences a statement of whether you give the organisers permission to publicly distribute your waveforms and the corresponding listening test results in anonymised form. In the past, all participants have agreed to this and we strongly encourage you to give this consent.

LISTENING TEST

  • The Blizzard organisers will conduct a listening test. Its design will probably include the standard elements used in previous years (naturalness, speaker similarity, intelligibility) and will be extended to include additional tests specific to the audiobook reading task, including the synthesis of multi-sentence paragraphs.

PAPER

  • Each participant will be expected to submit a six-page paper describing their entry for review.
  • One of the authors of each accepted paper should present it at the Blizzard 2015 Workshop.
  • In addition, each participant will be expected to complete a form giving the general technical specification of their system, to facilitate easy cross-system comparisons (e.g., is it unit selection? does it predict prosody? etc.).

HOW ARE THESE RULES ENFORCED?

  • This is a challenge, which is designed to answer scientific questions, and not a competition. Therefore, we rely on your honesty in preparing your entry.

SynSIG is a Special Interest Group of ISCA, the International Speech Communication Association.
