Blizzard Challenge 2008 Rules

From SynSIG



  • You will receive a separate message about how to download this


  • A registration fee of USD 750 is due to offset the costs of running the challenge, including paying undergraduate listeners. This must be paid by the time you submit your test examples. You can pay this fee using Edinburgh University's epay system, where you should register for the event called 'Blizzard Challenge 2008'. After doing this, please also email us to confirm that you have paid.


  • Each participant is expected to provide ten speech experts as listeners for the evaluation tests. Native speakers of English or Mandarin (as appropriate) are preferred where possible.


  • Each participant should build three synthetic voices (two from the UK English database and one from the Mandarin database). It is permissible to submit fewer voices, but we strongly encourage you to complete the full challenge because this will be more informative.
  • It is not permissible for a single participant to submit multiple entries for any of the voices (because the listening test will become unmanageable).
  • All three voices should be built using essentially the same synthesis method. For example, you cannot use a concatenative method for one voice and HMM synthesis for other voices.

Voices to be built

  • Voice A: from the full UK English database (about 15 hours)
  • Voice B: from the ARCTIC subset of the UK English database (about 1 hour)
  • Voice C: from the full Mandarin database (about 6.5 hours)


  • "External data" is defined as data, of any type, that is not part of the provided database.
  • You are allowed to use external data. You must follow one of these two sets of rules (and the same one for all three voices):
    • Standard rules: You may use external data to construct these parts of your system:
      • text normalisation
      • lexicon & letter-to-sound
      • duration model
      • F0 model
      • aligner (i.e., any component used only to label the database, such as a set of HMMs used for forced alignment)
    • Voice conversion rules: You may use external data in any way you wish
  • In essence, if there is any possibility that your system could sound like a different speaker than the database speaker, then your system should be classified as a voice conversion type of system.

  • When building a voice from a subset of the data, you should treat the remainder of the data as 'external data'.

  • If you are in any doubt about how to apply these rules, please contact the organisers immediately.
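
As an informal illustration of the external data rules above, the following Python sketch classifies an entry from the list of system components that were built with external data. The function names and the example component "acoustic model" are hypothetical and are not part of the official rules. The sketch only checks the component list; it cannot judge whether a system might sound like a different speaker, so borderline cases should still be referred to the organisers.

```python
from enum import Enum


class RuleSet(Enum):
    STANDARD = "standard rules"
    VOICE_CONVERSION = "voice conversion rules"


# Components that may be built with external data under the standard rules,
# copied from the list in the rules above.
STANDARD_ALLOWED = frozenset({
    "text normalisation",
    "lexicon & letter-to-sound",
    "duration model",
    "F0 model",
    "aligner",
})


def disallowed_under_standard(external_data_components):
    """Return the components that use external data outside the standard list."""
    return sorted(set(external_data_components) - STANDARD_ALLOWED)


def classify_entry(external_data_components):
    """Pick the rule set for an entry (hypothetical helper, not an official tool).

    An entry stays under the standard rules only if every component that
    uses external data appears in STANDARD_ALLOWED.
    """
    if disallowed_under_standard(external_data_components):
        return RuleSet.VOICE_CONVERSION
    return RuleSet.STANDARD
```

For a voice built from a subset of the data (such as Voice B), any component trained on the remainder of the provided database would need to be listed as using external data, because that remainder counts as external.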


  • No manual intervention is allowed during synthesis. This includes, but is not limited to:
    • "Prompt sculpting"
    • Altering existing entries in your lexicon (however, you are allowed to add new words)
    • Using different subsets of the database for different test sentences or sentence types, unless this is a fully automatic part of your system


  • We are not releasing details of the listening test design at this time, because you should not be tailoring your voice building to it. It will contain similar sections to previous challenges along with new ones, and you will need to synthesise several hundred sentences from text.
  • For voice conversion-type systems, there will be an additional component of the test, to judge how close the system sounds to the database speaker. If the listening test design allows, we will perform this test for all standard systems too.
  • Any examples that you submit for evaluation may be retained by the Blizzard organisers for future use. We hope to be able to distribute them in anonymised form to all participants, or publicly, subject to participants' consent.


  • Each participant will be expected to submit a four-page paper describing their entry for review.
  • One of the authors of each accepted paper should present it at the Blizzard 2008 Workshop, which we hope will be a satellite of Interspeech 2008 in Australia.
  • In addition, each participant will be expected to complete a form giving the general technical specification of their system, to facilitate cross-system comparisons (e.g. is it unit selection? does it predict prosody? etc.).


  • This is a challenge, which is designed to answer scientific questions, and not a competition. Therefore, we rely on your honesty in preparing your entry.

SynSIG is a Special Interest Group of ISCA, the International Speech Communication Association.
