PAVOQUE Corpus of Expressive Speech

Corpus design

A single-speaker, multi-style corpus of German speech, with a large neutral subset and four subsets that each act out a different expressive speaking style. The styles are named for virtual characters in the SEMAINE and IDEAS4GAMES projects (quoting the original director's instructions):

  • Poppy ist fröhlich, optimistisch und sieht das Gute an allen Dingen! (Poppy is cheerful and optimistic.)
  • Obadiah ist von Natur aus niedergeschlagen und blickt pessimistisch in die Zukunft... (Obadiah is gloomy and pessimistic.)
  • Spike ist aggressiv und geht keinem Streit aus dem Weg! (Spike is aggressive and confrontational.)
  • Max ist ein ausgekochter Pokerspieler. Er ist cool, ihn bringt nichts aus der Ruhe. (Max is a hard-boiled poker player. He is cool and laid-back.)

The speaker is Stefan Röttig, a male native speaker of German trained as a professional actor and baritone opera singer.

Data format

Audio

The audio data is provided in the losslessly compressed FLAC format, which can be played by many programs, including Praat. It is sampled at a rate of 44.1 kHz, with 16 bits per sample, in mono. No filters of any sort have been applied to this raw data, and high-pass filtering at 50 Hz is recommended.
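
As a rough illustration of filtering out content below 50 Hz, the following sketch applies a first-order high-pass filter to a list of float samples. The filter design (a simple RC-style recurrence) and the function name `highpass` are assumptions made for this example; the corpus documentation does not prescribe a particular filter, and a steeper filter from a DSP library is likely a better choice in practice.

```python
import math

def highpass(samples, sample_rate=44100, cutoff_hz=50.0):
    """Apply a first-order (RC-style) high-pass filter to a list of floats.

    Hypothetical helper: the filter order and design are assumptions,
    not something specified by the corpus.
    """
    if not samples:
        return []
    rc = 1.0 / (2.0 * math.pi * cutoff_hz)  # time constant for the cutoff
    dt = 1.0 / sample_rate                  # sampling interval in seconds
    alpha = rc / (rc + dt)
    filtered = [samples[0]]
    for i in range(1, len(samples)):
        # Each output follows changes in the input while decaying toward zero,
        # which suppresses DC offset and slow drift.
        filtered.append(alpha * (filtered[-1] + samples[i] - samples[i - 1]))
    return filtered
```

A constant (DC) input decays toward zero, while a 1 kHz tone passes almost unchanged. Reading the samples from the FLAC files is left to whatever audio library you prefer.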

Phonetic segmentation

Annotations are provided as one YAML file per style. These files are lists of utterances, each of which contains

  • a prompt code (file basename),
  • the utterance text,
  • the speaking style,
  • utterance start and end times (in seconds) in the FLAC file,
  • optionally, the (manually corrected) phonetic segments, each of which has
    • a label (based on SAMPA, _ denotes silence), and
    • its end time (in seconds), relative to that utterance's start time

For example,

- prompt: spike0008
  text: Ach ja?
  style: angry
  start: 27.0
  end: 28.92
  segments:
  - {lab: H#, end: 0.280902}
  - {lab: '?', end: 0.324898}
  - {lab: a, end: 0.408238}
  - {lab: x, end: 0.475}
  - {lab: j, end: 0.61}
  - {lab: 'a:', end: 0.963273}
  - {lab: _, end: 1.915}
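
Because segments store only end times relative to the utterance start, absolute segment boundaries in the FLAC file have to be derived. The sketch below does this for the example utterance, written out as a plain Python dictionary. It assumes that the first segment begins at the utterance start and that each later segment begins where the previous one ends. The function name `absolute_segments` is hypothetical.

```python
def absolute_segments(utterance):
    """Return (label, start, end) tuples with times relative to the FLAC file.

    Assumption: segments are contiguous, so each one starts where the
    previous one ended, and the first starts at the utterance start.
    """
    offset = utterance["start"]
    previous_end = 0.0
    result = []
    # The segments key is optional, so fall back to an empty list.
    for segment in utterance.get("segments", []):
        result.append((segment["lab"], offset + previous_end, offset + segment["end"]))
        previous_end = segment["end"]
    return result

# The example utterance from above, as it might look after parsing the YAML.
utterance = {
    "prompt": "spike0008",
    "text": "Ach ja?",
    "style": "angry",
    "start": 27.0,
    "end": 28.92,
    "segments": [
        {"lab": "H#", "end": 0.280902},
        {"lab": "?", "end": 0.324898},
        {"lab": "a", "end": 0.408238},
        {"lab": "x", "end": 0.475},
        {"lab": "j", "end": 0.61},
        {"lab": "a:", "end": 0.963273},
        {"lab": "_", "end": 1.915},
    ],
}

for label, start, end in absolute_segments(utterance):
    print(f"{label}\t{start:.6f}\t{end:.6f}")
```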

Downloading the data

Use the links on the releases page, or run the download task (see below).

Converting the data

For convenience, the utterances for each subset can be extracted from the YAML and FLAC files by running Gradle tasks. After cloning this repository, or downloading and unpacking it, run ./gradlew tasks (or gradlew tasks on Windows) for details.

Prerequisites

You will need Java to run the Gradle tasks. Extracting the utterances to WAV files also requires sox to be installed.
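
For readers who want to cut out a single utterance by hand, the following sketch builds a sox command line from an utterance's start and end times, using sox's `trim` effect with a start position and a duration. This is not necessarily how the Gradle tasks work internally. The function name and the file names `angry.flac` and `spike0008.wav` are hypothetical.

```python
def sox_trim_command(flac_path, wav_path, start, end):
    """Build a sox command that extracts the span [start, end] to a WAV file.

    Hypothetical helper: sox's trim effect takes a start position and a
    duration, both given here in seconds.
    """
    duration = end - start
    return ["sox", flac_path, wav_path, "trim", f"{start:.6f}", f"{duration:.6f}"]

# File names are placeholders; the times come from the example utterance.
command = sox_trim_command("angry.flac", "spike0008.wav", 27.0, 28.92)
print(" ".join(command))
```

The resulting list can be passed to `subprocess.run(command, check=True)` once sox is installed and the input file exists.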

Copyright and license

Copyright 2009 DFKI GmbH.

This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.

Contact

If you run into problems, please open a new issue.