Overview

The Speech Test Videos (STeVi) corpus consists of recordings of common speech test materials (see table below). The recordings include multiple talkers and repetitions, with utterance length and context ranging from isolated vowels and syllables to highly predictable sentences. Brief test utterances embedded in the carrier phrase (“You will mark ____ please.”) are marked for easy extraction of the audio. The recorded audio and video signals have been carefully synchronized. Most sets of materials were recorded with two male and two female talkers; a few were recorded with one male and one female talker.

Each group of recordings listed in the table is described in more detail in its own section below.

Test Material                    Carrier Phrase  Talkers  Repetitions  Number of Items
CV syllables                     yes             2M, 2F   3            1320 (330 per talker)
VC syllables                     yes             2M, 2F   3            1200 (300 per talker)
hVd syllables                    yes             2M, 2F   3            180 (45 per talker)
MRT words                        yes             2M, 2F   3            3276 (819 per talker)
Numbers 0-10                     yes             1M, 1F   3            66 (33 per talker)
Numbers 0-99                     no              1M, 1F   3            600 (300 per talker)
High-Probability Spin Sentences  no              2M, 2F   1            200 (200 per talker)
Nonsense Sentences               no              2M, 2F   1            200 (200 per talker)
Structured Sentences             no              2M, 2F   1            2000 (500 per talker)

File Formats

All materials are available as:

(i) videos: full HD (1920×1080 pixels), stored in .mov files with H.264 video encoding and a single channel of uncompressed 24-bit audio; and
(ii) audio only: single-channel, 24-bit .wav files.

In the corpora that use carrier phrases (see the table above), the .mov files for test items spoken in the carrier phrase include chapter markers indicating the start and end times of the item within the phrase. MATLAB .mat files containing the keyword start and stop times for each file are also provided.
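
As an illustration of how the keyword start and stop times might be used, the sketch below cuts a test item out of a carrier-phrase recording. It assumes the times are expressed in seconds and that the audio has already been loaded as a sequence of samples; the function name, the 48 kHz sample rate, and the example times are hypothetical, and the layout of the .mat files is not described here.

```python
def extract_keyword(samples, sample_rate_hz, start_s, stop_s):
    """Return the samples lying between start_s and stop_s (in seconds)."""
    if not 0 <= start_s < stop_s:
        raise ValueError("expected 0 <= start_s < stop_s")
    first = int(round(start_s * sample_rate_hz))
    last = int(round(stop_s * sample_rate_hz))
    # Clamp the end index so a stop time past the end of the file is harmless.
    return samples[first:min(last, len(samples))]


# Hypothetical example: one second of placeholder audio at an assumed 48 kHz,
# with the keyword assumed to span 0.25 s to 0.75 s of the carrier phrase.
sample_rate_hz = 48_000
carrier = [0.0] * sample_rate_hz
keyword = extract_keyword(carrier, sample_rate_hz, 0.25, 0.75)
print(len(keyword))  # 24000 samples, i.e. 0.5 s
```

The same slice works on a NumPy array if the audio is loaded that way.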

All signal levels have been normalized to an RMS level of -25 dB re full scale; the .mat files provide the original RMS levels.
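
The following sketch shows one way an RMS level of -25 dB re full scale can be computed and applied. It assumes that full scale corresponds to a sample amplitude of 1.0 and that the signal is not silent; the function names and the test tone are illustrative only and do not describe how the corpus itself was processed.

```python
import math

TARGET_DBFS = -25.0  # normalization level stated for the corpus


def rms_dbfs(samples):
    """RMS level in dB re full scale, taking full scale as amplitude 1.0."""
    mean_square = sum(x * x for x in samples) / len(samples)
    return 10.0 * math.log10(mean_square)


def normalize_rms(samples, target_dbfs=TARGET_DBFS):
    """Scale samples so that their RMS level equals target_dbfs."""
    gain = 10.0 ** ((target_dbfs - rms_dbfs(samples)) / 20.0)
    return [x * gain for x in samples]


# Placeholder signal: a 440 Hz tone at half of full scale, sampled at 48 kHz.
tone = [0.5 * math.sin(2 * math.pi * 440 * n / 48_000) for n in range(48_000)]
normalized = normalize_rms(tone)
print(round(rms_dbfs(normalized), 3))
```

Running the calculation in reverse with an original RMS level from the .mat files would restore a recording's level relative to the others.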

CV Syllables

Test Material  Carrier Phrase  Talkers  Repetitions  Number of Items
CV syllables   yes             2M, 2F   3            1320

Each of 22 initial consonants was combined with each of five vowels to form 110 CV syllables.

Initial Consonants

/ p, t, tʃ, k, f, θ, s, ʃ, b, d, ʤ, g, v, ð, z, ʒ, m, n, l, r, w, j /

Vowels

/ ɑ, eɪ, i, o, u /
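
The item counts for this set follow directly from the two inventories. The sketch below enumerates the consonant-vowel combinations and checks the totals; the consonants are written with the standard IPA symbols θ and l, and the variable names are illustrative.

```python
from itertools import product

INITIAL_CONSONANTS = ["p", "t", "tʃ", "k", "f", "θ", "s", "ʃ", "b", "d", "ʤ",
                      "g", "v", "ð", "z", "ʒ", "m", "n", "l", "r", "w", "j"]
VOWELS = ["ɑ", "eɪ", "i", "o", "u"]
REPETITIONS = 3  # from the table for this set
TALKERS = 4      # 2M, 2F

# Every initial consonant is paired with every vowel.
cv_syllables = [c + v for c, v in product(INITIAL_CONSONANTS, VOWELS)]

per_talker = len(cv_syllables) * REPETITIONS
total = per_talker * TALKERS
print(len(cv_syllables), per_talker, total)  # 110 330 1320
```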

Sample

M1

VC Syllables

Test Material  Carrier Phrase  Talkers  Repetitions  Number of Items
VC syllables   yes             2M, 2F   3            1200

Each of twenty final consonants was combined with each of the five vowels used for the CV syllables to form 100 VC syllables.

Final Consonants

/ p, t, tʃ, k, f, θ, s, ʃ, b, d, ʤ, g, v, ð, z, ʒ, m, n, l, r /

Vowels

/ ɑ, eɪ, i, o, u /

Sample

F1

hVd Syllables

Test Material  Carrier Phrase  Talkers  Repetitions  Number of Items
hVd syllables  yes             2M, 2F   3            180

Fifteen vowels were spoken in hVd context in the carrier phrase.

Vowels

/ i, ɪ, e, æ, ɑ, ɔ, o, ʊ, u, ɜr, eɪ, ʌ, aɪ, ɔɪ, aʊ /

Sample

M2

MRT Words

Test Material  Carrier Phrase  Talkers  Repetitions  Number of Items
MRT words      yes             2M, 2F   3            3276

The words of the Modified Rhyme Test (MRT; House et al., 1965) were recorded in the carrier phrase.

went sent bent dent tent rent
hold cold told fold sold gold
pat pad pan path pack pass
kit bit fit hit wit sit
must bust gust rust dust just
teak team teal teach tear tease
din dill dim dig dip did
bed led fed red wed shed
pin sin tin fin din win
dug dung duck dud dub dun
sum sun sung sup sub sud
seep seen seethe seek seem seed
not tot got pot hot lot
vest test rest best west nest
pig pill pin pip pit pick
back bath bad bass bat ban
way may say pay day gay
pig big dig wig rig fig
pale pace page pane pay pave
cane case cape cake came cave
shop mop cop top hop pop
coil oil soil toil boil foil
tan tang tap tack tam tab
fit fib fizz fill fig fin
same name game tame came fame
peel reel feel eel keel heel
hark dark mark bark park lark
heave hear heat heal heap heath
cup cut cud cuff cuss cub
thaw law raw paw jaw saw
pen hen men then den ten
puff puck pub pus pup pun
bean beach beat beak bead beam
heat neat feat seat meat beat
dip sip hip tip lip rip
kill kin kit kick king kid
hang sang bang rang fang gang
took cook look hook shook book
mass math map mat man mad
ray raze rate rave rake race
save same sale sane sake safe
fill kill will hill till bill
sill sick sip sing sit sin
bale gale sale tale pale male
wick sick kick lick pick tick
peace peas peak peach peat peal
bun bus but bug buck buff
sag sat sass sack sad sap
fun sun bun gun run nun

Sample

F2

Numbers 0-10

Test Material  Carrier Phrase  Talkers  Repetitions  Number of Items
Numbers 0-10   yes             1M, 1F   3            66

The eleven numbers (“zero”, “one”, . . ., “ten”) were spoken by two talkers in the carrier phrase.

Sample

F1

Numbers 0-99

Test Material  Carrier Phrase  Talkers  Repetitions  Number of Items
Numbers 0-99   no              1M, 1F   3            600

The one hundred numbers (“zero”, “one”, . . ., “ninety-nine”) were spoken by two talkers with no carrier phrase.

Sample

M1

High-Probability Spin Sentences

Test Material                    Carrier Phrase  Talkers  Repetitions  Number of Items
High-Probability Spin Sentences  no              2M, 2F   1            200

This set comprises 200 high-context SPIN sentences (Bilger et al., 1984); for example, “Hold the baby on your lap.”

Sample

M2

Nonsense Sentences

Test Material       Carrier Phrase  Talkers  Repetitions  Number of Items
Nonsense Sentences  no              2M, 2F   1            200

This set comprises 200 syntactically correct but semantically meaningless sentences; for example, “He was fought by the highway on your lap.”

These sentences were constructed so that the final word in the ith sentence is the same as that in the ith High-Probability Spin Sentence.
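
The pairing constraint can be checked mechanically by comparing final words. The sketch below does this for the two example sentences quoted on this page; whether these two examples actually share an index in the corpus is an assumption, and the helper name is illustrative.

```python
import string


def final_word(sentence):
    """Return the last word of a sentence, lowercased, without punctuation."""
    return sentence.split()[-1].strip(string.punctuation + "“”").lower()


high_probability = "Hold the baby on your lap."
nonsense = "He was fought by the highway on your lap."

print(final_word(high_probability) == final_word(nonsense))  # True
```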

Sample

F2

Structured Sentences

Test Material         Carrier Phrase  Talkers  Repetitions  Number of Items
Structured Sentences  no              2M, 2F   1            2000

Two thousand unique five-word, syntactically constrained sentences were constructed by selecting randomly from ten choices for each word position; the choices are given below. The sentences were divided into four sets of 500, and each set was spoken by one of the four talkers.

Peter got three large desks
Kathy sees nine small chairs
Lucy bought seven old tables
Allen gives eight dark toys
Rachel sold four heavy spoons
William prefers nineteen green windows
Steven has two cheap sofas
Thomas kept fifteen pretty rings
Doris ordered twelve red flowers
Nina wants sixty white houses
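
The construction described above can be sketched as follows. With ten choices in each of five positions there are 10^5 = 100,000 possible sentences, from which 2000 unique ones are drawn and split into four sets of 500. The seed, the function name, and the way sets are assigned to talkers are assumptions made for illustration; the procedure actually used for the corpus is not specified here.

```python
import random

# One list per word position, read column-wise from the table above.
WORD_CHOICES = [
    ["Peter", "Kathy", "Lucy", "Allen", "Rachel",
     "William", "Steven", "Thomas", "Doris", "Nina"],
    ["got", "sees", "bought", "gives", "sold",
     "prefers", "has", "kept", "ordered", "wants"],
    ["three", "nine", "seven", "eight", "four",
     "nineteen", "two", "fifteen", "twelve", "sixty"],
    ["large", "small", "old", "dark", "heavy",
     "green", "cheap", "pretty", "red", "white"],
    ["desks", "chairs", "tables", "toys", "spoons",
     "windows", "sofas", "rings", "flowers", "houses"],
]


def build_sentence_sets(n_sentences=2000, n_talkers=4, seed=0):
    """Draw unique random sentences and split them evenly across talkers."""
    rng = random.Random(seed)  # fixed seed (an assumption) for repeatability
    unique = set()
    while len(unique) < n_sentences:
        unique.add(" ".join(rng.choice(column) for column in WORD_CHOICES))
    sentences = sorted(unique)  # sort first so the shuffle is reproducible
    rng.shuffle(sentences)
    size = n_sentences // n_talkers
    return [sentences[i * size:(i + 1) * size] for i in range(n_talkers)]


sets = build_sentence_sets()
print(len(sets), [len(s) for s in sets])  # 4 [500, 500, 500, 500]
print(sets[0][0])
```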

Sample

F1

Purchase

Speech materials are sold in the following sets, which can be ordered for individual talkers.

Test Material                    Audio  Video
CV syllables                     $400   $600
VC syllables                     $400   $600
hVd syllables                    $100   $200
MRT words                        $700   $1100
Numbers 0-10                     $100   $200
Numbers 0-99                     $400   $600
High-Probability Spin Sentences  $500   $900
Nonsense Sentences               $500   $900
Structured Sentences             $800   $1300

Notes:

  • Sets of videos include the corresponding sets of audio-only .wav files.
  • All selections include multiple repetitions if available.
  • The price listed is for one talker.

For additional talkers, add:

  • 75% of the one-talker price for the second talker
  • 50% of the one-talker price for the third talker
  • 25% of the one-talker price for the fourth talker

Thus, for all four talkers, multiply the one-talker price by 2.5.
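
The discount schedule above amounts to the following calculation. This is a sketch for estimating a total from the listed one-talker prices; the function name is illustrative, and it does not check how many talkers are actually available for a given set (some sets have only two).

```python
# Fraction of the one-talker price charged for the 1st to 4th talker.
TALKER_FACTORS = [1.00, 0.75, 0.50, 0.25]


def total_price(one_talker_price, n_talkers):
    """Total price in dollars for n_talkers of one set of materials."""
    if not 1 <= n_talkers <= len(TALKER_FACTORS):
        raise ValueError("n_talkers must be between 1 and 4")
    return one_talker_price * sum(TALKER_FACTORS[:n_talkers])


print(total_price(400, 4))  # CV syllables, audio, four talkers: 1000.0
print(total_price(600, 2))  # CV syllables, video, two talkers: 1050.0
```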

To purchase STeVi, please contact Sensimetrics.