Research
Sensimetrics conducts research and development in speech science,
hearing science, and related areas for government agencies and
private clients.
Drawing on the expertise and innovation of its research staff,
Sensimetrics has compiled a successful track record of
technology development supported by grants and contracts. The
majority of these projects have been sponsored by the National
Institute on Deafness and Other Communicative Disorders; other
governmental sources of research support have been the National
Cancer Institute, the National Eye Institute, the National
Institute for Occupational Safety and Health, and the Department
of Defense. Some research projects have been or are being
conducted in collaboration with academic and clinical
researchers, principally at Boston University, the Massachusetts
Institute of Technology, the Massachusetts Eye and Ear
Infirmary, and the University of Massachusetts. Our private
research clients have included Grason Stadler
(audiometric equipment); Articulate Systems, Inc., (speech
recognition systems for personal computers); Creative
Technology, Ltd., (computer peripherals); Shure Brothers, Inc.,
(microphones); MCI Telecommunications, Inc.; and ReSound
Corporation and Starkey Laboratories (hearing aids).
The following is a summary of the major research projects
currently underway at Sensimetrics.
A Wireless Self-Contained Tactile Aid
This work is
aimed at the development of a tactile aid for the deaf contained
entirely in a single small unit that can be worn on an arm like
a wristwatch. Two models of such a tactile aid will eventually
be developed. The first, which is the primary motivation for the
proposed effort, is a unit that provides speech-based tactile
stimulation for infants and young children during the time
period when they are awaiting a cochlear implant. This unit is a
small vibration-display device designed with special attention
given to the safety and usage requirements for this population.
A second model will be an adult unit that will be slightly
larger than the children’s unit and will contain additional
wireless receiver functionality for displaying alarms signaling
remote events (e.g., doorbell, smoke alarm) and for delivering
speech sensed by a remote microphone. [Work supported by a grant
from NIH/NIDCD.]
Speech Synthesis for Distance Cueing in Audio Displays
This project asks the basic question “If we can’t see someone
who is talking to us, how is it that we can still know how far
away that person is?” Acoustic cues in speech that give us
information on the physical distance between a talker and
listener have been studied in the past, but not in a
comprehensive effort such as this one. The work is developing
(1) a prototype speech synthesizer capable of manipulating the
vocal effort of speech for use in text-to-speech synthesis, and
(2) a prototype real-time system dedicated to modifying the
apparent vocal effort of an arbitrary speech signal input. These
two objectives will be achieved by (1) performing a thorough acoustic
analysis of actual talkers speaking to listeners at various
distances up to 32 m; (2) using synthetic speech to manipulate
individual acoustic parameters likely to be cues for distance,
such as duration changes in vowels and consonants; and (3)
conducting perceptual tests of the synthetic speech to identify
the important and necessary acoustic cues for distance. This
work will ultimately lead to more realistic virtual audio
environments. [Work supported by a contract from AFOSR.]
Application of Cortical Processing Theory to Acoustical Analysis
The goal of
this STTR program is to formulate a template-matching operation,
with perception-related rules of integration over time and
frequency at its core, in the context of human perception of
degraded speech. In particular, we aim to develop models of
auditory processing capable of predicting consonant confusions
by normal-hearing listeners under a variety of acoustic
distortions. A prerequisite is to formulate the signal
processing principles realized by the auditory periphery in
providing the observed graceful degradation of human performance
in noise. [Work supported by a contract from AFOSR.]
Acoustic Processing of Speech to Improve Electrolarynx Communication
Over half of laryngectomy patients use an electrolarynx (EL)
to communicate, but current EL devices produce speech that has
poor quality (“non-human sounding”) and reduced intelligibility.
The acoustic deficits in EL speech inhibit the ability of
laryngectomy patients to communicate, thus reducing their
functional capability and quality of life. The long-term goal of
this project is to develop real-time speech processing
technology to improve the naturalness and intelligibility of EL
speech, thereby improving EL communication and the quality of
life for laryngectomy patients. [Work supported by a grant from
NIH/NIDCD.]
Exploiting Nervous-System Rhythmicity for Spoken-Word Recognition
This project aims to develop a model for recognizing spoken
words based upon principles of neural computation. By exploiting
the presumed role of nervous-system rhythms in neural
computation, a model of time-frequency integration of signals
that are a few hundred milliseconds long (e.g., whole words)
will be developed. The Phase I project is aimed at the
evaluation of a model for recognizing diphones (i.e., speech
segments with durations of a few tens of milliseconds). The model
utilizes a template matching circuit (TMC) inspired by presumed
principles of cortical neural processing (Hopfield, 2004), with
a sub-threshold gamma oscillatory input (with a frequency of
about 30 Hz) at its core. One property of the TMC is
insensitivity to time-scale variations of the input stimuli.
Such a property is needed to recognize speech tokens that
inherently exhibit phonemic variability. [Work supported by a
grant from NSF.]
Spoken Word Recognition by Humans: A Single- or a Multi-Layer Process?
The study emerges from the growing understanding of the
presumed role of nervous-system rhythmicity in neural
computation. In particular, the exploitation of theta
oscillations (with a cycle duration of about 200 ms) allows a feasible
single-layer model of neural computation for spoken word
recognition. We seek psychophysical validity for a single-layer
approach to lexical access. Current models of spoken-word
recognition by humans adopt, almost exclusively, a multi-layered
approach; i.e. phones are identified first, and the ordered
sequence of identified phones results in a pointer to the
lexicon. However, an accumulation of indirect observations calls
such an approach into question, at least for words that are
frequently used. This research program is aimed at providing
formal psychophysical support for a possible single-layer
process for lexical access. [Work supported by a contract from
AFRL.]
Improved Sound Processing for Hearing Devices
We are actively
pursuing improved sound processing for hearing aids and cochlear
implants. A current focus is the development of
improved multiple-microphone noise reduction strategies. We have
developed novel noise reduction strategies that take advantage
of multiple microphones over a single ear, as well as over
both ears. Our evaluations show substantial benefit from
these strategies for listeners in the presence of background
noise. We are also pursuing novel dynamic range adaptation
strategies that are based on the mechanisms that are used in a
healthy ear. The goal of the adaptation strategy is to provide
an improved loudness mapping for listeners as they move from
quiet to noisy environments. In addition to the noise reduction
and dynamic range strategies, we have developed a sound
processing strategy that encodes fine timing cues in electric
stimulation patterns for cochlear implant users. The potential
benefits of this fine timing strategy are currently under review,
but preliminary results indicate a significant improvement in
pitch perception of simple stimuli. [Work supported by a grant
from NIH/NIDCD.]
Signal Detection and Speech Reception in Reverberation
Despite
the fact that listening in everyday situations almost always
takes place in acoustically-reflective surroundings, there has
been very little work on auditory signal detection in
reverberation. Work in this project is aimed at developing a
model for human performance in detecting sounds in reverberant
spaces. The results of this modeling effort will later be
applied to the prediction of speech intelligibility in rooms.
Such predictive methods can be applied to the design of rooms,
such as classrooms, in which speech communication quality is a
critical element. [Work done in collaboration with the
University of Massachusetts and Boston University, supported by
grants from NIH/NIDCD].
Processing of Degraded Speech by the Auditory System
This research program aims at
formulating signal processing principles realized by the
auditory system, in particular when the input signal is speech. This
effort is motivated by the general observation that while human
performance in tasks related to speech intelligibility
deteriorates only modestly with worsening environmental
conditions – even for tasks with a minimal cognitive load –
the performance of machines using simulated auditory-nerve (AN)
representations generated by state-of-the-art auditory models
deteriorates at a much faster rate. An overall aim of this
program is to revise current models of auditory processing to
better reflect human performance. Two specific aims are:
(1) to model the role of the descending pathway in stabilizing
AN representations of speech sounds in degraded acoustic
conditions. Current models of the auditory periphery are based
upon the ascending pathway up through the AN. We study the role
of the descending pathway (mainly the MOC feedback mechanism),
and its interaction with the ascending pathway, in processing
speech; (2) to model post-AN functions that play a role in
robust extraction of important acoustic-phonemic cues from the
AN firing patterns. The methodology used combines
analytical measurements of human speech discrimination and
auditory modeling, all using the same database of diagnostic
speech materials. The speech material is subjected to parametric
modifications in the time and frequency dimensions, and/or
degraded systematically (by additive speech-shaped noise, room
reverberations, etc.). The validity of a proposed auditory model
is tested by utilizing it as a “front-end” in machines
specifically designed to mimic the corresponding psychophysical
experiments. [Work supported by a contract from AFOSR.]
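The additive-noise degradation mentioned above can be sketched as scaling a noise signal to reach a chosen signal-to-noise ratio (SNR) before mixing. The function names and the use of white Gaussian noise are assumptions made for illustration only; the program described here uses speech-shaped noise, which would additionally require spectral shaping that this sketch omits.

```python
import math
import random


def power(x):
    """Mean-square power of a sequence of samples."""
    return sum(v * v for v in x) / len(x)


def mix_at_snr(speech, noise, snr_db):
    """Add noise to speech after scaling the noise to the requested SNR (dB)."""
    target_noise_power = power(speech) / (10.0 ** (snr_db / 10.0))
    scale = math.sqrt(target_noise_power / power(noise))
    return [s + scale * n for s, n in zip(speech, noise)]


def measured_snr_db(speech, mixture):
    """SNR recovered from the clean signal and the noisy mixture."""
    residual = [m - s for s, m in zip(speech, mixture)]
    return 10.0 * math.log10(power(speech) / power(residual))


def demo_signals(n_samples=8000, seed=0):
    """Placeholder signals: a 150 Hz tone stands in for speech (hypothetical)."""
    rng = random.Random(seed)
    speech = [math.sin(2 * math.pi * 150 * n / 8000) for n in range(n_samples)]
    noise = [rng.gauss(0.0, 1.0) for _ in range(n_samples)]
    return speech, noise
```

Sweeping `snr_db` over a range of values yields a systematically degraded set of stimuli from a single clean recording.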
Simulation of Hearing Loss and Prosthetic Devices
Audiologists and counselors need a good and valid demonstration of hearing
loss. The parents of a hearing-impaired child, for example, can
benefit greatly from experiencing first-hand the communication
difficulties faced by their child. Unlike visual deficits, which
are usually very easy to demonstrate, hearing loss is much more
difficult to demonstrate. In this project we are developing an electroacoustic
signal-processing system for hearing loss simulation, hearing
aid simulation, or cochlear implant simulation.
This hearing loss simulator will be a portable, real-time device that a
person can use in any sound environment. [Work supported by a
grant from NIH/NIDCD].
Audio System for fMRI
Functional
magnetic resonance imaging (fMRI) is being used extensively to
produce images of brain activity while a subject actively
performs a task or receives sensory stimulation. The process of
acquiring fMRI brain images in response to auditory stimulation
is especially difficult because of both the background noise
generated by the fMRI equipment and the intense magnetic field,
which excludes the use of many materials commonly used in audio
devices. In this project we are developing a complete audio system
for research and clinical applications requiring high-quality
audio stimulation. This goal will be achieved by using a
combination of a circumaural muff with active noise reduction
and an insert receiver/earplug, with stimuli delivered by small
electrostatic transducers. [Work supported by a grant from NIH/NIDCD
and done in collaboration with the Massachusetts Eye and Ear
Infirmary].
Hand-Operated Speech Synthesis
Many people who lose
their voice due to trauma or cancer rely on a text-to-speech (TTS)
system or an electrolarynx (EL) to produce speech.
State-of-the-art TTS systems can provide intelligible and
somewhat natural speech at a rate as fast as the user can type,
while an EL can produce speech at a normal rate but with
decreased naturalness and intelligibility. This research
explores a person's potential for controlling the HLsyn speech
synthesizer in real time with one hand using a pen-like device,
a strategy that could produce intelligible speech at a normal
rate that is more natural than either TTS or EL speech.
[Work supported by a grant from
NIH/NIDCD].

Portable Voice Accumulator
The most common voice disorders are likely a result of abusive vocal behavior patterns, yet these disorders are difficult to assess and
effectively rehabilitate because clinicians must rely on a patient's self-report of voice use, which is subjective and potentially error-prone.
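The Portable Voice Accumulator (PVA) is an ambulatory monitoring and feedback system developed to record 12 hours of a patient's voice use and
to provide biofeedback in real time to the patient. The PVA records estimates of fundamental frequency and sound pressure level eight times per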
second, which can be used to produce graphical summaries of the patient's daily voice use. Additionally, the PVA's capability to provide biofeedback
may enhance patient compliance with instructions on voice use given during voice therapy. Currently, PVA prototypes are being tested on wearers with
normal voices and with voice disorders. In the long term, clinical use of the PVA is expected to improve the diagnosis and treatment of the most
common types of voice disorders, in which the identification of harmful patterns of voice use and/or compliance with vocal retraining are
considered critical elements of effective management. The PVA will also provide unprecedented research capability to directly investigate and test
basic theories about the etiology and treatment of voice disorders. [Work supported by a grant from
NIH/NIDCD].
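As a rough illustration of the kind of record described above, the following Python sketch estimates fundamental frequency and level on 125 ms frames (eight per second). The autocorrelation pitch estimator, the 8 kHz sampling rate, the 60 to 400 Hz search range, and all function names are assumptions for illustration; the text does not say how the PVA computes its estimates. Level is reported in dB relative to full scale, not as calibrated sound pressure level.

```python
import math

FRAME_RATE_HZ = 8        # estimates per second, as stated in the text
SAMPLE_RATE_HZ = 8000    # assumed sampling rate (not from the source)


def frame_level_db(frame):
    """RMS level in dB re full scale (uncalibrated, so not true SPL)."""
    rms = math.sqrt(sum(x * x for x in frame) / len(frame))
    return 20.0 * math.log10(max(rms, 1e-12))


def frame_f0_hz(frame, fs=SAMPLE_RATE_HZ, fmin=60.0, fmax=400.0):
    """Pick the autocorrelation peak within an assumed 60-400 Hz range."""
    lag_min = int(fs / fmax)
    lag_max = int(fs / fmin)
    best_lag, best_score = lag_min, float("-inf")
    for lag in range(lag_min, lag_max + 1):
        # Raw (unnormalized) sum: it mildly favors shorter lags, which
        # helps avoid picking a multiple of the true pitch period.
        score = sum(frame[i] * frame[i + lag] for i in range(len(frame) - lag))
        if score > best_score:
            best_lag, best_score = lag, score
    return fs / best_lag


def analyze(signal, fs=SAMPLE_RATE_HZ):
    """Return one (f0_hz, level_db) pair per 125 ms frame."""
    hop = fs // FRAME_RATE_HZ
    frames = [signal[i:i + hop] for i in range(0, len(signal) - hop + 1, hop)]
    return [(frame_f0_hz(f, fs), frame_level_db(f)) for f in frames]


def records_per_session(hours=12):
    """Number of (f0, level) pairs stored over one recording session."""
    return hours * 3600 * FRAME_RATE_HZ
```

At eight estimates per second, a 12-hour session yields 12 × 3600 × 8 = 345,600 pairs of values, a small amount of data compared with storing the raw audio.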
Advanced Hearing Protectors
Hearing
protectors are effective when properly used, but workers often
resist using them, partly because the protectors interfere with
their ability to hear important sounds and to communicate with others.
This project is aimed at developing advanced hearing protectors
that combine maximal attenuation of ambient sounds with signal
processing that extracts the most important components of the
sound field for controlled presentation to the user. By
processing the signals from a microphone array mounted on the
headband of the hearing protector, this device enhances desired
signals from a specified "look" direction relative to
signals from other directions, allowing face-to-face acoustic
communication in many high-noise environments where it would be
impossible otherwise. In addition, the signal processing is
designed to preserve sound localization ability. [Work supported
by a grant from CDC/NIOSH].
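To illustrate how a microphone array can favor a "look" direction, here is a minimal Python sketch of the response of a generic delay-and-sum beamformer for a uniform line array. This is a textbook technique chosen for illustration; the text does not state which array processing the project uses, and the microphone count, spacing, and frequency below are hypothetical.

```python
import cmath
import math

SPEED_OF_SOUND = 343.0  # m/s, approximate value in room-temperature air


def array_gain(source_deg, look_deg=0.0, n_mics=4, spacing_m=0.04,
               freq_hz=2000.0):
    """Magnitude response of a delay-and-sum beamformer (1.0 = unattenuated).

    Angles are measured from broadside of a uniform line array. All default
    parameter values are hypothetical, chosen only for illustration.
    """
    k = 2.0 * math.pi * freq_hz / SPEED_OF_SOUND  # wavenumber, rad/m
    # Phase step between adjacent microphones that remains after applying
    # the steering delays for the look direction.
    delta = k * spacing_m * (math.sin(math.radians(source_deg))
                             - math.sin(math.radians(look_deg)))
    total = sum(cmath.exp(1j * m * delta) for m in range(n_mics))
    return abs(total) / n_mics


def gain_db(gain):
    """Convert a linear magnitude gain to decibels."""
    return 20.0 * math.log10(max(gain, 1e-12))
```

A plane wave arriving from the look direction adds coherently across microphones, so `array_gain` returns 1.0 there, while waves from other directions add with mismatched phases and are attenuated.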
Haptic Display of Space
Spatial
perception via touch is a primary compensation for the
capabilities lost when vision is diminished. For example, the
long cane is an effective travel aid for the visually impaired
primarily because the forces and torques sensed through the
handle are a natural means whereby users may infer the layout of
the environment. Such haptic cues have not been explicitly
incorporated into previous electronic travel aids. In our
research we are developing a novel class of electronic
travel aids that utilize the cues that make the cane so
effective and intuitive. The
core of this research deals with psychophysical study of force
and torque cues and the application of the derived haptic coding
principles to the display of spatial information. These results
will apply not only to travel aids but will also enable
improvements in human-computer interfacing for applications such
as virtual environments. [Work supported by a grant from NIH/NEI].
Computer-Based Instruction in Cued Speech
Cued
Speech is a method of communication between hearing and deaf
people that has been demonstrated to be highly effective. Cued
Speech uses a system of manual cues, in the form of hand shapes
and placements, that disambiguate confusable lip shapes. This
project is aimed at developing computer-based instruction in
Cued Speech. While it will not completely eliminate the need for
one-on-one instruction, this instructional software will provide
a learner with the basic elements of Cued Speech along with
unlimited, easy-to-use opportunities for practice. [Work
supported by a grant from NIH/NIDCD].