Research


Sensimetrics conducts research and development in speech science, hearing science, and related areas for government agencies and private clients. Drawing on the expertise and innovation of its research staff, Sensimetrics has compiled a successful track record of technology development supported by grants and contracts. The majority of these projects have been sponsored by the National Institute on Deafness and Other Communication Disorders; other governmental sources of research support have been the National Cancer Institute, the National Eye Institute, the National Institute for Occupational Safety and Health, and the Department of Defense. Some research projects have been or are being conducted in collaboration with academic and clinical researchers, principally at Boston University, the Massachusetts Institute of Technology, the Massachusetts Eye and Ear Infirmary, and the University of Massachusetts. Our private research clients have included Grason Stadler (audiometric equipment); Articulate Systems, Inc. (speech recognition systems for personal computers); Creative Technology, Ltd. (computer peripherals); Shure Brothers, Inc. (microphones); MCI Telecommunications, Inc.; and ReSound Corporation and Starkey Laboratories (hearing aids).


The following is a summary of the major research projects currently underway at Sensimetrics.   

A Wireless Self-Contained Tactile Aid

This work is aimed at the development of a tactile aid for the deaf contained entirely in a single small unit that can be worn on an arm like a wristwatch. Two models of such a tactile aid will eventually be developed. The first, which is the primary motivation for the proposed effort, is a unit that provides speech-based tactile stimulation for infants and young children during the time period when they are awaiting a cochlear implant. This unit is a small vibration-display device designed with special attention given to the safety and usage requirements for this population. A second model will be an adult unit that will be slightly larger than the children’s unit and will contain additional wireless receiver functionality for displaying alarms signaling remote events (e.g., doorbell, smoke alarm) and for delivering speech sensed by a remote microphone. [Work supported by a grant from NIH/NIDCD.]

Speech Synthesis for Distance Cueing in Audio Displays

This project asks the basic question “If we can’t see someone who is talking to us, how is it that we can still know how far away that person is?” Acoustic cues in speech that give us information on the physical distance between a talker and listener have been studied in the past, but not in a comprehensive effort such as this one. The work is developing (1) a prototype speech synthesizer capable of manipulating the vocal effort of speech for use in text-to-speech synthesis, and (2) a prototype real-time system dedicated to modifying the apparent vocal effort of an arbitrary speech input signal. These two objectives will be achieved by (1) thorough acoustic analysis of actual talkers speaking to listeners at various distances up to 32 m; (2) using synthetic speech to manipulate individual acoustic parameters likely to be cues for distance, such as duration changes in vowels and consonants; and (3) conducting perceptual tests of the synthetic speech to identify the important and necessary acoustic cues for distance. This work will ultimately lead to more realistic virtual audio environments. [Work supported by a contract from AFOSR.]

Application of Cortical Processing Theory to Acoustical Analysis

The goal of this STTR program is to formulate a template-matching operation, with perception-related rules of integration over time and frequency at its core, in the context of human perception of degraded speech. In particular, we aim to develop models of auditory processing capable of predicting consonant confusions by normal-hearing listeners under a variety of acoustic distortions. A prerequisite is to formulate the signal processing principles realized by the auditory periphery in providing the observed graceful degradation of human performance in noise. [Work supported by a contract from AFOSR.]

Acoustic Processing of Speech to Improve Electrolarynx Communication

Over half of laryngectomy patients use an electrolarynx (EL) to communicate, but current EL devices produce speech that has poor quality (“non-human sounding”) and reduced intelligibility. The acoustic deficits in EL speech inhibit the ability of laryngectomy patients to communicate, thus reducing their functional capability and quality of life. The long-term goal of this project is to develop real-time speech processing technology to improve the naturalness and intelligibility of EL speech, thereby improving EL communication and the quality of life for laryngectomy patients. [Work supported by a grant from NIH/NIDCD.]

Exploiting nervous-system rhythmicity for spoken-word recognition

This project aims to develop a model for recognizing spoken words based upon principles of neural computation. By exploiting the presumed role of nervous-system rhythms in neural computation, a model of time-frequency integration of signals that are a few hundred milliseconds long (e.g., whole words) will be developed. The Phase I project is aimed at the evaluation of a model for recognizing diphones (i.e., speech segments a few tens of milliseconds in duration). The model utilizes a template-matching circuit (TMC) inspired by presumed principles of cortical neural processing (Hopfield, 2004), with a sub-threshold gamma oscillatory input (with a frequency of about 30 Hz) at its core. One property of the TMC is insensitivity to time-scale variations of the input stimuli. Such a property is needed to recognize speech tokens that inherently exhibit phonemic variability. [Work supported by a grant from NSF.]

Spoken word recognition by humans: a single- or a multi-layer process?

The study emerges from the growing understanding of the presumed role of nervous-system rhythmicity in neural computation. In particular, the exploitation of theta oscillations (with a duration of about 200 ms) allows a feasible single-layer model of neural computation for spoken word recognition. We seek psychophysical validity for a single-layer approach to lexical access. Current models of spoken-word recognition by humans adopt, almost exclusively, a multi-layered approach; i.e., phones are identified first, and the ordered sequence of identified phones results in a pointer to the lexicon. However, an accumulation of indirect observations calls such an approach into question, at least for frequently used words. This research program is aimed at providing formal psychophysical support for a possible single-layer process for lexical access. [Work supported by a contract from AFRL.]

Improved Sound Processing for Hearing Devices

We are actively pursuing improved sound processing for hearing aids and cochlear implants. A current focus is the development of improved multiple-microphone noise reduction strategies. We have developed novel noise reduction strategies that take advantage of multiple microphones over a single ear, as well as over both ears. Our evaluations show substantial benefit from these strategies for listeners in the presence of background noise. We are also pursuing novel dynamic range adaptation strategies based on the mechanisms used in a healthy ear. The goal of the adaptation strategy is to provide an improved loudness mapping for listeners as they move from quiet to noisy environments. In addition to the noise reduction and dynamic range strategies, we have developed a sound processing strategy that encodes fine timing cues in electric stimulation patterns for cochlear implant users. The potential benefits of this fine timing strategy are currently under review, but preliminary results indicate a significant improvement in pitch perception of simple stimuli. [Work supported by a grant from NIH/NIDCD.]

Signal Detection and Speech Reception in Reverberation

Despite the fact that listening in everyday situations almost always takes place in acoustically-reflective surroundings, there has been very little work on auditory signal detection in reverberation. Work in this project is aimed at developing a model for human performance in detecting sounds in reverberant spaces. The results of this modeling effort will later be applied to the prediction of speech intelligibility in rooms. Such predictive methods can be applied to the design of rooms, such as classrooms, in which speech communication quality is a critical element. [Work done in collaboration with the University of Massachusetts and Boston University, supported by grants from NIH/NIDCD].    

Processing of Degraded Speech by the Auditory System

This research program aims at formulating signal processing principles realized by the auditory system, in particular when the input signal is speech. This effort is motivated by the general observation that while human performance in tasks related to speech intelligibility deteriorates only modestly with worsening environmental conditions – even for tasks with a minimal cognitive load – the performance of machines using simulated auditory-nerve (AN) representations generated by state-of-the-art auditory models deteriorates at a much faster rate. An overall aim of this program is to revise current models of auditory processing to better reflect human performance. Two specific aims are: (1) to model the role of the descending pathway in stabilizing AN representations of speech sounds in degraded acoustic conditions. Current models of the auditory periphery are based upon the ascending pathway up through the AN. We study the role of the descending pathway (mainly the MOC feedback mechanism), and its interaction with the ascending pathway, in processing speech; (2) to model post-AN functions that play a role in robust extraction of important acoustic-phonemic cues from the AN firing patterns. The methodology used combines analytical measurements of human speech discrimination and auditory modeling, all using the same database of diagnostic speech materials. The speech material is subjected to parametric modifications in the time and frequency dimensions, and/or degraded systematically (by additive speech-shaped noise, room reverberations, etc.). The validity of a proposed auditory model is tested by utilizing it as a “front-end” in machines specifically designed to mimic the corresponding psychophysical experiments. [Work supported by a contract from AFOSR.]
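One step in the methodology above, degrading speech with additive noise at a controlled level, can be sketched as follows. This is a minimal illustration, not the procedure used in the project: the function names are hypothetical, the signal-to-noise ratio is defined here on overall RMS levels, and the noise is assumed to have already been spectrally shaped by some other means.

```python
import math


def rms(x):
    """Root-mean-square level of a sequence of samples."""
    return math.sqrt(sum(v * v for v in x) / len(x))


def mix_at_snr(speech, noise, snr_db):
    """Add noise to speech so that the speech-to-noise RMS ratio is snr_db.

    Assumption: the noise is at least as long as the speech, is not silent,
    and has already been shaped to the desired spectrum.
    """
    if len(noise) < len(speech):
        raise ValueError("noise must be at least as long as speech")
    noise = noise[:len(speech)]
    # Scale the noise so that 20*log10(rms(speech) / rms(gain*noise)) == snr_db.
    gain = rms(speech) / (rms(noise) * 10.0 ** (snr_db / 20.0))
    return [s + gain * n for s, n in zip(speech, noise)]
```

Sweeping `snr_db` over a range of values gives a parametric family of degraded stimuli that can be presented both to listeners and to a model front-end.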

Simulation of Hearing Loss and Prosthetic Devices

Audiologists and counselors need a good and valid demonstration of hearing loss. The parents of a hearing-impaired child, for example, can benefit greatly from experiencing first-hand the communication difficulties faced by their child. Unlike visual deficits, which are usually very easy to demonstrate, hearing loss is much more difficult to convey. In this project we are developing an electroacoustic signal-processing system for hearing loss simulation, hearing aid simulation, or cochlear implant simulation. This hearing loss simulator will be a portable, real-time device that a person can use in any sound environment. [Work supported by a grant from NIH/NIDCD].

Audio System for fMRI

Functional magnetic resonance imaging (fMRI) is being used extensively to produce images of brain activity while a subject actively performs a task or receives sensory stimulation. The process of acquiring fMRI brain images in response to auditory stimulation is especially difficult because of both the background noise generated by the fMRI equipment and the intense magnetic field, which excludes the use of many materials commonly used in audio devices. In this project we are developing a complete audio system for research and clinical applications requiring high-quality audio stimulation. This goal will be achieved by using a combination of a circumaural muff with active noise reduction and an insert receiver/earplug, with stimuli delivered by small electrostatic transducers. [Work supported by a grant from NIH/NIDCD and done in collaboration with the Massachusetts Eye and Ear Infirmary].

Hand-Operated Speech Synthesis

Many people who lose their voice due to trauma or cancer rely on a text-to-speech (TTS) system or an electrolarynx (EL) to produce speech. State-of-the-art TTS systems can provide intelligible and somewhat natural speech at a rate as fast as the user can type, while an EL can produce speech at a normal rate but with decreased naturalness and intelligibility. This research explores a person's potential for controlling the HLsyn speech synthesizer in real time with one hand using a pen-like device, a strategy that could produce intelligible speech at a normal rate that is more natural than either a TTS system or EL speech. [Work supported by a grant from NIH/NIDCD].



Portable Voice Accumulator

The most common voice disorders are likely a result of abusive vocal behavior patterns, yet these disorders are difficult to assess and effectively rehabilitate because clinicians must rely on a patient's self-report of voice use, which is subjective and potentially error-prone. The Portable Voice Accumulator (PVA) is an ambulatory monitoring and feedback system developed to record 12 hours of a patient's voice use and to provide biofeedback to the patient in real time. The PVA records estimates of fundamental frequency and sound pressure level eight times per second, which can be used to produce graphical summaries of the patient's daily voice use. Additionally, the PVA's capability to provide biofeedback may enhance patient compliance with instructions on voice use given during voice therapy. Currently, PVA prototypes are being tested on wearers with normal voices and with voice disorders. In the long term, clinical use of the PVA is expected to improve the diagnosis and treatment of the most common types of voice disorders in which the identification of harmful patterns of voice use and/or compliance with vocal retraining are considered critical elements of effective management. The PVA will also provide unprecedented research capability to directly investigate and test basic theories about the etiology and treatment of voice disorders. [Work supported by a grant from NIH/NIDCD].
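The frame-level logging described above (fundamental frequency and sound pressure level, eight estimates per second) can be sketched as follows. This is only an illustration under stated assumptions: the text does not say how the PVA computes its estimates, so the autocorrelation pitch estimator, the function names, the 60 to 500 Hz search range, and the assumption that samples are calibrated pressure values in pascals are all hypothetical.

```python
import math

P_REF = 20e-6  # standard reference pressure for dB SPL, in pascals


def spl_db(frame):
    """Sound pressure level of a frame of pressure samples (pascals)."""
    rms = math.sqrt(sum(x * x for x in frame) / len(frame))
    return 20.0 * math.log10(max(rms, 1e-12) / P_REF)  # floor avoids log(0)


def f0_autocorr(frame, fs, fmin=60.0, fmax=500.0):
    """Crude F0 estimate: lag of the largest autocorrelation value in range."""
    lag_min = int(fs / fmax)
    lag_max = min(int(fs / fmin), len(frame) - 1)
    best_lag, best_val = lag_min, float("-inf")
    for lag in range(lag_min, lag_max + 1):
        val = sum(frame[i] * frame[i + lag] for i in range(len(frame) - lag))
        if val > best_val:
            best_lag, best_val = lag, val
    return fs / best_lag


def summarize(signal, fs, frames_per_second=8):
    """Return one (F0 in Hz, SPL in dB) pair per non-overlapping frame."""
    hop = fs // frames_per_second
    return [
        (f0_autocorr(signal[s:s + hop], fs), spl_db(signal[s:s + hop]))
        for s in range(0, len(signal) - hop + 1, hop)
    ]
```

At eight estimates per second, a 12-hour recording yields 12 × 3600 × 8 = 345,600 such pairs, which is small enough to store and plot as a daily summary. A practical device would also need a voicing decision so that silent or unvoiced frames are not assigned a pitch; that step is omitted here.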

Advanced Hearing Protectors

Hearing protectors are effective when properly used. But workers often resist using them, partly because they interfere with their ability to hear important sounds and to communicate with others. This project is aimed at developing advanced hearing protectors that combine maximal attenuation of ambient sounds with signal processing that extracts the most important components of the sound field for controlled presentation to the user. By processing the signals from a microphone array mounted on the headband of the hearing protector, this device enhances desired signals from a specified "look" direction relative to signals from other directions, allowing face-to-face acoustic communication in many high-noise environments where it would be impossible otherwise. In addition, the signal processing is designed to preserve sound localization ability. [Work supported by a grant from CDC/NIOSH].
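The idea of favoring a "look" direction with a microphone array can be illustrated with the simplest textbook technique, a delay-and-sum beamformer. The text does not say which array-processing method the project uses, so this is a generic sketch rather than the project's algorithm: the linear array geometry, the far-field assumption, the whole-sample delays, and all function names are assumptions made for illustration. It also does nothing to preserve localization cues.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s, approximate value for air at room temperature


def steering_delays(mic_positions_m, look_angle_deg, fs):
    """Whole-sample arrival delays for a far-field source on a linear array.

    look_angle_deg is measured from broadside (0 = straight ahead), positive
    toward increasing microphone position. The earliest microphone gets 0.
    """
    s = math.sin(math.radians(look_angle_deg))
    raw = [-(x * s / SPEED_OF_SOUND) * fs for x in mic_positions_m]
    earliest = min(raw)
    return [round(r - earliest) for r in raw]


def simulate_arrival(source, delays):
    """Hypothetical test helper: the source as seen at each microphone,
    delayed by a whole number of samples and zero-padded at the start."""
    return [[0.0] * d + list(source[:len(source) - d]) for d in delays]


def delay_and_sum(channels, delays):
    """Advance each channel by its steering delay, then average the channels."""
    n_out = min(len(ch) - d for ch, d in zip(channels, delays))
    return [
        sum(ch[n + d] for ch, d in zip(channels, delays)) / len(channels)
        for n in range(n_out)
    ]
```

When the steering delays match the true arrival delays, the channels line up and the average reproduces the source. A sound arriving from another direction is averaged with mismatched delays, so its components partially cancel and it is attenuated relative to the look direction.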

Haptic Display of Space

Spatial perception via touch is a primary compensation for the capabilities lost when vision is diminished. For example, the long cane is an effective travel aid for the visually impaired primarily because the forces and torques sensed through the handle are a natural means whereby users may infer the layout of the environment.  Such haptic cues have not been explicitly incorporated into previous electronic travel aids.  In our research we are developing a novel class of electronic travel aids that utilize the cues that make the cane so effective and intuitive.  The core of this research deals with psychophysical study of force and torque cues and the application of the derived haptic coding principles to the display of spatial information. These results will apply not only to travel aids but will also enable improvements in human-computer interfacing for applications such as virtual environments. [Work supported by a grant from NIH/NEI]. 

Computer-Based Instruction in Cued Speech

Cued Speech is a method of communication between hearing and deaf people that has been demonstrated to be highly effective. Cued Speech uses a system of manual cues, in the form of hand shapes and placements, that disambiguate confusable lip shapes. This project is aimed at developing computer-based instruction in Cued Speech. While it will not completely eliminate the need for one-on-one instruction, this instructional software will provide a learner with the basic elements of Cued Speech along with endless, easy-to-use opportunities for practice. [Work supported by a grant from NIH/NIDCD].