
Novel Neural Net Recognizes Spoken Words Better Than Human Listeners



http://www.sciencedaily.com/releases/1999/10/991001064257.htm

Source: University Of Southern California (http://www.usc.edu/)

Date: Posted 10/1/99


Machine demonstrates superhuman speech recognition abilities.

University of Southern California biomedical engineers have created
the world's first machine system that can recognize spoken words
better than humans can. A fundamental rethinking of a
long-underperforming computer architecture led to their achievement.

The system might soon facilitate voice control of computers and other
machines, help the deaf, aid air traffic controllers and others who
must understand speech in noisy environments, and instantly produce
clean transcripts of conversations, identifying each of the
speakers. The U.S. Navy, which listens for the sounds of submarines
in the hubbub of the open seas, is another possible user. Potentially,
the system's novel underlying principles could have applications in
such medical areas as patient monitoring and the reading of
electrocardiograms.

In benchmark testing using just a few spoken words, USC's Berger-Liaw
Neural Network Speaker Independent Speech Recognition System not only
bested all existing computer speech recognition systems but
outperformed the keenest human ears.

Neural nets are computing devices that mimic the way brains process
information. Speaker-independent systems can recognize a word no
matter who or what pronounces it. No previous speaker-independent
computer system has ever outperformed humans in recognizing spoken
language, even in very small test bases, says system co-designer
Theodore W. Berger, Ph.D., a professor of biomedical engineering in
the USC School of Engineering.

The system can distinguish words in vast amounts of random "white"
noise, noise with amplitude 1,000 times the strength of the target
auditory signal. Human listeners can deal with only a fraction as
much. And the system can pluck words from the background clutter of
other voices, the hubbub heard in bus stations, theater lobbies and
cocktail parties, for example. Even the best existing systems fail
completely when as little as 10 percent of hubbub masks a speaker's
voice. At slightly higher noise levels, the likelihood that a human
listener can identify spoken test words is mere chance. By contrast,
Berger and Liaw's system functions at 60 percent recognition with a
hubbub level 560 times the strength of the target stimulus. With just
a minor adjustment, the system can identify different speakers of the
same word with superhuman acuity.
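
To put the noise figures above on the decibel scale engineers usually use, the ratios can be converted to signal-to-noise ratios. The sketch below is illustrative only: the release does not say whether "strength" refers to amplitude or power, so both readings are shown, and the function name snr_db is a hypothetical helper, not part of the USC system.

```python
import math


def snr_db(noise_to_signal_ratio: float, ratio_is_amplitude: bool = True) -> float:
    """Signal-to-noise ratio in dB when noise is `noise_to_signal_ratio` times the signal.

    Assumption: an amplitude ratio uses 20*log10, a power ratio uses 10*log10.
    """
    factor = 20.0 if ratio_is_amplitude else 10.0
    return -factor * math.log10(noise_to_signal_ratio)


# Ratios quoted in the article: white noise at 1,000x, hubbub at 560x.
for label, ratio in (("white noise", 1000.0), ("hubbub", 560.0)):
    as_amplitude = snr_db(ratio)
    as_power = snr_db(ratio, ratio_is_amplitude=False)
    print(f"{label}: {as_amplitude:.1f} dB if amplitude, {as_power:.1f} dB if power")
```

Read as an amplitude ratio, noise 1,000 times the signal corresponds to roughly -60 dB; read as a power ratio, it corresponds to roughly -30 dB.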

Berger and system co-designer Jim-Shih Liaw, Ph.D., achieved this
improved performance by paying closer attention to the signal
characteristics used by real flesh-and-blood brains in processing
information.

First proposed in the 1940s and the subject of intensive research in
the '80s and early '90s, neural nets are computers configured to
imitate the brain's system of information processing, wherein data are
structured not by a central processing unit but by an interlinked
network of simple units called neurons. Rather than being programmed,
neural nets learn to do tasks through a training regimen in which
desired responses to stimuli are reinforced and unwanted ones are not.

"Though mathematical theorists demonstrated that nets should be highly
effective for certain kinds of computation (particularly pattern
recognition), it has been difficult for artificial neural networks
even to approach the power of biological systems," said Liaw, director
of the Laboratory for Neural Dynamics and a research assistant
professor of biomedical engineering at the USC School of Engineering.

"Even large nets with more than 1,000 neurons and 10,000
interconnections have shown lackluster results compared with
theoretical capabilities. Deficiencies were often laid to the fact
that even 1,000-neuron networks are tiny, compared with the millions
or billions of neurons in biological systems." Remarkably, USC's
neural net system uses an architecture consisting of just 11 neurons
connected by a mere 30 links.

According to Berger, who has spent years studying biological
data-processing systems, previous computer neural nets went wrong by
oversimplifying their biological models, omitting a crucial dimension.

"Neurons process information structured in time," he explained. "They
communicate with one another in a 'language' whereby the 'meaning'
imparted to the receiving neuron is coded into the signal's timing. A
pair of pulses separated by a certain time interval excites a certain
neuron, while a pair of pulses separated by a shorter or longer
interval inhibits it."

"So far," Berger continued, "efforts to create neural networks have
had silicon neurons transmitting only discrete signals of varying
intensity, all clocked the way a computer is clocked, in beats of
unvarying duration. But in living cells, the temporal dimension, both
in the exciting signal and in the response, is as important as the
intensity."
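
The idea of interval coding that Berger describes can be illustrated with a toy example. This is a minimal sketch under stated assumptions, not the Berger-Liaw model: the function interval_response, the preferred interval, the tolerance and the pulse times are all hypothetical values chosen for illustration.

```python
def interval_response(spike_times, preferred_interval, tolerance):
    """Classify each consecutive pulse pair as exciting (+1) or inhibiting (-1).

    Toy rule (an assumption, not the published model): a pair excites the
    unit when its spacing is within `tolerance` of `preferred_interval`;
    any shorter or longer spacing inhibits it.
    """
    responses = []
    for earlier, later in zip(spike_times, spike_times[1:]):
        interval = later - earlier
        if abs(interval - preferred_interval) <= tolerance:
            responses.append(1)
        else:
            responses.append(-1)
    return responses


# Made-up pulse times in milliseconds: spacings of 10, 5 and 30 ms.
print(interval_response([0, 10, 15, 45], preferred_interval=10, tolerance=2))
# prints [1, -1, -1]
```

Only the first pair, spaced 10 ms apart, matches the unit's preferred interval; the shorter and longer spacings inhibit it, which is the behavior the quote describes.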

Berger and Liaw created computer chip neurons that closely mimic the
signaling behavior of living cells, those of the hippocampus, the
brain structure involved in associative learning. "You might say, we
let our cells hear the music," Berger said. Berger and Liaw's computer
chip neurons were combined into a small neural network using standard
architecture. While all the neurons shared the same
hippocampus-mimicking general characteristics, each was randomly given
slightly different individual characteristics, in much the same way
that individual hippocampus neurons would have slightly different
individual characteristics. The network created was then trained
using a procedure as unique as the neurons and again taken from the
biological model: a learning rule that allows the temporal properties
of the net connections to change.

The USC research was funded by the Office of Naval Research; the
Defense Department's Advanced Research Projects Agency; the National
Center for Research Resources; and the National Institute of Mental
Health. The university has applied for a patent on the system and the
architectural concepts on which it is based.

### 

RealVideo demonstration at:
http://www.usc.edu/ext-relations/news_service/real/real_video.html 

Editor's Note: The original news release can be found at
http://www.usc.edu/ext-relations/news_service/releases/stories/36013.html
