WORK BY NIST RESEARCHER SUGGESTS PERFORMANCE BREAKTHROUGH IN SPEAKER RECOGNITION.

A possible performance breakthrough by a NIST scientist involving significant reduction in error rates was reported at the 2001 Speaker Recognition Workshop, held in May 2001, in Linthicum, MD. Hosted by NIST, the workshop reviewed the recently concluded 2001 Speaker Recognition Evaluation. Twelve academic and industrial research organizations participated: six from the United States, three from France, and one each from Spain, India, and Australia. The evaluation covered several basic tasks involved in text-independent speaker recognition and included eight different tests. Sites achieving the best scores on the tests were noted, although differences between competing systems were sometimes small.
NIST researchers gave three presentations at the workshop, analyzing performance results for different parts of the evaluation. Most of the data used in the evaluation were excerpts from the Switchboard Corpora of conversational telephone speech, generated at NIST. A new Switchboard Cellular Corpus was used at the workshop, marking the first time that cellular telephone data has been used in a speaker recognition evaluation. Such cellular data will serve as primary data in the next evaluation.

In a possible performance breakthrough based on results of preliminary work, a NIST researcher showed that much useful information for characterizing speakers could be found in longer-term speech characteristics, particularly the frequent usage of certain words or phrases. He showed that such "idiolectal" features of speech, obtainable from word transcripts, even errorful transcripts produced by automatic speech recognizers, could greatly enhance performance. For this task, he used test segments consisting of entire conversation sides (taken from conversations of 5 to 10 minutes each) and training data for each speaker consisting of several, preferably at least eight, such conversation sides. To further explore the use of idiolectal characteristics of speakers in speaker recognition, a new extended data one-speaker detection task was included in this year's evaluation.
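As a rough illustration of how word usage in a transcript could be turned into a speaker score, here is a minimal sketch. The article does not describe the researcher's actual method, so everything here is an assumption made for illustration: the use of word bigrams, the add-one smoothing, the function names (`train_counts`, `llr_score`), and the toy transcripts.

```python
from collections import Counter
import math


def bigrams(words):
    """Return the list of adjacent word pairs in a word sequence."""
    return list(zip(words, words[1:]))


def train_counts(transcripts):
    """Count word bigrams over a list of transcript strings."""
    counts = Counter()
    for text in transcripts:
        counts.update(bigrams(text.lower().split()))
    return counts


def llr_score(test_transcript, speaker_counts, background_counts, vocab_size):
    """Average per-bigram log-likelihood ratio of speaker model vs. background model.

    Probabilities use add-one smoothing (an assumption of this sketch).
    A higher score means the test transcript looks more like the speaker.
    """
    test = bigrams(test_transcript.lower().split())
    if not test:
        return 0.0
    s_total = sum(speaker_counts.values())
    b_total = sum(background_counts.values())
    score = 0.0
    for bg in test:
        p_speaker = (speaker_counts[bg] + 1) / (s_total + vocab_size)
        p_background = (background_counts[bg] + 1) / (b_total + vocab_size)
        score += math.log(p_speaker / p_background)
    return score / len(test)


# Toy data (hypothetical): two "conversation sides" from one speaker, two from others.
speaker_a = [
    "you know i mean it was like you know really great",
    "you know i mean that is you know what i think",
]
others = [
    "well i suppose it was fine",
    "i think that is right i suppose",
]

speaker_model = train_counts(speaker_a)
background_model = train_counts(speaker_a + others)
vocab = len(background_model)  # number of distinct bigram types seen

score_target = llr_score("you know i mean you know", speaker_model, background_model, vocab)
score_other = llr_score("well i suppose that is right", speaker_model, background_model, vocab)
print(score_target, score_other)
```

In this toy setup the segment that reuses the speaker's habitual phrases scores higher than the one that does not. With only a few short transcripts the counts are far too sparse to be meaningful, which is consistent with the article's point that the approach relies on several full conversation sides of training data per speaker.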
For this evaluation, systems were provided with much larger amounts of training and test data and with word transcripts generated by an automatic speech recognizer. Speaker detection performance was evaluated by measuring the correctness of detection decisions by the systems. These decision scores were used to produce error trade-off curves in order to see how misses may be traded off against false alarms. Two sites, MIT-Lincoln Laboratory and R523 (DoD), produced systems for this evaluation. Their performance results were quite impressive, reducing previously seen error rates on such data by up to an order of magnitude. This work is an exciting development that could have significant applications and that brings together emerging speech recognition and speaker recognition technologies. More information about the speaker recognition program is available on the web at http://www.nist.gov/speech/tests/spk/index.htm.
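The trade-off between misses and false alarms mentioned above can be sketched by sweeping a decision threshold over a system's scores. This is a minimal illustration under stated assumptions, not the evaluation's actual scoring software: the function name `error_tradeoff`, the convention that a trial is accepted when its score is at or above the threshold, and the score values are all hypothetical.

```python
def error_tradeoff(target_scores, nontarget_scores):
    """Return (threshold, miss_rate, false_alarm_rate) for each candidate threshold.

    A trial is accepted when its score is >= threshold (assumed convention).
    Miss rate: fraction of true-speaker trials rejected.
    False alarm rate: fraction of impostor trials accepted.
    """
    thresholds = sorted(set(target_scores) | set(nontarget_scores))
    points = []
    for th in thresholds:
        miss = sum(s < th for s in target_scores) / len(target_scores)
        false_alarm = sum(s >= th for s in nontarget_scores) / len(nontarget_scores)
        points.append((th, miss, false_alarm))
    return points


# Hypothetical scores: higher means "more likely the claimed speaker".
target = [2.1, 1.7, 0.9, 0.4]
nontarget = [1.0, 0.3, -0.2, -0.8, -1.5]

points = error_tradeoff(target, nontarget)
for th, miss, fa in points:
    print(f"threshold={th:5.1f}  miss={miss:.2f}  false_alarm={fa:.2f}")
```

Raising the threshold lowers the false alarm rate at the cost of more misses, and plotting one rate against the other gives the kind of trade-off curve the article refers to.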