Title

Linguistic Cues And Memory For Synthetic And Natural Speech

Abstract

Past research has demonstrated that there are cognitive processing costs associated with comprehension of speech generated by text-to-speech synthesizers, relative to comprehension of natural speech. This finding has important performance implications for the many applications that use such systems. The purpose of this study was to ascertain whether certain characteristics of synthetic speech slow on-line, real-time cognitive processing. Whereas past research has focused on the phonemic acoustic structure of synthetic speech, we manipulated prosodic, syntactic, and semantic cues in a task requiring participants to recall sentences spoken either by a human or by one of two speech synthesizers. The findings were interpreted to suggest that inappropriate prosodic modeling in synthetic speech was the major source of a performance differential between natural and synthetic speech. Prosodic cues, along with others, guide the parsing of speech and provide redundancy. When these cues are absent or inaccurate, the additional burden placed on working memory may exceed its capacity, particularly in time-limited, demanding tasks. Actual or potential applications of this research include improvement of text-to-speech output systems in warning systems, feedback devices in aerospace vehicles, educational and training modules, aids for the handicapped, consumer products, and technologies designed to increase the functional independence of older adults.

Publication Date

1-1-2000

Publication Title

Human Factors

Volume

42

Issue

3

Pages

421-431

Document Type

Article

Personal Identifier

scopus

DOI Link

https://doi.org/10.1518/001872000779698132

Scopus ID

0034529786 (Scopus)

Source API URL

https://api.elsevier.com/content/abstract/scopus_id/0034529786
