Natural vs. UnNatural Language
One would think that in the speech industry people would use their important technical terminology with precision and care. After all, in speech, words are the industry.
But paradoxically, the speech industry cares little about its own technical terms.
I was forced into this unhappy conclusion after moving into speech over a year ago, having spent the previous 20 years in the text-based world of computational linguistics. In the intertwined disciplines of linguistics and computer science, the term ‘natural language’ has one clear and distinct meaning. In each of these disciplines, a natural language is simply one that arises naturally in the world, such as English, Russian or Pashto.
Natural languages are distinct from their complement, denoted by the term ‘artificial language’. An artificial language is any language that has been consciously engineered, such as Esperanto, predicate calculus, or C++ (though I am personally tempted to refer to the last as an unnatural language).
Upon entering the speech recognition field, I was immediately struck by the fact that the term ‘natural language’ has been prostituted to mean anything that the marketing directors want it to mean. It strikes me as odd, if not offensive, that the ability to either say or enter a digit on my touchtone phone to navigate through an IVR (Interactive Voice Response) system is somehow perceived as an application of natural language. Even the ability to say ‘yes’ or ‘no’ to a computer and be understood does not necessarily qualify as natural language.
Of course, I will not take the extreme position that a system is not using natural language until it can think and speak as freely as we featherless bipeds. The Turing test has not yet been passed by any text-based AI system, and yet there is a healthy field of language science and engineering called ‘computational linguistics’.
Similarly, I will not attempt the hopeless task of defining when a speech system does or does not employ natural language. That would be just as pointless as trying to give a rigorous definition of when blue stops and when violet begins in a spectrum. But just as there is a broad area where blue is not in question (cultural differences aside), there is an equally broad area where certain systems are clearly not in the realm of natural language.
Another crucial term in the field that still has no commonly accepted definition is ‘mixed initiative.’ Strangely, its polar opposite, ‘directed dialogue’, does have a commonly understood meaning. In directed dialogue, the system is always in control: it prompts the user for information such as airline name, flight number, departure time, and so on, and the user always responds with the requested information.
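To make the directed-dialogue pattern concrete, here is a minimal Python sketch. It is only an illustration: the slot names follow the airline example above, while the prompt wording, the function name `run_directed_dialogue`, and the use of a plain list of strings in place of recognized speech are all assumptions of mine.

```python
# Hypothetical slots and prompts, based on the airline example in the text.
PROMPTS = [
    ("airline", "Which airline?"),
    ("flight_number", "What is the flight number?"),
    ("departure_time", "What is the departure time?"),
]


def run_directed_dialogue(answers):
    """System-led dialogue: one prompt, one answer, always in the same order.

    `answers` stands in for the user's recognized replies.
    """
    replies = iter(answers)
    form = {}
    for slot, prompt in PROMPTS:
        print(prompt)
        # Whatever the user says is taken as the value of the slot just requested.
        form[slot] = next(replies)
    return form
```

The user never steers this loop. The system decides what is asked and in what order, which is exactly what makes the dialogue ‘directed’.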
In contrast, ‘mixed initiative’ does not have a single interpretation that is commonly accepted in the field. Somehow, the term ‘mixed initiative’ means to some people that the user is allowed to provide more than one atomic information element in a single utterance. I have heard one industry expert refer to just this definition at a major speech conference. He went on to provide an example whereby a single utterance allowed for the binding of three semantic slots. While the definition is not totally clear, the W3C (http://www.w3.org/TR/2001/WD-voicexml20-20011023/#dml2.1.5) seems also to allow this narrow interpretation when it says that mixed initiative can occur when more than one field in a form can be filled in any order by one utterance.
However, the W3C’s definition is thankfully broader than this. As the term itself implies, true mixed initiative is possible only when either the system or the user can take the initiative to direct the course of the dialogue. Thus, if the system has prompted the user for airline information, the user responds with a request for a hotel reservation, and the system responds appropriately, then we can say that this system supports mixed initiative. In my humble opinion, this is the only valid interpretation of the term ‘mixed initiative’. The W3C would do well to tighten up their definition slightly so as to eliminate the “multiple fields in any order” interpretation of the term.
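The difference between the two readings can be sketched in a few lines of Python. This is a toy under stated assumptions, not a description of any real system or of the VoiceXML specification. The form names, the regular expressions standing in for a recognizer grammar, and the functions `fill_slots` and `respond` are all hypothetical.

```python
import re

# Hypothetical forms with toy keyword patterns; a real system would use
# a speech recognizer and grammar here.
FORMS = {
    "flight": {
        "airline": re.compile(r"\bon (\w+ air)\b"),
        "flight_number": re.compile(r"\bflight (\d+)\b"),
        "departure_time": re.compile(r"\bat (\d+ ?[ap]m)\b"),
    },
    "hotel": {
        "city": re.compile(r"\bhotel in (\w+)\b"),
    },
}


def fill_slots(form_name, utterance):
    """Narrow reading: bind every slot of ONE form that the utterance mentions."""
    text = utterance.lower()
    found = {}
    for slot, pattern in FORMS[form_name].items():
        match = pattern.search(text)
        if match:
            found[slot] = match.group(1)
    return found


def respond(active_form, utterance):
    """Broad reading: the user may redirect the dialogue to a different form."""
    text = utterance.lower()
    for form_name in FORMS:
        if form_name != active_form and form_name in text:
            # The user has taken the initiative, so the system follows.
            return form_name, fill_slots(form_name, utterance)
    # Otherwise stay on the task the system was already pursuing.
    return active_form, fill_slots(active_form, utterance)
```

Calling `fill_slots("flight", "Flight 212 on Acme Air at 9 am")` binds three slots from one utterance, yet the system is still the only party choosing the task. Only `respond("flight", "I need a hotel in Boston")`, which switches the active form to the hotel task, shows the user taking the initiative.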
Usage of technical terms is extremely important in any field of science and engineering. If critical terms are not standardized, then confusion reigns when professionals try to discuss and debate their field. Similarly, potential clients and users of speech systems cannot reasonably compare different companies’ offerings when those companies claim support for ‘natural language’ and ‘mixed initiative’ but do not use those terms in the same way. Marketers actually do their own companies an injustice when they wrest these technical terms out of the hands of scientists and engineers for their own purposes. The misuse of technical terminology undermines the advance of the field, ultimately harming all companies as well as their customers.
I call upon all of the techies of the world to fight back. Insist that marketing directors make claims that are consistent with the technology. Insist that your terms are used to mean what they mean. Educate the sales and marketing departments as to what these terms mean. In doing so, you will advance your own profession.
Kurt Godden, Ph.D.
Senior Staff Research Scientist (Computational Linguistics)
General Motors
September 18, 2002