ejTalk Logo
Home Tech Service Demo Member [About]
Company People News Careers [FAQ] Contact
Business [Technical]

Technical

Is ejTalkCM a chatterbot program? Can I buy it and talk to it? Does it learn?

The ejTalk VoiceXML demos seem to paraphrase the system utterances. They are not exactly the same even if I return to the exact same point in the conversation. Why?

Why does ejTalk use Text-to-Speech (TTS) in its synthetic conversations?

How is the ejTalk approach to conversation different than packaged behaviors such as Nuance SpeechObjects or SpeechWorks DialogModules?

Is ejTalkCM a chatterbot program? Can I buy it and talk to it? Does it learn?

  • No, it is not a chatterbot.
  • It is a collection of tools and methods that produce conversational interfaces in predetermined domains. Those interfaces benefit from a base of underlying conversational movements that do not have to be written in at every point in the conversation definition. In addition to the base behaviors, we strive to parameterize a domain so that it can converse as naturally as possible while receiving its content in a more formal format (e.g., fields from a database). Our approach assumes a very simple interface to the ASR/TTS environment, so it is readily adapted to VoiceXML, SALT, and even proprietary voice platforms.
  • At present there are no shrink-wrapped developer tools that accept novel domain information and build a system that talks about that domain. All of the current systems for conversation design are either low-level procedural programming languages or colorful drag-and-drop programs that transliterate the low-level instructions into icons. But all these systems ignore the real obstacle to designing a conversation: the combinatorial explosion of minutiae that is unavoidable as the conversation is made longer, broader or deeper. That said, we do believe that there are ways to parameterize sections of a conversation so that they can be invoked with a formal notation yet maintain a natural, human-like variability.
  • Learning is another problem. We are researching a potential tool that will allow us to "train" parts of a conversation. This is similar to ASR, where the recognizer is "trained" from a large set of known utterances. The justification for doing this is similar too: humans may not be able to codify the rules of speech, but they immediately know good speech when they hear it.
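
To make the idea of "content in a formal format, spoken naturally" concrete, here is a minimal sketch in Python. It is not ejTalk's implementation. The flight-status domain, the field names, the templates and the render function are all hypothetical, chosen only to show a database-style record being turned into one of several natural phrasings.

```python
import random

# Hypothetical phrasing templates for an invented "flight status" domain.
# Each template names the same three fields, so any of them can voice
# the same formal record.
TEMPLATES = [
    "Flight {flight} to {city} leaves at {time}.",
    "The {time} flight to {city} is number {flight}.",
    "Your flight, {flight}, departs for {city} at {time}.",
]


def render(record, rng=random):
    """Turn one formal record (a dict of fields) into a spoken sentence."""
    template = rng.choice(TEMPLATES)
    return template.format(**record)


# A record as it might arrive from a database row (made-up values).
record = {"flight": "UA 212", "city": "Boston", "time": "4:15 PM"}
print(render(record))
```

The conversation definition only has to say "report this record"; the wording is chosen when the sentence is spoken, so no single phrasing is written in at each point of the dialog.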

The ejTalk VoiceXML demos seem to paraphrase the system utterances. They are not exactly the same even if I return to the exact same point in the conversation. Why?

  • This variability is intentional.
  • Humans tend to operate at a meta level and fill in the specifics on the fly. When we walk somewhere we establish a destination (or sub-destination) and then adaptively cope with obstacles, terrain and dynamic elements in real time. We do much the same in speech. Just think back: how many times have you stopped speaking because you were searching for a better word or phrase to characterize a meaning? Most times the phrase selection task is not very challenging and the words just flow, and at those times it seems as if we are actually thinking in words.
  • Also, when we are asked to repeat something, most of us will automatically rephrase what we originally said. We know that a different choice of words, presented to a listener who believes that we have not changed our original meaning, gives that listener a useful parallax on the intended meaning. We view variability as a small subconscious sign of cooperation.
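
One simple way to get this kind of behavior is sketched below in Python. This is an assumption about how such variability could be implemented, not a description of the ejTalk demos. The VariedPrompt class and its variants are hypothetical. The sketch keeps several phrasings of one meaning and never repeats the phrasing it used last, so returning to the same point in the conversation yields a rephrasing.

```python
import random


class VariedPrompt:
    """Several phrasings of one meaning; never the same one twice in a row."""

    def __init__(self, variants, rng=None):
        if len(set(variants)) < 2:
            raise ValueError("need at least two distinct variants")
        self.variants = list(variants)
        self.rng = rng or random.Random()
        self.last = None  # phrasing used on the previous visit

    def next(self):
        # Exclude the previous phrasing so a repeat visit is rephrased.
        choices = [v for v in self.variants if v != self.last]
        self.last = self.rng.choice(choices)
        return self.last


ask_city = VariedPrompt([
    "Which city are you leaving from?",
    "Where will you be departing from?",
    "What is your departure city?",
])
print(ask_city.next())
print(ask_city.next())  # always differs from the line above
```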

Why does ejTalk use Text-to-Speech (TTS) in its synthetic conversations?

  • Primarily because the richness of a real conversation is not accomplished with a collection of canned recordings. To seem natural, a dialog should exhibit some degree of variability. This leads directly to a second question:
  • Why not assemble recorded snippets of phrases and words to create some variation? This raises other problems.
    • Joining separately recorded phrases and words is not a simple task. The joins must be matched at the boundaries for pitch, rate, co-articulation, etc. If not crafted carefully, the results are often jarring. And the workload grows with the number of different things being said.
    • Assuming that the acoustic joining can be done successfully, then consider the future.
  • TTS is getting better all the time. It is not human quality yet, but it has been improving quite a bit lately. One of our problems with the various TTS systems is the choice that must be made between sounding acoustically human OR controlling the prosodic features (HOW someone says something).
  • Eventually the demand for new content, as well as completely new applications, will force the voice industry to abandon the recorded scenario and embrace a totally synthetic one.

How is the ejTalk approach to conversation different than packaged behaviors such as Nuance SpeechObjects or SpeechWorks DialogModules?

  • The packaged behaviors are intended to encapsulate a specific voice task and provide an easy-to-manage interface for the application developer. The designer of the packaged behavior provides a calling interface. For example, getSSN would manage all of the dialog strategies of collecting, validating, helping, correcting and confirming a Social Security Number, and would return a "good" or "failure" response. This paradigm presumes that the application developer does not know what happened while inside the behavior, only whether it worked or not. In a computer programmer's terminology this would be called a procedural subroutine.
  • At ejTalk we believe that how an application (or segment of an application) manages the collecting, validating, helping, correcting and confirming defines the personality of the application. When packaged behaviors that each manage these tasks in their own way are combined in one application, the personality can change from one task to the next. So, to avoid exposing the end user to this "multiple personality disorder" problem, we have chosen to encapsulate the basic conversational strategies into a hierarchy of conversational behaviors. These behaviors, which are independent of the details of a specific conversation, are defined in the most basic layer (for instance, the simple exchange of "thanks" and "you're welcome"). They are automatically available to the layers above. This paradigm presumes that the application developer wants to build on top of a consistent behavior base AND (this is important) that if the behavior base is improved, then all the parts built on that base will exhibit the improvements automatically. In a programmer's terminology this would be called an object-oriented class.
  • As applications become much larger and more sophisticated, end users will find that the consistency helps them predict an application's personality (or perhaps lets them avoid noticing any jarring inconsistencies). And for application developers it provides a much cleaner, less error-prone design cycle as well as a higher level of representation.
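
The contrast between the two paradigms can be sketched in Python. Everything here is illustrative: get_ssn stands in for the getSSN example above, while BaseBehavior and CollectField are hypothetical names, not ejTalk classes or any vendor's API. The subroutine reports only an outcome; the class hierarchy puts shared moves such as "thanks" in a base layer that every higher layer inherits.

```python
def get_ssn(utterances):
    """Packaged style: the caller learns only 'good' or 'failure'."""
    for spoken in utterances:
        digits = spoken.replace("-", "")
        if digits.isdigit() and len(digits) == 9:
            return "good"
    return "failure"


class BaseBehavior:
    """Lowest layer: conversational moves shared by every dialog."""

    def respond(self, utterance):
        text = utterance.lower().strip()
        if text in ("thanks", "thank you"):
            return "You're welcome."
        if text == "help":
            return self.help_text()
        return None  # not a base-level move; let a higher layer handle it

    def help_text(self):
        return "You can say 'help' at any time."


class CollectField(BaseBehavior):
    """Higher layer: collects one field, inheriting the base moves."""

    def __init__(self, field_name, validator):
        self.field_name = field_name
        self.validator = validator
        self.value = None

    def respond(self, utterance):
        reply = super().respond(utterance)  # try the base moves first
        if reply is not None:
            return reply
        if self.validator(utterance):
            self.value = utterance
            return f"Got it, your {self.field_name} is {utterance}."
        return f"Sorry, that is not a valid {self.field_name}."

    def help_text(self):
        return f"Please say your {self.field_name}."


zip_code = CollectField("zip code", lambda s: s.isdigit() and len(s) == 5)
print(zip_code.respond("Thanks"))
print(zip_code.respond("02139"))
```

If the handling of "thanks" in BaseBehavior is later improved, every CollectField (and any other subclass) picks up the improvement without being edited, which is the property the answer above emphasizes.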




November 18, 2003 Copyright © 1997 -  2003 ejTalk