How many of us have ever tried to learn a foreign language? You get the books, you get the tapes, you get the software that listens to you repeat phrases. You think you have the vocabulary and the pronunciation down, and you can daydream some fairly interesting conversations with native speakers of that language. Then, emboldened, you venture out to that country (or that part of town) and you try out your new skill. At this point you find yourself in the same position that most voice-based applications find themselves in.
If you’re like me, your first forays into conversation in that new language broke down very rapidly. You were lucky to get past two exchanges before you were in uncharted conversational territory. All manner of unexpected things happened: odd pronunciations, slang, idiomatic phrases, unexpected domain shifts; the list goes on. This, by the way, is why today’s voice applications avoid real conversation with the human and, for the most part, just lead the human along, asking questions that must be answered in a restricted and controlled format.
But back to the issue: your imagined conversations are never quite like real conversations. What we’re missing in our imagined conversations is any hint of unexpected variation. I’m not talking about crazy non sequiturs or random leaps from one domain to another, but just the normal, subtle variations that always happen, even when the topic is as mundane as picking up your dry cleaning or depositing a check with the bank teller. It can be as simple as this: when you mention the weather, your conversational partner might respond with “yes it is cold,” or “boy it’s freezing,” or “yeah, I needed a hat.”
This might be an excellent job for a synthetic agent. The domains are reasonably narrow and very well defined; remember, we’re not inventing HAL 9000 here. Next-generation dialog managers, like ejTalker, can automatically inject plausible variability into the phrasing and flow of the conversation. These types of dialog managers also employ automatic behaviors, such as conversational ellipsis and confusion metrics, that improve the chances of repairing a conversation much as humans do. So even if things go badly from the human’s perspective, the exercise may still succeed, giving some measure of positive feedback. In fact, I think it might be real fun to practice one’s conversational skills with a patient and forgiving tutor. Of course, with present technology, this will not be as good as talking to a real human tutor, but it will be a definite step in the right direction. One I would take.
I wonder why language-teaching facilities, such as ESL (English as a Second Language) schools, have not at least explored this kind of technology. It seems like a natural fit.