What Course Do We Steer?
The computer speech industry should be a raging success. It offers 24/7/365 convenience, measurable ROI, and it appeals to our sci-fi dreams, so what's wrong? Why hasn't it become an obvious necessity to everyone in the civilized world?
What is the shape of this industry? Every player stands somewhere within the boundary of this business, but how do we see it? Do we see the Milky Way or a spiral galaxy? It may be helpful to think of it like another tech-driven, content-delivery industry from the past. One we do find indispensable. The car.
The car began as a cottage industry energized by inventors and/or wealthy enthusiasts. These people saw the car as a future certainty. (Before you nod knowingly, remember that there were about as many similar folks who were developing perpetual motion gadgets, and they felt a sense of certainty, too.) These car people crafted horseless carriages. They even sold a few.
When speech started to become commercial in the early 1990s, it seemed possible that one or two companies might be the entire speech industry. Maybe a phone company and/or a mainframe manufacturer. But then, toward the mid-1990s, this business spurred a lot of academics and wealthy VCs to do their high-tech, high-stakes version of a cottage industry. Lots of speech recognition and text-to-speech companies emerged with products. They even sold a few.
At this point, was either the car or speech technology a real industry? Let's leave that for later. Certainly, neither of them was affordable to the masses. They missed the mass market. They needed economies of scale. And they needed a clearer value proposition.
Even when some manufacturing pioneers addressed the economies of production, the nagging problem of the value proposition was not nearly as obvious as it is with hindsight. The car didn't seem quite so practical because important parts of the industry had not yet been developed. The car alone did not solve enough of the problem. The infrastructure of good roads, service stations, maps, etc. did not exist and so the car was still an oddity that was difficult to sell. It really wasn't much better than the horse. Horses usually did a better job navigating muddy roads. And to this day, horses have a much better auto pilot system. Fortunately, there were those kooky early adopters...
Economics professors tell us that markets can be smart. The mass market realized that the car industry was not complete. The markets knew that the shape of the car industry had to include lots of good roads, service stations, maps, parking spaces, etc. Today, all the pieces are in place and we can't imagine life without our cars. Cars assist us with thousands of specific problems in our daily lives. Soon, after (if?) we address the missing parts of the speech industry, we won't be able to live without our synthetic talking agents either. What's missing in this industry? Why is it so out of shape?
During the heady VC days of the mid to late 1990s, many startups thought they could own the speech industry. The mantra was "first to market" and "market share". Many of these startups tackled speech recognition since it seemed to be the hardest (and sexiest) part of the problem. They thought that if they owned that technology then they would own the speech industry.
What value did the mass market perceive? They enjoyed the TV news stories and magazine articles touting the cool world of the future.
A little later, some new startups tackled the market at a higher level. They thought in terms of creating an assistant, a persona, the beginnings of a character with some rudimentary personality. They also focused on some specific and straightforward functionality: call management, resource scheduling, CRM, etc. These were still relatively expensive functions, so they were targeted specifically to business (no-nonsense, high-cost). These startups even sold a few systems.
But what value for the mass market? The demos on the TV news "technology" segments were getting a little more entertaining.
Up to this point, all the deployed (and proposed) systems required building a large (reduced from what was previously huge) custom infrastructure for the application. Imagine if, when you bought a car, you needed to pay for the roads and gas stations from your house to your employer, the mall, grandmother's house, etc.
A few people got the idea that the infrastructure should be a cost distributed over all of the individual content deliverers (much like road cost is distributed to all vehicle operators). An initial wave of voice portals arrived. This led to a flood of smaller voice portal startups. All of them are reasonably compatible with the spirit and letter of a voluntary industry specification known as VoiceXML. Now we have some roads and gas stations. Surely now we can build cars that people can afford to drive.
Now that these portals have been around for more than a year, everyone must be using them to do all sorts of interesting things. You must know at least ten people who call a portal to deliver content on a frequent basis. No? Not even as a member of the speech industry? What gives?
Let's revisit the origins of cars. How did one drive a horseless carriage at the dawn of the industry? Every car manufacturer had its own idea about controls for the driver. But, in the beginning, most cars used a tiller to steer, much like a child steering a toy wagon. It was simple to implement, and it was an extension of the familiar steerable front axle on a horse-drawn carriage. It was counterintuitive (you have to push it in the direction opposite to the one you want to go, so it took some training), but it did steer. Of course, cars have universally had steering wheels for the last 90 or so years. Steering wheels are harder to implement, but there is something so resonant about having your upper body rotate in the direction you intend to turn. Just observe small children: they get the idea of steering a toy car much sooner than tillering a wagon. Without belaboring the point, other controls for the throttle, power transmission, braking, etc. began with solutions that were functional but were designed along historic (e.g. carriage brake levers) lines.
Turning and leaning in the direction that we want to walk is an absolutely innate kinetic semantic. Talking is just as innate a skill as walking. People want the voice equivalent of "turning in the direction they want to go." Unfortunately, most voice portal applications have tillers (mostly from the horse-drawn IVR days). Ostensibly, this is to make them more familiar to the users (or maybe it is just to make them easier for the developers to build). Remember, all cars had steering wheels by the time the car industry got hot.
This is not to say that the car emerged fully formed, but when its design crossed a certain threshold of naturalness, it became a self-sustaining market. Infrastructure and cars powered each other's growth. The speech industry is not at that threshold yet. Once the speech infrastructure gets a basic voice steering wheel (no need for power steering or a padded leather version), it will be at the cusp of an expanding market. The sales staff's focus will not be to find customers but to better fulfill customer needs and capture the dollars that they intend to spend.
Vehicles sold to the masses are quite different from the vehicles used by business. Smoothly transitioning from a business voice product to a mass market consumer product may not be feasible.
Someone needs to think about steering in the future.
Emmett Coin
February 15, 2002