Title: A test-bed for integration of different Humanized Interface Technologies (HIT).
Motivation:
A market for talking heads connected to knowledge bases is emerging. Many companies use them as sales representatives and question-answering devices, saving staff time and leaving only difficult questions for humans. They find applications in the e-commerce and m-commerce industries.
The next stage of PDA development leads to the integration of all PDA functionality in telephones. Personal Digital Assistants will take the form of talking heads with the ability to recognize speech, remember simple things, and recall them on demand.
Such interfaces may also be used for entertainment: reading from web pages, they will allow users to talk to an avatar that presents answers to questions in a trivia quiz, the 20 questions game, or similar games. In particular, the 20 questions game on a PDA may be the next great challenge for AI, because it is a different type of game than chess. HIT interfaces should make it possible to play such games via telephones.
HITs will also be useful in some scientific projects, enabling natural training of software capable of learning. An interesting example is the baby Hal project, which tries to develop real understanding of language in a neural-based system through interactions with people. Devices such as AIBO dogs or QRIO robots may use similarly trained minds for better understanding of language.
A test-bed is urgently needed to experiment with such technologies.
Julian Szymański now has a number of servers showing some programs developed in our project:
20 Q |
visual Wordnet |
visual Wiki |
UMLS visualization.
Short term goals of the project.
Create a HIT interface in the form of a talking head, using speech recognition for text input, 3D graphics for the head, some behavioral modeling to control facial expressions, lip-sound synchronization, and speech synthesis for output. The interface should be able to read and send information through web pages. It may be based on existing components: Haptek Virtual Friends for graphics and behavioral modeling, free speech synthesis, and some commercial speech recognition.
This should be sufficient for simple applications, such as playing word games over the internet or via phone.
Review other relevant technologies, including the avatar construction technology, speech recognition, natural language processing and knowledge modeling technologies that could be useful in this project.
Longer term goals of the project.
Improve the HIT interface in many ways. In particular improve:
Graphics - more realistic 3D, behavioral modeling, lip-speech synchronization.
Speech - reading text with prosody (intonation and emotional expressions), correlating the text keywords (and eventually understanding of the text) with avatar behavior and speech parameters, leading to more natural speech.
Create a portable version of HIT that could be easily downloaded and installed at the user's end, perhaps as a Java applet, both for the internet and for portable phones. Consider various compression issues for sending avatars through telephone connections.
Add a lingubot to the system to sustain a simple dialog with the user. Attach it to the START engine at MIT to be able to find and read information from encyclopedias and selected internet sources on any subject. This will give instant access to vast and reliable information sources.
Add memory in the form of simple facts, like A is B. Asking for A should lead to the recall of B; for example, after being told that NTU address is Nanyang ..., a later question about the address of NTU should return that address.
Add associative recall, making it possible to recall B when only an approximate query for A is given, i.e. when the wording or the structure of the question is changed.
Add knowledge-based experts in microdomains, making it possible to create a model of the user's interests and knowledge in some specific domains. This will allow the HIT to employ agents to collect specific information, and also to be used as an extension to letters, able to answer questions in specific, narrow domains. Avatars that are able to explain and answer additional questions will save the time otherwise spent on multiple exchanges of letters. They may initially be just an expansion of typical voice-mail technology, with voice input instead of pressing a digit, therefore enabling more intelligent questions. In the longer term an "alter ego" system may be created that can represent the user in a variety of situations over the Internet and telephone-based devices.
Add some word-based games. The game industry is now restricted to video games, while traditionally humans have always played many word games. In particular, the twenty questions game may be advertised as a new challenge for AI. It is an interesting test bed for various theories of semantic memory, and several AI approaches are possible here.
Add games based on category recognition. The user sees different objects and should guess the rule used for categorization. Machine learning techniques may be used to discover rules and reason with these rules, competing with humans.
Add a good theory of mind, for example based on ACT-R, a system used in the most successful computerized tutorials, or SOAR, which can also be used to resolve some linguistic ambiguities. This will allow the implementation of quite challenging intellectual games and make it possible to answer some non-obvious questions requiring sophisticated reasoning. Some part of this may be done on larger servers, with only the results returned via HIT to the user.
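The fact memory and associative recall goals listed above can be illustrated with a minimal Python sketch. It is not the project's implementation: the class name `FactMemory`, the stopword list, the similarity cutoff and the placeholder address are all assumptions made for illustration. Facts of the form "A is B" are stored under a normalized key, so a reworded question maps to the same key, and the standard-library `difflib` module supplies approximate matching when the query is misspelled.

```python
import difflib
import re

# Hypothetical stopword list; a real system would need a far richer one.
STOPWORDS = {"the", "of", "is", "a", "an", "what", "for", "to", "please", "tell", "me"}


def normalize(text):
    """Lowercase, strip punctuation, drop stopwords, and sort the remaining words."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    return " ".join(sorted(w for w in words if w not in STOPWORDS))


class FactMemory:
    """Stores facts of the form 'A is B' and recalls B from an approximate A."""

    def __init__(self):
        self._facts = {}  # normalized key for A -> stored text of B

    def learn(self, statement):
        """Split the statement at the first ' is ' and store B under A."""
        key, sep, value = statement.partition(" is ")
        if not sep:
            raise ValueError("expected a statement of the form 'A is B'")
        self._facts[normalize(key)] = value.strip()

    def recall(self, query, cutoff=0.6):
        """Return B for an exact or close match on A, or None if nothing is close."""
        key = normalize(query)
        if key in self._facts:
            return self._facts[key]
        close = difflib.get_close_matches(key, list(self._facts), n=1, cutoff=cutoff)
        return self._facts[close[0]] if close else None


memory = FactMemory()
# "Example Street 1" is a placeholder value, not a real address.
memory.learn("NTU address is Example Street 1")
print(memory.recall("What is the address of NTU?"))  # changed wording and structure
print(memory.recall("NTU adress"))                   # misspelled query
print(memory.recall("weather tomorrow"))             # unrelated query
```

String similarity only covers changes in spelling and word order. Recall that survives paraphrase with different words would need a semantic representation, which this sketch does not attempt.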
The possibilities to create better linguistic and reasoning engines, with access to Internet resources, are endless.
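One of the several possible approaches to the twenty questions game mentioned above can be sketched in Python as follows. This is only an illustration under stated assumptions: the tiny `KNOWLEDGE` table of objects and yes/no features is invented, and the strategy of asking the question that splits the remaining candidates most evenly is just one simple heuristic, not the method used by the project's 20 Q server.

```python
# Hypothetical toy knowledge base: object -> yes/no features.
KNOWLEDGE = {
    "cat":     {"animal": True,  "can_fly": False, "pet": True,  "has_wheels": False},
    "sparrow": {"animal": True,  "can_fly": True,  "pet": False, "has_wheels": False},
    "car":     {"animal": False, "can_fly": False, "pet": False, "has_wheels": True},
    "plane":   {"animal": False, "can_fly": True,  "pet": False, "has_wheels": True},
}


def best_question(candidates, asked):
    """Pick the unasked feature whose yes/no split of the candidates is most even."""
    best, best_score = None, None
    features = sorted({f for name in candidates for f in KNOWLEDGE[name]} - asked)
    for feature in features:
        yes = sum(KNOWLEDGE[name][feature] for name in candidates)
        if yes == 0 or yes == len(candidates):
            continue  # this question cannot separate the remaining candidates
        score = abs(2 * yes - len(candidates))  # 0 means a perfect half/half split
        if best_score is None or score < best_score:
            best, best_score = feature, score
    return best


def play(answer_fn, max_questions=20):
    """Ask questions until one candidate remains; answer_fn(feature) returns True/False."""
    candidates = set(KNOWLEDGE)
    asked = set()
    while len(candidates) > 1 and len(asked) < max_questions:
        feature = best_question(candidates, asked)
        if feature is None:
            break  # no remaining question distinguishes the candidates
        asked.add(feature)
        reply = answer_fn(feature)
        candidates = {n for n in candidates if KNOWLEDGE[n][feature] == reply}
    return sorted(candidates), len(asked)


# Simulate a player who is thinking of "sparrow" and answers truthfully.
guess, questions_used = play(lambda feature: KNOWLEDGE["sparrow"][feature])
print(guess, questions_used)
```

A real semantic memory would be far larger, and would have to cope with uncertain or wrong answers, for example by weighting candidates instead of eliminating them outright.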
Applications: e-commerce, m-commerce, games, education, robotics.
Deliverables:
HIT interface - in a few months.
HIT-based software for science museums to play trivia games, 20 question games and other such games.
Various extensions to HIT interface - later.
Needs: brain power, some software.
Set up centers of competency in different areas, starting from simple ones, for example graphics formats, by converting FAQ information on different subjects into a knowledge base!
Working log (local access only)