Towards functional Human-Machine communication
Currently, most interaction between humans and machines takes place between people and smartphones. As presented earlier, we interact with our smartphones for three hours and fifteen minutes per day. According to Blaise Thomson, this will change: we will interact with many more devices as more of them become “smart”. However, before we can interact effectively with our devices, including our smartphones, one important problem needs to be solved. This blog post addresses that problem and presents a solution.
Let’s start by exploring the problems that occur today when it comes to communicating with devices.
Let’s say you are in your car, which has a “smart” navigation system. The navigation system can recognize your voice when you tell it where you want to go. For example, you tell it that you want to go to Oostzeedijk 364 in Rotterdam. The system will probably ask for a confirmation and then show you the right way to go.
This example assumes that you actually know where you want to go and what the address is. But what happens if you do not know where you want to go? The only thing you know is that you want to fulfill some demand or complete some task.
For example, you are driving in a city you do not know very well and you want to have lunch. You ate meat yesterday, so you want fish now. You are on a tight budget, and the weather is nice, so you want to sit outside. You tell these preferences and constraints to your navigation system. This is a problem for current state-of-the-art navigation systems: they do not understand what you say, as they are only programmed to understand cities, street names and house numbers. Accordingly, they will not send you to a lunchroom you will like.
This problem also occurs in more advanced communication devices. You can actually try this at home if you have a speech recognition application on your mobile device that interacts with your phone (such as Siri). Add a contact named Koen, and give this contact a phone number (your own, for example).
Now tell Siri: “Call Koen”.
It comes up with websites related to “Kalkoen” (Dutch for “turkey”).
Initial speech recognition problems can be caused by a noisy background or a sore throat, so we cannot blame Siri for not understanding you completely. So you try again:
And again, it comes up with websites related to “Kalkoen”. This also occurs when you change the language.
That does not make any sense: you literally dismissed that result seconds ago, and now Siri thinks you want it to execute a search for “Kalkoen” again. Why would you want to do that?
The problem that these three examples reveal is the inability of a system to understand a conversation. It can only process individual commands and has no notion of context. The state of the art is simply not sufficient when it comes to interacting with systems, and the underlying problem is the way these systems are designed. They are all built by handcrafting rules, and having systems understand entire conversations is very difficult to program by hand. That is why machine learning is important.
There are three elements of human-machine communication. The first is speech recognition: the translation of sound waves to text. The second is a decision-making process: what does the text mean? The last is text-to-speech: communicating the device’s proposed decision to you, the user. Machine learning is typically most relevant for the second element.
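The three elements above can be sketched as a simple pipeline. This is a minimal illustration, not a real speech API: the recognizer and synthesizer are stubbed placeholders, and the function names are invented for the example.

```python
# A minimal sketch of the three-stage pipeline: speech recognition,
# decision making, and text-to-speech. Stages 1 and 3 are stubs.

def recognize_speech(audio: bytes) -> str:
    """Stage 1: speech recognition -- sound waves to text (stubbed)."""
    return "call Koen"

def decide(text: str) -> str:
    """Stage 2: decision making -- what does the text mean?"""
    if text.lower().startswith("call "):
        return f"dialing {text[5:]}"
    # Unrecognized commands fall back to a web search, as Siri does:
    return f"searching the web for '{text}'"

def speak(response: str) -> str:
    """Stage 3: text-to-speech -- communicate the decision (stubbed)."""
    return response

def handle_utterance(audio: bytes) -> str:
    return speak(decide(recognize_speech(audio)))
```

Note that a mistake in stage 1 (hearing “Kalkoen” instead of “call Koen”) propagates straight into stage 2, which is exactly the failure mode in the Siri example.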
At the moment, the decision-making process is typically designed through supervised or semi-supervised learning. In the first, a programmer specifies every part of the decision-making process by hand; evidently, this is a very costly and time-consuming process. In semi-supervised learning, the programmer links basic decision-making processes to a more extensive database of decision-making implications and rules. This is less time-consuming, but limited to the rules available in the database.
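The limitation of the fully hand-specified approach can be made concrete with a toy example. The rules and names below are invented for illustration; the point is that every understood command must be spelled out by the programmer in advance, and anything outside the rule set falls through.

```python
# A toy handcrafted decision process: each intent is a hand-written rule.
# Requests like "a cheap fish lunch outside" match no rule and fail.

HANDCRAFTED_RULES = {
    "call": lambda arg: f"dialing {arg}",
    "navigate to": lambda arg: f"routing to {arg}",
}

def decide_by_rules(command: str) -> str:
    for trigger, action in HANDCRAFTED_RULES.items():
        if command.lower().startswith(trigger + " "):
            return action(command[len(trigger) + 1:])
    # Anything the programmer did not anticipate ends here:
    return "Sorry, I did not understand that."
```

Covering every phrasing of every task this way is exactly the costly, time-consuming handcrafting the text describes.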
To overcome these limitations, reinforcement learning is proposed. It builds on semi-supervised learning, but with self-learning algorithms that improve and extend the rules in the databases currently in use. The system takes elements (words or sentences) from each command and assesses the relevance and impact of each element on the final solution. Through learning, the system provides solutions with success probabilities, and it adjusts those probabilities based on the final decision (solution) that the user takes. The success of a solution is established through the user’s feedback. If the system is completely unsure, it can also pose the question again, although this is undesirable. It will certainly be inaccurate at the beginning, but it will improve quickly through the interconnection of different self-learning systems.
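The feedback loop described above can be sketched as follows. This is an illustrative sketch under assumptions of my own, not a specific published algorithm: the system keeps a success probability per candidate interpretation, nudges it up or down based on whether the user accepts the proposed solution, and asks again only as a last resort. The class name, learning rate, and threshold are all invented for the example.

```python
# Sketch of learning from user feedback: success probabilities per
# interpretation, updated toward 1 on acceptance and toward 0 on rejection.

class FeedbackLearner:
    def __init__(self, learning_rate: float = 0.2, ask_again_below: float = 0.3):
        self.success_prob: dict[str, float] = {}  # unseen items start at 0.5
        self.lr = learning_rate
        self.threshold = ask_again_below

    def propose(self, interpretations: list[str]) -> str:
        # Offer the interpretation with the highest estimated success probability.
        best = max(interpretations, key=lambda i: self.success_prob.get(i, 0.5))
        if self.success_prob.get(best, 0.5) < self.threshold:
            # Totally unsure: pose the question again (undesirable last resort).
            return "ask the question again"
        return best

    def feedback(self, interpretation: str, accepted: bool) -> None:
        # Move the estimate toward 1 on success, toward 0 on rejection.
        p = self.success_prob.get(interpretation, 0.5)
        target = 1.0 if accepted else 0.0
        self.success_prob[interpretation] = p + self.lr * (target - p)
```

In the Siri example, repeatedly dismissing the “Kalkoen” search result would drive that interpretation’s probability down, so the system would stop proposing it, which is precisely the behavior the current system lacks.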
The future impact
Are systems taking over the world? That is a question for later. The system interface will be the same (unified) for every device, and it will learn from the questions put to all these devices which action should be taken.
Machine learning is going to be key in this. It is important to keep the conversation going; this will change the way people live.
Thomson, B. (2015). The future of human-machine interaction. https://www.youtube.com/watch?v=XX4wlMQAK8o (accessed 10-5-2015)
Brown, J. (2015). Review: Apple 6s It’s Shoe. http://www.wired.com/2015/10/iphone-6s-review/ (accessed 10-5-2015)