As reported in a peer-reviewed paper just published by Loyola Marymount University, a chatbot avatar based on Daden's Discourse chatbot engine has significantly passed a modified Turing Test run in the virtual world of Second Life. The highlights from the paper are:
- The first natural language Turing Test conducted in a 3D virtual environment.
- 78% of participants incorrectly judged a chatbot to be human.
- Agency judgments were due to the quality of the AI engine and psychological factors.
- Passage of the Turing Test involves a complex process of human–computer interaction.
The test itself was based on one that Daden set out in another paper back in 2009. There, Daden described how most current Turing Tests are very unbalanced: both the judges and the "hidden humans" know they are taking part in a test, and often the only party who doesn't is the chatbot! Daden argued that virtual environments such as Second Life offer an ideal level playing field: everyone, human or chatbot, looks like an avatar and can have access to the same information about the environment, and, crucially, most people in the world expect the other avatars they talk to to be other humans. So all the bot has to do is avoid giving away the fact that it is a bot, rather than try to convince an already sceptical judge that it is a human!
In the modified Turing Test that Daden proposed – a "covert Turing Test" – subjects (aka judges) went into a virtual world supposedly for another purpose, and whilst there had a conversation with a bot (or a human). Only after the event were they asked whether they thought the avatar they had been talking to was a bot or a human.
This is exactly the test that Loyola Marymount set up. You can read the full details and results in the paper, but in essence subjects were invited into Second Life to comment on a new shop layout and design, and were told that a shop assistant would be there to help them. 50% of the time the assistant was a chatbot, and 50% of the time a human. A nice touch in the double-blind design was that the humans playing the assistant also didn't know they were taking part in a Turing Test, so they couldn't act "over-human", or even "more robotic" – as often happens in physical Turing Tests.
The bottom line was that when the assistant was run by the chatbot, 78% of the test subjects did not recognise the avatar as a chatbot – well in excess of the 30% target set in the full Turing Test, and even of a 50% random choice.
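To get a feel for how far 78% sits above Turing's 30% benchmark, here is a minimal sketch of an exact one-sided binomial test. Note the sample size is not stated in this article, so `n = 50` below is a purely hypothetical figure chosen only for illustration:

```python
# Hedged sketch: exact one-sided binomial test of the reported 78%
# "judged human" rate against the 30% benchmark. The study's actual
# sample size is not given here; n = 50 is a hypothetical assumption.
from math import comb

def binomial_sf(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

n = 50               # hypothetical number of chatbot-condition subjects
k = round(0.78 * n)  # subjects fooled, matching the reported 78% rate
p_value = binomial_sf(k, n, 0.30)
print(f"P(X >= {k} | n={n}, p=0.30) = {p_value:.2e}")
```

Even with a modest hypothetical sample like this, the probability of seeing a 78% rate by chance under a 30% baseline is vanishingly small, which is why the result comfortably clears both thresholds.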
Read the full article on the Daden website here.