Experts at St Andrews University believe social media could transform the lives of people unable to talk.
Researchers say the widespread study of sites such as Twitter and Facebook could result in synthetic voice systems, such as that used by Stephen Hawking, becoming “faster and easier.”
The work was carried out by Dr Per Ola Kristensson, of St Andrews’ school of computer science. He used crowdsourcing, a new method of obtaining large amounts of statistical data through the monitoring of social media sites, to inform his findings.
Together with colleague Dr Keith Vertanen of the department of computer science at Princeton University in the USA, Dr Kristensson used the sites to create a unique dataset that provides predictive text more like real speech.
Previously, only small amounts of data were available for users of Augmentative and Alternative Communication (AAC) devices, which enable those with communication disabilities to participate in everyday conversations.
Speech devices rely on statistical language models to improve text entry by offering word predictions. These can be improved if the language model is trained on data that closely reflects the style of the users’ intended communications.
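As a rough illustration of how a statistical language model can offer word predictions, the following minimal Python sketch counts which words follow which in a set of training messages and suggests the most frequent continuations. It is not the researchers' model: the function names and the sample messages are invented for illustration, and real systems use far larger datasets and more sophisticated models.

```python
from collections import Counter, defaultdict


def train_bigram_model(sentences):
    """Count how often each word follows each preceding word."""
    following = defaultdict(Counter)
    for sentence in sentences:
        words = sentence.lower().split()
        for previous, current in zip(words, words[1:]):
            following[previous][current] += 1
    return following


def predict_next(model, previous_word, k=3):
    """Return up to k of the most frequent continuations of previous_word."""
    counts = model.get(previous_word.lower())
    if not counts:
        return []
    return [word for word, _ in counts.most_common(k)]


# Hypothetical training messages, invented for illustration only.
training_messages = [
    "i am hungry",
    "i am tired",
    "i am hungry now",
    "can i have some water",
]

model = train_bigram_model(training_messages)
suggestions = predict_next(model, "am")
print(suggestions)
```

The closer the training messages are to what a user actually wants to say, the more useful the suggestions become, which is why the style of the training data matters.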
However, until now there was no large open dataset of AAC messages available.
Dr Kristensson’s work at St Andrews, titled The Imagination of Crowds and published by the Association for Computational Linguistics, demonstrated how “crowdsourcing” can be used to create a large set of fictional messages.
He revealed the work, funded by the Engineering and Physical Sciences Research Council, was sparked by his interest in online sites dedicated to sourcing information from the public, such as Amazon Mechanical Turk.
The site uses online volunteers to carry out simple tasks computers can’t, such as transcribing scanned documents or rating the quality of photographs.
However, the tasks are often very simple and Dr Kristensson wondered if there might be greater potential.
He said: “We wondered if we could also use these services to harness the creativity of the crowd. Can we design a task for these services that provides us with a large surrogate dataset of AAC messages?”
The initial collection of crowdsourced messages was then expanded by intelligently selecting similar sentences from Twitter, blog and Usenet data.
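The article does not say how the similar sentences were chosen. As a stand-in, the Python sketch below ranks candidate sentences by their average per-word probability under a simple word-frequency model built from a small seed collection, and keeps the best-scoring ones. The function names, the scoring rule and all sample sentences are assumptions made for illustration, not the researchers' method or data.

```python
import math
from collections import Counter


def train_unigram(seed_sentences):
    """Build word counts from the seed (crowdsourced-style) messages."""
    counts = Counter()
    for sentence in seed_sentences:
        counts.update(sentence.lower().split())
    return counts


def average_log_prob(sentence, counts):
    """Average add-one-smoothed log probability per word under the seed model."""
    words = sentence.lower().split()
    if not words:
        return float("-inf")
    total = sum(counts.values())
    vocab = len(counts) + 1  # one extra slot for unseen words
    log_probs = (math.log((counts[w] + 1) / (total + vocab)) for w in words)
    return sum(log_probs) / len(words)


def select_similar(candidates, counts, keep=2):
    """Keep the candidates that score highest under the seed model."""
    ranked = sorted(candidates, key=lambda s: average_log_prob(s, counts), reverse=True)
    return ranked[:keep]


# Hypothetical seed messages and candidate sentences, invented for illustration.
seed = ["i am hungry", "can i have some water", "i need help please"]
candidates = [
    "i need some water",
    "quarterly earnings exceeded analyst forecasts",
    "can i have help",
    "breaking news from parliament",
]

selected = select_similar(candidates, train_unigram(seed))
print(selected)
```

Sentences that share vocabulary with the seed messages score higher and are kept, while unrelated text such as news headlines is filtered out.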
The end result is a dataset much larger and of higher quality than anything that had previously been used.
Dr Kristensson and Dr Vertanen have released the data collection, word lists and best performing models for free.
The hope is to use these models to design and test new interfaces that enable faster communication for users with communication difficulties.
“The work demonstrates that we can tap the creativity of users of social media and crowdsourcing technologies to help improve the lives of people unable to speak,” say the researchers.