Researchers are strengthening Scottish Gaelic resources – using automatic speech and handwriting recognition to advance Gaelic language technology
EFI supports interdisciplinary and data-driven research which focusses on navigating an increasingly complex future. Working at the intersection of ethnography, linguistics and data-driven innovation, Dr Will Lamb and his team are creating and refining the world’s first Scottish Gaelic Speech Recognition System.
Automatic Speech Recognition
Automatic Speech Recognition (ASR) technology converts spoken language into written text. It is used for many purposes in the lives of majority language speakers, for example in subtitling, voice assistant software and dictation services. For minority languages, however, these services are often unavailable or inaccurate.
ASR and minority languages
ASR technology needs to be ‘trained’ with real language input to become more accurate. ASR systems analyse spoken language data, for example, audio recordings from native speakers, and learn about its patterns and structures. The more natural language data there is available, the more accurate the ASR technology can become.
For majority languages such as English there is a wealth of ‘real world’ language data available, for example, from literature, television, news and radio. This data can be used for training an increasingly accurate ASR system. But for minority languages such as Scottish Gaelic there is usually less of this data available. This means that ASR systems for minority languages struggle to reach the high levels of accuracy that are possible for majority languages.
ASR and Scottish Gaelic
In comparison with other minority languages, Scottish Gaelic has a surprising amount of language technology available. Over the past 10 years, researchers have developed a Scottish Gaelic handwriting recogniser to help transcribe handwritten manuscripts into digital text, a speech recogniser to transcribe audio files, and an ‘aligner’ to create timestamps linking each part of a transcription to the audio it represents.
Through the process of building the Scottish Gaelic handwriting recogniser, Dr Will Lamb and his team were able to create a database of natural language data which could be used to train ASR technology for Scottish Gaelic. One use of this has been for applying Gaelic subtitles to programmes for MG ALBA, a Gaelic media outlet.
The future of ASR and Scottish Gaelic
Improving the accuracy of ASR systems for Scottish Gaelic is important for multiple reasons. Accurate automatic subtitling in Scottish Gaelic will enable native speakers and language learners to use Gaelic media to maintain and develop their language skills. It will also make Gaelic programming more accessible to d/Deaf audiences.
More accurate ASR technology could also improve dictation services, which allow spoken language to be converted into text automatically by a web tool. This could be used to support children with learning difficulties in school.
Conclusion
Building accurate ASR technology for Scottish Gaelic is important for supporting native speakers and language learners. It can also help document and revitalise Scottish Gaelic as a minority language, support its use and encourage the development of useful resources for language learners.
“One of the aims of EFI is to address the challenges and opportunities posed by data-driven innovation in the Arts, Humanities and Social Sciences. Will’s project is an excellent example of how computer-based technology and digital methods can be applied to the study of human culture.”
Professor Melissa Terras, Research Director (EFI)