Speech Translation Tutorial

This tutorial was presented at EACL 2021.

Abstract

Speech translation is the translation of speech in one language, typically into text in another, traditionally accomplished through a combination of automatic speech recognition and machine translation. Speech translation has attracted interest for many years, but the recent successful applications of deep learning to both individual tasks have enabled new opportunities through joint modeling, in what we today call “end-to-end speech translation.”

In this tutorial, we introduce the techniques used in cutting-edge research on speech translation. Starting from the traditional cascaded approach, we give an overview of data sources and model architectures to achieve state-of-the-art performance with end-to-end speech translation for both high- and low-resource languages. In addition, we discuss methods to evaluate and analyze the proposed solutions, as well as the challenges faced when applying speech translation models to real-world applications.

About the Presenters

Jan Niehues is an assistant professor at Maastricht University. He received his doctoral degree from Karlsruhe Institute of Technology in 2014 on the topic of “Domain Adaptation in Machine Translation.” He has conducted research at Carnegie Mellon University and LIMSI/CNRS, Paris. His research has covered different aspects of machine translation and spoken language translation. He has been involved in several international projects on spoken language translation, e.g., the German-French project Quaero, the H2020 EU project QT21, EU-Bridge, and ELITR. Currently, he is one of the main organizers of the spoken language track in the IWSLT shared task.

Elizabeth Salesky is a PhD student at Johns Hopkins University. She has previously studied at Carnegie Mellon University, been a research assistant at Karlsruhe Institute of Technology, and worked at MIT Lincoln Laboratory, focused on speech and text translation. Her research focuses on speech translation for real-world and low-resource scenarios, e.g., phoneme features to reduce data dependence, disfluency removal in translating conversational speech, and learning robust representations. She has organized shared tasks on speech translation at IWSLT.

Marco Turchi is the head of the Machine Translation unit at Fondazione Bruno Kessler (FBK). He received his PhD degree in Computer Science from the University of Siena, Italy in 2006. Before joining FBK in 2012, he worked at the European Commission, at the University of Bristol, at the Xerox Research Centre Europe, and at Yahoo Research Lab. His research activities focus on various aspects of sequence-to-sequence modelling applied to machine translation, speech translation and automatic post-editing. He is a co-organizer of the Conference on Machine Translation, the Spoken Language Translation Workshop and the automatic post-editing evaluation campaigns. He has been involved in several EU projects such as SMART, MateCat, ModernMT and QT21. He was the recipient of an Amazon AWS ML Research Award on the topic of end-to-end spoken language translation in rich data conditions. He is the secretary of the ISCA SIGSLT interest group.

Matteo Negri is a senior researcher in the Machine Translation unit at Fondazione Bruno Kessler. He received his degree in Philosophy of Language from the University of Turin, Italy in 2000. His research interests are in the field of computational linguistics, particularly machine translation, spoken language translation, textual entailment and question answering. He has worked on several EU projects (QT21, CRACKER, MMT, MateCat, CoSyne, QALL-ME) and co-organised conferences, workshops and evaluation campaigns in NLP and MT-related areas (including the Conference on Machine Translation, the International Workshop on Spoken Language Translation and SemEval shared tasks). Together with Marco Turchi, he was the recipient of an Amazon AWS ML Research Award on “End-to-end Spoken Language Translation in Rich Data Conditions.”