Cascaded Speech Translation Systems Outperform End-to-End Models, Research Finds

By: Ana Moirano

The first-ever voluntary mentorship program in speech translation, launched and led by Yasmin Moslem, NLP Researcher, brought together researchers, practitioners, and students from diverse companies and institutions worldwide to explore speech translation.

Running from December 2024 to January 2025, the initiative introduced participants to data collection, model training, and advanced research techniques, helping them develop hands-on expertise in speech translation.

Participants came from varied backgrounds, ranging from software engineering to text-to-text machine translation (MT), and followed a structured three-week mentorship program.

The first week focused on data preparation, where participants collected and processed bilingual speech datasets. In the second week, they trained and fine-tuned models using the datasets prepared earlier. The final week was dedicated to advanced research, where participants explored synthetic data generation, language model post-processing, and domain adaptation.

During this phase, participants experimented with synthetic data augmentation, which is important when data is limited for the language or domain. They used text-to-speech (TTS) models to create synthetic source audio for existing translations, applied MT models to produce translated text from transcriptions, and refined data quality by improving alignment between audio and text using segmentation techniques and part-of-speech tagging.
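
As a rough illustration of the two augmentation directions described above, the following Python sketch fills in whichever side of a speech translation example is missing. It is not the participants' code: the `SpeechTranslationExample` record, the `augment_with_tts` and `augment_with_mt` helpers, and the `fake_tts` and `fake_mt` stand-ins are all hypothetical names introduced here, and a real setup would plug in actual TTS and MT models in their place.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class SpeechTranslationExample:
    audio: bytes       # source-language audio (real or synthesized)
    transcript: str    # source-language text
    translation: str   # target-language text (human or machine-made)
    synthetic: str     # which field was generated: "audio" or "translation"


def augment_with_tts(
    text_pairs: List[Tuple[str, str]],
    tts: Callable[[str], bytes],
) -> List[SpeechTranslationExample]:
    """Text-only pairs (source text, translation): synthesize the missing source audio."""
    return [
        SpeechTranslationExample(tts(source), source, target, synthetic="audio")
        for source, target in text_pairs
    ]


def augment_with_mt(
    speech_pairs: List[Tuple[bytes, str]],
    mt: Callable[[str], str],
) -> List[SpeechTranslationExample]:
    """Transcribed audio (audio, transcript): machine-translate the missing target text."""
    return [
        SpeechTranslationExample(audio, transcript, mt(transcript), synthetic="translation")
        for audio, transcript in speech_pairs
    ]


# Hypothetical stand-ins so the sketch runs without any model downloads.
def fake_tts(text: str) -> bytes:
    return text.encode("utf-8")  # placeholder for a synthesized waveform


def fake_mt(text: str) -> str:
    return f"<translated> {text}"  # placeholder for a model translation


if __name__ == "__main__":
    from_text = augment_with_tts([("hello world", "hola mundo")], fake_tts)
    from_speech = augment_with_mt([(b"\x00\x01", "good morning")], fake_mt)
    for example in from_text + from_speech:
        print(example.synthetic, "|", example.transcript, "->", example.translation)
```

Recording which field was generated makes it possible to filter or down-weight synthetic examples later, which is one plausible reason to track it. The alignment refinement step mentioned above is not covered by this sketch.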


Source: Slator


