Research report

Automatic speech recognition : state-of-the-art and performance testing

SONAR|HES-SO

  • Genève : Haute école de gestion de Genève

42 p.

The performance of automatic speech recognition (ASR) systems has improved dramatically over the last decade. A multitude of commercial and open-source models are available to researchers who need to choose one for their study. Commercial vendors tend to test their models on standard benchmark corpora, which do not reflect real-world scenarios. We test three state-of-the-art ASR systems (Amazon Transcribe, Google Speech-To-Text and Whisper from OpenAI) on a corpus of YouTube climate change videos. We compare their performance using the standard word error rate metric and conduct a fine-grained analysis of the transcripts produced by the systems. We find that, among the three systems tested, Amazon Transcribe performs best on the climate change corpus. The best-performing model will subsequently be used to transcribe the answers to self-registered questionnaires that examine barriers to climate change.
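For readers unfamiliar with the metric, word error rate is conventionally defined as the number of word substitutions, deletions and insertions needed to turn the system transcript into the reference transcript, divided by the number of words in the reference. The following is a minimal Python sketch of that conventional definition, not the evaluation code used in the report. The function name `word_error_rate` and the normalisation step (lowercasing and whitespace splitting) are assumptions made for illustration; the report may normalise text differently.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return (substitutions + deletions + insertions) / reference word count.

    Assumption: text is normalised only by lowercasing and splitting on
    whitespace. Real evaluations often also strip punctuation.
    """
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,         # deletion: reference word missing from hypothesis
                curr[j - 1] + 1,     # insertion: extra word in hypothesis
                prev[j - 1] + cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)


# Hypothetical example sentences, not taken from the report's corpus.
reference = "the climate is changing fast"
hypothesis = "the climate changing past"
print(word_error_rate(reference, hypothesis))
```

In this made-up example the hypothesis drops one word and substitutes another, so two errors are counted against a five-word reference. Because insertions are counted, the rate can exceed 1.0 when the hypothesis is much longer than the reference.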
Language
  • English
Classification
Information, communication and media sciences
Notes
  • Haute école de gestion Genève
  • Information documentaire
  • hesso:hegge
Persistent URL
https://folia.unifr.ch/global/documents/330960
Statistics

Document views: 40
File downloads:
  • SCHWANDER_MOLLET_projet_recherche_2024.pdf: 131