Automatic speech recognition : state-of-the-art and performance testing
SONAR|HES-SO
- Genève : Haute école de gestion de Genève
42 p.
English
The performance of automatic speech recognition (ASR) systems has improved dramatically over the last decade. A multitude of commercial and open-source models is available to researchers who need to choose one for a study. Commercial vendors tend to test their models on standard benchmark corpora, which do not reflect real-world scenarios. We test three state-of-the-art ASR systems (Amazon Transcribe, Google Speech-To-Text and Whisper from OpenAI) on a corpus of YouTube climate change videos. We compare their performance using the standard word error rate metric and conduct a fine-grained analysis of the transcripts produced by the systems. We find that, of the three systems tested, Amazon Transcribe performs best on the climate change corpus. The best-performing model will subsequently be used to transcribe the answers to self-registered questionnaires that examine barriers to climate change.
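For readers unfamiliar with the word error rate (WER) metric mentioned above, the following is a minimal sketch of how it is commonly computed: the word-level edit distance (substitutions, deletions and insertions) between a reference transcript and a system transcript, divided by the number of reference words. The function name `word_error_rate` and the normalisation used here (lowercasing and whitespace splitting) are illustrative assumptions, not the procedure used in the report.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words.

    Assumption: text is normalised only by lowercasing and splitting on
    whitespace; a real evaluation may also strip punctuation, expand
    numbers, and so on.
    """
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr

    return prev[-1] / len(ref)
```

Under this definition a lower score is better, and the score can exceed 1.0 when the system output contains many inserted words.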
Classification
- Information, communication and media sciences
Notes
- Haute école de gestion Genève
- Information documentaire
- hesso:hegge
Persistent URL
https://folia.unifr.ch/global/documents/330960
Statistics
Document views: 40
File downloads:
- SCHWANDER_MOLLET_projet_recherche_2024.pdf: 131