Automatic speech recognition

Schwander, Aparna; Mollet, Cécile

SCHWANDER_MOLLET_projet_recherche_2024.pdf

Research report

Automatic speech recognition: state-of-the-art and performance testing

SONAR|HES-SO

Schwander, Aparna
Mollet, Cécile
Mumenthaler, Christian (Degree supervisor)

Genève : Haute école de gestion de Genève

42 p.

The performance of automatic speech recognition (ASR) systems has improved dramatically over the last decade. A multitude of commercial and open-source models are available to researchers who wish to choose one for a study. Commercial vendors tend to test their models on standard benchmark corpora, which do not reflect real-world scenarios. We test three state-of-the-art ASR systems (Amazon Transcribe, Google Speech-To-Text and Whisper from OpenAI) on a corpus of YouTube climate change videos. We compare their performance using the standard word error rate metric and conduct a fine-grained analysis of the transcripts produced by the systems. We find that, among the three tested systems, Amazon Transcribe performs best on the climate change corpus. The best-performing model will subsequently be used to transcribe the answers to self-registered questionnaires that examine barriers to climate change.
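
For readers unfamiliar with the metric named in the abstract, word error rate is conventionally defined as the number of word substitutions, deletions and insertions needed to turn the system transcript into the reference transcript, divided by the number of words in the reference. The following is a minimal illustrative sketch, not the authors' evaluation code: the function name `word_error_rate`, the lowercasing, and the whitespace tokenization are assumptions made for the example, and the report may normalize text differently.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the reference length.

    Assumption for this sketch: text is lowercased and split on whitespace;
    punctuation is not stripped.
    """
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen so far
    # (before the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # i deletions turn i reference words into an empty hypothesis
        for j, hyp_word in enumerate(hyp, start=1):
            substitution_cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,                       # deletion (reference word missing)
                curr[j - 1] + 1,                   # insertion (extra hypothesis word)
                prev[j - 1] + substitution_cost,   # match or substitution
            ))
        prev = curr
    return prev[-1] / len(ref)


# Hypothetical sentences, chosen only to illustrate the calculation.
print(word_error_rate("the climate is changing", "the climate changing"))
```

In the usage line above, one reference word out of four is missing from the hypothesis, so the rate is one deletion over four reference words. Because insertions count as errors, the value can exceed 1 when the hypothesis is much longer than the reference.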

Language

English

Classification

Information, communication and media sciences

Notes

Haute école de gestion Genève
Information documentaire
hesso:hegge

Persistent URL

https://folia.unifr.ch/global/documents/330960

Statistics

Document views: 40
File downloads: SCHWANDER_MOLLET_projet_recherche_2024.pdf: 131