Enhancing Political Topic Detection: Chaining Automated Speech Recognition and Language Models for Automated Classification

Mariana Ramos; Pedro Chaves; Pedro Almeida; Patricio Costa; Pedro Moreira

doi:10.33774/apsa-2024-24qbn

In this work, we present an approach for the automatic classification of political topics in TV-broadcast political debates. The full recordings of five Republican Party political debates were processed by a pipeline combining automatic speech recognition and speaker diarization. The resulting chunks were then automatically classified into a set of predefined political topics using two approaches: (1) a natural language processing (NLP) model, mDeBERTa, and (2) large language models (LLMs): GPT-4o, llama3-8b and llama3-70b. The performance of the models was compared against manual classification. GPT-4o had the highest accuracy (69%), followed by llama3-70b (67%), llama3-8b (61%) and mDeBERTa (43%). Accuracy improved for the LLMs when secondary manual classifications from the human coders were also counted as correct, while mDeBERTa was unchanged (GPT-4o: 75%, llama3-70b: 74%, llama3-8b: 69%, mDeBERTa: 43%). This research demonstrates the viability of LLM-based automated text classification for summarizing political debates.
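The two accuracy figures reported above can be illustrated with a minimal sketch. It assumes that each chunk has one primary manual label and, optionally, one secondary manual label, and that a prediction counts as correct in the second setting if it matches either one. The function name, topic labels and toy data are hypothetical and are not taken from the study.

```python
from typing import Optional, Sequence


def accuracy(
    predicted: Sequence[str],
    primary: Sequence[str],
    secondary: Optional[Sequence[Optional[str]]] = None,
) -> float:
    """Share of chunks whose predicted topic matches the manual coding.

    With `secondary` given, a prediction is also accepted when it matches
    the coder's secondary label (assumed scoring rule, see lead-in).
    """
    if not primary or len(predicted) != len(primary):
        raise ValueError("predicted and primary must be non-empty and equal in length")
    if secondary is None:
        secondary = [None] * len(primary)
    if len(secondary) != len(primary):
        raise ValueError("secondary must have the same length as primary")

    hits = 0
    for pred, first, second in zip(predicted, primary, secondary):
        # A hit is a match on the primary label or, if present, the secondary one.
        if pred == first or (second is not None and pred == second):
            hits += 1
    return hits / len(primary)


# Toy data (hypothetical labels, four chunks).
predicted = ["economy", "immigration", "health", "economy"]
primary = ["economy", "health", "health", "foreign policy"]
secondary = [None, "immigration", None, None]

strict = accuracy(predicted, primary)
relaxed = accuracy(predicted, primary, secondary)
print(f"primary only: {strict:.2f}, primary or secondary: {relaxed:.2f}")
```

In this toy case the second chunk is a miss against the primary label but a hit once the secondary label is accepted, which is how the second set of percentages can exceed the first for the same predictions.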
