Enhancing Political Topic Detection: Chaining Automated Speech Recognition and Language Models for Automated Classification

04 October 2024, Version 1
This content is an early or alternative research output and has not been peer-reviewed at the time of posting.

Abstract

In this work, we present an approach for automatically classifying political topics in televised political debates. The full recordings of five Republican Party debates were processed by a pipeline combining automatic speech recognition and speaker diarization. The resulting chunks were then automatically classified into a set of predefined political topics using (1) a natural language processing (NLP) model, mDeBERTa, and (2) large language models (LLMs): GPT-4o, llama3-8b and llama3-70b. Model performance was compared against manual classification. GPT-4o achieved the highest accuracy (69%), followed by llama3-70b (67%), llama3-8b (61%) and mDeBERTa (43%). Accuracy improved for the three LLMs when the human coders' secondary classifications were also considered (GPT-4o: 75%, llama3-70b: 74%, llama3-8b: 69%), while mDeBERTa remained at 43%. This research demonstrates the viability of LLM-based automated text classification for summarizing political debates.
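The two accuracy figures reported per model can be illustrated with a minimal sketch. It assumes one plausible reading of the evaluation: a prediction counts as correct under the strict measure when it matches the coder's primary topic, and under the lenient measure when it matches either the primary or the secondary topic. The function name, the topic labels and the toy data are hypothetical and are not taken from the study.

```python
from typing import Optional, Sequence


def accuracy(
    predicted: Sequence[str],
    primary: Sequence[str],
    secondary: Optional[Sequence[Optional[str]]] = None,
) -> float:
    """Share of chunks whose predicted topic matches a human label.

    If `secondary` is given, a prediction also counts as correct when it
    matches the coder's secondary topic for that chunk (None means the
    coder assigned no secondary topic). This scoring rule is an assumption.
    """
    if not predicted:
        raise ValueError("no predictions to score")
    if secondary is None:
        secondary = [None] * len(primary)
    if not (len(predicted) == len(primary) == len(secondary)):
        raise ValueError("all label sequences must have the same length")

    hits = sum(
        pred == first or (second is not None and pred == second)
        for pred, first, second in zip(predicted, primary, secondary)
    )
    return hits / len(predicted)


# Hypothetical toy data: four transcript chunks with invented topic labels.
predicted = ["economy", "immigration", "health", "economy"]
primary = ["economy", "foreign policy", "health", "crime"]
secondary = [None, "immigration", None, None]

strict = accuracy(predicted, primary)              # primary label only
lenient = accuracy(predicted, primary, secondary)  # primary or secondary
print(f"strict={strict:.2f} lenient={lenient:.2f}")
```

In this toy example the second chunk is wrong under the strict measure but correct under the lenient one, which is the kind of shift that separates the two sets of percentages above.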

Keywords

political debates
automatic speech recognition
speaker diarization
natural language processing
large language models

Supplementary materials

Title: Codebook
Description: Variable labels
