From Text to Events: Turning Freedom House Reports into Evidence of Democratization and Democratic Backsliding

31 March 2026, Version 1
This content is an early or alternative research output and has not been peer-reviewed at the time of posting.

Abstract

A key limitation in debates about democracy measurement and whether democratic backsliding is occurring is the mismatch between theoretical claims — about specific institutional actions and processes — and available data. Political events are the appropriate unit of analysis: they denote discrete actions by identifiable actors with consequences for democratic institutions, but are difficult to produce at scale. This paper presents an LLM-based pipeline that transforms Freedom House's annual country reports (1990–2024) into structured event data. Our four-stage process produces nearly 200,000 annotated events across 228 countries, each tied to textual evidence. Validation against human coders and existing datasets demonstrates high construct validity. We apply the data to compare democratic backsliding in Hungary and Poland. The primary contribution is auditable interpretation as a standard for LLM-assisted measurement — producing outputs traceable to verbatim source material. Such event-level data enables analyses of institutional sequencing where indices and case studies fall short.
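The auditability standard described above, in which every extracted event is traceable to verbatim source material, can be illustrated with a minimal sketch. The `Event` fields and the `is_traceable` and `audit` helpers are hypothetical names introduced here for illustration, not the authors' schema or pipeline code, and the sample report text is invented rather than quoted from any Freedom House report.

```python
from dataclasses import dataclass
import re


@dataclass
class Event:
    # Hypothetical schema: the paper's actual event fields are not given here.
    country: str
    year: int
    description: str
    evidence: str  # quote claimed to be verbatim from the source report


def _normalize(text: str) -> str:
    """Collapse whitespace runs so line wrapping does not break matching."""
    return re.sub(r"\s+", " ", text).strip()


def is_traceable(event: Event, report_text: str) -> bool:
    """Return True if the event's evidence appears verbatim in the report."""
    quote = _normalize(event.evidence)
    return bool(quote) and quote in _normalize(report_text)


def audit(events: list[Event], report_text: str) -> list[Event]:
    """Return the events whose evidence cannot be found in the report."""
    return [e for e in events if not is_traceable(e, report_text)]


if __name__ == "__main__":
    # Invented report text, used only to exercise the check.
    report = "Parliament passed a law\nlowering the retirement age for judges."
    grounded = Event("Exampleland", 2012, "Judicial retirement law",
                     "law lowering the retirement age")
    ungrounded = Event("Exampleland", 2012, "Judicial retirement law",
                       "law raising the retirement age")
    for e in audit([grounded, ungrounded], report):
        print("Not traceable:", e.evidence)
```

Exact substring matching after whitespace normalization is a deliberately strict choice in this sketch: a paraphrased or altered quote fails the check, which is the property an audit of verbatim evidence would need.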

Keywords

Democracy
Democratic backsliding
Qualitative data
Freedom House
Large Language Models

Supplementary materials

Title
Supplementary Materials for "From Text to Events"
Description
This supplementary document accompanies "From Text to Events: Turning Freedom House Reports into Evidence of Democratization and Democratic Backsliding" (Wilson, Martin-Morales, and Nelson, 2026). It provides methodological detail supporting the paper's core pipeline, which uses large language models to extract structured political events from Freedom House country reports spanning 1990–2024. The appendices include diagnostic figures on sentence and highlight extraction rates, inter-coder reliability comparisons between human coders and GPT, stability tests across repeated model runs, and validation results for eight case-study countries. Additional appendices document the complete prompt templates used for highlight selection, event extraction, thematic tagging, and democratic sentiment classification, along with configuration parameters for each pipeline stage. A detailed de-duplication procedure is described, and country-year coverage and formatting changes in Freedom House reports over time are documented. Replication files and the final annotated event dataset will be made available for online access.
