Abstract
Cluster analysis is political science's primary tool for unsupervised discovery, yet the discipline has never required it to meet inferential standards. This paper introduces the Inferential Cluster Analysis (ICA) framework, which transforms cluster analysis from a descriptive tool into one capable of testing theoretical predictions, validating latent constructs, and supporting genuine inference. The framework builds three conditions into four steps: falsifiability, model fit, and robustness are enforced through theoretical pre-specification, multi-algorithmic estimation, model fitting and selection, and validation. An application tests a foundational assumption in acculturation research that has organized the field for decades. The binary model, which presumes a zero-sum relationship between heritage and American cultural attachments, fails to hold. In its place, four distinct orientations emerge: culture-affirming, assimilationist, bicultural, and demicultural. The framework thus both falsifies a longstanding assumption and validates a novel bidirectional model of acculturation.
Supplementary weblinks
Title
ICA Replication Code and Data
Description
Full replication materials for "Making Cluster Analysis Inferential: From Discovery to Measurement," including R code, data preprocessing scripts, and diagnostic outputs. All analyses were conducted in R using Quarto. The repository contains the canonical preprocessed dataset (lns_with_clusters_k4.rda) drawn from the 2006 Latino National Survey (n=4,785), clustering scripts for all five algorithms evaluated (K-Means, GMM, Hierarchical, Fuzzy C-Means, DBSCAN), bootstrap and cross-validation procedures, and all figures. The preprocessed dataset should be used for downstream analyses, as the multiple imputation step produces stochastic results that vary across R versions and platforms.
Title
Author Website
Description
Personal academic website for Jessala A. Grijalva, Ph.D., Postdoctoral Fellow at the Institute for Latino Studies, University of Notre Dame.
Title
Making Cluster Analysis Inferential: From Discovery to Measurement