The Split Senate



Background
There are numerous techniques for assessing political relationships within voting organizations based on statistics compiled from members' votes. Methods range from clustering techniques [1][2][3] and network analyses [3][4][5] to multi-dimensional scaling (MDS) [6][7][8] and dimensionality reduction [9][10][11][12]. These approaches seek to highlight, or define, where people or groups of people are most similar and most divergent in their political behaviors. Applications include analysis and interpretation of historical voting data [4,13,14,15], identification of voting coalitions among individuals [16][17][18][19], and prediction of future voting patterns [2,20]. Within one visualization it can be difficult to present interesting or unique relationships of individuals relative to some larger group, to each other, and to other sub-groups, and interpretable quantitative information on the strength or proximity of these relationships is absent from the output of many methods [4,9,19,21,22,23]. Additionally, for exploratory analysis it is challenging to obtain an unbiased visualization that provides both a multi-level and a global picture of the political dataset of interest prior to more narrow, focused investigation [3,19,24,25].
We propose the use of an algorithm that is popular in phylogenetics but has not been used in political science applications to date. The Neighbor-Net algorithm (NNet) is useful for visualizing hierarchical structure based on splits in datasets, i.e. partitions into two parts (typically of taxa in biology). The NNet algorithm creates an object called a circular split system, along with a split network that realizes it in a 2D network representation [26][27][28]. These objects are also often referred to as phylogenetic networks [29]. The input to the algorithm is a matrix of distances among objects. In phylogenetics, distances are typically based on molecular distances from DNA alignments or other phylogenetically informative characters. NNet has also been used for studying relationships among languages, where distances are based on linguistic characters such as phonemes [30,31]. The advance NNet represents over standard phylogenetic methods is that it can reveal signals in the data that are in conflict with a strictly hierarchical tree structure. This is important in phylogenetics, where gene trees may conflict with species trees, and in linguistics, where distinct characters may have been shared at different times, sometimes between spatially and temporally distant languages. Recently, NNet has also been used to analyze structure in single-cell gene expression data [32]. Though NNet has not previously been used for political data analysis, it is similar to commonly used MDS techniques in that it takes as input dissimilarity-based measurements between the elements of the dataset [26].
Given a set of n elements $M = \{m_1, \ldots, m_n\}$, where each m represents a member of the Senate, and an element × feature matrix R (with votes as the features here) (Fig. 1a), a pairwise dissimilarity (distance) matrix δ is constructed (Fig. 1b). With this distance matrix NNet generates a circular ordering of these elements $\pi = \{m_1, \ldots, m_n\}$, where $m_i$ and $m_{i+1}$ are adjacent vertices on an n-cycle $C_n$ comprised of the elements of M, and a split-system (Fig. 1c).
A split A|B is a bi-partition of the set of elements in M, where A ∪ B = M, A, B ≠ ∅, and A ∩ B = ∅, and a split-system is a collection of splits (Fig. 1c). NNet is an agglomerative algorithm that constructs a circular ordering by iteratively joining the nodes of a graph G whose vertices are the elements of M, defining splits of these nodes at each agglomeration (joining) step [26]. This produces a circular split-system Σ, which can be described graphically as placing the elements "around a circle and consider[ing] the splits given by cutting the circle along a line" [26] (Fig. 1c).
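The "cutting the circle along a line" description can be made concrete with a short sketch. The function name `circular_splits` and the six-element toy ordering are illustrative assumptions. The code only enumerates the splits compatible with a given circular ordering and does not implement the NNet agglomeration itself.

```python
from itertools import combinations

def circular_splits(ordering):
    """Return every split obtained by cutting a circular ordering along a line.

    Each split is a pair (A, B) of frozensets: A is a contiguous arc of the
    circle and B is the complementary arc.
    """
    n = len(ordering)
    everyone = frozenset(ordering)
    splits = []
    # Cutting the circle just before position i and just before position j
    # isolates the arc ordering[i:j] from the remaining elements.
    for i, j in combinations(range(n), 2):
        arc = frozenset(ordering[i:j])
        splits.append((arc, everyone - arc))
    return splits

# Hypothetical toy ordering of six elements, as in a six-senator example.
members = ["m1", "m2", "m3", "m4", "m5", "m6"]
splits = circular_splits(members)
print(len(splits))  # n(n-1)/2 splits, i.e. 15 for n = 6
```

A circular ordering of n elements therefore admits at most n(n-1)/2 splits, which bounds the size of the split-system Σ.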
The system can then be visualized as a planar splits graph [27] (Fig. 1d). The weights of the splits, λ (represented by lengths in the planar graph; Fig. 1d), are then obtained by performing non-negative least squares optimization for the constructed π and Σ so as to best match δ [26]. Together π, Σ, and λ constitute the 2D visualization of a circular split-network, as shown in Fig. 1d.
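The weight-fitting step can be sketched as follows, assuming NumPy and SciPy are available. The helper name `fit_split_weights` and the four-element toy data are hypothetical. This is an illustration of non-negative least squares on a fixed split-system, not the SplitsTree4 implementation.

```python
from itertools import combinations

import numpy as np
from scipy.optimize import nnls

def fit_split_weights(splits, members, delta):
    """Fit non-negative split weights so that split distances best match delta.

    splits  : list of frozensets, one side A of each split A|B
    members : element labels in the row/column order of delta
    delta   : (n, n) symmetric distance matrix
    """
    pairs = list(combinations(range(len(members)), 2))
    # X[p, s] = 1 when split s separates the two members of pair p.
    X = np.zeros((len(pairs), len(splits)))
    for p, (i, j) in enumerate(pairs):
        for s, side in enumerate(splits):
            X[p, s] = (members[i] in side) != (members[j] in side)
    y = np.array([delta[i, j] for i, j in pairs])
    weights, residual = nnls(X, y)
    return weights, residual

# Toy data (assumed): all circular splits of the ordering a, b, c, d.
members = ["a", "b", "c", "d"]
splits = [frozenset(s) for s in ("a", "ab", "abc", "b", "bc", "c")]
true_weights = np.array([1.0, 2.0, 0.5, 1.0, 0.0, 1.5])

# Build the distance matrix implied by the true weights.
n = len(members)
delta = np.zeros((n, n))
for i, j in combinations(range(n), 2):
    d = sum(w for side, w in zip(splits, true_weights)
            if (members[i] in side) != (members[j] in side))
    delta[i, j] = delta[j, i] = d

weights, residual = fit_split_weights(splits, members, delta)
```

In this toy case the distances are generated exactly from a circular split-system, so the fit recovers the generating weights. With real vote data the residual measures how far δ is from being representable by the chosen splits.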
A particularly useful interpretation of a split network is through its representation of 'feature diversity' [34]. For instance, features found for elements in A but not for B can be assigned to split A|B, and features exclusive to B and not A can be assigned to a split A|B ∪ ρ, where ρ represents an outgroup [35]. This implies that these distinguishing features can be inferred from the splits comprising the split network [35]. With these underlying facets of the circular split system created by NNet, we will demonstrate its applicability to generating visualizations in the political arena, focusing on the multi-level, quantitative relationships and structures it reveals between the members of the US Senate.

Methods
To assess the behaviors of the senators we chose to use their roll-call votes [36], a popular choice for determining similarities, differences, and coalitions between political members [14,16,17,20]. We initially investigated the structure of the current Senate in the 116th Congress. We placed the vote records of the senators for the given Congress in an n × v matrix R with n senators and v votes. Senators not present for the entirety of a given Congress were not used in this evaluation.
Values for each entry in matrix R are defined by each of the n senators' votes:

*Abstain incorporates abstentions and 'Present' votes.
A distance (dissimilarity) matrix δ was then created by computing L1 distances between pairs of senators to obtain an n × n matrix with entries
$$\delta_{ij} = \sum_{k=1}^{v} \left| R_{ik} - R_{jk} \right| \qquad (1)$$
for i, j = 1, ..., n. This matrix δ was provided as the distance matrix input to the NNet implementation in SplitsTree4 [28], to calculate the ordering and splits for a graphical, non-hierarchical visualization, and to extract the split weights for the system.
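As a minimal sketch of this step, the pairwise L1 distances can be computed with NumPy broadcasting. The numeric encoding below (Yea = 1, Nay = 0, Abstain = 1/2) is an assumption made for illustration. The text fixes only the set of values {0, 1/2, 1}, and the L1 distances are unchanged if the Yea and Nay codes are swapped.

```python
import numpy as np

# Assumed encoding (the mapping of Yea/Nay to 1/0 is not stated in the text).
ENCODING = {"Yea": 1.0, "Nay": 0.0, "Abstain": 0.5}

def l1_distance_matrix(R):
    """Return the n x n matrix of pairwise L1 distances between rows of R."""
    R = np.asarray(R, dtype=float)
    # Broadcast to shape (n, n, v), then sum |R_ik - R_jk| over the votes k.
    return np.abs(R[:, None, :] - R[None, :, :]).sum(axis=2)

# Hypothetical toy roll call: 3 senators, 4 votes.
votes = [
    ["Yea", "Yea", "Nay", "Abstain"],
    ["Yea", "Nay", "Nay", "Yea"],
    ["Nay", "Nay", "Yea", "Yea"],
]
R = np.array([[ENCODING[v] for v in row] for row in votes])
delta = l1_distance_matrix(R)
print(delta)
```

With this encoding an abstention lies halfway between Yea and Nay, so it contributes 1/2 to the distance from either vote.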
We then used this representation of the matrix δ to identify outstanding structures or relationships and trace their origin back to the original input votes, given the nature of split-systems highlighted previously, i.e. the feature representation underlying the construction of the splits. To extract votes which contribute to particular splits of interest, we applied the split to the original voting input and selected the features (votes) which characterize that split, i.e. which separate the individual or group of individuals from the other members.
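One possible reading of this selection step is sketched below: keep the votes on which the group of interest is unanimous, and measure the fraction of the remaining members who voted differently. The function name `split_disagreement` and the toy matrix are hypothetical, and the unanimity criterion is an assumption about how "characterizing" votes are chosen.

```python
import numpy as np

def split_disagreement(R, in_group):
    """Votes on which the group is unanimous, and how much the rest disagree.

    R        : (n, v) matrix of encoded votes
    in_group : length-n boolean mask marking one side of the split
    Returns (vote indices, fraction of the other members voting differently).
    """
    R = np.asarray(R, dtype=float)
    in_group = np.asarray(in_group, dtype=bool)
    group, rest = R[in_group], R[~in_group]
    # A vote is unanimous when every group member cast the same value.
    unanimous = np.all(group == group[0], axis=0)
    idx = np.flatnonzero(unanimous)
    disagreement = (rest[:, idx] != group[0, idx]).mean(axis=0)
    return idx, disagreement

# Hypothetical toy data: 4 senators, 3 votes; the first two form the group.
R = [[1.0, 0.5, 1.0],
     [1.0, 0.5, 0.0],
     [1.0, 1.0, 0.0],
     [0.0, 1.0, 0.0]]
idx, disagreement = split_disagreement(R, [True, True, False, False])
```

Votes with a disagreement fraction near 1 are those on which the group votes as a bloc against everyone else.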
We additionally ranked votes by their likelihood of being associated with any given split of interest.
To do so we define p-values for a vote using Fisher's exact test [37] for a 2 × 3 contingency table between a split {A|B} and counts of the vote types {0, 1/2, 1}. The table, shown below, denotes the counts for the intersections between members in {A, B} and {C, D, E}, where C, D, and E are the sets of members whose votes were 0, 1/2, and 1 respectively.

        C (vote 0)    D (vote 1/2)    E (vote 1)
    A   |A ∩ C|       |A ∩ D|         |A ∩ E|
    B   |B ∩ C|       |B ∩ D|         |B ∩ E|
We used a two-sided Fisher's exact test to obtain, for each vote, the probability of observing a contingency table at least as 'extreme' (as far from random association) as that vote's table. We thereby ranked, for a given split of Senate members, the likelihood of the voting behaviors in each vote being associated with that split. For ranking purposes, we report the raw p-values for each vote.
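The ranking can be illustrated with a small, dependency-free sketch of an exact test on a 2 × 3 table. It enumerates all tables with the observed margins and sums the probabilities of those no more likely than the observed one, which is one common definition of the two-sided p-value. Whether this matches the authors' software is not stated, and the function names and toy data are hypothetical.

```python
from itertools import product
from math import comb

LEVELS = (0.0, 0.5, 1.0)  # the three vote types

def contingency(in_a, votes):
    """2 x 3 table of vote-type counts for side A (row 0) and side B (row 1)."""
    row_a = [sum(1 for g, v in zip(in_a, votes) if g and v == lv) for lv in LEVELS]
    row_b = [sum(1 for g, v in zip(in_a, votes) if not g and v == lv) for lv in LEVELS]
    return [row_a, row_b]

def fisher_exact_2xk(table):
    """Two-sided exact p-value for a 2 x k table with fixed margins."""
    row_a, row_b = table
    col_totals = [a + b for a, b in zip(row_a, row_b)]
    n_a, n = sum(row_a), sum(col_totals)
    denom = comb(n, n_a)

    def prob(a_counts):
        # Multivariate hypergeometric probability of row A given the margins.
        p = 1
        for a, c in zip(a_counts, col_totals):
            p *= comb(c, a)
        return p / denom

    p_obs = prob(row_a)
    p_value = 0.0
    # Enumerate row A for all but the last column; the last entry is forced.
    for head in product(*(range(c + 1) for c in col_totals[:-1])):
        last = n_a - sum(head)
        if 0 <= last <= col_totals[-1]:
            p = prob(head + (last,))
            if p <= p_obs * (1 + 1e-9):  # tolerance for floating-point ties
                p_value += p
    return min(p_value, 1.0)

def rank_votes(in_a, R):
    """Return (vote index, p-value) pairs, smallest p-value first."""
    scored = []
    for k in range(len(R[0])):
        column = [row[k] for row in R]
        scored.append((k, fisher_exact_2xk(contingency(in_a, column))))
    return sorted(scored, key=lambda kv: kv[1])

# Hypothetical toy data: 6 senators, 2 votes; the first three form side A.
in_a = [True, True, True, False, False, False]
R = [[0.0, 1.0], [0.0, 0.0], [0.0, 1.0],
     [0.5, 0.0], [0.5, 1.0], [0.5, 0.0]]
ranked = rank_votes(in_a, R)
```

Here the first vote separates the two sides perfectly and ranks first, while the second vote is close to evenly mixed and ranks last.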
In order to quantitatively compare agreement of senators within and across parties, we used the circular split-system to define a Senate 'center' against which all senators can be compared. For the given Congress, the center can be defined by the exact split which delineates the two main divisions within the split-network, i.e. the Republican party members and the Democrat members (inclusive of Independents) [38]. From this split, distances of any individuals (or groups) can be obtained by summing the appropriate split weights from the calculated λ. In general, the distance d(S) spanned by some subset S of all members M is the sum of the split weights of all splits A|B for which the elements (members) of S are separated [35]:
$$d(S) = \sum_{\substack{A|B \,\in\, \Sigma \\ S \cap A \neq \emptyset,\; S \cap B \neq \emptyset}} \lambda_{A|B} \qquad (2)$$
Center distances are then calculated for individuals over their time in the Senate, by summing the weights of all splits with the member separated from all members of the opposing party.
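Both quantities can be sketched in a few lines, assuming each split is stored as one of its sides together with its weight. The function names and the four-member toy system are hypothetical. The reading of "separated from all members of the opposing party" as "the member's side of the split contains no opposing-party member" is an interpretation.

```python
def split_distance(subset, weighted_splits, members):
    """d(S): total weight of the splits A|B that separate elements of S."""
    subset, members = set(subset), set(members)
    total = 0.0
    for side_a, weight in weighted_splits:
        side_b = members - side_a
        if subset & side_a and subset & side_b:
            total += weight
    return total

def center_distance(member, opposing_party, weighted_splits, members):
    """Total weight of splits placing `member` apart from every opposing member."""
    members, opposing = set(members), set(opposing_party)
    total = 0.0
    for side_a, weight in weighted_splits:
        own_side = side_a if member in side_a else members - side_a
        if not (own_side & opposing):
            total += weight
    return total

# Hypothetical toy system: a, b belong to one party and c, d to the other.
members = {"a", "b", "c", "d"}
weighted_splits = [
    (frozenset({"a"}), 1.0),       # a | b c d
    (frozenset({"a", "b"}), 2.0),  # a b | c d, the party-separating split
    (frozenset({"d"}), 0.5),       # d | a b c
]
print(split_distance({"a", "d"}, weighted_splits, members))
print(center_distance("a", {"c", "d"}, weighted_splits, members))
```

For a pair S = {i, j}, `split_distance` reduces to the pairwise distance between i and j induced by the split weights.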

Results and Discussion
For the current 116th Senate, the split network output by running NNet on the distance matrix generated from Senate votes is shown in Fig. 2a. To better understand voting patterns by party we also generated the split network from Democrat (including Independent) and Republican senators separately (Fig. 2b,c). Note that the split network produced by running NNet on a subset of a matrix will not necessarily be the same as the restriction of the split network produced by running NNet on the full matrix.
We first verified that the generated split weights λ represented the same magnitudes of dissimilarity between pairs of senators as encapsulated in the input matrix δ, constructed directly from the vote matrix. By Pearson correlation analysis of the pairwise distances calculated from λ, using (2), and δ (Fig. 3), we found a correlation of 0.994. This shows that the split network representation of the votes is highly concordant with the raw voting matrix, confirming that the circular split system is a good model for the voting structure. We did find several 'outlier' pairwise distances, outlined in black, which lie on the rim of the convex hull encompassing the distances (Fig. 3). These represent pairs of senators for which the pairwise distances are more discordant with the split network. Individuals with repeated representation in these outlier points included Senators Warren, Booker, and Gillibrand. That the circular split system does not reflect their voting patterns perfectly is also evident in the longer lengths of their respective splits in Fig. 2a relative to the other senators.
Having inferred the circular split-system representation and split weights for the 116th Senate, we next examined individual relationships and neighbors across all members (Fig. 2a). As expected, there is a strong split dividing members of the two major parties. The split network also reveals members' nearest neighbors based on their voting behaviors, and noted 'mavericks' or 'centrists', such as Sen. Collins (Rep.) and Sen. Manchin (Dem.), stand out in their distant, centered placement relative to the rest of the Senate [10] (Fig. 2a).
While many individuals' nearest neighbors are maintained regardless of whether the split system is generated with only votes from within-party members or with all senators' votes, there are some neighbor differences dependent on the inclusion or exclusion of the other party's voting data. For example, within the Republican party, while Senators Perdue and Sullivan remain neighbors in both split networks (Fig. 2a,b, marked as 3), Sen. Tillis joins their ranks only in the Senate-wide split system. These changes suggest a delineation between how senators vote relative to others within their party and how they vote relative to the other party, and can be helpful for determining whether inclusion or exclusion of the other party's voting data is useful for a particular investigation or question.
Beyond pairs of individuals, the Senate-wide diagram highlights apparent coalitions within the greater Senate structure, visible in the clustering of particular individuals in the circular order and in the larger relative magnitudes of split weights (lengths) separating groups of individuals from the rest of the system (Fig. 2a, denoted 1 and 2). An interesting and notable example is the split of Democratic primary candidates from the rest of the Senate (Fig. 2a, denoted 1). Of the seven main incumbent senators to run in the Democratic presidential primary [39], five consistently cluster together in both the Senate-wide and intra-party circular split systems (Fig. 2a,c). It should be noted that Senators Bennet and Gillibrand do not consistently cluster together with the rest of the candidates in both diagrams, again suggesting separation by voting behavior when votes of the opposing party are under consideration.
To verify whether this sequential ordering of these candidates was significant we used the Wald-Wolfowitz runs test [40] to determine the likelihood that this particular ordering was random (the null hypothesis). For this test, the circular ordering of senators can be represented as a linear ordering, with senators who were Democratic Primary candidates represented as 0's and the other senators as 1's. A run denotes a contiguous stretch of the ordering with senators from the same category (0 or 1). To test for a significant difference from the null hypothesis of a random ordering, we found the probability of observing fewer than seven runs (at least five candidates clustered together) in any ordering of the binarized senator representations. In both the Senate-wide and Democrat-only circular orderings, p-values were <0.001, indicating a statistically significant departure from a random ordering of these Senate members.
The inherent feature-representation of split-systems described previously also facilitates mapping of the splits of interest back to the features (votes) that underlie that split. For instance, given the split of five Democratic Primary candidates, we traced the split back to the votes contributing to their unique voting pattern by first extracting votes where all candidates voted the same. Of these votes, we found a particular set in which a majority of the rest of the party did not vote in accordance with these senators, temporally clustered in the latter half of 2019 (Fig. 4a). The votes with the largest discrepancy were all abstentions by these senators, behavior which aligns with the previously noted trend of Presidential candidates abstaining during campaign periods [41] (Fig. 4a).
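The runs-test probability used for the circular ordering can be sketched with the exact combinatorial distribution of the number of runs in a random binary sequence. The function names are hypothetical, and the counts in the usage line (7 candidates among 97 senators) are placeholders rather than the study's values. The text does not say whether an exact or a normal-approximation form of the test was used.

```python
from math import comb

def runs_count(seq):
    """Number of maximal contiguous stretches of equal symbols in seq."""
    return 1 + sum(1 for x, y in zip(seq, seq[1:]) if x != y)

def prob_runs_at_most(r_max, n0, n1):
    """P(runs <= r_max) for a uniformly random ordering of n0 zeros and n1 ones."""
    total = comb(n0 + n1, n0)
    favourable = 0
    for r in range(2, r_max + 1):
        k = r // 2
        if r % 2 == 0:
            # k runs of each symbol; the sequence may start with either one.
            favourable += 2 * comb(n0 - 1, k - 1) * comb(n1 - 1, k - 1)
        else:
            # k + 1 runs of one symbol and k runs of the other.
            favourable += (comb(n0 - 1, k) * comb(n1 - 1, k - 1)
                           + comb(n0 - 1, k - 1) * comb(n1 - 1, k))
    return favourable / total

# Placeholder counts: 7 candidates (0's) and 90 other senators (1's).
# "Fewer than seven runs" corresponds to r_max = 6.
p_value = prob_runs_at_most(6, 7, 90)
```

Treating the circular ordering as linear, as in the text, means a cluster that wraps around the starting point of the ordering would be counted as two runs, so the choice of starting point matters.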
For these (or any) splits of interest we can assign a statistical interpretation to how the votes contribute to the splits of interest by ranking them by p-value as described in the Methods. For this particular split of the five Primary candidates we see that the ranking results (Fig. 4b) are concordant with the votes of low intra-party agreement (Fig. 4a). The clustered abstentions have the highest likelihoods of contributing to splits, among other Yea or Nay votes also contributing to this split. The p-value assignments also allow for investigation of distinct behaviors in the votes contributing to a split of interest. With the ranked votes we fit a LOESS (Local Regression) curve to the p-values (Fig. 4b, dashed line). This demonstrates the apparent temporal progression the contributing votes follow, with an upward trend in p-values leading to the abstention period, and a decrease in rankings following that time period (Fig. 4b).
We then removed these clustered abstentions to discern who these five Senate members vote similarly to outside of this abstention time period. After a second removal of low-agreement abstentions for the split of Senators Booker, Sanders, Warren, and Harris, who remained clustered despite the initial removal, we see that this group, together with Sen. Gillibrand, remains split from the rest of the party by voting behavior (Fig. 4c).
This analysis is also not limited to any particular split. Thus we next applied these techniques to another apparent 'coalition' within the party structure (Fig. 5). We focused on the split of Senators Manchin, Sinema, and Jones, who cluster on the opposite end of the Democrat split network from the candidate senators (Fig. 2c, denoted as 2) and are situated between both major parties in the Senate-wide split network (Fig. 2a, denoted as 2). Votes separating these senators from the rest of their party, obtained by the p-value ranking described above, are scattered across time (Fig. 5a,b), in contrast to the clustered abstentions of the previous split of senators analyzed. By tracing back the contributing votes for this split, we can additionally quantify the prevalence of particular topics in the highest-ranked votes contributing to the split (Fig. 5c). With detailed descriptions of the top votes, we can visualize the representative content of these votes as well (Fig. 5d). Although p-values are used here for ordinal purposes, they can be corrected for multiple testing to determine which votes are significantly associated with a split.
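As one illustration of such a correction, the sketch below applies the Benjamini-Hochberg procedure to a list of raw p-values. The choice of this procedure is an assumption, since the text does not name a correction method, and the input values are made up.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

# Hypothetical raw p-values for four votes.
raw = [0.01, 0.04, 0.03, 0.5]
adjusted = benjamini_hochberg(raw)
significant = [i for i, q in enumerate(adjusted) if q < 0.05]
```

Because the adjustment is monotone in the raw p-values, it leaves the ranking of votes unchanged and only affects which votes pass a significance threshold.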
From the split networks in Figures 2 and 4c,d, we see a variety of structures within the Senate and the individual parties, with particularly dense areas as well as sparse or distant regions of individuals denoting areas of high or low voting agreement. To assess and visualize this agreement across Senate members we denoted the 'center' split (Fig. 2a) as described in the Methods to make relative quantifications of how spread out members' voting behaviors are. This also provides a comparative metric for how 'left' or 'right' of center members are [42]. This assignment of distances from the center is not limited to the 116th Congress, and thus we viewed the dynamics of this metric over time for all Senates over the last 30 years (Fig. 6a).
By aggregating distances for each of the main parties, we can visualize if or how the spread and magnitude of voting agreement within and between parties has changed over time as a product of their constituents. What we observe fits with the trend, noted in previous literature, of increasing partisanship in the Senate [4,23,43], at least within the last six years. This is demonstrated by upward shifts in the median party distances, i.e. increasing distances of each party's members from the center. The larger spread of center distances observed in the Democratic party in recent Senates versus a tightening of the Republican distances also suggests differing levels of voting unification within each party [44]. This is also in contrast to earlier Senates, where greater 'unification' (tighter distance distributions) in the Democratic party is demonstrated (Fig. 6a, 101st and 102nd). We can additionally investigate these agreement distributions at the level of their constituent members, as visualized for the 116th Senate (Fig. 6b). At this individual level we can note the differences in magnitude of the center distances among non-Republican senators versus Republican senators and place each senator within the greater distribution.
For Senate members who were not active for the full term of the 116th Congress, and thus not included in the main Senate analysis, we created split network visuals for the voting period in which they were present (Fig. 7). From this, we can view who these senators' nearest political neighbors are given the votes they did cast. This includes a split network for Sen. Loeffler (Rep.) (Fig. 7a) and her predecessor Sen. Isakson (Rep.) (Fig. 7b), as well as Sen. McSally. Sen. Kelly (Dem.) (Fig. 7c), who recently succeeded Sen. McSally, did not have enough votes to create a representative splits graph. While Sen. Loeffler's nearest neighbor is her fellow Georgia senator, Sen. Perdue, Sen. Isakson appears more discordant in his voting behavior during the 116th Senate, visible in the long split separating him from the rest of the party. Sen. McSally's positioning is similar to that of Sen. Loeffler, in that she is clustered near other members (neighbors with Senators McConnell and Capito), lying within the denser region of the party's split network.
Together these findings demonstrate the utility of the NNet-SplitsTree algorithms for creating representations and visualizations of voting data that facilitate exploratory analysis and identify voting patterns that may not otherwise be obvious. In our analyses we discovered relationships between pairs of senators as well as voting relationships among larger clusters of members. The nature of circular split networks also facilitates both a qualitative and quantitative investigation of the combinatorial nature of the underlying data. Their implied and traceable relations to the original features used for split creation can be used to generate an interpretable visualization of the relationships between political members based on their pertinent behaviors, and enable comparative analysis across the discovered structures over time.

Figure 1: a) Matrix R of feature (vote) values for six elements (senators) m. b) L1 pairwise distance matrix between all six elements. c) Circular split system with elements m shown in circular ordering, and splits defined as bi-sections of the circle. d) Splits graph representation of the circular system in c, with split weights denoted along splits. Parallel splits (representing the same split) are denoted by the same colors. Figure adapted from [26, 33].

Figure 2: a) Splits graph representation of the Senate-wide split network for the 116th Congress. Republican, Democrat, and Independent members are shown with colors. Nearest neighbors and apparent 'coalitions' of members are shown in circles. b) Splits graph for the split network of only Republican votes. c) Splits graph for the split network of only Democrat (inclusive of Independent) votes.

Figure 3: Pearson correlation between pairwise L1 distances and pairwise split weight distances (Methods). Each dot represents the distance between a pair of senators. Pairs forming the edge of the correlation plot, including points with the most discordant distances, are circled in black.
[Figure 4 panel title: /2019 - 01/08/2020 (SANDERS, BOOKER, WARREN, HARRIS, KLOBUCHAR) Vote Disagreement]

Figure 4: a) For roll-call votes where these five Primary candidates voted the same, percent disagreement within the party is shown (fraction of remaining members of the party who voted differently). Votes are colored by the vote cast by the candidates. b) All roll-call votes ranked by p-value for the given split of the five Primary candidates. Abstentions with low agreement from a are colored in the ranking. Raw p-values are reported here. The LOESS (Local Regression) fit for p-value rankings over time (roll-call votes) is shown by the dashed line. c) Splits graph for the split network of only Democrat (inclusive of Independent) votes after two iterative removals of low-agreement abstentions for clustered Primary candidates.
[Figure 5 panel titles: for {MANCHIN, SINEMA, JONES} Split 2; {MANCHIN, SINEMA, JONES} Vote Disagreement 2]

Figure 5: a) For roll-call votes where Senators Manchin, Sinema, and Jones voted the same, percent disagreement within the party is shown (fraction of remaining members of the party who voted differently). Votes are colored by the vote cast by these senators. b) All roll-call votes ranked by p-value for the given split of the Senate members. 'Top Ranked' votes (-log10(p-value) above 3) are highlighted. c) Counts of vote topics for the top votes colored in b. d) Word cloud for the top vote descriptions. Generated from worditout.com.