Decision Tree in R, with Graphs: Predicting State Politics from Big Five Traits
This post continues earlier explorations that used logistic regression to predict the Red/Blue state dichotomy from income or from personality. It uses the same five personality dimensions but builds a decision tree instead. Of the Big Five traits, only two, conscientiousness and openness, were found to be useful in the decision tree.
Links to sample data, as well as to source references, are at the end of this entry.
Example Code
# Decision Tree - Big Five and Politics
library("rpart")
# read the state-level scores
input.dat <- read.table("BigFiveScoresByState.csv", header = TRUE, sep = ",")
# grow tree (method = "poisson" fits a rates regression tree, despite the plot titles below)
fit <- rpart(Liberal ~ Openness + Conscientiousness + Neuroticism + Extraversion + Agreeableness,
             data = input.dat, method = "poisson")
# display the results
printcp(fit)
# visualize cross-validation results
plotcp(fit)
# detailed summary of splits
summary(fit)
# plot tree
plot(fit, uniform = TRUE, main = "Classification Tree for Liberal")
text(fit, use.n = TRUE, all = TRUE, cex = .8)
# create attractive postscript plot of tree
#post(fit, filename = file.choose(), title = "Classification Tree for Liberal")
# prune the tree at the cp with the lowest cross-validated error
pfit <- prune(fit, cp = fit$cptable[which.min(fit$cptable[, "xerror"]), "CP"])
# plot the pruned tree
plot(pfit, uniform = TRUE, main = "Pruned Classification Tree for Liberal")
text(pfit, use.n = TRUE, all = TRUE, cex = .8)
# create attractive postscript plot of the pruned tree
post(pfit, filename = file.choose(), title = "Pruned Classification Tree for Liberal")
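The plot titles say "Classification Tree", but method = "poisson" produces what rpart reports as a "Rates regression tree". If Liberal is a 0/1 indicator, as the event counts in the output suggest, a classification tree could be fit with method = "class" instead. The sketch below shows that variant on made-up data: demo.dat, its values, and the labelling rule are all invented for illustration and are not the data behind this post.

```r
library(rpart)

# Made-up stand-in for BigFiveScoresByState.csv (hypothetical values only)
set.seed(1)
n <- 48
demo.dat <- data.frame(
  Openness          = runif(n, 30, 70),
  Conscientiousness = runif(n, 30, 70)
)
# Invented labelling rule, used only so the tree has some structure to find
demo.dat$Liberal <- factor(ifelse(demo.dat$Conscientiousness < 50 & demo.dat$Openness > 45,
                                  "yes", "no"))

# Classification tree: the response is a factor and method = "class"
class.fit <- rpart(Liberal ~ Openness + Conscientiousness,
                   data = demo.dat, method = "class")

# Predicted class for each row, cross-tabulated against the made-up labels
pred <- predict(class.fit, type = "class")
print(table(observed = demo.dat$Liberal, predicted = pred))
```

With a classification fit, the leaves report class counts rather than event rates, so the results would not match the Poisson output shown in this post.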
Initial Results
Rates regression tree:
rpart(formula = Liberal ~ Openness + Conscientiousness + Neuroticism +
Extraversion + Agreeableness, data = input.dat, method = "poisson")
Variables actually used in tree construction:
[1] Conscientiousness Openness
Root node error: 35.019/48 = 0.72956
n=48 (2 observations deleted due to missingness)
       CP nsplit rel error  xerror     xstd
1 0.36104      0   1.00000 1.03391 0.024507
2 0.13203      1   0.63896 0.70221 0.134896
3 0.01000      2   0.50692 0.63244 0.137426
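To make the table above easier to read: rel error is the deviance relative to the root node, xerror is its cross-validated counterpart, and xstd is the standard error of xerror. The sketch below re-enters the printed values by hand (cp.tab is a name made up for this example) and picks a row two ways: by minimum xerror, which is the rule the pruning code in this post uses, and by the common one-standard-error rule, which this post does not use and which is shown only for comparison.

```r
# CP table values copied by hand from the printcp() output above
cp.tab <- data.frame(
  CP        = c(0.36104, 0.13203, 0.01000),
  nsplit    = c(0, 1, 2),
  rel.error = c(1.00000, 0.63896, 0.50692),
  xerror    = c(1.03391, 0.70221, 0.63244),
  xstd      = c(0.024507, 0.134896, 0.137426)
)

# Row with the lowest cross-validated error (the rule used for pruning in this post)
best <- which.min(cp.tab$xerror)

# One-standard-error rule (an alternative, not this post's method):
# the smallest tree whose xerror is within one xstd of the minimum
threshold <- cp.tab$xerror[best] + cp.tab$xstd[best]
one.se <- min(which(cp.tab$xerror <= threshold))

# Absolute mean deviance = rel error * root node error
root.error <- 35.019 / 48
cp.tab$abs.error <- cp.tab$rel.error * root.error

print(cp.tab[best, ])
print(cp.tab[one.se, ])
```

With these values the minimum-xerror row is the last one (CP = 0.01, two splits), so pruning at that CP should leave the tree as grown. The one-standard-error rule would instead select the second row, a tree with a single split.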
> # detailed summary of splits
> summary(fit)
Call:
rpart(formula = Liberal ~ Openness + Conscientiousness + Neuroticism +
Extraversion + Agreeableness, data = input.dat, method = "poisson")
n=48 (2 observations deleted due to missingness)
         CP nsplit rel error    xerror       xstd
1 0.3610441      0 1.0000000 1.0339150 0.02450674
2 0.1320324      1 0.6389559 0.7022068 0.13489629
3 0.0100000      2 0.5069236 0.6324368 0.13742579
Variable importance
Conscientiousness          Openness       Neuroticism     Agreeableness      Extraversion
               43                16                15                14                11
Node number 1: 48 observations, complexity param=0.3610441
  events=20, estimated rate=0.4166667, mean deviance=0.7295573
  left son=2 (19 obs) right son=3 (29 obs)
  Primary splits:
      Conscientiousness < 53.7  to the right, improve=13.061310, (0 missing)
      Openness          < 48.2  to the left,  improve= 5.008984, (0 missing)
      Agreeableness     < 45    to the right, improve= 4.720179, (0 missing)
      Neuroticism       < 60.9  to the left,  improve= 2.936428, (0 missing)
      Extraversion      < 44.85 to the right, improve= 1.560031, (0 missing)
  Surrogate splits:
      Neuroticism   < 45.9 to the left,  agree=0.729, adj=0.316, (0 split)
      Agreeableness < 59.2 to the right, agree=0.708, adj=0.263, (0 split)
      Extraversion  < 54.9 to the right, agree=0.688, adj=0.211, (0 split)
      Openness      < 60.3 to the right, agree=0.625, adj=0.053, (0 split)
Node number 2: 19 observations
  events=1, estimated rate=0.09345794, mean deviance=0.3311521
Node number 3: 29 observations, complexity param=0.1320324
  events=19, estimated rate=0.6369427, mean deviance=0.5546051
  left son=6 (13 obs) right son=7 (16 obs)
  Primary splits:
      Openness          < 47.35 to the left,  improve=4.7031650, (0 missing)
      Conscientiousness < 40.9  to the right, improve=2.2056170, (0 missing)
      Agreeableness     < 45.25 to the right, improve=1.3349930, (0 missing)
      Extraversion      < 44.95 to the right, improve=0.4743020, (0 missing)
      Neuroticism       < 48.8  to the right, improve=0.2903434, (0 missing)
  Surrogate splits:
      Conscientiousness < 46.15 to the right, agree=0.724, adj=0.385, (0 split)
      Agreeableness     < 50.75 to the right, agree=0.690, adj=0.308, (0 split)
      Neuroticism       < 49.25 to the left,  agree=0.655, adj=0.231, (0 split)
      Extraversion      < 44.95 to the right, agree=0.655, adj=0.231, (0 split)
Node number 6: 13 observations
  events=4, estimated rate=0.3246753, mean deviance=0.7262304
Node number 7: 16 observations
  events=15, estimated rate=0.8695652, mean deviance=0.1261841
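The estimated rates above are not the raw proportions of events to observations. Node 2, for example, has 1 event in 19 observations (about 0.053) but an estimated rate of 0.093. This is consistent with the shrinkage that rpart's Poisson method applies by default, as described in the rpart vignette: a gamma prior centred on the overall rate with coefficient of variation k = 1. The sketch below reproduces the printed rates under that assumption, with one unit of exposure per state. The helper name shrunken.rate is made up for this example.

```r
# Shrunken Poisson rate, assuming rpart's default prior (coefficient of variation k = 1)
# and one unit of exposure per observation. shrunken.rate is a hypothetical helper name.
shrunken.rate <- function(events, n, overall.rate, k = 1) {
  alpha <- 1 / k^2               # prior shape
  beta  <- alpha / overall.rate  # prior rate, so the prior mean equals overall.rate
  (alpha + events) / (beta + n)
}

# Root node: 20 events in 48 observations
overall <- 20 / 48

# Event and observation counts copied from the summary() output above
nodes <- data.frame(node   = c(2, 3, 6, 7),
                    events = c(1, 19, 4, 15),
                    n      = c(19, 29, 13, 16))
nodes$rate <- shrunken.rate(nodes$events, nodes$n, overall)
print(nodes)
```

The computed rates agree with the printed ones (0.0935, 0.6369, 0.3247, 0.8696), which supports reading them as shrunken estimates rather than plain proportions.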
# plot tree
plot(fit, uniform = TRUE, main = "Classification Tree for Liberal")
text(fit, use.n = TRUE, all = TRUE, cex = .8)
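The tree drawn by the code above can also be written out by hand from the summary output. States with Conscientiousness at or above 53.7 go to node 2; the rest are split on Openness at 47.35 into nodes 6 and 7. The sketch below encodes those two splits. The function name liberal.rate and the example scores are made up for illustration; the thresholds and rates are copied from the output.

```r
# Hand-written version of the two splits reported by summary(fit).
# liberal.rate is a hypothetical name; thresholds and rates come from the output above.
liberal.rate <- function(conscientiousness, openness) {
  ifelse(conscientiousness >= 53.7,
         0.09345794,             # node 2: 19 states, 1 event
         ifelse(openness < 47.35,
                0.3246753,       # node 6: 13 states, 4 events
                0.8695652))      # node 7: 16 states, 15 events
}

# Made-up scores for three imaginary states, one landing in each leaf
rates <- liberal.rate(conscientiousness = c(60, 45, 45),
                      openness          = c(55, 40, 55))
print(rates)
```

Read this way, the fitted tree says that low conscientiousness combined with high openness goes with the highest estimated rate of Liberal states.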
Fitted Results
# prune the tree
pfit <- prune(fit, cp = fit$cptable[which.min(fit$cptable[, "xerror"]), "CP"])
# plot the pruned tree
plot(pfit, uniform = TRUE, main = "Pruned Classification Tree for Liberal")
text(pfit, use.n = TRUE, all = TRUE, cex = .8)
# create attractive postscript plot of tree - creates a .ps file, which was converted to .pdf and then cropped for the image (below)
post(pfit, filename = file.choose(), title = "Pruned Classification Tree for Liberal")
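As an alternative to producing a .ps file and converting it, the tree can be drawn directly into a PDF with base R's pdf() device. The sketch below is a minimal, non-interactive version under some assumptions: it uses the kyphosis data that ships with rpart as a stand-in for the state data, and the output path (out.file) is a made-up temporary location.

```r
library(rpart)

# kyphosis ships with rpart; it stands in for the state data here
demo.fit <- rpart(Kyphosis ~ Age + Number + Start, data = kyphosis, method = "class")

# Hypothetical output location; replace with a real path as needed
out.file <- file.path(tempdir(), "pruned-tree.pdf")

# Open the PDF device, draw the tree and its labels, then close the device
pdf(out.file, width = 7, height = 5)
plot(demo.fit, uniform = TRUE, margin = 0.1, main = "Example tree written straight to PDF")
text(demo.fit, use.n = TRUE, all = TRUE, cex = .8)
invisible(dev.off())

print(file.exists(out.file))
```

The margin argument leaves room around the tree so the node labels are less likely to be clipped, which may reduce the need for cropping afterwards.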
Divided we stand: Three psychological regions of the United States and their political, economic, social, and health correlates (Abstract, PDF, PDF (copy))
List of United States presidential election results by state
Sample Data