Support Vector Machines on Big Five Traits and Politics

This is an example of a Support Vector Machine (SVM) classifier, built with one of my usual data sets as part of a Pluralsight training presentation, Data Mining Algorithms in SSAS, Excel, and R.

In terms of results, the model predicts a state's political leaning from two (2) traits, openness and conscientiousness. Using all five (5) factors improved prediction quality, but a five-dimensional model is hard to plot, so the two-trait version is shown here. On the training data, the model correctly classifies 27 of 28 Red (Republican-leaning) states, about 96%, but only 15 of 20 Blue (Democratic-leaning) states, 75%. In the confusion matrix below, rows are predictions and columns are actual leanings.

                     Politics.predicted
 Politics.prediction  Blue Red
                Blue   15   1
                 Red    5  27
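The class-level rates can be read directly off the confusion matrix. Its rows are predictions and its columns are actual leanings, which follows from the argument order in `table(Politics.prediction, Politics.predicted)`. The following is a small base-R sketch that rebuilds the matrix from its counts and computes the rates. The vector names `predicted` and `actual` are illustrative and are not part of the original script. Recall divides each diagonal cell by its column total (actual count). Precision divides each diagonal cell by its row total (predicted count).

```r
# Rebuild the confusion matrix from its counts (rows = predicted, columns = actual).
# `predicted` and `actual` are illustrative names, not from the original script.
lv <- c("Blue", "Red")
actual    <- factor(c(rep("Blue", 20), rep("Red", 28)), levels = lv)
predicted <- factor(c(rep("Blue", 15), rep("Red", 5),    # the 20 actual Blue states
                      rep("Blue", 1),  rep("Red", 27)),  # the 28 actual Red states
                    levels = lv)

confusion <- table(predicted, actual)
print(confusion)

# Recall per class: correct predictions / states actually in that class
recall <- diag(confusion) / colSums(confusion)
print(round(100 * recall, 1))     # Blue 75.0, Red 96.4

# Precision per class: correct predictions / states predicted as that class
precision <- diag(confusion) / rowSums(confusion)   # Blue = 15/16, Red = 27/32

# Overall accuracy: correct predictions / all states
accuracy <- sum(diag(confusion)) / sum(confusion)
print(round(100 * accuracy, 1))   # 87.5
```

These figures score the model on the same 48 states it was trained on, so they are likely optimistic.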

The code is below, as are some related graphs. Source data is here.
 # Clear memory  
 rm(list = ls())  
   
 # set working directory  
 getwd()  
 setwd('../Data')  

 # load data
 Politics.df <- read.csv("BigFiveScoresByState.csv", na.strings = c("", "NA"))  
   
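 # clean data - remove rows with missing values (NA)  
 Politics.df <- na.omit(Politics.df)  

 # svm() treats the response as a classification target only if it is a factor,
 # and read.csv() no longer converts strings to factors by default (R >= 4.0)
 Politics.df$Politics <- factor(Politics.df$Politics)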
  
 # explore data #1 - str shows each column's type; in some cases more useful than summary  
 str(Politics.df)  

 # explore data #2 - simple plots
 library(ggplot2)  

 # plot openness
 plot.openness <- ggplot(Politics.df, aes(x = Openness, fill = Politics))  
 plot.openness.histo <- plot.openness + geom_histogram(binwidth = 1)  
 plot.openness.histo + scale_fill_manual(values = c("Red" = "red", "Blue" = "blue"))  

 # plot conscientiousness
 plot.conscientiousness <- ggplot(Politics.df, aes(x = Conscientiousness, fill = Politics))  
 plot.conscientiousness.histo <- plot.conscientiousness + geom_histogram(binwidth = 1)  
 plot.conscientiousness.histo + scale_fill_manual(values = c("Red" = "red", "Blue" = "blue"))  

 # set working data set
 Politics.training <- Politics.df  
 
 # create subsets
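 # select predictor columns by name rather than by position
 Politics.predictors <- Politics.training[, c("Openness", "Conscientiousness")]  
 # observed leanings (the actual values, despite the variable name)
 Politics.predicted <- Politics.training$Politics  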
    
 # review subsets - I used this to troubleshoot code issues
 # the svm function needs matrices, and I was incorrectly giving it lists
 str(Politics.predictors)  
 str(Politics.predicted)  

 # load library for SVM  
 library(e1071)  

 # train - defaults to C-classification with a radial kernel for a factor response
 Politics.svm <- svm(Politics ~ Openness + Conscientiousness, data = Politics.training)

 # review results 
 summary(Politics.svm)  

 # generate predictions (on the same data used for training)
 Politics.prediction <- predict(Politics.svm, Politics.predictors)  
  
 # confusion matrix: rows are predictions, columns are actual leanings  
 table(Politics.prediction, Politics.predicted)  
  
 # plot the SVM decision regions over the two traits
 plot(Politics.svm, Politics.training, Openness ~ Conscientiousness)  
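The `plot()` call above draws only two dimensions, which is why the five-factor model is hard to show. The e1071 `plot.svm()` method does accept a `slice` argument that fixes the remaining predictors at chosen values. Unspecified numeric dimensions default to 0, which is far outside the range of these trait scores. In addition, `svm()` accepts `cross = k` for k-fold cross-validation, which gives a less optimistic accuracy estimate than scoring the training data.

The following is a hedged sketch on synthetic data, not the contents of BigFiveScoresByState.csv. The column names `Extraversion`, `Agreeableness`, and `Neuroticism` are assumptions. The random scores and the labelling rule are made up for illustration.

```r
# Hedged sketch on SYNTHETIC data (not the blog's CSV): column names mirror
# the Big Five traits, but scores and Red/Blue labels are made up.
library(e1071)

set.seed(42)
n <- 48
traits <- data.frame(
  Openness          = rnorm(n, mean = 50, sd = 10),
  Conscientiousness = rnorm(n, mean = 50, sd = 10),
  Extraversion      = rnorm(n, mean = 50, sd = 10),  # assumed column name
  Agreeableness     = rnorm(n, mean = 50, sd = 10),  # assumed column name
  Neuroticism       = rnorm(n, mean = 50, sd = 10)   # assumed column name
)

# Hypothetical labelling rule, for illustration only:
# Blue when openness exceeds conscientiousness, otherwise Red
traits$Politics <- factor(ifelse(traits$Openness > traits$Conscientiousness,
                                 "Blue", "Red"))

# Five-trait model; cross = 5 runs 5-fold cross-validation on the training data
five.svm <- svm(Politics ~ ., data = traits, cross = 5)
five.svm$tot.accuracy  # cross-validated accuracy, in percent

# Two-dimensional view of the five-trait model: the other three traits are
# held at their means via `slice` (unspecified numeric dimensions default to 0)
plot(five.svm, traits, Openness ~ Conscientiousness,
     slice = list(Extraversion  = mean(traits$Extraversion),
                  Agreeableness = mean(traits$Agreeableness),
                  Neuroticism   = mean(traits$Neuroticism)))
```

Holding the extra traits at their means is one reasonable choice. Plotting several slices, for example at low and high values of one trait, shows how much the boundary shifts.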
