
Showing posts from May, 2017

Neural Networks (Part 4 of 4) - R Packages and Resources

While developing these demonstrations in logistic regression and neural networks, I used and discovered some interesting methods and techniques:
Better Methods

A few useful commands and packages:
- update.packages(), for updating installed packages in one easy action
- as.formula(), for creating a formula that I can reuse and update in one action across all my code sections
- View(), for looking at data frames
- fourfoldplot(), for plotting confusion matrices
- neuralnet, for developing neural networks
- caret, used with nnet, to create predictive models
- plotnet() in NeuralNetTools, for creating attractive neural network models

Resources that I used or that I would like to explore:
- MS Azure Notebooks, for working online with Python, R, and F#, all part of MS's data workflows
- Efficient R Programming, which seems to have many good tips on working with R
- Data Mining Algorithms in SSAS, Excel, and R, showing various algorithms in each technology
- R Documentation, a high-quality, usable resource

To explor…
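The as.formula() reuse mentioned in the list above can be sketched roughly as follows. The predictor names are hypothetical placeholders, and update() is shown as one possible way to derive a variant of a formula, not necessarily how the original posts did it.

```r
# Hypothetical column names, used for illustration only
predictors <- c("Openness", "Conscientiousness", "Extraversion")

# Build the formula once from strings so every code section can reuse it
equation <- as.formula(paste("Liberal ~", paste(predictors, collapse = " + ")))
print(equation)

# update() derives a variant without retyping the whole formula
equation.reduced <- update(equation, . ~ . - Extraversion)
print(equation.reduced)
```

Any modelling call that accepts a formula can then take `equation` as its first argument, so changing the predictors means editing one line.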

Attractive Confusion Matrices in R Plotted with fourfoldplot

As part of writing analyses using neural networks, I wanted to display a confusion matrix, and went looking for something more polished. What I found was either ugly and simple, or attractive but complicated. The following code demonstrates fourfoldplot, drawing on the post "fourfoldplot: A prettier confusion matrix in base R" and on R Documentation's excellent page for fourfoldplot.

The code is below, as is the related graph. Source data can be found here.

###############################################
# Load and clean data
################################################
Politics.df <- read.csv("BigFiveScoresByState.csv", na.strings = c("", "NA"))
Politics.df <- na.omit(Politics.df)

################################################
# Neural Net with nnet and caret
################################################
# load packages
library(nnet)
library(caret)
# set equat…
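Since the excerpt above is cut off before the plotting step, here is a minimal sketch of how a confusion matrix might be passed to fourfoldplot(). The predicted and actual classes are made up for illustration, and the colors and title are arbitrary choices, not the ones used in the original post.

```r
# Hypothetical predicted and actual classes, for illustration only
actual    <- factor(c("Blue", "Blue", "Blue", "Red", "Red", "Red", "Red", "Blue"),
                    levels = c("Blue", "Red"))
predicted <- factor(c("Blue", "Blue", "Red", "Red", "Red", "Red", "Blue", "Blue"),
                    levels = c("Blue", "Red"))

# 2 x 2 confusion matrix: rows are predictions, columns are actual classes
confusion <- table(Predicted = predicted, Actual = actual)
print(confusion)

# Overall accuracy is the diagonal share of the table
accuracy <- sum(diag(confusion)) / sum(confusion)

# fourfoldplot() draws each cell as a quarter circle;
# conf.level = 0 suppresses the confidence rings
fourfoldplot(confusion, color = c("#CC6666", "#99CC99"),
             conf.level = 0, margin = 1,
             main = paste("Confusion Matrix, accuracy =", round(accuracy, 2)))
```

fourfoldplot() expects a 2 x 2 table (or a 2 x 2 x k array), so this approach fits two-class outcomes such as Red versus Blue.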

Neural Networks in R (Part 3 of 4) - Neural Networks on Price Changes in Financial Data

This post is a demonstration using the caret and nnet packages on aggregate Big Five traits per state and political leanings, and a continuation of a series:
- Neural Networks (Part 1 of 4) - Logistic Regression and neuralnet on State 'Personality' and Political Outcomes
- Neural Networks (Part 2 of 4) - caret and nnet on State 'Personality' and Political Outcomes

This process uses a neural net to train and then test on data. The neural net itself would be envisioned as follows:

Initially, the results seemed promising, with a prediction accuracy of ~.85, but further analysis revealed that the bulk of the accurate predictions were for securities that had no price change, and that the price change classes were often predicted inaccurately. I excluded security types without price change, and the average dropped even further. I then created a loop that calculated test and training data by security type, and those results indicated that the neural net was either very right or very wrong for price change state. Only…

Neural Networks in R (Part 2 of 4) - caret and nnet on State 'Personality' and Political Outcomes

This is a continuation of a prior post, Neural Networks (Part 1 of 4) - Logistic Regression and neuralnet on State 'Personality' and Political Outcomes, the second in a series exploring neural networks.

This post is a demonstration using the caret and nnet packages on aggregate Big Five traits per state and political leanings. Given the small sample size, predictive ability was lower in the train/test scenario with an 80% training split, at around .66, than when fitting and predicting on the entire set, which was correct at .83. This is evident in the graphs using lm() and stat_smooth().

The code is below, as are some related graphs. Source data is here.

################################################
# Neural Net with nnet and caret
################################################
# load packages
library(nnet)
library(caret)
# train
Politics.model <- train(equation, Politics.df, method = 'nnet', linout = TRUE, trace = FALSE)
Politics.model.predicted <- predict(Po…
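The comparison between an 80% training split and the full data set can be sketched as follows. This is an illustration under stated assumptions, not the post's code: the data is synthetic, the split uses base R's sample(), and nnet() is called directly instead of through caret's train(). The helper percent.correct() is hypothetical.

```r
library(nnet)  # a recommended package, included with standard R installations

set.seed(42)

# Synthetic stand-in for the state data: 48 rows, two standardized traits
n <- 48
demo.df <- data.frame(Openness = rnorm(n), Conscientiousness = rnorm(n))
# Binary outcome driven by both traits plus noise (hypothetical relationship)
demo.df$Liberal <- as.numeric(demo.df$Openness - demo.df$Conscientiousness +
                                rnorm(n, 0, 0.8) > 0)

# Hold out 20% of the rows for testing
train.rows <- sample(seq_len(n), size = floor(0.8 * n))
train.df <- demo.df[train.rows, ]
test.df  <- demo.df[-train.rows, ]

# Hypothetical helper: share of rows where the rounded output matches the class
percent.correct <- function(model, data) {
  predicted <- round(predict(model, data))
  mean(predicted == data$Liberal)
}

# Fit on the training rows only, then score on the held-out rows
split.model <- nnet(Liberal ~ Openness + Conscientiousness, data = train.df,
                    size = 3, linout = TRUE, trace = FALSE)

# Fit and score on the full set (optimistic, since nothing is held out)
full.model <- nnet(Liberal ~ Openness + Conscientiousness, data = demo.df,
                   size = 3, linout = TRUE, trace = FALSE)

cat("Held-out accuracy:", percent.correct(split.model, test.df), "\n")
cat("Full-set accuracy:", percent.correct(full.model, demo.df), "\n")
```

With only 48 rows the held-out set has 10 observations, so a single misclassification moves the held-out accuracy by ten percentage points, which is one reason the two figures can differ so much.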

Neural Networks in R (Part 1 of 4) - Logistic Regression and neuralnet on State 'Personality' and Political Outcomes

This is an example of neural networks using the neuralnet package on one of my sample data sets, walking through a Pluralsight training series, Data Mining Algorithms in SSAS, Excel, and R.

In terms of results, the regression predicts voter leanings based on five (5) traits: openness, conscientiousness, extraversion, agreeableness, and neuroticism, although only the first two (2) traits have a significant impact. Logistic regression is about 85% predictive, and slightly better at predicting Red states than Blue states.

[1] "Logistic Regression (All) - Correct (%) = 0.854166666666667"
[1] "Logistic Regression (Red) - Correct (%) = 0.892857142857143"
[1] "Logistic Regression (Blue) - Correct (%) = 0.8"
For the neuralnet package, I created a loop to vary the hidden layers and the number of repetitions, since this is such a small number of records. This is obviously much less predictive than logistic regression, …
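The overall, Red, and Blue percentages reported for logistic regression above could be computed along these lines. This is a sketch on synthetic data with hypothetical column names and a 0.5 classification cutoff, which is an assumption; it will not reproduce the figures in the post.

```r
set.seed(7)

# Synthetic stand-in for the state data (hypothetical values)
n <- 48
states.df <- data.frame(Openness = rnorm(n), Conscientiousness = rnorm(n))
# Outcome depends on both traits plus noise; 1 = Red, 0 = Blue
states.df$Red <- as.numeric(states.df$Conscientiousness - states.df$Openness +
                              rnorm(n, 0, 0.8) > 0)

# Logistic regression via glm() with a binomial family
logit.model <- glm(Red ~ Openness + Conscientiousness,
                   data = states.df, family = binomial)

# Classify as Red when the fitted probability exceeds 0.5
states.df$Predicted <- as.numeric(fitted(logit.model) > 0.5)
states.df$Correct <- states.df$Predicted == states.df$Red

# Overall accuracy, then accuracy within each actual class
correct.all  <- mean(states.df$Correct)
correct.red  <- mean(states.df$Correct[states.df$Red == 1])
correct.blue <- mean(states.df$Correct[states.df$Red == 0])

print(paste("Logistic Regression (All) - Correct (%) =", correct.all))
print(paste("Logistic Regression (Red) - Correct (%) =", correct.red))
print(paste("Logistic Regression (Blue) - Correct (%) =", correct.blue))
```

Splitting accuracy by actual class shows whether a model is better at one outcome than the other, which the overall figure alone hides.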

Support Vector Machines on Big Five Traits and Politics

This is an example of Support Vector Machines, using one of my usual data sets, as part of a Pluralsight training presentation, Data Mining Algorithms in SSAS, Excel, and R.

In terms of results, the model predicts voter leanings based primarily on two (2) traits, openness and conscientiousness. Using all five (5) factors improved the prediction quality, but plotting that is problematic. With two traits, the model is 96% predictive of Republican outcomes, but only 66% accurate in predicting Democratic leaning.
Politics.prediction Blue Red
               Blue   15   1
               Red     5  27

The code is below, as are some related graphs. Source data is here.

# Clear memory
rm(list = ls())
# set Working directory
getwd()
setwd('../Data')
# load data
Politics.df <- read.csv("BigFiveScoresByState.csv", na.strings = c("", "NA"))
# clean data - remove NULLs
Politics.df <- na.omit(Politics.df…

Charting Correlation Matrices in R

I noticed a very simple, very powerful article by James Marquez, Seven Easy Graphs to Visualize Correlation Matrices in R, in the Google+ community R Programming for Data Analysis, and decided to give it a try. I started some of my current analyses a decade ago by generating correlation matrices in Excel, and I've sometimes redone and improved them in R.

Some of these packages are only designed for display, or as extensions to ggplot2:
- corrplot: Visualization of a Correlation Matrix
- GGally: Extension to 'ggplot2'
- ggcorrplot: Visualization of a Correlation Matrix using 'ggplot2'

These two are focused on more complex analysis:
- PerformanceAnalytics: Econometric tools for performance and risk analysis
- psych: Procedures for Psychological, Psychometric, and Personality Research

As for data, I used Hofstede's culture dimensions, limited to developed countries. Using a broader and larger set of countries would significantly reduce the correlations, in that only in…
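As a minimal sketch of the workflow these packages build on: compute a correlation matrix with cor(), then hand it to a plotting function. The built-in mtcars data stands in for the culture-dimension data, which is an assumption for illustration, and corrplot is used only if it happens to be installed.

```r
# Built-in mtcars stands in for the culture-dimension data used in the post
cor.matrix <- cor(mtcars[, c("mpg", "disp", "hp", "wt", "qsec")])
print(round(cor.matrix, 2))

# corrplot is optional here: use it only if the package is installed
if (requireNamespace("corrplot", quietly = TRUE)) {
  corrplot::corrplot(cor.matrix, method = "circle")
} else {
  # Base R fallback: a simple heat map of the same matrix
  heatmap(cor.matrix, symm = TRUE)
}
```

The display packages all take the same kind of matrix as input, so switching between them is mostly a matter of changing the plotting call.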

Decision Trees (party) on Political Outcome Based on State-level Big Five Assessment

This decision tree demo is similar to a prior one I've done, but in this case it uses the party package, which produces much higher quality graphics than rpart, at least when used with plot. This was done as part of a Pluralsight training presentation, Data Mining Algorithms in SSAS, Excel, and R.

Source data is here.

#
# Load Data
#
# Set working directory
setwd("../Data")
getwd()
# read from data frame
BigFivByState.df <- read.table("BigFiveScoresByState.csv", header = TRUE, sep = ",")

#
# Run Analysis
#
# Load package, install.packages('party', dependencies = TRUE)
library(party)
# train the model
BigFivByState.dt <- ctree(data = BigFivByState.df, Liberal ~ Openness + Conscientiousness + Extraversion + Neuroticism + Agreeableness)
# plot result
plot(BigFivByState.dt, uniform = TRUE, main = "Classification Tree for Politics on Big Five")

Naive Bayes on Political Outcome Based on State-level Big Five Assessment

As part of another Pluralsight training presentation, Data Mining Algorithms in SSAS, Excel, and R, I worked through various exercises, and from that I've adapted Naive Bayes to one of my existing data sets.

The code is below, as are some related graphs. Overall, the percent correct predicted based on Big Five personality traits using the Naive Bayes calculation is 66%.

Source data is here.

#
# Load & Explore Data
#
# read from data frame
BigFivByState.df <- read.table("BigFiveScoresByState.csv", header = TRUE, sep = ",")
# review data
head(BigFivByState.df)
nrow(BigFivByState.df)
summary(BigFivByState.df)
names(BigFivByState.df)
# various aggregations
# as "count this value" ~ grouped by this + this
Liberal.dist <- aggregate(State ~ Liberal, data = BigFivByState.df, FUN = length)
head(Liberal.dist)
RedBlue.dist <- aggregate(State ~ Politics, data = BigFivByState.df, FUN = l…

Logistic Regression on Stock Data using Google and SPY (SPDR S&P 500)

As part of a Pluralsight training presentation, Understanding and Applying Logistic Regression, students worked through various exercises, one of which was predicting stock price changes, up or down, on Google, using Google and SPY (SPDR S&P 500) closing prices.

As an ordered list of actions:
1. Load data: Yahoo financials for each day for 5 years, taking only date and closing price for this analysis
2. Transform sources: merge sources, change column headings, cast the Date column as DATE type, sort descending
3. Perform logistic regression
4. Create a frame of actual versus predicted changes, and add a column for the correct/incorrect prediction result
5. Find percent correct, on whether the price moved correctly up or down

As a result, the lagged Google and SPY prices correctly predict the direction of the next day's price change about 63% of the time.
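The steps above can be sketched end to end as follows. Everything here is illustrative: the prices are synthetic random walks and not Yahoo data, the column names are hypothetical, and using the prior day's closing prices as the "lagged" predictors is an assumption about the original exercise. Random data will not reproduce the 63% figure.

```r
set.seed(123)

# 1. Load data: synthetic closing prices stand in for the Yahoo downloads
days <- 250
dates <- seq(as.Date("2016-01-01"), by = "day", length.out = days)
goog.df <- data.frame(Date = as.character(dates),
                      Close = 700 + cumsum(rnorm(days)))
spy.df  <- data.frame(Date = as.character(dates),
                      Close = 200 + cumsum(rnorm(days)))

# 2. Transform: merge, rename columns, cast Date, sort descending
prices.df <- merge(goog.df, spy.df, by = "Date")
names(prices.df) <- c("Date", "Goog", "Spy")
prices.df$Date <- as.Date(prices.df$Date)
prices.df <- prices.df[order(prices.df$Date, decreasing = TRUE), ]

# With the newest rows first, row i + 1 holds the prior day's prices
n <- nrow(prices.df)
model.df <- data.frame(
  Up      = as.numeric(prices.df$Goog[-n] > prices.df$Goog[-1]),  # 1 = price rose
  GoogLag = prices.df$Goog[-1],
  SpyLag  = prices.df$Spy[-1]
)

# 3. Logistic regression of direction on the lagged prices
logit.model <- glm(Up ~ GoogLag + SpyLag, data = model.df, family = binomial)

# 4. Frame of actual versus predicted changes, with a correct/incorrect column
results.df <- data.frame(Actual = model.df$Up,
                         Predicted = as.numeric(fitted(logit.model) > 0.5))
results.df$Correct <- results.df$Actual == results.df$Predicted

# 5. Percent of days where the predicted direction matched
percent.correct <- mean(results.df$Correct)
print(percent.correct)
```

Because the model is scored on the same rows it was fitted on, the resulting percentage is an in-sample figure; a held-out period would give a more honest estimate.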

Source data is here.

# Clear memory
rm(list = ls())
# Set working directory
setwd("../Data")
getwd()
# load data
# Data is Yahoo financial price f…