Posts

Principal Component Analysis (PCA) on Stock Returns in R

Principal Component Analysis is a statistical process that distills the variation in a set of measurements into a smaller set of vectors with greater ability to predict outcomes, utilizing a process of scaling, covariance calculation, and eigendecomposition.
The work for this is done in an MS Azure Notebook, Principal Component Analysis (PCA) on Stock Returns in R, with detailed code, output, and charts. An outline of the notebook contents is below.
- Overview of Demonstration
- Supporting Material: Pluralsight, Explained Visually, Wikipedia
- Load Data: Format Data & Sort
- Prep Data: Create Returns
- Generate Principal Components: Eigen Decomposition and Scree Plot, Create Principal Components
- Analysis: FVX using PCA versus Logistic Regression
- Alternative Libraries: Psych for the Social Sciences
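As a minimal sketch of the core steps named above, assuming a prepared numeric matrix of returns (the notebook has the full, runnable version), the scaling and eigendecomposition in R might look like:

returns <- matrix(rnorm(500), ncol = 5)   # placeholder for the prepared returns data
scaled <- scale(returns)                  # center and scale each column
eig <- eigen(cov(scaled))                 # eigendecomposition of the covariance matrix
prop <- eig$values / sum(eig$values)      # variance explained per component, for a scree plot
components <- scaled %*% eig$vectors      # project returns onto the principal components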

Exercises in Programming Style by Cristina Videira Lopes

Exercises in Programming Style by Cristina Videira Lopes
My rating: 5 of 5 stars

An easily consumed and enjoyable read, and an excellent review of the history of programming style, from the older days of constrained memory and monolithic styles, through pipelining and object-oriented variants, to more recent patterns like model-view-controller (MVC), MapReduce, and representational state transfer (REST). Along the way, each variant is described, along with its constraints, its history, and its context in systems design.

View all my reviews

Data Mining for Fund Raisers: How to Use Simple Statistics to Find the Gold in Your Donor Database Even If You Hate Statistics: A Starter Guide

This is a repost of a Goodreads review I did a little over 4.5 years ago, of a book I read twelve (12) years ago. It seemed relevant, as the industry appears to be picking up a data-driven focus. Plus, the world is now being transformed by advances in machine learning, particularly deep learning, and the large data sets and complexity of donor actions should benefit greatly from analysis.

Data Mining for Fund Raisers: How to Use Simple Statistics to Find the Gold in Your Donor Database Even If You Hate Statistics: A Starter Guide by Peter B. Wylie

My rating: 4 of 5 stars

My spouse, at times a development researcher of high-net-worth individuals, was given this book because she was the 'numbers' person in the office. Since my undergraduate work focused on lab design, including statistical analysis of results, I was intrigued and decided to read it. Considering my background, I found some of the material obvious, while other aspects were good refreshers on thinking in …

Value-at-Risk (VaR) Calculator Class in Python

As part of my self-development, I wanted to rework a script, typically a one-off, into a reusable component, although existing packages for VaR are available. As such, this is currently a work in progress. This code is a Python-based class for VaR calculations; for those unfamiliar with VaR, it is an acronym for value at risk, the worst-case loss in a period for a particular probability. It is a reworking of prior work with scripted VaR calculations, implementing various high-level good practices, e.g., hiding/encapsulation, do-not-repeat-yourself (DRY), dependency injection, etc.

Features:
- Requires a data frame of stock returns, factor returns, and stock weights
- Exposes a method to calculate and return a single VaR number for different variance types
- Exposes a method to calculate and return an array of VaR values by confidence level
- Exposes a method to calculate and plot an array of VaR values by confidence level

Still to do:
- Dynamic factor usage

Note: Data to valida…
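The class itself is Python, but the underlying calculation is language-neutral; as an illustrative sketch of the VaR-by-confidence-level idea, with placeholder names and data rather than the class's actual API, a historical-simulation version in R:

returns <- rnorm(1000, mean = 0, sd = 0.01)   # placeholder daily returns, not the class's data
confidences <- c(0.90, 0.95, 0.99)
var_values <- -quantile(returns, probs = 1 - confidences)   # worst-case loss at each confidence level
var_values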

Calculating Value at Risk (VaR) with Python or R

The modules linked below are based on a Pluralsight course, Understanding and Applying Financial Risk Modeling Techniques. While the code itself is nearly verbatim, this is mostly for my own development, working through the peculiarities of Value at Risk (VaR) in both R and Python and adding commentary as needed.

The general outline of this process is as follows:
- Load and clean data
- Calculate returns
- Calculate historical variance
- Calculate systemic, idiosyncratic, and total variance
- Develop a range of stress variants, e.g., scenario-based possibilities
- Calculate VaR as the worst-case loss in a period for a particular probability

The modules:
- In R: Financial Risk - Calculating Value At Risk (VaR) with R
- In Python: Financial Risk - Calculating Value At Risk (VaR) with Python
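To make the variance steps concrete, a minimal parametric sketch in R, with placeholder data and names rather than the modules' actual code:

stock_returns <- rnorm(250, 0, 0.012)     # placeholder daily stock returns
factor_returns <- rnorm(250, 0, 0.010)    # placeholder daily factor returns
beta <- cov(stock_returns, factor_returns) / var(factor_returns)   # exposure to the factor
systemic_var <- beta ^ 2 * var(factor_returns)            # variance explained by the factor
idiosyncratic_var <- var(stock_returns) - systemic_var    # residual variance
total_var <- systemic_var + idiosyncratic_var             # total variance
VaR_99 <- -qnorm(0.01) * sqrt(total_var)                  # worst-case daily loss at 99% confidence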

Review: Make Your Own Neural Network

As part of understanding neural networks, I was reading Make Your Own Neural Network by Tariq Rashid. A review is below:

Make Your Own Neural Network by Tariq Rashid
My rating: 4 of 5 stars

The book itself can be painful to work through, as it is written for a novice, not just in algorithms and data analysis but also in programming. On the neural network side, it jumps between the overly simplistic and the complicated, without treating either in enough detail. That said, by the end I found it a worthwhile dive into neural networks: once it got to the programming structure, it all made sense, but only because I stuck with it.

View all my reviews

Basic Three Layer Neural Network in Python

Introduction As part of understanding neural networks, I was reading Make Your Own Neural Network by Tariq Rashid. The book itself can be painful to work through, as it is written for a novice, not just in algorithms and data analysis but also in programming. Although the code is a verbatim transcription from the text (see the Source section), I published it to better understand how neural networks are designed, made easier by the use of a Jupyter Notebook. I do not present this as my own work, but I do hope it helps others develop their talents with data analytics.

Overview The code itself develops as follows:
Constructor:
- set number of nodes in each input, hidden, and output layer
- link weight matrices, wih and who; weights inside the arrays are w_i_j, where the link is from node i to node j in the next layer
- set learning rate
- activation function is the sigmoid function

Define the Training Function:
- convert inputs list to 2d array
- calculate signals into hidden layer
- calculate…
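The notebook's code is Python, taken from the book; purely as an illustration of the structure outlined above, a minimal sketch of the constructor values and a forward pass, translated here into R with assumed layer sizes:

sigmoid <- function(x) 1 / (1 + exp(-x))   # activation function

input_nodes <- 3; hidden_nodes <- 3; output_nodes <- 3   # assumed layer sizes
learning_rate <- 0.3

# link weight matrices, wih (input -> hidden) and who (hidden -> output)
wih <- matrix(rnorm(hidden_nodes * input_nodes, sd = input_nodes ^ -0.5), nrow = hidden_nodes)
who <- matrix(rnorm(output_nodes * hidden_nodes, sd = hidden_nodes ^ -0.5), nrow = output_nodes)

query <- function(inputs) {
  inputs <- matrix(inputs, ncol = 1)          # convert inputs list to a 2d array
  hidden_outputs <- sigmoid(wih %*% inputs)   # signals into and out of the hidden layer
  sigmoid(who %*% hidden_outputs)             # signals into and out of the output layer
}

query(c(0.5, 0.2, 0.9))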

Clustering: Hierarchical and K-Means in R on Hofstede Cultural Patterns

Overview What follows is an exploration of clustering, via hierarchies and K-Means, using the Hofstede patterns data, available from my Public folder.

For a deeper understanding of clustering and the various related techniques, I suggest the following:
- Cluster analysis (Wikipedia)
- An Introduction to Clustering and different methods of clustering

Load Data

# load data
Hofstede.df.preclean <- read.csv("HofstedePatterns.csv", na.strings = c("", "NA"))
#nrow(Hofstede.df.preclean)

# remove NULLs
Hofstede.df.preclean <- na.omit(Hofstede.df.preclean)
#nrow(Hofstede.df.preclean)

Hofstede.df <- Hofstede.df.preclean

Hierarchical Clustering: Run hclust, Generate Dendrogram

The first attempt is the simplest analysis, using the dist() and hclust() functions to generate a hierarchy of grouped data. The cluster size is derived from a reading of the dendrogram, although there are automated ways of selecting the cluster number, shown …
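The excerpt cuts off before the clustering calls themselves; the basic dist() and hclust() pattern it describes, sketched with an assumed cluster count and column handling, is:

scaled <- scale(Hofstede.df[sapply(Hofstede.df, is.numeric)])   # assumed: cluster on the numeric columns
distances <- dist(scaled)       # Euclidean distance matrix
fit <- hclust(distances)        # hierarchical clustering
plot(fit)                       # dendrogram, read to choose a cluster count
groups <- cutree(fit, k = 5)    # assumed k; cut the tree into k clusters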

F# is Part of Microsoft's Data Science Workloads

I have not worked in F# for over two (2) years, but I am enthused that Microsoft has added it to its languages for Data Science Workloads, along with R and Python. To that end, I hope to repost some of my existing F# code, as well as explore Data Science Workloads utilizing all three languages. Prior work in F# is available from learning F#, and some solutions will be republished on this site.
Data Science Workloads
- Build Intelligent Apps Faster with Visual Studio and the Data Science Workload

Published Work in F#
- James Igoe MS Azure Notebooks, which utilize MS's implementation of Jupyter Notebooks
- Mathematical Library, a basic mathematical NuGet package, with the source hosted on GitHub
- Basic Statistical Functions: a very basic F# class for performing standard deviation and variance calculations
- Various Number Functions: a collection of basic mathematical functions written in F# as part of my learning via Project Euler, functions for creating arrays or calculating values in vari…

Comparing Performance in R Using Microbenchmark

This post is a very simple display of how to use microbenchmark in R. Other sites might have longer and more detailed posts, but this one is primarily to 'spread the word' about this useful function and show how to plot its results. An alternative version of this post exists in Microsoft's Azure Notebooks, as Performance Testing Results with Microbenchmark.
Load Libraries

memoise as part of the code to test, microbenchmark to show usage, and ggplot2 to plot the result.

library(memoise)
library(microbenchmark)
library(ggplot2)

Create Functions

Generate several functions with varied performance times: a base function, plus functions that leverage vectorization and memoisation.

# base function
monte_carlo = function(N) {
  hits = 0
  for (i in seq_len(N)) {
    u1 = runif(1)
    u2 = runif(1)
    if (u1 ^ 2 > u2)
      hits = hits + 1
  }
  return(hits / N)
}

# memoise test function
monte_carlo_memo <- memoise(monte_carl…
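The excerpt is truncated; typical usage of the sort this post describes, comparing the functions and plotting the timings (illustrative, not the post's exact code), looks like:

results <- microbenchmark(
  base = monte_carlo(1000),
  memoised = monte_carlo_memo(1000),
  times = 50
)
autoplot(results)   # ggplot2-based plot of the timing distributions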