

Review: Text Analysis with R for Students of Literature (Quantitative Methods in the Humanities and Social Sciences)

Text Analysis with R for Students of Literature by Matthew L. Jockers
My rating: 5 of 5 stars

Engaging writing, with code samples and practices. As for programming, I thought the code quality was somewhat low or sloppy, but Jockers is not a software developer by trade. While reading, I did have a few ideas to solve some text-matching issues across systems, and generally, I found the lack of discipline in the author's approach conducive to flexible thinking about using techniques with R.

Recent posts

Patents Per Capita and Hofstede's Cultural Dimensions

While thinking about social dimensions and innovation, it occurred to me that there might be a relationship with masculinity, but I quickly dismissed the idea, considering innovation much more likely to be predicated on science/math education. Even then, other cultural elements might be more strongly correlated. What follows is an exploration of various correlations with patents per capita.

Hofstede's Cultural Dimensions did have a significant correlation with patents per capita. Somewhat surprisingly, neither PISA scores by country, education, nor average IQ had a strong relationship with patent production, although statistically they would if Asia were included.

I often exclude Asia from analyses, as the initial driver of this work was looking at cultures that are similar, to tease out social effects. That is also why I avoid looking at all countries, as some relationships across the entire world disappear when limited to just developed economies. As an example, the value of work and its b…
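The point about a relationship appearing or disappearing depending on which countries are included can be sketched in R. This is only a toy illustration on simulated numbers, not the data behind the post: the data frame `countries` and its columns `Region`, `Score`, and `Patents` are hypothetical names, and the two groups are constructed so that there is no relationship within the larger group.

```r
# Toy illustration only: simulated values, not the post's data.
set.seed(42)

n_west <- 20
n_asia <- 6

# Hypothetical columns: Region, Score (e.g. a test score), Patents (per capita)
west <- data.frame(
  Region  = "West",
  Score   = rnorm(n_west, mean = 500, sd = 10),
  Patents = rnorm(n_west, mean = 100, sd = 20)  # unrelated to Score by construction
)
asia <- data.frame(
  Region  = "Asia",
  Score   = rnorm(n_asia, mean = 550, sd = 10),
  Patents = rnorm(n_asia, mean = 200, sd = 20)  # higher on both measures
)
countries <- rbind(west, asia)

# Correlation across all simulated countries
corAll <- cor.test(countries$Score, countries$Patents)

# Correlation restricted to the "West" subset
westOnly <- subset(countries, Region == "West")
corWest <- cor.test(westOnly$Score, westOnly$Patents)

corAll$estimate
corWest$estimate
```

In this setup the pooled correlation is driven almost entirely by the gap between the two groups, which is why comparing the full-sample and subset estimates side by side can be informative.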

Hofstede's Long-term Orientation and Individuality: Obesity Relationships (using R)

Hofstede extended his original four dimensions, adding the measures Long-Term Orientation (LTO) and Indulgence (Ind) in response to other researchers' studies. While reading Hofstede's Cultures and Organizations: Software of the Mind, Third Edition, I was struck by the lackluster reporting of the correlation between obesity and indulgence. It seemed obvious that one would delve a bit further, maybe looking at a compound relationship between both indulgence and LTO, e.g., do short-sightedness and indulgence together lead to obesity? That is what I present here, although I limit my analysis to OECD countries.

An explanation of dimensions can be found on Hofstede's site.

Hofstede's Dimensions and Obesity

A first step would be to see what relationships exist between obesity and the dimensions:

# LM - Multiple Regression - New Hofstede, LTO and Ind
# Load the data into a matrix
rm(list = ls())
setwd("../Data")
oecdData <- read.table("OECD - Quality…
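Since the excerpt is cut off before the model itself, here is a minimal sketch of what a compound LTO and indulgence regression might look like in R. It is not the post's actual code: the data frame `toy` is simulated, the column names `LTO`, `Ind`, and `Obesity` are assumptions, and the coefficients used to generate the data are arbitrary.

```r
# Simulated stand-in for the OECD data; column names are assumptions.
set.seed(1)
n <- 30
toy <- data.frame(
  LTO = runif(n, 20, 90),
  Ind = runif(n, 20, 90)
)

# Construct obesity to rise with indulgence and fall with long-term orientation
toy$Obesity <- 15 + 0.10 * toy$Ind - 0.08 * toy$LTO + rnorm(n, sd = 2)

# Additive model: each dimension's contribution with the other held fixed
fitAdd <- lm(Obesity ~ LTO + Ind, data = toy)

# Interaction model: does the effect of indulgence depend on LTO?
fitInt <- lm(Obesity ~ LTO * Ind, data = toy)

summary(fitAdd)$coefficients
anova(fitAdd, fitInt)  # F-test of whether the interaction term adds anything
```

The `anova()` comparison is one way to ask whether the two dimensions combine in more than an additive way; with real data the answer may well differ from this simulated case.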

Do Olympians Get Too Much Exercise? - The New York Times

This article in the NY Times is to some degree an example of the problem of survivor bias, or generalizing from the few successful individuals to all individuals: Do Olympians Get Too Much Exercise?

My comment: One obvious problem with this type of study is that athletes who developed issues might no longer engage in, or might have already given up, said activities. If you only look at the winners, and not the losers, you get an unfair picture of the effects on, or traits of, all participants. Yes, you know that the winners did not have problems, but what about those who gave the activity up, or died?
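The survivor-bias argument can be sketched with a small simulation in R. Everything here is hypothetical: `health` is an invented latent score, and the assumption that less healthy athletes are more likely to quit is built in through an arbitrary logistic rule.

```r
# Toy simulation of survivor bias; all quantities are hypothetical.
set.seed(7)
n <- 10000

# Each athlete has a latent health score; lower means more problems
health <- rnorm(n, mean = 0, sd = 1)

# Assumption: athletes with poorer health are more likely to have quit
stillCompeting <- runif(n) < plogis(1 + 2 * health)

meanAll       <- mean(health)                   # everyone who ever took part
meanSurvivors <- mean(health[stillCompeting])   # the group a study would see
meanDropouts  <- mean(health[!stillCompeting])  # the group a study would miss

c(all = meanAll, survivors = meanSurvivors, dropouts = meanDropouts)
```

Under this assumption, a study that samples only those still competing sees a healthier group than the full population of participants, even though the activity itself has no effect in the simulation.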

Fifty States of Anxiety - The New York Times

An article in the NY Times, Fifty States of Anxiety, annoyed me a bit, and it highlighted what happens when people do not look at all the relevant variables. The author also seems to have expressed his opinion in a way that was not supported by the data. The fact that his analysis did not match his data indicates his bias.

My comment:
As I mentioned in another post, there are regional differences in personality, and if you only know anxiety and the red-state/blue-state dichotomy, the map makes no sense. What you need are 3 dimensions of the Big Five personality inventory: openness to experience, neuroticism, and conscientiousness. You also need to know that the strongest predictor of liberal leaning is openness, while there is a slight correlation between conscientiousness and conservatism. Neuroticism has no relationship with political orientation.
When one realizes that the strong openness areas are the Pacific and NorthEast regions, and the strong conscientiousness region is …

PLOS Computational Biology: Ten Simple Rules for Effective Statistical Practice

An interesting article by six statisticians, Ten Simple Rules for Effective Statistical Practice. Their aim:

To this point, Meng notes "sound statistical practices require a bit of science, engineering, and arts, and hence some general guidelines for helping practitioners to develop statistical insights and acumen are in order. No rules, simple or not, can be 100% applicable or foolproof, but that's the very essence that I find this is a useful exercise. It reminds practitioners that good statistical practices require far more than running software or an algorithm."
The 10 rules are:

1. Statistical Methods Should Enable Data to Answer Scientific Questions
2. Signals Always Come with Noise
3. Plan Ahead, Really Ahead
4. Worry about Data Quality
5. Statistical Analysis Is More Than a Set of Computations
6. Keep it Simple
7. Provide Assessments of Variability
8. Check Your Assumptions
9. When Possible, Replicate!
10. Make Your Analysis Reproducible

Inequality and Religiosity: The Gini ~ Religion Matters Vector, with Correlations and Plot

Responses to a post showing that country-average IQ and answering yes to a question on whether religion matters are inversely correlated, but not strongly so, prompted me to dig up a more significant issue: the relationship between religiosity and inequality, as measured by the Gini coefficient.

The correlation is quite high, at about .7, although this really says nothing about the cause: whether religious countries tend toward inequality because of general tendencies, or whether inequality drives people to religion, as a salve against suffering. In truth, they could both be reflective of some other aspect of a country, and not in any way causative.
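The possibility that both measures reflect some other aspect of a country can be illustrated with a simulated confounder. This is a sketch on invented data, not an analysis of the OECD file: `z` is a hypothetical unobserved country characteristic, and `x` and `y` are stand-ins for religiosity and inequality that depend on `z` but not on each other.

```r
# Simulated confounder; none of these values come from real countries.
set.seed(3)
n <- 500

z <- rnorm(n)                # hypothetical unobserved country characteristic
x <- z + rnorm(n, sd = 0.7)  # stand-in for religiosity, driven only by z
y <- z + rnorm(n, sd = 0.7)  # stand-in for inequality, driven only by z

# Raw correlation between the two stand-ins
rawCor <- cor(x, y)

# Remove the part of each variable explained by z, then correlate what is left
partialCor <- cor(resid(lm(x ~ z)), resid(lm(y ~ z)))

c(raw = rawCor, partial = partialCor)
```

Here the raw correlation is sizable even though neither variable influences the other, and it largely vanishes once the shared driver is accounted for. With real data the confounder is usually not observed, which is why the correlation alone cannot settle the question.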

Example Code

# Correlations on ReligionMatters and Gini Coefficients
oecdData <- read.table("OECD - Quality of Life.csv", header = TRUE, sep = ",")
#names(oecdData)
religionMattersVector <- oecdData$ReligionMatters
giniVector <- oecdData$Gini
cor.test(giniVector, religionMattersVector)
…