
Linear Regression with R, on IQ for Gini and Linguistic Diversity

Reusing the code from the earlier post "Correlations within with Hofstede's Cultural Values, Diversity, GINI, and IQ", the same data can be used for two simple linear regressions, IQ on Gini and IQ on linguistic diversity:

Example Code
 
 # LM - Linear Regression  
 
 # Load the data into a data frame (read.table returns a data.frame, not a matrix)
 oecdData <- read.table("OECD - Quality of Life.csv", header = TRUE, sep = ",")  
 
 # Access the vectors  
 v1 <- oecdData$IQ  
 v2 <- oecdData$HofstederPowerDx  
 v3 <- oecdData$HofstederMasculinity  
 v4 <- oecdData$HofstederIndividuality  
 v5 <- oecdData$HofstederUncertaintyAvoidance  
 v6 <- oecdData$Diversity_Ethnic  
 v7 <- oecdData$Diversity_Linguistic  
 v8 <- oecdData$Diversity_Religious  
 v9 <- oecdData$Gini  
 
 # IQ ~ Gini  
 relation1 <- lm(v1 ~ v9)  
 print(relation1)  
 print(summary(relation1))  

 # IQ ~ Diversity_Linguistic  
 relation2 <- lm(v1 ~ v7)  
 print(relation2)  
 print(summary(relation2))  
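
Because the models above are fit on bare vectors, the printed output labels the predictors as v9 and v7. A minimal sketch of the same two fits using the formula interface with `data =`, so the coefficients carry the column names instead. The data frame `demoData` and the names `fitGini` and `fitLing` are hypothetical, and the values are invented for illustration; only the column names IQ, Gini, and Diversity_Linguistic come from the code above.

```r
# Hypothetical stand-in for oecdData; values are invented for illustration only
demoData <- data.frame(
  IQ                   = c(98, 100, 102, 97, 99, NA, 101, 96),
  Gini                 = c(33, 29, 26, 36, 31, 30, NA, 38),
  Diversity_Linguistic = c(0.30, 0.10, 0.05, 0.50, 0.20, 0.15, 0.25, 0.40)
)

# Same model forms as above, but named columns make the output self-describing
fitGini <- lm(IQ ~ Gini, data = demoData)
fitLing <- lm(IQ ~ Diversity_Linguistic, data = demoData)

# Rows with a missing value in either variable are dropped by default,
# which is what "observations deleted due to missingness" reports
print(summary(fitGini))
print(summary(fitLing))
```

With the real file, replacing `demoData` with `oecdData` should give the same estimates as `relation1` and `relation2`, only with readable row labels.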

Example Results
 > # Access the vectors   
 + v1 <- oecdData$IQ  
 + v7 <- oecdData$Diversity_Linguistic  
 + v9 <- oecdData$Gini  
 +   
 + # IQ ~ Gini   
 + relation1 <- lm(v1 ~ v9)  
 + print(relation1)  
 + print(summary(relation1))  
 +   
 + # IQ ~ Diversity_Linguistic   
 + relation2 <- lm(v1 ~ v7)  
 + print(relation2)  
 + print(summary(relation2))  
 Call:
 lm(formula = v1 ~ v9)

 Coefficients:
 (Intercept)           v9
    107.1884      -0.2487

 Call:
 lm(formula = v1 ~ v9)

 Residuals:
     Min      1Q  Median      3Q     Max
 -6.3842 -2.4489 -0.0381  1.9954  6.6707

 Coefficients:
             Estimate Std. Error t value Pr(>|t|)
 (Intercept) 107.1884     4.7379  22.624 1.01e-15 ***
 v9           -0.2487     0.1472  -1.689    0.107
 ---
 Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

 Residual standard error: 3.27 on 20 degrees of freedom
   (3 observations deleted due to missingness)
 Multiple R-squared:  0.1248,    Adjusted R-squared:  0.08109
 F-statistic: 2.853 on 1 and 20 DF,  p-value: 0.1067

 Call:
 lm(formula = v1 ~ v7)

 Coefficients:
 (Intercept)           v7
     99.0895       0.8328

 Call:
 lm(formula = v1 ~ v7)

 Residuals:
     Min      1Q  Median      3Q     Max
 -7.1145 -1.5080  0.0149  2.3003  6.9105

 Coefficients:
             Estimate Std. Error t value Pr(>|t|)
 (Intercept)  99.0895     1.1035  89.793   <2e-16 ***
 v7            0.8328     3.7034   0.225    0.824
 ---
 Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

 Residual standard error: 3.491 on 20 degrees of freedom
   (3 observations deleted due to missingness)
 Multiple R-squared:  0.002522,    Adjusted R-squared:  -0.04735
 F-statistic: 0.05056 on 1 and 20 DF,  p-value: 0.8244
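
In the results above, neither slope is significant at the 0.05 level (p = 0.107 for Gini, p = 0.824 for linguistic diversity). The same numbers can be read from the model object rather than from the printed text. A minimal sketch, assuming invented data: `demo`, `fit`, and the other variable names are hypothetical, and the values are not the OECD figures.

```r
# Invented data for illustration; not the OECD values
demo <- data.frame(
  IQ   = c(98, 100, 102, 97, 99, 101, 96, 103),
  Gini = c(33, 29, 26, 36, 31, 28, 38, 25)
)

fit <- lm(IQ ~ Gini, data = demo)
s   <- summary(fit)

# Coefficient table as a numeric matrix with columns
# Estimate, Std. Error, t value, Pr(>|t|)
coefTable <- coef(s)
slope     <- coefTable["Gini", "Estimate"]
slopeP    <- coefTable["Gini", "Pr(>|t|)"]

# Goodness of fit
r2    <- s$r.squared
adjR2 <- s$adj.r.squared

# 95% confidence interval for each coefficient
ci <- confint(fit, level = 0.95)

cat("slope:", slope, " p-value:", slopeP, "\n")
cat("R-squared:", r2, " adjusted:", adjR2, "\n")
print(ci)
```

With a single predictor, the F-statistic equals the square of the slope's t value, which is why the overall p-value in each summary above matches the p-value on the slope row.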

Example Plot
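
The original plot is not reproduced here. One way such a plot might be drawn is a scatter plot with the fitted line overlaid, sketched below with base graphics. The data frame `demo` and its values are invented for illustration and are not the OECD figures; the call to `pdf(NULL)` only keeps the sketch from writing a file when run non-interactively and can be dropped in an interactive session.

```r
# Invented data for illustration; not the OECD values
demo <- data.frame(
  IQ   = c(98, 100, 102, 97, 99, 101, 96, 103),
  Gini = c(33, 29, 26, 36, 31, 28, 38, 25)
)

fit <- lm(IQ ~ Gini, data = demo)

# Draw to a null device so no file is created in batch mode
pdf(NULL)

# Scatter plot of the raw points, predictor on the x-axis
plot(demo$Gini, demo$IQ,
     xlab = "Gini", ylab = "IQ",
     main = "IQ ~ Gini")

# Overlay the fitted regression line (intercept and slope taken from the model)
abline(fit, col = "red")

dev.off()
```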



Sample Data
