Labor and Industrial Relations 493:
Quantitative Methods in Labor and Industrial Relations
Professor Wallace Hendricks

Correlation & Regression

Examples of different values for linear correlations: (a) shows a good positive relation, approximately +0.90; (b) shows a relatively poor negative correlation, approximately -0.40; (c) shows a perfect negative correlation, -1.00; (d) shows no linear trend, 0.00.

[Figure: scatterplots (a) to (d)]

Pearson Correlation Coefficient

(AKA: Product Moment Correlation)

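Covariance of X and Y divided by the product of their standard deviations:

$$r_{XY} = \frac{\operatorname{cov}(X,Y)}{s_X\, s_Y} = \frac{\sum_i (X_i-\bar X)(Y_i-\bar Y)}{\sqrt{\sum_i (X_i-\bar X)^2}\,\sqrt{\sum_i (Y_i-\bar Y)^2}}$$

where $\bar X$ and $\bar Y$ are the sample means and $s_X$ and $s_Y$ are the sample standard deviations.
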
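A minimal Python sketch of the definition above, written from scratch so that no statistics library is assumed. The function name `pearson_r` and the toy data are hypothetical illustrations, not part of the course data set.

```python
import math

def pearson_r(x, y):
    """Pearson (product moment) correlation: cov(X, Y) / (s_X * s_Y)."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length, at least 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sample covariance and standard deviations all use the n - 1 divisor,
    # so the divisors cancel in the ratio.
    cov_xy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y)) / (n - 1)
    s_x = math.sqrt(sum((xi - mean_x) ** 2 for xi in x) / (n - 1))
    s_y = math.sqrt(sum((yi - mean_y) ** 2 for yi in y) / (n - 1))
    return cov_xy / (s_x * s_y)

# Hypothetical toy data (not from the LIR 493Q data set)
x = [1, 2, 3, 4, 5]
print(round(pearson_r(x, [2, 4, 6, 8, 10]), 6))  # 1.0  (perfect positive)
print(round(pearson_r(x, [10, 8, 6, 4, 2]), 6))  # -1.0 (perfect negative)
print(round(pearson_r(x, [2, 1, 4, 3, 5]), 6))   # 0.8
```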
Simple Regression

Independent variable: X

Dependent variable: Y

Functional form: Y = f(X)

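Predicting Y from X

When Y and X are independent, X carries no information about Y, so the best (least-squares) prediction is the mean: $\hat Y = \bar Y$

Linear relationship: $Y_i = \alpha + \beta X_i + \varepsilon_i$, estimated by $\hat Y_i = a + bX_i$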

Which line is predicted?

Assumptions about the error term $\varepsilon$:

homoskedasticity (constant error variance) vs. heteroskedasticity

normally distributed errors

$E(\varepsilon_i) = 0$

uncorrelated errors

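Minimizing the squared error: the least-squares estimates $a$ and $b$ minimize the sum of squared residuals $e_i = Y_i - \hat Y_i$:

$$\min_{a,b} \sum_i e_i^2 = \sum_i (Y_i - a - bX_i)^2, \qquad b = \frac{\sum_i (X_i-\bar X)(Y_i-\bar Y)}{\sum_i (X_i-\bar X)^2}, \qquad a = \bar Y - b\bar X$$

$$\sum_i (Y_i-\bar Y)^2 = \text{variation in } Y$$

$$= \sum_i \bigl[(Y_i-\hat Y_i) + (\hat Y_i-\bar Y)\bigr]^2$$

$$= \sum_i (Y_i-\hat Y_i)^2 + \sum_i (\hat Y_i-\bar Y)^2 \quad \text{(the cross-product term sums to zero)}$$

= unexplained SS + explained SS

Because $\hat Y_i - \bar Y = b(X_i - \bar X)$, the explained SS is

$$\sum_i (\hat Y_i-\bar Y)^2 = b^2 \sum_i (X_i-\bar X)^2 = r_{XY}^2 \sum_i (Y_i-\bar Y)^2$$

so that

$$R^2 = \frac{\text{explained SS}}{\text{total SS}} = 1 - \frac{\text{unexplained SS}}{\text{total SS}} = r_{XY}^2$$

= % of total variation explained by the regression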
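The least-squares formulas and the sums-of-squares decomposition can be checked numerically. This is a small pure-Python sketch on made-up data; the helper names `least_squares` and `sum_of_squares` are hypothetical. It should show that unexplained SS + explained SS equals total SS and that R² equals r² (here 0.8²) in simple regression.

```python
def least_squares(x, y):
    """Return (a, b) minimizing sum((y_i - a - b * x_i) ** 2)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    b = sxy / sxx
    a = mean_y - b * mean_x  # the fitted line passes through the means
    return a, b

def sum_of_squares(x, y, a, b):
    """Split total SS into unexplained (residual) and explained (regression) SS."""
    mean_y = sum(y) / len(y)
    y_hat = [a + b * xi for xi in x]
    total = sum((yi - mean_y) ** 2 for yi in y)
    unexplained = sum((yi - yh) ** 2 for yi, yh in zip(y, y_hat))
    explained = sum((yh - mean_y) ** 2 for yh in y_hat)
    return total, unexplained, explained

# Hypothetical toy data
x = [1, 2, 3, 4, 5]
y = [2, 1, 4, 3, 5]
a, b = least_squares(x, y)
total, unexplained, explained = sum_of_squares(x, y, a, b)
print(round(a, 6), round(b, 6))                     # 0.6 0.8
print(round(total, 6), round(unexplained + explained, 6))  # 10.0 10.0
print(round(explained / total, 6))                  # 0.64 = R squared = 0.8 ** 2
```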

   

Confidence Bands

AKA: Forecast Intervals

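Similar to a confidence interval (C.I.) for the mean, but now for the regression line. At a given value $X_0$ of the independent variable, the band is calculated as

$$\hat Y_0 \pm t_{\alpha/2,\,n-2}\; s_e \sqrt{\frac{1}{n} + \frac{(X_0-\bar X)^2}{\sum_i (X_i-\bar X)^2}}$$

where $s_e$ is the standard error of the estimate (the square root of the residual mean square). For forecasting a single new value of Y rather than the mean of Y, a 1 is added under the square root.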
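A sketch of the band half-width, $t \cdot s_e \sqrt{1/n + (X_0-\bar X)^2/\sum_i (X_i-\bar X)^2}$, computed from summary statistics. It assumes that the sum of squared deviations of X can be rebuilt as (n - 1) times the variance of X, and it takes the critical t value as an input (here an approximate table value of 1.962 for 1096 df) rather than calling a statistics library. The function name `band_half_width` and its `individual` switch are hypothetical; the numbers are the skill variety summary statistics reported later in these notes.

```python
import math

def band_half_width(x0, n, s_e, mean_x, sxx, t_crit, individual=False):
    """Half-width of the confidence band around the fitted line at X = x0.

    s_e    : standard error of the estimate (root of the residual mean square)
    sxx    : sum of squared deviations of X from its mean
    t_crit : critical t value with n - 2 degrees of freedom
    individual=True adds the extra 1 used when forecasting a single new Y.
    """
    extra = 1.0 if individual else 0.0
    return t_crit * s_e * math.sqrt(extra + 1.0 / n + (x0 - mean_x) ** 2 / sxx)

# Skill variety regression: n, Std. Error of the Estimate, mean and SD of X
n, s_e, mean_x, sd_x = 1098, 12.3811, 3.36920, 0.91487
sxx = (n - 1) * sd_x ** 2  # assumption: rebuilt from the reported SD
t_crit = 1.962             # approximate two-sided 95% value for 1096 df

# The band is narrowest at the mean of X and widens on either side
for x0 in (1.0, mean_x, 5.0):
    print(x0, round(band_half_width(x0, n, s_e, mean_x, sxx, t_crit), 3))
```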

Example 1: Correlations between Core Job Dimensions and Days Absent (LIR 493Q data set)

Correlations: According to the theory presented in the Hackman-Oldham model of job characteristics, job characteristics are associated with employee reactions such as more or less job satisfaction and more or fewer days absent. In this theory, job autonomy, feedback on the job, the variety of skills on the job, and the ability to identify complete tasks all lead to good employee reactions. How should we state the null and alternative hypotheses when investigating the relationship between these characteristics and days absent? The SPSS printout below gives results from the data set LIR 493Q, the Quality of Work Life data. SPSS first provides the correlation coefficients and then the two-tailed significance of the coefficients. Finally, it provides the pairwise N that was the basis for each simple (Pearson) correlation coefficient.


| | | AUTONOMY | FEEDBACK | SKILL VARIETY | Days absent during the past year | TASK IDENTITY |
|---|---|---|---|---|---|---|
| Pearson Correlation | AUTONOMY | 1.000 | .146(**) | .517(**) | -.194(**) | .253(**) |
| | FEEDBACK | .146(**) | 1.000 | .032 | -.021 | .140(**) |
| | SKILL VARIETY | .517(**) | .032 | 1.000 | -.170(**) | .234(**) |
| | Days absent during the past year | -.194(**) | -.021 | -.170(**) | 1.000 | -.097(**) |
| | TASK IDENTITY | .253(**) | .140(**) | .234(**) | -.097(**) | 1.000 |
| Sig. (2-tailed) | AUTONOMY | . | .000 | .000 | .000 | .000 |
| | FEEDBACK | .000 | . | .264 | .479 | .000 |
| | SKILL VARIETY | .000 | .264 | . | .000 | .000 |
| | Days absent during the past year | .000 | .479 | .000 | . | .001 |
| | TASK IDENTITY | .000 | .000 | .000 | .001 | . |
| N | AUTONOMY | 1233 | 1226 | 1222 | 1069 | 1199 |
| | FEEDBACK | 1226 | 1273 | 1253 | 1098 | 1223 |
| | SKILL VARIETY | 1222 | 1253 | 1270 | 1098 | 1217 |
| | Days absent during the past year | 1069 | 1098 | 1098 | 1098 | 1076 |
| | TASK IDENTITY | 1199 | 1223 | 1217 | 1076 | 1228 |

** Correlation is significant at the 0.01 level (2-tailed).

Significance of Correlation Coefficients:

The significance of a correlation coefficient can be determined by using the coefficient along with the sample size to form an F-statistic or a t-statistic. First, recall that the square of the simple correlation coefficient, r², is equal to the proportion of the variation in the dependent variable that is explained by the independent variable. This is just the explained (or between) sum of squares divided by the total sum of squares:

$$r^2 = \frac{\text{explained SS}}{\text{total SS}}$$

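If we multiply the total sum of squares by (1-r²), we get the unexplained (or within) sum of squares. To form the F-statistic that compares the explained variance to the unexplained variance, we need

$$F = \frac{\text{explained SS}/(k-1)}{\text{unexplained SS}/(n-k)} = \frac{r^2/(k-1)}{(1-r^2)/(n-k)}$$

with k-1 and n-k degrees of freedom, where k is the number of estimated coefficients.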

In the case of a simple correlation, k=2 (the constant term and the slope of the simple regression). Therefore, the degrees of freedom are 1 and n-2. We simply take the correlation r and square it, take the ratio of r² to (1-r²), and multiply the result by n-2. In our example above, the correlation between skill variety and days absent is -.17 with a sample size of 1098. Therefore the F-statistic is ((-.17)²/(1-(-.17)²)) × 1096 ≈ 32.6, or 32.643 when the unrounded correlation is used. In the special case where the F-statistic has 1 and m degrees of freedom, the t-statistic for the same test is equal to the square root of the F-statistic, and it has m degrees of freedom. Therefore, the absolute value of the t-statistic is the square root of 32.643, which equals 5.713; its sign follows the sign of r, so t = -5.713. This has 1096 degrees of freedom.
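The arithmetic in the paragraph above can be sketched in Python. As an assumption for illustration, the unrounded r is recovered from the ANOVA sums of squares reported later (r² = 5003.991 / 173012.926), with the negative sign taken from the slope; `correlation_test` is a hypothetical helper name.

```python
import math

def correlation_test(r, n):
    """F and t statistics for H0: rho = 0 in a simple correlation (k = 2)."""
    df_resid = n - 2
    f_stat = (r ** 2 / (1 - r ** 2)) * df_resid   # 1 and n - 2 df
    t_stat = math.copysign(math.sqrt(f_stat), r)  # n - 2 df; sign follows r
    return f_stat, t_stat, df_resid

# Unrounded correlation between skill variety and days absent
r = -math.sqrt(5003.991 / 173012.926)
f_stat, t_stat, df = correlation_test(r, 1098)
print(round(f_stat, 3), round(t_stat, 3), df)  # 32.643 -5.713 1096
```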

 

Simple Regression Results for Days Absent and Skill Variety

Descriptive Statistics

| | Mean | Std. Deviation | N |
|---|---|---|---|
| Days absent during the past year | 14.3415 | 12.5584 | 1098 |
| SKILL VARIETY | 3.36920 | .91487 | 1098 |


Model Summary

| Model | R | R Square | Adjusted R Square | Std. Error of the Estimate |
|---|---|---|---|---|
| 1 | .170(1) | .029 | .028 | 12.3811 |

(1) Predictors: (Constant), SKILL VARIETY

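The R Square shows that about three percent of the variation in days absent is explained by the amount of skill variety on the job. We will discuss the Adjusted R Square when we talk about multiple regression. The Std. Error of the Estimate (12.3811) is the square root of the residual mean square (153.293 in the ANOVA table below). It estimates the standard deviation of the regression errors and is approximately the standard error of a forecast for an individual when the independent variable (skill variety) is at its mean.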


ANOVA(2)

| Model | | Sum of Squares | df | Mean Square | F | Sig. |
|---|---|---|---|---|---|---|
| 1 | Regression | 5003.991 | 1 | 5003.991 | 32.643 | .000(1) |
| | Residual | 168008.935 | 1096 | 153.293 | | |
| | Total | 173012.926 | 1097 | | | |

(1) Predictors: (Constant), SKILL VARIETY
(2) Dependent Variable: Days absent during the past year

 

The ANOVA for the simple regression gives the same results as the test of the simple correlation coefficient. The F-statistic tests the null hypothesis that R² is zero, which is identical to the test that r is zero. The mean square for the regression is the sum of squares for the regression divided by its degrees of freedom, and F is the regression mean square divided by the residual mean square (5003.991/153.293 = 32.643). The sum of squares for the regression (the between or explained sum of squares) is r² × total sum of squares. The residual sum of squares (the within or unexplained sum of squares) is (1-r²) × total sum of squares. The degrees of freedom for the total is n-1, for the regression is 1, and for the residual is n-2 (one degree of freedom is used by each of the constant and the slope). The degrees of freedom for the regression and residual add up to the total, and so do the sums of squares for the regression and residual.
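A sketch of how the whole ANOVA table follows from the total and regression sums of squares and n. The helper `anova_simple` is hypothetical; it also recovers the Std. Error of the Estimate in the Model Summary as the square root of the residual mean square.

```python
import math

def anova_simple(ss_total, ss_regression, n):
    """Rebuild the simple-regression ANOVA table from two sums of squares."""
    ss_residual = ss_total - ss_regression      # SS add up to the total
    df_reg, df_res, df_tot = 1, n - 2, n - 1    # df add up to the total
    ms_reg = ss_regression / df_reg
    ms_res = ss_residual / df_res
    return {
        "ss_residual": ss_residual,
        "df": (df_reg, df_res, df_tot),
        "ms_regression": ms_reg,
        "ms_residual": ms_res,
        "F": ms_reg / ms_res,
        "R2": ss_regression / ss_total,
        # Std. Error of the Estimate = square root of the residual mean square
        "se_estimate": math.sqrt(ms_res),
    }

# Sums of squares and n from the days absent / skill variety regression
table = anova_simple(173012.926, 5003.991, 1098)
for key, value in table.items():
    print(key, value if isinstance(value, tuple) else round(value, 4))
```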


Coefficients(1)

| Model | | Unstandardized B | Unstandardized Std. Error | Standardized Beta | t | Sig. | 95% CI for B: Lower Bound | 95% CI for B: Upper Bound | Correlations: Zero-order | Correlations: Partial | Correlations: Part |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | (Constant) | 22.207 | 1.426 | | 15.568 | .000 | 19.408 | 25.006 | | | |
| | SKILL VARIETY | -2.335 | .409 | -.170 | -5.713 | .000 | -3.136 | -1.533 | -.170 | -.170 | -.170 |

(1) Dependent Variable: Days absent during the past year

 

The estimated regression coefficients are in the column marked B. They are sometimes referred to as Unstandardized Coefficients. For each one-unit increase in skill variety, days absent are estimated to decrease by 2.335 days per year. The negative sign is consistent with our alternative hypothesis that improvements in skill variety lead to less absence. The error associated with the estimated coefficients is given in the column Std. Error (standard error). The t column is the t-test of the null hypothesis that B=0, which is simply the estimated coefficient divided by its estimated standard error. The Sig. column is the significance of the t-test assuming a two-sided alternative hypothesis. We will cover the meaning of Standardized Coefficients when we talk about multiple regression. Zero-order correlations are just simple correlations. We will discuss partial correlations when we discuss multiple regression. The forecast equation for days absent is

Days absent = 22.207 (the constant term) - 2.335 × (SKILL VARIETY). When skill variety is at its mean, the standard error of a forecast for an individual is approximately the standard error of the estimate (12.3811), so the error associated with that forecast is roughly t times 12.3811; the standard error of the mean forecast at that point is much smaller (12.3811 divided by the square root of n). Both errors are larger when skill variety is below or above its mean value. The confidence band that shows these forecast errors for various values of skill variety is shown below.
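A short sketch that uses the printed coefficients to make a point forecast, recompute the t-statistic as B / Std. Error, and rebuild the 95% confidence interval for B. The inputs are SPSS's rounded values and the critical t value of 1.962 is an approximate table value for 1096 df, so the results agree with the table only to about two decimal places; the helper `forecast_days_absent` is hypothetical.

```python
# Coefficients copied from the SPSS Coefficients table
a, b = 22.207, -2.335   # (Constant) and SKILL VARIETY
se_b = 0.409            # Std. Error of the SKILL VARIETY coefficient
t_crit = 1.962          # approximate two-sided 95% t value, 1096 df

def forecast_days_absent(skill_variety):
    """Point forecast from the fitted line: 22.207 - 2.335 * SKILL VARIETY."""
    return a + b * skill_variety

t_stat = b / se_b                            # t-test of H0: B = 0
ci = (b - t_crit * se_b, b + t_crit * se_b)  # 95% confidence interval for B

print(round(forecast_days_absent(3.0), 3))   # 15.202
print(round(t_stat, 3))
print(tuple(round(v, 3) for v in ci))
```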

Computing the Confidence Band

[Chart: Confidence Band for Mean Forecast of Days Absent]

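Note that every least-squares regression line with a constant term goes through the means of the variables. When the independent variable is at its mean, the regression explains none of the deviation of the dependent variable from its mean, because $\hat Y - \bar Y = b(X - \bar X) = 0$ at $X = \bar X$. Therefore the forecast line must pass through the point given by the pair of means for skill variety and days absent (3.3692, 14.3415), and the confidence band is narrowest at that point.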
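The pass-through-the-means property can be checked from the descriptive statistics alone. In simple regression the slope equals r × s_Y / s_X and the intercept is $\bar Y - b\bar X$; this sketch assumes the unrounded r recovered from the ANOVA sums of squares (with the sign of the slope) and should reproduce the B column to within rounding.

```python
import math

# Descriptive statistics from the SPSS output
mean_y, sd_y = 14.3415, 12.5584   # Days absent during the past year
mean_x, sd_x = 3.36920, 0.91487   # SKILL VARIETY
r = -math.sqrt(5003.991 / 173012.926)  # assumption: unrounded r, sign of the slope

b = r * sd_y / sd_x        # slope: r times the ratio of standard deviations
a = mean_y - b * mean_x    # intercept chosen so the line passes through the means

print(round(b, 2), round(a, 2))
print(math.isclose(a + b * mean_x, mean_y))  # True: fitted value at mean X is mean Y
```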


Confidence Band for Job Satisfaction for Levels of Perception of Management Action