| Labor and Industrial Relations 493: Quantitative Methods in Labor and Industrial Relations Professor Wallace Hendricks |
[Figure: Examples of different values for linear correlation, panels a, b, c, d]
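(AKA: Product Moment Correlation)
The correlation r is the covariance of X and Y divided by the product of their standard deviations:

$$ r = \frac{s_{XY}}{s_X \, s_Y} $$

where

$$ s_{XY} = \frac{\sum_i (X_i - \bar{X})(Y_i - \bar{Y})}{n-1}, \qquad s_X = \sqrt{\frac{\sum_i (X_i - \bar{X})^2}{n-1}}, \qquad s_Y = \sqrt{\frac{\sum_i (Y_i - \bar{Y})^2}{n-1}} $$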
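As a small illustration of the definition above, the following Python sketch computes the product moment correlation as the covariance divided by the product of the standard deviations. The function name `pearson_r` and the data are made up for illustration; they are not taken from the LIR 493Q data set.

```python
import math

def pearson_r(xs, ys):
    """Sample product-moment correlation: cov(X, Y) / (s_X * s_Y)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sample covariance and standard deviations, all with the n - 1 divisor.
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / (n - 1)
    s_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs) / (n - 1))
    s_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys) / (n - 1))
    return cov_xy / (s_x * s_y)

# Made-up illustrative data (not from the course data set).
x = [1, 2, 3, 4, 5]
y_up = [2, 4, 6, 8, 10]    # perfectly linear, increasing
y_down = [10, 8, 6, 4, 2]  # perfectly linear, decreasing

print(pearson_r(x, y_up))    # close to +1
print(pearson_r(x, y_down))  # close to -1
```

The n - 1 divisors cancel in the ratio, so using n throughout would give the same r.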
Simple Regression
Independent variable: X
Dependent variable: Y
Functional form: Y = f(X)
Predicting Y from X when Y & X are independent: $\hat{Y} = \bar{Y}$
Linear relationship: $\hat{Y} = a + bX$
Which line is predicted?
Assumptions about the error term:
homoskedasticity vs. heteroskedasticity
normal distribution
$E(\varepsilon_i) = 0$
uncorrelated errors
Minimizing the squared error
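The idea of choosing the line that minimizes the squared error can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the names `fit_line` and `sse` are hypothetical helpers, the data are made up, and the closed-form slope and intercept are the standard least-squares results rather than anything specific to SPSS.

```python
def fit_line(xs, ys):
    """Least-squares intercept a and slope b for the line Y = a + b*X."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b = sxy / sxx            # slope
    a = mean_y - b * mean_x  # intercept
    return a, b

def sse(a, b, xs, ys):
    """Sum of squared errors of the line a + b*X."""
    return sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))

# Made-up illustrative data.
x = [1, 2, 3, 4, 5]
y = [2, 1, 4, 3, 5]

a, b = fit_line(x, y)
print(a, b)                  # fitted intercept and slope
print(sse(a, b, x, y))       # squared error of the fitted line
print(sse(a + 0.1, b, x, y)) # any other line has a larger squared error
```

Shifting the intercept or the slope away from the fitted values always increases the sum of squared errors, which is what "least squares" means.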
$$ \text{total SS} = \sum_i (Y_i - \bar{Y})^2 = \text{variation in } Y $$

$$ \sum_i (Y_i - \bar{Y})^2 = \sum_i (Y_i - \hat{Y}_i)^2 + \sum_i (\hat{Y}_i - \bar{Y})^2 = \text{unexplained SS} + \text{explained SS} $$

$$ R^2 = \frac{\text{explained SS}}{\text{total SS}} = \text{\% total variation explained by regression} $$
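A short numerical check may make the decomposition of the variation in Y concrete. The sketch below uses made-up data and the standard least-squares line; the variable names are illustrative only.

```python
# Made-up illustrative data.
x = [1, 2, 3, 4, 5]
y = [2, 1, 4, 3, 5]

n = len(x)
mean_x = sum(x) / n
mean_y = sum(y) / n

# Least-squares slope and intercept.
b = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y)) / sum(
    (xi - mean_x) ** 2 for xi in x
)
a = mean_y - b * mean_x
fitted = [a + b * xi for xi in x]

total_ss = sum((yi - mean_y) ** 2 for yi in y)                  # variation in Y
unexplained_ss = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
explained_ss = sum((fi - mean_y) ** 2 for fi in fitted)
r_squared = explained_ss / total_ss

print(total_ss, unexplained_ss + explained_ss)  # the two agree
print(r_squared)  # share of total variation explained by the regression
```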
AKA: Forecast Intervals
Similar to the C.I. for the mean, but now for the regression line.
Calculated as
Example 1: Correlations between Core Job Dimensions and Days Absent (LIR 493Q data set)
Correlations: According to the theory
presented in the Hackman-Oldham model of job characteristics, job
characteristics are associated with employee reactions such as
more or less job satisfaction or more or fewer days absent. In
this theory, job autonomy, feedback on the job, the variety of
skills on the job and the ability to identify complete tasks all
lead to good employee reactions. How should we indicate the null
and alternative hypotheses when investigating the relationship
between these characteristics and days absent? The SPSS printout
below gives results from the data set LIR 493Q, the Quality of
Work Life data. SPSS first provides the correlation coefficients
and then provides the two-tailed significance of the
coefficients. Finally, the pairwise N that was the basis for
each simple (Pearson) correlation coefficient is provided.
| | | AUTONOMY | FEEDBACK | SKILL VARIETY | Days absent during the past year | TASK IDENTITY |
|---|---|---|---|---|---|---|
| Pearson Correlation | AUTONOMY | 1.000 | .146(**) | .517(**) | -.194(**) | .253(**) |
| | FEEDBACK | .146(**) | 1.000 | .032 | -.021 | .140(**) |
| | SKILL VARIETY | .517(**) | .032 | 1.000 | -.170(**) | .234(**) |
| | Days absent during the past year | -.194(**) | -.021 | -.170(**) | 1.000 | -.097(**) |
| | TASK IDENTITY | .253(**) | .140(**) | .234(**) | -.097(**) | 1.000 |
| Sig. (2-tailed) | AUTONOMY | . | .000 | .000 | .000 | .000 |
| | FEEDBACK | .000 | . | .264 | .479 | .000 |
| | SKILL VARIETY | .000 | .264 | . | .000 | .000 |
| | Days absent during the past year | .000 | .479 | .000 | . | .001 |
| | TASK IDENTITY | .000 | .000 | .000 | .001 | . |
| N | AUTONOMY | 1233 | 1226 | 1222 | 1069 | 1199 |
| | FEEDBACK | 1226 | 1273 | 1253 | 1098 | 1223 |
| | SKILL VARIETY | 1222 | 1253 | 1270 | 1098 | 1217 |
| | Days absent during the past year | 1069 | 1098 | 1098 | 1098 | 1076 |
| | TASK IDENTITY | 1199 | 1223 | 1217 | 1076 | 1228 |

** Correlation is significant at the 0.01 level (2-tailed).
Significance of Correlation Coefficients:
The significance of a correlation coefficient can
be determined by using the coefficient along with the sample size
to form an F-statistic or a t-statistic. First, we must remember
that the square of the simple correlation coefficient, r²,
is equal to the proportion of the variation in the dependent
variable that is explained by the independent variable. This is
just the explained (or between) sum of squares divided by the
total sum of squares:

$$ r^2 = \frac{\text{explained SS}}{\text{total SS}} $$

If we multiply the total sum of squares by (1-r²), we will get the unexplained (or within) sum of squares. To form the F-statistic to test the relationship of the explained variance to the unexplained variance, we need

$$ F_{k-1,\,n-k} = \frac{\text{explained SS}/(k-1)}{\text{unexplained SS}/(n-k)} = \frac{r^2/(k-1)}{(1-r^2)/(n-k)} $$

In the case of a simple correlation, k=2 (this counts
the constant term and the slope of the simple
regression). Therefore, the degrees of freedom are 1 and n-2. We
simply take the correlation r and square it. We then take its
ratio to (1-r²) and multiply the result by n-2. In our
example above, the correlation between skill variety and days
absent is -.17 with a sample size of 1098. Therefore the
F-statistic is ((-.17)²/(1-(-.17)²))*1096 ≈ 32.6; SPSS, which works
from the unrounded correlation, reports 32.643.
In the special case where the F-statistic has 1 and m degrees of
freedom, the t-statistic for the same test is equal in absolute value
to the square root of the F-statistic (it takes the sign of r), and it
has m degrees of freedom. Therefore, the t-statistic is the square root
of 32.643, which equals 5.713 in absolute value. This has 1096 degrees
of freedom.
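The arithmetic for this test can be reproduced with a short Python sketch. The function name `f_from_r` is a hypothetical helper, and the inputs are the rounded values printed in the correlation table (r = -.170, n = 1098), so the result is close to, but not exactly, the 32.643 that SPSS reports from the unrounded correlation.

```python
import math

def f_from_r(r, n, k=2):
    """F-statistic with (k - 1, n - k) degrees of freedom from a correlation r."""
    r2 = r ** 2
    return (r2 / (k - 1)) / ((1 - r2) / (n - k))

r = -0.170  # skill variety vs. days absent, as printed (rounded)
n = 1098    # pairwise N for that correlation

f_stat = f_from_r(r, n)
# With 1 numerator degree of freedom, t is the square root of F, signed like r.
t_stat = math.copysign(math.sqrt(f_stat), r)

print(round(f_stat, 2))  # about 32.6 from the rounded r
print(round(t_stat, 2))  # about -5.71, with n - 2 = 1096 degrees of freedom
```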
Simple Regression Results for Days Absent and Skill Variety
Descriptive Statistics

| | Mean | Std. Deviation | N |
|---|---|---|---|
| Days absent during the past year | 14.3415 | 12.5584 | 1098 |
| SKILL VARIETY | 3.36920 | .91487 | 1098 |

Model Summary

| Model | R | R Square | Adjusted R Square | Std. Error of the Estimate |
|---|---|---|---|---|
| 1 | .170(1) | .029 | .028 | 12.3811 |

1 Predictors: (Constant), SKILL VARIETY

The R Square shows that about three percent of
the variation in days absent is explained by the amount of skill
variety on the job. We will discuss the Adjusted R Square when we
talk about multiple regression. The Std. Error of the Estimate is
the estimate of the forecast error at the mean of the independent
variable (skill variety).
ANOVA(2)

| Model | | Sum of Squares | df | Mean Square | F | Sig. |
|---|---|---|---|---|---|---|
| 1 | Regression | 5003.991 | 1 | 5003.991 | 32.643 | .000(1) |
| | Residual | 168008.935 | 1096 | 153.293 | | |
| | Total | 173012.926 | 1097 | | | |

1 Predictors: (Constant), SKILL VARIETY
2 Dependent Variable: Days absent during the past year

The ANOVA for the simple regression gives the
same results as the test of the simple correlation coefficient.
The F-statistic gives the test of the null hypothesis that the R²
is zero. This is identical to the test that r is zero. The mean
square for the regression is the sum of squares for the
regression divided by its degrees of freedom. The sum of squares for
the regression (or between or explained sum of squares) is r²*total
sum of squares. The residual sum of squares (or within or
unexplained sum of squares) is (1-r²)*total sum of
squares. The degrees of freedom for the total are n-1 and the
degrees of freedom for the residual are n-2 (one is lost for the
constant and one for the slope). The degrees of freedom for regression
and residual add up to the total (and the sums of squares for the
regression and residual add to the total).
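The relationships among the ANOVA entries can be checked directly from the printed sums of squares. The sketch below is only a verification of the arithmetic; the variable names are illustrative, and the inputs are the rounded values from the SPSS table.

```python
import math

# Values copied from the ANOVA table above.
ss_regression = 5003.991
ss_residual = 168008.935
n = 1098  # sample size
k = 2     # estimated parameters: constant and slope

df_regression = k - 1
df_residual = n - k
df_total = n - 1

ss_total = ss_regression + ss_residual
ms_regression = ss_regression / df_regression
ms_residual = ss_residual / df_residual

f_stat = ms_regression / ms_residual
r_squared = ss_regression / ss_total
std_error_estimate = math.sqrt(ms_residual)

print(df_regression + df_residual == df_total)  # degrees of freedom add up
print(round(ss_total, 3))            # matches the Total row
print(round(ms_residual, 3))         # Mean Square for the residual
print(round(f_stat, 3))              # F
print(round(r_squared, 3))           # R Square in the Model Summary
print(round(std_error_estimate, 3))  # about 12.381, the Std. Error of the Estimate
```

The square root of the residual mean square reproduces the Std. Error of the Estimate reported in the Model Summary.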
Coefficients(1)

| Model | | B (Unstandardized) | Std. Error (Unstandardized) | Beta (Standardized) | t | Sig. | 95% Confidence Interval for B, Lower Bound | 95% Confidence Interval for B, Upper Bound | Zero-order Correlation | Partial Correlation | Part Correlation |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | (Constant) | 22.207 | 1.426 | | 15.568 | .000 | 19.408 | 25.006 | | | |
| | SKILL VARIETY | -2.335 | .409 | -.170 | -5.713 | .000 | -3.136 | -1.533 | -.170 | -.170 | -.170 |

1 Dependent Variable: Days absent during the past year

The estimated regression coefficients are in the
column marked B. They are sometimes referred to as Unstandardized
Coefficients. For each one-unit increase in skill variety, days
absent are estimated to decrease by 2.335 days per year. The
negative sign is consistent with our alternative hypothesis that
improvements in skill variety lead to less absence. The error
associated with the estimated coefficients is given in the column
Std. Error (standard error). The t column is the t-test for the
null hypothesis that B=0, which is simply the estimated
coefficient divided by its estimated standard error. The Sig.
column is the significance of the t-test assuming a two-sided
alternative hypothesis. We will cover the meaning of Standardized
Coefficients when we talk about multiple regression. Zero-order
correlations are just simple correlations. We will discuss
partial correlations when we discuss multiple regression. The
forecast equation for days absent is
Days absent = 22.207 (the constant term)
- 2.335(SKILL VARIETY). The error associated with this forecast is
approximately equal to t times the standard error of the estimate (12.3811)
when skill variety is at its mean. The error is larger when skill
variety is less than or greater than its mean value. The
confidence band that shows these forecast errors for various
values of skill variety is shown below.
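The entries in the Coefficients table can be tied together with a few lines of Python. This is a sketch under stated assumptions: the inputs are the rounded coefficients printed by SPSS, the function name `predict_days_absent` is hypothetical, and the normal critical value is used as a large-sample stand-in for the t critical value with 1096 degrees of freedom.

```python
from statistics import NormalDist

# Rounded values copied from the Coefficients table.
constant = 22.207
slope = -2.335
se_slope = 0.409

# t-test of the null hypothesis B = 0: coefficient divided by its standard error.
t_stat = slope / se_slope

# Approximate 95% confidence interval for B (normal approximation, an assumption).
crit = NormalDist().inv_cdf(0.975)
ci_lower = slope - crit * se_slope
ci_upper = slope + crit * se_slope

def predict_days_absent(skill_variety):
    """Forecast equation: days absent = constant + slope * skill variety."""
    return constant + slope * skill_variety

print(round(t_stat, 2))                     # about -5.71
print(round(ci_lower, 2), round(ci_upper, 2))
# Illustrative skill-variety values, not taken from the data set.
for value in (2.0, 3.0, 4.0):
    print(value, round(predict_days_absent(value), 2))
```

Because the inputs are rounded, the t value differs slightly in the third decimal from the -5.713 that SPSS prints.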
Computing the Confidence Band
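The formula used on the original page for the band is not reproduced here, so the following Python sketch assumes the standard textbook forecast interval for simple regression, in which the forecast error at a value x0 is the standard error of the estimate times sqrt(1 + 1/n + (x0 - mean)²/Sxx). All inputs are the rounded values from the SPSS output above, the function name `forecast_interval` is hypothetical, and the normal critical value is again a large-sample stand-in for t with 1096 degrees of freedom.

```python
import math
from statistics import NormalDist

# Rounded values from the SPSS output.
n = 1098
mean_x = 3.36920  # mean of skill variety
sd_x = 0.91487    # std. deviation of skill variety
s_e = 12.3811     # std. error of the estimate
constant, slope = 22.207, -2.335

sxx = (n - 1) * sd_x ** 2           # sum of squared deviations of X
crit = NormalDist().inv_cdf(0.975)  # approximate 95% critical value (assumption)

def forecast_interval(x0):
    """Return (forecast, half_width) of the assumed 95% forecast interval at x0."""
    forecast = constant + slope * x0
    se_forecast = s_e * math.sqrt(1 + 1 / n + (x0 - mean_x) ** 2 / sxx)
    return forecast, crit * se_forecast

# Illustrative skill-variety values below, at, and above the mean.
for x0 in (1.0, mean_x, 5.0):
    forecast, half_width = forecast_interval(x0)
    print(round(x0, 2), round(forecast, 2), round(half_width, 2))
```

The half-width is smallest at the mean of skill variety, where it is very nearly the critical value times the standard error of the estimate, and it grows as x0 moves away from the mean in either direction.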


Note that all regression lines must go through
the means of the variables. It is not possible to explain any
variation in the dependent variable when the independent variable
is at its mean. Therefore the forecasted line must go through the
point of the pair of means for days absent and skill variety.
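The reason can be written out in one line. Writing the fitted line as $\hat{Y} = a + bX$ (the symbols a and b are used here for the constant and slope), the least-squares intercept is $a = \bar{Y} - b\bar{X}$, so evaluating the line at the mean of X gives

$$ \hat{Y}(\bar{X}) = a + b\bar{X} = (\bar{Y} - b\bar{X}) + b\bar{X} = \bar{Y} $$

With the rounded SPSS values from this example, $22.207 - 2.335 \times 3.36920 \approx 14.34$, which agrees with the mean of days absent (14.3415) up to rounding.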
Confidence Band for Job Satisfaction for Levels of Perception of Management Action