20140210

Probing simple slopes with polytomous predictors (i.e., categorical with more than two levels)

Simple slopes analysis with continuous and dichotomous predictors is straightforward once you have acquainted yourself with the 'trick' that Aiken & West (1991) describe in detail (i.e., 'shift' a predictor A such that 0 is where you want the simple slope of B, recalculate all interaction terms with the shifted variable, run the regression analysis with the shifted predictors and interaction terms, and read off the slope of predictor B; see this earlier post on probing simple slopes). But what if you have a polytomous predictor, say 'type of employment', which could take four qualitatively different levels?
  • private industry
  • government
  • NGO
  • none (unemployed)

Dummy variables

The procedure is simply an extension of the logic for continuous and dichotomous predictors, but it involves the use of dummy-coded variables, or rather 'packages' of dummy-coded variables. This is because in order to deal with a polytomous predictor with k levels, k-1 dummy variables are needed to represent and model its effect. For example, the variable 'type of employment' referred to above would be represented by three dummy variables (k is 4, thus k-1 = 3 dummy variables). Each dummy variable takes the value 1 for exactly one of the levels and 0 for all other levels:
Original variable       Dummy Variable 1   Dummy Variable 2   Dummy Variable 3
1 (private industry)           1                  0                  0
2 (government)                 0                  1                  0
3 (NGO)                        0                  0                  1
4 (unemployed)                 0                  0                  0
The category that gets a value of 0 on all dummy variables is the so-called reference category. It is special in that no dummy variable assigns it a 1; it is simply the category that is left over: In an exhaustive set of k levels for a polytomous variable, once you know that a case is not in any of the k-1 coded categories, it must be in the last, remaining category. That's neat because it saves you one dummy while still describing your observations accurately and completely with regard to the polytomous variable.
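As an aside, this dummy-coding step is easy to reproduce outside SPSS. Here is a minimal sketch in Python with pandas; the toy data are made up for illustration:

```python
import pandas as pd

# Hypothetical illustration: typeOfEmployment coded 1-4 as in the table above.
df = pd.DataFrame({"typeOfEmployment": [1, 2, 3, 4]})

# k = 4 levels need k-1 = 3 dummies; dropping the column for level 4
# makes "unemployed" the reference category (0 on all dummies).
dummies = pd.get_dummies(df["typeOfEmployment"], prefix="D").astype(int)
dummies = dummies.drop(columns="D_4")

print(dummies)
#    D_1  D_2  D_3
# 0    1    0    0
# 1    0    1    0
# 2    0    0    1
# 3    0    0    0
```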

Testing the omnibus effect of the polytomous predictor

These dummy variables are generally correlated to a substantial degree. You can see this if you try to guess which value an observation has on one dummy variable when you already know its value on another dummy variable. Look at 4 fictional observations:
Obs#   typeOfEmployment   Dummy Variable 1   Dummy Variable 2   Dummy Variable 3
1             1                  1                  0                  0
2             4                  0                  0                  0
3             3                  0                  0                  1
4             2                  0                  1                  0
In observation #1, once you know the value on Dummy Variable 1 (i.e., DummyVar1 = 1), you know the values on all other dummy variables - and you know this because of the mathematical structure of the dummies, not because of some empirical association. But also for the remaining three observations with DummyVar1 = 0, you can guess relatively well which value they will have on DummyVar2: In 2/3 of all cases with DummyVar1 = 0, DummyVar2 = 0, and in only 1/3 of the cases with DummyVar1 = 0 is DummyVar2 = 1. So there is substantial overlap between DummyVar1 and DummyVar2, and the argument extends to all pairs of dummy variables. If all three dummy variables are entered into a regression model, they will therefore 'take away' predictive power from each other.

The regression coefficient you obtain for an individual dummy variable represents the difference between the category that is assigned 1 by that dummy variable and the category that is assigned 0 by all dummy variables. It thus only compares two categories, but the initial endeavor of the regression analysis was to assess the omnibus effect of the original polytomous variable. That omnibus effect does not represent the difference between two of the categories, but all differences between all categories. Therefore, the dummy variables obtained from the polytomous predictor need to be considered simultaneously, all at once.

Such a 'package' of dummy variable predictors can be treated like one variable, but you will not obtain a single regression coefficient for its effect. Rather, you will receive the amount of change in explained variance for the whole package, R2change, and an associated F value with k-1 numerator degrees of freedom, Fchange. If the test of Fchange is significant, the variance captured by the package of dummy variables is so far from zero that you are willing to bet money on it not being the result of random fluctuations.
In SPSS syntax, this all works out pretty neatly. First recode the original variable typeOfEmployment into the three dummies D1, D2, D3:
RECODE typeOfEmployment (1=1)(2,3,4=0)(ELSE=SYSMIS) INTO D1.
RECODE typeOfEmployment (2=1)(1,3,4=0)(ELSE=SYSMIS) INTO D2.
RECODE typeOfEmployment (3=1)(1,2,4=0)(ELSE=SYSMIS) INTO D3.
EXEC.
For the sake of the interaction discussion below, let's say we also have a continuous predictor Cont that has already been mean-centered into cCont, and a dependent variable Crit. So we need the interaction term for cCont × typeOfEmployment. This interaction term will also be a 'package' of predictors, containing all products of the dummy variables with the other predictor cCont:
COMPUTE IA_cCont_D1 = cCont*D1.
COMPUTE IA_cCont_D2 = cCont*D2.
COMPUTE IA_cCont_D3 = cCont*D3.
EXEC.
Now the complete regression model would be estimated by:
REGRESSION
  /STAT=COEF R ANOVA CHANGE
  /DEPENDENT=Crit
  /METHOD=ENTER cCont D1 D2 D3 IA_cCont_D1 IA_cCont_D2 IA_cCont_D3.
I will call this model Model 0 below. The coefficients estimated in this model are:
[Screenshot: SPSS "Coefficients" table for Model 0. A comma is the decimal separator in lieu of a dot.]
As mentioned above, however, the regression coefficients of D1, D2, D3 are differences between pairs of categories of typeOfEmployment, but we want to know the single effect of this predictor on Crit. Therefore, we structure the predictors in the regression model so that the unique prediction of the dummy variables is assessed as a whole:
REGRESSION
  /STAT=COEF R ANOVA CHANGE
  /DEPENDENT=Crit
  /METHOD=ENTER cCont IA_cCont_D1 IA_cCont_D2 IA_cCont_D3
  /METHOD=ENTER D1 D2 D3.
This code first estimates a regression analysis with the predictor terms in the first /METHOD=ENTER portion (note that the dummy variables are not included here; hence this is the 'reduced model' with the polytomous predictor omitted), and then a second model that comprises all the predictors from the first as well as those from the second /METHOD=ENTER section, most notably the dummy variables - thus, a 'complete model' (i.e., Model 0). The output reports the coefficients for both of these models in two halves of a horizontally split table entitled "Coefficients". But the interesting portion of the output for now is the table entitled "Model Summary".
[Screenshot: SPSS "Model Summary" table. A comma is the decimal separator in lieu of a dot.]
This table contains R2 statistics for both regression analyses from the last command: the reduced model without the dummy variables as well as the complete model with the dummy variables added. These two models can now be compared by reading off the R2change for the second (i.e., complete) model: That's the effect of the polytomous predictor expressed as additional variance explained by all dummy variables as a package. R2change is associated with an Fchange (also in that table), and there is a p value for this Fchange under the heading "Sig. F Change". That's your effect of the polytomous predictor at Cont = MEAN, i.e., cCont = 0.
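For readers who want to see the mechanics behind R2change and Fchange, here is a sketch in Python with numpy. The data are simulated (the post's data are not available) and the effect sizes are arbitrary; the point is only the nested-model comparison itself:

```python
import numpy as np

rng = np.random.default_rng(20140210)
n = 200

# Simulated stand-in data: group 1-4 and a mean-centered continuous predictor.
group = rng.integers(1, 5, size=n)
cCont = rng.normal(size=n)
D1, D2, D3 = [(group == g).astype(float) for g in (1, 2, 3)]
Crit = 0.6 * D1 - 0.4 * D2 + 0.2 * cCont + rng.normal(size=n)

def r_squared(y, *predictors):
    """R^2 of an OLS regression of y on an intercept plus the predictors."""
    X = np.column_stack([np.ones(len(y)), *predictors])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return 1.0 - resid @ resid / np.sum((y - y.mean()) ** 2)

# Reduced model: everything except the dummy-variable package.
r2_reduced = r_squared(Crit, cCont, cCont*D1, cCont*D2, cCont*D3)
# Complete model: reduced model plus the package D1, D2, D3.
r2_full = r_squared(Crit, cCont, cCont*D1, cCont*D2, cCont*D3, D1, D2, D3)

df1 = 3               # k-1 dummies in the package
df2 = n - 7 - 1       # n minus the 7 predictors in the full model, minus 1
F_change = ((r2_full - r2_reduced) / df1) / ((1 - r2_full) / df2)
print(round(F_change, 2))   # compare against F(3, 192) to get "Sig. F Change"
```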

Testing the interaction

Is this a "main effect" of typeOfEmployment? Only if there is no interaction of typeOfEmployment and Cont! So we have yet to test for an interaction of typeOfEmployment and Cont - and that is probably what we were interested in in the first place if we aim to probe simple slopes involving a polytomous predictor. In order to test the interaction, we run the regression analysis from above again, but this time the reduced model omits the interaction term package (keeping cCont and the simple dummy variables), and the complete model adds the product interaction terms (i.e., Model 0):
REGRESSION
  /STAT=COEF R ANOVA CHANGE
  /DEPENDENT=Crit
  /METHOD=ENTER cCont D1 D2 D3
  /METHOD=ENTER IA_cCont_D1 IA_cCont_D2 IA_cCont_D3.
We again get an output including a "Model Summary" table:
[Screenshot: SPSS "Model Summary" table. A comma is the decimal separator in lieu of a dot.]
Again, the R2change, the Fchange, and the associated p value tell us whether there is an interaction of typeOfEmployment and Cont. In this example, R2change = .094, Fchange = 5.991, and p = .001: there is a significant interaction. So the relationship between Cont and Crit differs depending on the value of typeOfEmployment. This does not necessarily mean that the simple slopes (i.e., effects of Cont on Crit conditional on typeOfEmployment) will be significant or not significant at any one particular level of typeOfEmployment; it is even possible that none of the simple slopes are significant. What this interaction means is that among the differences between the conditional slopes of Cont on Crit there is more variance than would be expected by chance alone at a type I error rate of 5%.

Obtaining the simple slopes

Now that we know that there is evidence of interaction in the example, we are interested in the specific nature of this interaction, or in other words: We want to know what the different simple slopes conditional on typeOfEmployment are. Let's have a look at Model 0: The coefficient associated with cCont is already one of the simple slopes. Namely, it is the slope of Cont at the value of typeOfEmployment that was assigned 0 in all three dummy variables. Hence, in the coefficient output of Model 0 above, the regression coefficient B = -0.347 (SE = 0.150, p = .022) is the simple slope of Cont at typeOfEmployment = 4 (i.e., the relationship between Cont and Crit among those observations that are "unemployed").
Now it becomes obvious how we can obtain the simple slopes of Cont for any other value of typeOfEmployment: We simply choose different sets of dummy variables to represent the variable typeOfEmployment such that a different value of typeOfEmployment is assigned 0 in all dummy variables of the set:
Original variable      Dummy set I    Dummy set II   Dummy set III   Dummy set IV
                       D1  D2  D3     D1  D2  D4     D1  D3  D4      D2  D3  D4
1 (private industry)    1   0   0      1   0   0      1   0   0       0   0   0
2 (government)          0   1   0      0   1   0      0   0   0       1   0   0
3 (NGO)                 0   0   1      0   0   0      0   1   0       0   1   0
4 (unemployed)          0   0   0      0   0   1      0   0   1       0   0   1
In this table of dummy sets, Dummy set I is the set used above, making typeOfEmployment = 4 (unemployed) the zero category and thus allowing us to read off the simple slope of Cont on Crit for this group of observations. Dummy set II makes typeOfEmployment = 3 (NGO) the zero category. In order to obtain the simple slope of Cont on Crit for observations employed at an NGO, this dummy set is used instead of Dummy set I. We also have to calculate the interaction terms - or actually only one of them, because only the third dummy differs between the two sets (D4 instead of D3); D1 and D2 can be reused, and the same is true for the interaction terms IA_cCont_D1 and IA_cCont_D2:
* create Dummy D4.
RECODE typeOfEmployment (4=1)(1,2,3=0)(ELSE=SYSMIS) INTO D4.
* create interaction term involving Dummy D4.
COMPUTE IA_cCont_D4 = cCont*D4.
* estimate the simple slope of Cont at typeOfEmployment = 3.
REGRESSION
  /DEPENDENT = Crit
  /METHOD=ENTER cCont D1 D2 D4 IA_cCont_D1 IA_cCont_D2 IA_cCont_D4.
The regression analysis in this syntax yields the following output of coefficients:
[Screenshot: SPSS "Coefficients" table. A comma is the decimal separator in lieu of a dot.]
and from this we conveniently read off the simple slope of Cont on Crit at typeOfEmployment = 3 (i.e., the cases that receive 0 on all dummy variables of Dummy set II that is used here): B = 0.10, SE = 0.151, p = .509.
The simple slope for typeOfEmployment = 2 is obtained by recycling dummies D1, D3, and D4 - together they make up Dummy set III - and their respective interaction terms.
* estimate the simple slope of Cont at typeOfEmployment = 2.
REGRESSION
  /DEPENDENT = Crit
  /METHOD=ENTER cCont D1 D3 D4 IA_cCont_D1 IA_cCont_D3 IA_cCont_D4.
[Screenshot: SPSS "Coefficients" table. A comma is the decimal separator in lieu of a dot.]
The simple slope for observations that have employment in government is thus B = 0.451, SE = 0.151, p = .003.
Finally, reusing dummies D2, D3, and D4 (making up Dummy set IV) along with their interaction terms:
* estimate the simple slope of Cont at typeOfEmployment = 1.
REGRESSION
  /DEPENDENT = Crit
  /METHOD=ENTER cCont D2 D3 D4 IA_cCont_D2 IA_cCont_D3 IA_cCont_D4.
yields:
[Screenshot: SPSS "Coefficients" table. A comma is the decimal separator in lieu of a dot.]
The simple slope of Cont on Crit at typeOfEmployment = 1 is thus B = 0.357, SE = 0.122, p = .004.
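The reference-switching logic of these analyses can also be illustrated outside SPSS. The following Python sketch uses simulated data with made-up "true" slopes (roughly mimicking the estimates above) and shows that the coefficient of cCont always equals the simple slope in whichever category is currently coded 0 on all dummies:

```python
import numpy as np

rng = np.random.default_rng(42)
n = 400
group = rng.integers(1, 5, size=n)   # typeOfEmployment, levels 1-4
cCont = rng.normal(size=n)           # mean-centered continuous predictor

# Hypothetical true simple slopes per level of typeOfEmployment:
true_slope = {1: 0.36, 2: 0.45, 3: 0.10, 4: -0.35}
Crit = np.array([true_slope[g] for g in group]) * cCont + rng.normal(size=n, scale=0.5)

def simple_slope(ref):
    """Slope of cCont when 'ref' is the all-zero reference category."""
    others = [g for g in (1, 2, 3, 4) if g != ref]
    dummies = [(group == g).astype(float) for g in others]
    X = np.column_stack([np.ones(n), cCont, *dummies,
                         *[cCont * d for d in dummies]])
    beta, *_ = np.linalg.lstsq(X, Crit, rcond=None)
    return beta[1]   # coefficient of cCont = simple slope at 'ref'

for ref in (1, 2, 3, 4):
    print(ref, round(simple_slope(ref), 2))
```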
An overview of the simple slope estimates and the test for the interaction thus gives us:
Simple slope at           Dummy set   Dummies (re-)used   Interaction terms used                    B        SE      β       p
typeOfEmployment = ...    used        in syntax
1 (private industry)      IV          D2, D3, D4          IA_cCont_D2, IA_cCont_D3, IA_cCont_D4     0.357    0.122    .37    .004
2 (government)            III         D1, D3, D4          IA_cCont_D1, IA_cCont_D3, IA_cCont_D4     0.451    0.151    .467   .003
3 (NGO)                   II          D1, D2, D4          IA_cCont_D1, IA_cCont_D2, IA_cCont_D4     0.10     0.151    .103   .509
4 (unemployed)            I           D1, D2, D3          IA_cCont_D1, IA_cCont_D2, IA_cCont_D3    -0.347    0.150   -.36    .022

The procedure has thus given us all the conditional effects of Cont at the different values of typeOfEmployment (i.e., the simple slopes), along with their SEs, standardized β values, and p values. Note that they need not form a linear pattern in which, e.g., the slopes steadily increase or decrease from typeOfEmployment = 1 to typeOfEmployment = 4. This may be the case, but it is not necessary.

Comparing individual simple slopes

Also, for now we only know that the slopes are heterogeneous (i.e., they should not be considered approximately equal, random deviations from one common simple slope). But we do not know whether the slopes at typeOfEmployment = 1 and typeOfEmployment = 2 differ, nor whether the slopes at typeOfEmployment = 3 and typeOfEmployment = 4 differ. (Comparing p values by rule of thumb is not a valid test of the difference between two slopes; see http://methstuff.blogspot.de/2014/02/my-interaction-is-not-significant-but.html.)
But we can look at the different models to obtain a test of whether the slopes in three of the conditions of typeOfEmployment differ from the slope in the remaining fourth condition. Specifically, the interaction terms inform us of such differences between pairs of simple slopes. For example, look at the regression model with dummy variables D1, D2, and D4. Condition typeOfEmployment = 3 (i.e., NGO) is the reference group here, assigned a value of zero on each dummy. The interaction terms therefore indicate how much the simple slope of Cont in each of the other conditions of typeOfEmployment differs from the slope in condition typeOfEmployment = 3:

[Screenshot: SPSS "Coefficients" table for the model with D1, D2, D4. A comma is the decimal separator in lieu of a dot.]
IA_cCont_D1 has a regression coefficient B = 0.258 (SE = 0.194, p = .185). The slope of Cont in condition 1 of typeOfEmployment is thus 0.258 original scale units larger than the slope in condition 3, and this difference is not significant - it can easily be a random fluctuation around a zero difference. The simple slope in condition 2 is similarly larger descriptively, B = 0.351 (SE = 0.214, p = .102). Finally, as evident from the coefficient for IA_cCont_D4, the simple slope in condition 4 is substantially smaller than the slope in condition 3, B = -0.447 (SE = 0.212, p = .037). This exercise can be extended to all possible pairwise comparisons of the simple slopes, using the interaction terms from the different regression models with the different sets of dummy variables. HOWEVER,
  1. The standardized β coefficients that SPSS (and other software packages, for that matter) yields for these product terms are not correct. Do not use them. If you want them, run the same model again, but with a) all continuous predictors standardized, b) the dummy variables not standardized, c) the product terms computed from the standardized continuous predictors and the unstandardized dummy variables, and d) the dependent variable Crit standardized. The unstandardized coefficients from the output of this special regression model are in fact the standardized coefficients for the original regression model.
  2. Be aware that you are running 4 (regression models) × 3 (comparisons within each model) = 12 such post-hoc tests and are thus overusing the degrees of freedom available in the design.
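A sketch of the re-standardization trick from point 1, in Python with numpy on simulated data (the variable names and effect sizes are invented for illustration):

```python
import numpy as np

rng = np.random.default_rng(7)
n = 300
group = rng.integers(1, 5, size=n)
Cont = rng.normal(10, 2, size=n)            # uncentered continuous predictor
D1, D2, D3 = [(group == g).astype(float) for g in (1, 2, 3)]
Crit = 0.3 * Cont * D1 - 0.1 * Cont + 0.5 * D2 + rng.normal(size=n)

def z(v):                                   # standardize: mean 0, SD 1
    return (v - v.mean()) / v.std(ddof=1)

zCont, zCrit = z(Cont), z(Crit)

# a) continuous predictor standardized, b) dummies left as raw 0/1,
# c) products built from zCont and the raw dummies, d) DV standardized.
X = np.column_stack([np.ones(n), zCont, D1, D2, D3,
                     zCont * D1, zCont * D2, zCont * D3])
beta, *_ = np.linalg.lstsq(X, zCrit, rcond=None)

# beta[5:8] are the correct standardized coefficients for the product terms
# (unlike the "Beta" column SPSS would print for the original model).
print(np.round(beta[5:8], 3))
```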

Using contrasts to test simple slope patterns

A different approach to testing patterns of simple slopes is to use, instead of dummy variables, orthogonal contrasts that allow you to test a specific predicted pattern of simple slopes. For example, you could test whether the simple slopes in conditions 1 and 2 are indeed the largest and not different from each other, while condition 3 has a medium slope and condition 4 the smallest, by using:

* Focal contrast KF.
RECODE typeOfEmployment (1,2=1)(3=0)(4=-2)(ELSE=SYSMIS) INTO KF.
* Residual contrasts Kr1 and Kr2.
RECODE typeOfEmployment (1=1)(2=-1)(3,4=0)(ELSE=SYSMIS) INTO Kr1.
RECODE typeOfEmployment (1,2,4=1)(3=-3)(ELSE=SYSMIS) INTO Kr2.
* Create the product terms.
COMPUTE P_KF_cCont = cCont*KF.
COMPUTE P_Kr1_cCont = cCont*Kr1.
COMPUTE P_Kr2_cCont = cCont*Kr2.
and then estimating
REGRESSION
  /DEPENDENT Crit
  /METHOD = ENTER cCont KF Kr1 Kr2 P_KF_cCont P_Kr1_cCont P_Kr2_cCont.

[Screenshot: SPSS "Coefficients" table for the contrast model. A comma is the decimal separator in lieu of a dot.]

The regression coefficient of P_KF_cCont indicates whether the simple slopes across conditions of typeOfEmployment behave in the predicted way. The two coefficients for P_Kr1_cCont and P_Kr2_cCont are far from any conventional significance level, so for now this can be taken to mean that there are no reliable patterns among the simple slopes other than the one described in KF.
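As a quick sanity check, the three contrast vectors used above are indeed mutually orthogonal (their pairwise dot products across the four levels are zero), which can be verified in a few lines of Python:

```python
import numpy as np

# Contrast weights across levels 1-4 of typeOfEmployment, as recoded above:
KF  = np.array([ 1,  1,  0, -2])   # focal: 1 and 2 high, 3 medium, 4 lowest
Kr1 = np.array([ 1, -1,  0,  0])   # residual: 1 vs 2
Kr2 = np.array([ 1,  1, -3,  1])   # residual: 3 vs the other three levels

for a, b in [(KF, Kr1), (KF, Kr2), (Kr1, Kr2)]:
    print(int(a @ b))              # 0 for every pair: the set is orthogonal
```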

Average effect of the "other" predictor

The four regression models with the different sets of dummy variables each yield a simple slope of Cont, i.e., an effect of Cont conditional on typeOfEmployment. So there is never a test of an average or "overall" effect of Cont. But we might want to know this average effect. In order to obtain it, run regression Model 0 again, but with typeOfEmployment effect coded rather than dummy coded. There are again k-1 effect codes for the k levels of a polytomous variable; within a set, one level is assigned 1 on all codes, and each of the remaining levels is assigned -1 on one code and 0 on the others. There are four different sets of such effect codes:

Original variable   Effect code set 1   Effect code set 2   Effect code set 3   Effect code set 4
typeOfEmployment   E1_1 E1_2 E1_3   E2_1 E2_2 E2_3   E3_1 E3_2 E3_3   E4_1 E4_2 E4_3
1 (private industry)   1 1 1   -1 0 0   -1 0 0   -1 0 0
2 (government)   -1 0 0   1 1 1   0 -1 0   0 -1 0
3 (NGO)   0 -1 0   0 -1 0   1 1 1   0 0 -1
4 (unemployed)   0 0 -1   0 0 -1   0 0 -1   1 1 1

It does not matter which of these sets you use; they all yield the same result. Recode the original typeOfEmployment into the chosen effect code set, e.g., Effect code set 2:

RECODE typeOfEmployment (1=-1)(2=1)(3,4=0)(ELSE=SYSMIS) INTO E2_1.
RECODE typeOfEmployment (1,4=0)(2=1)(3=-1)(ELSE=SYSMIS) INTO E2_2.
RECODE typeOfEmployment (1,3=0)(2=1)(4=-1)(ELSE=SYSMIS) INTO E2_3.

then calculate the product terms again:

COMPUTE IA_cCont_E2_1 = cCont*E2_1.
COMPUTE IA_cCont_E2_2 = cCont*E2_2.
COMPUTE IA_cCont_E2_3 = cCont*E2_3.

and run the regression model with these terms:

REGRESSION
  /DEPENDENT Crit
  /METHOD = ENTER cCont E2_1 E2_2 E2_3 IA_cCont_E2_1 IA_cCont_E2_2 IA_cCont_E2_3.

[Screenshot: SPSS "Coefficients" table for the effect-coded model. A comma is the decimal separator in lieu of a dot.]

The regression coefficient for cCont in this regression model, B = 0.14, SE = 0.072, β = .145, p = .053, is the effect of Cont on Crit across all conditions of typeOfEmployment. Remember that this is not a main effect: the significant interaction shows that the effect of Cont is not present regardless of the values of the other predictors in the model (which is what a main effect would require). The interpretation of this average effect should therefore be cautious.
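The claim that the cCont coefficient under effect coding is the unweighted average of the four simple slopes can be checked numerically. The following Python sketch uses simulated data with made-up slopes whose unweighted mean is 0.14, as in the post:

```python
import numpy as np

rng = np.random.default_rng(99)
n = 400
group = rng.integers(1, 5, size=n)
cCont = rng.normal(size=n)

# Hypothetical simple slopes; their unweighted mean is 0.14:
true_slope = {1: 0.36, 2: 0.45, 3: 0.10, 4: -0.35}
Crit = np.array([true_slope[g] for g in group]) * cCont + rng.normal(size=n, scale=0.5)

# Effect code set 2 from the table above (level 2 gets 1 on all three codes):
codes = {1: (-1, 0, 0), 2: (1, 1, 1), 3: (0, -1, 0), 4: (0, 0, -1)}
E = np.array([codes[g] for g in group], dtype=float)

X = np.column_stack([np.ones(n), cCont, E, cCont[:, None] * E])
beta, *_ = np.linalg.lstsq(X, Crit, rcond=None)

# Each effect-code column sums to zero across the four levels, so the
# cCont coefficient is the unweighted mean of the four simple slopes:
# (0.36 + 0.45 + 0.10 - 0.35) / 4 = 0.14.
print(round(beta[1], 2))
```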
