FITTING MULTIVARIATE LINEAR REGRESSION MODELS USING MULTIPLE INFORMANT DATA

It is straightforward to implement the methods described by Goldwasser and Fitzmaurice (International Journal of Methods in Psychiatric Research, 2001; 10:1-10). This page describes how to construct the dataset and briefly summarizes the SAS syntax used to fit the model.

I. Sample SAS commands for data manipulation

Suppose the dataset was constructed with one record per subject, containing a subject identifier (ID), the parent and teacher multiple informant reports of psychopathology (PARENT and TEACHER), gender (FEMALE), and socioeconomic status (SES).

ID   PARENT   TEACHER   FEMALE   SES
 1       51        53        1     1
 2       43         .        0     1
 3       70        57        0     2

Above is an example of 3 subjects from the data analysis presented in the Goldwasser and Fitzmaurice manuscript, with only gender and SES included as covariates. Subject 2 has a missing teacher response, represented by a period. SAS PROC MIXED requires the data in univariate ("long") form, with one record per informant report rather than one record per subject. Below is the SAS code that carries out this transformation, followed by the resulting SAS output. The code creates two new variables: "INTERN", the internalizing score, and "INF", the corresponding informant (0 for parent, 1 for teacher). The output contains two records per subject, one for the parent report and one for the teacher report; the teacher record for subject 2 is retained with a missing INTERN value.

* Read the example data, one record per subject;
data one;
    input id parent teacher female ses;
    cards;
1 51 53 1 1
2 43  . 0 1
3 70 57 0 2
;
run;

* Reshape to one record per informant report (inf is 0 for parent and 1 for teacher);
data internal;
    set one;
    intern=parent;  inf=0; output;
    intern=teacher; inf=1; output;
    drop parent teacher;
run;

proc print data=internal;
run;

OBS   ID   FEMALE   SES   INTERN   INF
  1    1        1     1       51     0
  2    1        1     1       53     1
  3    2        0     1       43     0
  4    2        0     1        .     1
  5    3        0     2       70     0
  6    3        0     2       57     1
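
The same transformation can also be sketched with an array and a loop, which may be easier to extend when more than two informants are available. This version is only an illustration and is not part of the original analysis. It assumes the dataset "one" created above, and the array name "reports" is made up for the example.

```sas
data internal;
    set one;
    /* "reports" is a hypothetical array name grouping the informant columns */
    array reports{2} parent teacher;
    do inf = 0 to 1;
        /* inf=0 picks the parent report, inf=1 picks the teacher report */
        intern = reports{inf + 1};
        output;
    end;
    drop parent teacher;
run;
```

This should produce the same six records as the two explicit assignments, although INF is created before INTERN here, so those two columns would appear in the opposite order when printed.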


II. Sample SAS commands to fit multivariate linear regression

The multivariate linear regression analysis is carried out using the SAS PROC MIXED code presented below. The "class" statement declares ID, informant, and SES as classification (categorical) variables. The "model" statement specifies the regression of internalizing score on an indicator of informant, the gender and SES risk factors, and their interactions with informant; the "s" option requests the fixed-effect parameter estimates. The "repeated" statement identifies the observations that are correlated (those sharing the same ID, through "subject=id") and models their covariance structure. Informant ("inf") is the repeated measures factor; naming it in the "repeated" statement tells SAS which position in the covariance matrix each observed report occupies, which ensures that missing informant reports are handled correctly. An unstructured covariance matrix is specified with the "type=un" option.

proc mixed data=internal noclprint;
    class id inf ses;
    model intern = inf female ses inf*female inf*ses / s;
    repeated inf / type=un subject=id;
run;
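
For reference, the mean and covariance model implied by this code can be written out roughly as follows. The notation is introduced here for illustration and is not taken from the original page: Y_ij denotes the internalizing score for subject i from informant j, inf is coded 0 for parent and 1 for teacher, and SES is written as a single term even though the "class" statement makes SAS expand it into one indicator per level. Because "inf" and "ses" are class variables, the estimates SAS prints use its own reference-level coding, so they need not match these coefficients one-to-one.

```latex
\[
E(Y_{ij}) = \beta_0 + \beta_1\,\mathrm{inf}_{j} + \beta_2\,\mathrm{female}_i + \beta_3\,\mathrm{ses}_i
          + \beta_4\,\mathrm{inf}_{j}\,\mathrm{female}_i + \beta_5\,\mathrm{inf}_{j}\,\mathrm{ses}_i
\]
\[
\mathrm{Cov}\begin{pmatrix} Y_{i,\mathrm{parent}} \\ Y_{i,\mathrm{teacher}} \end{pmatrix}
  = \begin{pmatrix} \sigma_{P}^{2} & \sigma_{PT} \\ \sigma_{PT} & \sigma_{T}^{2} \end{pmatrix}
\]
```

In this notation, the interaction coefficients describe how the gender and SES effects differ between teacher and parent reports, and the two variances and one covariance are the three parameters estimated under "type=un".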

A pointer to the paper can be found at: http://www.biostat.harvard.edu/~horton/goldfitz.pdf.

More information can be found on the multiple informant web page: http://www.biostat.harvard.edu/multinform.

Nicholas Horton, horton@hsph.harvard.edu, July, 2002