Jenny Powers: Stability of groups of correlated variables identified by exploratory factor and cluster analysis.

Objective: To determine which of the variables in a large health survey of women are correlated with each other so that problems of multicollinearity can be avoided in future analyses.

Methods: Cross-sectional data were obtained from the baseline survey of 14,100 women (aged 45-50 years) enrolled in the Australian Longitudinal Study on Women's Health. Five different analytical techniques (factor analyses with varimax and promax rotation and cluster analyses with single, complete and density linkages) were used on three samples of these women. The resulting factors and clusters were used to define common groups of correlated variables.
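The grouping step can be illustrated with a minimal sketch. It is not the study's actual procedure: it assumes synthetic data, hypothetical variable names (a1, b1, c1 and so on), a split into two halves rather than three samples, and an arbitrary cut-off of |r| >= 0.5. Only single-linkage clustering of variables is shown. Cutting a single-linkage tree at a fixed correlation threshold is equivalent to finding the connected components of the graph that links every pair of variables correlated at or above that threshold, which is what the code does.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def single_linkage_groups(data, threshold=0.5):
    """Group variables by single linkage, cut at |r| >= threshold.

    data maps a variable name to its list of values. The threshold is an
    illustrative assumption, not a value taken from the study.
    """
    names = list(data)
    parent = {v: v for v in names}

    def find(v):
        # Follow parent links to the root of v's group, compressing the path.
        while parent[v] != v:
            parent[v] = parent[parent[v]]
            v = parent[v]
        return v

    for i, a in enumerate(names):
        for b in names[i + 1:]:
            if abs(pearson(data[a], data[b])) >= threshold:
                parent[find(a)] = find(b)  # merge the two groups

    groups = {}
    for v in names:
        groups.setdefault(find(v), set()).add(v)
    return {frozenset(g) for g in groups.values()}


# Synthetic survey: two latent traits drive five items; c1 is unrelated.
rng = random.Random(0)
rows = 600
latent_a = [rng.gauss(0, 1) for _ in range(rows)]
latent_b = [rng.gauss(0, 1) for _ in range(rows)]


def noisy(latent):
    """Return the latent trait plus independent measurement noise."""
    return [v + rng.gauss(0, 0.5) for v in latent]


survey = {
    "a1": noisy(latent_a), "a2": noisy(latent_a), "a3": noisy(latent_a),
    "b1": noisy(latent_b), "b2": noisy(latent_b),
    "c1": [rng.gauss(0, 1) for _ in range(rows)],
}

# Split the respondents into two samples and group the variables in each.
half_1 = {k: v[: rows // 2] for k, v in survey.items()}
half_2 = {k: v[rows // 2:] for k, v in survey.items()}
groups_1 = single_linkage_groups(half_1)
groups_2 = single_linkage_groups(half_2)

# A group counts as stable here if it appears, unchanged, in both samples.
stable_groups = {g for g in groups_1 & groups_2 if len(g) > 1}
print(sorted(sorted(g) for g in stable_groups))
```

Requiring a group to reappear with identical membership in every sample is a strict notion of stability, chosen here only for simplicity. A softer criterion, such as a minimum overlap between groups, would serve the same purpose.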

Results: The 18 common groups of variables, consisting of 122 items, fell into five broad categories - perceived physical and mental health, health service use, gynaecological health, lifestyle and demographics. The results obtained using split samples and different analytical methods suggest that these groups of correlated variables are stable.

Conclusions and Implications: In studies such as this, where a large number of variables is collected, a combination of factor analysis and cluster analysis can be used to identify stable groups of correlated variables. These groups may be used to create composite variables, such as factor scores or summated scores, or to identify one variable representative of the group. The new variables can then be used in future analyses so that the problems of multicollinearity can be avoided. The identified correlated variables may also be used to reduce both redundancy and the number of missing values.
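As a minimal sketch of two of these options, the code below builds a summated score for one group and picks a representative variable. The item names (q1 to q3) and their values are invented for illustration. Choosing the representative as the item with the highest mean absolute correlation with the other items in its group is an assumed rule, not one stated in the study.

```python
import math


def pearson(x, y):
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def summated_scores(data, group):
    """Return one score per respondent: the sum of that respondent's items."""
    n = len(data[group[0]])
    return [sum(data[v][i] for v in group) for i in range(n)]


def representative_variable(data, group):
    """Pick the item most strongly correlated, on average, with the others.

    This selection rule is an illustrative assumption.
    """
    def mean_abs_r(v):
        others = [o for o in group if o != v]
        return sum(abs(pearson(data[v], data[o])) for o in others) / len(others)

    return max(group, key=mean_abs_r)


# Hypothetical group of three correlated items answered by five respondents.
data = {
    "q1": [1, 2, 3, 4, 5],
    "q2": [2, 2, 3, 5, 5],
    "q3": [1, 3, 2, 5, 4],
}
group = ["q1", "q2", "q3"]

scores = summated_scores(data, group)
chosen = representative_variable(data, group)
print(scores, chosen)
```

Either the single column of scores or the single chosen item can then replace the whole group as a predictor, so the correlated items no longer enter a model together.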