Notes for data users - Full dataset
Notes for data users is useful information for first-time users of the ALSWH data. It provides an introduction to each cohort, the study representativeness and attrition, and notes relating to naming conventions for datasets and variables and to missing data.
Important information for data users can be found on the data documentation page. There you will find information about the surveys, responses for each of the items asked in the surveys (Data Books) and information on how variables have been derived (Data Dictionary and Data Dictionary Supplement); click on the links to learn more. Please read the following before getting started:
Variable names, labels and formats for each cohort and survey can be accessed by clicking on the ‘Survey Variables’ option or directly by following this link.
You can weight for area of residence at Survey 1 (y1wtarea, m1wtarea, o1wtarea) in all crosstabs, frequencies and analyses to adjust for the initial deliberate oversampling in rural and remote areas. This is not required when running models that include area of residence.
Check the data map, the Data Dictionary and the Data Dictionary Supplement for further information about survey items and derived variables. They are available here.
Data must be downloaded and stored onto a secured environment as soon as it is received. Analysis undertaken must only be in accordance with the approved EOI. Changes to the nature of the analysis must be approved by the ALSWH Data Access Committee.
All publications must include the appropriate acknowledgments. More information can be found here.
If your project includes linkage of datasets, use the ‘IDproj’ key variable for joining the datasets.
You should have the list of women opting out of the data linkage project(s). These women will not be in the linked data, and their absence should be taken into account in your analysis.
Make sure when merging your survey and administrative datasets you ensure only consented women have been added to the combined outcome file. Note: in some cases all participants are required for the analysis but this should be confirmed by your ALSWH liaison.
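The merge described above can be sketched as follows. This is a minimal illustration only, assuming the survey and linked records are held as lists of dictionaries that share the ‘IDproj’ key; the function name, the other variable names and the toy values are hypothetical and are not part of the ALSWH data.

```python
def merge_consented(survey_rows, linked_rows, opted_out_ids, key="IDproj"):
    """Inner-join survey and linked records on `key`, skipping opted-out women."""
    survey_by_id = {row[key]: row for row in survey_rows}
    merged = []
    for linked in linked_rows:
        pid = linked[key]
        if pid in opted_out_ids:
            continue  # opted-out participants must not reach the outcome file
        survey = survey_by_id.get(pid)
        if survey is None:
            continue  # no matching survey record for this linked record
        merged.append({**survey, **linked})
    return merged


# Hypothetical toy data: only the key name IDproj comes from the notes above.
survey = [{"IDproj": 1, "age": 47}, {"IDproj": 2, "age": 49}, {"IDproj": 3, "age": 46}]
linked = [{"IDproj": 1, "claims": 4}, {"IDproj": 2, "claims": 7}]
opted_out = {2}

combined = merge_consented(survey, linked, opted_out)
```

Where your ALSWH liaison confirms that all participants are required, the join would instead start from the full survey list; that variant is not shown here.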
Useful notes for data users linking the PBS and MBS data may be found in Tech Report 38 – December 2015 page 118 and Tech Report 39 page 71.
Medicare variable formats may be found on the ALSWH website at https://alswh.org.au/how-to-access-the-data/external-linked-datasets
Dummy PBS and MBS data are available for testing and development here. Information regarding these data is available here.
Useful programming code
Reliable programming code to join multiple files may be found for SAS and Stata programmers at the following webpages:
Useful SAS code clearly explained by Wieczkowski, Michael J. Alternatives to Merging SAS Data Sets. But Be Careful. IMS HEALTH, Plymouth Meeting, PA (http://stats.idre.ucla.edu/wp-content/uploads/2016/02/bt150.pdf ). Also see http://support.sas.com/resources/papers/proceedings09/036-2009.pdf .
Stata code: http://stats.idre.ucla.edu/stata/modules/combining-data/
Included on our website is the stripping program to change variable names – making wide to long transformation easier: http://stats.idre.ucla.edu/sas/faq/how-can-i-reshape-pairs-data-long-to-wide/
Information on enduring conditions is in Tech Report #29 here https://alswh.org.au/2007_technical-report_29#page=95
(Datasets with these variables may be requested by following the link https://alswh.org.au/who-is-involved/staff)
Questions related to the Dietary Questionnaires may be found here.
Be careful that you do not inappropriately analyse single items from a scale. For example, the 36 items in the SF-36 should not be considered as separate items, other than the first self-rated health item. The Data Dictionary Supplement has details about which scales have been included in the surveys.
Commonly used data variable cut-points may be found in the Data Dictionary Supplement or here for the following:
- Physical activity in Report 21 page 104
- Mental health cut-points for possible psychosocial distress in Report 16 pages 48 and 66
- Notes regarding methods of standardising life events may be found in Report 31 page 106
Selection of the sample 1973-78, 1946-51 and 1921-26 cohorts
The study sample was selected by Medicare Australia (previously known as the Health Insurance Commission) from three zones – urban, rural and remote defined according to the Australian Standard Geographical Classification RRMA scheme where urban includes Capital City and Other Metropolitan Centres; rural, Large Rural Centres, Small Rural Centres and Other Rural Areas; and remote, Remote Centres and Other Remote Areas.
The age groups sampled from the Medicare database in April 1996 were 18-22 years, 45-49 years and 70-74 years. By the time the invitations to participate were mailed later in 1996, some women at the upper limit of the age groups had had their birthday and were a year older. Hence, some women recruited were 23, 50 and 75 years old and so the cohort age ranges in the study are: 18-23, 45-50, and 70-75 years (although there are relatively fewer women in the oldest year of each cohort). The cohorts are now referred to by their years of birth but some study material may refer to them as ‘Young’, ‘Mid-aged’ and ‘Older’ and datasets use ‘y’, ‘m’, and ‘o’ (further information below).
Sampling from the population was random within each age group, except that women from rural and remote areas were selected in twice the proportions of the Australian population living in these areas. Women from capital cities and other metropolitan areas made up the balance of the samples.
There were also a small number of women invited to participate whose age was outside the cohort birth years (by a year or two), possibly due to errors in date of birth in the Medicare database. However, the survey data for these women have been retained. We recommend that when using the data, these women are either excluded or their age set to the nearest valid age.
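The two options recommended above (exclude the women, or set their age to the nearest valid age) can be sketched as below. The function names and the example ages are hypothetical; the valid range 18 to 23 is the range given above for the youngest of the three cohorts at recruitment.

```python
def nearest_valid_age(age, low, high):
    """Set an out-of-range age to the nearest valid age in [low, high]."""
    return max(low, min(age, high))


def exclude_out_of_range(ages, low, high):
    """Keep only ages that fall inside the valid range."""
    return [age for age in ages if low <= age <= high]


# Hypothetical ages, including two outside the 18-23 range.
ages = [17, 18, 21, 23, 24]
clamped = [nearest_valid_age(age, 18, 23) for age in ages]
kept = exclude_out_of_range(ages, 18, 23)
```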
Selection of the sample 1989-95 cohort
Please note that some variables in Surveys 1 and 2 of the 1989-95 cohort were renamed for consistency in April 2016.
See: Renaming of Variables in the Surveys 1 and 2 for the 1989-95 cohort.
Recruitment for the 1989-95 cohort was different from the other cohorts. A variety of recruitment strategies were used (see Report I, section 3). A brief summary is given here.
For inclusion in the 1989-95 cohort, respondents needed to:
- meet the eligibility criteria of being female, aged 18 to 23 and having a Medicare number;
- answer at least some survey questions; and
- meet the requirements for data linkage.
A total of 17,567 women met the above inclusion criteria. To establish a pilot study group for the cohort, the first 498 young women that met the above criteria were removed from the main cohort. As a result, the pilot study group included all women recruited in October 2012 who were verified by the Department of Human Services. Of the remaining sample, 17,069 participants were verified by the Department of Human Services.
Some participants in this cohort were later found to be ineligible due to their birth years being out of range and they have been removed from the cohort. In April 2018, there were 17,010 participants in the cohort.
1973-78, 1946-51 and 1921-26 cohorts
The women were selected based on their postcodes recorded by Medicare. The variable in the datasets called ‘inarea’ reflects the area from which the women were sampled (urban, rural, remote). However, by the time the survey was mailed, some women, particularly in the younger age group, had moved. The variable ‘y1area’ reflects their actual area of residence when completing the survey. The number of respondents who lived in urban, rural and remote areas at the time of completing the first survey in 1996 (wave 1 area) was used to create the sample weights for each age group for each area (urban, rural, remote), by comparing these numbers of respondents to 1996 Census figures. The sample weights appear in the datasets and are labelled y1wtarea, m1wtarea and o1wtarea.
1989-95 cohort
Sample weights were calculated for the 1989-95 cohort based on the women’s ages and areas of residence (urban, rural and remote). The 2011 Census was used as the best available measure of Australia’s population of women aged 18 to 23.
Weights for women in the sample of age x (at baseline) residing in geographical region z:
$$w(x, z) = \frac{P(x, z)/P}{N(x, z)/N}$$
where N is the total number of women in the sample and N(x, z) is the number of women aged x years residing in geographical region z in the sample. Similarly, P is the total number of women aged 18 to 23 in the Australian population, and P(x, z) is the number of women in the Australian population aged x years residing in geographical region z.
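As a worked illustration of the weighting rule above, each weight is the population share of an (age, region) cell divided by the sample share of the same cell. The sketch below uses made-up counts rather than Census or ALSWH figures, and the function name is hypothetical.

```python
from collections import Counter


def sample_weights(sample_cells, population_counts):
    """Weight per (age, region) cell: [P(x, z)/P] / [N(x, z)/N]."""
    n_cell = Counter(sample_cells)          # N(x, z)
    n_total = sum(n_cell.values())          # N
    p_total = sum(population_counts.values())  # P
    return {
        cell: (population_counts[cell] / p_total) / (n_cell[cell] / n_total)
        for cell in n_cell
    }


# Made-up example: rural women are oversampled relative to the population.
sample = [(18, "urban")] * 6 + [(18, "rural")] * 4
population = {(18, "urban"): 800, (18, "rural"): 200}

weights = sample_weights(sample, population)
# Summing the weights over all sampled women gives back the sample size.
weighted_n = sum(weights[cell] for cell in sample)
```

In this toy case the oversampled rural cell receives a weight below 1 and the urban cell a weight above 1.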
These papers explain representativeness and attrition:
- Lee C, Dobson AJ, Brown WJ, Bryson L, Byles J, Warner-Smith P, Young AF. (2005) Cohort Profile: The Australian Longitudinal Study on Women’s Health. International Journal of Epidemiology; 34: 987-991.
- Young AF, Powers JR, Bell SL. Attrition in longitudinal studies: who do you lose? Australian and New Zealand Journal of Public Health. 2006 Aug;
- Brilleman SL, Pachana NA, Dobson AJ. The impact of attrition on the representativeness of cohort studies of older people. BMC Medical Research Methodology. 2010 Aug;10.
- Powers J, Loxton D. The Impact of Attrition in an 11-Year Prospective Longitudinal Study of Younger Women. Annals of Epidemiology 2010; 20(4):318-21.
For representativeness for the 1989-95 cohort see:
- Health and wellbeing of women aged 18 to 23 in 2013 and 1996: Findings from the Australian Longitudinal Study on Women’s Health. Mishra G, Loxton D, Anderson A, Hockey R, Powers J, Brown W, Dobson A, Duffy L, Graves A, Harris M, Harris S, Lucke J, McLaughlin D, Mooney R, Pachana N, Pease S, Tavener M, Thomson C, Tooth L, Townsend N, Tuckerman R and Byles J. Report prepared for the Australian Government Department of Health, June 2014. (Section 4).
When doing longitudinal analyses with the cohorts beginning in 1996, remember to weight for area of residence at Survey 1 (y1wtarea, m1wtarea, o1wtarea) in all crosstabs, frequencies and analyses to adjust for the initial deliberate oversampling in rural and remote areas. This weighting may not be required in models that include a geographic area of residence variable. For information on geographic area of residence, see below in Notes about specific variables.
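A weighted frequency of the kind described above can be sketched as follows. This is an illustration under stated assumptions: the records are plain dictionaries, the ‘smoker’ variable and all values are invented, and only the weight name y1wtarea comes from the datasets. In practice the weighting options of SAS, Stata, SPSS or R would normally be used.

```python
from collections import defaultdict


def weighted_proportions(rows, category, weight):
    """Return the weighted share of each value of `category`."""
    totals = defaultdict(float)
    for row in rows:
        totals[row[category]] += row[weight]
    grand_total = sum(totals.values())
    return {value: total / grand_total for value, total in totals.items()}


# Hypothetical records; 'smoker' is a made-up variable for illustration.
rows = [
    {"smoker": "yes", "y1wtarea": 1.5},
    {"smoker": "no", "y1wtarea": 1.5},
    {"smoker": "no", "y1wtarea": 0.5},
    {"smoker": "no", "y1wtarea": 0.5},
]

props = weighted_proportions(rows, "smoker", "y1wtarea")
```

Unweighted, one in four of these toy records is a smoker (25%); with the weights applied the share is 1.5 / 4.0 = 37.5%.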
The key longitudinal variable (KLV) datasets are available for each of the four ALSWH cohorts. They contain key longitudinal variables that have been harmonised across survey waves, to save data users time by reducing duplication of work and programming errors.
There is one longitudinal KLV dataset per cohort. As of April 2023, the following survey waves are included: 1989-95 Waves 1-6, 1973-78 Waves 1-9, 1946-51 Waves 1-9, and 1921-26 Waves 1-6. The included variables are presented in the linked document. The raw survey variables used to derive each longitudinal variable can be identified by viewing the source code.
Some participants completed a short survey instead of the full survey, which accounts for some missing data. This occurred in Survey 2 for the three original cohorts and Survey 3 for the 1921-26 and 1946-51 cohorts. The variable ‘**survey’ has the value 2 for a short survey and 1 otherwise; for example, y2survey identifies the type of survey completed at Survey 2 of the 1973-78 cohort.
Other known gaps in the survey data are:
- Survey 2 of the 1946-51 cohort: Q70 on income is missing the first category ($1-$119).
- There are large amounts of missing data in some income questions.
- Surveys 2, 3 and 4 of the 1946-51 cohort are missing the question about being admitted to hospital.
- Survey 2 of the 1973-78 cohort is missing the question about ability to manage on income.
- Survey 2 of the 1946-51 cohort: Q67 is unreliable because the instruction was incorrectly stated as “mark one only” rather than “mark all that apply”. Many participants realised that this was an error and answered the question as intended; others may not have done so.
The first survey of the 1989-95 cohort has 167 records whose data are almost all missing. These records are identified by the allmissing variable. This variable has the value 1 for those records that are almost all missing, zero otherwise. These records represent eligible respondents who did complete the first survey but we unfortunately lost their data. They are kept in the dataset so that the first wave’s dataset contains the whole sample.
The quantitative survey data are available as SAS, STATA and SPSS data files, or as tab delimited text files. The dataset files include almost all survey items as well as all derived and calculated variables.
The analysis datasets without formats and labels attached are named WHAsurveycohortB, where:
- ‘survey’ is the survey wave number;
- ‘cohort’ is the three-letter cohort abbreviation: yng (1973-78 cohort), mid (1946-51 cohort), old (1921-26 cohort), nyc (1989-95 cohort); and
- B = level B data (identifying information removed).
For example, wha1yngB.txt is the text dataset for Survey 1 of the 1973-78 cohort.
The analysis datasets with formats and labels attached are named WsurveycohortBF, where:
- ‘survey’ is the survey wave number;
- ‘cohort’ is the one-letter cohort abbreviation: y (1973-78 cohort), m (1946-51 cohort), o (1921-26 cohort), z (1989-95 cohort); and
- B = level B data (identifying information removed) and F refers to formats and labels attached.
For example, w2mBf.sas is the SAS dataset with formats for Survey 2 of the 1946-51 cohort.
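The two naming patterns above can be sketched as a small helper. The function name is hypothetical, the letter case follows the two examples given above (wha1yngB and w2mBf), and the file extension is deliberately left out because it depends on the format you were supplied with.

```python
THREE_LETTER = {"1973-78": "yng", "1946-51": "mid", "1921-26": "old", "1989-95": "nyc"}
ONE_LETTER = {"1973-78": "y", "1946-51": "m", "1921-26": "o", "1989-95": "z"}


def dataset_name(cohort, survey, formatted=False):
    """Build a level B dataset name (without extension) for a cohort and wave."""
    if formatted:
        # Formats and labels attached: W + survey + one-letter cohort + BF
        return f"w{survey}{ONE_LETTER[cohort]}Bf"
    # No formats or labels: WHA + survey + three-letter cohort + B
    return f"wha{survey}{THREE_LETTER[cohort]}B"


unformatted = dataset_name("1973-78", 1)
with_formats = dataset_name("1946-51", 2, formatted=True)
```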
The variables in the three original cohorts are named with a two-letter prefix, e.g. ‘m1’ that identifies the cohort and survey wave.
The letters are y (1973-78 cohort), m (1946-51 cohort), and o (1921-26 cohort)
The 1989-95 cohort, also referred to as the New Young Cohort, or NYC, has been allocated the one-letter abbreviation ‘z’ because it follows on from the first young cohort, which used ‘y’. However, the variable names in the 1989-95 cohort data do not use the prefixes that are used in the other cohorts.
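To illustrate the two-letter prefix, the sketch below splits a variable name from the three original cohorts into cohort, survey wave and stem. It is a hypothetical helper that assumes the prefix is always one cohort letter followed by a single wave digit; names without such a prefix (for example ‘inarea’, or 1989-95 cohort names such as SF36001) return None.

```python
import re

COHORT_LETTERS = {"y": "1973-78", "m": "1946-51", "o": "1921-26"}
PREFIX = re.compile(r"^([ymo])(\d)(.+)$", re.IGNORECASE)


def split_prefix(name):
    """Return (cohort, wave, stem) for a prefixed variable name, else None."""
    match = PREFIX.match(name)
    if match is None:
        return None
    letter, wave, stem = match.groups()
    return COHORT_LETTERS[letter.lower()], int(wave), stem


parsed = split_prefix("y6q18")
weight = split_prefix("m1wtarea")
unprefixed = split_prefix("SF36001")
```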
Some variables in the first two waves of the 1989-95 cohort have been renamed to achieve consistency with the Data Dictionary and within all the 1989-95 surveys. This was done in April 2016. Some of the variable names in this cohort had become inconsistent with the Data Dictionary Index Numbers, which are the standard reference for variables, and also between the various waves of the surveys.
The variables in this cohort are different from the other cohorts’ variables in that they do not have paper questionnaire names, e.g., in the 1973-78 cohort y6q18 is the variable for the 18th question in the sixth wave of this (Young) cohort. The 1989-95 cohort data have different variable names that are from the Data Dictionary Index Numbers. For example, the question ‘In general, would you say your health is …‘ has Index Number SF36-001 and the variable is named SF36001. However, the first two surveys, as they were initially released, had some variables that ended up with names that were not from the Data Dictionary Index Numbers and therefore they have been renamed so they are consistent with the Data Dictionary.
Examples of inconsistent naming in Survey 1
Variable G1_HSRV201 changed to HSRV201
(“Where do you get information about your health? Other”)
ALCS032 changed to ALCS033
(“Have you ever drunk alcohol?”)
The first example above removes the prefix ‘G1_’ because it has no meaning in the analysis data set and removing it matches the variable with the Data Dictionary Index Number. The second example had a misleading name since the Data Dictionary Index Number was ALCS-033 but the variable was labelled ALCS032 – not a good name since the Data Dictionary Index number ALCS-032 is for another variable altogether.
After this variable name change, all the questionnaire variables are now named the same as their Data Dictionary Index name. This is not necessarily true for the derived variables, that is, those not on the questionnaire. The derived variables have names that are designed for easy reference. For example, the BMI variable is called ‘BMI’ on the data set, but its Index Number on the Data Dictionary is WTSH-088.
Amendment from the first version of this document
Note that the variables ending in ‘TEXT’ with a number, e.g. ‘TEXT2’, have all had their final number removed.
In Survey 2, some renamed variables in the Composite Abuse Scale were not included in an earlier version of this document. These are cases where a name was both the old name of one variable and the new name of another. For example, CASC128 is the new name for what was called CASC119, while the variable that was previously called CASC128 is now CASC140. These variables are now all in the lists below.
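Because a name such as CASC128 is both an old name and a new name, applying the renames one after another to your own legacy code or data can overwrite a variable before it has been moved. A minimal sketch of a safe approach is to look every name up against the old names in a single pass. The function name and the data values below are hypothetical; the two renames are taken from the Survey 2 list.

```python
def rename_simultaneously(record, mapping):
    """Rename keys in one pass so chained renames cannot overwrite each other."""
    return {mapping.get(name, name): value for name, value in record.items()}


# Two chained entries from the Survey 2 list; the values are made up.
mapping = {"CASC119": "CASC128", "CASC128": "CASC140"}
old_record = {"CASC119": 1, "CASC128": 0, "BMI": 22.5}

new_record = rename_simultaneously(old_record, mapping)
```

Here the old CASC119 value ends up under CASC128 and the old CASC128 value under CASC140, with neither lost.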
Survey 1 1989-95 cohort renamed variables
This table has all the variables that were renamed in Survey 1 of the 1989-95 cohort.
Earlier Variable Name | New Variable Name |
ALCS032 | ALCS033 |
CASC119 | CASC128 |
CASC120 | CASC129 |
CASC100 | CASC132 |
CASC123 | CASC133 |
CASC124 | CASC134 |
CASC125 | CASC135 |
CASC117 | CASC136 |
CASC106 | CASC137 |
CASC095 | CASC138 |
CASC118 | CASC139 |
CPRB305 | CPRB181 |
CPRB304 | CPRB230 |
DEMO06__NO | DEMO062 |
DEMO06__A | DEMO063 |
DEMO06__TSI | DEMO064 |
G6_DEMO156 | DEMO156 |
G6_DEMO157 | DEMO157 |
G6_DEMO158 | DEMO158 |
G6_DEMO159 | DEMO159 |
G6_DEMO160 | DEMO160 |
G6_DEMO161 | DEMO161 |
DEMO152 | DEMO168 |
G6_DEMO162 | DEMO169 |
G6_DEMO162_TEXT | DEMO169_TEXT |
EMPL087 | EMPL093 |
EMPL088 | EMPL094 |
G1_HSRV201 | HSRV201 |
G1_HSRV202 | HSRV202 |
G1_HSRV203 | HSRV203 |
G1_HSRV203_TEXT2 | HSRV203_TEXT |
G1_HSRV204 | HSRV204 |
G1_HSRV205 | HSRV205 |
G1_HSRV206 | HSRV206 |
G1_HSRV207 | HSRV207 |
G1_HSRV208 | HSRV208 |
G1_HSRV209 | HSRV209 |
G1_HSRV210 | HSRV210 |
G1_HSRV210_TEXT | HSRV210_TEXT |
G1_HSRV211 | HSRV211 |
REPH217 | HSRV217 |
LFEVPGSK | LFEV283 |
LFEVUNSEX | LFEV284 |
LFEVBULLY | LFEV285 |
G2_MEDH375 | MEDH375 |
G2_MEDH376 | MEDH376 |
G2_MEDH377 | MEDH377 |
G2_MEDH378 | MEDH378 |
G2_MEDH379 | MEDH379 |
G2_MEDH380 | MEDH380 |
G2_MEDH381 | MEDH381 |
G2_MEDH382 | MEDH382 |
G2_MEDH383 | MEDH383 |
G2_MEDH384 | MEDH384 |
G2_MEDH385 | MEDH385 |
G2_MEDH385_TEXT | MEDH385_TEXT |
G2_MEDH386 | MEDH386 |
G2_MEDH386_TEXT2 | MEDH386_TEXT |
G2_MEDH388 | MEDH466 |
G3_MEDH389 | MEDH389 |
G3_MEDH390 | MEDH390 |
G3_MEDH391 | MEDH391 |
G3_MEDH392 | MEDH392 |
G3_MEDH393 | MEDH393 |
G3_MEDH394 | MEDH394 |
G3_MEDH395 | MEDH395 |
G3_MEDH395_TEXT4 | MEDH395_TEXT |
G4_MEDH396 | MEDH396 |
G4_MEDH397 | MEDH397 |
G4_MEDH398 | MEDH398 |
G4_MEDH388 | MEDH388 |
G4_MEDH398_TEXT3 | MEDH398_TEXT |
G2_MEDH374 | MEDH419 |
G2_MEDH387 | MEDH420 |
G2_MEDH387_TEXT5 | MEDH420_TEXT |
G3_MEDH388 | MEDH452 |
PWEL001 | PWEL005 |
PWEL002 | PWEL006 |
REPH215 | REPH028 |
REPH218 | REPH040 |
REPH220 | REPH041 |
REPH226 | REPH160 |
REPH234 | REPH242 |
REPH236 | REPH243 |
REPH228 | REPH245 |
REPH230 | REPH246 |
REPH232 | REPH247 |
REPH216 | REPH271 |
REPH219 | REPH272 |
G5_REPH221 | REPH273 |
G5_REPH222 | REPH274 |
G5_REPH237 | REPH275 |
G5_REPH238 | REPH276 |
G5_REPH225 | REPH277 |
G5_REPH225_TEXT | REPH277_TEXT |
G5_REPH226 | REPH278 |
REPH225 | REPH279 |
REPH227 | REPH280 |
REPH229 | REPH281 |
REPH231 | REPH282 |
REPH235 | REPH283 |
SMOK034 | SMOK038 |
SMOK035 | SMOK039 |
K10001 | KTEN001 |
K10002 | KTEN002 |
K10003 | KTEN003 |
K10004 | KTEN004 |
K10005 | KTEN005 |
K10006 | KTEN006 |
K10007 | KTEN007 |
K10008 | KTEN008 |
K10009 | KTEN009 |
K10010 | KTEN010 |
DEMO155_TEXT4 | DEMO155_TEXT |
Survey 2 1989-95 cohort renamed variables
This table has all the variables that were renamed in Survey 2 of the 1989-95 cohort.
Earlier Variable Name | New Variable Name |
CASC119 | CASC128 |
CASC120 | CASC129 |
CASC100 | CASC132 |
CASC123 | CASC133 |
CASC124 | CASC134 |
CASC125 | CASC135 |
CASC117 | CASC136 |
CASC106 | CASC137 |
CASC095 | CASC138 |
CASC118 | CASC139 |
CASC128 | CASC140 Repeated |
CASC129 | CASC141 Repeated |
CASC132 | CASC142 Repeated |
CASC133 | CASC143 Repeated |
CASC134 | CASC144 Repeated |
CASC135 | CASC145 Repeated |
CPRB305 | CPRB181 |
CPRB304 | CPRB230 |
DEMO06__NO | DEMO062 |
DEMO06__A | DEMO063 |
DEMO06__TSI | DEMO064 |
G6_DEMO156 | DEMO156 |
G6_DEMO157 | DEMO157 |
G6_DEMO158 | DEMO158 |
G6_DEMO159 | DEMO159 |
G6_DEMO160 | DEMO160 |
G6_DEMO161 | DEMO161 |
G7_EATS032 | EATS032 |
G7_EATS033 | EATS033 |
G7_EATS034 | EATS034 |
G7_EATS040 | EATS040 |
G7_EATS064 | EATS064 |
G7_EATS065 | EATS065 |
EMPL087 | EMPL093 |
EMPL088 | EMPL094 |
G1_HSRV201 | HSRV201 |
G1_HSRV202 | HSRV202 |
G1_HSRV203 | HSRV203 |
G1_HSRV204 | HSRV204 |
G1_HSRV205 | HSRV205 |
G1_HSRV206 | HSRV206 |
G1_HSRV207 | HSRV207 |
G1_HSRV208 | HSRV208 |
G1_HSRV209 | HSRV209 |
G1_HSRV211 | HSRV211 |
G1_HSRV213 | HSRV213 |
G1_HSRV214 | HSRV214 |
REPH217 | HSRV217 |
G2_MEDH375 | MEDH375 |
G2_MEDH376 | MEDH376 |
G2_MEDH377 | MEDH377 |
G2_MEDH378 | MEDH378 |
G2_MEDH379 | MEDH379 |
G2_MEDH380 | MEDH380 |
G2_MEDH381 | MEDH381 |
G2_MEDH382 | MEDH382 |
G2_MEDH383 | MEDH383 |
G2_MEDH384 | MEDH384 |
G2_MEDH386 | MEDH386 |
G2_MEDH388 | MEDH466 |
G4_MEDH388 | MEDH388 |
G3_MEDH389 | MEDH389 |
G3_MEDH390 | MEDH390 |
G3_MEDH391 | MEDH391 |
G3_MEDH392 | MEDH392 |
G3_MEDH394 | MEDH394 |
G4_MEDH396 | MEDH396 |
G4_MEDH397 | MEDH397 |
G4_MEDH398 | MEDH398 |
G4_MEDH413 | MEDH413 |
G4_MEDH414 | MEDH414 |
G4_MEDH415 | MEDH415 |
G4_MEDH416 | MEDH416 |
G3_MEDH417 | MEDH417 |
G3_MEDH418 | MEDH418 |
G2_MEDH374 | MEDH419 |
G3_MEDH388 | MEDH452 |
G4_MEDH421 | MEDH454 |
G4_MEDH419 | MEDH455 |
G4_MEDH420 | MEDH456 |
PWEL001 | PWEL005 |
PWEL002 | PWEL006 |
REPH215 | REPH028 |
REPH216 | REPH271 |
REPH219 | REPH272 |
G5_REPH221 | REPH273 |
G5_REPH222 | REPH274 |
G5_REPH237 | REPH275 |
G5_REPH238 | REPH276 |
G5_REPH225 | REPH277 |
G5_REPH226 | REPH278 |
SMOK018 | SMOK029 |
SMOK038 | SMOK043 |
G2_MEDH386_TEXT2 | MEDH386_TEXT |
G4_MEDH398_TEXT | MEDH398_TEXT |
K10001 | KTEN001 |
K10002 | KTEN002 |
K10003 | KTEN003 |
K10004 | KTEN004 |
K10005 | KTEN005 |
K10006 | KTEN006 |
K10007 | KTEN007 |
K10008 | KTEN008 |
K10009 | KTEN009 |
K10010 | KTEN010 |
REPH244 | REPH160 |
Label files allocate meanings to variables. E.g., m1q1=‘How is your health now?’
Format files allocate meanings to the values of variables. E.g., 1=very good, 2=good etc.
As well as the survey datasets, there are some supplementary datasets that have been created. Information about dates of deaths and withdrawal of participants is available in the participant status file.
The qualitative data recorded on the back page are also available for analyses. For further information, refer to the Qualitative processing protocols here.
Birth Events
There is a Birth Events dataset for the 1973-78 cohort and another one for the 1989-95 cohort. These were referred to as the ‘Child’ datasets before November 2022. These datasets contain information on birth deliveries, birth complications, and some information about the child. The data is from all the relevant survey waves for the cohort. They are structured so there is a record for each child. Each record is unique based on the mother’s ID, the date of birth, and the multiple birth count variable. The Birth Events datasets get updated with each new survey that has relevant information.
Medications datasets
The fourth survey of the 1921-26 cohort, the fifth and sixth of the 1946-51 cohort and the fifth and sixth of the 1973-78 cohort have data on self-reported medications the respondents are taking. These data are available on separate datasets. Where possible, the medications are given by name and ATC code.
Participant Status and Cause of Death files
For a detailed description of Participant Status and Cause of Death files please see section 8 of the Data Dictionary Supplement page.
The Data Dictionary is a Microsoft Access database that gives a detailed description of the questions used in the survey, their source and how they are used, as well as information on the derived and calculated variables. The Data Dictionary is constantly updated and is available here. (The table is over 1,000 pages long so do not try to print it).
The Data Dictionary Supplement is a series of documents that accompanies the Data Dictionary. The Data Dictionary Supplement contains information about scales and other measures used in the ALSWH surveys. Before using any summary or scale score included in an ALSWH dataset, the appropriate section of the Data Dictionary Supplement should be reviewed. The Data Dictionary and Data Dictionary Supplement are available here.
Check the survey data books if unsure about response frequencies. Electronic copies of the surveys and data books are available here.
In general, it is the responsibility of the analyst to become familiar with and carefully examine all data before proceeding with data analysis.
There are different naming conventions for survey items and derived items. IDalias is a unique de-identified participant number, present in all data files. This participant number can be used to merge data files across surveys. The survey questions and method used in the calculation of the derived variables are listed in the Data Dictionary. A few survey items at Survey 1 (birth date, country of birth, language spoken at home) were removed or aggregated into groups, as these were considered potentially able to make participants identifiable.
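Merging survey files on IDalias can be sketched as below. This is an illustration only: it assumes each survey is a list of dictionaries, and the variable m2q1 and all values are hypothetical. The merge keeps every participant who appears in any wave, so a woman who missed a wave simply lacks that wave's variables.

```python
def merge_waves(*waves, key="IDalias"):
    """Combine per-wave records into one wide record per participant."""
    merged = {}
    for wave in waves:
        for row in wave:
            merged.setdefault(row[key], {key: row[key]}).update(row)
    return [merged[pid] for pid in sorted(merged)]


# Hypothetical toy records for two waves of one cohort.
wave1 = [{"IDalias": 101, "m1q1": 2}, {"IDalias": 102, "m1q1": 1}]
wave2 = [{"IDalias": 101, "m2q1": 3}]

wide = merge_waves(wave1, wave2)
```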
It is not recommended to arbitrarily replace missing values with the null value or any other value. Questions involving “mark all that apply” responses have been coded to 0 (no response) or 1 (yes response). In general, a “none of the above” response option was offered at the end of each set of “mark all that apply” questions. If responses to all sections of a specific question were missing, including the null option (“none of the above”), all responses were set to missing.
Scales
Regarding items that form part of a scale, be careful that you do not inappropriately analyse single items from a scale. For example, the 36 items in the SF-36 should not be considered as separate items, other than the first self-rated health item. The Data Dictionary Supplement has details about which scales have been included in the surveys.
Counting symptoms
When looking at symptoms, the general rule is to count the number of women who had the symptom “sometimes” or “often”.
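The counting rule above can be sketched as follows. The response labels other than “sometimes” and “often” are assumptions made for illustration, as is the use of None for a missing response; check the Data Books for the actual response codes.

```python
def count_with_symptom(responses, positive=("sometimes", "often")):
    """Count women whose response indicates the symptom 'sometimes' or 'often'."""
    return sum(1 for response in responses if response in positive)


# Hypothetical responses; None stands for a missing answer.
responses = ["never", "rarely", "sometimes", "often", None, "often"]
n_with_symptom = count_with_symptom(responses)
```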
Measure of depressive symptoms
The 10-item CES-D scale has an extra item at the end (“I felt terrific”) which is not included in the calculation of the CES-D score. The CES-D score is available in the datasets.
Menopause
The menopause status variable was calculated at each survey incorporating previous surveys’ information for the 1946-51 cohort during the time the women were experiencing menopause.
Measures of physical activity
The physical activity questions were changed after Survey 1. The new physical activity measures from Survey 2 are not comparable to Survey 1 in longitudinal analysis. Refer to the Data Dictionary Supplement for more information.
Summary variables
There are a few “standard” ways to collapse some of the main categorical variables we collect. For example, education (highest qualification) can be dichotomised as “school only” or “post school”, or grouped into three categories: “no formal qualifications”, “school qualifications” and “trade/tertiary qualifications”, and so on. Several variables have been created to summarise sets of items in the surveys (e.g. the illicit drug use items), and it is important that data analysts become familiar with these new variables (see the Data Dictionary Supplement).
Area of residence
The recommended measures are ARIA+, present on all surveys, and the Modified Monash Model (MMM), only present on surveys after 2012. ARIA+ is an index of accessibility/remoteness based on the distance to the nearest service centre. The scores range from 0 to 15 and the ABS has defined 5 categories for remoteness: major cities of Australia, inner regional Australia, outer regional Australia, remote, and very remote. Only a few of the study’s women live in very remote areas, so the fourth and fifth categories are often grouped together. ARIA+ and MMM are recommended over the previously used RRMA area classification. For more information see https://www.adelaide.edu.au/hugo-centre/news/list/2018/11/21/accessibilityremoteness-index-of-australia-plus-aria-2016. For the Modified Monash Model, see the Data Dictionary Supplement section.
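Grouping the fourth and fifth remoteness categories can be sketched as a simple lookup. The category labels are those listed above; the combined label “remote/very remote” and the function name are hypothetical, and the datasets may store the categories as numeric codes rather than text.

```python
COLLAPSED = {
    "major cities of Australia": "major cities of Australia",
    "inner regional Australia": "inner regional Australia",
    "outer regional Australia": "outer regional Australia",
    "remote": "remote/very remote",
    "very remote": "remote/very remote",
}


def collapse_remoteness(category):
    """Map the five remoteness categories onto four, merging the last two."""
    return COLLAPSED[category]


areas = ["major cities of Australia", "remote", "very remote"]
grouped = [collapse_remoteness(area) for area in areas]
```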
ATSI status
Asked at Survey 1 in all age groups. This variable can be used in statistical models but results should not be reported separately by ATSI status in any reports. See Indigenous data policy for more information.
Shorter questionnaires were used for some respondents in Women’s Health Australia: women who had not responded were contacted late and offered a short survey to complete. The short surveys were only offered in the second surveys of the 1921-26, 1946-51 and 1973-78 cohorts, and the third survey of the 1946-51 cohort.
The short surveys only contained those questions that were considered particularly important. These questions are listed in the Short Surveys document. The researcher can identify which respondent did the short survey because their ‘survey’ variable will have the value 2 rather than 1. These records will have many variables that are entirely missing; the variables that were not included in the short survey.
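Separating full-survey and short-survey records before analysis can be sketched as below, using the coding described above (1 for a full survey, 2 for a short survey). The records are hypothetical; y2survey is the Survey 2 variable for the 1973-78 cohort.

```python
def split_by_survey_type(rows, survey_var):
    """Return (full, short) lists of records based on the survey-type variable."""
    full = [row for row in rows if row[survey_var] == 1]
    short = [row for row in rows if row[survey_var] == 2]
    return full, short


# Hypothetical records for Survey 2 of the 1973-78 cohort.
rows = [
    {"IDalias": 1, "y2survey": 1},
    {"IDalias": 2, "y2survey": 2},
    {"IDalias": 3, "y2survey": 1},
]

full, short = split_by_survey_type(rows, "y2survey")
```

Keeping the two groups apart makes it easier to tell missing data that arises from the short form from item non-response in the full survey.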
This link has good examples of data analysis using SAS, STATA, R and SPSS https://stats.idre.ucla.edu/other/dae/
For more information about using study data and applying to the Data Access Committee for access to the data please refer to the how to access the data page.