Notes for data users - Core dataset

ADA-Accessed (Core) data
A subset of the ALSWH survey data can be accessed through the Australian Data Archive (ADA). These datasets are known as the ADA-Accessed (Core) data. They contain most, but not all, of the survey data that are accessible through the ALSWH EOI procedure.

Differences between the ALSWH data and the ADA-Accessed (Core) data
Some variables have been removed from the ADA-Accessed (Core) data. These variables were considered sensitive and therefore require ALSWH oversight. Some variables that are not needed for analysis were also removed, including the survey questions used to derive another variable or scale. In addition, a small number of variables were recoded to avoid sensitivities. These variables are outlined below. The ADA-Accessed (Core) data are referred to as the Core data from here on.

Questionnaire items that were components in deriving a scale variable were removed from the Core datasets and only the derived scale items were kept. Sensitive variables dealing with violence and drug use are not in the Core datasets.

The ALSWH Data Dictionary, available on the website, lists all the ALSWH variables with an indicator of whether they are in the Core datasets.

The Data Dictionary can be downloaded from here:

Age Variables
The ‘age’ variable is age in integer years at the time the survey was returned. Other age variables, such as age first started smoking and age came to Australia, are dropped. Exceptions are made where the age range is narrow and there are many women in each category.

Date variables
There are no exact dates in the Core datasets. The only dates are year of birth and year of response. All other dates, such as the date first came to Australia, are removed. Exceptions are made where the year range is narrow and there are many women in each category.

Geographic Variables

The geographic variables kept are State and ARIAPGP. The variables ARIAPLUS and MMM were dropped.

For ARIAPGP, the very remote and remote categories were collapsed into a single category.

State / Territories
ACT and NSW are collapsed together, and NT and SA are collapsed together.
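
For users who want to apply the same grouping to a state variable of their own, the collapse can be sketched in Python as below. This is only an illustration: the state labels and the combined labels ('ACT/NSW', 'NT/SA') are assumptions, and the actual codes and labels should be taken from the Data Dictionary.

```python
# Hypothetical labels: the real Core codes are listed in the Data Dictionary.
STATE_COLLAPSE = {
    "ACT": "ACT/NSW",
    "NSW": "ACT/NSW",
    "NT": "NT/SA",
    "SA": "NT/SA",
}


def collapse_state(state):
    """Return the collapsed state group; other states pass through unchanged."""
    return STATE_COLLAPSE.get(state, state)


states = ["NSW", "ACT", "VIC", "NT", "SA", "QLD"]
collapsed = [collapse_state(s) for s in states]
```

States that are not part of a combined group keep their original label.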

The exercise variable was removed from Survey 1 because it was a different variable from the one used in subsequent surveys. The exercise variables begin from Survey 2.

Short survey records are kept within the main datasets, as they are in the main ALSWH datasets.
The survey variable, e.g. ‘m2survey’, has the value 1 for a full survey and 2 for a short survey.
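
As a minimal sketch of how the survey-type flag might be used, the Python below separates full and short survey records. The records are invented for illustration; only the variable name ‘m2survey’, its values 1 and 2, and the identifier IDcore come from these notes.

```python
# Invented example records; only the variable names come from these notes.
records = [
    {"IDcore": 1, "m2survey": 1},
    {"IDcore": 2, "m2survey": 2},
    {"IDcore": 3, "m2survey": 1},
]

FULL_SURVEY = 1   # value for a full survey
SHORT_SURVEY = 2  # value for a short survey


def split_by_survey_type(rows, flag="m2survey"):
    """Split records into (full, short) lists using the survey-type flag."""
    full = [r for r in rows if r.get(flag) == FULL_SURVEY]
    short = [r for r in rows if r.get(flag) == SHORT_SURVEY]
    return full, short


full_rows, short_rows = split_by_survey_type(records)
```

Analyses that rely on items asked only in the full survey would typically use the first list.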


All the following variables and sets of variables are not in the Core datasets:

  • Complete Food Frequency datasets
  • All text variables
  • The qualitative data
  • All ATSI variables
  • Domestic and child abuse questions
  • Illicit drug questions
  • Medications free text data
  • Cause of death
  • Linked data
  • Six month follow up data from the 1921-26 cohort


Where a categorical variable had a category with a count of less than 10, it was either collapsed into fewer categories or dropped. If the variable was not commonly used in research it was dropped; if it was commonly used, such as marital status in certain cohorts, it was kept with collapsed categories.
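
The small-category rule above can be illustrated with a short Python sketch. It is not the procedure ALSWH used: the data are invented, the helper names are hypothetical, and the choice of which categories to merge is an assumption modelled on the marital status recode described later in these notes.

```python
from collections import Counter

MIN_CELL = 10  # threshold stated in these notes


def small_categories(values, min_cell=MIN_CELL):
    """Return the categories whose frequency is below min_cell, sorted."""
    counts = Counter(v for v in values if v is not None)
    return sorted(c for c, n in counts.items() if n < min_cell)


def collapse(values, merge_into):
    """Recode values using a mapping of old category -> new category."""
    return [merge_into.get(v, v) for v in values]


# Invented data for illustration only.
marital = ["married"] * 40 + ["separated"] * 12 + ["divorced"] * 6 + ["widowed"] * 3

small = small_categories(marital)  # categories that fall below the threshold
recoded = collapse(marital, {"divorced": "separated", "widowed": "separated"})
remaining = small_categories(recoded)  # empty once the merge is applied
```

Here ‘divorced’ and ‘widowed’ fall below the threshold and are merged into ‘separated’, after which no category is below 10.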

The individual survey items for time use and labour force were removed from the Core datasets where they were used to derive other variables. The derived Labour Force Status variables are LABF, HRS, HRSWORK, and these are on the Core datasets.

The Birth Events dataset variables will not be kept, except for the ‘children’ variable, which is the number of children (0, 1, 2, 3 or more). All other ‘Number of …’ reproductive variables are removed.

There will be a participant file containing the ID and, for each survey, one of: year of response, did not respond, or dead.

There are a few variables in the Core dataset that have been recoded from the main ALSWH variables. These differences are described below.

There is an identifier for each woman in the Core datasets, IDcore. This is different from the IDalias.
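
A minimal sketch of linking records for the same woman across two Core extracts on IDcore is shown below in Python. It assumes that IDcore is the linking key across Core files; the extracts, the suffixes and the helper name are hypothetical.

```python
# Hypothetical extracts from two surveys; the values are invented.
survey1 = [{"IDcore": 101, "age": 47}, {"IDcore": 102, "age": 49}]
survey2 = [{"IDcore": 101, "age": 49}, {"IDcore": 103, "age": 50}]


def link_on_idcore(left, right, suffixes=("_s1", "_s2")):
    """Inner-join two lists of records on IDcore, suffixing the other fields."""
    right_by_id = {row["IDcore"]: row for row in right}
    linked = []
    for lrow in left:
        rrow = right_by_id.get(lrow["IDcore"])
        if rrow is None:
            continue  # woman not present in the second extract
        row = {"IDcore": lrow["IDcore"]}
        row.update({k + suffixes[0]: v for k, v in lrow.items() if k != "IDcore"})
        row.update({k + suffixes[1]: v for k, v in rrow.items() if k != "IDcore"})
        linked.append(row)
    return linked


linked = link_on_idcore(survey1, survey2)
```

Because IDcore differs from IDalias, records from the Core datasets cannot be matched to the main ALSWH datasets with this key.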

State of residence
The ACT and NSW are combined, as are the Northern Territory and South Australia.

ARIA+ grouped has only 4 categories, with very remote combined with remote.

Age at survey completion and year of birth are available. Some ages that were outside the standard range for the cohort and had very small frequencies were recoded to the nearest age within the standard range.
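
Recoding an out-of-range age to the nearest age within the standard range amounts to clamping the value to the range bounds. The Python below is a sketch of that idea only; the range used (45 to 50) and the helper name are assumptions, not the ranges applied to any particular cohort or survey.

```python
def clamp_age(age, low, high):
    """Recode an out-of-range age to the nearest bound; missing stays None."""
    if age is None:
        return None
    return max(low, min(age, high))


# Hypothetical standard range for one survey of one cohort.
LOW, HIGH = 45, 50
ages = [44, 45, 48, 51, None]
recoded_ages = [clamp_age(a, LOW, HIGH) for a in ages]
```

Ages already inside the range are unchanged, and missing values are left missing.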

Some cohorts and waves had their own recodes, particularly the 1946-51 cohort.

Cohort 1989 – 1995

Variable recodes

  • The number of children variable (children) is capped at 3 for all relevant surveys
  • ‘Divorced’ and ‘widowed’ are collapsed with ‘separated’ for the marital status variable in all surveys

Cohort 1973 – 1978

Variable recodes

  • The number of children variable (children) is capped at 3 for all surveys

Survey 1

  • Y1Q76, Age left school: ‘Never attended school’ recoded to missing
  • y1q83, Speak English: “Speak English not at all” is collapsed with “Not well”

Surveys 2, 3, 4, 5, 6, 7

  • There are no recodes beyond those mentioned above

Cohort 1946 – 1951

From Survey 3 onwards, these two variable sets have been recoded:

  • ‘How many times have you consulted GP/hospital doctor/ specialist in the last 12 months’
    The category ‘25+ times’ is collapsed with ‘13-24 times’
  • ‘Number of people living with you’
    Category ‘3 or more’ combined with ‘2’

Survey 1

  • There are no recodes beyond those mentioned above

Survey 2

  • m2q71, ‘Number people dependent on income’, is capped at 7

Survey 3

  • M3q49a – j, ‘How often drink cola, etc’
    Category ‘3 or more times a day’ is collapsed with ‘2 times a day’
  • M3q88, ‘How many dependent on household income’
    Capped at 7

Surveys 4, 5

  • There are no recodes beyond those mentioned above

Survey 6

  • m6q70, “How many slices of bread eat per day”
    Category ‘8+ slices per day’ is collapsed with ‘5-7 slices per day’

Survey 7

  • M7q77a – l, ‘Number of drinks’
    The category ‘3 or more’ is collapsed with ‘2 times per day’

Survey 8

  • M8q57a – l, ‘Number of drinks’
    The category ‘3 or more’ is collapsed with ‘2 times per day’
  • M8q82 ‘Which best describes your housing situation’
    Category ‘Nursing home / residential aged care’ is collapsed with ‘Retirement village / self care unit’, and ‘Hostel / boarding house’ collapsed with ‘Other’
  • M8q83 “How many bedrooms”
    Capped at 8
  • M8q85 “Years lived in current home”
    Capped at 50
  • M8q87, ‘Where living in 10 years time’
    Category ‘Hostel/ boarding house’ collapsed with ‘Have no idea’
  • M8q76 a/b, Retirement of self/partner
    Category ‘Never been in paid work’ is collapsed with ‘Other’

Cohort 1921 – 1926

Variable recodes for all surveys

  • ‘De facto’ is collapsed with ‘married’ in the marital status variables
  • ‘High risk drinker’ is collapsed with ‘Risky drinker’ in the Alcohol Status (NHMRC) variables