WHO MONICA Project e-publications, No. 5
Participation Rates, Quality of Sampling Frames and Sampling Fractions in the MONICA Surveys
September 1998
Hermann K. Wolf1, Kari Kuulasmaa2, Hanna Tolonen2 and Esa Ruokokoski2 for the WHO MONICA Project3
1 Department of Physiology and Biophysics, Dalhousie University, Halifax, Canada
2 MONICA Data Centre, National Public Health Institute, Helsinki, Finland
3 Annex: Sites and key personnel of the WHO MONICA Project
© Copyright World Health Organization (WHO) and the WHO MONICA Project
investigators 1999. All rights reserved.
This document includes the main findings of the unpublished report:
- Kuulasmaa K, Sans S, Molarius A, Koivisto A-M, Moltchanov V for the WHO
MONICA Project. Participation rate and the quality of sampling frame in the
first and second risk factor surveys of the WHO MONICA Project. MONICA Memo
259A, March 1994.
Acknowledgements
The MONICA Centres are funded predominantly by regional and national
governments, research councils, and research charities. Coordination is the
responsibility of the World Health Organization (WHO), assisted by local fund
raising for congresses and workshops. WHO also supports the MONICA Data Centre (MDC)
in Helsinki. Not covered by this general description is the ongoing generous
support of the MDC by the National Public Health Institute of Finland, and a
contribution to WHO from the National Heart, Lung, and Blood Institute, National
Institutes of Health, Bethesda, Maryland, USA for support of the MDC. The
completion of the MONICA Project is generously assisted through a Concerted
Action Grant from the European Community. Likewise appreciated are grants from
ASTRA Hässle AB, Sweden, Hoechst AG, Germany, Hoffmann-La Roche AG,
Switzerland, the Institut de Recherches Internationales Servier (IRIS), France,
and Merck & Co. Inc., New Jersey, USA, to support data analysis and
preparation of publications.
The purpose of the MONICA risk factor surveys (1, 2) is to estimate the cardiovascular risk factor
distribution in the study populations. When the main hypotheses are tested, the
risk factor trends will be related to the trends of cardiovascular disease rates
in the same populations. Therefore, it is necessary that the risk factor data
and the event registration data refer to the same population. Apart from the
quality of the measurements, the risk factor data may be biased for the
following reasons:
- the survey population is geographically different from the event
registration population;
- some groups of individuals are excluded from the survey population but not
from the event population;
- the age range considered is different for survey and event registration;
- the sampling frame does not correspond to the general population;
- the sampling scheme is not taken adequately into account in the analysis;
- the survey non-respondents are not a randomly selected sub-sample of the
general population and the non-response rate is large for the total sample
or some age groups.
According to the information provided by the MONICA Collaborating Centres (MCC),
the population survey data and event registration data were collected in the
same geographical populations in all Reporting Units. The exceptions are those
MCCs where one of the data components was not collected at all, and Auckland
where the area of the initial survey population covered only 80% of the area of
the event registration population.
The sampling scheme has not been taken into account in the analyses so far,
and this omission may bias the results in some populations. However, preliminary
investigation of the sampling schemes rules out major influences on the results
of data analysis. A possible exception is when the population used for analysis
is a combination of several sub-populations but the sampling was not planned for
the combination of the sub-populations. In such a case the sub-populations may
need to be weighted. This situation is investigated in Section 12 of the current
report.
The age range surveyed varies slightly between the populations, depending on
how age was defined for the sample selection. This has been investigated in
a separate report (3). The remaining aspects, which are
related to the quality of the sampling frame and to survey non-response, are
investigated in the present report.
Since the quality of the sampling frame and non-response are interrelated,
they are considered together: If the sampling frame is not accurate, it may
include a substantial number of subjects who actually do not belong to the
population. Often it cannot be confirmed that the subjects are not members of
the population, and then they will show up as non-respondents. If it can be
confirmed that such subjects do not belong to the population, they should be
defined as ineligible and they should be ignored when calculating participation
rates. The sampling fractions are considered together with the sampling frame
and non-response because all these investigations are based on the same data
sources.
The overall quality of data for the non-respondents is poor. There are several
possible explanations for this quality problem: i) the requirements for
non-respondent data collection were not properly defined and described in the
MONICA Manual until 1985; ii) there was a lack of statistical and/or
epidemiological input in the protocol design of many MCCs; iii) in many cases
the non-respondents were not prepared to provide any information, and some
were openly hostile.
The present quality report aims to assess objectively the results from each
individual survey and determine the trend between the surveys.
- Target population:
- The target population comprises all the individuals included in a study.
In MONICA, it is defined as the Reporting Units. It should be the population
from which event data are gathered and to which the survey data should
apply.
- Sampling frame:
- The sampling frame is the list of sampling units from which the sample is
selected. In a single-stage sampling scheme, the sampling frame ideally
represents the target population exactly. There are, however, reasons why
actual sampling frames often deviate from the ideal. For example, there are
always people dying or moving in and out of the target population, and
therefore the sampling frame is never fully up-to-date.
In multi-stage sampling each sampling stage has a separate sampling frame.
For example, the primary sampling frame may list the towns and villages of
the population area, and the second-stage frame lists the people of the
towns and villages. Usually there is no difficulty in getting a complete
primary-stage frame. However, the frames that list the people have the same
problems as the sampling frames in single-stage sampling.
- Foreign elements:
- These are elements in the sampling frame that are not valid members of the
target population. In our context, foreign elements are, for example,
persons still listed in the sampling frame but no longer living in the area
under study (ineligibles).
- Ineligibles:
- Members of the sampling frame that are excluded from the survey by
definition (moved away or died between time of sampling and survey).
Ideally, the ineligibles are exactly the same as the foreign elements of the
sampling frame. In practice, however, membership in the target
population cannot always be established for everybody selected for the
sample. According to the technical definition given in the MONICA Manual (Part
III, Section 1, Subsection 3 of Reference 2), all persons
selected for the sample are eligible by default. Only those for whom the
ineligibility criteria can be confirmed are ineligible, i.e. if in doubt, the
person is eligible.
- Non-respondents:
- All members of the sample set, except the ineligibles, for whom no survey
data have been collected.
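The default-eligibility rule and the non-respondent definition above can be expressed as a small decision function. This is a minimal sketch, not MDC software: the function `classify_sampled_person`, its boolean inputs and the `Status` enum are hypothetical, and only the two ineligibility criteria named in the definition (died or moved away between sampling and survey) are modelled.

```python
from enum import Enum


class Status(Enum):
    """Hypothetical labels mirroring the definitions above."""
    RESPONDENT = "respondent"
    NON_RESPONDENT = "non-respondent"
    INELIGIBLE = "ineligible"


def classify_sampled_person(confirmed_died: bool,
                            confirmed_moved_away: bool,
                            has_survey_data: bool) -> Status:
    # Ineligibility requires confirmation: an unconfirmed suspicion that the
    # person died or moved away leaves him/her eligible ("if in doubt, eligible").
    if confirmed_died or confirmed_moved_away:
        return Status.INELIGIBLE
    # Every eligible member of the sample without survey data is a non-respondent.
    if has_survey_data:
        return Status.RESPONDENT
    return Status.NON_RESPONDENT


# A person who could not be traced, with no confirmation of death or
# departure, counts as an eligible non-respondent.
print(classify_sampled_person(False, False, False))  # Status.NON_RESPONDENT
```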
The report considers the Reporting Unit Aggregates (RUA), which are potential
candidates for units of analysis of the MONICA data. The RUAs, their
abbreviations and their Reporting Units (RU) are listed in Table 1.1. Some of
the RUAs have several versions, distinguished by the suffixes a and b.
Different combinations of RUs may be used for cross-sectional and trend
analysis. This is the case if not all RUs of the RUA were included in every
survey. As a result, in AUS-PER, GER-BER, GER-BRE, GER-EGE, GER-KMS, GER-RDM,
RUS-MOI and RUS-NOC there is an overlap of reporting units included in the RUAs
in some surveys. For UNK-GLA, which carried out four surveys, the first (Ini),
third (Mid) and fourth (Fin) surveys are considered. Altogether 54 RUAs are
considered for the initial survey, 43 for the middle and 41 for the final
survey.
All subjects for whom data are available were included in most analyses of
the current document. For some analyses the subjects were selected according to
their age and sex. In such cases the age and sex are specified in the respective
tables. Age-standardization was not used for the analyses of this report. When
data from the survey core data (Form 04) or survey non-respondent data (Form 08)
have been used, age has been defined as age in full years on the date of
examination (see DEF1 in Reference 3).
The data sources for this report are:
- the Sample Selection Descriptions (in the unpublished document MONICA Memo
50), which the MCCs had to complete in 1985.
- "Aggregate data":
- Tables "Participation rate in the first MONICA survey",
completed by the MCCs in 1988 (see Appendix 1a);
- Tables "Sample size in the 2nd MONICA survey", completed by
the MCCs in 1993, on which the MCCs reported the original sample size,
the eligible sample size, number of respondents and the participation
rate by sex and 10-year age group (see Appendix 1b);
- Tables "Sample size in the final MONICA survey", completed
by the MCCs in 1995 (see Appendix 1c).
- "Individual data":
- Annual population demographic data (Form A, see Reference 2).
- Other correspondence between the MONICA Data Centre (MDC) and the
individual MCCs.
We will consider here indicators of the quality of the sampling frames used
by the MCCs. For RUAs with multi-stage sampling, the frame used in the
primary-stage sampling is usually simple. Therefore, we will focus attention on
the critical sampling frames that list individual persons or households. The
data for and results of the assessment are presented in Table 2.
We used the following indicators of the quality of sampling frame:
- Source of the sampling frame:
- Population registers are usually relatively good sampling frames. In many
RUAs there exist no population registers. Therefore, population lists
compiled for other purposes must be used, such as public health service
registers or electoral rolls. Such lists may contain too few or too many names
for a variety of reasons. The sampling frames used in the
RUAs are listed in Table 1.2.
- Age of sampling frame:
- We define the age of the sampling frame as the difference between the last
update of the sampling frame and the time when it was used to draw the
sample. This information is based in part on replies to question 10 in the
form Sample Selection Descriptions (MONICA Memo 50) and in part on
communication with individual MCCs. An "old" frame is likely to be
inaccurate because of migration and deaths in the target population after
the frame was updated. Theoretically, the age of a sampling frame has two
components, one for new elements to be entered, and one for foreign elements
to be deleted. If sampling frames are created de novo by counting, such as
at census, there is no difference between these two age components. However,
with sampling frames that represent lists that are updated at regular
intervals, entries may be made more promptly than deletions (or vice versa)
and as a result the two age components can be quite different. In one case,
it will result in incomplete coverage of the target population, in the other
in excessive foreign elements. We ignore any differences in the two age
components of sampling frames for the tabulations of this report. However,
we will refer to them in comments on individual RUAs, as needed.
- Proportion of ineligibles:
- The proportion of persons who were ineligible in the original sample
indicates how commonly the sampling frame includes identifiable foreign
elements. Theoretically we should also be interested in missing elements,
i.e. the proportion of the members of the target population who are not
included in the sampling frame, but such data are not available in MONICA.
The proportion of ineligibles is used as a general indicator of the accuracy
of the sampling frame.
- Proportion not possible to contact:
- The survey non-respondent data, which the MCCs should have provided for
every non-respondent, has an item labelled "reason of
non-response". One of the options, "not possible to
contact", refers to those with whom no contact could be made,
but no information was available to indicate that they are ineligible for
the sample. As some of the subjects in this category may actually be foreign
elements of the sampling frame, their proportion in the original sample may
also reflect the inaccuracy of the sampling frame. The proportions given in Table
2 are for those RUAs where the reason of non-response is available for
at least half of the non-respondents. (For more information about data item
"reason for non-response", see Table 10).
For the proportion in Table 2 the numerator was
calculated from the individual non-respondent data and the denominator from
the aggregate data provided by the MCCs.
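A minimal sketch of how the Table 2 proportion could be computed, assuming the non-respondent records are available as a list of reason codes, with `None` where the reason is missing. The code `1` for "not possible to contact" follows the REASON=1 convention used for Definition B of the participation rate; the function name and the handling of an empty list are our own assumptions.

```python
from typing import Optional, Sequence

NOT_POSSIBLE_TO_CONTACT = 1  # REASON code for "not possible to contact"


def prop_not_possible_to_contact(reasons: Sequence[Optional[int]],
                                 original_sample_size: int) -> Optional[float]:
    """Numerator from individual non-respondent records (Form 08),
    denominator from the aggregate original sample size.

    Returns None when the reason is known for fewer than half of the
    non-respondents, i.e. when no value would be shown in Table 2.
    """
    if not reasons:
        return 0.0  # assumption: no non-respondents means a proportion of 0
    n_known = sum(r is not None for r in reasons)
    if 2 * n_known < len(reasons):
        return None
    n_not_contacted = sum(r == NOT_POSSIBLE_TO_CONTACT for r in reasons)
    return n_not_contacted / original_sample_size
```

For example, four non-respondent records with reasons `[1, 1, 2, None]` from an original sample of 100 give 0.02, whereas `[1, None, None, None]` gives `None` because the reason is missing for more than half of the records.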
Individually, none of these indicators, except the age of sampling frame, is
very specific for quality. However, based on all of them combined, and on
additional information available from the MCCs, a Sampling Frame Score
(SFS) was assigned to the RUAs, to indicate our current understanding of the
quality of the sampling frame. The score has the following values (the
respective mean proportion is calculated over all the surveys of a RUA):
| SFS | Condition |
|---|---|
| 2 | There is no major concern about the sampling frame: SFS ≠ 0 AND no change in sampling frame between surveys AND mean proportion "not eligible" < 5% AND mean proportion "not possible to contact" < 5%. |
| 1 | SFS ≠ 0 and SFS ≠ 2. |
| 0 | The sampling frame has major problems: all proportions "not eligible" are missing OR all proportions "not possible to contact" are missing OR maximal value of proportion "not eligible" > 20% OR maximal value of proportion "not possible to contact" > 20%. |
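The mechanical part of the scoring rules can be sketched as follows. This is only an approximation of how the SFS was assigned, since the score also drew on additional information from the MCCs. The sketch assumes per-survey proportions given as fractions (0 to 1) with `None` for a missing value; computing the mean over the non-missing surveys only, and the function name, are assumptions.

```python
from typing import Optional, Sequence


def sampling_frame_score(not_eligible: Sequence[Optional[float]],
                         not_contactable: Sequence[Optional[float]],
                         frame_changed: bool) -> int:
    """Rule-based Sampling Frame Score (SFS) for one RUA.

    not_eligible, not_contactable: per-survey proportions (0-1), None if missing.
    frame_changed: True if the sampling frame changed between surveys.
    """
    ne = [p for p in not_eligible if p is not None]
    nc = [p for p in not_contactable if p is not None]

    # SFS = 0: major problems (all values missing, or any value above 20%).
    if not ne or not nc or max(ne) > 0.20 or max(nc) > 0.20:
        return 0

    # SFS = 2: no change of frame and both mean proportions below 5%.
    mean_ne = sum(ne) / len(ne)
    mean_nc = sum(nc) / len(nc)
    if not frame_changed and mean_ne < 0.05 and mean_nc < 0.05:
        return 2

    # SFS = 1: everything else.
    return 1
```

Checking the score-0 conditions first mirrors the table, where SFS = 2 presupposes SFS ≠ 0.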
The score was "0" in 12 RUAs (CAN-HAL, GER-COT, GER-RDM, GER-RHN,
HUN-PEC, ISR-TEL, ITA-LAT, LTU-KAU, MLT-MLT, RUS-MOIb, RUS-NOCb, UNK-GLA).
No RUA changed the sampling frame between the initial and middle survey, but
two RUAs adopted new sampling frames for the final survey with some implications
for frame compatibility (FRA-LIL, FRA-STR).
It is assumed that all MCCs used the best sampling frame available to them.
Therefore, a low sampling frame score does not usually indicate bad
performance by the MCC, but reflects local constraints. However, it still means
that the sample may be biased because of the poor quality of the sampling frame.
On the other hand, the score is not only a reflection of the quality of the
sampling frame but also depends to some extent on the efforts of the MCC.
Therefore, some of the MCCs with a score of zero probably used a good sampling
frame but failed to put enough effort into pursuing persons who were hard to
locate (e.g. GER-COT).
In the MONICA Project the eligibility of a subject selected for the
sample was defined at the time of sample selection. The MONICA Manual states
that "The individuals selected in the original sample who died or
moved out of the reporting unit area before the survey examination are called
non-eligibles". In addition, some RUAs have occasional subjects who
are ineligible because of a clerical error in the sampling frame. There have
been subjects whose gender was incorrect in the sampling frame, or whose age in
the sampling frame was inaccurate and out of the range for the survey.
For non-respondents a technical definition was given in order to
determine which data should be reported to the MDC for these subjects. The
definition was:
"A non-respondent is a person selected and eligible to the original
sample who could not be found or contacted or a person who did not provide
questionnaire data ...."
In most cases this technical definition gives a sensible estimate of the
availability of data. There are two situations where it is misleading:
- If there is a considerable number of subjects who provided the
questionnaire data but never attended the clinical examination, then this
definition does not provide useful information about the response rate for
the cholesterol and blood pressure measurements. This was the case in six
RUAs (see Section 8 and Table 7);
- When the sampling frame includes a large number of foreign elements and
there is no way to identify such individuals, the response rate will be
under-estimated, because subjects who are ineligible become classified as
non-respondents. Such a bias may be significant in CAN-HAL (Ini, Fin),
FRA-TOU, LTU-KAU (Fin), UNK-BEL, UNK-GLA (Ini).
The MCCs were first asked about their definition of eligibility in 1985; this
definition was applied when data were submitted to the MDC (Table 1.2). At that
time, no clear information was received from most MCCs. One
possible explanation for this failure is that the MCCs did not define
eligibility in any systematic way when collecting the data. This might indicate
deficient organization during the initial survey, when more than half of the
MCCs had data for fewer than 50% of non-respondents. In MONICA the
definition of eligibility was introduced in the Manual in 1985, and the wording
was clarified later because some MCCs had misunderstood it. The
introduction of the definition improved the situation and, therefore, more
information is available for the middle and final surveys. MCCs reported their
compliance with the eligibility definition of the Manual when submitting
the aggregate data of the middle and final surveys (see Appendix 1b and
Appendix 1c). According to this information there was a high degree of
compliance with the Manual definition. However, some of the information
conflicted with other data available at the MDC, and it cannot be ruled out
that the improved compliance is partly due to erroneous data from the MCCs.
Among the RUAs for which the definition of eligibility is known, the
following had a different definition from the one given in the Manual or
provided conflicting information:
- AUS-NEW:
- In addition to the MONICA Manual's requirements, the following were
classified as ineligible for the initial and middle survey: a) mentally
handicapped persons, b) persons too old for the study when interviewed.
According to the MONICA definition, mentally handicapped persons ought to be
non-respondents for medical reasons. The final survey apparently adhered to
the Manual's definition, but this information is suspect since it was
submitted together with a statement that the initial and middle survey also
followed the Manual's definition of eligibility.
- BEL-LUX:
- The centre used an electoral list as sampling frame. Presumably, this is
the reason why non-citizens were treated as ineligible. Also, persons found
to be living elsewhere than the address of the electoral list were
designated as ineligible. It is not clear whether the same exclusion
criteria were applied to the event and demographic data.
- CAN-HAL:
- Persons in correctional institutions and persons unable to understand
English and without an interpreter were excluded from the survey. (The
language problem occurred twice in the initial and 10 times in the final
survey; no information is provided about the prevalence of incarceration.) No
language exclusion was applied to the event and demographic data. However,
prisoners usually receive their medical care within the prison system and
are therefore automatically excluded from the event registration.
- FRA-STR:
- For the initial survey "collective households" (e.g. old-age
homes, monasteries or convents) were defined as ineligible in addition to the
persons specified in the Manual. According to the MCC, "collective
households" represent a negligible fraction of the total sample. The
MCC changed the sampling frame for the final survey. In a second reply to
the Sample Selection Description the MCC stated that the Manual's definition
of ineligibility has been broadened by including persons without French
citizenship and those not registered on the electoral roll. These additional
restrictions do not apply to the event and demographic data. From the
results of the initial survey it was estimated that there are about 4.5%
non-French citizens in the population.
- FRA-TOU:
- Persons in prison and foreigners were included in the category of
ineligibles. In the Sample Selection Description the MCC describes a scheme
of augmenting the sampling frame by lists of foreigners obtained from
various consulates. It is uncertain whether this scheme was ever implemented
and whether it was carried through to the middle and final survey. It is
also not clear how big a problem the omission of foreigners is, i.e. how
they impact on event and demographic data.
- GER-AUR and GER-AUU:
- The ineligibles included persons without German citizenship (estimated to
be about 2% in the age group > 50). This exclusion criterion does not apply
to the event and demographic data.
- GER-BRE:
- The ineligibles included persons without German citizenship. Such an
exclusion criterion was not applied to the event and demographic data.
- HUN-PEC:
- No sample size information is available for the initial survey. For the
middle survey the MCC was not able to examine the reasons for
non-attendance. As a result, all ineligibles were included in the group of
non-respondents.
- ISR-TEL:
- In the second stage of the sampling, apartments were selected. The
apartments were visited and the people living in them formed the sampling
frame of the final stage. The apartments whose tenants were absent at the
time of the visit were excluded from the sampling frame.
- ITA-LAT:
- Eligibility was restricted to Italian citizens. Persons with a problem in
their address were also classified as ineligible.
- LTU-KAU:
- The initial and middle survey defined officers and their wives, as well as
persons who had changed addresses, as ineligible. Under the MONICA definition,
a change of address makes a person ineligible only if it is known that the
person moved permanently outside the target area. It is not clear whether the
same eligibility criteria were also used for the final survey and the event
and demographic data.
- NEZ-AUC:
- The initial survey declared Maoris and Polynesians as ineligible. It is
not known whether these individuals were also excluded from the final
survey. As reported by the MCC, the same exclusion rules did not apply to
the event and demographic data. However, local investigations have shown
that this is not a major problem.
A related problem concerns the difference in target population between the
various components of the MONICA Project (see comments in Section 7).
- POL-TAR:
- The "not eligible" included some subjects who were not possible
to contact, but their ineligibility could not be confirmed. According to the
MONICA definition, such subjects are non-respondents. It is not clear
whether the same definition was used for all three surveys.
- RUS-NOC and RUS-NOI:
- For all three surveys, imprisoned persons (estimated to be < 1% of the
population) and those away from home for more than one year for occupational
reasons were included in the group of ineligibles. These individuals were
not excluded from demographic data, but prisoners are excluded from event
registration.
- SWI-TIC and SWI-VAF:
- Severely ill or handicapped persons unable to come to the examination room
were classified as ineligibles. According to the MONICA definition, such
persons are non-respondents due to medical reasons.
- UNK-GLA:
- Before the sample was selected, the general practitioners (GP) excluded
from their list those whom they considered not suitable for the screening.
Otherwise the definition is as specified in the MONICA Manual. The category
of people excluded from the GPs' lists could be non-respondents (unable to
attend for medical reasons). However, in this case they were not sampled
because they were not included in the sampling frame. The calculation of the
participation rate, therefore, may be biased by the absence of these
individuals.
- USA-STA:
- Reasons for ineligibility included severe illness, language problems and
"previously surveyed". The two latter reasons each involved about 2%
of the ineligibles during the middle and final survey. This numerical
information is not known for the initial survey. Also, it is not known how
the "language ineligibility" has been treated in the event
registration and in the demographic and mortality data.
One aspect that decreases the accuracy of the participation rate in many RUAs
is the uncertainty about the definitions used. Another is the inconsistency of
the participation rate information from different sources of data. We made two
comparisons between the following data sources: the serial number inventory data
(individual data), the sample size data reported by the MCC (aggregate data),
and survey respondent and non-respondent data received by MDC (individual data).
The results are summarised in Table 3 and Table 4.
The MCCs had to provide three survey data sets at the individual level:
- Serial number inventory data (Form 05) for everybody selected for the
original sample. In the data each subject was assigned a status:
- Status 1 = respondent
- Status 2 = non-respondent
- Status 3 = ineligible.
- Survey core data (Form 04) for every respondent, i.e. for those who had
status 1 in the serial number inventory data.
- Non-respondent data (Form 08) for every non-respondent.
No individual data for the ineligibles had to be submitted other than the
serial number inventory data. Therefore, we do not know the age and sex
distribution of the ineligibles.
The MDC had not received the serial number inventory data for two RUAs for
the initial survey (ITA-LAT, MLT-MLT) and one RUA for the final survey (RUS-NOCa).
Non-respondent data were missing for five RUAs of the initial survey (HUN-PEC,
MLT-MLT, NEZ-AUC, RUS-MOC and RUS-MOI), one RUA of the middle survey (GER-COT),
and three RUAs of the final survey (RUS-MOC, RUS-MOI, RUS-NOCa).
The following pairs of data sets should have equal numbers of records:
- inventory data with status 1 and survey core data for respondents;
- inventory data with status 2 and non-respondent data; and
- inventory data with status 3 and number of ineligibles.
The number of ineligibles is available at the individual level only in the
serial number inventory data. Therefore, this number is compared with the
aggregate number of ineligibles reported by the MCCs.
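The three comparisons amount to record counts. The sketch below assumes the serial number inventory statuses (Form 05) are available as a list of integers 1, 2 and 3, and the other sources as counts; the function name and the sign convention (inventory count minus the other source) are hypothetical.

```python
from collections import Counter
from typing import Dict, Sequence


def pair_differences(inventory_status: Sequence[int],
                     n_core_records: int,
                     n_nonrespondent_records: int,
                     n_ineligibles_aggregate: int) -> Dict[str, int]:
    """Difference (inventory count minus other source) for each pair."""
    counts = Counter(inventory_status)  # Counter returns 0 for absent statuses
    return {
        "respondents": counts[1] - n_core_records,              # vs Form 04
        "non-respondents": counts[2] - n_nonrespondent_records,  # vs Form 08
        "ineligibles": counts[3] - n_ineligibles_aggregate,      # vs aggregate data
    }


# Consistent data give zero differences for all three pairs.
print(pair_differences([1, 1, 2, 3], 2, 1, 1))
# {'respondents': 0, 'non-respondents': 0, 'ineligibles': 0}
```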
The numbers of the three pairs are given in Table 3
for all three surveys. In addition, the table lists a summary quality score for
each individual survey and a quality trend for all surveys combined.
The Individual data agreement score (IDA) is defined as:
| IDA | Condition |
|---|---|
| 2 | No discrepancies within pairs of sources of information. |
| 1 | Small discrepancies within pairs: all three pairs differ by less than ±20. |
| 0 | Major discrepancies within pairs or unavailability of some of the data: any pair differs by more than ±20. |
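A sketch of the IDA rules, assuming the three pair differences have already been computed (for instance as inventory count minus the count in the other source), with `None` where a data source is missing. The text does not say how a difference of exactly 20 is scored; the sketch treats it as a major discrepancy, which is an assumption, as is the function name.

```python
from typing import Optional, Sequence


def ida_score(differences: Sequence[Optional[int]]) -> int:
    """Individual data agreement score from the three pair differences.

    differences: count difference for each pair, None if a source is missing.
    """
    if any(d is None for d in differences):
        return 0  # unavailability of some of the data
    if all(d == 0 for d in differences):
        return 2  # no discrepancies
    if all(abs(d) < 20 for d in differences):
        return 1  # small discrepancies only
    return 0  # at least one pair differs by 20 or more (boundary is assumed)
```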
A Quality trend score (QTS) was assigned according to the
following rules:
| QTS | Condition |
|---|---|
| 2 | Improved or maintained high quality: IDA = 2 for the final survey AND there has been no decrease in IDA between successive surveys. |
| 1 | No quality fluctuations or no clear trend: QTS is neither 0 nor 2. |
| 0 | Deterioration or no improvement in poor quality: IDA = 0 in the final survey OR there has been a decrease between successive IDAs, but no increase. |
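The QTS rules can be written as a function of the sequence of IDA scores. In this sketch, which assumes one IDA per survey in chronological order, RUAs with two or three surveys are handled the same way; the function name is hypothetical. The conditions for 2 and 0 are mutually exclusive, so the order of the checks does not change the result.

```python
from typing import Sequence


def quality_trend_score(ida: Sequence[int]) -> int:
    """Quality trend score from IDA scores listed in survey order."""
    steps = [later - earlier for earlier, later in zip(ida, ida[1:])]
    decreased = any(s < 0 for s in steps)
    increased = any(s > 0 for s in steps)
    if ida[-1] == 2 and not decreased:
        return 2  # improved or maintained high quality
    if ida[-1] == 0 or (decreased and not increased):
        return 0  # deterioration or no improvement in poor quality
    return 1  # neither of the above


print([quality_trend_score(s) for s in ([0, 1, 2], [2, 1, 1], [1, 0, 2], [1, 1, 1])])
# [2, 0, 1, 1]
```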
In the initial survey, 19/54 (35%) RUAs had score "2", 12/54 (22%)
had score "1" and 23/54 (43%) RUAs had score "0". In the
middle survey, 23/43 (53%) RUAs had score "2", 10/43 (23%) had score
"1" and 10/43 (23%) had score "0". In the final survey, the
distribution of the scores was 21/41 (51%) for score "2", 11/41 (27%)
for score "1" and 9/41 (22%) for score "0". The result indicates
that data management for the initial survey was not reliable in most MCCs, but
it improved with the next two surveys. However, it is clearly not satisfactory
that 9 RUAs still had a score of zero in the final survey.
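The percentages above can be re-derived from the counts. This short check only recomputes figures already given in the text (rounded to whole percentages) and confirms that the totals match the 54, 43 and 41 RUAs considered in each survey; the helper name `percent_shares` is hypothetical.

```python
from typing import List, Sequence, Tuple


def percent_shares(counts: Sequence[int]) -> Tuple[int, List[int]]:
    """Return the total and each count as a whole-number percentage."""
    total = sum(counts)
    return total, [round(100 * n / total) for n in counts]


# Numbers of RUAs with IDA scores 2, 1 and 0, as reported above.
ida_counts = {
    "initial": (19, 12, 23),
    "middle": (23, 10, 10),
    "final": (21, 11, 9),
}

for survey, counts in ida_counts.items():
    total, shares = percent_shares(counts)
    print(survey, total, shares)
# initial 54 [35, 22, 43]
# middle 43 [53, 23, 23]
# final 41 [51, 27, 22]
```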
The trend score confirms that there was an improvement between surveys in 20
RUAs but 13 RUAs got even worse. The overall trend is poor in more than half of
the RUAs, an unsatisfactory situation.
We compared individual data and aggregate data that were reported by the MCCs
using the forms Participation rate in the initial MONICA population survey
(Appendix 1a), Sample size in the 2nd
MONICA survey (Appendix 1b) and Sample
size in the final MONICA survey (Appendix 1c).
Table 4 gives ratios where the aggregate data are in the numerator and the
individual data in the denominator. For "ineligible" the individual data are
the serial number inventory data, and the denominator consists of the data in
column "ineligible" of Table 3. For "respondents" the denominator is the survey
core data (Form 04) and for "non-respondents" the non-respondent data (Form 08).
All ratios should be equal to 1.
The Individual and aggregate data agreement score (IAA) was
defined for the agreement between the two sources of data for respondents and
non-respondents. The ratio for "ineligibles" was not considered for
the score, because the same data were already evaluated in the score for serial
number inventory. The criteria for the IAA score are as follows:
| IAA | Condition |
|---|---|
| 2 | Both ratios are equal to 1.00. |
| 1 | Both ratios are between 0.95 and 1.05, but at least one of them is different from 1.00. |
| 0 | At least one of the ratios is less than 0.95 or more than 1.05, or some of the data are missing. |
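A sketch of the IAA rules using exact fractions, assuming the aggregate and individual counts for respondents and non-respondents are available as integers (`None` where missing). Treating the limits 0.95 and 1.05 as inclusive follows from the wording of scores 1 and 0. Handling a zero individual count as missing data is an assumption, as is the function name; because the ratios are exact, "equal to 1.00" here means identical counts rather than a ratio that merely rounds to 1.00.

```python
from fractions import Fraction
from typing import Optional


def iaa_score(agg_resp: Optional[int], ind_resp: Optional[int],
              agg_nonresp: Optional[int], ind_nonresp: Optional[int]) -> int:
    """Individual and aggregate data agreement score.

    agg_*: counts reported by the MCC (aggregate data, numerator).
    ind_*: record counts in Form 04 / Form 08 (individual data, denominator).
    """
    counts = (agg_resp, ind_resp, agg_nonresp, ind_nonresp)
    if any(c is None for c in counts) or ind_resp == 0 or ind_nonresp == 0:
        return 0  # missing data (a zero denominator is treated the same way)
    # Exact ratios avoid floating-point trouble at the 0.95 / 1.05 limits.
    ratios = [Fraction(agg_resp, ind_resp), Fraction(agg_nonresp, ind_nonresp)]
    if all(r == 1 for r in ratios):
        return 2
    if all(Fraction(95, 100) <= r <= Fraction(105, 100) for r in ratios):
        return 1
    return 0
```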
The ratios for respondents are between 0.98 and 1.03 for most RUAs in the
initial survey, the exception being RUS-NOCb (1.33). For the middle survey, the
ratios range from 0.94 to 1.02, with the exception of GER-BRE (0.92). For the
final survey the ratios lie between 0.98 and 1.02, with the exceptions of
AUS-PER (1.05), FRA-STR (1.13), GER-BREa (0.91) and GER-BREb (0.92). This
indicates that, with a few exceptions, the management of core data is
reasonable. However, for non-respondents there are major shortcomings in most
RUAs, especially in the initial survey, where the ratios range from 0.66 to 13.5.
A quality trend score was defined similarly to the QTS in Section 6.1. An
improvement or maintenance of quality (trend score 2 or 1) was observed for
37/47 RUAs (79%); conversely, for a significant number of RUAs (10/47, 21%) the
accounting for the sample units got worse in later surveys.
Table 5.1 shows the participation rates for the
RUAs in age group 35-64. In accordance with the MONICA definition, these rates
are the proportion of eligibles that provided at least questionnaire
information. Item-response rates, especially for data items collected during
clinic visits, may be lower and are discussed in Section 8.
Two definitions for participation rates are employed:
- Definition A.
- The rate is calculated with the size of the eligible sample in the
denominator and the number of respondents (according to the technical
definition of the MONICA Manual, see Reference 2)
in the numerator.
- Definition B.
- The numerator is the same as for definition A. The denominator is defined
as the number of eligibles minus the number of non-respondents who could not
be contacted. This definition is introduced to allow for those
MCCs where the sampling frame may have contained a large number of foreign
elements whose eligibility status could not be determined. In these
situations, the rates according to the two definitions differ considerably.
The relationship between the participation rate according to
definition A (PRA) and definition B (PRB) is given by:
PRB = PRA/(1 - X(1-PRA)),
where X is the proportion of those not possible to contact (i.e. REASON=1)
among all non-respondent records. If X is missing, it is assumed to be 0 and
PRB = PRA.
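The formula follows directly from Definition B: with E eligibles, R respondents and X(E - R) non-respondents who could not be contacted, PRB = R/(E - X(E - R)), and dividing numerator and denominator by E gives PRA/(1 - X(1 - PRA)). The minimal sketch below checks this numerically; the function names and the counts are hypothetical, not MONICA data.

```python
from typing import Optional


def rate_a(respondents: int, eligibles: int) -> float:
    """Definition A: respondents divided by the eligible sample size."""
    return respondents / eligibles


def rate_b_direct(respondents: int, eligibles: int, not_contactable: int) -> float:
    """Definition B: non-respondents who could not be contacted leave the denominator."""
    return respondents / (eligibles - not_contactable)


def rate_b_from_a(pra: float, x: Optional[float]) -> float:
    """PRB = PRA / (1 - X(1 - PRA)); a missing X is treated as 0, so PRB = PRA."""
    if x is None:
        x = 0.0
    return pra / (1.0 - x * (1.0 - pra))


# Hypothetical RUA: 1000 eligibles, 700 respondents, 300 non-respondents,
# of whom 120 could not be contacted (X = 120/300 = 0.4).
eligibles, respondents, not_contactable = 1000, 700, 120
x = not_contactable / (eligibles - respondents)

pra = rate_a(respondents, eligibles)
prb = rate_b_from_a(pra, x)
print(f"PRA = {pra:.3f}, PRB = {prb:.3f}")  # PRA = 0.700, PRB = 0.795
```

Both routes to PRB agree, which is a quick way to check a reported pair of rates against the recorded proportion X.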
Three participation rates are determined, based on different data sources:
- By aggregate data:
- The first participation rate is based on the aggregate data provided by
the MCC.
- By individual data:
- The second participation rate is based on the individual level survey core
data (Form 04) and non-respondent data (Form 08).
- To be reported:
- The third participation rate is the one to be used when participation rate
is reported. It is normally identical to the rate calculated by aggregate
data, unless it is known that the aggregate data are in error. In that case,
the rate calculated by individual data is reported.
In some MCCs a minor discrepancy between the participation rates "by
aggregate data" and "by individual data" can be explained by a
difference in the calculation of age. The following explanations concern RUAs
where an exception was made to the general reporting rule, i.e. there was a
major discrepancy between the participation rates calculated from different data
sources, or some other comment is needed:
- AUS-NEW
- Ini: The MCC does not know the original sample size or number of
ineligibles and thus, no aggregate data are available.
Mid: Note that a subgroup of the non-respondents was considered as
ineligible (see comment in Section 4).
- AUS-PER
- Mid: The large difference between the response rates "by aggregate
data" and "by individual data" in Table
5.1 is due to erroneous coding of age for non-respondents.
- BEL-CHA and BEL-GHE
- All three surveys: The rates refer to persons who were interviewed at home
and had provided the questionnaire data. However, only a fraction of them
attended the clinical examination. Therefore, the item-response rates for
blood pressure and total cholesterol are significantly lower (see Table
7).
The date of examination was not available for the middle survey
non-respondent data from Ghent. When calculating the age group of the
non-respondents, the date of examination was estimated as the middle of the
survey period.
The significantly lower participation rate "by individual data"
for the middle survey is a consequence of the discrepancy between
ineligibles "by aggregate data" and "by individual data"
(see Table 3).
- BEL-LUX
- Ini: It appears that the MDC has also received non-respondent data (Form 08)
for persons who are ineligible. Therefore, the rate "by individual
data" is probably too low, although the MCC has not confirmed this.
- CAN-HAL
- Ini: The MCC has communicated that the non-respondents may include a large
number of subjects who are ineligible for the sample. This is because the
sampling frame is out of date. The proportion of non-respondents who were
not possible to contact was 53% in the original sample. A similar situation
exists for the final survey, although to a lesser extent.
- CHN-BEI
- Mid: The aggregate data are incorrect, since the number of non-respondents
includes only the fraction that was closely investigated.
- FRA-LIL
- Ini: There were an additional 45 non-respondents of unknown age. They were
ignored in the rate calculations, since they affected the results only
minimally.
- FRA-STR
- Ini: The sample was selected from age group 25-64. More accurate age
information for the subjects is available to the MONICA investigators only
for those who could be contacted. The following Table A,
reported by the MCC, gives the classification of the subjects in age group
25-64 (The numbers in brackets show the values computed from the individual
data. They are shown only when they are different from the values reported
by the MCC.):
Table A.
Participation rate data for FRA-STR

| | Men | Women | Total |
|---|---|---|---|
| Respondents | 741 | 803 | 1544 |
| Non-respondents | 835 | 777 | 1612 |
| Non-respondents with age group known | 389 (390) | 407 (413) | 796 (803) |
| Ineligible | | | 258 |
| Participation rate A | 47.0% | 50.8% | 48.9% |
| Participation rate B | | | same as rate A |
- GER-BRE
- Ini: The sample was selected from age group 25-69, and the accurate age is
known only for those who were contacted. Therefore, the participation rate
"by individual data" is incorrect. The participation rates "by
aggregate data" and "to be reported" concern age group
25-69.
- GER-RHN
- Ini: The MDC has not received an explanation for the large discrepancy
between the participation rates "by aggregate data" and "by individual
data". The rate "by individual data" is reported, because the individual
non-respondent data had been updated by the MCC, but the aggregate data had
not.
- HUN-BUD
- Ini: The MCC has confirmed that the number of non-respondents reported
(400 for all ages) is correct. There are some reservations about the rates;
non-respondent data are available for a surprisingly high 95% of the
non-respondents (see Tables 9.1 and 9.2).
It is not known how this high item-response rate for non-respondents was
achieved.
- HUN-PEC
- Ini: The MCC has no information about the original sample size or the
eligible sample size. Therefore, it is not possible to give any estimate of
the participation rate.
Mid: The MCC is unable to separate the non-respondents and the ineligibles.
Therefore, all who did not attend the examination were classified as
non-respondents.
- ISR-TEL
- Ini: The MDC has contradictory information. According to the Sample
Selection Description the sample selection and interview took place at
the same time. Only those who were at home at the time of the visit were
selected for the sample. However, according to the non-respondent data, 82% of
the non-respondents were in the category "not possible to
contact".
- ITA-BRI
- Mid: The high item-response rate of 90% (see Tables 9.1
and 9.2) was achieved through an extraordinary
effort to reach non-respondents via telephone interviews.
- LTU-KAU
- Fin: 55% of the non-respondents were not possible to contact. The sample
was selected two years before the examination because the survey was
postponed after the sample selection. Furthermore, before the final survey
the country underwent major changes, and during that time no reliable
information on migration was available.
- MLT-MLT
- Serial number inventory data and non-respondent data are missing.
- NEZ-AUC
- Ini: The MDC has no non-respondent data for the RUA.
From the sampling frame it is only known whether a person is older than 18
years. Therefore, the age becomes known only when contact is made with the
person. The MCC estimated the sample size by multiplying the total number of
letters sent by the proportion of the population 18 years and over who were
aged 35-64 years.
Note also that the target population for the initial survey covers only
about 80% of the population for other data components (see Section
1). The MCC might consider restricting all data components to the same
common target area, or dividing the target population into two RUs, with one
of them being the area of the initial survey.
- POL-TAR
- Ini: The group of ineligibles includes some potential non-respondents (see
comments in Section 4). Therefore, values for the
participation rate are slightly inflated. However, the effect is small
(designating all ineligibles as non-respondents would lower the
participation rate by about 5%).
- ROM-BUC
- Ini: It appears that non-respondent data (Form 08) have been received by
MDC only for part of the non-respondents. Therefore, the rate "by
individual data" is probably too high, although the MCC has not
confirmed this suspicion.
- RUS-MOC and RUS-MOI
- Ini: The MCC is unable to provide the non-respondent data for RUS-MOI and
RUS-MOC for organizational reasons.
Fin: The MCC could not provide non-response data for RUS-MOC and RUS-MOIa
because of lack of resources.
- RUS-NOC
- Ini: Aggregate data are missing for RUS-NOCa (RUS-NOCa was created after
the initial survey by sub-dividing RUS-NOCb).
- SWI-TIC and SWI-VAF
- Ini and Mid: A subgroup of the non-respondents was considered as
ineligible (see comment in Section 4). The
participation rates may therefore be slightly inflated.
- UNK-BEL
- Ini: The response rate "by individual data" will be reported,
since the aggregate data have not been corrected for some subjects who were
classified as ineligible but should have been respondents.
Mid: The MCC has indicated that the non-respondents may include a large
number of subjects who are ineligible for the sample. This is because the
sampling frame is out of date.
- UNK-GLA
- A subgroup of potential non-respondents was excluded from the sampling
frame (see comment in Section 4). The participation
rates may therefore be slightly inflated.
- USA-STA
- Ini: The participation rates are slightly inflated (by about 2%), since
households that could not be contacted have been included in the
ineligibles.
Mid and Fin: The individual age was not known for 157 non-respondents in the
middle survey and 83 non-respondents in the final survey. Therefore, as an
exception, the participation rates were calculated for the age group 25-64.
All three surveys: Also, some subjects were considered as ineligible because
of language problems (see comments in Section 4). As
far as could be determined, their effect on participation rates was minor.
- YUG-NOS
- Ini: Some individuals who completed the survey questionnaire but did not
attend the clinic were considered as non-respondents (should be
respondents). This causes a minor depression of the participation rates.
Mid: Surprisingly, there were no ineligibles during the middle survey,
whereas the other two surveys had ineligibles. No explanation for this
difference is available.
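As a consistency check, the Definition A rates in Table A (FRA-STR) can be recomputed from the counts in that table. In this sketch, the eligible sample is taken as respondents plus non-respondents (the 258 ineligibles are excluded), which reproduces the tabulated percentages; the dictionary layout and function name are illustrative only.

```python
# Counts reported by the MCC in Table A (FRA-STR, age group 25-64).
table_a = {
    "Men": {"respondents": 741, "non_respondents": 835},
    "Women": {"respondents": 803, "non_respondents": 777},
    "Total": {"respondents": 1544, "non_respondents": 1612},
}


def participation_rate_a(respondents: int, non_respondents: int) -> float:
    """Definition A in percent: respondents / (respondents + non-respondents)."""
    eligibles = respondents + non_respondents
    return 100.0 * respondents / eligibles


rates = {group: round(participation_rate_a(**counts), 1) for group, counts in table_a.items()}
print(rates)  # {'Men': 47.0, 'Women': 50.8, 'Total': 48.9}
```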
Table 5.2 summarises the change in participation
rates between the different surveys. The change is always calculated as the rate
for the later survey minus the rate for the earlier survey. Thus, negative
values indicate a decrease in participation over time. The average change
between the initial and middle surveys was -2.15 percentage points, between the
middle and final surveys -2.14, and between the initial and final surveys
-4.08. These numbers support the trend of decreasing participation rates that
has also been observed in other surveys.
Table B summarises the changes in the participation
rates "to be reported/Definition A" between initial and middle, middle
and final, as well as between initial and final.
Table B. Summary
of the changes in the participation rate A "to be reported"
between surveys

Number of RUAs in categories of change in participation rate (percentage points):

| Survey pair | <-20 | -20...-10 | -10...0 | 0...10 | 10...20 |
|---|---|---|---|---|---|
| Mid-Ini | 0 | 3 | 21 | 16 | 0 |
| Fin-Mid | 0 | 2 | 18 | 17 | 0 |
| Fin-Ini | 1 | 5 | 20 | 12 | 1 |
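Table B is a frequency table of the per-RUA changes (later rate minus earlier rate), binned into 10-point intervals. A minimal sketch of that binning follows. The report does not say how values falling exactly on a boundary were assigned, so the lower-inclusive, upper-exclusive intervals used here are an assumption, and the example changes are invented.

```python
from collections import Counter

# Table B categories in percentage points. Each interval is treated as
# lower-inclusive / upper-exclusive (an assumption, not stated in the report).
BINS = [
    ("<-20", float("-inf"), -20.0),
    ("-20...-10", -20.0, -10.0),
    ("-10...0", -10.0, 0.0),
    ("0...10", 0.0, 10.0),
    ("10...20", 10.0, 20.0),
]


def change_category(change: float) -> str:
    """Return the Table B category of one RUA's change (later minus earlier rate)."""
    for label, low, high in BINS:
        if low <= change < high:
            return label
    raise ValueError(f"change {change} is outside the tabulated range")


def tabulate(changes) -> dict:
    """Count RUAs per category, keeping the column order of Table B."""
    counts = Counter(change_category(c) for c in changes)
    return {label: counts[label] for label, _, _ in BINS}


# Hypothetical changes for seven RUAs, in percentage points.
example = [-22.5, -12.0, -3.1, -0.4, 2.0, 7.5, 11.0]
print(tabulate(example))
```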
Tables 6.1 and 6.2 give
the age-specific participation rates for men and women, respectively. The rates
are based on the "to be reported" selection and are calculated for 10-year
age groups for each of the three surveys. The data show that there are
differences in participation rates between age groups and sexes, but the
patterns differ between RUAs. The only pattern that seems to hold for most RUAs
is that subjects in the youngest age group (25-34 years) are the most reluctant
to participate, and there are several exceptions even to this rule. (The
clearest examples are ISR-TEL, RUS-NOI and FRA-LIL in the initial survey,
CZE-CZE and RUS-NOI in the middle survey, and RUS-NOI, BEL-CHA, GER-BRE and
YUG-NOS in the final survey.)
The participation rate, as discussed in Section 7,
is the rate at which contact with a consenting participant was
established. Successful contact with a respondent does not imply that all data
items stipulated by the MONICA protocol can be collected. Therefore, the
response rate for specific data items may be considerably lower than the rates
discussed in Section 7. A typical scenario that leads to a reduced
item-response rate for cholesterol is one where the respondent completes the
questionnaire but refuses to attend a separate clinic session for
physiological measurements.
In Table 7, we compare the participation rate with
the item-response rates for blood pressure, cholesterol, BMI, and smoking for
each RUA and its respective surveys. Tables 8.1a
(systolic blood pressure, men), 8.1b (systolic blood
pressure, women), 8.2a (total cholesterol, men),
8.2b (total cholesterol, women), 8.3a
(BMI, men), 8.3b (BMI, women), 8.4a
(smoking, men) and 8.4b (smoking, women)
provide this information by sex and age group. The
item-response rates were calculated as the product of the overall participation
rate and the proportion of respondent records for which data for the specific
variable were available.
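The item-response rate is the simple product described above. The sketch below uses a hypothetical function name and invented numbers to show how respondents who complete the interview but skip the clinic lower the rate for a clinic measurement such as cholesterol, while an interview item such as smoking stays close to the participation rate.

```python
def item_response_rate(participation_rate: float, n_respondents: int, n_with_item: int) -> float:
    """Participation rate times the share of respondent records that contain the item."""
    if n_respondents == 0:
        return 0.0
    return participation_rate * n_with_item / n_respondents


# Hypothetical RUA: 70% participation, 1000 respondent records.
pr = 0.70
print(round(item_response_rate(pr, 1000, 990), 3))  # smoking (interview) -> 0.693
print(round(item_response_rate(pr, 1000, 600), 3))  # cholesterol (clinic) -> 0.42
```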
In general, the item-response rates are quite close to the participation
rates. The exceptions are BEL-CHA (all surveys), BEL-GHE (all surveys), CAN-HAL
(Fin), ISR-TEL (Ini), SWE-GOT (Fin), and USA-STA (Ini and Fin). As far as is
known, the scenario described above applies to all of these RUAs. This is also
supported by the item-response rate for smoking: since smoking data were
collected at the interview, the item-response rate for smoking is very close to
the participation rate.
There is little difference between the item-response rates for the four
variables, with a few exceptions. In ISR-TEL, for instance, the item-response
rates are low only for blood pressure and cholesterol, probably because of
local idiosyncrasies in data collection.
The MONICA Manual states:
"Even though it is not possible to get complete data required for the
core study for the non-respondents, the MCC should try to collect information
about their age, sex, marital status, education, smoking history and blood
pressure. The objective is to estimate the selection bias which non-response
inflicts on the core study. Age and sex are often known at the time of sample
selection. For other data, a telephone interview or a postal questionnaire can
be tried. It is recommended that all non-respondents are asked to provide
information. However, it is acceptable that only a random sample of
non-respondents is investigated in full. The reason for non-response should be
recorded in all cases."
"The survey non-respondent data (Form 08) should be submitted to the
MDC for every non-respondent, regardless of how much information was received
from the non-respondent. In most cases the MCC should at least elicit the age
group, sex and reason for non-response."
The instructions for collecting non-respondent data in MONICA were introduced
in 1985. It is therefore understandable that some MCCs that carried out their
initial survey earlier did not collect the data as required. However, it is
difficult to accept a situation where the MCC is not even able to list the
non-respondents, which points to poor survey management.
Tables 9.1 and 9.2 show
for each RUA and the initial, middle and final survey the availability of the
non-respondent data items as a percentage of submitted non-respondent records.
In column 3 of Table 9.1, the correspondence between
aggregate and individual non-respondent counts is categorized as follows:
- SAME: This applies for RUAs where the denominator of the proportions
probably represents the true number of non-respondents.
- MORE: In this category the denominator probably also includes cases that
are ineligible for the sample. Therefore, the availability of the data may
actually be better than indicated in the table.
- LESS: For these RUAs we suspect that non-respondent data have not been
submitted for cases where few data are available. Therefore, the true
proportions may be smaller than indicated in the table.
- SAMPLE: In these RUAs only a sample of the non-respondents was
investigated. Therefore, a small proportion may actually correspond to high
availability of data within the investigated sample. The sampling procedures in these
populations were:
- BEL-LUX
- Initial survey: An attempt was made to thoroughly investigate a non-random
sample of 30% of the non-respondents. The sampling criterion is not known.
The MDC probably also has non-respondent records for the ineligibles.
- CHN-BEI
- Initial survey: An attempt was made to thoroughly investigate a non-random
sample of 50% of the non-respondents.
Mid: An attempt was made to thoroughly investigate a non-random sample of
40% of the non-respondents.
- HUN-BUD
- Initial survey: A random sample of unknown size was attempted for thorough
investigation. This, however, is in conflict with the fact that the MDC has
a record for every non-respondent and detailed information is known for
about 95% of them.
- ITA-BRI
- Initial survey: A non-random sample of about 20% of the non-respondents
was attempted for thorough investigation. The sampling criteria are unclear,
and Tables 9.1 and 9.2 reveal that detailed data are available for 26% of the
non-respondents.
- POL-WAR
- Initial survey: A random sample of 20% of the non-respondents was
attempted for thorough investigation.
- ROM-BUC
- Initial survey: There is no information about the sampling, but the MDC
has a non-respondent record only for about 25% of the non-respondents.
Therefore, the proportions shown in Tables 9.1 and 9.2 are over-estimates.
- RUS-NOC and RUS-NOI
- Final survey: A non-random sample of about 25% of non-respondents, with an
equal number of individuals in each 10-year age group, was investigated
thoroughly by home visits.
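For the SAMPLE category, one rough way to read Tables 9.1 and 9.2 is to divide the observed availability by the fraction of non-respondents selected for investigation. This sketch assumes, for illustration only, that detailed data come solely from the investigated subset; a ratio above 1 then signals that the stated sampling fraction and the data are inconsistent, as for ITA-BRI (26% available against a sample of about 20%). The function name and the 15% case are hypothetical.

```python
def within_sample_availability(observed_share: float, sampled_fraction: float) -> float:
    """Share of the investigated subset with detailed data, assuming no detailed
    data were collected outside that subset."""
    if not 0 < sampled_fraction <= 1:
        raise ValueError("sampled_fraction must be in (0, 1]")
    return observed_share / sampled_fraction


# ITA-BRI, initial survey: about 20% investigated, detailed data for 26% overall.
print(f"{within_sample_availability(0.26, 0.20):.2f}")  # 1.30: more than the sample could supply

# Hypothetical RUA: 15% overall availability from a 20% sample.
print(f"{within_sample_availability(0.15, 0.20):.2f}")  # 0.75 within the investigated sample
```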
One may assume that the complete absence of data for the items Marital
Status to Weight in Table 9.2 indicates that the
non-respondents were systematically not investigated. This
implies that in 21/54 (39%) RUAs in the initial survey, in 10/43 (23%) RUAs
in the middle survey, and in 7/41 (17%) RUAs in the final survey, no attempt was
made to investigate the non-respondents in more detail. In most of the cases
where such an attempt was made, detailed information is available for fewer than
50% of the non-respondents. Therefore, the utility of the non-respondent data is
very limited for many RUAs.
The reason for non-response may not be available for the RUAs where the
initial survey was started before this data item was introduced in 1985.
However, the reason for non-response should be available for every
non-respondent of every RUA in the middle and final surveys. Table
9.1 indicates that there is only a slight improvement in the availability of
this data item between surveys. In the initial survey, the reason for non-response
was available for more than 80% of the non-respondents in 21/54 (39%) RUAs. In the
middle survey the number was 19/43 (44%), and in the final survey it was 21/41
(51%).
Table 10 gives the proportions of the different
reasons for non-response for the initial, middle, and final surveys.
Among the RUAs where the reason is known in more than 80% of the records, the
main reasons for non-response were "Not possible to contact",
"Not interested" and "Other refusal",
but the proportions varied considerably between RUAs. It is possible that the
distinction between "Not interested" and "Other
refusal" has been interpreted differently by different MCCs, and even
between surveys within some RUAs. "Temporarily out of the
area" and "Medical reasons" were rare reasons
in nearly all RUAs. This pattern did not change much from survey to survey.
When repeated surveys are conducted in random samples of the same population,
as happens in MONICA, the sample mean values of the measurements can change for
many reasons:
- There is a change in the population mean values, either because of changes
in the individual persons' values or because of a change in the composition
of the population;
- There is a statistical error because we are measuring only a sample of the
population and many of the persons' values (like blood pressure and
cholesterol) have short term fluctuations;
- There is a change in the representation of the respondent sample, either
because of a bias in the sampling frame or because of non-response bias; or
- There is a measurement bias.
The objective of the MONICA Project is to estimate the changes due to the
first reason. The standard error of the estimates gives information about the
statistical error. The third and fourth reasons indicate bias, which we want to
avoid. The fourth reason is investigated in the quality assessment reports of
the individual risk factors, whereas the third reason is a topic of the current
report.
Estimates of population changes in situations where there should be very
little or no change in the actual population mean values can be used as
indicators of possible bias in the estimates. Under the assumption that the
MONICA participants aged 35 or older have essentially concluded their formal
education, a birth cohort (i.e. people born in certain years) should show no
change in the number of years of schooling from survey to survey, except for
randomness caused by sample selection. Similarly, birth cohorts in the age range
that is of interest to MONICA should not increase in mean body height, but
instead show a small decline as they age. Finally, birth cohorts are unlikely to
show a change in the proportion of never-smokers, unless there is a significant
selective mortality of smokers. These cohort trends have been investigated in
detail in the quality assessment reports (QA) on education (4),
weight and height (5) and smoking (6).
In each of the three quality assessment reports, cohort trend scores
(CTS) were defined. The scores were based on the estimated changes and
their standard errors for men and women in the common age groups 35-44 and
45-54, in three steps:
- The average change (A) was calculated for each sex/birth cohort. This
average change was used as the reference value around which the random
variation was expected to occur.
- Upper and lower limits for a change were set at A ± 2.5 SE, where SE is
the standard error of the estimated change. If each change is normally
distributed with mean A and variance SE², and the four changes (two birth
cohorts and two sexes) are independent, then the probability that all four
changes are within the limits is about 95% (since P(|Z| < 2.5) ≈ 0.988 and
0.988⁴ ≈ 0.95).
- The Cohort Trend Score (CTS) was defined as:
  CTS = 2 if all four changes are within the limits;
  CTS = 1 if one of the four changes is out of the limits;
  CTS = 0 if at least two of the four changes are out of the limits.
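The three steps above can be sketched as follows. This is an illustration, not the code used in the quality assessment reports: the function names are hypothetical, the reference values A are simply supplied by the caller, and a change lying exactly on a limit is treated as within the limits, which the text does not specify. The first printed value confirms the 95% figure under the normality and independence assumptions.

```python
import math


def prob_within(k: float) -> float:
    """P(|Z| < k) for a standard normal variable Z."""
    return math.erf(k / math.sqrt(2.0))


def cohort_trend_score(changes, standard_errors, reference, k: float = 2.5) -> int:
    """CTS from four estimated changes (2 birth cohorts x 2 sexes).

    A change is out of limits when it lies outside A +/- k*SE, where A is the
    reference (average) change for that sex/birth cohort.
    """
    out_of_limits = sum(
        1
        for d, se, a in zip(changes, standard_errors, reference)
        if abs(d - a) > k * se
    )
    if out_of_limits == 0:
        return 2
    if out_of_limits == 1:
        return 1
    return 0


# Probability that four independent changes all fall within A +/- 2.5 SE.
print(round(prob_within(2.5) ** 4, 3))  # 0.951

# Hypothetical height changes (cm) with their SEs and reference changes A.
print(cohort_trend_score([0.4, -0.2, 1.9, 0.1], [0.3, 0.3, 0.4, 0.3], [0.0, 0.0, 0.0, 0.0]))  # 1
```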
The CTS for body height is derived in Table 6
of the Weight and height QA (5). Similarly, CTS
for years of schooling can be found in Table 8
of the Education QA (4) and the CTS for
never-smokers in Table 18 of the Smoking QA
(6). The three CTS for each RUA are collected
in Table 11 of this report to provide a comprehensive
picture of birth cohort trends.
It is often difficult to assess from the data whether the changes in the
cohort trends are due to measurement bias, change in the target population or
change in sample representation. Therefore, the MCCs with low CTS were asked to
check possible reasons for the cohort changes in the quality assessment reports
for the three individual variables. A change in the target population or in the
survey representation is likely to induce changes in the cohorts for more than
one variable. A measurement bias in several variables is also possible but less
likely. Hence, we will confine our comments here to RUAs where more than one
variable has a low CTS for a particular survey pair:
- BEL-CHA
- Between the initial and the final survey there was a large increase in
body height and a decrease in years of schooling. Neither of these changes
is plausible, suggesting a change either in the target population or in the
survey participation. However, height and years of schooling are usually
positively correlated, so a shift in the population or in participation would
be expected to move them in the same direction; their opposite movement
strengthens the possibility of measurement bias as an explanation.
- CHN-BEI
- A major change appears to have occurred between the initial and the middle
survey. All three variables have low CTS, which is due to a decrease in all
three variables. The observed changes provide strong evidence of changes in
either the target population or the survey participation.
- FRA-STR
- All three variables have low CTS for the Ini-Fin survey pair. The low
scores for body height and never-smokers are due to an increase in these
variables, which is incompatible with a stable target population or survey
participation. The low score for years of schooling is due to an increase in
years of schooling. The MCC reported that the Census also found increased
schooling between 1982 and 1990. Nevertheless, the simultaneous occurrence
of three low CTS rather suggests a change in either target population or
survey participation.
- RUS-MOI
- For the Ini-Fin survey pair, years-of-schooling has a low CTS because of a
large increase in this variable. An increase in the proportion of
never-smokers also produced a low CTS. The increase in the years of
schooling is plausible, but the increase in the never-smokers is not. The
combination of the two scores is thus more likely to reflect a change in
target population or survey participation.
- RUS-NOI
- For the initial and final survey comparison a large decrease in body
height and a large increase in years of schooling caused low CTS for both
variables. These changes warrant a careful investigation of the stability of
the target population or participant characteristics.
- SPA-CAT
- More than one low CTS occurred only between the middle and final surveys.
This was the result of an increase in body height and a decrease in years of
schooling. Furthermore, years of schooling showed a progressive decrease with
every survey, producing CTS of zero for all three comparisons. According to
the MCC, the decrease in years of schooling is the consequence of a change
in the target population as a result of immigration.
- UNK-BEL
- Body height and years-of-schooling produced low CTS for the Mid-Fin and
Ini-Fin comparisons. The low scores for both comparisons were the result of
increases in the respective variables. While the increase in years of
schooling is plausible, the increase in body height is not. The combination
of the two observations, therefore, suggests that in the final survey a
change has occurred in either the target population or the survey
participation.
It should be pointed out that a high CTS is not proof of unbiased survey
participation. However, it makes it less likely that target population or
participant characteristics have changed between surveys. As mentioned above, a
change in the characteristics of the target population is fully compatible with
the objectives of the MONICA Project and not a source of bias. (Nevertheless, if
caused by large migration, it is often associated with difficulties in obtaining
a representative sampling frame and accurate population estimates.) A change in
the characteristics of the survey participation, however, is a source of bias in
the estimation of changes in the risk factors in the population.
A sample is said to be self-weighting if every member of the
population has an equal probability of being selected for the sample. A simple
mean calculated from a self-weighting sample gives an unbiased estimate of the
population mean value. If the sample is not self-weighting, a weighted mean of
the sample will be needed to get unbiased estimates of the population mean
value. Therefore, the data analysis will be simpler for a self-weighting sample
than for a sample that is not self-weighting.
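To see why a sample that is not self-weighting needs a weighted mean, consider the hypothetical two-group population below, in which one group is sampled at twice the rate of the other. The simple mean is pulled toward the over-sampled group, while weighting each subject by the inverse of its selection probability (the Hájek form of an inverse-probability weighted mean, named here as a standard technique, not necessarily the weighting used in MONICA analyses) recovers the population mean.

```python
def simple_mean(values):
    """Unweighted sample mean."""
    return sum(values) / len(values)


def inverse_probability_mean(values, selection_probs):
    """Weighted mean with weight 1/p for a subject selected with probability p."""
    weights = [1.0 / p for p in selection_probs]
    return sum(w * v for w, v in zip(weights, values)) / sum(weights)


# Hypothetical population: two groups of 1000 people with values 10 and 20,
# so the population mean is 15. Group B is sampled at twice the rate of A.
values = [10.0] * 10 + [20.0] * 20  # 1% of group A, 2% of group B
probs = [0.01] * 10 + [0.02] * 20

print(round(simple_mean(values), 2))  # 16.67: biased toward group B
print(round(inverse_probability_mean(values, probs), 2))  # 15.0
```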
Typical situations, which may lead to unequal sampling probabilities of the
subjects, are stratified sampling and multi-stage sampling. Most of the MONICA
samples were stratified by 10-year age group, but this is not a problem because
the age or age-group will nearly always be taken into account in the data
analysis anyway. Within the age groups, nearly all, if not all, of the MONICA
samples were designed such that the subjects have approximately equal sampling
probabilities. There is, however, one situation where this is not necessarily
the case. The sample sizes in the different Reporting Units (RU) of the RUAs may
have been chosen to be equal even if the population sizes are not equal, or some
other criteria may have been used for the sample sizes. It is therefore of
particular interest to check the sampling probabilities of the subjects in the
RUs of those RUAs that consist of more than one RU. There are nine such RUAs: AUS-NEWa,
CZE-CZEa, GER-EGEb, ICE-ICEa, ITA-FRIa, POL-TARa, POL-WARa, SWE-NSWa and USA-STAa.
A simple measure of the sampling probability is the sampling fraction,
defined as the proportion of the sample in the population. The sampling fraction
of the RUs of the nine RUAs by sex and 10-year age group is given in Table
12. The denominator for calculating sampling fractions was the population
size of the age/sex group of the RU in the calendar year of the middle of the
survey (or the nearest year for which the data were available). The eligible
sample size from the aggregate data was the numerator.
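A minimal sketch of the sampling-fraction calculation, and of the share of an unweighted mean that each RU then receives, is given below. The counts are hypothetical (two RUs of very different size with equal eligible sample sizes), and the tolerance `max_ratio` used to judge "approximately equal" is an assumption, since the report gives no numerical threshold.

```python
def sampling_fraction(eligible_sample: int, population: int) -> float:
    """Eligible sample size divided by the population size of the age/sex group."""
    return eligible_sample / population


def roughly_equal(fractions, max_ratio: float = 1.25) -> bool:
    """Hypothetical check: the largest fraction is at most max_ratio times the smallest."""
    values = list(fractions)
    return max(values) / min(values) <= max_ratio


def unweighted_shares(sample_sizes: dict) -> dict:
    """Share of the pooled (unweighted) sample contributed by each RU."""
    total = sum(sample_sizes.values())
    return {ru: n / total for ru, n in sample_sizes.items()}


def population_shares(populations: dict) -> dict:
    """Share of the RUA population living in each RU."""
    total = sum(populations.values())
    return {ru: p / total for ru, p in populations.items()}


# Hypothetical age/sex group: RU1 has ten times the population of RU2,
# but both received the same eligible sample size.
samples = {"RU1": 200, "RU2": 200}
populations = {"RU1": 20_000, "RU2": 2_000}

fractions = {ru: sampling_fraction(samples[ru], populations[ru]) for ru in samples}
print(fractions)  # {'RU1': 0.01, 'RU2': 0.1}
print(roughly_equal(fractions.values()))  # False
print(unweighted_shares(samples))  # {'RU1': 0.5, 'RU2': 0.5}
print(population_shares(populations))
```

In this example RU2 holds about 9% of the population but contributes half of an unweighted mean.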
No weighting of the subjects will be needed if the sampling fractions are
approximately equal in the RUs within each age/sex group within each survey.
According to Table 12 this seems to be the case for
most of the RUAs. The exceptions are:
- ICE-ICE:
- The sampling fraction of RU2 (Arnes County) is consistently nearly 10
times the sampling fraction of RU1 (Reykjavik). The reason is that the
sample size was the same in Arnes County as in Reykjavik although Reykjavik
is nearly ten times as big as Arnes County. Arnes County was included in the
survey as a representative of the rest of the country (except Reykjavik). A
third of Iceland's population lives in Reykjavik. Unweighted sample mean
values would give a very high relative weight to RU2 compared with RU1, but
the relative weight would be similar for each of the three surveys. As a
consequence, unweighted estimates of trends in the mean values will
represent the trend of the RUA as a whole only if the trends in the
different RUs are similar.
- USA-STA:
- The sampling fractions range from about 1 to 3 across the reporting units
but remain consistent between the three surveys. Unweighted sample means
would give much higher relative weight to some reporting units compared with
others, but the relative weights would be similar for each of the three
surveys. As a consequence, unweighted estimates of trends in the mean values
will represent the trend of the RUA as a whole only if the trends in the
different RUs are similar.
- GER-EGEb:
- The sampling fractions of RU19 are twice the sampling fractions of RU17 in
the initial survey but the sampling fractions are equal in the final survey.
Unweighted sample mean values would give equal relative weights to the two
RUs in the final survey but not in the initial or the middle survey. As a
consequence, if there is a major difference in the risk factor levels
between the two RUs, unweighted estimates of trends in the mean values will
not represent the trend of the RUA as a whole even if the trends in the
different RUs were similar.
Weighting the RUs in the analysis would solve these problems, but would
considerably complicate all analyses. If each of these three RUAs can be assumed
to be relatively homogeneous, no weighting will be needed.
It is unlikely that unweighted analysis will have a major impact in
the cases of ICE-ICE and USA-STA. For GER-EGE the situation seems more complex.
To show the potential effect of neglecting the weighting in GER-EGE, Table C
gives a hypothetical example with risk factor differences between the RUs that
are probably much larger than those actually observed.
Table C. Influence of neglecting
weighting when estimating change in risk factor mean value in a
hypothetical situation

| Variable | RU | Sampling fraction, Ini | Sampling fraction, Fin | Proportion or mean value, Ini | Proportion or mean value, Fin | Estimate of change, weighted | Estimate of change, unweighted |
|---|---|---|---|---|---|---|---|
| Smoking | A | 1 | 2 | 30 % | 30 % | 0 % | -1.7 % |
| | B | 2 | 2 | 40 % | 40 % | | |
| Systolic blood pressure | A | 1 | 2 | 130 mmHg | 130 mmHg | 0 mmHg | -1.7 mmHg |
| | B | 2 | 2 | 140 mmHg | 140 mmHg | | |
| Total cholesterol | A | 1 | 2 | 5.4 mmol/l | 5.4 mmol/l | 0 mmol/l | -0.1 mmol/l |
| | B | 2 | 2 | 6.0 mmol/l | 6.0 mmol/l | | |
Because the risk factor levels in Table C do not change within either RU, the
true change is zero, and the unweighted estimates in the rightmost column equal
the bias caused by neglecting the weighting. These biases are relatively small
compared with the achievable measurement accuracy of the risk factors. As the
differences in risk factor levels between the RUs in Table C are probably also
much larger than the real differences between the RUs of GER-EGE, one can be
fairly comfortable estimating the trend without weighting, even for
GER-EGE.
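The arithmetic behind Table C can be reproduced with a few lines of Python. This is a minimal sketch, not the MDC's analysis code: the function names are hypothetical, and it assumes, as the table implicitly does, that RUs A and B have equal populations, so that each RU's share of the pooled (unweighted) sample is proportional to its sampling fraction.

```python
def pooled_mean(ru_values, weights):
    """Average RU-level values, giving each RU the supplied relative weight."""
    return sum(v * w for v, w in zip(ru_values, weights)) / sum(weights)


def change_estimates(ini_values, fin_values, ini_fractions, fin_fractions, populations):
    """Return (weighted, unweighted) estimates of the Ini-to-Fin change.

    Weighted: each RU counts in proportion to its population size, so the
    estimate refers to the RUA as a whole.
    Unweighted: each RU counts in proportion to its sample size, i.e.
    sampling fraction x population, which is what pooling raw records does.
    """
    weighted = pooled_mean(fin_values, populations) - pooled_mean(ini_values, populations)
    ini_sizes = [f * p for f, p in zip(ini_fractions, populations)]
    fin_sizes = [f * p for f, p in zip(fin_fractions, populations)]
    unweighted = pooled_mean(fin_values, fin_sizes) - pooled_mean(ini_values, ini_sizes)
    return weighted, unweighted


# Hypothetical inputs taken from Table C; equal RU populations are an assumption.
FRACTIONS_INI = [1, 2]  # RU A, RU B
FRACTIONS_FIN = [2, 2]
POPULATIONS = [1, 1]

rows = [
    ("Smoking", "%", [30, 40], [30, 40]),
    ("Systolic blood pressure", "mmHg", [130, 140], [130, 140]),
    ("Total cholesterol", "mmol/l", [5.4, 6.0], [5.4, 6.0]),
]
for name, unit, ini, fin in rows:
    w, u = change_estimates(ini, fin, FRACTIONS_INI, FRACTIONS_FIN, POPULATIONS)
    print(f"{name}: weighted {w:+.1f} {unit}, unweighted {u:+.1f} {unit}")
```

The printed unweighted changes match the rightmost column of Table C (-1.7 %, -1.7 mmHg, -0.1 mmol/l), while the weighted changes are zero. With unequal RU populations, `POPULATIONS` would hold the actual relative population sizes.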
This report was an attempt to summarise the situation concerning the quality
of sampling frames and survey non-respondents in the MONICA risk factor surveys.
We can draw some general conclusions from the findings.
There were large differences in the availability of good sampling frames among
the RUAs. Only about 40% of the sampling frames were found to be of good
quality, and 25% were clearly of poor quality. Considerable uncertainty remains
about the specific properties of the sampling frames in the various RUAs. For
example, it is not clear whether all population registers can be considered
equally up to date. There also appear to be substantial differences in the
quality of electoral rolls. We therefore feel that using the proportion of
ineligibles and the proportion of persons not located to assess quality was
probably the best choice under the circumstances.
Many MCCs did not report the exact definition of eligibility used in their
initial survey. For the middle and final surveys, most MCCs stated that they
defined eligibility according to the instructions given in the MONICA Manual.
However, such statements are often questionable, since they conflict with other
data available at the MDC. If the definition of eligibility changes, comparing
response rates between surveys becomes difficult. In RUAs where the proportion
of ineligibles is high, there is a high risk of bias if many persons were
wrongly classified as ineligible and if they were exceptional with respect to
risk factors for cardiovascular disease. Such biases could noticeably influence
the estimates of risk factor trends. We suggest that data weighting be
considered for RUAs where eligibility changed or where major discrepancies
exist between individual and aggregate data (Table 4).
If the response rate exceeds 80%, we can be quite confident in applying the
results of the survey to the whole RUA, provided that the quality of the data is
otherwise good. Only one third of the RUAs in the initial survey, one fifth of
the RUAs in the middle survey, and one sixth of the RUAs in the final survey
reached that level. The 70% limit, which we might still consider satisfactory,
was exceeded by two thirds of the RUAs in the initial survey, three quarters in
the middle survey, and again two thirds in the final survey. That a noticeable
number of RUAs remained below 70%, and the most extreme even below 50%, is a
concern: for these RUAs it may be wrong to assume that the risk factor changes
observed in the surveys reflect those in the population.
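The response-rate limits discussed above can be applied mechanically once the counts for a survey are known. The sketch below is illustrative only: it assumes a participation rate defined as participants divided by the eligible sample (persons sampled minus ineligibles), and if the report's own definition differs, that definition takes precedence. The class name, field names, band labels and counts are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class SurveyCounts:
    """Hypothetical aggregate counts for one survey in one RUA."""

    sampled: int       # persons drawn from the sampling frame
    ineligible: int    # persons found ineligible after selection
    participants: int  # persons who took part

    def participation_rate(self) -> float:
        # Assumed definition: participants divided by the eligible sample.
        eligible = self.sampled - self.ineligible
        if eligible <= 0:
            raise ValueError("no eligible persons in the sample")
        return self.participants / eligible

    def proportion_ineligible(self) -> float:
        return self.ineligible / self.sampled


def response_band(rate: float) -> str:
    """Place a participation rate against the 80%, 70% and 50% limits in the text."""
    if rate > 0.80:
        return "above 80%"
    if rate > 0.70:
        return "70-80%"
    if rate >= 0.50:
        return "50-70%"
    return "below 50%"


survey = SurveyCounts(sampled=2000, ineligible=100, participants=1425)
rate = survey.participation_rate()
print(f"{rate:.1%}", response_band(rate))
```

With these hypothetical counts, 1425 of 1900 eligible persons took part, a participation rate of 75.0%, which falls in the 70-80% band.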
The data available for this report do not provide hard evidence for the
representativeness of the respondents. The crucial problem is the low
availability of information about the non-respondents, a problem shared with
most other surveys. Even if more data were available, they would have to be
treated with more caution than the data about the respondents, and the
conclusions one could draw from them would hardly be more than qualitative.
Nevertheless, such data would help in understanding the full risk factor profile
and trends in the population. The next step in investigating the non-respondents
in more detail may have to be taken locally by the MCCs. The MCCs have the best
knowledge of the available sampling frames, the
procedures used to achieve high response rates, the procedures used to
investigate the non-respondents, and possibly other information which helps to
characterise the non-respondents.
Suggestions for conducting such investigations are:
- If the level of education of the target population is available from
census data, it can be compared with the level of education in the MONICA
sample. Note, however, that the questions for establishing education levels
have to be comparable between census and MONICA surveys.
- Evaluate the changes in the distribution of "Reason of non-response"
between surveys.
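Both suggestions can be started with simple descriptive comparisons. The minimal Python sketch below is not a procedure prescribed by the MONICA Manual: the function names, education categories, reasons and all counts are hypothetical, and it assumes the census and survey education questions have already been mapped to common categories. It computes a Pearson chi-square goodness-of-fit statistic of the sample's education distribution against census proportions, and the change in the share of each reason of non-response between two surveys.

```python
def shares(counts: dict[str, int]) -> dict[str, float]:
    """Convert category counts to proportions."""
    total = sum(counts.values())
    return {category: n / total for category, n in counts.items()}


def chi_square_vs_census(
    sample_counts: dict[str, int], census_shares: dict[str, float]
) -> tuple[float, int]:
    """Pearson chi-square statistic of the sample's education distribution
    against census proportions; returns (statistic, degrees of freedom).
    Assumes the categories have already been harmonised."""
    n = sum(sample_counts.values())
    statistic = 0.0
    for level, observed in sample_counts.items():
        expected = n * census_shares[level]
        statistic += (observed - expected) ** 2 / expected
    return statistic, len(sample_counts) - 1


def share_changes(initial: dict[str, int], final: dict[str, int]) -> dict[str, float]:
    """Change in the share of each reason of non-response between two surveys."""
    a, b = shares(initial), shares(final)
    return {r: b.get(r, 0.0) - a.get(r, 0.0) for r in sorted(set(a) | set(b))}


# Hypothetical numbers, for illustration only.
sample = {"primary": 300, "secondary": 500, "tertiary": 200}
census = {"primary": 0.40, "secondary": 0.45, "tertiary": 0.15}
stat, df = chi_square_vs_census(sample, census)
print(f"chi-square = {stat:.1f} on {df} df")

reasons_ini = {"refused": 60, "not contacted": 40}
reasons_fin = {"refused": 50, "not contacted": 50}
print(share_changes(reasons_ini, reasons_fin))
```

For the hypothetical counts the statistic is about 47.2 on 2 degrees of freedom; if SciPy is available, `scipy.stats.chisquare(f_obs, f_exp)` returns the same statistic together with a p-value. In large samples even small differences become statistically significant, so the size of the differences matters as much as the test result.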
The following list includes only the RUAs with specific findings or
exceptional background information relevant for the use of the data.
AUS-NEW
- Conflicting information on survey eligibility: according to the Sample Size
of Final MONICA Survey, all surveys used the MONICA Manual definitions, whereas
according to the Sample Size of 2nd MONICA Survey, mentally handicapped persons
were excluded.
AUS-PER
- Erroneous coding of age for non-respondents limits the use of
non-respondent data of the middle survey.
BEL-CHA
- The participation rate is much higher than the item-response rates for
blood pressure and total cholesterol. This is because the respondents were
interviewed at home and many of them never came to the clinical examination.
The MCC has found that the respondents who answered the questionnaire only
and did not attend the clinic had a significantly higher prevalence of
smoking, but not of a history of hypercholesterolaemia.
- Cohort analysis shows large changes in body height and years of schooling
between the initial and final surveys, suggesting measurement and/or
participation bias.
BEL-GHE
- The participation rate is much higher than the item-response rates for
blood pressure and total cholesterol. This is because the respondents were
interviewed at home and many of them never came to the clinical examination.
No estimates are available of the possible bias between respondents who
answered the questionnaire only and those who also visited the clinic.
BEL-LUX
- The survey excluded persons on the electoral list who were living
elsewhere. The frequency of such cases is unknown.
CAN-HAL
- Prisoners were excluded from the survey and event data, but not from the
demographic data.
FRA-LIL
- The increase in the estimated schooling level of women between the initial
and final surveys is probably real, since a cohort analysis of census data
yields similar results.
- The implications of the change in the sampling frame between the initial
and final surveys are unknown.
FRA-STR
- Foreigners are excluded from the final survey, but not from the event and
demographic data. They represent about 4.5% of the population.
- The simultaneous significant cohort trends in body height, proportion of
never-smokers, and education levels strongly suggest a response bias,
although the census data support a possible increase in education levels.
- The implications of the change in the sampling frame between the initial
and final surveys are unknown.
FRA-TOU
- Persons in prison and foreigners were included in the category of
ineligibles. In the Sample Selection Description the MCC describes a scheme
for augmenting the sampling frame with lists of foreigners obtained from
various consulates. It is uncertain whether this scheme was ever implemented
and whether it was carried through to the middle and final surveys. It is
also not clear how large a problem the omission of foreigners is, i.e. how
their omission affects comparisons with the event and demographic data.
- The increase in the estimated schooling level of men between the initial
and final surveys may represent a response bias.
GER-AUR and GER-AUU
- The surveys excluded non-Germans (about 2% of population >50), but this
group is not excluded from event and demographic data.
GER-BER and GER-COT and GER-EGE and GER-HAC and GER-KMS and GER-RDM
- The National X-ray register was used for the initial survey and was
regularly updated from the population register. For all practical purposes,
the two were identical. For the middle and final surveys, the population
register was used.
- The increased level of schooling could be due to greater participation of
women in the "Open University". However, only anecdotal evidence is
available to support this explanation.
GER-BRE
- Participation rates apply to the age group 25-69.
- German citizenship was an eligibility requirement for the surveys. Such an
exclusion criterion was not applied to the event and demographic data.
GER-RHN
- The large difference between the participation rates calculated from
aggregate and from individual data is unexplained. Therefore, the reported
response rate may be incorrect.
HUN-BUD
- There are some reservations about the rates: non-respondent data are
available for a surprisingly high 95% of the non-respondents (see Tables 9.1
and 9.2). How this high item response rate among non-respondents was achieved
is unknown.
ISR-TEL
- According to the Sample Selection Description, the sample selection and
interview took place at the same time. Only those who were at home at the
time of the visit were selected for the sample. This contradicts the
non-respondent data, in which 82% of the non-respondents were in the
category "not possible to contact".
ITA-BRI
- In the middle survey, the item response rate for the non-respondent data
exceeds 90%. The MCC attributes this to a very intensive effort to contact
the non-respondents by telephone.
ITA-FRI
- According to the MCC, there are no ineligibles in the final survey because
the sample was selected just before the survey from the computerized
Regional Health Rolls, which are updated continuously.
- The minor discrepancy in the final survey between the number of
non-respondents in the aggregate data and in the individual data (see Tables
4 and 9.1) was clarified by the MCC after the MDC database was closed for the
final analyses of the MONICA Project: the individual data are correct. The only
significant implication of the discrepancy is that the "Trend in Quality" in
Table 4 has code "0" instead of code "2".
ITA-LAT
- Individuals with address problems were ineligible for the survey.
LTU-KAU
- Persons who had changed their residence without giving a new address to
the address bureau were classified as ineligible in the final survey.
- The MCC explains the large increase in ineligibles in the final survey by
the fact that the sample was selected two years before the examination
because the survey was postponed after the sample selection. Furthermore,
during the time before the final survey the country underwent major changes
and no reliable information on migration was available.
MLT-MLT
- Serial number inventory and non-respondent data are not available.
NEZ-AUC
- The target population of the initial survey covers only 80% of the
population covered by the event and demographic data: Maoris and Polynesians
were excluded from the initial survey, but not from the event and demographic
data. The MCC found that the difference was not a major problem. It is not
known whether Maoris and Polynesians were also excluded from the final
survey.
ROM-BUC
- Either some non-respondent records are missing or the aggregate data are
incorrect.
RUS-MOC and RUS-MOI
- The estimated increase in schooling level of men and women between the
initial and final surveys may represent a response bias.
RUS-NOC and RUS-NOI
- There were additional eligibility criteria described in the response to
the Sample Selection Description (imprisonment, away for occupational
reasons), which were applied to all three surveys. These exclusion criteria
do not apply to demographic data, but prisoners (estimated <1% of
population) are excluded from event registration.
- The estimated increase in schooling level of men between the initial and
final surveys may represent a response bias.
- The serial number inventory is missing for RUS-NOCa (Fin).
SPA-CAT
- Immigration between the surveys altered the target population, lowering
the years of schooling and possibly causing other changes.
- The dates of birth of three non-respondents in the initial survey have
been corrected in the MONICA database after the preparation of this
document.
UNK-BEL
- The estimated increase in years of schooling and body height between the
initial and final surveys may represent a response bias.
UNK-GLA
- The participation rate may be biased since GPs excluded from the sampling
frame persons whom they considered unsuitable for the survey.
USA-STA
- The MCC believes that fewer than 30 persons were excluded because of
language in the initial survey, but has no numerical evidence for this.
YUG-NOS
- No ineligibles were reported for the initial and middle surveys. However,
the final survey had a significant number of them. No explanation for this
difference is available.
- Tunstall-Pedoe H for the WHO MONICA Project.
The World Health Organization MONICA Project (Monitoring Trends and
Determinants in Cardiovascular Disease): A major international
collaboration. J Clin Epidemiol 1988;41:105-14.
- WHO MONICA Project. MONICA Manual. (1998-1999).
Available from: URL:http://www.ktl.fi/publications/monica/manual/index.htm,
URN:NBN:fi-fe19981146.
- Kuulasmaa K, Tolonen H, Ferrario M, Ruokokoski E for
the WHO MONICA Project. Age, date of examination and survey periods in the
MONICA surveys. (May 1998). Available from: URL:http://www.ktl.fi/publications/monica/age/ageqa.htm,
URN:NBN:fi-fe19991075.
- Molarius A, Kuulasmaa K, Moltchanov V, Ferrario M for
the WHO MONICA Project. Quality Assessment of Data on Marital Status and
Educational Achievement in the WHO MONICA Project. (December 1998).
Available from: URL:http://www.ktl.fi/publications/monica/educ/educqa.htm,
URN:NBN:fi-fe19991078.
- Molarius A, Kuulasmaa K, Sans S for the WHO MONICA
Project. Quality assessment of weight and height measurements in the WHO
MONICA Project. (May 1998). Available from: URL:http://www.ktl.fi/publications/monica/bmi/bmiqa20.htm,
URN:NBN:fi-fe19991079.
- Molarius A, Kuulasmaa K, Evans A, McCrum E, Tolonen H
for the WHO MONICA Project. Quality assessment of data on smoking behaviour
in the WHO MONICA Project. (February 1999). Available from: URL:http://www.ktl.fi/publications/monica/smoking/qa30.htm,
URN:NBN:fi-fe19991077.