WWW-publications from
the WHO MONICA Project
Participation Rates, Quality of Sampling Frames and Sampling Fractions
in the MONICA Surveys
September 1998
Hermann K. Wolf1, Kari Kuulasmaa2, Hanna Tolonen2
and Esa Ruokokoski2 for the WHO MONICA Project3
1 Department of Physiology and Biophysics, Dalhousie University, Halifax,
Canada
2 MONICA Data Centre, National Public Health Institute, Helsinki, Finland
3 Annex: Sites and key personnel of the WHO MONICA
Project
© Copyright World Health Organization (WHO) and the WHO MONICA Project investigators
1999. All rights reserved.
- Copyright notice
- Document identification:
- URL:http://www.ktl.fi/publications/monica/nonres/nonres.htm
- URN:NBN:fi-fe19991076
This document includes the main findings of the unpublished report:
- Kuulasmaa K, Sans S, Molarius A, Koivisto A-M, Moltchanov V for the WHO MONICA Project.
Participation rate and the quality of sampling frame in the first and second risk factor
surveys of the WHO MONICA Project. MONICA Memo 259A, March 1994.
Acknowledgements
The MONICA Centres are funded predominantly by regional and national governments,
research councils, and research charities. Coordination is the responsibility of the World
Health Organization (WHO), assisted by local fund raising for congresses and workshops.
WHO also supports the MONICA Data Centre (MDC) in Helsinki. Not covered by this general
description is the ongoing generous support of the MDC by the National Public Health
Institute of Finland, and a contribution to WHO from the National Heart, Lung, and Blood
Institute, National Institutes of Health, Bethesda, Maryland, USA for support of the MDC.
The completion of the MONICA Project is generously assisted through a Concerted Action
Grant from the European Community. Likewise appreciated are grants from ASTRA Hässle AB,
Sweden, Hoechst AG, Germany, Hoffmann-La Roche AG, Switzerland, the Institut de Recherches
Internationales Servier (IRIS), France, and Merck & Co. Inc., New Jersey, USA, to
support data analysis and preparation of publications.
The purpose of the MONICA risk factor surveys (1, 2) is to estimate the cardiovascular risk factor distribution in the
study populations. When the main hypotheses are tested, the risk factor trends will be
related to the trends of cardiovascular disease rates in the same populations. Therefore,
it is necessary that the risk factor data and the event registration data refer to the
same population. Apart from the quality of the measurements, the risk factor data may be
biased for the following reasons:
- the survey population is geographically different from the event registration
population;
- some groups of individuals are excluded from the survey population but not from the
event population;
- the age range considered is different for survey and event registration;
- the sampling frame does not correspond to the general population;
- the sampling scheme is not taken adequately into account in the analysis;
- the survey non-respondents are not a randomly selected sub-sample of the general
population and the non-response rate is large for the total sample or some age groups.
According to the information provided by the MONICA Collaborating Centres (MCC), the
population survey data and event registration data were collected in the same geographical
populations in all Reporting Units. The exceptions are those MCCs where one of the data
components was not collected at all, and Auckland where the area of the initial survey
population covered only 80% of the area of the event registration population.
The sampling scheme has not been taken into account in the analyses so far, and
therefore it may bias the results in some populations. However, preliminary investigation
of the sampling schemes rules out major influences on the results of data analysis. An
exception may arise when the population used for analysis is a combination of several
sub-populations but the sampling was not planned for that combination.
In such a case there may be a need for weighting of the sub-populations.
This situation is investigated in Section 12 of the
current report.
The age range surveyed varies a little between the populations, depending on how the
age was defined for the sample selection. This has been investigated in a separate report
(3). The remaining aspects, which are related to the quality of the
sampling frame and survey non-response, are investigated in the present report.
Since the quality of the sampling frame and non-response are interrelated, they are
considered together: If the sampling frame is not accurate, it may include a substantial
number of subjects who actually do not belong to the population. Often it cannot be
confirmed that the subjects are not members of the population, and then they will show up
as non-respondents. If it can be confirmed that such subjects do not belong to the
population, they should be defined as ineligible and they should be ignored when
calculating participation rates. The sampling fractions are considered together with the
sampling frame and non-response because all these investigations are based on the same
data sources.
The overall quality of data for the non-respondents is poor. A number of possible
explanations exist for this quality problem: i) the requirements for non-respondent data
collection were not properly defined and described in the MONICA Manual until 1985; ii)
there was a lack of statistical and/or epidemiological input in the protocol design of
many MCCs; iii) in many cases the non-respondents were not prepared to provide any
information and were sometimes outright hostile.
The present quality report aims to assess objectively the results from each individual
survey and determine the trend between the surveys.
- Target population:
- The target population comprises all the individuals included in a study. In MONICA, it
is defined as the Reporting Units. It should be the population from which event data are
gathered and to which the survey data should apply.
- Sampling frame:
- The sampling frame is the list of sampling units from which the sample is selected. In a
single-stage sampling scheme, the sampling frame ideally represents the target population
exactly. There are, however, reasons why actual sampling frames often deviate from the
ideal. For example, there are always people dying or moving in and out of the target
population, and therefore the sampling frame is never fully up-to-date.
In multi-stage sampling each sampling stage has a separate sampling frame. For example,
the primary sampling frame may list the towns and villages of the population area, and the
second-stage frame lists the people of the towns and villages. Usually there is no
difficulty in getting a complete primary-stage frame. However, the frames that list the
people have problems similar to those of the sampling frame in single-stage sampling.
- Foreign elements:
- These are elements in the sampling frame that are not valid members of the target
population. In our context, foreign elements are, for example, persons still listed in the
sampling frame but no longer living in the area under study (ineligibles).
- Ineligibles:
- Members of the sampling frame that are excluded from the survey by definition (moved
away or died between time of sampling and survey). Ideally, the ineligibles are exactly
the same as the foreign elements of the sampling frame. In practice, however, the
membership in the target population cannot always be established for everybody selected to
the sample. According to the technical definition given in the MONICA Manual (Part III, Section 1, Subsection 3 of Reference 2), members of the selected sample are eligible by default. Only those
for whom the ineligibility criteria can be confirmed are ineligible, i.e. if in doubt, he/she
is eligible.
- Non-respondents:
- All members of the sample set, except the ineligibles, for which no survey data have
been collected.
The report considers the Reporting Unit Aggregates (RUA) which are potential candidates
for units of analysis of the MONICA data. The RUAs, their abbreviations and their
Reporting Units (RU) are listed in Table 1.1. Some of the RUAs
have several versions distinguished by suffix a and b. Different combinations of RUs may
be used for cross-sectional and trend analysis. This is the case if not all RUs of the RUA
were included in every survey. Therefore, in AUS-PER, GER-BER, GER-BRE, GER-EGE, GER-KMS,
GER-RDM, RUS-MOI and RUS-NOC there is an overlap of reporting units included in the RUAs
in some surveys. For UNK-GLA, which carried out four surveys, the first (Ini), third (Mid)
and fourth (Fin) survey are considered. Altogether 54 RUAs are considered for the initial
survey, 43 for the middle and 41 for the final survey.
All subjects for whom data are available were included in most analyses of the current
document. For some analyses the subjects were selected according to their age and sex. In
such cases the age and sex are specified in the respective tables. Age-standardization was
not used for the analyses of this report. When data from the survey core data (Form 04) or
survey non-respondent data (Form 08) have been used, age has been defined as age in full
years on the date of examination (see DEF1 in Reference 3).
The data sources for this report are:
- the Sample Selection Descriptions (in unpublished document: MONICA Memo 50) which the
MCCs had to complete in 1985.
- "Aggregate data":
- Tables "Participation rate in the first MONICA survey", completed by the MCCs
in 1988 (see Appendix 1a);
- Tables "Sample size in the 2nd MONICA survey", completed by the MCCs in 1993,
on which the MCCs reported the original sample size, the eligible sample size, number of
respondents and the participation rate by sex and 10-year age group (see Appendix 1b);
- Tables "Sample size in the final MONICA survey", completed by the MCCs in 1995
(Appendix 1c).
- "Individual data":
- Annual population demographic data (Form A, see
Reference 2).
- Other correspondence between the MONICA Data Centre (MDC) and the individual MCCs.
We will consider here indicators of the quality of the sampling frames used by the
MCCs. For RUAs with multi-stage sampling, the frame used in the primary-stage sampling is
usually simple. Therefore, we will focus attention on the critical sampling frames that
list individual persons or households. The data for and results of the assessment are
presented in Table 2.
We used the following indicators of the quality of sampling frame:
- Source of the sampling frame:
- Population registers are usually relatively good sampling frames. In many RUAs there
exist no population registers. Therefore, population lists compiled for other purposes
must be used, such as public health service registers or electoral rolls. There are
usually different reasons why such lists may contain too few or too many names. The
sampling frames used in the RUAs are listed in Table 1.2.
- Age of sampling frame:
- We define the age of the sampling frame as the difference between the last update of the
sampling frame and the time when it was used to draw the sample. This information is based
in part on replies to question 10 in the form Sample Selection Descriptions (MONICA Memo
50) and in part on communication with individual MCCs. An "old" frame is likely
to be inaccurate because of migration and deaths in the target population after the frame
was updated. Theoretically, the age of a sampling frame has two components, one for new
elements to be entered, and one for foreign elements to be deleted. If sampling frames are
created de novo by counting, such as at census, there is no difference between these two
age components. However, with sampling frames that represent lists that are updated at
regular intervals, entries may be made more promptly than deletions (or vice versa) and as
a result the two age components can be quite different. In one case, it will result in
incomplete coverage of the target population, in the other in excessive foreign elements.
We ignore any differences in the two age components of sampling frames for the tabulations
of this report. However, we will refer to them in comments on individual RUAs, as needed.
- Proportion of ineligibles:
- The proportion of persons who were ineligible in the original sample indicates how
commonly the sampling frame includes identifiable foreign elements. Theoretically we
should also be interested in missing elements, i.e. the proportion of the members of the
target population who are not included in the sampling frame, but such data are not
available in MONICA. The proportion of ineligibles is used as a general indicator of the
accuracy of the sampling frame.
- Proportion not possible to contact:
- The survey non-respondent data, which the MCCs should have provided for every
non-respondent, has an item labelled "reason of non-response". One
of the options, "not possible to contact", refers to those with
whom no contact could be made, but no information was available to indicate that they are
ineligible for the sample. As some of the subjects in this category may actually be
foreign elements of the sampling frame, their proportion in the original sample may also
reflect the inaccuracy of the sampling frame. The proportions given in Table 2 are for those RUAs where the reason of non-response is
available for at least half of the non-respondents. (For more information about data item
"reason for non-response", see Table 10). For the
proportion in Table 2 the numerator was calculated from the
individual non-respondent data and the denominator from the aggregate data provided by the
MCCs.
Individually, none of these indicators, except the age of sampling frame, is very
specific for quality. However, based on all of them combined, and on additional
information available from the MCCs, a Sampling Frame Score (SFS) was assigned
to the RUAs, to indicate our current understanding of the quality of the sampling frame.
The score has the following values (the respective mean proportion is calculated over all
the surveys of a RUA):
| SFS | Criteria |
|---|---|
| 2 | No major concern about the sampling frame: SFS not equal to 0 AND no change in sampling frame between surveys AND mean proportion "not eligible" < 5% AND mean proportion "not possible to contact" < 5%. |
| 1 | SFS is neither 0 nor 2. |
| 0 | The sampling frame has major problems: all proportions "not eligible" are missing OR all proportions "not possible to contact" are missing OR maximal value of proportion "not eligible" > 20% OR maximal value of proportion "not possible to contact" > 20%. |
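The scoring rules can be sketched in code. The following is a minimal illustration only, not the procedure actually used by the MDC: the function name, its parameters and the sample values are hypothetical, proportions are given in percent with one entry per survey (None when missing), and it is assumed that the mean is taken over the surveys for which a value is available.

```python
from statistics import mean
from typing import Optional, Sequence


def sampling_frame_score(
    not_eligible: Sequence[Optional[float]],
    not_contactable: Sequence[Optional[float]],
    frame_changed: bool,
) -> int:
    """Return a Sampling Frame Score (0, 1 or 2) for one RUA.

    Proportions are in percent, one value per survey, None if missing.
    """
    # Keep only the surveys for which a proportion is available (assumption).
    ne = [p for p in not_eligible if p is not None]
    nc = [p for p in not_contactable if p is not None]

    # SFS = 0: all values of an indicator missing, or any value above 20%.
    if not ne or not nc or max(ne) > 20 or max(nc) > 20:
        return 0

    # SFS = 2: frame unchanged and both mean proportions below 5%.
    if not frame_changed and mean(ne) < 5 and mean(nc) < 5:
        return 2

    # SFS = 1: everything else.
    return 1


# Hypothetical RUA with three surveys and an unchanged sampling frame.
print(sampling_frame_score([2.0, 3.0, 1.5], [1.0, 4.0, 2.0], frame_changed=False))
```

Checking the conditions for score 0 first mirrors the definition, in which score 2 requires that the score is not 0.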
The score was "0" in 12 RUAs (CAN-HAL, GER-COT, GER-RDM, GER-RHN, HUN-PEC,
ISR-TEL, ITA-LAT, LTU-KAU, MLT-MLT, RUS-MOIb, RUS-NOCb, UNK-GLA).
No RUA changed the sampling frame between the initial and middle survey, but two RUAs
adopted new sampling frames for the final survey with some implications for frame
compatibility (FRA-LIL, FRA-STR).
It is assumed that all MCCs used the best available sampling frame at their disposal.
Therefore, a low sampling frame score does not usually indicate bad performance by the
MCC, but reflects local constraints. However, it still means that the sample may be biased
because of the poor quality of the sampling frame. On the other hand, the score is not
only a reflection of the quality of the sampling frame but also depends to some extent on
the efforts of the MCC. Therefore, some of the MCCs with a score of zero probably used a
good sampling frame but failed to put enough effort into pursuing persons who were hard to
locate (e.g. GER-COT).
In the MONICA Project the eligibility of a subject selected to the sample was
defined at the time of sample selection. The MONICA Manual states that "The
individuals selected in the original sample who died or moved out of the reporting unit
area before the survey examination are called non-eligibles". In addition,
some RUAs have occasional subjects who are ineligible because of a clerical error in the
sampling frame. There have been subjects whose gender was incorrect in the sampling frame,
or whose age in the sampling frame was inaccurate and out of the range for the
survey.
For non-respondents a technical definition was given in order to determine
which data should be reported to the MDC for these subjects. The definition was:
"A non-respondent is a person selected and eligible to the original sample who
could not be found or contacted or a person who did not provide questionnaire data
...."
In most cases this technical definition gives a sensible estimate of the availability
of data. There are two situations where it is misleading:
- If there is a considerable number of subjects who provided the questionnaire data but
never attended the clinical examination, then this definition does not provide useful
information about the response rate for the cholesterol and blood pressure measurements.
This was the case in six RUAs (see Section 8 and Table 7);
- When the sampling frame includes a large number of foreign elements and there is no way
to identify such individuals, the response rate will be under-estimated, because subjects
who are ineligible become classified as non-respondents. Such a bias may be significant in
CAN-HAL (Ini, Fin), FRA-TOU, LTU-KAU (Fin), UNK-BEL, UNK-GLA (Ini).
The MCCs were first contacted in 1985 for the definition of eligibility, which was
applied when data were submitted to the MDC (Table 1.2). At
that time, no clear information was received from most MCCs. One possible explanation for
this failure is that the MCCs did not define eligibility in any systematic way when
collecting the data. This might indicate deficient organization during the initial survey,
when more than half of the MCCs had data for fewer than 50% of the non-respondents. In
MONICA the definition of eligibility was introduced in the Manual in 1985, and the wording
was clarified later because there was misunderstanding in some MCCs. The introduction of
the definition improved the situation and, therefore, more information is available for
the middle and final surveys. MCCs reported their compliance with the eligibility
definition of the Manual with the submission of the aggregate data of the middle and final
survey (see Appendix 1b and Appendix
1c). According to this information there was a high degree of compliance with the
Manual definition. However, some of the information conflicted with other data available
at the MDC and it cannot be ruled out that the improved compliance is partly due to
erroneous data from the MCCs.
Among the RUAs for which the definition of eligibility is known, the following had a
different definition from the one given in the Manual or provided conflicting information:
- AUS-NEW:
- In addition to MONICA Manual's requirements, the following were ruled as ineligibles for
the initial and middle survey: a) mentally handicapped, b) too old for study when
interviewed. According to the MONICA definition, the mentally handicapped persons ought to
be non-respondents due to medical reasons. The final survey apparently adhered to the
Manual's definition, but this information is suspect since it was submitted together with
a statement that the initial and middle survey also followed the Manual's definition of
eligibility.
- BEL-LUX:
- The centre used an electoral list as sampling frame. Presumably, this is the reason why
non-citizens were treated as ineligible. Also, persons found to be living elsewhere than
the address of the electoral list were designated as ineligible. It is not clear whether
the same exclusion criteria were applied to the event and demographic data.
- CAN-HAL:
- Persons in correctional institutions and persons unable to understand English and
without an interpreter were excluded from the survey. (The language problem happened twice
in the initial and 10 times in the final survey; no information is provided about
prevalence of incarceration). No language exclusion was applied to the event and
demographic data. However, prisoners usually receive their medical care within the prison
system and are therefore automatically excluded from the event registration.
- FRA-STR:
- For the initial survey "collective households" (e.g. old-age homes,
monasteries or convents) are defined as ineligible in addition to the persons specified in
the Manual. According to the MCC, "collective households" represent a negligible
fraction of the total sample. The MCC changed the sampling frame for the final survey. In
a second reply to the Sample Selection Description the MCC stated that the Manual's
definition of ineligibility has been broadened by including persons without French
citizenship and those not registered on the electoral roll. These additional restrictions
do not apply to the event and demographic data. From the results of the initial survey it
was estimated that there are about 4.5% non-French citizens in the population.
- FRA-TOU:
- Persons in prison and foreigners were included in the category of ineligibles. In the
Sample Selection Description the MCC describes a scheme of augmenting the sampling frame
by lists of foreigners obtained from various consulates. It is uncertain whether this
scheme was ever implemented and whether it was carried through to the middle and final
survey. It is also not clear how big a problem the omission of foreigners is, i.e. how
they impact on event and demographic data.
- GER-AUR and GER-AUU:
- The ineligibles included persons without German citizenship (estimated to be about 2% in
the age group > 50). These exclusion criteria do not apply to the event and demographic
data.
- GER-BRE:
- The ineligibles included persons without German citizenship. Such an exclusion criterion
was not applied to the event and demographic data.
- HUN-PEC:
- No sample size information is available for the initial survey. For the middle survey
the MCC was not able to examine the reasons for non-attendance. As a result, all
ineligibles were included in the group of non-respondents.
- ISR-TEL:
- In the second stage of the sampling, apartments were selected. The apartments were
visited and the people living in them formed the sampling frame of the final stage. The
apartments whose tenants were absent at the time of the visit were excluded from the
sampling frame.
- ITA-LAT:
- Eligibility was restricted to Italian citizens. Persons with a problematic address
were also treated as ineligible.
- LTU-KAU:
- The initial and middle survey defined officers and their wives, as well as persons who
had changed addresses, as ineligible. A change of address qualifies a person as ineligible only if it is
known that the person moved permanently outside the target area. It is not clear whether
the same eligibility criteria were also used for the final survey and the event and
demographic data.
- NEZ-AUC:
- The initial survey declared Maoris and Polynesians as ineligible. It is not known
whether these individuals were also excluded from the final survey. As reported by the
MCC, the same exclusion rules did not apply to the event and demographic data. However,
local investigations have shown that this is not a major problem.
A related problem concerns the difference in target population between the various
components of the MONICA Project (see comments in Section 7).
- POL-TAR:
- The "not eligible" included some subjects who were not possible to contact,
but their ineligibility could not be confirmed. According to the MONICA definition, such
subjects are non-respondents. It is not clear whether the same definition was used for all
three surveys.
- RUS-NOC and RUS-NOI:
- For all three surveys, imprisoned persons (estimated to be < 1% of the population)
and those away from home for more than one year for occupational reasons were included in
the group of ineligibles. These individuals were not excluded from demographic data, but
prisoners are excluded from event registration.
- SWI-TIC and SWI-VAF:
- Severely ill or handicapped persons unable to attend the examination room were
classified as ineligibles. According to the MONICA definition, such persons are
non-respondents due to medical reasons.
- UNK-GLA:
- Before the sample was selected, the general practitioners (GP) excluded from their list
those whom they considered not suitable for the screening. Otherwise the definition is as
specified in the MONICA Manual. The category of people excluded from the GPs' lists could
be non-respondents (unable to attend for medical reasons). However, in this case they were
not sampled because they were not included in the sampling frame. The calculation of the
participation rate, therefore, may be biased by the absence of these individuals.
- USA-STA:
- Reasons for ineligibility included severe illness, language problems and "previously
surveyed". The two latter reasons involved about 2% each of the ineligibles during
the middle and final survey. This numerical information is not known for the initial
survey. Also, it is not known how the "language ineligibility" has been
considered in the event registration and demographic and mortality data.
One aspect that decreases the accuracy of the participation rate in many RUAs is the
uncertainty about the definitions used. Another is the inconsistency of the participation
rate information from different sources of data. We made two comparisons between the
following data sources: the serial number inventory data (individual data), the sample
size data reported by the MCC (aggregate data), and survey respondent and non-respondent
data received by MDC (individual data). The results are summarised in Table
3 and Table 4.
The MCCs had to provide three survey data sets at the individual level:
- Serial number inventory data (Form 05) for everybody selected for the original sample.
In the data each subject was assigned a status:
- Status 1 = respondent
- Status 2 = non-respondent
- Status 3 = ineligible.
- Survey core data (Form 04) for every respondent, i.e. for those who had status 1 in the
serial number inventory data.
- Non-respondent data (Form 08) for every non-respondent.
No individual data for the ineligibles had to be submitted other than the serial number
inventory data. Therefore, we do not know the age and sex distribution of the ineligibles.
The MDC had not received the serial number inventory data for two RUAs for the initial
survey (ITA-LAT, MLT-MLT) and one RUA for the final survey (RUS-NOCa).
Non-respondent data were missing for five RUAs of the initial survey (HUN-PEC, MLT-MLT,
NEZ-AUC, RUS-MOC and RUS-MOI), one RUA of the middle survey (GER-COT), and three RUAs of
the final survey (RUS-MOC, RUS-MOI, RUS-NOCa).
The following pairs of data sets should have equal numbers of records:
- inventory data with status 1 and survey core data for respondents;
- inventory data with status 2 and non-respondent data; and
- inventory data with status 3 and number of ineligibles.
The number of ineligibles is available at individual level only in the serial number
inventory data. Therefore, their consistency is compared with the aggregate number of
ineligibles reported by the MCCs.
The numbers of the three pairs are given in Table 3 for all
three surveys. In addition, the table lists a summary quality score for each individual
survey and a quality trend for all surveys combined.
The Individual data agreement score (IDA) is defined as:
| IDA | Criteria |
|---|---|
| 2 | No discrepancies within pairs of sources of information. |
| 1 | Small discrepancies within pairs; all three pairs differ by less than ±20. |
| 0 | Major discrepancies within pairs or unavailability of some of the data; any pair differs by more than ±20. |
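As an illustration, the IDA rules can be written as a short function. This is only a sketch under stated assumptions: the function name, the representation of each pair as two record counts and the example counts are hypothetical, and a difference of exactly 20 records, which the definition leaves open, is treated here as a major discrepancy.

```python
from typing import Optional, Sequence, Tuple

# One pair = (count from inventory data, count from the matching data set);
# None marks a data set that was not available.
Pair = Tuple[Optional[int], Optional[int]]


def ida_score(pairs: Sequence[Pair], tolerance: int = 20) -> int:
    """Return the Individual data agreement score (0, 1 or 2) for one survey."""
    diffs = []
    for inventory, other in pairs:
        # Unavailable data gives the lowest score.
        if inventory is None or other is None:
            return 0
        diffs.append(abs(inventory - other))

    if all(d == 0 for d in diffs):
        return 2  # no discrepancies in any pair
    if all(d < tolerance for d in diffs):
        return 1  # only small discrepancies
    return 0  # at least one major discrepancy (assumption: 20 counts as major)


# Hypothetical counts: respondents, non-respondents, ineligibles.
print(ida_score([(1000, 995), (200, 210), (50, 50)]))
```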
A Quality trend score (QTS) was assigned according to the following rules:
| QTS | Criteria |
|---|---|
| 2 | Improved or maintained high quality: IDA = 2 for the final survey AND there has been no decrease in IDA between successive surveys. |
| 1 | No quality fluctuations or no clear trend: QTS is neither 0 nor 2. |
| 0 | Deterioration or no improvement in poor quality: IDA = 0 in the final survey OR there has been a decrease between successive IDAs, but no increase. |
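The trend rules can be expressed compactly in code. The sketch below is illustrative only: the function name is hypothetical, and it assumes that the IDA scores of the surveys a RUA actually carried out are passed in chronological order, with the last entry taken as the final survey.

```python
from typing import Sequence


def quality_trend_score(ida: Sequence[int]) -> int:
    """Return the Quality trend score (0, 1 or 2) from successive IDA scores."""
    # Change in IDA between each pair of successive surveys.
    steps = [later - earlier for earlier, later in zip(ida, ida[1:])]
    decreased = any(s < 0 for s in steps)
    increased = any(s > 0 for s in steps)
    final = ida[-1]

    # QTS = 2: high quality at the end and never a decrease.
    if final == 2 and not decreased:
        return 2
    # QTS = 0: poor quality at the end, or a decrease with no increase.
    if final == 0 or (decreased and not increased):
        return 0
    # QTS = 1: everything else.
    return 1


# Hypothetical sequences of IDA scores (initial, middle, final).
for scores in ([0, 1, 2], [2, 1, 1], [2, 1, 2]):
    print(scores, quality_trend_score(scores))
```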
In the initial survey, 19/54 (35%) RUAs had score "2", 12/54 (22%) had score
"1" and 23/54 (43%) RUAs had score "0". In the middle survey 23/43
(53%) RUAs had score "2", 10/43 (23%) had score "1" and 10/43 (23%)
had score "0". In the final survey, the distribution of the scores was 21/41
(51%) for score "2", 11/41 (27%) for score "1", and 9/41 (22%) for
score "0". The result indicates that the data management for initial surveys in most
MCCs was not reliable, but it improved with the next two surveys. However, it is clearly
not satisfactory that in the final survey 9 RUAs still had a score of zero.
The trend score confirms that there was an improvement between surveys in 20 RUAs, but
13 RUAs deteriorated. The overall trend is poor in more than half of the RUAs, an
unsatisfactory situation.
We compared individual data and aggregate data that were reported by the MCCs using the
forms "Participation rate in the initial MONICA population survey" (Appendix 1a), "Sample size in the 2nd MONICA
survey" (Appendix 1b) and "Sample size in the final
MONICA survey" (Appendix 1c). Table
4 gives ratios where the aggregate data are in the numerator and the individual data
in the denominator. For "ineligible" the individual data are the serial number
inventory data and the denominator consists of the data in column "ineligible"
of Table 3. For "respondents" the denominator is the
survey core data (Form 04) and for "non-respondents" the non-respondent data
(Form 08). All ratios should be equal to 1.
The Individual and aggregate data agreement score (IAA) was defined for the
agreement between the two sources of data for respondents and non-respondents. The ratio
for "ineligibles" was not considered for the score, because the same data were
already evaluated in the score for serial number inventory. The criteria for the IAA score
are as follows:
| IAA | Criteria |
|---|---|
| 2 | Both ratios are equal to 1.00. |
| 1 | Both ratios are between 0.95 and 1.05 but at least one of them is different from 1.00. |
| 0 | At least one of the ratios is less than 0.95 or more than 1.05, or some of the data are missing. |
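A minimal sketch of these criteria follows. The function name and the example ratios are hypothetical, and it is assumed that ratios are compared after rounding to two decimals, since the criteria are stated to two decimals.

```python
from typing import Optional


def iaa_score(resp_ratio: Optional[float], nonresp_ratio: Optional[float]) -> int:
    """Return the Individual and aggregate data agreement score (0, 1 or 2).

    Each ratio is aggregate count / individual count; None means missing data.
    """
    ratios = (resp_ratio, nonresp_ratio)
    # Missing data gives the lowest score.
    if any(r is None for r in ratios):
        return 0

    # Compare at two-decimal precision (assumption).
    rounded = [round(r, 2) for r in ratios]

    if any(r < 0.95 or r > 1.05 for r in rounded):
        return 0  # at least one ratio outside 0.95 to 1.05
    if all(r == 1.00 for r in rounded):
        return 2  # perfect agreement for both data sets
    return 1  # both within range, at least one differs from 1.00


# Hypothetical ratios for respondents and non-respondents.
print(iaa_score(1.02, 0.97))
```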
The ratios for respondents are between 0.98 and 1.03 for most RUAs in the initial
survey, the exception being RUS-NOCb (1.33). For the middle survey, the ratios extend
from 0.94 to 1.02 with the exception of GER-BRE (0.92). For the final survey the ratios
lie between 0.98 and 1.02 with the exception of AUS-PER (1.05), FRA-STR (1.13), GER-BREa
(0.91) and GER-BREb (0.92). This indicates that, with a few exceptions, the management of
core data is reasonable. However, for non-respondents there are major shortcomings in most
RUAs, especially in the initial survey, where the ratios range from 0.66 to 13.5.
A quality trend score was defined in the same way as QTS in Section 6.1.
An improvement or maintenance of quality (trend score 2 or 1) was observed for 37/47 RUAs
(79%). For the remaining 10 RUAs (21%), the accounting for the sample
units got worse in later surveys.
Table 5.1 shows the participation rates for the RUAs in age
group 35-64. In accordance with the MONICA definition, these rates are the proportion of
eligibles that provided at least questionnaire information. Item-response rates,
especially for data items collected during clinic visits, may be lower and are discussed
in Section 8.
Two definitions for participation rates are employed:
- Definition A.
- The rate is calculated with the size of the eligible sample in the denominator and the
number of respondents (according to the technical
definition of the MONICA Manual, see Reference 2) in the
numerator.
- Definition B.
- The numerator is the same as for definition A. The denominator is defined as the number
of eligibles minus the number of non-respondents who could not be contacted.
This definition is introduced to provide for those MCCs where the sampling frame may have
contained a large number of foreign elements, whose status of eligibility could not be
determined. In these situations, a large difference exists between the rates according to
the two definitions. The relationship between participation rate according to definition A
(PRA) and definition B (PRB) is given by:
PRB = PRA/(1 - X(1-PRA)),
where X is the proportion of those not possible to contact (i.e. REASON=1) among all
non-respondent records. If X is missing, it is assumed to be 0 and PRB = PRA.
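The relationship between the two definitions can be checked numerically. The following is a minimal sketch only: the function name and the example counts are hypothetical and are not taken from any MONICA data set. It applies the formula above and treats a missing X as 0.

```python
from typing import Optional


def participation_rate_b(pra: float, x: Optional[float] = None) -> float:
    """Convert a definition-A rate (PRA) into a definition-B rate (PRB).

    pra: participation rate by definition A, as a fraction between 0 and 1.
    x:   proportion of non-respondent records with REASON=1 (not possible
         to contact); None stands for a missing value and is treated as 0.
    """
    if x is None:
        x = 0.0  # missing X: PRB equals PRA, as stated in the text
    return pra / (1.0 - x * (1.0 - pra))


# Hypothetical RUA (illustrative numbers only): 1000 eligibles,
# 600 respondents, and 212 of the 400 non-respondents not contactable.
pra = 600 / 1000
prb = participation_rate_b(pra, 212 / 400)
print(f"PRA = {pra:.3f}, PRB = {prb:.3f}")
```

With these illustrative counts the formula gives the same value as the direct calculation 600/(1000 - 212), which is what definition B prescribes.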
Three participation rates are determined, each based on a different data source:
- By aggregate data:
- The first participation rate is based on the aggregate data provided by the MCC.
- By individual data:
- The second participation rate is based on the individual level survey core data (Form
04) and non-respondent data (Form 08).
- To be reported:
- The third participation rate is the one to be used when participation rate is reported.
It is normally identical to the rate calculated by aggregate data, unless it is known that
the aggregate data are in error. In that case, the rates calculated by individual data
will be reported.
In some MCCs a minor discrepancy between the participation rates "by aggregate
data" and "by individual data" can be explained by a difference in the
calculation of age. The following explanations concern RUAs where an exception was made to
the general reporting rule, i.e. there was a major discrepancy between the participation
rates calculated from different data sources, or some other comment is needed:
- AUS-NEW
- Ini: The MCC does not know the original sample size or number of ineligibles and thus,
no aggregate data are available.
Mid: Note that a subgroup of the non-respondents was considered as ineligible (see comment
in Section 4).
- AUS-PER
- Mid: The large difference between the response rates "by aggregate data" and
"by individual data" in Table 5.1 is due to erroneous
coding of age for non-respondents.
- BEL-CHA and BEL-GHE
- All three surveys: The rates refer to persons who were interviewed at home and had
provided the questionnaire data. However, only a fraction of them attended the clinical
examination. Therefore, the item-response rates for blood pressure and total cholesterol
are significantly lower (see Table 7).
The date of examination was not available for the middle survey non-respondent data from
Ghent. When calculating the age group of the non-respondents, the date of examination was
estimated as the middle of the survey period.
The significantly lower participation rate "by individual data" for the middle
survey is a consequence of the discrepancy between ineligibles "by aggregate
data" and "by individual data" (see Table 3).
- BEL-LUX
- Ini: It appears that the MDC has also received non-respondent data (Form 08) for
those who are ineligible. Therefore, the rate "by individual data" is probably
too low, although the MCC has not confirmed this suspicion.
- CAN-HAL
- Ini: The MCC has communicated that the non-respondents may include a large number of
subjects who are ineligible for the sample. This is because the sampling frame is out of
date. The proportion of non-respondents who could not be contacted was 53% in the
original sample. A similar situation exists for the final survey, although to a lesser
extent.
- CHN-BEI
- Mid: The aggregate data are incorrect, since the number of non-respondents includes only
the fraction that was closely investigated.
- FRA-LIL
- Ini: There were an additional 45 non-respondents without known age. They were ignored for
the rate calculations, since they influenced the results only minimally.
- FRA-STR
- Ini: The sample was selected from age group 25-64. More accurate age information for the
subjects is available to the MONICA investigators only for those who could be contacted.
The following Table A, reported by the MCC, gives the classification
of the subjects in age group 25-64 (The numbers in brackets show the values computed from
the individual data. They are shown only when they are different from the values reported
by the MCC.):
Table A. Participation rate data for FRA-STR
| | Men | Women | Total |
| Respondents | 741 | 803 | 1544 |
| Non-respondents | 835 | 777 | 1612 |
| Non-respondents with age group known | 389 (390) | 407 (413) | 796 (803) |
| Ineligible | | | 258 |
| Participation rate A | 47.0% | 50.8% | 48.9% |
| Participation rate B | | | same as rate A |
- GER-BRE
- Ini: The sample was selected from age group 25-69, and the accurate age is known only
for those who were contacted. Therefore, the participation rate "by individual
data" is incorrect. The participation rate "by aggregate data" and "to
be reported" concerns age group 25-69.
- GER-RHN
- Ini: The MDC has not received an explanation for the large discrepancy between participation
rate "by aggregate data" and "by individual data". The rate
"by individual data" is to be reported, because the MCC had updated the individual
non-respondent data but not the aggregate data.
- HUN-BUD
- Ini: The MCC has confirmed that the number of non-respondents reported (400 for all
ages) is correct. There are some reservations about the rates; non-respondent data are
available for a surprisingly high 95% of the non-respondents (see Table
9). The way the high item-response rate for non-respondents was achieved is unknown.
- HUN-PEC
- Ini: The MCC has no information about the original sample size or the eligible sample
size. Therefore, it is not possible to give any estimate of the participation rate.
Mid: The MCC is unable to separate the non-respondents and the ineligibles. Therefore, all
that did not attend the examination were classified as non-respondents.
- ISR-TEL
- Ini: The MDC has contradictory information. According to the Sample Selection
Description the sample selection and interview took place at the same time. Only those
who were at home at the time of the visit were selected to the sample. However, according
to the non-respondent data 82% of the non-respondents were in the category "not
possible to contact".
- ITA-BRI
- Mid: The high item-response rate of 90% (Table 9) was achieved
through an extraordinary effort to reach non-respondents via telephone interviews.
- LTU-KAU
- Fin: 55% of the non-respondents were not possible to contact. The sample was selected
two years before the examination because the survey was postponed after the sample
selection. Furthermore, before the final survey the country underwent major changes, and
during that time no reliable information on migration was available.
- MLT-MLT
- Serial number inventory data and non-respondent data are missing.
- NEZ-AUC
- Ini: The MDC has no non-respondent data for the RUA.
From the sampling frame it is only known whether a person is older than 18 years.
Therefore, the age becomes known only when contact is made with the person. The MCC
estimated the sample size by multiplying the total number of letters sent by the
proportion of the population 18 years and over who were aged 35-64 years.
Note also that the target population for the initial survey covers only about 80% of the
population for other data components (see Section 1). The MCC
might consider restricting all data components to the same common target area, or dividing
the target population into two RUs, with one of them being the area of the initial survey.
- POL-TAR
- Ini: The group of ineligibles includes some potential non-respondents (see comments in Section 4). Therefore, values for the participation rate are slightly
inflated. However, the effect is small (designating all ineligibles as non-respondents
would lower the participation rate by about 5%).
- ROM-BUC
- Ini: It appears that non-respondent data (Form 08) have been received by MDC only for
part of the non-respondents. Therefore, the rate "by individual data" is
probably too high, although the MCC has not confirmed this suspicion.
- RUS-MOC and RUS-MOI
- Ini: The MCC is unable to provide the non-respondent data for RUS-MOI and RUS-MOC for
organizational reasons.
Fin: The MCC could not provide non-response data for RUS-MOC and RUS-MOIa because of lack
of resources.
- RUS-NOC
- Ini: Aggregate data are missing for RUS-NOCa (RUS-NOCa was created after the initial
survey by sub-dividing RUS-NOCb).
- SWI-TIC and SWI-VAF
- Ini and Mid: A subgroup of the non-respondents was considered as ineligible (see comment
in Section 4). The participation rates may therefore be slightly
inflated.
- UNK-BEL
- Ini: The response rate "by individual data" will be reported, since the
aggregate data have not been corrected for the status of some ineligibles that should be
respondents.
Mid: The MCC has indicated that the non-respondents may include a large number of subjects
who are ineligible for the sample. This is because the sampling frame is out of date.
- UNK-GLA
- A subgroup of potential non-respondents was excluded from the sampling frame (see
comment in Section 4). The participation rates may therefore be
slightly inflated.
- USA-STA
- Ini: The participation rates are slightly inflated (by about 2%), since households that
could not be contacted have been included in the ineligibles.
Mid and Fin: The individual age was not known for 157 non-respondents in the middle survey
and 83 non-respondents in the final survey. Therefore, as an exception, the participation
rates were calculated for the age group 25-64.
All three surveys: Also, some subjects were considered as ineligible because of language
problems (see comments in Section 4). As far as could be determined,
their effect on participation rates was minor.
- YUG-NOS
- Ini: Some individuals who completed the survey questionnaire but did not attend the
clinic were considered as non-respondents (should be respondents). This causes a minor
depression of the participation rates.
Mid: Surprisingly, there were no ineligibles during the middle survey, whereas the other
two surveys had ineligibles. No explanation for this difference is available.
Table 5.2 summarises the change in participation rates
between the different surveys. The change is always calculated as the rate for the later
survey minus the rate for the earlier survey. Thus, negative values indicate a decrease in
participation with time. The average change between initial and middle survey was -2.15
percentage points, between middle and final -2.14, and between initial and final -4.08. These numbers
confirm a trend of decreasing participation rates that has also been observed in other
surveys.
Table B summarises the changes in the participation rates
"to be reported/Definition A" between initial and middle, middle and final, as
well as between initial and final.
Table B. Summary of the changes in the participation rate A "to be reported" between surveys
| | Number of RUAs in categories of change in response |
| | <-20 | -20...-10 | -10...0 | 0...10 | 10...20 |
| Mid-Ini | 0 | 3 | 21 | 16 | 0 |
| Fin-Mid | 0 | 2 | 18 | 17 | 0 |
| Fin-Ini | 1 | 5 | 20 | 12 | 1 |
Tables 6.1 and 6.2 give the age-specific
participation rates for men and women, respectively. The rates are based on the
"to be reported" selection and calculated for 10-year age groups for each of the
three surveys. The data show that there are differences in participation rates between age
groups and sexes, but the patterns differ between RUAs. The only pattern that holds for
most of the RUAs is that the subjects of the youngest age group (25-34 years) are the most
reluctant to participate, and there are several exceptions even to this rule. (The clearest
examples are ISR-TEL, RUS-NOI and FRA-LIL in the initial survey, CZE-CZE and RUS-NOI in
the middle survey, and RUS-NOI, BEL-CHA, GER-BRE and YUG-NOS in the final survey.)
The participation rate, as discussed in Section 7, defines the rate
with which contact with a consenting participant was established. Successful contact with
a respondent does not imply that all data items stipulated by the MONICA protocol can be
collected. Therefore, the response rate for specific data items may be considerably lower
than the rates discussed in Section 7. A typical scenario that might
lead to reduced item-response rates for cholesterol measurement would be where the
respondent completes the questionnaire part of the data but refuses to attend a separate
clinic session for physiological data collection.
In Table 7, we compare the participation rate with the
item-response rates for blood pressure, cholesterol, BMI, and smoking for each RUA and its
respective surveys. Tables 8.1a (systolic blood pressure, men),
8.1b (systolic blood pressure, women), 8.2a
(total cholesterol, men), 8.2b (total cholesterol, women),
8.3a (BMI, men), 8.3b (BMI, women),
8.4a (smoking, men) and 8.4b (smoking,
women) provide this information in a sex- and age-group-specific format. The
item-response rates were calculated as a product of the overall participation rate and the
proportion of respondent records for which data for the specific variable were available.
In general, the item-response rates are quite close to the participation rates. The
exceptions are BEL-CHA (all surveys), BEL-GHE (all surveys), CAN-HAL (Fin), ISR-TEL (Ini),
SWE-GOT (Fin), and USA-STA (Ini and Fin). As far as it is known, the above scenario
applies to all of these RUAs. This assumption is also supported by the item-response rate
for smoking. Since smoking data were collected at the time of the interview, item-response
rate and participation rate are very close.
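The item-response calculation described above (overall participation rate multiplied by the proportion of respondent records that contain the item) can be sketched as follows. The function name and all numbers are hypothetical and serve only to illustrate the scenario in which some respondents complete the questionnaire but skip the clinic visit.

```python
def item_response_rate(participation_rate: float,
                       records_with_item: int,
                       respondent_records: int) -> float:
    """Participation rate times the share of respondent records holding the item."""
    if respondent_records <= 0:
        raise ValueError("respondent_records must be positive")
    return participation_rate * records_with_item / respondent_records


# Hypothetical RUA: 70% participation; 1000 respondent records, of which
# 990 carry smoking data but only 800 carry a cholesterol value.
smoking = item_response_rate(0.70, 990, 1000)
cholesterol = item_response_rate(0.70, 800, 1000)
print(f"smoking {smoking:.1%}, cholesterol {cholesterol:.1%}")
```

In this illustration the smoking rate stays close to the participation rate, while the cholesterol rate drops well below it.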
There is little difference between the item-response rates for the four variables,
with a few exceptions. Local idiosyncrasies in data collection probably explain
why, in ISR-TEL, the item-response rates are low only for blood pressure and
cholesterol.
The MONICA Manual states:
"Even though it is not possible to get complete data required for the core study
for the non-respondents, the MCC should try to collect information about their age, sex,
marital status, education, smoking history and blood pressure. The objective is to
estimate the selection bias which non-response inflicts on the core study. Age and sex are
often known at the time of sample selection. For other data, a telephone interview or a
postal questionnaire can be tried. It is recommended that all non-respondents are asked to
provide information. However, it is acceptable that only a random sample of
non-respondents is investigated in full. The reason for non-response should be recorded in
all cases."
"The survey non-respondent data (Form 08) should be submitted to the MDC for every
non-respondent, regardless of how much information was received from the non-respondent.
In most cases the MCC should at least elicit the age group, sex and reason for
non-response."
The instructions for collecting non-respondent data in MONICA were introduced in 1985.
Therefore, it is understandable that some of the MCCs, which did the initial survey
earlier, have not collected the data as required. However, it is difficult to accept a
situation where the MCC is not even able to list the non-respondents, which points to
poor survey management.
Tables 9.1 and 9.2 show for each
RUA and the initial, middle and final survey the availability of the non-respondent data
items as a percentage of submitted non-respondent records. In column 3 of Table 9.1, the correspondence between aggregate and individual
non-respondent counts is categorized as follows:
- SAME: This applies for RUAs where the denominator of the proportions probably represents
the true number of non-respondents.
- MORE: In this category the denominator probably includes also cases that are ineligible
for the sample. Therefore, the availability of the data may actually be better than
indicated in the table.
- LESS: For these RUAs we suspect that non-respondent data have not been submitted for
cases where few data are available. Therefore, the true proportions may be smaller than
indicated in the table.
- SAMPLE: In these RUAs only a sample of the non-respondents was investigated. Therefore,
a small proportion may actually indicate a very large availability of data in the sample.
The sampling procedures in these populations were:
- BEL-LUX
- Initial survey: An attempt was made to investigate thoroughly a non-random sample of 30%
of the non-respondents. The sampling criterion is not known. MDC probably also has
non-respondent records for the ineligibles.
- CHN-BEI
- Initial survey: An attempt was made to investigate thoroughly a non-random sample of 50%
of the non-respondents.
Mid: An attempt was made to investigate thoroughly a non-random sample of 40% of the
non-respondents.
- HUN-BUD
- Initial survey: A random sample of unknown size was attempted for thorough
investigation. This, however, is in conflict with the fact that the MDC has a record for
every non-respondent and detailed information is known for about 95% of them.
- ITA-BRI
- Initial survey: A thorough investigation was attempted for a non-random sample of about
20% of the non-respondents. The sampling criteria are unclear, and Table 9 reveals that
detailed data are available for 26% of the non-respondents.
- POL-WAR
- Initial survey: A random sample of 20% of the non-respondents was attempted for thorough
investigation.
- ROM-BUC
- Initial survey: There is no information about the sampling, but the MDC has a
non-respondent record for only about 25% of the non-respondents. Therefore, the
proportions shown in Table 9 are over-estimates.
- RUS-NOC and RUS-NOI
- Final survey: A non-random sample of about 25% of non-respondents with equal number of
individuals in each 10-year age group was investigated thoroughly by home visits.
One may assume that the complete absence of data for the items Marital Status
to Weight in Table 9.2 is an indication of
systematic neglect of the investigation of non-respondents. This then implies that in 21/54
(39%) RUAs in the initial survey, in 10/43 (23%) RUAs in the middle survey, and in 7/41
(17%) RUAs in the final survey, no attempt was made to investigate the non-respondents in
more detail. In most of the cases where such an attempt was made, detailed information is
available for fewer than 50% of the non-respondents. Therefore, the utility of the
non-respondent data is very limited for many RUAs.
The reason for non-response may not be available from the RUAs where the initial survey
was started before this data item was introduced in 1985. However, the reason for
non-response should be available for every non-respondent of every RUA in the middle and
final surveys. Table 9.1 indicates that there is only a slight
improvement in the availability of this data item between surveys. In the initial survey
the reason for non-response was available for more than 80% of non-respondents in 21/54 (39%)
RUAs. In the middle survey the number was 19/43 (44%), and in the final survey it was
21/41 (51%).
Table 10 gives the proportions of the different reasons for
non-response for the initial, middle, and final survey respectively. Among the RUAs where
the reason is known in more than 80% of the records, the main reasons for non-response were
"Not possible to contact", "Not interested"
and "Other refusal", but the proportions varied remarkably between
the RUAs. It is possible that the difference between "Not interested"
and "Other refusal" has been interpreted differently between the
MCCs, and even between surveys within some RUAs. "Temporarily out of
the area" and "Medical reasons" were rare reasons in
nearly all RUAs. This scenario did not change much from survey to survey.
When repeated surveys are conducted in random samples of the same population, as
happens in MONICA, the sample mean values of the measurements can change for many reasons:
- There is a change in the population mean values, either because of changes in the
individual persons' values or because of a change in the composition of the population;
- There is a statistical error because we are measuring only a sample of the population
and many of the persons' values (like blood pressure and cholesterol) have short term
fluctuations;
- There is a change in the representation of the respondent sample, either because of a
bias in the sampling frame or because of non-response bias; or
- There is a measurement bias.
The objective of the MONICA Project is to estimate the changes due to the first reason.
The standard error of the estimates gives information about the statistical error. The
third and fourth reasons indicate bias, which we want to avoid. The fourth reason is
investigated in the quality assessment reports of the individual risk factors, whereas the
third reason is a topic of the current report.
Estimates of population changes in situations where there should be very little or no
change in the actual population mean values can be used as indicators of possible bias in
the estimates. Under the assumption that the MONICA participants aged 35 or older have
essentially concluded their formal education, a birth cohort (i.e. people born in certain
years) should show no change in the number of years of schooling from survey to survey,
except for randomness caused by sample selection. Similarly, birth cohorts in the age
range that is of interest to MONICA should not increase in mean body height, but instead
show a small decline as they age. Finally, birth cohorts are unlikely to show a change in
the proportion of never-smokers, unless there is a significant selective mortality of
smokers. These cohort trends have been investigated in detail in the quality assessment
reports (QA) on education (4), weight and height (5) and smoking (6). In each of the three
quality assessment reports, Cohort Trend Scores (CTS) were defined. The
scores were based on the estimated changes and their standard errors for men and women in
the common age groups 35-44 and 45-54, in three steps:
- The average change (A) was calculated for each sex/birth cohort. This average change was
used as the reference value around which the random variation
was expected to occur.
- Upper and lower limits were set for a change as A ± 2.5 SE, where SE is the standard
error of the estimated change. If each change is independently normally distributed with
mean A and variance SE², then the probability that all four changes (two birth cohorts and
two sexes) are within the limits is about 95%.
- The Cohort Trend Score (CTS) was defined as:
| CTS = | 2 | if all four changes are within limits; |
| | 1 | if one of the four changes is out of limits; |
| | 0 | if at least two of the four changes are out of limits. |
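The three steps above can be sketched in code. This is an illustrative reading only, not the procedure used in the quality assessment reports: the function names are hypothetical, the reference value A is assumed to be supplied separately for each of the four sex/birth-cohort groups, and the 95% figure is checked under the added assumption that the four changes are independent.

```python
import math
from typing import Sequence


def cohort_trend_score(changes: Sequence[float],
                       references: Sequence[float],
                       standard_errors: Sequence[float],
                       k: float = 2.5) -> int:
    """Return 2, 1 or 0 from the number of changes outside A +/- k*SE."""
    out_of_limits = sum(
        1
        for change, a, se in zip(changes, references, standard_errors)
        if abs(change - a) > k * se
    )
    if out_of_limits == 0:
        return 2
    if out_of_limits == 1:
        return 1
    return 0


def prob_all_within(k: float = 2.5, groups: int = 4) -> float:
    """P(all `groups` independent normal changes lie within A +/- k*SE)."""
    p_single = math.erf(k / math.sqrt(2.0))  # P(|Z| < k) for a standard normal
    return p_single ** groups


# Hypothetical changes in mean height (cm) for the four sex/cohort groups.
changes = [-0.4, -0.1, 0.9, -0.3]
references = [-0.3, -0.3, -0.3, -0.3]   # assumed reference values A
standard_errors = [0.3, 0.3, 0.3, 0.3]  # assumed standard errors
print(cohort_trend_score(changes, references, standard_errors))
print(round(prob_all_within(), 3))
```

In this made-up example only the third change lies outside its limits, so the score is 1. The second function shows where the 95% figure comes from: a single change falls within ± 2.5 SE with probability of roughly 0.988, and the fourth power of that value is close to 0.95.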
The CTS for body height is derived in Table 6 of the
Weight and height QA (5). Similarly, CTS for years of
schooling can be found in Table 8 of the Education QA (4) and the CTS for never-smokers in Table 18 of the Smoking QA (6). The three CTS for each RUA are collected in Table 11 of this report to provide a comprehensive picture of birth
cohort trends.
It is often difficult to assess from the data whether the changes in the cohort trends
are due to measurement bias, change in the target population or change in sample
representation. Therefore, the MCCs with low CTS were asked to check possible reasons for
the cohort changes in the quality assessment reports for the three individual variables. A
change in the target population or in the survey representation is likely to induce
changes in the cohorts for more than one variable. A measurement bias in several variables
is also possible but less likely. Hence, we will confine our comments here to RUAs where
more than one variable has a low CTS for a particular survey pair:
- BEL-CHA
- Between the initial and the final survey there was a large increase in body height and a
decrease in years of schooling. Neither of these changes is plausible, suggesting a change
either in the target population or in the survey participation. Furthermore, because
height and years of schooling are usually positively correlated, their movement in
opposite directions strengthens the possibility of measurement bias as an explanation.
- CHN-BEI
- A major change appears to have occurred between the initial and the middle survey. All
three variables have low CTS, which is due to a decrease in all three variables. The
observed changes provide strong evidence of changes in either the target population or the
survey participation.
- FRA-STR
- All three variables have low CTS for the Ini-Fin survey pair. The low scores for body
height and never-smokers are due to an increase in these variables, which is incompatible
with stable target population or survey participation. The low score for years of
schooling is due to an increase in years of schooling. The MCC reported that the Census
also found increased schooling between 1982 and 1990. Nevertheless, the simultaneous
occurrence of three low CTS rather suggests a change in either target population or survey
participation.
- RUS-MOI
- For the Ini-Fin survey pair, years-of-schooling has a low CTS because of a large
increase in this variable. An increase in the proportion of never-smokers also produced a
low CTS. The increase in the years of schooling is plausible, but the increase in the
never-smokers is not. The combination of the two scores thus is more likely to reflect a
change in target population or survey participation.
- RUS-NOI
- For the initial and final survey comparison a large decrease in body height and a large
increase in years of schooling caused low CTS for both variables. These changes warrant a
careful investigation of the stability of the target population or participant
characteristics.
- SPA-CAT
- More than one low CTS occurred only between the middle and final surveys. This was the
result of an increase in body height and a decrease in years of schooling. Furthermore,
years-of-schooling showed progressive decrease with every survey, producing CTS of zero
for all three comparisons. According to the MCC, the decrease in years of schooling is the
consequence of a change in the target population as a result of immigration.
- UNK-BEL
- Body height and years-of-schooling produced low CTS for the Mid-Fin and Ini-Fin
comparison. The low scores for both comparisons were the result of increases in the
respective variables. While the increase in years of schooling is plausible, the increase
in body height is not. The combination of the two observations, therefore, suggests that
in the final survey a change has occurred in either the target population or the survey
participation.
It should be pointed out that a high CTS is not proof of unbiased survey
participation. However, it makes it less likely that target population or participant
characteristics have changed between surveys. As mentioned above, a change in the
characteristics of the target population is fully compatible with the objectives of the
MONICA Project and not a source of bias. (Nevertheless, if caused by large migration, it
is often associated with difficulties in obtaining a representative sampling frame and
accurate population estimates.) A change in the characteristics of the survey
participation, however, is a source of bias in the estimation of changes in the risk
factors in the population.
A sample is said to be self-weighting if every member of the population has
an equal probability of being selected for the sample. A simple mean calculated from a
self-weighting sample gives an unbiased estimate of the population mean value. If the
sample is not self-weighting, a weighted mean of the sample will be needed to get unbiased
estimates of the population mean value. Therefore, the data analysis will be simpler for a
self-weighting sample than for a sample that is not self-weighting.
Typical situations, which may lead to unequal sampling probabilities of the subjects,
are stratified sampling and multi-stage sampling. Most of the MONICA samples were
stratified by 10-year age group, but this is not a problem because the age or age-group
will nearly always be taken into account in the data analysis anyway. Within the age
groups, nearly all, if not all, of the MONICA samples were designed such that the subjects
have approximately equal sampling probabilities. There is, however, one situation where
this is not necessarily the case. The sample sizes in the different Reporting Units (RU)
of the RUAs may have been chosen to be equal even if the population sizes are not equal,
or some other criteria may have been used for the sample sizes. Therefore, it is of
particular interest to check the sampling probabilities of the subjects of the RUs of the
RUAs that consist of more than one RU. There are nine such RUAs: AUS-NEWa, CZE-CZEa,
GER-EGEb, ICE-ICEa, ITA-FRIa, POL-TARa, POL-WARa, SWE-NSWa and USA-STAa.
A simple measure of the sampling probability is the sampling fraction,
defined as the proportion of the sample in the population. The sampling fraction of the
RUs of the nine RUAs by sex and 10-year age group is given in Table
12. The denominator for calculating sampling fractions was the population size of the
age/sex group of the RU in the calendar year of the middle of the survey (or the nearest
year for which the data were available). The eligible sample size from the aggregate data
was the numerator.
No weighting of the subjects will be needed if the sampling fractions are approximately
equal in the RUs within each age/sex group within each survey. According to Table 12 this seems to be the case for most of the RUAs. The
exceptions are:
- ICE-ICE:
- The sampling fraction of RU2 (Arnes County) is consistently nearly 10 times the sampling
fraction of RU1 (Reykjavik). The reason is that the sample size was the same in Arnes
County as in Reykjavik although Reykjavik is nearly ten times as big as Arnes County.
Arnes County was included in the survey as a representative of the rest of the country
(except Reykjavik). A third of Iceland's population lives in Reykjavik. Unweighted sample
mean values would give a very high relative weight to RU2 compared with RU1, but the
relative weight would be similar for each of the three surveys. As a consequence,
unweighted estimates of trends in the mean values will represent the trend of the RUA as a
whole only if the trends in the different RUs are similar.
- USA-STA:
- The sampling fractions vary between about 1 and 3 between the populations but remain
consistent between the three surveys. Unweighted sample means would give much higher
relative weight to some reporting units compared with others, but the relative weights
would be similar for each of the three surveys. As a consequence, unweighted estimates of
trends in the mean values will represent the trend of the RUA as a whole only if the
trends in the different RUs are similar.
- GER-EGEb:
- The sampling fractions of RU19 are twice the sampling fractions of RU17 in the initial
survey but the sampling fractions are equal in the final survey. Unweighted sample mean
values would give equal relative weights to the two RUs in the final survey but not in the
initial or the middle survey. As a consequence, if there is a major difference in the risk
factor levels between the two RUs, unweighted estimates of trends in the mean values will
not represent the trend of the RUA as a whole even if the trends in the different RUs
were similar.
Weighting the RUs in the analysis is a solution to these problems, but will complicate all
analyses considerably. If we can assume that each of these three RUAs is relatively
homogeneous, no weighting will be needed.
It is not very likely that unweighted analysis will have a major impact in the cases of
ICE-ICE and USA-STA. For GER-EGE the situation seems more complex. To see the potential
effect of neglecting the weighting in GER-EGE, Table C gives an example for risk factor
differences which are probably much larger than those actually observed for the reporting
units.
Table C. Influence of neglecting weighting when estimating change in risk factor mean value in a hypothetical situation
| Variable | RU | Sampling fraction, Ini | Sampling fraction, Fin | Proportion or mean value, Ini | Proportion or mean value, Fin | Estimate of change, weighted | Estimate of change, unweighted |
| Smoking | A | 1 | 2 | 30 % | 30 % | 0 % | -1.7 % |
| | B | 2 | 2 | 40 % | 40 % | | |
| Systolic blood pressure | A | 1 | 2 | 130 mmHg | 130 mmHg | 0 mmHg | -1.7 mmHg |
| | B | 2 | 2 | 140 mmHg | 140 mmHg | | |
| Total cholesterol | A | 1 | 2 | 5.4 mmol/l | 5.4 mmol/l | 0 mmol/l | -0.1 mmol/l |
| | B | 2 | 2 | 6.0 mmol/l | 6.0 mmol/l | | |
The biases given in the rightmost column of Table C are relatively small compared with
the achievable measurement accuracy of the risk factors. As the differences in the risk
factor levels between the RUs in Table C are probably also much larger than the real
differences between the RUs of GER-EGE, one should feel quite comfortable in estimating
the trend without weighting even for GER-EGE.
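The arithmetic behind Table C can be sketched as follows. This is a minimal illustration, not the procedure used by the MDC. It assumes that RUs A and B have equal population sizes (Table C does not state them) and that the number of respondents in each RU is proportional to its sampling fraction times its population. The function names are hypothetical.

```python
def pooled_mean(means, weights):
    """Weighted average of RU means with the given relative weights."""
    total = sum(weights)
    return sum(m * w for m, w in zip(means, weights)) / total


def change_estimates(means_ini, means_fin, frac_ini, frac_fin, population):
    """Return (weighted, unweighted) estimates of change between two surveys.

    Weighted: each RU counts in proportion to its population.
    Unweighted: each RU counts in proportion to its sample size, which is
    assumed here to be sampling fraction x population.
    """
    weighted = pooled_mean(means_fin, population) - pooled_mean(means_ini, population)
    n_ini = [f * p for f, p in zip(frac_ini, population)]
    n_fin = [f * p for f, p in zip(frac_fin, population)]
    unweighted = pooled_mean(means_fin, n_fin) - pooled_mean(means_ini, n_ini)
    return weighted, unweighted


# ASSUMPTION: equal population sizes for RU A and RU B (not given in Table C).
population = [1, 1]
frac_ini, frac_fin = [1, 2], [2, 2]

table_c = {
    "Smoking (%)": [30, 40],
    "Systolic blood pressure (mmHg)": [130, 140],
    "Total cholesterol (mmol/l)": [5.4, 6.0],
}

for name, means in table_c.items():
    # RU means are identical in both surveys, so the true change is zero.
    weighted, unweighted = change_estimates(means, means, frac_ini, frac_fin, population)
    print(f"{name}: weighted {weighted:+.1f}, unweighted {unweighted:+.1f}")
```

Under these assumptions the unweighted pooled mean shifts only because the relative weight of RU B drops from two thirds to one half, which gives the spurious changes shown in the rightmost column of Table C.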
This report was an attempt to summarise the situation concerning the quality of
sampling frames and survey non-respondents in the MONICA risk factor surveys. We can draw
some general conclusions from the findings.
There were big differences in the availability of good sampling frames among the RUAs.
Only about 40% of the sampling frames were found to be of good quality, and 25% were
clearly of poor quality. There still exists a lot of uncertainty about the specific
properties of the sampling frames in the various RUAs. For example, it is not clear
whether all population registers can be considered equal in terms of being up-to-date.
Also, there appear to be significant differences in the quality of electoral rolls. We
therefore feel that our use of the proportion of ineligibles and the proportion
of not located for quality assessment was probably the best choice under the
circumstances.
Many MCCs did not report the exact definition of eligibility used in their initial
survey. In the middle and final surveys, most MCCs defined eligibility to the survey
according to the instructions given in the MONICA Manual. However, such assertions are
often suspect since they conflict with other data available at the MDC. If the definition
of eligibility changes, the comparison of response rates between different surveys becomes
difficult. In RUAs where the proportion of ineligibles is high, there is a great risk of
bias if many ineligibles were so classified by mistake and if they were exceptional with
respect to the risk factors for cardiovascular diseases. Such biases could have a
noticeable influence on the estimates of risk factor trends. We suggest that data
weighting should be considered for RUAs where eligibility changed or where major
discrepancies between individual and aggregate data (Table 4) exist.
If the response rate exceeds 80%, we can be quite confident in applying the results of
the survey to the whole RUA, provided that the quality of the data is otherwise good. Only
one third of the RUAs in the initial survey, one fifth of the RUAs in the middle survey,
and one sixth of the RUAs in the final survey reached that level. The 70% limit, which we
might still consider satisfactory, was exceeded by two thirds of the RUAs in the initial
survey, three-quarters in the middle survey, and two thirds again in the final survey. The
fact that a noticeable number of RUAs remained below 70% and the extreme ones even below
50% is a concern. It may be erroneous to assume that the risk factor changes, which are
observed in these surveys, reflect the situation in the population.
The data available for this report do not provide hard evidence for the
representativeness of the respondents. The crucial problem is the low availability of
information about the non-respondents, a problem shared with most other surveys. Even if
more data were available, they would have to be treated with caution, in comparison to the
data about the respondents. The conclusions one could draw from the non-respondent data
would hardly be more than qualitative. Nevertheless, such data would help in understanding
the full risk factor profile and trends in the population. Perhaps the next step for
investigating the non-respondents in more detail has to be taken locally by the MCCs. The
MCCs have the best knowledge of the available sampling frames, the procedures used to
achieve high response rates, the procedures used to investigate the non-respondents, and
possibly other information which helps to characterise the non-respondents.
Suggestions for conducting such investigations are:
- If the level of education of the target population is available from census data, it can
be compared with the level of education in the MONICA sample. Note, however, that the
questions for establishing education levels have to be comparable between census and
MONICA surveys.
- Evaluate the changes for "Reason of non-response" between surveys.
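The first suggestion can be sketched as a simple comparison of category shares. The counts below are invented placeholders, not MONICA or census data, and the category labels and function names are hypothetical. The sketch shows only the bookkeeping and assumes the education categories have already been made comparable between the two sources.

```python
def shares(counts):
    """Convert category counts into proportions that sum to 1."""
    total = sum(counts.values())
    return {category: n / total for category, n in counts.items()}


def share_differences(sample_counts, census_counts):
    """Sample share minus census share for each education category.

    Positive values mean the category is over-represented among respondents.
    """
    if set(sample_counts) != set(census_counts):
        raise ValueError("education categories must match between sources")
    sample, census = shares(sample_counts), shares(census_counts)
    return {category: sample[category] - census[category] for category in census}


# HYPOTHETICAL counts, for illustration only.
census = {"primary": 5000, "secondary": 3500, "tertiary": 1500}
respondents = {"primary": 400, "secondary": 400, "tertiary": 200}

for category, diff in share_differences(respondents, census).items():
    print(f"{category}: {diff * 100:+.1f} percentage points")
```

A consistent surplus of the higher education categories among respondents would point towards a participation bias, although such a comparison remains qualitative.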
The following list includes only the RUAs with specific findings or exceptional
background information relevant for the use of the data.
AUS-NEW
- Conflicting information on survey eligibility (Sample Size of Final MONICA Survey:
all surveys used MONICA Manual definitions; Sample Size of 2nd MONICA Survey:
mentally handicapped were excluded).
AUS-PER
- Erroneous coding of age for non-respondents limits the use of non-respondent data
of the middle survey.
BEL-CHA
- The participation rate is much higher than the item-response rates for blood pressure
and total cholesterol. This is because the respondents were interviewed at home and many
of them never came to the clinical examination. The MCC has found that the respondents,
who answered the questionnaire only and did not attend the clinic, had a significantly
higher prevalence of smoking, but not of antecedents of hypercholesterolaemia.
- Cohort analysis shows large changes in body height and years of schooling between
initial and final survey, suggesting measurement and/or participation bias.
BEL-GHE
- The participation rate is much higher than the item-response rates for blood pressure
and total cholesterol. This is because the respondents were interviewed at home and many
of them never came to the clinical examination. No estimates are available on the possible
bias between respondents who answered questionnaires only and those who also visited the
clinic.
BEL-LUX
- Survey excluded persons on electoral list but living elsewhere. The frequency of this is
unknown.
CAN-HAL
- Prisoners were excluded from the survey and event data, but not the demographic data.
FRA-LIL
- The increase in the estimated schooling level between initial and final survey in women
is probably true, since a cohort analysis of census data yields similar results.
- The implications of the change in the sampling frame between the initial and final
survey are unknown.
FRA-STR
- Foreigners are excluded from the final survey, but not from the event and demographic
data. They represent about 4.5% of the population.
- The simultaneous significant cohort trends in body height, proportion of
never-smokers, and education levels are very suggestive of a response bias, although the
census data support a possible increase in education levels.
- The implications of the change in the sampling frame between the initial and final
survey are unknown.
FRA-TOU
- Persons in prison and foreigners were included in the category of ineligibles. In the
Sample Selection Description the MCC describes a scheme of augmenting the sampling frame
by lists of foreigners obtained from various consulates. It is uncertain whether this
scheme was ever implemented and whether it was carried through to the middle and final
survey. It is also not clear how big a problem the omission of foreigners is, i.e. how
they impact on event and demographic data.
- The increase in the estimated schooling level between initial and final survey in men
may represent a response bias.
GER-AUR and GER-AUU
- The surveys excluded non-Germans (about 2% of population >50), but this group is not
excluded from event and demographic data.
GER-BER and GER-COT and GER-EGE and GER-HAC and GER-KMS and GER-RDM
- The National X-ray register was used for the initial survey and was regularly updated
from the population register. For all practical purposes, the two were identical. For the
middle and final survey, the population register was used.
- The increased level of schooling could be due to a larger participation of women in the
"Open University". However, only anecdotal data are available to support such an
explanation.
GER-BRE
- Participation rates apply to the age group 25-69.
- German citizenship was an eligibility requirement for the surveys. Such an exclusion
criterion was not applied to the event and demographic data.
GER-RHN
- The large difference between participation rate by aggregate and individual data is
unexplained. Therefore, the response rate reported may be incorrect.
HUN-BUD
- There are some reservations about the rates; non-respondent data are available for a
surprisingly high 95% of the non-respondents (see Table 9). How the high item response
rate for non-respondents was achieved is unknown.
ISR-TEL
- According to the Sample Selection Description the sample selection and interview took
place at the same time. Only those who were at home at the time of the visit were selected
to the sample. This is in contradiction with the non-respondent data where 82% of the
non-respondents were in the category "not possible to contact".
ITA-BRI
- The item response rate for the non-respondent data in the middle survey is more than
90%. The MCC explains this by a very intensive effort to contact the non-respondents by
telephone.
ITA-FRI
- According to the MCC, there are no ineligibles in the final survey because the sample
was selected just before the survey from the computerized Regional Health Rolls, which are
being updated continuously.
- The minor discrepancy in the final survey between the number of non-respondents in the
aggregate data and in the individual data (see Tables 4 and 9.1) has been clarified by the MCC after the MDC database was
closed for the final analyses of the MONICA Project: The individual data are correct. The
only significant implication of the discrepancy is code "0" instead of code
"2" for the "Trend in Quality" in Table 4.
ITA-LAT
- Individuals with address problems were ineligible for the survey.
LTU-KAU
- Persons who had changed their residence without giving a new address to the address
bureau were classified as ineligible in the final survey.
- The MCC explains the large increase in ineligibles in the final survey by the fact that
the sample was selected two years before the examination because the survey was postponed
after the sample selection. Furthermore, during the time before the final survey the
country underwent major changes and no reliable information on migration was available.
MLT-MLT
- Serial number inventory and non-respondent data are not available.
NEZ-AUC
- The target population of the initial survey covers only 80% of the population for event
and demographic data. Maoris and Polynesians were excluded from the initial survey, but
not from the event and demographic data. The MCC found that the difference was not a major
problem. It is not known whether Maoris and Polynesians were also excluded from the final
survey.
ROM-BUC
- Either non-respondent records are missing or the aggregate data are incorrect.
RUS-MOC and RUS-MOI
- The estimated increase in schooling level between initial and final survey in men and
women may represent a response bias.
RUS-NOC and RUS-NOI
- There were additional eligibility criteria described in the response to the Sample
Selection Description (imprisonment, away for occupational reasons), which were applied to
all three surveys. These exclusion criteria do not apply to demographic data, but
prisoners (estimated <1% of population) are excluded from event registration.
- The estimated increase in schooling level between initial and final survey in men may
represent a response bias.
- The serial number inventory is missing for RUS-NOCa (Fin).
SPA-CAT
- Immigration between the surveys altered the target population leading to a lowering of
years of schooling and possibly other changes.
- The dates of birth of three non-respondents in the initial survey have been corrected in
the MONICA database after the preparation of this document.
UNK-BEL
- The estimated increase in years of schooling and body height between initial and final
survey may represent a response bias.
UNK-GLA
- The participation rate may be biased since GPs excluded from the sampling frame persons
whom they considered unsuitable for the survey.
USA-STA
- The MCC thinks that the frequency of exclusion due to language during the first survey
is less than 30 but they have no numeric proof.
YUG-NOS
- For the initial and middle survey no ineligibles were reported. However, the final
survey had a significant number of them. No explanation for this difference is available.
- Tunstall-Pedoe H for the WHO MONICA Project. The World
Health Organization MONICA Project (Monitoring Trends and Determinants in Cardiovascular
Disease): A major international collaboration. J Clin Epidemiol 1988;41:105-14.
- WHO MONICA Project. MONICA Manual. (1998-1999). Available from:
URL:http://www.ktl.fi/publications/monica/manual/index.htm,
URN:NBN:fi-fe19981146.
- Kuulasmaa K, Tolonen H, Ferrario M, Ruokokoski E for the WHO
MONICA Project. Age, date of examination and survey periods in the MONICA surveys. (May
1998). Available from: URL:http://www.ktl.fi/publications/monica/age/ageqa.htm,
URN:NBN:fi-fe19991075.
- Molarius A, Kuulasmaa K, Moltchanov V, Ferrario M for the WHO
MONICA Project. Quality Assessment of Data on Marital Status and Educational Achievement
in the WHO MONICA Project. (December 1998). Available from: URL:http://www.ktl.fi/publications/monica/educ/educqa.htm,
URN:NBN:fi-fe19991078.
- Molarius A, Kuulasmaa K, Sans S for the WHO MONICA Project.
Quality assessment of weight and height measurements in the WHO MONICA Project. (May
1998). Available from: URL:http://www.ktl.fi/publications/monica/bmi/bmiqa20.htm,
URN:NBN:fi-fe19991079.
- Molarius A, Kuulasmaa K, Evans A, McCrum E, Tolonen H for the WHO
MONICA Project. Quality assessment of data on smoking behaviour in the WHO MONICA Project.
(February 1999). Available from: URL:http://www.ktl.fi/publications/monica/smoking/qa30.htm,
URN:NBN:fi-fe19991077.