WWW-publications from the WHO MONICA Project
May 1998
Kari Kuulasmaa1, Hanna Tolonen1, Marco Ferrario2 and Esa Ruokokoski1 for the WHO MONICA Project3
1 MONICA Data Centre, National Public Health Institute, Helsinki, Finland
2 Research Centre for Chronic-Degenerative Diseases, Institute of Biomedical Sciences San Gerardo, University of Milan, Milan, Italy
3 Annex: Sites and key personnel of the WHO MONICA Project
Thanks are due to Alun Evans who commented on the text and Vladislav Moltchanov who helped in the preparation of an earlier version of the document.
The MONICA Centres are funded predominantly by regional and national governments, research councils, and research charities. Coordination is the responsibility of the World Health Organization (WHO), assisted by local fund raising for congresses and workshops. WHO also supports the MONICA Data Centre (MDC) in Helsinki. Not covered by this general description is the ongoing generous support of the MDC by the National Public Health Institute of Finland, and a contribution to WHO from the National Heart, Lung, and Blood Institute, National Institutes of Health, Bethesda, Maryland, USA for support of the MDC. The completion of the MONICA Project is generously assisted through a Concerted Action Grant from the European Community. Likewise appreciated are grants from ASTRA Hässle AB, Sweden, Hoechst AG, Germany, Hoffmann-La Roche AG, Switzerland, the Institut de Recherches Internationales Servier (IRIS), France, and Merck & Co. Inc., New Jersey, USA, to support data analysis and preparation of publications.
DEXAM, DBIRTH, AGEGRP
In the MONICA surveys (1,2) three data items are related to the age of the subjects and the survey period:
These items are used for determining the age and age group of the subjects and the survey period, which is important in risk factor trend analysis. The purpose of this report is to:
The report considers the Reporting Unit Aggregates (RUA) which are foreseen as potential candidates for units of analyses of the MONICA data. The RUAs, their abbreviations and Reporting Units are listed in Table 1. Some of the RUAs have several versions because different combinations of Reporting Units (RU) may be used for cross-sectional and trend analyses if all reporting units of the population were not included in all three or two surveys. Therefore, in AUS-PER, GER-BRE, GER-EGE, GER-KMS, GER-RDM, RUS-MOI and RUS-NOC there is an overlap of reporting units included in the RUAs in some surveys. The RUAs are identified by the abbreviation and a version letter. For UNK-GLA which carried out four surveys, the first (Ini), third (Mid) and fourth (Fin) survey are considered. Altogether 54 RUAs are considered for the initial survey, 43 for the middle and 41 for the final survey.
No selection of subjects by age was made for this report. Data for men and women have been combined unless otherwise specified.
The sources of data for this report are the survey core data in the MONICA Data Centre (MDC), sample selection description forms completed by the MONICA Collaborating Centres (MCC) and other relevant information.
When the survey core data were received in MDC, the data were checked routinely for the following constraints:
All violations of these constraints were reported to the MCC for their correction or confirmation. The MCCs were expected to check locally the items AGEGRP, DBIRTH and DEXAM more carefully than was possible in MDC, because the MCCs were aware of the exact examination periods and the local definition of age. The current unresolved constraint violations are shown in Appendix 1. There are only a few unresolved constraint violations, for BEL-CHA (Mid), BEL-GHE (Mid), CAN-HAL (Ini), FRA-STR (Fin), ITA-BRI (Mid), ITA-LAT (Ini), and ROM-BUC (Ini).
The MONICA Manual gave an option to stratify the sample by 10-year age group. The purpose of item AGEGRP is to identify the age stratum of the subjects if the sample was stratified by age. Code 8 was to be used by the MCCs which did not stratify the sample by age. Code 9 was to be applied only for the initial survey if data on AGEGRP were not available. This coding option was given because the item was introduced to MONICA after many MCCs had already done their initial survey.
Table 2 gives the distribution of item AGEGRP and tells whether or not age stratification was used when the sample was selected. In most RUAs the sample was stratified by 10-year age group. In AUS-NEW and ITA-FRI the sample was stratified by 5-year age group. In three RUAs one survey was stratified and another was not: AUS-PERa, HUN-BUD and HUN-PEC.
In 7 RUAs where the sample was stratified, data for AGEGRP are not available in the initial survey (FIN-KUO, FIN-NKA, FIN-TUL, FRA-STR, LTU-KAU, NEZ-AUC and YUG-NOS). In these populations it will not be possible to use age group definition DEF2 (see Section 7), although it would be relevant for them. In addition, in CZE-CZE there are 20 subjects whose AGEGRP is not known.
On the other hand, the age group has been coded for many of the surveys where the sample was not stratified by age. This is not a problem. On the contrary, it would have been beneficial to ask all MCCs to code the item using the definition of age which was used for the overall age group at the time of the sample selection.
Item DBIRTH, the date of birth, is used to calculate the age of the subject, and to classify the subjects to birth cohorts. If the exact day of birth is not known, the columns for day should be coded 99 (99mmyy). If the month of birth is not known, 9999 should be coded in the columns for day and month (9999yy). Data for the year should always be available. If even the year of birth is unknown, then the record cannot be used for most of the analyses.
Table 3 gives the availability of data for item DBIRTH. Only year of birth is known in 4 RUAs in the initial survey (GER-ERF, ISR-TEL, MLT-MLT, SWE-GOT) and in one population in the middle survey (SWE-GOT). There are another 3 RUAs in the initial survey where only the year of birth is known for a large proportion of subjects. In the final survey at least the month of birth is known for all populations, but in two populations the exact day is not known for any of the subjects.
In SWE-GOT the date of birth is available, but for confidentiality reasons it is not allowed to be sent abroad. In the initial and middle survey they provided only the year but in the final survey also the month.
Item DEXAM, the date of examination, is used to calculate the age of the subject, and as an explanatory variable in the estimation of trends in the risk factors. If the exact day of examination is not known, the columns for day should be coded 99 (99mmyy). If the month of examination is not known, 9999 should be coded in the columns for day and month (9999yy). Data for the year should always be available. If the year of examination is not known, then the record cannot be used for trend analyses.
Table 4 gives the availability of data for item DEXAM. In most RUAs the exact date of examination can be identified. There are occasional subjects who do not have a valid date of examination (22 in the middle survey of BEL-CHA and 3 in the middle survey of BEL-GHE). Furthermore, there are several RUs in the initial survey (RU 19 of GER-EGE, GER-ERF, RUs 16, 18 and 19 of GER-KMS, RUs 22 and 26 of GER-RDM and ISR-TEL) where only the year of examination is known.
Table 7, which shows the number of observations by month and year of examination, reveals occasional isolated dates of examination, which are likely to be caused by misprints in the data. There are such suspected misprints at least for BEL-CHA, BEL-GHE, GER-COT, GER-ERF, GER-HAC, GER-KMS, GER-RHN, HUN-BUD, ICE-ICE, ITA-FRI, RUS-MOC, RUS-NOC, RUS-NOI and SWE-NSW and possibly also for GER-BER, HUN-PEC, ITA-LAT and RUS-MOI. Occasional dates of examination soon after the main survey period are likely to be late respondents.
Age has been defined in the MONICA data analyses as the age in full years on the date of examination, using data items DEXAM and DBIRTH. If the year of birth or examination was unknown, the observation was excluded from the analysis. If the month was unknown, the day and month were replaced by 3006. If only the day was unknown, it was replaced by 15. These rules seem reasonable also for the MONICA data analyses in the future.
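The age rule above can be illustrated with a minimal Python sketch. It assumes that DBIRTH and DEXAM arrive as six-character ddmmyy strings and that all two-digit years belong to the 1900s; the function names `parse_ddmmyy` and `age_full_years` and the `century` parameter are hypothetical and are not part of the MONICA software. How an unknown year is coded is not stated here, so the sketch simply returns `None` for any code that cannot be turned into a date.

```python
from datetime import date


def parse_ddmmyy(code, century=1900):
    """Turn a ddmmyy code into a date, applying the imputation rules.

    Assumption: two-digit years are in the 1900s (century=1900).
    Returns None if the code cannot be interpreted as a date.
    """
    if len(code) != 6 or not code.isdigit():
        return None
    day, month, year = int(code[:2]), int(code[2:4]), int(code[4:])
    if month == 99:
        # Month unknown (9999yy): day and month are replaced by 3006 (30 June).
        day, month = 30, 6
    elif day == 99:
        # Only the day unknown (99mmyy): day is replaced by 15.
        day = 15
    try:
        return date(century + year, month, day)
    except ValueError:
        return None


def age_full_years(dbirth, dexam):
    """Age in full years on the date of examination, or None if unusable."""
    birth, exam = parse_ddmmyy(dbirth), parse_ddmmyy(dexam)
    if birth is None or exam is None:
        return None
    # Subtract one year if the birthday has not yet been reached in the exam year.
    not_yet = (exam.month, exam.day) < (birth.month, birth.day)
    return exam.year - birth.year - int(not_yet)
```

For example, a subject with DBIRTH `999940` (only the year known) is treated as born on 30 June 1940, so an examination on 29 June 1985 gives age 44 and one on 1 July 1985 gives age 45.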
For classifying the subjects into age groups, the following two definitions have been used:
When the MONICA sample was selected, the MCCs had to define the age of the subjects without knowing their exact date of examination. MONICA did not have any general recommendations for the definitions of the age for sample selection. Some MCCs considered the age at the beginning or in the middle of the examination period. Some considered the age at the time of sample selection, and some at the end of the year of the examinations. As DEF2 of the age group is based on the definition of age which was used at the time of the sample selection and DEF1 is based on the accurate age on the day of examination, there are inevitable discrepancies between DEF1 and DEF2.
In Table 5 we can see the number of subjects who were selected to the sample but whose age group according to DEF1 is outside the range 25-64 years. If the age used in sample selection was not around the middle of the survey period, the number of such 'outliers' can be relatively large. The proportion of such 'outliers' was more than 10% of the nearest age group in 10 RUAs. It was particularly large in GER-BRE which, however, has a good explanation: they sent data to MDC for subjects up to age 69.
Also, if the sample was stratified by age group, some of the subjects in the age groups of DEF1 may have different sampling probabilities than the others. Use of DEF2 weights the samples correctly if the sample was only stratified by sex and 10-year age group.
In Tables 5 and 6 we can also see that in some RUAs the number of subjects in the age groups and the mean ages of the age groups using the two definitions are exactly (or very nearly) the same in all age/sex groups, suggesting that the item AGEGRP does not reflect the sampling age but was calculated from DBIRTH and DEXAM. This concerns AUS-NEW (Mid), CZE-CZE (Ini, Mid, Fin), FRA-TOU (Ini, Mid), LTU-KAU (Mid) and RUS-MOI (Mid). Of these, CZE-CZE has confirmed that AGEGRP does indicate the sampling stratum. In these cases the use of DEF2 would actually mean the use of DEF1. For DEN-GLO (Ini, Mid, Fin), HUN-BUD (Ini) and HUN-PEC (Ini) the two definitions also define identical age groups, but for a different reason which will be explained in Section 9.
A recommendation about the two definitions will be given in Section 12.
The MONICA Project concerns age group 25-64, but the age group 25-34 is optional in the population survey. Table 5 gives the number of observations in the sex-specific 10-year age groups in the initial, middle and final MONICA survey using DEF1 and DEF2. (Note that in Table 5 the rows for DEF2 include only the subjects whose AGEGRP was coded in the MCC.) From the table, the inclusion of age group 25-34 in the survey is obvious for all RUAs except AUS-NEW, FRA-STR, and GER-BER, where in one survey in each the number of observations in age group 25-34 is relatively large, but substantially smaller than in the other age groups. Comparison between the initial, middle and final survey shows that age group 25-34 should be excluded from the analysis for these RUAs except GER-BER. Therefore, the age range of each RUA to be considered in collaborative MONICA analyses is:
For FRA-LIL the full age range 25-64 is available for analysis in the initial survey.
Most of the measurements in the MONICA risk factor surveys are influenced by age. Therefore, when looking at changes in risk factor levels over time or when comparing risk factor levels between populations, it is important that the effect of possible changes or differences in the age distribution of the samples be eliminated. As most of the MONICA samples were stratified by 10-year age group, it is natural to also stratify the analyses by 10-year age groups. Then differences in the proportions of the 10-year age groups will not be a problem for the results, but it will be crucial that the age distributions within the 10-year age groups are approximately equal.
A basic assumption for the sample selection was that within the 10-year age groups each individual of the population had the same probability for selection to the sample. The study protocol, however, allowed an exception: "An acceptable alternative to each ten year age group will be to narrow it down to a single year, age 40, 50 and 60 years". This alternative was used by DEN-GLO in all three surveys, and the sample of HUN-PEC in the initial survey was very similar.
For most of the data analyses, the most important indicator of the equality of the age distributions is the mean age within the 10-year age groups. Table 6 gives the mean age as the deviation from the 'expected' age group mean 29.5, 39.5 etc. The rows for DEF2 in Table 6 include only the subjects whose AGEGRP was coded in the MCC.
The most striking finding concerns the initial survey of HUN-BUD where the difference between the observed and expected mean age is about 4 years. The bias in the age distribution is a reflection of the fact that everybody's age is in the two lowest years of the 10-year age group.
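The deviation reported in Table 6 can be sketched as follows, assuming ages are given in full years and the age group is formed from those ages. The function names are hypothetical; the expected mean of a group is taken as its lower bound plus 4.5, which reproduces the values 29.5, 39.5 etc. quoted above.

```python
from statistics import mean


def age_group_lower(age):
    """Lower bound (25, 35, 45 or 55) of the 10-year age group, else None."""
    if 25 <= age <= 64:
        return 25 + 10 * ((age - 25) // 10)
    return None


def mean_age_deviation(ages, lower):
    """Observed minus expected mean age within the group starting at `lower`.

    Returns None if no subject falls into the group.
    """
    in_group = [a for a in ages if age_group_lower(a) == lower]
    if not in_group:
        return None
    return mean(in_group) - (lower + 4.5)
```

A group in which everybody is in the two lowest years, for example ages 35 and 36 in equal numbers, has an observed mean of 35.5 and therefore a deviation of -4.0 from the expected 39.5.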
For the other RUAs the findings of Table 6 are summarized in Table A. The deviations from the expected mean are in most cases less than a year, and exceed two years very rarely and only in occasional age groups. As expected, the deviations are in general smaller for DEF1 than for DEF2.
| Definition of age group | Survey | Number of RUAs | Cells of Table 6 > 1.0 or < -1.0 |
|---|---|---|---|
| DEF1 | Ini | 54 | 11 |
| DEF1 | Mid | 43 | 0 |
| DEF1 | Fin | 41 | 3 |
| DEF2 | Ini | 34 | 24 |
| DEF2 | Mid | 34 | 14 |
| DEF2 | Fin | 37 | 32 |
The difference of the mean age between the surveys within the RUAs is summarized in Table B. The difference never exceeds 2 years, except in HUN-BUD. The difference is, perhaps unexpectedly, generally smaller for DEF1 than for DEF2. For DEF1 it is greater than one year in more than one age group in FRA-LIL and RUS-MOC only. From this point of view, the age distributions within the age groups are reasonably good.
| Definition of age group | Number of RUAs | Cells of Table 6 with difference between any two surveys > 1.0 |
|---|---|---|
| DEF1 | 49 | 16 |
| DEF2 | 37 | 36 |
Table 7 gives the number of examinations in each month for each RUA. Table 8 gives a summary of the periods to be used in publications of the risk factor results. The summary gives continuous periods, where the beginning and the end of the survey were defined as follows: The months with fewer than 5 examinations at the beginning or at the end of the survey were excluded from the period. Also, any isolated months with fewer than 10 examinations were excluded. These exclusions concern the reporting of the survey periods only, and in general, such exclusions should not be made for the data analysis. For GER-ERF the whole period and for GER-EGE, GER-KMS and GER-RDM the starting month of the initial survey have been reported by the MCC separately, as only the year is given in the survey core data. In RU 19 of GER-EGE, the survey started already in November 1982, but the year of examination was coded as 1983 for all core data records.
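The rule for the reported survey period can be sketched in Python under stated assumptions. The text does not define an "isolated" month precisely; the sketch below assumes it means a month with no examinations in either adjacent calendar month. The function name `reporting_period`, its parameters and the input layout (a mapping from `(year, month)` to the number of examinations) are hypothetical.

```python
def reporting_period(counts, edge_min=5, isolated_min=10):
    """Return ((year, month), (year, month)) for the reported period, or None.

    counts: dict mapping (year, month) to the number of examinations.
    Assumption: a month is 'isolated' if neither neighbouring calendar
    month has any examinations.
    """
    def to_index(ym):
        return ym[0] * 12 + ym[1] - 1

    def to_ym(index):
        return (index // 12, index % 12 + 1)

    by_index = {to_index(ym): n for ym, n in counts.items() if n > 0}

    # Drop isolated months with fewer than isolated_min examinations.
    kept = {
        i: n
        for i, n in by_index.items()
        if not (n < isolated_min and i - 1 not in by_index and i + 1 not in by_index)
    }

    # Trim months with fewer than edge_min examinations from both ends.
    months = sorted(kept)
    while months and kept[months[0]] < edge_min:
        months.pop(0)
    while months and kept[months[-1]] < edge_min:
        months.pop()

    if not months:
        return None
    return to_ym(months[0]), to_ym(months[-1])
```

As the text notes, such trimming concerns only how the period is reported; the excluded records would normally remain in the data analysis.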
Because of seasonal variation of some of the cardiovascular risk factors, the study protocol stated: "Screening surveys should be repeated at the same time of year to minimise the effects of seasonal trends".
The seasonal differences between the survey periods can be seen visually in Table 7, Appendix 3 and Table 9. The "Total" row of Table 7 can be used to compare the month distributions between the RUAs or between the surveys within the RUAs. Appendix 3 gives data similar to Table 7 but by sex and 10-year age group. Table 9 gives a graphical representation of the proportions of the survey in each month of the year, distinguishing between the main 80% of the survey, the main 95% of the survey and the tails.
As a quantitative measure of the seasonal overlap between surveys within the RUAs, Table 10 shows the month difference between each pair of surveys by age group and sex and overall. The month difference is defined as the proportion-months needed to move the months of one survey completely on top of the months of the other survey. If the proportional month distributions of the two surveys overlap completely, the month difference is 0. Value 1.0 arises from the situation where there is exactly a month's shift between the surveys, or there is a three month shift for one third of the survey. The maximum value of 6 is attained only in the case where both surveys were done within one month and exactly at the opposite ends of the year. A more accurate definition of the month difference and the algorithm used for computing it for this report is given in Appendix 2.
The month difference counts the difference between each calendar month equally, and therefore does not distinguish between pairs of months belonging to the same season and pairs belonging to different seasons. Such a distinction between the seasons is probably impossible to make at a general level, as it is likely that the relevant seasons would need to be defined individually for each risk factor and for each country. Nevertheless, the month difference helps us to point out the RUAs where there is a lack of overlap between the surveys, and it also takes into account the magnitude of the overall shift between the survey months. In the cases where the month difference is large, Tables 7 and 9 and Appendix 3 should be used to find out the nature of the problem.
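One construction that is consistent with the properties stated above is the earth mover's distance between two monthly distributions on a 12-month circle. The sketch below is offered on that assumption only and is not necessarily the algorithm of Appendix 2; the function name `month_difference` and the input layout (twelve examination counts, January to December) are hypothetical.

```python
from itertools import accumulate
from statistics import median


def month_difference(counts_a, counts_b):
    """Circular earth mover's distance between two monthly distributions.

    counts_a, counts_b: sequences of 12 examination counts (Jan..Dec).
    The result is in proportion-months: 0 for identical distributions,
    at most 6 for single-month surveys half a year apart.
    """
    if len(counts_a) != 12 or len(counts_b) != 12:
        raise ValueError("expected 12 monthly counts per survey")
    total_a, total_b = sum(counts_a), sum(counts_b)
    if total_a <= 0 or total_b <= 0:
        raise ValueError("each survey needs at least one examination")

    # Difference of the proportional month distributions.
    diffs = [a / total_a - b / total_b for a, b in zip(counts_a, counts_b)]

    # flows[i] is the net proportion that must pass from month i to month i+1
    # if nothing is moved across the December/January boundary.
    flows = list(accumulate(diffs))

    # On a circle a constant amount may also circulate around the year;
    # the total movement is smallest when that constant is the median flow.
    offset = median(flows)
    return sum(abs(f - offset) for f in flows)
```

With this definition a survey held entirely in December and one held entirely in January differ by 1.0 rather than 11, because the shift is measured around the calendar year.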
Table C gives the cumulative number of RUAs where the month difference is more than a given value in at least three 10-year age/sex groups.
| Survey pair | Month difference > 1.5 | Month difference > 2.0 | Month difference > 3.0 | Number of RUAs |
|---|---|---|---|---|
| Ini-Mid | 7 | 2 | 0 | 38 |
| Mid-Fin | 7 | 4 | 1 | 37 |
| Ini-Fin | 6 | 3 | 1 | 38 |
For three RUAs it is not possible to calculate the month difference in the initial survey because the month of examination is not known for all Reporting Units involved:
The RUAs with the largest month differences according to Table 10 are:
The large month difference is a major concern in GER-ERF, RUS-MOCa, RUS-NOCa and RUS-NOI. For the other RUAs listed above there is a potential problem, which will require attention from those who analyse trends in risk factors.
We have considered two ways of defining the age group: DEF1, which is based on the age on the day of examination and DEF2, which is based on the stratification of the sample. DEF2 is more natural to use because it accounts for the sampling weights if the sampling was stratified by the age group. Also, it always makes use of the whole sample. DEF2, however, has a number of shortcomings:
DEF1 has the following shortcomings:
Shortcoming 1 of DEF1 cannot be expected to have a major effect because the wrong weighting would concern only a small proportion of the sample and the sampling weights of the neighbouring age groups do not differ much. If those who had no chance of being selected according to DEF1 had a significant influence, the problem should be reflected in the extreme age groups in Table 6. However, one can see from Table 6 that the comparability between the surveys even in the extreme age groups is better using DEF1 than DEF2. Therefore, it is recommended that DEF1 be used for the definition of age group both in cross-sectional and trend analysis.
Because of the very biased age distributions within the 10-year age groups, data from HUN-BUD should not be used for trend analysis.
To obviate the effect of incorrect dates of examination on the estimates of risk factor trends, the occasional subjects whose date of examination is more than one month before the start of examination of the others should be excluded from trend analysis. The same concerns the occasional subjects whose date of examination is more than six months after the period of examination of the others.
The subjects whose year of examination or year of birth is missing should be excluded from analysis.
If the day of DEXAM or DBIRTH is missing, but the month and year are available, the day should be interpreted as 15 in all data analyses. If the month of DEXAM or DBIRTH is missing, but the year is available, the day and month should be interpreted as 3006.
When the survey period is reported as a single time interval, the periods given in Table 8 should be used.
The seasonal differences between the survey periods may decrease the comparability between the RUAs, and therefore should be considered when drawing conclusions on cross-sectional comparisons.
The large month difference is a major concern in GER-ERF, RUS-MOC, RUS-NOC and RUS-NOI. Also for DEN-GLO, GER-EGE, GER-KMS, HUN-BUD, HUN-PEC, ICE-ICE, ITA-FRI, NEZ-AUC, RUS-MOI, RUS-NOC, SWE-GOT, UNK-GLA the shift between the survey periods may be a potential source of bias for risk factor trends, and therefore requires attention from those who analyse trends in risk factors.
The following list includes only the RUAs with specific findings or exceptional background information relevant for the use of the data.
AUS-NEW
BEL-CHA
BEL-GHE
CZE-CZE
DEN-GLO
FIN-KUO, FIN-NKA and FIN-TUL
FRA-STR
FRA-TOU
GER-BRE
GER-COT
GER-EGE
GER-ERF
GER-HAC
GER-KMS
GER-RDM
HUN-BUD
HUN-PEC
ICE-ICE
ISR-TEL
ITA-FRI
ITA-LAT
LTU-KAU
MLT-MLT
NEZ-AUC
ROM-BUC
RUS-MOC
RUS-MOI
RUS-NOC
RUS-NOI
SWE-GOT
UNK-GLA
YUG-NOS