WWW-publications from the WHO MONICA Project

Age, Date of Examination and Survey Periods in the MONICA Surveys

May 1998

Kari Kuulasmaa1, Hanna Tolonen1, Marco Ferrario2 and Esa Ruokokoski1 for the WHO MONICA Project3

1 MONICA Data Centre, National Public Health Institute, Helsinki, Finland
2 Research Centre for Chronic-Degenerative Diseases, Institute of Biomedical Sciences San Gerardo, University of Milan, Milan, Italy
3 Annex: Sites and key personnel of the WHO MONICA Project


© Copyright World Health Organization (WHO) and the WHO MONICA Project investigators 1999. All rights reserved.

Acknowledgements

Thanks are due to Alun Evans who commented on the text and Vladislav Moltchanov who helped in the preparation of an earlier version of the document.

The MONICA Centres are funded predominantly by regional and national governments, research councils, and research charities. Coordination is the responsibility of the World Health Organization (WHO), assisted by local fund raising for congresses and workshops. WHO also supports the MONICA Data Centre (MDC) in Helsinki. Not covered by this general description is the ongoing generous support of the MDC by the National Public Health Institute of Finland, and a contribution to WHO from the National Heart, Lung, and Blood Institute, National Institutes of Health, Bethesda, Maryland, USA for support of the MDC. The completion of the MONICA Project is generously assisted through a Concerted Action Grant from the European Community. Likewise appreciated are grants from ASTRA Hässle AB, Sweden, Hoechst AG, Germany, Hoffmann-La Roche AG, Switzerland, the Institut de Recherches Internationales Servier (IRIS), France, and Merck & Co. Inc., New Jersey, USA, to support data analysis and preparation of publications.


MONICA data items considered in this document

DEXAM, DBIRTH, AGEGRP


1. Introduction

In the MONICA surveys (1,2) three data items are related to the age of the subjects and the survey period:

DEXAM
Date of examination
DBIRTH
Date of birth
AGEGRP
Age group in which the subject was originally selected for the sample

These items are used for determining the age and age group of the subjects and the survey period, which is important in risk factor trend analysis. The purpose of this report is to:

2. Material and methods

2.1. Populations

The report considers the Reporting Unit Aggregates (RUA) which are foreseen as potential units of analysis of the MONICA data. The RUAs, their abbreviations and Reporting Units are listed in Table 1. Some of the RUAs have several versions because different combinations of Reporting Units (RU) may be used for cross-sectional and trend analyses if not all Reporting Units of the population were included in all of its two or three surveys. Therefore, in AUS-PER, GER-BRE, GER-EGE, GER-KMS, GER-RDM, RUS-MOI and RUS-NOC there is an overlap of Reporting Units included in the RUAs in some surveys. The RUAs are identified by the abbreviation and a version letter. For UNK-GLA, which carried out four surveys, the first (Ini), third (Mid) and fourth (Fin) surveys are considered. Altogether 54 RUAs are considered for the initial survey, 43 for the middle and 41 for the final survey.

2.2. Age and sex

No selection of subjects by age was made for this report. Data for men and women have been combined unless otherwise specified.

2.3. Sources of information

The sources of data for this report are the survey core data in the MONICA Data Centre (MDC), sample selection description forms completed by the MONICA Collaborating Centres (MCC) and other relevant information.

3. Routine checking of the data

When the survey core data were received in MDC, the data were checked routinely for the following constraints:

DEXAM_LIMITS_4
DEXAM must be a date or 99MMYY or 9999YY, and between 010179 and today.
DBIRTH_LIMITS_4
DBIRTH must be a date or 99MMYY or 9999YY.
DBIRTH_DEXAM_4
DEXAM-DBIRTH must be at least 25 years and at most 64 years. (Note the following interpretations in the checking:
DBIRTH 99MMYY = 15MMYY
DBIRTH 9999YY = 3006YY
DEXAM 99MMYY = 15MMYY
DEXAM 9999YY = 3006YY.)
AGEGRP_CHECK_4
Accepted values for AGEGRP are 1, 2, 3, 4, 8 and 9.
If AGEGRP=1, then should be 23 < years of DEXAM-DBIRTH < 36;
If AGEGRP=2, then should be 33 < years of DEXAM-DBIRTH < 46;
If AGEGRP=3, then should be 43 < years of DEXAM-DBIRTH < 56;
If AGEGRP=4, then should be 53 < years of DEXAM-DBIRTH < 66.
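
The format and age group constraints above can be sketched in code. The following is only an illustrative sketch, not the MDC's checking program: the function names are hypothetical, the dates are assumed to be six-digit DDMMYY strings, and all years are assumed to fall in the 1900s. The check that DEXAM lies between 010179 and the current date is not covered.

```python
from datetime import date

def is_valid_coded_date(code: str) -> bool:
    """Accept DDMMYY, 99MMYY (day unknown) or 9999YY (day and month unknown)."""
    if len(code) != 6 or not (code.isascii() and code.isdigit()):
        return False
    dd, mm, yy = code[:2], code[2:4], int(code[4:])
    if dd + mm == "9999":
        return True
    if dd == "99":
        return 1 <= int(mm) <= 12
    try:
        date(1900 + yy, int(mm), int(dd))  # assumption: every year is 19YY
    except ValueError:
        return False
    return True

# Open age bounds (in years) for each AGEGRP code, as in AGEGRP_CHECK_4.
AGEGRP_BOUNDS = {1: (23, 36), 2: (33, 46), 3: (43, 56), 4: (53, 66)}

def agegrp_consistent(agegrp: int, age_years: float) -> bool:
    """Check AGEGRP against the age (DEXAM-DBIRTH in years)."""
    if agegrp in (8, 9):          # not stratified / not available: no age check
        return True
    if agegrp not in AGEGRP_BOUNDS:
        return False              # value outside the accepted codes
    low, high = AGEGRP_BOUNDS[agegrp]
    return low < age_years < high
```

For example, `agegrp_consistent(1, 35)` passes because the constraint deliberately allows a margin around the nominal 25-34 range.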

All violations of these constraints were reported to the MCC for their correction or confirmation. The MCCs were expected to check locally the items AGEGRP, DBIRTH and DEXAM more carefully than was possible in MDC, because the MCCs were aware of the exact examination periods and the local definition of age. The current unresolved constraint violations are shown in Appendix 1. There are only a few unresolved constraint violations, for BEL-CHA (Mid), BEL-GHE (Mid), CAN-HAL (Ini), FRA-STR (Fin), ITA-BRI (Mid), ITA-LAT (Ini), and ROM-BUC (Ini).

4. Age stratification of the sample and quality of data on AGEGRP

The MONICA Manual gave an option to stratify the sample by 10-year age group. The purpose of item AGEGRP is to identify the age stratum of the subjects if the sample was stratified by age. Code 8 was to be used by the MCCs which did not stratify the sample by age. Code 9 was to be applied only for the initial survey if data on AGEGRP was not available. Such an option for coding the item was given because the item was introduced to MONICA after many MCCs had already done their initial survey.

Table 2 gives the distribution of item AGEGRP and tells whether or not age stratification was used when the sample was selected. In most RUAs the sample was stratified by 10-year age group. In AUS-NEW and ITA-FRI the sample was stratified by 5-year age group. In three RUAs one survey was stratified and another survey was not; this concerns AUS-PERa, HUN-BUD and HUN-PEC.

In 7 RUAs where the sample was stratified, data for AGEGRP are not available in the initial survey (FIN-KUO, FIN-NKA, FIN-TUL, FRA-STR, LTU-KAU, NEZ-AUC and YUG-NOS). In these populations it will not be possible to use age group definition DEF2 (see Section 7), although it would be relevant for them. In addition, in CZE-CZE there are 20 subjects whose AGEGRP is not known.

On the other hand, the age group has been coded for many of the surveys where the sample was not stratified by age. This is not a problem. On the contrary, it would have been beneficial to ask all MCCs to code the item using the definition of age which was used for the overall age group at the time of the sample selection.

5. Quality of data on DBIRTH

Item DBIRTH, the date of birth, is used to calculate the age of the subject, and to classify the subjects to birth cohorts. If the exact day of birth is not known, the columns for day should be coded 99 (99mmyy). If the month of birth is not known, 9999 should be coded in the columns for day and month (9999yy). Data for the year should always be available. If even the year of birth is unknown, then the record cannot be used for most of the analyses.

Table 3 gives the availability of data for item DBIRTH. Only year of birth is known in 4 RUAs in the initial survey (GER-ERF, ISR-TEL, MLT-MLT, SWE-GOT) and in one population in the middle survey (SWE-GOT). There are another 3 RUAs in the initial survey where in a large proportion of subjects the year of birth only is known. In the final survey at least the month of birth is known for all populations, but in two populations the exact day is not known for any of the subjects.

In SWE-GOT the date of birth is available, but for confidentiality reasons it is not allowed to be sent abroad. In the initial and middle survey they provided only the year but in the final survey also the month.

6. Quality of data on DEXAM

Item DEXAM, the date of examination, is used to calculate the age of the subject, and as an explanatory variable in the estimation of trends in the risk factors. If the exact day of examination is not known, the columns for day should be coded 99 (99mmyy). If the month of examination is not known, 9999 should be coded in the columns for day and month (9999yy). Data for the year should always be available. If the year of examination is not known, then the record cannot be used for trend analyses.

Table 4 gives the availability of data for item DEXAM. In most RUAs the exact date of examination can be identified. There are occasional subjects who do not have a valid date of examination (22 in the middle survey of BEL-CHA and 3 in the middle survey of BEL-GHE). Furthermore, there are several RUs in the initial survey (RU 19 of GER-EGE, GER-ERF, RUs 16, 18 and 19 of GER-KMS, RUs 22 and 26 of GER-RDM and ISR-TEL) where only the year of examination is known.

Table 7, which shows the number of observations by month and year of examination, reveals occasional isolated dates of examination, which are likely to be caused by misprints in the data. There are such suspected misprints at least for BEL-CHA, BEL-GHE, GER-COT, GER-ERF, GER-HAC, GER-KMS, GER-RHN, HUN-BUD, ICE-ICE, ITA-FRI, RUS-MOC, RUS-NOC, RUS-NOI and SWE-NSW and possibly also for GER-BER, HUN-PEC, ITA-LAT and RUS-MOI. Occasional dates of examination soon after the main survey period are likely to be late respondents.

7. Definitions of age and age group

Age has been defined in the MONICA data analyses as the age in full years on the date of examination, using data items DEXAM and DBIRTH. If the year of birth or examination was unknown, the observation was excluded from the analysis. If the month was unknown, the day and month were replaced by 3006. If only the day was unknown, it was replaced by 15. These rules seem reasonable also for the MONICA data analyses in the future.
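
The age calculation described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the code used for the MONICA analyses: the function names are hypothetical, dates are assumed to be six-digit DDMMYY strings with all years in the 1900s, and a record whose code is missing or not six digits is treated as having an unknown year.

```python
from datetime import date
from typing import Optional

def decode_date(code: Optional[str]) -> Optional[date]:
    """Turn a coded DDMMYY string into a date, imputing unknown parts."""
    if code is None or len(code) != 6 or not (code.isascii() and code.isdigit()):
        return None                          # assumption: treated as unknown year
    dd, mm, yy = code[:2], code[2:4], int(code[4:])
    if dd + mm == "9999":
        day, month = 30, 6                   # unknown month: use 30 June
    elif dd == "99":
        day, month = 15, int(mm)             # unknown day: use the 15th
    else:
        day, month = int(dd), int(mm)
    return date(1900 + yy, month, day)       # raises ValueError on impossible dates

def age_in_full_years(dbirth: Optional[str], dexam: Optional[str]) -> Optional[int]:
    """Age in completed years on the date of examination, or None if excluded."""
    birth, exam = decode_date(dbirth), decode_date(dexam)
    if birth is None or exam is None:
        return None
    before_birthday = (exam.month, exam.day) < (birth.month, birth.day)
    return exam.year - birth.year - int(before_birthday)
```

With these rules a subject born 15 June 1950 and examined 14 June 1985 is 34, and one examined a day later is 35.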

For classifying the subjects into age groups, the following two definitions have been used:

DEF1:
The subjects are classified into 10-year age groups 25-34, 35-44, etc. according to the age in full years on the date of examination. If the date of birth or date of examination is not known exactly, the rules given above have been used in the calculation of age.
DEF2:
The subjects are classified into 10-year age groups according to the item AGEGRP. In the cases where the age group cannot be specified from the item AGEGRP, DEF1 has been used.
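
The two definitions can be contrasted in a short sketch. The function names and the string labels for the age groups are illustrative choices, and the age passed in is assumed to be the age in full years on the date of examination.

```python
from typing import Optional

# AGEGRP codes 1-4 correspond to the four 10-year sampling strata.
AGEGRP_LABELS = {1: "25-34", 2: "35-44", 3: "45-54", 4: "55-64"}

def def1_group(age: int) -> Optional[str]:
    """DEF1: 10-year age group from the age in full years at examination."""
    if not 25 <= age <= 64:
        return None                    # outside the MONICA age range
    lower = 25 + 10 * ((age - 25) // 10)
    return f"{lower}-{lower + 9}"

def def2_group(agegrp: int, age: int) -> Optional[str]:
    """DEF2: age group from AGEGRP, falling back to DEF1 for codes 8 and 9."""
    if agegrp in AGEGRP_LABELS:
        return AGEGRP_LABELS[agegrp]
    return def1_group(age)
```

A subject sampled in stratum 1 but examined at age 35 is placed in 25-34 by `def2_group(1, 35)` and in 35-44 by `def1_group(35)`, which is the kind of discrepancy discussed in the rest of this section.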

When the MONICA sample was selected, the MCCs had to define the age of the subjects without knowing their exact date of examination. MONICA did not have any general recommendations for the definitions of the age for sample selection. Some MCCs considered the age at the beginning or in the middle of the examination period. Some considered the age at the time of sample selection, and some at the end of the year of the examinations. As DEF2 of the age group is based on the definition of age which was used at the time of the sample selection and DEF1 is based on the accurate age on the day of examination, there are inevitable discrepancies between DEF1 and DEF2.

In Table 5 we can see the number of subjects who were selected to the sample but whose age group according to DEF1 is outside the range 25-64 years. If the age used in sample selection was not around the middle of the survey period, the number of such 'outliers' can be relatively large. The proportion of such 'outliers' was more than 10% of the nearest age group in 10 RUAs. It was particularly large in GER-BRE which, however, has a good explanation: they sent data to MDC for subjects up to age 69.

Also, if the sample was stratified by age group, some of the subjects in the age groups of DEF1 may have different sampling probabilities than the others. Use of DEF2 weights the samples correctly if the sample was only stratified by sex and 10-year age group.

In Tables 5 and 6 we can also see that in some RUAs the number of subjects in the age groups and the mean ages of the age groups using the two definitions are exactly (or very nearly) the same in all age/sex groups, suggesting that the item AGEGRP does not reflect the sampling age but was calculated from DBIRTH and DEXAM. This concerns AUS-NEW (Mid), CZE-CZE (Ini, Mid, Fin), FRA-TOU (Ini, Mid), LTU-KAU (Mid) and RUS-MOI (Mid). Of these, CZE-CZE has confirmed that AGEGRP does indicate the sampling stratum. In these cases the use of DEF2 would actually mean the use of DEF1. For DEN-GLO (Ini, Mid, Fin), HUN-BUD (Ini) and HUN-PEC (Ini) the two definitions also define identical age groups, but for a different reason which will be explained in Section 9.

A recommendation about the two definitions will be given in Section 12.

8. Age range surveyed in each RUA

The MONICA Project concerns age group 25-64, but the age group 25-34 is optional in the population survey. Table 5 gives the number of observations in the sex-specific 10-year age groups in the initial, middle and final MONICA survey using DEF1 and DEF2. (Note that in Table 5 the rows for DEF2 include only the subjects whose AGEGRP was coded in the MCC.) From the table, the inclusion of age group 25-34 in the survey is obvious for all RUAs except AUS-NEW, FRA-STR, and GER-BER, where in one survey in each the number of observations in age group 25-34 is relatively large, but substantially smaller than in the other age groups. Comparison between the initial, middle and final survey shows that age group 25-34 should be excluded from the analysis for these RUAs except GER-BER. Therefore, the age range of each RUA to be considered in collaborative MONICA analyses is:

For FRA-LIL the full age range 25-64 is available for analysis in the initial survey.

9. Distribution of age within the age groups

Most of the measurements in the MONICA risk factor surveys are influenced by age. Therefore, when looking at changes in risk factor levels over time or when comparing risk factor levels between populations, it is important that the effect of possible changes or differences in the age distribution of the samples be eliminated. As most of the MONICA samples were stratified by 10-year age group, it is natural to also stratify the analyses by 10-year age groups. Then differences in the proportions of the 10-year age groups will not be a problem for the results, but it will be crucial that the age distributions within the 10-year age groups are approximately equal.

A basic assumption for the sample selection was that within the 10-year age groups each individual of the population had the same probability for selection to the sample. The study protocol, however, allowed an exception: "An acceptable alternative to each ten year age group will be to narrow it down to a single year, age 40, 50 and 60 years". This alternative was used by DEN-GLO in all three surveys, and the sample of HUN-PEC in the initial survey was very similar.

For most of the data analyses, the most important indicator of the equality of the age distributions is the mean age within the 10-year age groups. Table 6 gives the mean age as the deviation from the 'expected' age group mean 29.5, 39.5 etc. The rows for DEF2 in Table 6 include only the subjects whose AGEGRP was coded in the MCC.
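
As a small illustration of this indicator, the deviation can be computed as the observed mean age in full years minus the lower bound of the age group plus 4.5. The function name and the example ages below are made up for illustration and are not MONICA data.

```python
from statistics import mean

def mean_age_deviation(ages, lower_bound):
    """Observed mean age minus the 'expected' mean of a 10-year age group.

    ages: ages in full years of the subjects in the group
    lower_bound: lowest age of the group (25, 35, 45 or 55)
    """
    expected = lower_bound + 4.5       # 29.5, 39.5, 49.5 or 59.5
    return mean(ages) - expected

# Ages spread evenly over the group give no deviation.
even = mean_age_deviation(list(range(35, 45)), 35)

# Hypothetical group where everybody is in the two lowest years of the group.
skewed = mean_age_deviation([35, 36, 35, 36], 35)
```

Here `even` is 0.0, and `skewed` is -4.0, a deviation of the size reported for the initial survey of HUN-BUD.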

The most striking finding concerns the initial survey of HUN-BUD where the difference between the observed and expected mean age is about 4 years. The bias in the age distribution is a reflection of the fact that everybody's age is in the two lowest years of the 10-year age group.

For the other RUAs the findings of Table 6 are summarized in Table A. The deviations from the expected mean are in most cases less than a year, and exceed two years very rarely and only in occasional age groups. As expected, the deviations are in general smaller for DEF1 than for DEF2.

Table A. Number of cells of Table 6 (excluding HUN-BUD) where the absolute value is more than 1.0

Definition of   Survey   Number of   Cells of Table 6
age group                RUAs        > 1.0 or < -1.0
DEF1            Ini      54          11
DEF1            Mid      43           0
DEF1            Fin      41           3
DEF2            Ini      34          24
DEF2            Mid      34          14
DEF2            Fin      37          32

The difference in the mean age between the surveys within the RUAs is summarized in Table B. The difference never exceeds 2 years, except in HUN-BUD. The difference is, perhaps unexpectedly, generally smaller for DEF1 than for DEF2. For DEF1 it is greater than one year in more than one age group in FRA-LIL and RUS-MOC only. From this point of view, the age distributions within the age groups are reasonably good.

Table B. Cells of Table 6 (excluding HUN-BUD) where the difference between any two of the surveys exceeds 1.0

Definition of   Number of   Cells of Table 6 with difference
age group       RUAs        between any two surveys > 1.0
DEF1            49          16
DEF2            37          36

10. Survey periods

Table 7 gives the number of examinations in each month for each RUA. Table 8 gives a summary of the periods to be used in publications of the risk factor results. The summary gives continuous periods, where the beginning and the end of the survey were defined as follows: The months with fewer than 5 examinations at the beginning or at the end of the survey were excluded from the period. Also, any isolated months with fewer than 10 examinations were excluded. These exclusions concern the reporting of the survey periods only, and in general, such exclusions should not be done for the data analysis. For GER-ERF the whole period and for GER-EGE, GER-KMS and GER-RDM the starting month of the initial survey have been reported by the MCC separately, as only the year is given in the survey core data. In RU 19 of GER-EGE, the survey started already in November 1982, but the year of examination was coded as 1983 for all core data records.
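
The rule for deriving a reporting period from monthly counts can be sketched as below. This is one possible reading of the rule, not the procedure actually used for Table 8: an "isolated" month is assumed here to be a month whose two neighbouring calendar months have no examinations, isolated months are removed before the ends are trimmed, and the function name and the input format (a mapping from (year, month) to the number of examinations) are hypothetical.

```python
def survey_period(counts):
    """Return ((start_year, start_month), (end_year, end_month)) or None.

    counts: dict mapping (year, month) to the number of examinations.
    """
    # Index months consecutively so that neighbours across years are adjacent.
    by_index = {y * 12 + (m - 1): n for (y, m), n in counts.items() if n > 0}

    # Drop isolated months with fewer than 10 examinations
    # (assumption: isolated = no examinations in either neighbouring month).
    kept = {
        i: n for i, n in by_index.items()
        if not (n < 10 and by_index.get(i - 1, 0) == 0 and by_index.get(i + 1, 0) == 0)
    }

    # Trim months with fewer than 5 examinations from both ends.
    months = sorted(kept)
    while months and kept[months[0]] < 5:
        months.pop(0)
    while months and kept[months[-1]] < 5:
        months.pop()
    if not months:
        return None

    def to_year_month(i):
        return (i // 12, i % 12 + 1)

    return to_year_month(months[0]), to_year_month(months[-1])

# Made-up monthly counts, not taken from Table 7.
example = {(1983, 1): 2, (1983, 3): 3, (1983, 4): 120,
           (1983, 5): 200, (1983, 6): 4, (1984, 2): 6}
period = survey_period(example)
```

For these made-up counts the reported period is April to May 1983: the isolated months January 1983 and February 1984 are dropped, and March and June 1983 are trimmed from the ends.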

11. Seasonal differences between survey periods

Because of seasonal variation of some of the cardiovascular risk factors, the study protocol stated: "Screening surveys should be repeated at the same time of year to minimise the effects of seasonal trends".

The seasonal differences between the survey periods can be seen visually in Table 7, Appendix 3 and Table 9. The "Total" row of Table 7 can be used to compare the month distributions between the RUAs or between the surveys within the RUAs. Appendix 3 gives data similar to Table 7 but by sex and 10-year age group. Table 9 gives a graphical representation of the proportions of the survey in each month of the year, distinguishing between the main 80% of the survey, the main 95% of the survey and the tails.

As a quantitative measure of the seasonal overlap between surveys within the RUAs, Table 10 shows the month difference between each pair of surveys by age group and sex and overall. The month difference is defined as the proportion-months needed to move the months of one survey completely on top of the months of the other survey. If the proportional month distributions of the two surveys overlap completely, the month difference is 0. Value 1.0 arises from the situation where there is exactly a month's shift between the surveys, or there is a three month shift for one third of the survey. The maximum value of 6 is attained only in the case where both surveys were done within one month and exactly at the opposite ends of the year. A more accurate definition of the month difference and the algorithm used for computing it for this report is given in Appendix 2.
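
To make the idea concrete, the sketch below computes a measure of this kind as the minimum amount of proportion-months that must be moved around the circle of 12 calendar months (a circular earth mover's distance). This is an assumption about how such a measure could be computed, not the algorithm of Appendix 2, which is not reproduced here; it does, however, give the reference values 0, 1.0 and 6 described above. The function name is hypothetical.

```python
from statistics import median

def month_difference(counts_a, counts_b):
    """Circular transport distance between two monthly distributions.

    counts_a, counts_b: 12 numbers each, examinations in January..December.
    """
    total_a, total_b = sum(counts_a), sum(counts_b)
    prop_a = [c / total_a for c in counts_a]
    prop_b = [c / total_b for c in counts_b]

    # Running surplus of survey A over survey B, month by month.
    surplus, running = [], 0.0
    for a, b in zip(prop_a, prop_b):
        running += a - b
        surplus.append(running)

    # On a circle, the cheapest flow is found by offsetting with the median.
    offset = median(surplus)
    return sum(abs(s - offset) for s in surplus)

january = [1] + [0] * 11
february = [0, 1] + [0] * 10
july = [0] * 6 + [1] + [0] * 5

one_month_shift = month_difference(january, february)
opposite_ends = month_difference(january, july)
```

Here `one_month_shift` is 1.0 and `opposite_ends` is 6.0, while two surveys with identical month distributions give 0.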

The month difference counts the difference between each calendar month equally, and therefore does not distinguish between pairs of months belonging to the same season and pairs belonging to different seasons. Such a distinction between the seasons is probably impossible to make at a general level, as it is likely that the relevant seasons would need to be defined individually for each risk factor and for each country. Nevertheless, the month difference helps us to point out the RUAs where there is lack of overlap between the surveys and it does also take into account the magnitude of the overall shift between the survey months. In the cases where the month difference is large, Tables 7 and 9 and Appendix 3 should be used to find out the nature of the problem.

Table C gives the cumulative number of RUAs where the month difference is more than a given value in at least three 10-year age/sex groups.

Table C. Cumulative number of RUAs with the month difference exceeding the given values in at least three of the six or eight age/sex groups

Survey     Month difference exceeding   Number of RUAs
           > 1.5    > 2.0    > 3.0
Ini-Mid    7        2        0          38
Mid-Fin    7        4        1          37
Ini-Fin    6        3        1          38

For three RUAs it is not possible to calculate the month difference in the initial survey because the month of examination is not known for all Reporting Units involved:

The RUAs with the largest month differences according to Table 10 are:

The large month difference is a major concern in GER-ERF, RUS-MOCa, RUS-NOCa and RUS-NOI. For the other RUAs listed above there is a potential problem, which will require attention from those who are analyzing trends in risk factors.

12. Discussion and recommendations

12.1. Definition of age group

We have considered two ways of defining the age group: DEF1, which is based on the age on the day of examination and DEF2, which is based on the stratification of the sample. DEF2 is more natural to use because it accounts for the sampling weights if the sampling was stratified by the age group. Also, it always makes use of the whole sample. DEF2, however, has a number of shortcomings:

  1. Data for item AGEGRP, on which DEF2 is based, was asked only from the MCCs which stratified the sample by age group. The definition of the sampling age is relevant in the youngest and oldest age groups also in the MCCs which did not stratify their sample by age group.
  2. In eight RUAs data for AGEGRP are not available in the initial survey even though the sample was stratified by age. In two RUAs the age stratification was done by 5-year age group, and therefore the item AGEGRP would not be sufficient to account for the sampling weights.
  3. The actual age range covered by the age groups varies from centre to centre, and there is a 2-3 year shift between the extreme RUAs. This decreases the comparability between the populations.
  4. The actual age range covered by the age groups varies between the surveys within many RUAs. The differences between the surveys in the mean ages within the 10-year age groups are in general larger than for DEF1.

DEF1 has the following shortcomings:

  1. If the sampling scheme is not otherwise taken into account in analysis, DEF1 gives wrong weights to the observations which would belong to a different age group using DEF2. In an extreme situation, some observations which would belong to the population defined using DEF1 had no chance of being selected. This happens at the ends of the overall age range.

Shortcoming 1 of DEF1 cannot be expected to have a major effect because the wrong weighting would concern only a small proportion of the sample and the sampling weights of the neighbouring age groups do not differ much. If those who had no chance of being selected according to DEF1 had a significant influence, the problem should be reflected in the extreme age groups in Table 6. However, one can see from Table 6 that the comparability between the surveys even in the extreme age groups is better using DEF1 than DEF2. Therefore, it is recommended that DEF1 be used for the definition of age group both in cross-sectional and trend analysis.

12.2. Exclusion of populations from the analyses

Because of the very biased age distributions within the 10-year age groups, data from HUN-BUD should not be used for trend analysis.

12.3. Outlying dates of examination

To obviate the effect of incorrect dates of examination on the estimates of risk factor trends, the occasional subjects whose date of examination is more than one month before the start of examination of the others should be excluded from trend analysis. The same concerns the occasional subjects whose date of examination is more than six months after the period of examination of the others.

12.4. Missing year of DEXAM and DBIRTH

The subjects whose year of examination or year of birth is missing should be excluded from analysis.

12.5. Missing day or month of examination or birth

If the day of DEXAM or DBIRTH is missing, but the month and year are available, the day should be interpreted as 15 in all data analyses. If the month of DEXAM or DBIRTH is missing, but the year is available, the day and month should be interpreted as 3006.

12.6. Reporting survey period

When the survey period is reported as a single time interval, the periods given in Table 8 should be used.

12.7. Seasonal differences between the surveys

The seasonal differences between the survey periods may decrease the comparability between the RUAs, and therefore should be considered when drawing conclusions on cross-sectional comparisons.

The large month difference is a major concern in GER-ERF, RUS-MOC, RUS-NOC and RUS-NOI. Also for DEN-GLO, GER-EGE, GER-KMS, HUN-BUD, HUN-PEC, ICE-ICE, ITA-FRI, NEZ-AUC, RUS-MOI, RUS-NOC, SWE-GOT and UNK-GLA the shift between the survey periods may be a potential source of bias for risk factor trends, and therefore requires attention from those who analyse trends in risk factors.

13. Comments on individual RUAs

The following list includes only the RUAs with specific findings or exceptional background information relevant for the use of the data.

AUS-NEW

BEL-CHA

BEL-GHE

CZE-CZE

DEN-GLO

FIN-KUO, FIN-NKA and FIN-TUL

FRA-STR

FRA-TOU

GER-BRE

GER-COT

GER-EGE

GER-ERF

GER-HAC

GER-KMS

GER-RDM

HUN-BUD

HUN-PEC

ICE-ICE

ISR-TEL

ITA-FRI

ITA-LAT

LTU-KAU

MLT-MLT

NEZ-AUC

ROM-BUC

RUS-MOC

RUS-MOI

RUS-NOC

RUS-NOI

SWE-GOT

UNK-GLA

YUG-NOS

References

  1. Tunstall-Pedoe H for the WHO MONICA Project. The World Health Organization MONICA Project (Monitoring Trends and Determinants in Cardiovascular Disease): A major international collaboration. J Clin Epidemiol 1988;41:105-14.
  2. WHO MONICA Project. MONICA Manual. Part III: Population Survey. Section 1: Population survey data component. (December 1997). Available from: URL: http://www.ktl.fi/publications/monica/manual/part3/iii-1.htm, URN:NBN:fi-fe19981151.