Age, Date of Examination and Survey Periods in the MONICA Surveys

Appendix 2. Month difference between two surveys

(See Section 11 and Table 10.)

Our aim is to define a quantitative measure of the overlap between the seasons of two survey periods. There are a number of issues which complicate the definition of such a measure:

  1. The number of subjects may be different in the two surveys;
  2. If the seasons are defined as long time periods, as in the case of four seasons, the definition of the cut points between the seasons may have a large influence on the result;
  3. If the time is considered as continuous, it may become unfeasible to compute the measure.

A practical solution to problems 2 and 3 is to divide the year into 12 periods. The influence of the choice of cut points is then insignificant, and there are no major computational problems. The natural definition of the 12 periods is the calendar months. The problem of differences in survey size can be solved by considering the proportional frequency of each month within the survey rather than the actual numbers of subjects examined.

For the two surveys, survey A and survey B say, we now have the proportions of examinations in each month. For each survey the sum of the proportions over the 12 months equals one. We can make the surveys overlap fully if we move the parts of survey A from the months where A is more frequent than B to the months where B is more frequent than A. We can quantify each such move by multiplying the proportion of survey A moved by the distance (in months) of the move. The total of such moves required to make surveys A and B overlap fully could be used as a measure of the difference between the survey periods, were it not for one major problem: the total depends on the way the moves are done. Each move can be made towards the following months or towards the preceding months (assuming that December precedes January). Both the direction and the order of the moves influence the total of the moves required.

We define the month difference between surveys A and B as the minimum, taken over all possible series of moves that make survey A overlap fully with survey B, of the total of the moves in the series.

The month difference has the following properties:

Example

Figure 1 shows an example where survey A is distributed uniformly over February and March (50% of its examinations in each month), and survey B is distributed uniformly over December, January, February, March and April (20% in each month). To calculate the month difference, we would move 20% of survey A from March to April, 10% from March to January, 10% from February to January and 20% from February to December. The month difference would then be

0.2 × 1 month + 0.1 × 2 months + 0.1 × 1 month + 0.2 × 2 months = 0.9 months.
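The arithmetic of this example can be checked with a short script. The following is a minimal sketch for illustration only, not code from MDC: the month numbering (1 = January), the helper `circular_distance` and the variable names are all assumptions. Exact fractions are used so that no rounding noise enters the comparison.

```python
from fractions import Fraction


def circular_distance(src: int, dst: int) -> int:
    """Shortest distance in months between two calendar months (1-12),
    treating December and January as adjacent."""
    diff = abs(src - dst) % 12
    return min(diff, 12 - diff)


# Distributions from the example: survey A on February and March,
# survey B on December through April.
survey_a = {2: Fraction(1, 2), 3: Fraction(1, 2)}
survey_b = {m: Fraction(1, 5) for m in (12, 1, 2, 3, 4)}

# Moves from the example: (share of survey A moved, from month, to month).
moves = [
    (Fraction(2, 10), 3, 4),   # 20% from March to April
    (Fraction(1, 10), 3, 1),   # 10% from March to January
    (Fraction(1, 10), 2, 1),   # 10% from February to January
    (Fraction(2, 10), 2, 12),  # 20% from February to December
]

# Apply the moves to survey A and accumulate share x distance.
after = dict(survey_a)
total = Fraction(0)
for share, src, dst in moves:
    after[src] -= share
    after[dst] = after.get(dst, 0) + share
    total += share * circular_distance(src, dst)

print(after == survey_b)  # True: the moves make A overlap fully with B
print(float(total))       # 0.9
```

The script confirms only that this particular series of moves turns survey A into survey B at a total cost of 0.9 months; it does not by itself show that no cheaper series exists.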

Figure 1. Example of the month distributions of two surveys


How to compute the month difference between two surveys

The algorithm used in MDC to compute the month difference does not follow the definition above step by step, but it can be shown that it leads to a good approximation of the same result. It is several times faster than the alternative algorithms tested. The steps of the algorithm used are:

  1. Monthly proportions of observations for both surveys were calculated.
  2. The difference between the monthly proportions of the two surveys was calculated. Some of the months had negative and some positive differences. For the rest of the algorithm we need these differences only.
  3. Starting from January, we added its difference to the difference of February. The magnitude of this move from January to February was the absolute value of the original difference in January. Next we moved the sum of the differences of January and February to March, which increased the total magnitude of moves by the absolute value of that sum, that is, of what had accumulated in February, and so on. When we reached December, we had the cumulative magnitude of moves for the year starting in January.
  4. We repeated the same, but starting at February, March, etc.
  5. As an estimate of the month difference we used the minimum of the 12 cumulative magnitudes of moves obtained in steps 3 and 4.
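The steps above can be sketched in a few lines of Python. This is an illustrative reading of the description, not the MDC implementation: the function name `month_difference`, the input format (two lists of 12 monthly counts, January first) and the error handling are assumptions.

```python
from typing import Sequence


def month_difference(counts_a: Sequence[float], counts_b: Sequence[float]) -> float:
    """Estimate the month difference from two lists of 12 monthly counts
    (January first), following steps 1-5 of the description."""
    if len(counts_a) != 12 or len(counts_b) != 12:
        raise ValueError("expected 12 monthly counts per survey")
    total_a, total_b = sum(counts_a), sum(counts_b)
    if total_a <= 0 or total_b <= 0:
        raise ValueError("each survey needs at least one observation")

    # Steps 1-2: monthly proportions and their differences.
    diffs = [a / total_a - b / total_b for a, b in zip(counts_a, counts_b)]

    best = None
    # Steps 3-4: sweep through the year once from each starting month.
    for start in range(12):
        carried = 0.0    # amount carried forward to the next month
        magnitude = 0.0  # cumulative magnitude of moves for this sweep
        for step in range(11):  # 11 moves; the last month passes nothing on
            carried += diffs[(start + step) % 12]
            magnitude += abs(carried)
        # Step 5: keep the smallest of the 12 cumulative magnitudes.
        if best is None or magnitude < best:
            best = magnitude
    return best


# Figure 1 example expressed as counts: A on Feb-Mar, B on Dec-Apr.
example_a = [0, 5, 5, 0, 0, 0, 0, 0, 0, 0, 0, 0]
example_b = [2, 2, 2, 2, 0, 0, 0, 0, 0, 0, 0, 2]
print(round(month_difference(example_a, example_b), 6))  # 0.9
```

Because only the proportions enter the calculation, multiplying all counts of one survey by a constant leaves the result unchanged up to floating-point rounding.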

Mathematical formulation of the algorithm

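Let s(1), …, s(12) be the number of observations in each month in one survey and t(1), …, t(12) the number of observations in the other. Let the total number of observations in the two surveys be

$$S = \sum_{i=1}^{12} s(i)$$

and

$$T = \sum_{i=1}^{12} t(i).$$

Let p(1), …, p(12) be the monthly proportions of observations in one survey and q(1), …, q(12) in the other:

$$p(i) = \frac{s(i)}{S}$$

and

$$q(i) = \frac{t(i)}{T}.$$

From this we get that

$$\sum_{i=1}^{12} p(i) = \sum_{i=1}^{12} q(i) = 1.$$

The difference between the monthly proportions is

$$d(i) = p(i) - q(i), \quad i = 1, \ldots, 12,$$

from which we see that

$$\sum_{i=1}^{12} d(i) = 0.$$

The cumulative magnitude of moves required to make the distributions equal within the range January-December is

$$M(1) = \sum_{k=1}^{11} \left| \sum_{j=1}^{k} d(j) \right|.$$

Similarly the magnitude of moves required to make the distribution equal within the year starting at month i is

$$M(i) = \sum_{k=i}^{i+10} \left| \sum_{j=i}^{k} d(j) \right|, \quad \text{where } d(j) = d(j-12) \text{ for } j > 12.$$

We estimate the month difference by

$$\min_{i = 1, \ldots, 12} M(i).$$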
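
As a worked check, the formulation can be applied to the example of Figure 1. This is a sketch under stated assumptions: d(i) denotes the difference between the monthly proportions of survey A and survey B in month i (1 = January), and M(i) the cumulative magnitude of moves for the year starting at month i; both symbol names are chosen here for illustration.

$$
\begin{aligned}
d &= (-0.2,\ 0.3,\ 0.3,\ -0.2,\ 0,\ 0,\ 0,\ 0,\ 0,\ 0,\ 0,\ -0.2) \\
M(1) &= 0.2 + 0.1 + 0.4 + 0.2 + 7 \times 0.2 = 2.3 \\
M(2) &= 0.3 + 0.6 + 0.4 + 7 \times 0.4 + 0.2 = 4.3 \\
M(3) &= 0.3 + 0.1 + 7 \times 0.1 + 0.1 + 0.3 = 1.5 \\
M(4) &= 0.2 + 7 \times 0.2 + 0.4 + 0.6 + 0.3 = 2.9 \\
M(5) &= \cdots = M(12) = 0.2 + 0.4 + 0.1 + 0.2 = 0.9
\end{aligned}
$$

The minimum, 0.9, is reached for any starting month from May to December and agrees with the value obtained from the explicit moves in the example.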