
Subsample selection after extension of follow-up


© National Institute for Health and Welfare and the MORGAM Project investigators
Last updated: 5 April 2005
For more information, please contact Kari.Kuulasmaa (firstname.lastname@thl.fi)

1. Introduction

As described in Selection of cases and cohort subsample, a case-cohort design is used in the MORGAM genetic study. The size of the subcohort to be selected is determined by the number of CHD and stroke events. Extending the follow-up period increases the number of observed events and hence the desired size of the subcohort. Special consideration is needed when selecting the supplementary subcohort required to reach the new subcohort size.

2. Enlargement of subcohort

For cohort sampling in MORGAM, a subject aged a at baseline is selected into the sample with probability proportional to f(a), where f(a), a function of age, is obtained from the total death rate of the cohort using a logistic model. As the number of deaths increases, the function f will change somewhat, although the change is unlikely to be dramatic.

Let N be the cohort size. Let n1 be the subcohort size and f1(a) the function used for the sampling probabilities based on the shorter follow-up period. In PPS (probability proportional to size) sampling without replacement, individual i is selected into the sample with probability p1i = min(n1 f1(ai)/F1, 1), where F1 is the sum of f1(ai) over all N individuals.

Let n2 (> n1) be the new subcohort size and f2(a) the function used for the sampling probabilities based on the extended follow-up period. In this case, individual i is selected into the sample with probability p2i = min(n2 f2(ai)/F2, 1), where F2 is the sum of f2(ai) over all N individuals.
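The truncated PPS probabilities defined above can be illustrated with a minimal Python sketch. The function name inclusion_probabilities and the f values are hypothetical and chosen only for illustration; they are not MORGAM data or MORGAM code.

```python
def inclusion_probabilities(f_values, n):
    """Return p_i = min(n * f(a_i) / F, 1), where F is the sum of f over the cohort."""
    total = sum(f_values)  # F: sum of f(a_i) over all N individuals
    return [min(n * f / total, 1.0) for f in f_values]


# Hypothetical f(a_i) values for a cohort of N = 5 individuals.
f1 = [0.1, 0.2, 0.3, 0.4, 1.0]
p1 = inclusion_probabilities(f1, n=2)

# With a larger target size the last individual is capped at probability 1.
p2 = inclusion_probabilities(f1, n=3)
```

The same function serves for both phases: it is called with n1 and the f1 values for the shorter follow-up, and with n2 and the f2 values for the extended follow-up.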

The question is how to select a supplementary subcohort of size ns (= n2 - n1) to add to the previously selected subcohort of size n1 so that the final selection probability for individual i is p2i.

Let psi be the required selection probability for individual i at the second phase given that the individual was not selected at the first phase. Because the subcohort sampling is done without replacement, an individual selected at the first phase is not considered for the second phase. Then

p2i = p1i + (1 - p1i) psi,

which gives the following expression for psi:

psi = (p2i - p1i)/(1 - p1i).

Note that psi need not be non-negative: it is negative whenever p2i < p1i. It is, however, always less than or equal to 1, because p2i is always less than or equal to 1.
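As a small numerical check of the relation between p1i, p2i and psi, the following Python sketch solves for psi and shows a case where it becomes negative. The function name second_phase_probability and the probabilities used are illustrative assumptions, not values from the MORGAM cohorts.

```python
def second_phase_probability(p1, p2):
    """Solve p2 = p1 + (1 - p1) * ps for ps; requires p1 < 1."""
    if p1 >= 1.0:
        # Assumption of this sketch: an individual with p1 = 1 is already in
        # the first-phase sample, so no second-phase probability is needed.
        raise ValueError("p1 must be below 1")
    return (p2 - p1) / (1.0 - p1)


# Illustrative values only.
ps_up = second_phase_probability(0.2, 0.3)    # approximately 0.125
ps_down = second_phase_probability(0.3, 0.2)  # negative, because p2 < p1

# Substituting ps back recovers the target probability p2.
recovered = 0.2 + (1.0 - 0.2) * ps_up
```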

We assume that a sample of size n1 has already been selected, with p1i as the ultimate selection probability of individual i. The sample can then be enlarged using the following algorithm:

Step 1: Obtain p2i, i=1, 2, ..., N using the extended follow-up data. Determine the new subcohort size, n2.

Step 2: Calculate psi = max[0, (p2i - p1i)/(1 - p1i)] and ns = n2 - n1.

Step 3: Select ns individuals out of the (N - n1) individuals who were not selected at the first phase, with probability proportional to psi. The sampling is done using the Hanurav-Vijayan algorithm, which is implemented in the SAS procedure PROC SURVEYSELECT as METHOD=PPS.
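Steps 2 and 3 can be sketched in Python as follows. This is not the Hanurav-Vijayan algorithm and not the SAS implementation: for brevity it substitutes simple draw-by-draw weighted sampling without replacement, whose inclusion probabilities are only roughly proportional to psi. The function name enlarge_subcohort, its arguments and the seed are hypothetical.

```python
import random


def enlarge_subcohort(p1, p2, first_phase, n2, seed=0):
    """Pick ns = n2 - n1 extra individuals from those not selected at phase 1.

    p1, p2      -- lists of phase-1 and target probabilities, indexed by individual
    first_phase -- set of indices already selected (size n1)
    """
    rng = random.Random(seed)
    ns = n2 - len(first_phase)

    # Step 2: second-phase weights, truncated at zero.
    weights = {}
    for i in range(len(p1)):
        if i in first_phase or p1[i] >= 1.0:
            continue  # already sampled, or certain to have been sampled
        weights[i] = max(0.0, (p2[i] - p1[i]) / (1.0 - p1[i]))

    eligible = [i for i, w in weights.items() if w > 0]
    if ns > len(eligible):
        raise ValueError("not enough individuals with positive weight")

    # Step 3 (simplified stand-in): draw one individual at a time, with
    # probability proportional to the weights of those still available.
    chosen = []
    for _ in range(ns):
        pick = rng.choices(eligible, weights=[weights[i] for i in eligible], k=1)[0]
        chosen.append(pick)
        eligible.remove(pick)
    return chosen


# Illustrative call: 6 individuals, indices 4 and 5 selected at the first phase.
p1 = [0.2, 0.2, 0.2, 0.4, 0.5, 0.5]
p2 = [0.3, 0.3, 0.3, 0.6, 0.75, 0.75]
extra = enlarge_subcohort(p1, p2, first_phase={4, 5}, n2=3)
```

Individuals whose truncated weight is zero (p2i <= p1i) can never be drawn at the second phase in this sketch, which mirrors the max[0, ...] in Step 2.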

Although psi is only an approximation of the sampling probability, which also depends on the sample that was selected at the first phase, p2i is recorded as the ultimate sampling probability (PROB in Form 65).

References

  1. Kim S, De Gruttola V. Strategies for cohort sampling under Cox proportional hazards model, application to an AIDS trial. Lifetime Data Anal 1999;5:149-172.
  2. SAS Institute Inc. SAS/STAT User's Guide, Version 8, 2000.
  3. Hanurav TV. Optimum Utilization of Auxiliary Information: πPS Sampling of Two Units from a Stratum. Journal of the Royal Statistical Society, Series B 1967;29:374-391.
  4. Vijayan K. An Exact πPS Sampling Scheme: Generalization of a Method of Hanurav. Journal of the Royal Statistical Society, Series B 1968;30:556-566.