WWW-publications from the WHO MONICA Project
April 1999
Marco Ferrario1, Kari Kuulasmaa2, Dusan Grafnetter3 and Vladislav Moltchanov2 for the WHO MONICA Project4
1 Institute of Biomedical Sciences San Gerardo, and Research
Centre on Chronic-Degenerative Diseases, University of Milan, Monza, Italy;
2 MONICA Data Centre, National Public Health Institute, Helsinki, Finland;
3 WHO Lipid Reference Centre, Institute for Clinical and Experimental Medicine
(IKEM), Prague, Czech Republic;
4 Annex: Sites and key personnel of the WHO MONICA
Project.
This document includes the main findings of unpublished reports:
Thanks are due to Liliane Marie Chatenoud and Tuula Virman-Ojanen for their help in collecting the data and preparing the tables on the survey and laboratory procedures, to Hanna Tolonen for her help in preparing the other tables and the figures, and to Alun Evans, who commented on the text. The generous and continual support of Professor Gian Carlo Cesana, Director of the Research Centre for Chronic Degenerative Diseases, University of Milan, is gratefully acknowledged.
The MONICA Centres are funded predominantly by regional and national governments, research councils, and research charities. Coordination is the responsibility of the World Health Organization (WHO), assisted by local fund raising for congresses and workshops. WHO also supports the MONICA Data Centre (MDC) in Helsinki. Not covered by this general description is the ongoing generous support of the MDC by the National Public Health Institute of Finland, and a contribution to WHO from the National Heart, Lung, and Blood Institute, National Institutes of Health, Bethesda, Maryland, USA for support of the MDC. The completion of the MONICA Project is generously assisted through a Concerted Action Grant from the European Community. Likewise appreciated are grants from ASTRA Hässle AB, Sweden, Hoechst AG, Germany, Hoffmann-La Roche AG, Switzerland, the Institut de Recherches Internationales Servier (IRIS), France, and Merck & Co. Inc., New Jersey, USA, to support data analysis and preparation of publications.
"Quid est veritas? Vir qui adest"
Saint Augustine
The main hypothesis of the WHO MONICA Project (28) focuses on the estimate, for the participating populations, of the relationship between 10-year trends in the major CVD risk factors (total cholesterol (TC), blood pressure and cigarette consumption) and 10-year trends in incidence rates (fatal plus non-fatal attack rates) of coronary events. Population surveys conducted at the beginning and the end of the event registration period, to assess coronary risk factor distributions and trends, are part of the core design of the study. An additional optional population survey may be carried out in the middle of the study period.
The MONICA Project, a world-wide model for cardiovascular disease investigation, conducted according to a common protocol and using advanced standardised methods, furnishes a unique data set for testing other relevant hypotheses. With reference to TC, it can be summarised as follows:
It was stated in the study Protocol that the standardisation and control of laboratory standards are crucial, as the levels of change to be detected over time are of the same magnitude as possible laboratory drift. Any collaborating centre should standardise its methods in accordance with the World Health Organization Regional Lipid Reference Centre (WHO-RLRC) in Prague or the Centers for Disease Control (CDC) in Atlanta, USA six months before starting and as often as requested during the study, and should maintain daily laboratory control. The Manual of Lipid Standardisation (29) gives an extensive description of the standardisation procedure.
Like most biological variables, TC measurement is subject to biological and environmental variabilities as well as to variability caused by sampling, storage and analytical procedures. Comparability of cholesterol measurements is influenced by the conditions operating over several steps of the measurement procedure, roughly divided into two stages: the pre-analytic stage (blood drawing conditions, vial handling and storing before analysis) and the analytic stage (different methods, reagents or analysers).
The aim of this report is to focus on relevant sources of variability in TC determination and to evaluate their impact on the comparability of cholesterol levels among MONICA Reporting Unit Aggregates (RUAs) as well as their impact when trends are assessed over time. It should be noted that for trend analyses the quality requirements are usually higher than for cross-sectional comparisons because the 5- or 10-year cholesterol changes in a population are usually smaller than the differences across populations.
This report considers the Reporting Unit Aggregates (RUAs) which are potential candidates for units of analysis for the MONICA data. The RUAs, their abbreviations and Reporting Units (RUs) are listed in Table 1. Some of the RUAs have several versions because different combinations of RUs may be used for cross-sectional and trend analyses if all RUs of the population were not included in all surveys. Therefore, in AUS-PER, GER-BRE, GER-EGE, GER-KMS, GER-RDM, RUS-MOI and RUS-NOC there is an overlap of RUs included in the RUAs in some surveys. The RUAs are identified by the abbreviation and a version letter. For UNK-GLAa, which carried out four surveys, the first (initial), third (middle) and fourth (final) surveys are considered.
Compared with other survey quality assessment reports, in the current document GER-RDM and ITA-FRI have been split into smaller RUAs according to the different laboratories used by their different RUs in some or all of the surveys. Altogether 57 RUAs are considered for the initial survey, 44 for the middle, and 41 for the final survey.
Information on methods adopted in the different populations was originally gathered from the site visit questionnaires (sections: Taking blood samples and Preparation of plasma/serum samples, MONICA Memo 68, pages 17-33) and was updated with details mainly from questionnaires on survey procedures (Questionnaires on MONICA Population Survey Procedures, Form VI, pages 31-37). At the time of the preparation of the present report, this information was not available for four RUAs which participated in the initial surveys (GER-RHN, ISR-TEL, MLT-MLT and ROM-BUC). Moreover, other information was collected directly by the WHO-RLRC or the Lipid Quality Assessment Working Group, which prepared this Report.
The external quality control data were obtained from WHO-RLRC for the Centres standardized by it and from the MCCs which were standardized by CDC.
Some of the laboratories measured cholesterol in mmol/l to one decimal, others in mg/dl without decimals. The external quality control data were reported to the WHO-RLRC and the actual cholesterol data to the MONICA Data Centre (MDC) in the original units. The WHO-RLRC and the MDC have used the units mmol/l in all reports. To convert between the units: 1 mg/dl = 0.025864 mmol/l.
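The conversion above can be sketched in a few lines of Python. This is only an illustration: the function names are hypothetical, and the only value taken from the report is the factor 1 mg/dl = 0.025864 mmol/l.

```python
# Conversion factor quoted in the report: 1 mg/dl = 0.025864 mmol/l.
MGDL_TO_MMOLL = 0.025864


def mgdl_to_mmoll(value_mgdl: float) -> float:
    """Convert a total cholesterol value from mg/dl to mmol/l."""
    return value_mgdl * MGDL_TO_MMOLL


def mmoll_to_mgdl(value_mmoll: float) -> float:
    """Convert a total cholesterol value from mmol/l to mg/dl."""
    return value_mmoll / MGDL_TO_MMOLL


# Illustrative value only: 200 mg/dl expressed in mmol/l, to two decimals.
print(round(mgdl_to_mmoll(200), 2))
```

Since some laboratories reported mg/dl without decimals and others mmol/l to one decimal, the rounding of converted values should be chosen with the original reporting precision in mind.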
The standardisation of a measurement procedure consists of establishing clear and feasible guidelines and standards, which need to be adopted and implemented in the field centres. Table 2 presents RUA-specific self-reported information on training, certification, and testing of the personnel involved in blood drawing and vial handling in the first and the middle MONICA surveys, respectively. Similarly, Table 3 reports analogous information for laboratory team members directly involved in preparation of serum/plasma samples and TC determinations, separately for the two surveys. In MONICA, personnel training appears to be a very common method of standardising TC procedures in all surveys for either blood drawing or laboratory staff. A few exceptions are represented by SWI-TIC in the initial survey and FRA-TOU in all surveys. On the other hand, certification and testing were adopted by a relatively small number of Principal Investigators (PIs) in both surveys. Recertification was not used at all. It is notable that RUAs which enrolled a higher number of technicians are the ones in which training, certification and testing were promoted most. No improvements over time in these standards have been reported, with a few exceptions: FRA-LILa and UNK-GLAa introduced testing of personnel and FRA-STRa and CHN-BEIa introduced certification of personnel in the final survey.
In the MONICA Project, precise standardisation guidelines are given in section 6 of the Manual of Standardisation of Lipid Measurements (29).
Pre-analytic sources of variation have been discussed extensively and recommendations have been reinforced in the QA-TC First Surveys Report (31). In this final report emphasis will be placed on major contributors to variability within the MONICA Project.
In MNM 175A most of these sources of variation were considered extensively and ruled out as potential confounders for cross-sectional, between-centre comparisons, as well as for longitudinal within-centre trend analyses. Included in this section is tourniquet use, because more extensive information gathered from MCCs indicates that it is a minor source of variation in the MONICA Project.
Diurnal variation in TC levels is estimated to be about 2.5% (7). According to Cooper (4), TC diurnal variations are detectable if large fluctuations of triglycerides, which are not only related to food intake, occur during the day. In most subjects, the time of blood drawing does not seem to be important. The fasting time may make a difference in cholesterol levels for a small percentage of people (3). Mayer did not find significant differences in mean cholesterol levels measured at different times of the day in a large screening programme (17). MONICA recommendations state that overnight fasting samples should be used, but non-fasting samples would also be accepted. In addition, MCCs must standardise their procedures so that they are consistent across surveys.
In the initial MONICA surveys, overnight fasting samples were collected in 23 out of 55 RUAs and blood samples were taken after at least 4 hours of fasting in 5 RUAs (Table 4). Considering changes over time, AUS-NEWa shifted from overnight fasting to non-fasting in the middle and final surveys and FRA-STRa from non-fasting to overnight fasting in the final survey. In most non-fasting populations neither instructions to avoid heavy meals nor records of food intake before examination were applied. This may be an effect of having made little mention of these potential sources of variation in MNM 175A (31).
Strenuous exercise 15 minutes before blood drawing increases cholesterol levels by about 6% (5). For this reason it is commonly reported that strenuous exercise should be avoided 2-3 hours before blood sampling. Under normal examination conditions people will stay for some time in the examination centres; even if they do not, it would still be unlikely that they would have strenuous exercise before blood drawing. For this reason in MONICA no specific information was gathered.
Rapid changes of body weight are followed by a decrease in cholesterol values (5). Cholesterol levels also change during pregnancy and disease: both chronic (myocardial infarction, nephrosis, liver diseases, hypothyroidism) and acute (infections). The MONICA Manual (29) recommends keeping records, but subjects with such conditions are part of the population sample and must be included.
Cholesterol levels are influenced by several drugs: propranolol and other beta-blocker agents, diuretics, lipid-lowering drugs, oral contraceptives, hormone replacement medications and sedatives. In the middle and final MONICA surveys, individual information is recorded for antihypertensive and lipid-lowering drugs for some populations. For women, pregnancy and use of oral contraceptives or hormone replacement should have been recorded in the last two surveys. People who take such drugs are part of the population and should be included. Moreover, the impact of using selected medications on TC mean values may be explored in some RUAs for which specific information on survey participants is available.
Prolonged venous occlusion produced by tourniquet use is associated with higher cholesterol values compared to those obtained by blood drawing without tourniquet use. Levels can increase by 10-15% after 10 minutes of venous occlusion and 2-5% after 2 minutes. Tourniquet use up to one minute is not associated with any significant increase in cholesterol levels. Statland and co-authors (24) found a significant increase in TC levels (3.5%) when the tourniquet was used for 3 minutes. According to Naito (19), well trained technicians need less than one minute for drawing one vial. Tan and co-authors (26) did not find any significant changes in serum cholesterol concentration when the tourniquet was applied for 0.5 to 1 minute. It has also been demonstrated that the use of a low-pressure tourniquet for 3 minutes does not significantly affect the concentration of serum constituents including TC (6). Vacuum sealed containers, if correctly used, should not affect these findings.
MONICA recommendations stated that the tourniquet should be avoided: in the initial MONICA survey (Table 5) tourniquets were seldom used in only four RUAs while in the large majority (46 out of 55) a tourniquet was used for less than one minute. This statement seems to be convincing because the time needed for blood drawing had to be short when very few tubes for each blood drawing were produced, or the tube for TC determinations was first or second in the order when vacuum sealed containers were used. The tourniquet for the entire blood drawing was used for only four RUAs in the initial survey (BEL-CHAa, BEL-GHEa, BEL-LUXa and CAN-HALa). Moreover in two of them (BEL-CHAa and BEL-GHEa), even if it was decided to shift in the following surveys to other and unspecified use of tourniquets, the time used for filling the vials is expected to be very short because only two tubes were collected. A few RUAs shifted from less than one minute's use to infrequent use, but such changes are not believed to have introduced any relevant bias. Finally, for three RUAs in the initial survey (GER-RHNa, ISR-TELa, and MLT-MLTa) this information was not provided.
For serum samples, blood should be centrifuged within two hours after blood drawing, but longer storage at different temperatures does not seem to be associated with relevant differences in TC levels (20). For plasma samples, blood should be cooled in an ice bath and centrifuged as soon as possible; then the plasma should be separated immediately after centrifuging. When centrifuging, the vials should be closed tightly to avoid evaporation. This holds for both serum and plasma samples.
In the initial MONICA survey (Table 6) 41 out of 43 RUAs, which performed TC determinations on sera, stored samples before centrifugation for up to 4 hours, and most of them at room temperature. Only a few populations kept vials refrigerated at 4°C or less. Among the 9 RUAs which used plasma in the initial survey, in four (GER-RHNa, POL-TARa, POL-WARa and SWI-VAFa) samples were not refrigerated. Moreover in one of them (POL-TARa) vials were stored for 24 hours. These procedures have been reported to be consistent throughout surveys. Finally, for three RUAs in the initial survey (ISR-TELa, MLT-MLTa and ROM-BUCa) this information was not provided.
For cross-sectional comparisons, the consequences of having adopted such different procedures are probably more critical for laboratories which used plasma stored at room temperature, but the amount of bias and/or increase in variation introduced is not clearly identifiable. For trend analyses, because the procedures were reported to be consistent across surveys, no important between-survey biases seem to have been introduced for most populations which used either serum or plasma.
Haemolysis may occur during blood drawing and handling. It will result in higher cholesterol values, especially if the direct method is used. Haemolysed samples, according to the Manual, should be discarded. From the information provided, in all MONICA surveys centrifuged samples were checked for haemolysis in all RUAs (Table 6).
It is known that using the average of two or more replicates (repeated determinations on the same patient sample) will increase the precision of the estimates, but it should have small effects on the accuracy of TC values. This procedure should be distinguished from the collection of multiple specimens for a single subject at different times (7). In MONICA, according to the information provided, only two RUAs (CZE-CZEa and UNK-GLAa) performed two determinations for each subject (Table 6) in all surveys. For such populations, if averaged values were sent to the MDC, random errors should be expected to be somewhat smaller, and the power to detect between-survey TC differences is expected to be greater.
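The gain in precision from replicates can be illustrated with a minimal sketch. It assumes that replicate determinations on the same sample have independent random errors with a common standard deviation; the value of 0.10 mmol/l used below is purely hypothetical and is not taken from MONICA data. Under that assumption the random error of the average of n replicates shrinks by a factor of the square root of n, while any systematic bias is unchanged.

```python
import math


def sd_of_replicate_mean(analytical_sd: float, n_replicates: int) -> float:
    """SD of the average of n independent replicates with a common SD.

    Assumes independent random errors; a systematic bias (inaccuracy)
    is not reduced by averaging.
    """
    if n_replicates < 1:
        raise ValueError("n_replicates must be at least 1")
    return analytical_sd / math.sqrt(n_replicates)


# Hypothetical analytical SD of a single determination, in mmol/l.
single_sd = 0.10

print(round(sd_of_replicate_mean(single_sd, 1), 3))  # single determination
print(round(sd_of_replicate_mean(single_sd, 2), 3))  # average of duplicates
```

With duplicates the random analytical error is reduced by a factor of about 1.41, which is consistent with the statement that averaged values improve precision but not accuracy.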
Storage using refrigeration or at room temperature between centrifuging and analysis does not seem to be crucial if the material is analysed within a few days and bacterial contamination is avoided. Freezing in appropriate vials is acceptable if at a temperature of -20°C for 1 year or at a temperature of -60°C for a longer period (22). TC values are not influenced appreciably when vials are kept tightly closed. Sometimes problems occur if refrigerators are not protected against prolonged electricity failure. MONICA recommendations cover this.
The wide range of storage procedures adopted in MCCs is due to very different organisational conditions (availability of the collaborating laboratory in specific time periods, distance between field centres and laboratories, dispatching systems, etc.), which probably cannot be further standardised. In spite of this, in all MONICA surveys the large majority of RUAs reported having complied with MONICA recommendations (Table 7). A borderline case is represented by HUN-BUDa, in which samples were deep frozen at -18°C for 12 weeks in both initial and middle surveys. The most relevant exception is represented by POL-TARa in the initial survey, in which samples were kept frozen at only -20°C for more than two years: reported TC values are expected to be lower than if examined under the recommended conditions, and therefore the estimates of TC trends may have a positive bias of unknown magnitude.
Some studies showed that higher TC levels were found in winter than in summer (1). Gordon (11,12), analyzing LRC follow-up data, found differences of 0.19 mmol/l in the mean levels between June and December. The findings were demonstrated to be consistent for all ten Centres participating in the LRC Study, in very different climatic conditions. The reported difference corresponds to a relative difference of approximately 2.5% at the most, at a population mean of about 7.07 mmol/l. More recently Råstam et al. (21), using data from a community-based screening programme carried out in Minnesota, reported a significant cyclic time-trend in cholesterol levels, with a peak in January. The 95% confidence interval of the peak-to-trough distance was 0.15-0.36 mmol/l in men, which corresponds to 2.6%-6.3% of the average cholesterol level. In women, the corresponding figures were 0.05-0.24 mmol/l, or 1.0%-4.6%. Using the National Cholesterol Education Program cut-off point (6.17 mmol/l), the authors also estimated that in men the age-adjusted prevalence of values above this cut-off in winter was about double that in summer (25.4% vs 13.5%).
MONICA guidelines recommend that MCCs should collect blood samples in comparable seasonal periods. Moreover, MCCs collecting blood samples during the course of the year should randomise blood taking to sex and age groups in order to obtain the same number of subjects of different categories in comparable time periods.
In MONICA, more important than the comparability between the RUAs is the comparability between surveys within each RUA. The similarity between the survey seasons was assessed in the report "Age, date of examination and survey periods" (30), where a concept called "month difference" (MON-DIF) was used as a measure of the lack of overlap between the surveys. The MON-DIF does not take into account the actual seasonal fluctuation of TC in the population. However, under certain assumptions concerning the seasonal fluctuation of TC, the MON-DIF provides simple upper limits for the seasonal bias in TC trends.
Let us assume:
Then the bias in the risk factor difference between the two surveys is between 0 and XY.
The same result is valid if we replace assumptions a) and b) by
For TC the literature review above suggests a seasonal variation of about 0.3 mmol/l or less. Therefore, a reasonable upper limit for 6X would be 0.3 mmol/l, i.e. X = 0.05 mmol/l for consecutive months. For X = 0.05 mmol/l we get:
| MON-DIF (Y) | Maximum bias (XY) |
| 1 month | 0.05 mmol/l |
| 2 months | 0.1 mmol/l |
| 3 months | 0.15 mmol/l |
| 4 months | 0.2 mmol/l |
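The upper limits in the table are simply the product XY. A minimal sketch of the arithmetic follows, using X = 0.05 mmol/l per month as derived above; the function name is hypothetical and the calculation is only as valid as the stated assumptions on seasonal fluctuation.

```python
# Upper limit for the TC change between consecutive months, in mmol/l,
# obtained above from 6X = 0.3 mmol/l.
X_PER_MONTH = 0.05


def max_seasonal_bias(mon_dif_months: float, x_per_month: float = X_PER_MONTH) -> float:
    """Upper limit (XY) of the seasonal bias for a given MON-DIF (Y), in mmol/l."""
    if mon_dif_months < 0:
        raise ValueError("MON-DIF cannot be negative")
    return x_per_month * mon_dif_months


# Reproduce the rows of the table for MON-DIF of 1 to 4 months.
for months in (1, 2, 3, 4):
    print(months, round(max_seasonal_bias(months), 2))
```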
In conclusion, the bias related to a MON-DIF of about two months is likely to be less than 0.1 mmol/l, and may be much less depending on the true seasonal fluctuation of TC and the actual survey periods. A MON-DIF of about 2 months or more is found in:
For three RUAs (GER-ERFa, GER-EGEb and GER-KMSa) the exact date of examinations is unknown. Therefore, only approximations of the seasonal shift can be made. Only for GER-ERF is there evidence of a seasonal shift: the initial survey was carried out during May and June whereas the middle and final surveys were done during autumn and winter.
The existence of a possible seasonal variation of up to about 0.2 mmol/l should be acknowledged in cross-sectional comparisons of cholesterol levels. For trend analysis seasonal variation is a minor source of bias for most RUAs. However, for the RUAs with a MON-DIF of 2 months or more and evidence of a shift between summer and winter, the possible bias of up to 0.2 mmol/l may become important if combined with other sources of bias. Seasonal biases on TC trends are expected to be relevant for the following RUAs: DEN-GLOa, GER-ERF, HUN-BUDa, ITA-FRIb, ITA-FRIc, RUS-MOCa, RUS-MOIa, RUS-NOCa, RUS-NOIa and UNK-GLAa.
Several studies have shown that the posture of the subject when blood is drawn influences the analysis of non-filterable blood constituents (8, 9, 10, 14, 15, 24, 25, 26). Most of these studies have dealt with changes in standing and supine postures. The differences are caused mainly by shifts in the plasma volume, even though effects of the sympathetic nervous system may play some role (5). Hagan and co-authors (14) estimated a 13.8% reduction in plasma volume when position changed from 30 minutes lying to 30 minutes standing, corresponding to an increase in TC of 9.3%. According to Felding (9) the difference between sitting and lying is about 6.5%, with larger differences in women (7-9%) than in men (4.5-6%). Calculations from other studies (25, 26), i.e. differences between standing-lying and standing-sitting changes, reveal about a 4-6% decrease in TC when blood is drawn in the supine posture in comparison with the sitting position. MONICA recommendations state that blood should be drawn from subjects in the sitting posture.
In five RUAs (BEL-CHAa, BEL-GHEa, BEL-LUXa, DEN-GLOa and NEZ-AUCa) blood was drawn from supine subjects in all surveys (Table 4). No change over time has been detected except for FRA-STRa, where blood was reported to have been drawn from sitting subjects in the initial survey and from sitting or supine subjects in the final survey, with half of them in each posture.
For cross-sectional comparisons a bias of 4-6% lower TC when blood is drawn in the supine position in comparison with the sitting posture should be acknowledged for the following RUAs: BEL-CHAa, BEL-GHEa, BEL-LUXa, DEN-GLOa and NEZ-AUCa.
Some anticoagulants produce a shift of water from red blood cells to plasma. The use of heparin produces a small difference (<1%) in TC in comparison to values obtained by analyzing serum. Use of disodium ethylenediaminetetraacetate (EDTA), at concentrations of 1 mg/ml, results in a difference of about 3% (13, 16). For determination of TC, EDTA is recommended as the reference anticoagulant. According to Cloey et al. (2), plasma cholesterol concentration is 4.7% lower than that in serum samples, and plasma and serum values are highly correlated (r = 0.994). The higher plasma-serum differences were attributed to the higher EDTA concentrations in commercial tubes in recent years.
In the initial survey 9 RUAs carried out TC determination on plasma: two of them used heparin as anticoagulant (GER-RHNa and SWI-VAFa) and the other seven used EDTA (AUS-NEWa, AUS-PERa, FRA-STRa, FRA-TOUa, POL-TARa, POL-WARa, and USA-STAa) (Table 6). All other RUAs used serum in the initial and in subsequent surveys.
Considering changes over time, SWI-VAFa shifted from plasma-heparin to serum in the following two surveys, AUS-NEWa from plasma-EDTA in the initial survey to serum in the two subsequent surveys, and FRA-TOU from plasma-EDTA in the first two surveys to serum in the final survey. For the other RUAs there was no change in the analyte used for TC determinations.
For cross-sectional analysis, the MONICA RUAs in which EDTA-plasma was used (AUS-NEWa, AUS-PERa, FRA-STRa, FRA-TOUa, POL-TARa, POL-WARa, and USA-STAa) should be acknowledged in publications, and the expected biases given (3% lower TC values when using data from the initial survey and 3-4.5% lower values for the other two surveys). For trend analysis in AUS-NEWa an expected bias of 3% higher values in the last two surveys should be acknowledged.
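To give a sense of scale, the following sketch shows how a change of analyte alone could produce an apparent trend. It is an illustration and not a MONICA correction procedure: it assumes a constant proportional plasma-serum difference of 3%, a true population mean that does not change between surveys, and a hypothetical mean of 5.8 mmol/l.

```python
def apparent_change_from_analyte_switch(
    true_mean_mmoll: float, plasma_bias_fraction: float = 0.03
) -> float:
    """Apparent change in mean TC (mmol/l) when an earlier survey used EDTA
    plasma and a later survey used serum, with no true change in the mean.

    plasma_bias_fraction is the assumed proportion by which plasma values
    are lower than serum values (0.03 = 3%).
    """
    plasma_reading = true_mean_mmoll * (1 - plasma_bias_fraction)
    serum_reading = true_mean_mmoll
    return serum_reading - plasma_reading


# Hypothetical true mean of 5.8 mmol/l, unchanged between surveys.
print(round(apparent_change_from_analyte_switch(5.8), 3))
```

Under these assumptions the artefactual increase is of the same order as the seasonal and laboratory effects discussed elsewhere in this report, which is why such a switch needs to be acknowledged in trend analyses.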
Total Cholesterol measurement methods adopted in MONICA populations differ. According to MONICA recommendations enzymatic methods are preferable, but other methods are acceptable, provided that their performance is within the limits of the external quality control. In fact, the results of the external quality control, if pool samples did not interfere with either chemical or enzymatic methods, reflect the bias of the method, so it does not matter if an extraction, a direct or an enzymatic method is used. For this reason only one external quality control system has been set up in MONICA. In a subsequent paragraph (3.4.2) criteria and methods are given to control for discrepancies in EQC results between different methods for some pools.
Table 8 summarises the methods used in the three surveys. In the initial MONICA survey the recommended enzymatic method was used in 34 RUAs, the direct method in 10 RUAs, the extraction method in 6, and both the extraction and the enzymatic methods in 5 (it is not clear in what proportions or if in duplicate). Considering changes over time, none of the RUAs which started using the enzymatic method shifted to other methods, apart from GER-BREa and GER-BREb in the middle survey in which both the enzymatic and the direct methods were used, probably in duplicate. Of the 9 RUAs which started using the direct method and carried out the middle and/or the final surveys, 8 continued using the same method in the middle survey, but two shifted to the enzymatic method in the final survey (GER-EGEb and POL-WARa), five RUAs did not perform the final survey and for GER-ERFa, the information on the method used in the final survey is missing. Only one RUA (POL-TARa) used the direct method in all three surveys. HUN-BUDa used the direct method in the initial survey, shifting to the enzymatic one in the middle. All the five RUAs which started using both the extraction and the enzymatic method in the initial survey continued using the enzymatic method alone in the subsequent surveys. The RUAs which started using the extraction method, changed to the enzymatic method in the middle or final survey.
In a significant number of RUAs the methods changed from one survey to another. This usually happened when the enzymatic method was not used in the initial survey. It should be noted that the EQC system may be considered equally valid independent of the methods used. This assumes that the different pool samples did not interfere with the chemical or the enzymatic methods and that method-specific correcting factors used to evaluate EQC lyophilised pools were adopted. It should be noted that calibration problems sometimes occurred when methods were changed in the period immediately before the start of the second analysis period in some populations.
The Lipid Manual stated that the suitability of selected enzymatic methods should be checked by means of the EQC. As far as is known this preliminary assessment was not done in most of the MONICA laboratories at the beginning of the study, but it should be recognised that some laboratories changed their methods and instruments on the basis of the experience, and as a result of the standardisation process undertaken within the MONICA Project.
The use of a number of commercial brands of enzymatic reagents for TC analysis might diminish the comparability of results between laboratories due to, for example, non-homogeneity of production lots or due to loss of kit enzyme activity during storage. Information collected at the WHO-RLRC showed that most of the participating laboratories worked, or were going to work, with the commercial brand which is widely used in Europe.
In recent years results have accumulated indicating that, within the enzymatic method, some instrument/reagent systems appear to be particularly sensitive to matrix effects (see 5.4.2), when non-fresh EQC pools are used. In this case the detected bias from using non-fresh EQC pools may not reflect the real bias due to an inaccurate calibration of the instruments, i.e. TC determinations on fresh samples are unbiased. This problem was also reported by Cooper in 1988 (4). For some of these instrument/reagent systems the estimated biases are known (19).
In MONICA it was not possible to correct EQC results for such discrepancies, but on the basis of the EQC results of specific laboratories, it was possible to identify such problems, and if present, the laboratories were contacted directly by the WHO-RLRC and the problems were usually resolved before starting the analysis period.
Finally, in 46 out of 55 RUAs (84%) an automated instrument was used in the initial survey. The proportion was the same in the middle survey (84%, i.e. 37 out of 44 RUAs), while the use of automated instruments increased in the final survey to 93% (39 out of 42), including a semi-automated one.
Information on automated or manual methods is missing for GER-ERFa's final survey, HUN-BUDa's middle survey and YUG-NOSa's middle survey.
Calibration problems have been identified as the most important factor in determining the accuracy of TC assay. If present they introduce a real bias. It has been recommended that primary standards and/or secondary serum/plasma calibrators be used, at least in duplicate, for calibration. Each participating laboratory is responsible for its own analytical primary standards and/or secondary serum calibrators.
It is assumed that all participating laboratories will use the best and most appropriate pure substances and reagents. The cholesterol substance used for preparation of standards should be of more than 99% purity. Secondary (serum) calibrators should preferably be prepared or labelled with the correct TC concentrations in the WHO-RLRC or CDC. Some TC methods, the so-called "direct" Liebermann-Burchard chemical methods, cannot be calibrated by water-soluble standards (falsely high results might be obtained without special precautions), but the linearity of the response of these methods can be judged on the basis of these standards.
During the pre-standardisation period the WHO-RLRC distributed a set of three TC standards designed for testing linearity over the working range. It has been stated (5,23) that all cholesterol analytical systems should be calibrated with fresh specimens. This procedure, if adopted, may have produced discrepancies with the results obtained by using secondary serum/plasma calibrators.
External quality control was performed to control the analytic stage of the measurement procedure. Comparability of the results achieved by different methods of analysis and changes taking place over time need to be assessed. EQC in MONICA has been provided by the WHO-RLRC, in Prague (Czech Republic) at the Institute for Clinical and Experimental Medicine (IKEM), under the coordination of Dr. Dusan Grafnetter and Dr. Rudolf Poledne. The EQC established within MONICA was designed to detect inaccurate or imprecise TC determination methods and also to discover relevant shifts in accuracy over time which affect estimates of TC trends. It has also been stressed that residual drifts may be present in laboratories, even when well standardised. Such shortcomings are beyond the scope of the standardisation procedure which has been recognised as valid for the purpose of estimating TC trends in a range of populations in different parts of the world.
Moreover, it is clearly stated in the MONICA Manual (29) that the EQC is complementary to the Internal Quality Control (IQC) and its main purpose is to check on accuracy (assessment of bias), although it also supplies laboratories with evaluation concerning overall, between-day (between-run) and within-run variability. Regular use of IQC is a prerequisite for successful participation in the EQA programme.
This procedure is extensively reported in the MONICA Manual, Lipid Standardisation Section, paragraph 8 (29). The fundamental aspects are reported here briefly. Before starting the analysis period, each participating laboratory (Table 9) is expected to analyse one or two self-evaluation sets, with known concentrations of analytes, provided by the reference laboratories, i.e. WHO-RLRC or CDC (Table 9). Since WHO-RLRC acted as the reference laboratory for the large majority of MONICA collaborating laboratories, the EQC summarised below is the one adopted by WHO-RLRC, even though the procedure adopted by CDC was very similar. After this "open" period, a "blind" EQC system is set up to evaluate laboratory performance throughout the entire analysis period.
Essentially this control system is based on the repeated analysis of sets of lyophilised control samples shipped at regular intervals to laboratories. Each set usually contains three or four pools, with analytes at different concentrations (usually 14-21 samples per set). Sets have been sent periodically to participating laboratories during the analysis period of each survey and, less frequently during intervals between surveys. Control samples should be used in the laboratories according to the specified instructions for reconstitution and sequence of samples.
The WHO-RLRC is responsible for sending the EQC sets. The number of sets distributed depends on the length of the analysis period. Laboratories are requested to complete analysis of control sets and report the results within two months at the latest. The WHO-RLRC dispatches further sets only on receipt of the results of the previous set. It is stated in the MONICA Manual (29) that the failure of a laboratory to undertake sufficient EQC may result in elimination of its TC results from the MONICA data analysis.
Reference values (RV) for each pool were obtained in the WHO-RLRC by the modified Abell-Kendall method, in collaboration with CDC. As a secondary check in obtaining the RVs, a CHOD-PAP enzymatic TC method with practically 100% cholesterol-ester hydrolysis, as well as the direct-chemical method, was used.
For any of the control pools, calculated bias (based on the relative differences between the observed pool-means and the pool-RVs) should not be greater than 5%. At the same time standard deviations should be smaller than those reported in Table 2 of the Manual of Lipid Standardisation (29).
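The bias check described above can be sketched as follows. This is only an illustration of the arithmetic, not the WHO-RLRC software: the function names and the example values (three replicate results and a reference value in mmol/l) are hypothetical.

```python
def pool_bias_percent(observed_values, reference_value):
    """Relative difference (%) between the observed pool mean and the pool RV."""
    if not observed_values:
        raise ValueError("at least one observed value is required")
    pool_mean = sum(observed_values) / len(observed_values)
    return 100.0 * (pool_mean - reference_value) / reference_value


def bias_within_limit(bias_percent, limit=5.0):
    """True if the bias does not exceed the limit (5% by default) in either direction."""
    return abs(bias_percent) <= limit


# Hypothetical pool: three replicate results against a reference value of 5.20
bias = pool_bias_percent([5.30, 5.40, 5.35], 5.20)
print(round(bias, 2), bias_within_limit(bias))
```

The limits for the standard deviations are not reproduced here because they are given only in Table 2 of the Manual.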
Participating laboratories have been supplied with the relevant information on the results of each control set, communicating the overall set and pool-specific means, biases, between-run and within-run standard deviations.
The matrix is the environment surrounding a given analyte. Most reference materials contain one or more components in addition to the analyte. The sum of the major and minor components and their structures in which the analyte is embedded, is referred to as the "matrix".
As stated by Naito (18, 19) as well as other experts, "a matrix effect may be observed with some analytical systems, which may give an enzymatically determined TC value that is different from that of the nonenzymatic reference method. This, however, does not mean that the enzymatic system is inaccurate. Final proof of accuracy should always be based on analysis of fresh human samples. ... Whenever specimens are altered from the native state, such as by lyophilisation, ... laboratories may not be able to get many results equal to the 'true' values assigned to these materials."
It follows that, to evaluate the methods correctly, one needs control materials resembling, in their properties, fresh serum or plasma samples. If there is no matrix effect connected with the control material, the bias detected by EQC pools is a 'true' one which affects the subjects' samples and is due to errors in calibration, sample volume pipetting, possible kit expiration, non-linear response of the method, linear calibration not passing through the origin, non-adherence to methodological instructions, the effect of the analytical instruments, calculation errors, etc.
The non-uniformity of methods used in MONICA complicated the bias evaluation. As long as direct chemical methods were in use, sucrose could not be added as a stabilizer before lyophilisation of control pools because of a strong interference with direct methods. Adding sucrose to the EQC pools stabilised them (especially for HDL-cholesterol EQC pools), ensuring better solubility and less turbidity on reconstitution.
It is impossible to prepare ideal and universally usable control pools free of interference and matrix effects. The only procedure which helps solve the problem is to use fresh EQC pools. This procedure, however, was developed after MONICA started and it would be very difficult and costly to pursue in a world-wide standardisation programme. Even fresh pools may be affected by matrix effects, but these can be easily detected. Another way to overcome this problem would be to continue preparing many control pools, each containing thousands of samples, test them by various methods, and only use those pools which in quality control showed full agreement between Abell-Kendall and consensus results. In adopting such a procedure, the majority of prepared pools would have had to be discarded at no small cost. Thus, the adopted procedure was a compromise. In practice it meant establishing method-specific reference values for pools for which the Abell-Kendall reference values were unattainable using direct or enzymatic methods due to the matrix effects, and deriving correction factors for the detected biases.
The WHO-RLRC applied such corrections in a limited number of pools to ensure the best possible reference values.
The procedure for testing the pools for matrix effects was based on this principle: lyophilised samples of the prepared quality control pools were analysed repeatedly in numerous analytical runs on different days together with sets of fresh sera approximately matching the TC concentrations of the pools.
Analyses were performed by:
The values obtained on control pools using direct and enzymatic methods were compared with those obtained with the reference extraction method, and for a few pools average biases were calculated. Such correction factors were confirmed by analyzing fresh serum samples in the same runs.
It should be stressed that the reference values of the EQC pools were corrected for matrix effects due to differences in methods, but it was impossible to correct for differences in matrix effects due to the specific instrument and reagent systems of some enzymatic methods. Such residual matrix related bias was detected in EQC results from some MONICA populations, because at least three different pools were included in each EQC set.
As noted above, the reference values of the EQC pools in WHO-RLRC were obtained using the Abell-Kendall extraction method, which is known to be largely unaffected by matrix effects. In the 1970s the WHO-LRC took part in the CDC Cooperative Cholesterol Standardization Programme and was awarded a certificate. The WHO-RLRC continued sending most of its lyophilized control pools to CDC where the pools were analyzed by the Abell-Kendall method, and the mean values were reported back to WHO-RLRC. The comparison of the reference values of WHO-RLRC and the values obtained by CDC during the years 1982-1996 is shown in Figure 1. Since 1993 WHO-RLRC has been taking part in the full CDC standardization programme, in which CDC sent control samples to WHO-RLRC with blinded reference values. The comparison of the reference values of the CDC control pools and the values obtained by WHO-RLRC using the Abell-Kendall method during the years 1993-1996 is shown in Figure 2.
Figures 1 and 2 show good agreement between the two laboratories and there is no indication of drift between them.
In the MNM 175A (31) two approaches to External Quality Assessment (EQA) were given. The first and simplest one, called EQA1, was used for inclusion of TC data in the Geographical Variation in the Major Risk Factors of CHD paper published in the World Health Statistics Quarterly (27).
EQA1 was based on average biases of the EQC sets, but was not sufficiently informative in situations where pool biases differed within the same set. Another problem of EQA1 was that laboratories which analysed many sets may have been in a worse position than laboratories which analysed only a few. Therefore, alternative performance criteria, called EQA2, were defined, based on the results of single pools. In MNM 299A (32), EQA2 had been further developed and EQA3 criteria were presented.
Both EQA1 and EQA2 criteria are a mixture of measures of accuracy and precision, with greater importance given to accuracy, which seems appropriate for the estimation of population means. A comparison between the two criteria revealed that EQA1 and EQA2 yield very similar results in the first MONICA surveys. Both EQA1 and EQA2 are based on relatively rough approaches. It seemed, however, that more sophisticated criteria would require that the individual measurement data be available in a computerised form, and that the EQA sets covered the entire survey analysis in every MONICA laboratory.
When EQA2 was applied to the middle survey EQC data, very few laboratories achieved the intermediate score, and laboratories with quite different performances got the same score. Therefore in MNM 299A (32) a new score called EQA3 was developed to pursue the following objectives:
In this final report the EQA3 criteria, now called EQA criteria, have been applied to the final survey EQC data as well as to the EQC data of the previous surveys, with the following minor changes:
Coverage should be defined as the proportion of the survey analysis period covered by EQC analysis. Ideally, it should be calculated as the number of survey TC analyses performed during the external quality control analysis period (identified by the relevant sets (RS)), divided by the number of all survey TC analyses. In practice this score is very difficult to calculate in all MONICA surveys. A more pragmatic index of coverage has been calculated using the following proxies: the average time (in months) of analysis period per RS and the Maximum-Gap (see below). To establish such proxies, the following definitions are given:
Analysis Period (AP). The laboratory AP of the survey blood samples was taken from the dates of cholesterol measurement in the survey core data (item DCHOL). The AP was determined thus:
Where these data were not available, the AP was derived from the period of examination (item DEXAM) if it was known that only fresh samples were analysed, or the MCC was contacted concerning the period. The APs were not always continuous. Months with fewer than 5 samples were excluded from the calculation of the AP.
Relevant Sets (RS). An EQC set is relevant if its analysis period overlaps with the survey AP, or if the EQC analysis period does not end before the month preceding the start of the survey analysis period(s). APs and RSs for each survey are reported in Table 10.
AVERAGE TIME (in months) of AP per RS is the length of the AP (in months) divided by the number of RSs. Since the lengths of the analysis periods vary a lot, and the control sets are not always evenly distributed throughout the period, the AVERAGE TIME of AP per RS should be examined together with an additional index, the maximum gap.
Maximum-Gap (MAX-GAP) has been defined as the maximum length of time (in months) from a survey sample analysis to the nearest RS. In occasional situations the MAX-GAP may become extremely wide even if the actual coverage is adequate. This happens, for instance, when there is an isolated short part of the AP not covered by RSs, and there are EQC sets considered irrelevant because they were performed more than one month before the start of the isolated AP. Such situations are rare, however, and may be easily detected.
Using the AVERAGE TIME of AP per RS and the MAX-GAP, a COVERAGE SCORE was calculated according to the following simple algorithm:
| COVERAGE SCORE = | 2+ | if AVERAGE TIME of AP per RS <= 6 months and MAX-GAP <= 4 months; |
| | 2 | if AVERAGE TIME of AP per RS <= 9 months and MAX-GAP <= 10 months, but COVERAGE SCORE is not 2+; |
| | 1 | if AVERAGE TIME of AP per RS > 9 months or MAX-GAP > 10 months, but not both; |
| | 0 | if AVERAGE TIME of AP per RS > 9 months and MAX-GAP > 10 months, or there are no relevant sets. |
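The COVERAGE SCORE algorithm can be written out as a short function. This is a minimal sketch: the function name, the argument names and the use of text labels for the scores are our own choices, and AVERAGE TIME of AP per RS is taken as the AP length in months divided by the number of relevant sets.

```python
def coverage_score(ap_months, n_relevant_sets, max_gap_months):
    """Return the COVERAGE SCORE as one of the labels "2+", "2", "1" or "0"."""
    if n_relevant_sets == 0:
        return "0"  # no relevant sets
    average_time = ap_months / n_relevant_sets
    if average_time <= 6 and max_gap_months <= 4:
        return "2+"
    if average_time <= 9 and max_gap_months <= 10:
        return "2"
    if average_time > 9 and max_gap_months > 10:
        return "0"
    return "1"  # exactly one of the two cut-offs is exceeded


# An AP of 24.7 months with 3 RSs and a MAX-GAP of 20 months:
# the average time is within limits but the gap is not.
print(coverage_score(24.7, 3, 20))
```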
COVERAGE SCORE 2+ should be considered an optimal score, suitable as a basic criterion for evaluating data correction, if needed. If a population scores 0, the EQC results should be considered cautiously when evaluating EQA, because its AP was not sufficiently covered by RSs.
Variance out % (VAR%) is defined as the proportion of pools with coefficient of variation out of limits, as defined in Table 2 of the Lipid Manual (29).
The first cut-off (10%) allows the consideration of some unusual results such as outliers, i.e. 1 outlier in 10 analysed pools of RSs. Moreover, RUAs which analysed many EQC pools are not penalised since the probability of getting an unusual result increases with the number of pools analysed.
BIAS% is defined as the proportion of pools with bias out of limits (i.e. exceeding ± 5%).
Similar considerations hold for the VARIANCE SCORE. In particular, the first cut-off (15%) allows one pool out of limit in 7-13 analysed pools, or 2 pools out of limit in 14-19 pools, and so on.
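The arithmetic behind these cut-offs can be checked with a small sketch. The function names are hypothetical, and two points are assumptions on our part: a pool is counted as out of limits when its absolute bias is strictly greater than 5%, and a cut-off is treated as inclusive (a proportion equal to the cut-off is still allowed).

```python
def bias_out_percent(pool_biases, limit=5.0):
    """BIAS%: percentage of pools whose bias exceeds +/- limit."""
    if not pool_biases:
        raise ValueError("at least one pool bias is required")
    n_out = sum(1 for b in pool_biases if abs(b) > limit)
    return 100.0 * n_out / len(pool_biases)


def max_pools_out(n_pools, cutoff_percent):
    """Largest number of out-of-limit pools that keeps the proportion
    at or below an integer cut-off (inclusive cut-off assumed)."""
    return (cutoff_percent * n_pools) // 100


# With a 15% cut-off: one pool is tolerated for 7-13 pools, two for 14-19 pools
print([max_pools_out(n, 15) for n in (6, 7, 13, 14, 19)])
# With a 10% cut-off: one outlier is tolerated in 10 pools
print(max_pools_out(10, 10))
```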
CONSIST is the smallest difference of the maximum and minimum bias over the pools, after 15% of the pools at most were excluded. Even this exclusion allows a small proportion of unusual results to be considered as outliers.
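One way to compute CONSIST is sketched below. It relies on the observation that the smallest range after dropping a fixed number of pools is always obtained by dropping pools from the two ends of the sorted biases. The function name is hypothetical, and rounding the 15% allowance down to a whole number of pools is our assumption.

```python
def consist(pool_biases, max_excluded_fraction=0.15):
    """Smallest (max - min) of the pool biases after excluding
    at most max_excluded_fraction of the pools."""
    n = len(pool_biases)
    if n == 0:
        raise ValueError("at least one pool bias is required")
    ordered = sorted(pool_biases)
    k = int(max_excluded_fraction * n)  # pools that may be dropped (rounded down)
    keep = n - k
    # Slide a window of `keep` consecutive sorted values and take the narrowest
    return min(ordered[i + keep - 1] - ordered[i] for i in range(k + 1))


# Seven hypothetical pool biases (%): one pool may be excluded
print(consist([-1.0, 0.5, 1.0, 1.5, 2.0, 2.5, 9.0]))
```

In this example the isolated value 9.0 is treated as the outlier, so the range shrinks from 10.0 to 3.5.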
AVERAGE BIAS of a survey is the average bias (%) of all control pools of a survey after exclusion of 15% of the most extreme pool biases (i.e. the same exclusion level as in the definition of CONSIST).
EQA SCORE is a quality assessment score of a single survey, defined according to the following table:
| VARIANCE SCORE | BIAS SCORE | CONSIST SCORE | EQA SCORE |
|---|---|---|---|
| 2 | 2 | 2 | 2 |
| 2 | 2 | 1 | 2 |
| 2 | 2 | 0 | illogical |
| 2 | 1 | 2 | 1 or corr |
| 2 | 1 | 1 | 1 |
| 2 | 1 | 0 | 0 |
| 2 | 0 | 2 | 0 or corr |
| 2 | 0 | 1 | 0 |
| 2 | 0 | 0 | 0 |
| 1 | 2 | 2 | 2 |
| 1 | 2 | 1 | 1 |
| 1 | 2 | 0 | illogical |
| 1 | 1 | 2 | 1 |
| 1 | 1 | 1 | 1 |
| 1 | 1 | 0 | 0 |
| 1 | 0 | 2 | 0 or corr |
| 1 | 0 | 1 | 0 |
| 1 | 0 | 0 | 0 |
| 0 | 2 | 2 | 1 |
| 0 | 2 | 1 | 0 |
| 0 | 2 | 0 | illogical |
| 0 | 1 | 2 | 0 |
| 0 | 1 | 1 | 0 |
| 0 | 1 | 0 | 0 |
| 0 | 0 | 2 | 0 |
| 0 | 0 | 1 | 0 |
| 0 | 0 | 0 | 0 |
However, if COVERAGE SCORE is zero, then EQA SCORE cannot be more than 1.
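The table and the coverage rule translate directly into a lookup. The sketch below is illustrative only: the names are ours, scores are kept as the text labels used in the table (including "1 or corr", "0 or corr" and "illogical"), and the coverage rule is applied by lowering a result of "2" to "1" when COVERAGE SCORE is 0.

```python
# Rows of the table, listed by VARIANCE SCORE, in the order
# (BIAS SCORE, CONSIST SCORE) = (2,2), (2,1), (2,0), (1,2), ..., (0,0)
_ROWS = {
    2: ["2", "2", "illogical", "1 or corr", "1", "0", "0 or corr", "0", "0"],
    1: ["2", "1", "illogical", "1", "1", "0", "0 or corr", "0", "0"],
    0: ["1", "0", "illogical", "0", "0", "0", "0", "0", "0"],
}
_PAIRS = [(b, c) for b in (2, 1, 0) for c in (2, 1, 0)]
EQA_TABLE = {
    (v, b, c): label
    for v, labels in _ROWS.items()
    for (b, c), label in zip(_PAIRS, labels)
}


def eqa_score(variance_score, bias_score, consist_score, coverage_score):
    """Look up the EQA SCORE; coverage_score is "2+", "2", "1" or "0"."""
    result = EQA_TABLE[(variance_score, bias_score, consist_score)]
    if coverage_score == "0" and result == "2":
        return "1"  # EQA SCORE cannot be more than 1 without coverage
    return result


print(eqa_score(2, 2, 1, "2+"), eqa_score(2, 2, 2, "0"))
```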
EQA is an overall score with three levels: 2 denotes a good performance, 1 an acceptable performance and 0 an unacceptable one. RUAs which score 0 should be considered for exclusion from cross-sectional comparisons. EQA also allows the identification of populations for which data correction may be considered (criteria and methods for data correction are reported in Section 5.5.2).
Before and during the AP, the procedure for correcting bias in the laboratory is calibration, and the purpose of the EQC is to verify that the process is under control.
If the EQC results indicate the presence of biases which vary during the APs in different pools, it is not possible to improve the quality of the measurements retrospectively, because the reason for such fluctuations can seldom be identified and it is practically impossible to determine which survey blood samples were affected by which bias. If the bias in the EQC is systematic (i.e. consistent) throughout the AP, it may be possible to correct the data on the basis of the results of the EQC.
There are at least four reasons why such biases could be observed:
a) something is wrong with the calibration;
b) something is wrong with the processing of the EQC samples in the laboratory;
c) the true cholesterol values of the EQC samples are systematically different from the reference values; or
d) the bias is spurious and was only observed by chance because of random variations in the analysis of the EQC samples.
We will restrict the following discussion to biases of at least 3%, because smaller biases are beyond the reasonably attainable measurement accuracy of most laboratories. Based on our knowledge of the procedures of the EQC laboratory in Prague (WHO-RLRC), and its long term reports on internal QC and EQC by CDC, it is reasonable to exclude reason c) in most cases. It is, however, not possible to fully exclude the possibility of matrix effects relating to some analytical methods. Therefore, particular attention should be placed on the detection of possible matrix effects when the correction of data is being considered.
Reason b) includes at least the following possibilities:
These factors need to be excluded before the data for a specific population is corrected.
With reference to reason d), the results of the EQC make it possible to assess the magnitude of such a bias. If this is a likely explanation for the bias, correction should not be made.
If we come to the conclusion that a) is the likely reason for the bias, we should consider correction, bearing in mind, however, that the results of the EQC were subject to random fluctuations, and therefore are not necessarily accurate, and that the coverage of the EQC is not always complete.
Based on this reasoning and accepting the fact that it is not possible to define statistically the full relationship between the actual survey analyses and the results of the EQC, we have made the proposal that the TC values from a MONICA population should be corrected if the following criteria are all satisfied:
- (i) the absolute value of AVERAGE BIAS is more than 3%;
- (ii) CONSIST is less than 5;
- (iii) VARIANCE SCORE = 1 or 2;
- (iv) COVERAGE SCORE = 2+;
- (v) the number of available EQC pools is at least 6;
- (vi) correction otherwise seems sensible.
The rationale behind the criteria is:
(i) means that we should not make corrections if the AVERAGE BIAS is smaller than, or at the limits of, achievable standardisation.
(ii) and (iii) guarantee that the observed bias is sufficiently stable for reliable correction.
(iv) is there to ensure that the observed bias is representative of the whole AP.
(v) is needed to make sure that the observed bias is not large by chance.
(vi) is to check that the observed bias is not caused by a systematic error in the quality control. Particular attention should be paid to detecting any errors in the treatment of the EQC samples in the laboratory and to identifying possible matrix effects related to the measurement method and instrument used.
Corrections have been made to the data at the MDC by multiplying the original cholesterol values by:
100/(100 + AVERAGE BIAS).
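The correction criteria and the correction factor can be sketched together as follows. The function and argument names are hypothetical, the last criterion (whether correction otherwise seems sensible) is a judgement that is simply passed in as a flag, and the example biases are the AVERAGE BIAS values reported later in this document for the corrected surveys.

```python
def should_correct(average_bias, consist, variance_score,
                   coverage_score, n_pools, otherwise_sensible):
    """True only if all six criteria for correcting TC values are met."""
    return (
        abs(average_bias) > 3.0          # bias beyond achievable standardisation
        and consist < 5.0                # bias consistent over the pools
        and variance_score in (1, 2)     # acceptable precision
        and coverage_score == "2+"       # bias representative of the whole AP
        and n_pools >= 6                 # bias unlikely to be large by chance
        and otherwise_sensible           # reviewer's judgement
    )


def correction_factor(average_bias):
    """Multiplier applied to the original TC values; average_bias in %."""
    return 100.0 / (100.0 + average_bias)


for bias in (5.8, 3.4, -5.2):
    print(bias, round(correction_factor(bias), 3))
```

A positive AVERAGE BIAS gives a factor below 1 (values are lowered) and a negative one gives a factor above 1.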
This correction procedure involves two principles not yet discussed. Firstly, the procedure does not allow for the correction of different biases at different TC values (e.g. low, intermediate and high RV pools). Such an option was considered, but we were unable to find any situation among the MONICA populations where such a correction could be applied.
Secondly, the correction should be done on the individual TC values in the MDC database, and not, for example, at the data analysis stage. The latter possibility would be appealing for trend analyses, where a small bias in one survey in one direction and a small bias in the next in the opposite direction would together create a large bias which could be corrected. Such an approach was abandoned: a) it would excessively complicate the analyses, because we would use different corrections for cross-sectional analyses, for trends between two surveys, and for trends calculated from three surveys; b) correcting for the small biases in two separate surveys involves a considerable risk of miscorrecting the trends.
As was stressed earlier, the EQC, and hence the correction factor, is subject to random variation. Therefore, the correction will increase the standard error of the population mean value of TC. An approximate estimate of the increase in the standard error under conditions (i)-(v) is given in Appendix 2. The calculation suggests that the effect of the correction on the standard error is so small that it can be omitted in the statistical analyses of the data, at least if the correction is based on 6 or more EQC pools.
If the EQA SCOREs of a RUA are acceptable in all surveys, then, in principle, the TC data are also valid for estimating TC trends. Nevertheless, there might be survey-specific AVERAGE BIASes within limits but in opposite directions, which may produce sizeable effects when assessing TC trends. Therefore, to estimate changes in bias over time, two separate parameters have been developed: DELTA BIAS and TREND BIAS. DELTA BIAS is calculated as the absolute difference between the AVERAGE BIASes of two surveys. TREND BIAS is defined as the time trend of the biases over all APs, adjusted for a ten year period. It is calculated as the regression coefficient, when the pool biases are explained by the middle (in years) of the EQA analysis periods. As before, both DELTA BIAS and TREND BIAS were calculated after exclusion of 15% of the most extreme pool biases.
TREND BIAS covers all three surveys, when available, and estimates the bias in the estimates of linear trend in TC in the population. DELTA BIAS, which can be calculated between pairs of surveys only, has the advantage that it helps to identify between-survey biases in opposite directions or of different magnitude.
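The regression behind TREND BIAS can be illustrated with an ordinary least-squares slope. This is a simplified sketch under stated assumptions: the function name and the example data are hypothetical, each pool bias is paired with the mid-point (in decimal years) of the analysis period of its EQC set, and the exclusion of the 15% most extreme pool biases is assumed to have been done beforehand.

```python
def trend_bias_per_decade(mid_years, pool_biases):
    """Least-squares slope of pool bias (%) on time, scaled to ten years."""
    n = len(mid_years)
    if n != len(pool_biases) or n < 2:
        raise ValueError("need two or more (year, bias) pairs of equal length")
    mean_x = sum(mid_years) / n
    mean_y = sum(pool_biases) / n
    sxx = sum((x - mean_x) ** 2 for x in mid_years)
    if sxx == 0:
        raise ValueError("all analysis periods have the same mid-point")
    sxy = sum((x - mean_x) * (y - mean_y)
              for x, y in zip(mid_years, pool_biases))
    return 10.0 * sxy / sxx


# Hypothetical pools from three surveys, with bias drifting upwards over time
years = [1985.0, 1985.0, 1990.0, 1990.0, 1995.0, 1995.0]
biases = [1.0, 2.0, 2.0, 3.0, 3.0, 4.0]
print(trend_bias_per_decade(years, biases))
```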
Because of limits in the achievable accuracy of the quality control, small values of DELTA BIAS and TREND BIAS do not necessarily indicate bias in the population cholesterol trends. We suggest that DELTA BIAS and TREND BIAS exceeding ±3 are of concern.
As measures of consistency of the trend in biases between surveys, POOLED CONSIST was calculated for DELTA BIAS, and Standard Error of the regression coefficient for TREND BIAS. POOLED CONSIST is the smallest difference between the maximum and minimum bias over all the pools of both surveys, after 15% of the extreme pools were excluded.
A POOLED CONSIST SCORE is defined as:
| POOLED CONSIST SCORE = | 2 | if POOLED CONSIST <= 5; |
| | 1 | if 5 < POOLED CONSIST <= 10; |
| | 0 | if POOLED CONSIST > 10. |
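POOLED CONSIST and its score can be sketched in the same way as CONSIST, by pooling the biases of two surveys. The function names and example biases are hypothetical, and the 15% allowance is again rounded down to a whole number of pools, which is our assumption.

```python
def pooled_consist(biases_a, biases_b, excluded_fraction=0.15):
    """Smallest (max - min) bias over the pools of both surveys
    after excluding up to excluded_fraction of the pooled biases."""
    ordered = sorted(list(biases_a) + list(biases_b))
    n = len(ordered)
    if n == 0:
        raise ValueError("at least one pool bias is required")
    k = int(excluded_fraction * n)  # pools that may be dropped (rounded down)
    keep = n - k
    return min(ordered[i + keep - 1] - ordered[i] for i in range(k + 1))


def pooled_consist_score(value):
    """Score 2, 1 or 0 using the cut-offs 5 and 10."""
    if value <= 5:
        return 2
    if value <= 10:
        return 1
    return 0


value = pooled_consist([0.0, 1.0, 2.0, 3.0], [4.0, 5.0, 12.0])
print(value, pooled_consist_score(value))
```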
Table 11 presents the coverage parameters and scores for each RUA and for each survey which has been carried out. Table 12 reports EQA parameters and Table 13 EQA SCOREs.
TC data are available from 55 RUAs in the initial survey (the initial survey was not done in AUS-PERb and GER-BREb), 44 RUAs in the middle survey and 41 in the final.
Altogether 16 RUAs attained an unfavourable COVERAGE SCORE in some survey. In 14 of them the reason is that the EQC data are not available in the conventional form (initial survey: FIN-NKAa, FIN-KUOa, FIN-TULa, NEZ-AUCa, RUS-NOCa, RUS-NOCb, RUS-NOIa, USA-STAa and YUG-NOSa; middle survey: GER-COTa; final survey: AUS-PERa, AUS-PERb, FIN-KUOa, FIN-NKAa, FIN-TULa, SWI-TICa and SWI-VAFa). Two RUAs (CAN-HALa in the initial survey and RUS-MOIa in the final survey) attained unfavourable scores in coverage, because both the AVERAGE TIME of AP per RS and the MAX-GAP exceeded the cut-off points. Both surveys had very long APs (29.9 and 31.7 months respectively) with only a few EQC pools and long MAX-GAPs (19 and 22 months, respectively).
In four RUAs in the initial survey a low COVERAGE SCORE (i.e. COVERAGE SCORE = 1) was detected. In HUN-PECa only one relevant set was performed in a 14-month AP, but it was done in the middle of the AP. In HUN-BUDa, ITA-LATa and SWE-GOTa the MAX-GAPs were 11, 13 and 11 months, respectively. In the middle survey only two RUAs showed a low COVERAGE SCORE, FRA-TOUa and HUN-PECa, for contrasting reasons: in the former the AVERAGE TIME of AP per RS exceeded the cut-off point of 9.0 months, while in the latter the MAX-GAP was out of limits. In the final survey one RUA (RUS-MOCa) obtained a low COVERAGE SCORE because the MAX-GAP was high (20 months), even though 3 RSs were performed during the AP of 24.7 months (AVERAGE TIME of AP per RS = 8.2).
Table 12 reports the survey specific EQA parameters for each RUA with available EQC results: in the third column the number of pools with variance out of limits and the total number of pools available, and the relative percentages are presented in the fourth column; the fifth and sixth columns present similar counts and percentages for pools with biases out of limits; in the last two columns the AVERAGE BIASes and CONSIST are given.
Table 13 reports the EQA SCOREs, their component scores and the AVERAGE BIASes. Only one RUA in the initial survey (DEN-GLOa) and two RUAs in the middle (DEN-GLOa and ICE-ICEa) satisfied all criteria for correction of TC data. The correction was done by multiplying the TC values in the MONICA database by:
| RUA | Survey | AVERAGE BIAS | Correction factor |
|---|---|---|---|
| DEN-GLOa | Initial | +5.8% | 0.945 |
| DEN-GLOa | Middle | +3.4% | 0.967 |
| ICE-ICEa | Middle | -5.2% | 1.055 |
AUS-PER middle survey also satisfies criteria i-v of Section 5.5.2 for data correction. However, no clear information is available to identify the reasons for such a bias. In any case, the AVERAGE BIAS is not very large and the EQA SCORE is acceptable. Considering all these aspects, the middle survey TC data in AUS-PER should not be corrected, but the AVERAGE BIAS (-3.1%) should be given in footnotes when data for cross-sectional comparisons are presented.
Although their AVERAGE BIASes exceed the limits, data correction cannot be applied to the following RUAs because some of the criteria in Section 5.5.2 are violated: GER-ERF (Fin), BEL-LUXa (Ini), CHN-BEIa (Fin), GER-EGEb (Fin), RUS-MOIa (Mid) and RUS-NOIa (Fin). GER-ERF and CHN-BEI in their final surveys analysed only 3 pools and therefore criterion v is not satisfied. The initial survey in BEL-LUXa was very long (20 months) and a MAX-GAP of 8 months was detected, indicating that the coverage was not optimal. For RUS-MOIa in the middle survey and RUS-NOIa in the final survey the biases were inconsistent over the APs (criterion ii not met). Since for both RUAs the EQA SCORE was acceptable, the AVERAGE BIASes detected should be acknowledged in publications of cross-sectional comparisons. GER-ERF final survey satisfies criteria i-iv of Section 5.5.2 for data correction, but not criterion v as only three external quality control pools are available.
An unacceptable EQA SCORE has been reported in the initial survey in 13 RUAs: GER-EGEb, GER-ERFa, GER-HACa, GER-KMSa, ISR-TELa, MLT-MLTa, ROM-BUCa, RUS-MOCa, RUS-MOIa, RUS-MOIb, SWE-GOTa, SWE-NSWa and SWI-TICa; and 2 RUAs in the final survey: GER-ERFa and LTU-KAUa.
Survey-specific EQA parameters cannot be calculated because RSs were missing in 9 RUAs in the initial survey: FIN-KUOa, FIN-NKAa, FIN-TULa, NEZ-AUCa, RUS-NOCa, RUS-NOCb, RUS-NOIa, USA-STAa and YUG-NOSa; in one RUA in the middle survey: GER-COTa and in 7 RUAs in the final survey: AUS-PERa, AUS-PERb, FIN-KUOa, FIN-NKAa, FIN-TULa, SWI-TICa and SWI-VAFa. For these RUAs, other available information on the laboratory standardization has been reviewed and described in Section 9. If there is evidence that the laboratory was likely to be well standardized, EQA SCORE 1 is given in brackets in Table 13. Otherwise, 0 is given in brackets.
Table 14 presents EQA SCOREs and AVERAGE BIASes for each survey, as well as DELTA BIAS, POOLED CONSIST and POOLED CONSIST SCORE for the RUAs which performed at least two consecutive MONICA surveys and participated actively in the EQC programme. The RUAs which have performed only one survey (BEL-LUXa, GER-RDMa, GER-RHNa, ISR-TELa, ITA-LATa, MLT-MLTa, ROM-BUCa, RUS-MOIb and RUS-NOIb) have been excluded from the table because trend analyses cannot be carried out. In most of the RUAs where DELTA BIAS and POOLED CONSIST exceed the limits, at least one survey specific EQA SCORE was unacceptable, usually the initial. This occurs between the initial and the middle surveys for six RUAs: GER-EGEb, GER-KMSa, RUS-MOCa, SWE-GOTa, SWE-NSWa and SWI-TICa; between the middle and the final surveys for only one RUA: LTU-KAU; and between the initial and the final surveys for 6 RUAs: GER-EGEb, GER-ERFa, LTU-KAU, RUS-MOCa, RUS-MOIa and SWE-NSWa. There were 6 RUAs where at least one DELTA BIAS exceeded ±3% and the EQA SCORE was more than zero in each of the surveys concerned (CHN-BEI Fin-Ini, GER-BRE Fin-Ini, ITA-BRI Fin-Ini, RUS-MOIa Fin-Mid, SWE-GOT Fin-Mid and USA-STA Fin-Mid). Moreover, because RSs were not available in one or more survey, EQA trend parameters could not be calculated between the initial and the middle surveys for 6 RUAs: FIN-KUOa, FIN-NKAa, FIN-TULa, GER-COTa, RUS-NOCa, RUS-NOIa; between the final and the middle surveys for 8 RUAs: AUS-PERa, FIN-KUOa, FIN-NKAa, FIN-TULa, SWI-TICa, SWI-VAFa, USA-STAa and YUG-NOSa; finally, between the final and the initial surveys for 12 RUAs: AUS-PERa, AUS-PERb, FIN-KUOa, FIN-NKAa, FIN-TULa, NEZ-AUCa, RUS-NOCa, RUS-NOIa, SWI-TICa, SWI-VAFa, USA-STAa and YUG-NOSa.
Table 15 shows the numbers of EQC pools, the TREND BIAS per 10-year period, calculated considering the initial and final surveys or all three surveys, the related standard errors and the p-values for testing the null hypothesis that the trend is zero. As for Table 14, the 9 RUAs which carried out just one survey have been excluded. For 8 RUAs TREND BIAS cannot be calculated because the initial or the final survey was not carried out (AUS-PERb, GER-BERa, GER-BREb, GER-COTa, GER-HACa, GER-KMSa, HUN-BUDa and HUN-PECa). For 12 RUAs it was not possible to calculate TREND BIAS because, as reported in Section 9, RS results were not available in a suitable form (AUS-PERa, FIN-KUOa, FIN-NKAa, FIN-TULa, NEZ-AUCa, RUS-NOCa, RUS-NOIa, SWI-TICa, SWI-VAFa, USA-STAa and YUG-NOSa). For 3 RUAs it was only possible to calculate TREND BIAS for the initial and final surveys, because the middle one was not carried out (CAN-HALa, FRA-LILa and FRA-STRa). Of the 28 RUAs for which TREND BIAS has been calculated, in 6 RUAs the Ini-Fin or Ini-Mid-Fin TREND BIAS exceeded ± 3 even though none of the surveys concerned had an EQA SCORE of zero (BEL-CHA, BEL-GHE, CHN-BEI, GER-BREa, ITA-BRI, UNK-BEL). Although all of these TREND BIASes are significantly different from zero, none of them is significantly outside the range, which was considered as acceptable. Therefore, although there is a concern about bias in the trend estimates for these RUAs, there is no strong evidence that the high TREND BIAS is caused by a bias in the laboratory standardisation. This confirms the fact that the EQA SCORE gives a good indication of problems in the laboratory standardization for individual surveys.
In the cases where the survey cholesterol data were corrected on the basis of the results of the EQC (DEN-GLO Ini and Mid, ICE-ICE Mid), DELTA BIAS and TREND BIAS no longer estimate the bias in trends in total cholesterol in the population. Therefore, for such RUAs, Tables 14 and 15 have footnotes which give the results obtained when a similar correction is applied to the EQC data.
When the survey core data were received in the MDC, they were checked routinely applying these constraints:
All violations of these constraints were reported to the MCCs for their correction or elucidation. Data values outside the constraint limits were acceptable, but the MCC had to check that the values were not outliers because of data errors. The MCCs were asked to correct values only if they were incorrect. The current unresolved constraint violations are shown in Appendix 1. There are unresolved constraint violations for BEL-GHE (Mid), GER-BER (Mid), GER-HAC (Mid), GER-KMS (Mid), ITA-BRI (Mid), ROM-BUC (Ini) and RUS-NOC (Fin).
To summarise the quality of total cholesterol data for trend analysis, a Total Cholesterol Overall Summary Score (TCOSS) was developed, consisting of a Pre-Analytic Summary Score (TCPASS) and an External Quality Assessment Summary Score (TCEQASS). The score is high for RUAs with consistent evidence of good quality and lower for RUAs where such evidence is inconsistent or lacking.
TCPASS is defined according to four sub-scores, taking into account the four major pre-analytic sources of variability:
A Storage After Centrifuging Score (SACS) is defined using two sub-scores, RRS and DFS:
For samples stored using refrigeration, at room temperature or employing other procedures:
| RRS = | 2 | if max.time is less than 6 days, in all surveys; |
| | 1 | if max.time is between 6 and 10 days, in at least one survey; |
| | 0 | if max.time is more than 10 days or max.time is missing, in at least one survey. |
For frozen samples:
| DFS = | 2 | if temperature is between -60°C and -20°C and max.time is at most 1 year, in all surveys; OR if temperature is -60°C or less, in all surveys; |
| | 1 | when DFS is not 2 or 0; |
| | 0 | if temperature is at least -20°C and max.time is more than 1 year, in at least one survey; OR if temperature is more than -15°C, in at least one survey; OR if temperature or max.time is missing. |
If samples were not frozen, DFS is defined as 2.
Then:
| SACS = | 2 | if RRS = 2 AND DFS = 2; |
| | 1 | if SACS is not 2 or 0; |
| | 0 | if RRS = 0 OR DFS = 0. |
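The three storage rules can be read as a small decision procedure. The sketch below is one possible transcription, not the code used in the Project: the function names and the input layout (one value per survey, None for missing information) are assumptions, and an empty list is taken to mean that the storage mode was never used.

```python
from typing import Optional, Sequence, Tuple


def rrs(max_days: Sequence[Optional[float]]) -> int:
    """Score for non-frozen storage; one max.time (days) per survey, None if missing."""
    if any(d is None or d > 10 for d in max_days):
        return 0
    if any(d >= 6 for d in max_days):  # between 6 and 10 days in at least one survey
        return 1
    return 2  # less than 6 days in all surveys (assumed also when the list is empty)


def dfs(frozen: Sequence[Tuple[Optional[float], Optional[float]]]) -> int:
    """Score for frozen storage; one (temperature in deg C, max.time in years) per survey."""
    if not frozen:
        return 2  # samples were not frozen
    if any(t is None or y is None for t, y in frozen):
        return 0
    if any(t > -15 or (t >= -20 and y > 1) for t, y in frozen):
        return 0
    cold_and_short = all(-60 <= t <= -20 and y <= 1 for t, y in frozen)
    very_cold = all(t <= -60 for t, _ in frozen)
    return 2 if cold_and_short or very_cold else 1


def sacs(rrs_score: int, dfs_score: int) -> int:
    """Storage After Centrifuging Score from its two sub-scores."""
    if rrs_score == 0 or dfs_score == 0:
        return 0
    return 2 if rrs_score == 2 and dfs_score == 2 else 1
```

For example, `dfs([(-20, 120 / 52), (-20, 0.5)])` returns 0, because storage at -20°C for 120 weeks exceeds one year in one survey.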
A Seasonal Variation Score (SVS) is defined as:
| SVS = | 2 | if month difference < 2 months in all age/sex groups and overall; |
| | 1 | if not 2 or 0; |
| | 0 | if there is at least a 4 month shift for age group 25-64 or 35-64. |
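A minimal sketch of the SVS rule follows. How the month difference between surveys is derived is not given in this section, so the function simply takes the differences as inputs; all parameter names and group labels are hypothetical.

```python
from typing import Mapping


def svs(group_month_diffs: Mapping[str, float],
        overall_diff: float,
        main_group_shift: float) -> int:
    """Seasonal Variation Score.

    group_month_diffs: absolute month difference between surveys per age/sex group
    overall_diff:      the same difference for the whole sample
    main_group_shift:  shift in months for the main age group (25-64 or 35-64)
    """
    if main_group_shift >= 4:
        return 0
    if overall_diff < 2 and all(d < 2 for d in group_month_diffs.values()):
        return 2
    return 1
```

The zero condition is checked first so that a large shift in the main age group is never masked by small differences elsewhere.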
A Posture of Blood Drawing Score (PBDS) is defined as:
| PBDS = | 2 | if blood samples were drawn from sitting subjects, in all surveys OR blood samples were drawn from supine subjects, in all surveys; |
| | 1 | if up to 60% of blood samples were drawn with subjects in different postures (i.e. sitting or lying) in different surveys; |
| | 0 | if more than 60% of blood samples were drawn with subjects in different postures (i.e. sitting or lying), in different surveys OR when information is missing in any survey. |
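The PBDS rule can be sketched as follows. The definition does not say how the share of samples drawn in a different posture is to be measured; this sketch assumes it is the largest difference between surveys in the fraction of sitting subjects. That reading, and the function name, are assumptions.

```python
from typing import Optional, Sequence


def pbds(sitting_fraction: Sequence[Optional[float]]) -> int:
    """Posture of Blood Drawing Score.

    sitting_fraction: per survey, the fraction of subjects sitting (0.0 to 1.0),
                      or None when the information is missing.
    """
    if any(f is None for f in sitting_fraction):
        return 0
    all_sitting = all(f == 1.0 for f in sitting_fraction)
    all_supine = all(f == 0.0 for f in sitting_fraction)
    if all_sitting or all_supine:
        return 2
    # Assumed measure of posture change between surveys
    changed = max(sitting_fraction) - min(sitting_fraction)
    return 1 if changed <= 0.60 else 0
```

Under this reading, a RUA with all subjects sitting in one survey and 49% sitting in another has a change of 51% and scores 1.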
A Plasma-Serum Difference Score (PSDS) is defined as:
| PSDS = | 2 | if serum was used in all surveys OR plasma, using the same anticoagulant, was used in all surveys; |
| | 1 | if serum was changed to heparin-plasma or vice versa OR heparin was changed to EDTA or vice versa OR EDTA-plasma was changed to serum between the surveys; |
| | 0 | if serum was changed to EDTA-plasma between the surveys OR the information is missing in any survey. |
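The PSDS rule is direction-sensitive: a change from EDTA-plasma to serum scores 1, whereas a change from serum to EDTA-plasma scores 0. A minimal sketch, assuming specimen types are recorded per survey in chronological order with hypothetical labels, and assuming that a change counts between any earlier and any later survey rather than only between consecutive ones:

```python
from itertools import combinations
from typing import Optional, Sequence


def psds(specimens: Sequence[Optional[str]]) -> int:
    """Plasma-Serum Difference Score.

    specimens: specimen type per survey in chronological order, one of
               "serum", "heparin-plasma", "EDTA-plasma", or None if missing.
    """
    if any(s is None for s in specimens):
        return 0
    if len(set(specimens)) == 1:
        return 2  # same specimen type (and anticoagulant) throughout
    # combinations() keeps list order, so each pair is (earlier, later)
    for earlier, later in combinations(specimens, 2):
        if earlier == "serum" and later == "EDTA-plasma":
            return 0
    return 1
```

For instance, `psds(["EDTA-plasma", "serum", "serum"])` returns 1, while `psds(["serum", "EDTA-plasma"])` returns 0.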
A Pre-Analytic Summary Score (TCPASS) is then defined as:
| TCPASS = | (SACS+SVS+PBDS+PSDS-4)/2 | if SACS>0 and SVS>0 and PBDS>0 and PSDS>0; |
| | 0 | if SACS=0 or SVS=0 or PBDS=0 or PSDS=0. |
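The TCPASS formula can be written as a short function. This is a direct transcription of the definition above; only the function and argument names are chosen here.

```python
def tcpass(sacs: int, svs: int, pbds: int, psds: int) -> float:
    """Pre-Analytic Summary Score from the four sub-scores (each 0, 1 or 2)."""
    scores = (sacs, svs, pbds, psds)
    if min(scores) == 0:
        return 0.0
    return (sum(scores) - 4) / 2
```

When every sub-score is positive the result ranges from 0 (all four equal to 1, since (4 - 4)/2 = 0) to 2 (all four equal to 2), in steps of 0.5.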
The External Quality Assessment Summary Score (TCEQASS) summarises the quality of the analytic part of total cholesterol measurements. Its components are: COVERAGE SCORE, EQA SCORE, AVERAGE BIAS and TREND BIAS. It is calculated taking into account the initial and final surveys, and the middle one when available.
| TCEQASS = | 2 | if EQA SCORE is 2 for all surveys AND AVERAGE BIAS is within (-3.0%, 3.0%) AND TREND BIAS is within (-3.0%, 3.0%) AND data correction was not applied AND COVERAGE SCORE is 2 in all surveys; |
| | 1 | if not 2 or 0; |
| | 0 | if EQA SCORE is 0 for at least one survey. |
In the cases where EQA SCORE was not possible to calculate, the score in brackets in Table 13 was used for calculating TCEQASS.
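The TCEQASS rule combines several per-survey and per-RUA conditions. The sketch below is one way to encode it; the parameter names are hypothetical, the interval is treated as open, and a TREND BIAS that could not be calculated (None) is assumed not to satisfy the condition for a score of 2.

```python
from typing import Optional, Sequence


def tceqass(eqa_scores: Sequence[int],
            average_biases: Sequence[float],
            trend_bias: Optional[float],
            coverage_scores: Sequence[int],
            corrected: bool,
            limit: float = 3.0) -> int:
    """External Quality Assessment Summary Score; per-survey inputs are lists."""
    if any(s == 0 for s in eqa_scores):
        return 0
    fully_met = (
        all(s == 2 for s in eqa_scores)
        and all(-limit < b < limit for b in average_biases)
        and trend_bias is not None          # assumption: missing trend is not a 2
        and -limit < trend_bias < limit
        and not corrected
        and all(c == 2 for c in coverage_scores)
    )
    return 2 if fully_met else 1
```

Checking the zero condition first mirrors the table: an unacceptable EQA SCORE in any survey overrides everything else.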
Finally, the Total Cholesterol Overall Summary Score (TCOSS) is defined as:
| TCOSS = | (TCEQASS+TCPASS)/2 | if TCEQASS > 0 and TCPASS > 0; |
| | 0 | if TCEQASS = 0 or TCPASS = 0. |
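The overall score is the mean of the two summary scores unless either is zero. A minimal transcription, with hypothetical function and argument names:

```python
def tcoss(tceqass: float, tcpass: float) -> float:
    """Total Cholesterol Overall Summary Score from the two summary scores."""
    if tceqass == 0 or tcpass == 0:
        return 0.0
    return (tceqass + tcpass) / 2
```

For example, `tcoss(1, 0.5)` gives 0.75, whereas `tcoss(0, 2.0)` gives 0 regardless of the pre-analytic score.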
The values of the summary score for trend analysis are presented in Table 16. The table includes only those RUAs with at least the initial and final survey data. For each RUA, scores have been calculated considering the final and the initial surveys. Separate scores have been provided when middle survey data are available. Each score has three levels: 2 indicates that MONICA requirements are fully met, 1 attests that MONICA requirements are not fully met, but no relevant deviation occurred; 0 means that major relevant deviation from MONICA requirements occurred which affects trend analysis.
The storage after centrifuging score (SACS) is 0 in POL-TAR, because in the initial survey the samples were kept at a relatively low temperature (-20°C) for a very long period of time (i.e. 120 weeks). It is now clear that this may have introduced relevant bias, as reported in Section 4.2.1, which affects the trend estimates over time. Minor deviations from the MONICA standards have been detected for AUS-NEWa (initial survey), AUS-PERa (all surveys), FIN-KUOa, FIN-NKAa and FIN-TULa (middle survey), because samples were refrigerated for a max.time between 6 and 10 days.
The seasonal variation score (SVS) was 0 in GER-ERFa because there is evidence of a relevant seasonal shift: the initial survey was carried out during May and June whereas the middle and final surveys were done during the autumn and winter. For another 11 RUAs (DEN-GLOa, GER-EGEa, ICE-ICEa, ITA-FRIb, ITA-FRIc, RUS-MOCa, RUS-MOIa, RUS-NOCa, RUS-NOIa, SWE-GOTa and UNK-GLAa) some changes in the survey months were detected but there was no evidence of major shortcomings.
Only one RUA (FRA-STRa) scored 1 for trend in posture of blood drawing (PBDS) because subjects were all kept sitting in the initial survey while 49% of them were sitting and 51% supine in the final survey. Therefore the expected bias (as reported in Section 4.2.3) is believed to be diluted over the entire sample.
Four RUAs scored 1 for the plasma-serum difference score (PSDS) because minor deviations occurred: AUS-NEWa used EDTA-plasma in the initial survey, shifting to sera in the middle and final surveys; FRA-TOUa shifted from EDTA-plasma in the first two surveys to sera in the final survey; FRA-STRa changed from EDTA-plasma in the initial survey to serum samples in the final survey; and SWI-VAFa changed from heparin-plasma samples in the initial survey to sera in the next two surveys. As reported in Section 4.2.4, such changes are not considered to have relevant effects on estimates of TC trends.
The pre-analytic summary score (TCPASS) has been computed according to the algorithm reported above. It results in a score of 0 when at least one of the partial scores (SACS, SVS, PBDS or PSDS) is 0 or all of them are 1.
The external quality assessment summary score (TCEQASS) is 0 when at least one of the survey-specific EQA SCOREs is 0. It is 1 when no EQA SCORE is 0 but coverage (CAN-HALa, FIN-KUOa, FIN-NKAa, FIN-TULa, NEZ-AUCa and USA-STAa) was not satisfactory in at least one survey, or AVERAGE BIAS exceeds limits in at least one survey (CHN-BEIa), or TREND BIAS exceeds limits (BEL-CHAa, BEL-GHEa, GER-BREa, ITA-BRIa and UNK-BELa), or the data have been corrected (DEN-GLOa and ICE-ICEa), or the EQA SCORE equals 1 in at least one survey (FRA-LILa, FRA-STRa, FRA-TOUa, ITA-FRIb, ITA-FRIc, POL-WARa and SPA-CATa). Thirteen RUAs scored 0: most of these low scores are due to an unacceptable EQA SCORE in the initial survey (GER-EGEa, GER-ERFa, RUS-MOCa, RUS-MOIa, RUS-NOCa, RUS-NOIa, SWE-GOTa, SWE-NSWa and YUG-NOSa), at the beginning of risk factor surveillance when the standardisation of procedures was not yet satisfactory in MONICA for all RUAs. Only six RUAs had an EQA SCORE equal to 0 in the final survey (AUS-PERa, AUS-PERb, GER-ERFa, LTU-KAUa, SWI-TICa and SWI-VAFa).
The total cholesterol overall summary score (TCOSS) is reported in the last column of Table 16. It has been computed according to the algorithm for TCPASS and TCEQASS. The value 0 is given to RUAs if at least one of the summary sub-scores is 0. It is important to recall that the trend summary score has been developed to identify satisfactory weights for use in testing the first MONICA hypothesis, and not principally for excluding RUAs from trend analysis.
The main determinants of the precision and accuracy of the estimates of population trends in TC in MONICA are the sample size and the quality of the data. In order that the full precision expected from the sample sizes used in the study can be achieved, the quality of the cholesterol measurements must be very high. Furthermore, lack of quality may result in estimates which lead to misleading conclusions.
Therefore, in the MONICA Project, major attention has been paid (a) to achieving high quality data on cholesterol and (b) to retrospective assessment of the quality of data which was actually achieved. The current quality assessment report serves the latter purpose.
A standardization protocol for TC measurements was developed in the beginning of the Project. It included the pre-analytical procedures, laboratory analytical procedures and internal and external quality control. To ensure consistency with previous studies, some MCCs continued using pre-analytical procedures which differed from the MONICA Manual as well as analytical methods other than the enzymatic automatic method which was recommended by WHO-RLRC. The non-homogeneity of pre-analytical standards among the MCCs produces biases for some RUAs in cross-sectional between-population comparisons. Such biases, however, are small compared to the differences in the cholesterol levels between the populations, and therefore do not change the overall picture of MONICA TC distribution. It is nevertheless important to acknowledge such biases in resulting publications.
It has been emphasised that the pre-analytical procedures must be kept unchanged within the RUAs between the surveys, because even small changes in collection and preparation of TC samples may produce biases in the estimates of trends over time. For this reason, changes between surveys in major pre-analytical sources of variation have been seen as a major concern in this quality assessment report. Four such major pre-analytical sources of variation were identified:
There is mounting evidence that storage in a deep-freeze at low temperature produces significant biases in TC if the duration of the storage is sufficiently long. Seasonal variations in TC, with higher values in winter and lower ones in summer, may affect cross-sectional comparisons between the RUAs as well as comparisons over time within the RUAs. The survey periods in MONICA vary between the RUAs, and in some cases also between the surveys within the RUAs. The latter changes were mostly due to limitations in the availability of resources in some MCCs. Based on the reports of the MCCs, changes in the posture of the subject are rare in MONICA. Changes between serum and plasma for TC determination, even if rare in MONICA, are of concern. The same concerns extend to the use of plasma throughout because there is evidence of an increase in the concentration of EDTA in the vials over the years.
The potential biases between different analytical methods have not been considered in this quality assessment. It is assumed that such biases are reflected in the results of the EQC. Therefore, if the results of the EQC are acceptable, it is assumed that the accuracy of the measurements is also acceptable, regardless of the analytical method used. In order to uphold such an assumption, the preparation of the lyophilized control pools required a lengthy process in identifying pools which were free from materials interfering with chemical and enzymatic methods, and therefore free from pool-specific matrix effects. This process has been extensively explained. There could still be some matrix effects, specific to instruments or reagents, but it has been assumed that such effects would have been detected by problematic EQC results.
Figures 3a and 4a present the results of all EQC pools analyzed in the participating laboratories for the absolute value of bias and coefficients of variation, respectively. For both parameters, the proportions of unacceptable values decreased from the initial to the final surveys. This indicates that the performance of the laboratories has improved during the study period. The same is also reflected by the regression lines in the figures.
The reference laboratory, WHO-RLRC, in Prague was standardized on the CDC laboratory in Atlanta, GA, before and towards the end of the MONICA survey periods. In addition, most of the EQC pools prepared by WHO-RLRC were also analyzed in CDC using the Abell-Kendall reference method. The results of the comparison between Prague and CDC are shown in Figure 1 and Figure 2. There is no indication of bias or drift between the two quality control laboratories.
Three scoring systems were developed in MONICA to summarise the results of the EQC for each RUA. The first one, developed after the initial surveys, was a relatively simple summary. It was then refined over the years as shortcomings in the earlier versions were detected and more experience was gained on the importance of the different aspects of the quality in the data analysis. The final EQA SCORE considers coverage of the AP with EQC, the percentage of pools out of limits for bias and for imprecision, and the consistency of the bias over the analysis period. This final EQA SCORE allows the coverage of EQC and the performance of each RUA in each survey to be assessed, and provides conservative criteria for identifying RUA-specific surveys in which correction of data may be applied. If the EQA SCORE indicates unacceptable performance for an RUA in a specific survey, the RUA should be excluded or given a low weight in data analyses. Figures 3b and 4b show trends in the absolute values of biases and coefficients of variation after exclusion of results of those RUAs with unacceptable EQA SCOREs. In comparison with the scatter plots including all RUAs (Figures 3a and 4a), the proportions of pools with unacceptable bias or coefficient of variation are reduced, and the regression lines no longer show trends in bias or imprecision. It is important to note that even after the exclusion of unacceptable EQA SCOREs, the Figures still show values of bias and coefficient of variation which are out of quality limits. This is due to the fact that the EQA SCORE accepts occasional values outside the limits, which are also possible in laboratories of very high quality.
Finally, an overall summary score for TC quality was developed for weighting each RUA in trend analyses. It considers major pre-analytical infringements of MONICA standards, EQA SCORE, survey-specific AVERAGE BIAS, differences in AVERAGE BIAS between surveys (called DELTA BIAS) and TREND BIAS, i.e. the regression coefficient of bias of all relevant pools on time. These two last parameters have been added to identify situations where a RUA has acceptable AVERAGE BIASes in each survey, but the biases are in opposite directions and produce sizeable trends in bias (over 3%).
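The situation described above, acceptable survey-specific biases that nevertheless combine into a sizeable trend, can be illustrated with a minimal sketch. It assumes that DELTA BIAS is the later AVERAGE BIAS minus the earlier one and that TREND BIAS is the ordinary least-squares slope of pool bias on time, scaled to 10 years; the function names and the example data are hypothetical.

```python
from typing import Sequence


def delta_bias(avg_bias_earlier: float, avg_bias_later: float) -> float:
    """Difference in AVERAGE BIAS (%) between two surveys (later minus earlier)."""
    return avg_bias_later - avg_bias_earlier


def trend_bias_per_decade(times_years: Sequence[float],
                          biases: Sequence[float]) -> float:
    """Least-squares slope of pool bias (%) on time (years), per 10 years."""
    n = len(times_years)
    mean_t = sum(times_years) / n
    mean_b = sum(biases) / n
    sxx = sum((t - mean_t) ** 2 for t in times_years)
    sxy = sum((t - mean_t) * (b - mean_b) for t, b in zip(times_years, biases))
    return 10 * sxy / sxx


# Hypothetical RUA: bias of -2% in the initial survey and +2% ten years later.
# Each survey is within +/-3%, yet the change over time is 4%.
delta = delta_bias(-2.0, 2.0)
trend = trend_bias_per_decade([0, 0, 10, 10], [-2.0, -2.0, 2.0, 2.0])
```

In this hypothetical case both surveys would pass on AVERAGE BIAS alone, while DELTA BIAS and TREND BIAS both exceed the 3% limit.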
The standardization of the quality of TC measurements in MONICA was a challenging task. Compared with the standardization of the measurements of the other risk factors, such as smoking and blood pressure, it has the advantage that part of it is covered by EQC which provides quantitative results. If a laboratory shows a good performance in the EQC, it is very likely that its measurements are reasonably accurate. On the other hand, if the quality of a laboratory is considered unacceptable, there is still the possibility that the quality is good, but for some reason the coverage of the EQC was poor or the laboratory had problems in the handling of the measurement of the quality control pools, but not of the actual survey samples. It is quite evident that thorough standardization and quality control procedures are imperative for risk factor monitoring, such as in MONICA. It is also vital that the participating centres consider them as integral a part of the Project as the actual survey measurements.
The following list includes only the RUAs with specific findings or exceptional background information relevant for the use of the data, and the RUAs for which additional clarification or correction of data are expected.
AUS-NEW
AUS-PER
BEL-CHA
BEL-GHE
BEL-LUX
CAN-HAL
CHN-BEI
DEN-GLO
FIN-KUO, FIN-NKA and FIN-TUL
FRA-STR
FRA-TOU
GER-BER
GER-BRE
GER-COT
GER-EGE
GER-ERF
GER-HAC
GER-KMS
GER-RDM
GER-RHN
HUN-BUD
HUN-PEC
ICE-ICE
ISR-TEL
ITA-BRI
ITA-FRI
ITA-LAT
LTU-KAU
MLT-MLT
NEZ-AUC
POL-TAR
ROM-BUC
RUS-MOC
RUS-MOI
RUS-NOC
RUS-NOI
SWE-GOT
SWE-NSW
SWI-TIC
SWI-VAF
UNK-BEL
UNK-GLA
USA-STA
YUG-NOS