Readers' guide for causation: Was a comparison group for those at risk clearly identified?
ACP J Club. 1992 Jan-Feb;116:A12. doi:10.7326/ACPJC-1992-116-1-A12
ACP Journal Club's criteria for selecting articles on causation include “a clearly identified comparison group for those at risk of, or having, the outcome of interest.” An apparent cause-and-effect relationship is spurious or distorted if it is affected by a “confounder,” that is, a variable that is not the focus of the study, but is associated with exposure to the main factor of interest and independently influences the outcome. To assess or neutralize the influence of potential confounders, an appropriate comparison group is required. For example, in a study of whether a particular bronchodilator was associated with an increased risk for death in asthmatic patients, severity of asthma would be a confounder if those with higher exposure to the bronchodilator also had more severe asthma (1). For a valid assessment of the role of the bronchodilator, patients with asthma of equal severity would have to be compared. The ACP Journal Club criterion requires either a matched comparison group or statistical adjustment for such confounders.
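The distortion a confounder produces can be made concrete with a toy simulation (not from the editorial; all probabilities are invented for illustration). In this sketch, asthma severity raises both bronchodilator exposure and mortality, so the drug shows a spurious crude association with death that disappears when the comparison is restricted to patients of equal severity:

```python
# Toy model: severity (the confounder) drives both exposure and death;
# the drug itself has no effect. Numbers are invented for illustration.
import random

random.seed(1)

patients = []
for _ in range(10000):
    severe = random.random() < 0.3          # 30% have severe asthma
    # Severe asthmatics are more likely to use the suspect bronchodilator...
    exposed = random.random() < (0.8 if severe else 0.2)
    # ...and severity, not the drug, drives mortality in this toy model.
    died = random.random() < (0.10 if severe else 0.01)
    patients.append((severe, exposed, died))

def death_rate(group):
    return sum(d for _, _, d in group) / len(group)

exposed = [p for p in patients if p[1]]
unexposed = [p for p in patients if not p[1]]
print(f"crude death rate, exposed:   {death_rate(exposed):.3f}")
print(f"crude death rate, unexposed: {death_rate(unexposed):.3f}")

# Comparing patients of equal severity removes the apparent drug effect:
for sev in (True, False):
    exp = [p for p in patients if p[0] == sev and p[1]]
    une = [p for p in patients if p[0] == sev and not p[1]]
    print(f"severe={sev}: exposed {death_rate(exp):.3f} "
          f"vs unexposed {death_rate(une):.3f}")
```

The crude comparison makes the harmless drug look lethal; within each severity stratum the difference vanishes, which is exactly what an appropriate comparison group is meant to reveal.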
There are 4 study designs commonly used to assess potential causal relationships. A randomized controlled trial (RCT) is the strongest design for assessing the effect of an exposure to a suspected causal factor because randomization ensures that confounders will be distributed by chance between the groups being compared. This is true even for those confounders that are unknown or unmeasured. When the consequences of exposure are rare or late, however, it may not be feasible to enroll enough participants or to follow them long enough to document the effect. Moreover, for potential toxins or hazardous substances, randomization to exposure is unethical, and less powerful study designs must be used.
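Why randomization handles even unknown confounders can be sketched in a few lines (an illustrative simulation, not part of the editorial). Here a trait the investigator never measures ends up, by chance alone, in similar proportions in both arms:

```python
# Illustrative sketch: random allocation balances an unmeasured trait
# (e.g., severe asthma) across trial arms. Proportions are invented.
import random

random.seed(2)

n = 10000
severe_flags = [random.random() < 0.3 for _ in range(n)]   # unmeasured trait
arms = [random.choice(("drug", "control")) for _ in range(n)]

props = {}
for arm in ("drug", "control"):
    sev = [s for s, a in zip(severe_flags, arms) if a == arm]
    props[arm] = sum(sev) / len(sev)
    print(arm, f"proportion severe: {props[arm]:.3f}")
```

With large samples, both arms carry close to the same 30% proportion of severe cases, so the trait cannot distort the comparison; no matching or measurement was needed.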
The next strongest study design is a “cohort analytic” study, in which individuals exposed to a putative causal factor, but free of the outcome of interest, are followed to assess their fate, in comparison with a control group of individuals who have not been exposed and who are also initially free of the outcome. Cohort studies can control for potential confounders either by matching exposed and nonexposed participants for these factors at the beginning of the study or by making statistical adjustments to remove the effects of confounders. For our bronchodilator example, an investigator would need to assemble a cohort of asthmatic patients who were using the suspect bronchodilator and a second cohort of asthmatic patients who were using some other bronchodilator. Because severe asthma can be fatal, with or without the use of bronchodilators, the two cohorts would have to be similar in the proportion of patients with severe asthma. Similarity can be achieved by direct matching: for each person with asthma exposed to the suspect bronchodilator, an investigator could select a nonexposed person with asthma of similar frequency and severity. Similarity can also be achieved statistically, by determining the effect of severity on the frequency of the outcome (in this case, death) and removing that effect in the analysis. Obviously, this can be done only for those confounders that have been both suspected and measured. Cohort analytic studies can be expensive to conduct if the exposure is rare or if the outcome is infrequent or delayed, as would be the case for bronchodilator use and death from asthma, for which a more feasible alternative design may be required.
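The direct-matching strategy described above can be sketched as follows (a simplified illustration with invented data and field names, not a real matching procedure from the editorial). For each exposed patient, an unexposed patient from the same severity stratum is drawn from a larger pool:

```python
# Sketch of direct matching: for each exposed asthmatic, select an
# unexposed asthmatic of the same severity. Data are invented.
import random

random.seed(3)

def make_patient(exposed):
    severe = random.random() < (0.7 if exposed else 0.2)
    return {"exposed": exposed, "severe": severe}

exposed_cohort = [make_patient(True) for _ in range(200)]
pool = [make_patient(False) for _ in range(2000)]

matched_controls = []
for p in exposed_cohort:
    # Find an unused control in the same severity stratum.
    for c in pool:
        if c["severe"] == p["severe"]:
            matched_controls.append(c)
            pool.remove(c)
            break

def prop_severe(group):
    return sum(x["severe"] for x in group) / len(group)

print(f"severe in exposed:  {prop_severe(exposed_cohort):.2f}")
print(f"severe in controls: {prop_severe(matched_controls):.2f}")
```

After matching, the two groups are identical in the proportion of severe cases by construction, so severity can no longer account for any difference in outcomes between them; real matching schemes extend this idea to several confounders at once.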
The third strongest design is the case-control study. In this design, patients who have the outcome of interest (“cases”) are compared with control subjects who do not, the objective being to determine whether cases are more likely than controls to have been exposed to the putative causal factor. Cases are often relatively easy to find, even for rare diseases, because they are referred to specialized centers. Also, because the cases already have the outcome of interest, the investigator does not have to wait for it to develop. Thus, this type of study is easier to conduct than a cohort study or an RCT. This design, however, is usually more subject to bias. For example, the data are often collected from old records or patients' memories, both of which may be faulty. Also, the patients referred to the specialists who publish such studies are often not representative of all patients with the disorder, in some situations being worse off (that is why they were referred) or, in others, being better off (because they survived long enough to be referred). Unfortunately, in many case-control studies these prestudy conditions are unknown and cannot be accounted for in the analysis.
Our asthma example illustrates yet another problem with case-control studies. Cases would be patients who had died of asthma, and controls would be patients with asthma who had not died. Before the two groups were compared for differences in the use of a particular bronchodilator, they would need to be compared for the severity of their asthma. If there were an imbalance, adjustments could be made in the analysis. Even so, it may be impossible to obtain a valid answer from such a study. If the suspected bronchodilator makes asthma worse, leading to a vicious cycle of increased bronchodilator use until lethal toxicity is reached, adjusting for severity will produce a false conclusion. This problem would be unavoidable in a case-control study unless data on severity had been collected and were available for all cases and controls before they began to use bronchodilators, an unlikely eventuality.
Descriptive studies, or “analytic surveys,” in which samples of people are assessed simultaneously for potential risk factors and their suspected effects, generally provide very weak evidence because there is a limit to the extent to which adjustments can be made for potential or real confounders. Although descriptive studies can occasionally provide information so dramatic that an immediate change in physician behavior is warranted (e.g., thalidomide and birth defects), there are also undesirable consequences when less convincing data lead to an excessive response. The drug Bendectin (an antiemetic used in pregnancy) was withdrawn from the market as a result of case reports suggesting that it was teratogenic (2). Since then, many epidemiologic studies using appropriate comparison groups have supported the relative safety of this drug (3). As a consequence of the earlier misinformation, however, a potentially useful drug is no longer available to the 10% to 15% of pregnant women who could benefit from its use. Studies with descriptive or “cross-sectional” designs should be regarded mainly as useful for generating hypotheses to be tested in more rigorous studies.
The strength of the study design is important to the success of testing a causal hypothesis. It is, however, possible to compromise a strong study design by mishandling other aspects of methodology. A poorly done randomized trial can miss the truth, whereas a well-designed and well-executed case-control study can find it (4). Also, for determining matters of causation, one study seldom establishes the case beyond a reasonable doubt, no matter what the “P value.” Other measures of causation (5) will be described in a future editorial.
If studies of causation of interest to internists provide comparison groups and appropriate handling of confounders, we will publish them in ACP Journal Club. Our criteria for including studies of causation are lenient, so that clinically relevant cause-and-effect associations will reliably come to the attention of ACP Journal Club readers. Readers, in turn, must be discriminating about how much stock to put in the findings of these studies. A key determinant of the strength of the evidence is the study design. The structured abstract for each article indicates the design, but the reader must recall that design's relative merits and weaknesses.
Mitchell A. Levine, MD, MSc