Module 8: Error and Hierarchy of Evidence
Contributors: Scott Wells, Amy Kinsley, Julio Alvarez
Key Concepts
- Understand concept of error: What can go wrong in epidemiologic evaluations?
- Apply hierarchy of evidence to evaluate results from epidemiologic studies.
Understand concept of error: What can go wrong in epidemiologic evaluations?
Sources of error in epidemiologic data evaluation
- In experimental studies (or non-observational studies), the design of the study involves deliberately changing population parameters and applying a treatment or measure (called ‘exposure’) to randomly selected groups of animals and then assessing its effect (‘outcome’). In this situation, the evaluator controls the allocation of study subjects to study groups (for example, vaccinated vs. non-vaccinated animals). The random assignment of study subjects to different experimental groups (vaccinated or non-vaccinated, treated or non-treated, exposed or non-exposed) provides better (unbiased) information to assess the effect of the treatment (the measure under evaluation) on the outcome.
- In contrast, in observational studies, the evaluator(s) collects data from a population in a given space and time without interfering with the subjects evaluated. They ‘observe’ the individuals under study and record data on variables of interest (often through collection of samples) without assigning treatments to the subjects. The important point is that the individuals under study are not assigned to groups by the evaluator, randomly or otherwise.
Because of the uncontrolled nature of evaluation of observational studies (no randomized selection of animals into groups for treatments), particular scrutiny is required to evaluate the validity of results of these studies. To understand the sources of potential error in observational studies and their implications for the results obtained, it is important to ask some questions:
- What were the criteria used for case definition and how reliable were the procedures for measuring these criteria?
- Are estimates of expected diagnostic test error available (e.g., sensitivity and specificity)?
- How was the exposure(s) measured?
- What was the likelihood that individuals were correctly classified as exposed or non-exposed?
- Was the study population likely to be representative and were there potential errors related to sampling?
- Was the study large enough to detect statistically significant associations?
- Were the observed associations strong enough to be biologically meaningful?
Random versus Systematic Error
Error in epidemiologic studies is typically categorized as random or systematic error (Figure 1).
Random error
Observational studies are conducted on samples of populations and findings should ideally be relevant to those populations. For a defined population, there is only one true population mean (for any measure), yet multiple samples drawn from that population will yield a distribution of estimates of that ‘true’ mean. This random error in estimation occurs even if sampling is conducted following optimal procedures. The failure to obtain a sample mean that is exactly the same as the true population mean is attributed to random error (lack of precision) and is caused by sampling variation. Random error is larger when the sample size is small or when the population is very heterogeneous.
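The sampling variation described above can be illustrated with a small simulation. This is a minimal sketch, not part of the original module: the two populations, their means and standard deviations, the sample sizes, and the function name `spread_of_sample_means` are all hypothetical choices made for illustration.

```python
import random
import statistics


def spread_of_sample_means(population, sample_size, n_samples=500, seed=1):
    """Standard deviation of the means of repeated random samples.

    A larger value means more random error (less precision).
    """
    rng = random.Random(seed)
    means = [
        statistics.mean(rng.sample(population, sample_size))
        for _ in range(n_samples)
    ]
    return statistics.stdev(means)


# Hypothetical populations with the same mean but different heterogeneity.
rng = random.Random(42)
homogeneous = [rng.gauss(100, 5) for _ in range(10_000)]
heterogeneous = [rng.gauss(100, 25) for _ in range(10_000)]

for label, population in (("homogeneous", homogeneous),
                          ("heterogeneous", heterogeneous)):
    for n in (10, 200):
        spread = spread_of_sample_means(population, n)
        print(f"{label:13s} n={n:3d} spread of sample means={spread:.2f}")
```

In this sketch the spread of the sample means should shrink as the sample size grows and widen as the population becomes more heterogeneous, which is the behaviour the paragraph describes.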
Figure 1: Random and Systematic Error
Systematic error
Systematic error (lack of accuracy) can result from many sources including bias in selection of study subjects, bias in information gathered, and bias due to confounding (to be discussed shortly). Selection bias occurs if study subjects are not randomly selected, or are selected randomly from a non-representative subset of the population. Bias is a term used to indicate the presence of systematic error in a study, and the term validity is used in epidemiology to indicate the absence of bias.
Increased sample size will improve precision (i.e., reduce the random error leading to narrower confidence intervals) but increased sample size in a biased study will only increase confidence in an incorrect estimate. Most studies can be assumed to include some degree of both random and systematic error.
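The point that a larger sample narrows the confidence interval without removing bias can be sketched numerically. The example below assumes a hypothetical true prevalence of 20% and a biased sampling scheme whose estimate settles at 30%; it uses the normal-approximation (Wald) interval for a proportion. The values and the function name `wald_ci` are illustrative assumptions, not taken from the module.

```python
import math


def wald_ci(p_hat, n, z=1.96):
    """Normal-approximation 95% confidence interval for a proportion."""
    half_width = z * math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - half_width, p_hat + half_width


TRUE_PREVALENCE = 0.20   # hypothetical true value in the population
BIASED_ESTIMATE = 0.30   # hypothetical estimate produced by a biased sample

for n in (50, 500, 5000):
    low, high = wald_ci(BIASED_ESTIMATE, n)
    covers_truth = low <= TRUE_PREVALENCE <= high
    print(f"n={n:5d}  95% CI=({low:.3f}, {high:.3f})  "
          f"includes true value: {covers_truth}")
```

With these assumed numbers, the interval at n=50 is wide enough to include the true value, while the narrower intervals at larger n exclude it: more precision around the wrong number.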
Measurement Errors (One type of systematic error)
Errors can occur in measurement of both exposures and outcomes. Errors in diagnostic test performance can lead to subjects being erroneously categorized as diseased or healthy individuals. Measurement errors can lead to biased estimates of effect (strength of association), inconsistent results among studies, and inaccurate conclusions. When an outcome or exposure is dichotomous (e.g., male or female; dead or alive), these errors are termed ‘misclassification’ errors. Considerable research has been performed to understand what effects misclassification may have on the results of epidemiologic studies.
Conceptually, measurement errors are divided into 2 types:
Non-differential misclassification
This means that errors in exposure classification are independent of disease status (i.e., they are equally likely in diseased and non-diseased subjects). The effect of non-differential misclassification is to bias effect estimates towards the null value, which indicates no association. The estimated Relative Risk will therefore be closer to the null value (RR=1), and studies will be less likely to demonstrate statistical significance (reduced ‘power to detect a significant association’).
Increasing sample size in a study does not reduce measurement error but does increase the power to detect a significant association (the probability that a true effect will be found to be statistically significant). Table 1 shows a theoretical example of how different scenarios of misclassification of exposure can affect estimates of a ‘true’ association measured by a Relative Risk of 2.0.
Table 1: Effects of non-differential misclassification of exposure in estimates of Relative Risk (true value of RR is 2.0).
Exposure Sensitivity | Exposure Specificity | Prevalence of Exposure | Estimated RR |
0.6 | 0.9 | 0.1 | 1.34 |
0.6 | 0.9 | 0.5 | 1.42 |
0.6 | 0.99 | 0.1 | 1.79 |
0.6 | 0.99 | 0.5 | 1.54 |
0.9 | 0.9 | 0.1 | 1.48 |
0.9 | 0.9 | 0.5 | 1.73 |
0.9 | 0.99 | 0.1 | 1.89 |
0.9 | 0.99 | 0.5 | 1.82 |
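The attenuation in Table 1 can be reproduced with a short calculation. This is a sketch under stated assumptions: exposure is dichotomous, misclassification is the same in diseased and non-diseased subjects, and the baseline risk in the truly unexposed group cancels out of the ratio. The function name `observed_rr` is hypothetical, and the module does not state how the table was generated.

```python
def observed_rr(true_rr, sensitivity, specificity, prevalence):
    """Expected Relative Risk after non-differential exposure misclassification."""
    risk_unexposed = 1.0                  # arbitrary unit; cancels in the ratio
    risk_exposed = true_rr * risk_unexposed

    # Fractions of the population by true and classified exposure status.
    exposed_as_exposed = prevalence * sensitivity
    exposed_as_unexposed = prevalence * (1 - sensitivity)
    unexposed_as_exposed = (1 - prevalence) * (1 - specificity)
    unexposed_as_unexposed = (1 - prevalence) * specificity

    # Each classified group is a mixture of truly exposed and unexposed subjects.
    risk_classified_exposed = (
        exposed_as_exposed * risk_exposed
        + unexposed_as_exposed * risk_unexposed
    ) / (exposed_as_exposed + unexposed_as_exposed)
    risk_classified_unexposed = (
        exposed_as_unexposed * risk_exposed
        + unexposed_as_unexposed * risk_unexposed
    ) / (exposed_as_unexposed + unexposed_as_unexposed)
    return risk_classified_exposed / risk_classified_unexposed


for se in (0.6, 0.9):
    for sp in (0.9, 0.99):
        for prev in (0.1, 0.5):
            print(f"Se={se} Sp={sp} prevalence={prev} "
                  f"RR={observed_rr(2.0, se, sp, prev):.2f}")
```

Under these assumptions every combination yields an estimate below the true value of 2.0, and the loss is greatest when specificity is low and exposure is rare.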
Differential misclassification error
Occurs when the probability of misclassification is related to the disease status (disease vs no disease status). Differential misclassification can bias effect measures (like Relative Risk) in either direction (either over or underestimate the RR), depending on whether the disease status is associated with underestimation or overestimation of exposure. Significance tests are not valid if differential misclassification error exists.
Table 2: Effect of differential misclassification of exposure in estimates of Relative Risk. This shows the effects of limited sensitivity and specificity to define exposure only among non-diseased animals (for example, the owner recalls if aborted animals were vaccinated, but doesn’t remember the vaccination status of non-aborted animals: This type of differential misclassification is called Recall bias).
Situation using Test with Sensitivity = 60% (Disease = Abortion). Only 60% (n=10) of the 17 unvaccinated animals that didn’t abort (non-diseased) were remembered as exposed (unvaccinated).
TRUE | Disease+ (abort) | Disease- (no abort) | Total |
Exposure + (unvacc) | 33 | 17 | 50 |
Exposure - (vacc) | 17 | 33 | 50 |
Total | 50 | 50 | 100 |

OBSERVED | Disease+ (abort) | Disease- (no abort) | Total |
Exposure + (unvacc) | 33 | 10 | 43 |
Exposure - (vacc) | 17 | 40 | 57 |
Total | 50 | 50 | 100 |
RR(real) = (33/50) / (17/50) = 1.94. The true risk of abortion was 1.94 times as high in unvaccinated animals as in vaccinated animals.
RR(observed) = (33/43) / (17/57) = 2.57. The observed risk of abortion was 2.57 times as high in unvaccinated animals as in vaccinated animals.
In this case, the true RR = 1.94 (almost 2 times higher risk of abortion in unvaccinated animals compared to vaccinated animals), but because the owner recalled only 10 of the 50 animals that didn’t abort as unvaccinated (instead of the true value of 17/50), the resulting RR = 2.57 is biased upwards. The problem is that the true values are not known in the real world, so the evaluator needs to consider carefully whether the owner used the same method to categorize the exposure (vaccination in this example) in each group. If not, recall bias is possible.
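The two Relative Risk calculations for Table 2 can be checked with a few lines of code. This is a minimal sketch using the counts given in the table; the function name `relative_risk` is a hypothetical helper.

```python
def relative_risk(exposed_cases, exposed_total, unexposed_cases, unexposed_total):
    """Risk in the exposed group divided by risk in the unexposed group."""
    risk_exposed = exposed_cases / exposed_total
    risk_unexposed = unexposed_cases / unexposed_total
    return risk_exposed / risk_unexposed


# True classification: 33 of 50 unvaccinated and 17 of 50 vaccinated aborted.
rr_true = relative_risk(33, 50, 17, 50)

# Observed classification: 7 unvaccinated animals that did not abort were
# recalled as vaccinated, so the row totals shift from 50/50 to 43/57.
rr_observed = relative_risk(33, 43, 17, 57)

print(f"True RR     = {rr_true:.2f}")
print(f"Observed RR = {rr_observed:.2f}")
```

The number of abortions in each row is unchanged; only the denominators move, which is enough to inflate the estimate.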
Confounding
Another source of systematic error leading to bias is confounding. A confounder (Figure 2) is a variable that is:
- Associated with the exposure of interest.
- Causally associated with the outcome (or a surrogate of a cause).
- Not an intermediate in the pathway from exposure to disease.
Figure 2: Principle of confounding
A simple example of the concept of confounding is the association between ice-cream consumption and violent crime. At an aggregate level, violent crime rates are higher at times of the year when ice-cream consumption is highest. Does that mean that higher ice-cream consumption causes higher violent crime rates? Obviously not, as a missing piece (or extraneous variable) is environmental temperature. Ice-cream consumption will be significantly associated with any activity that increases in warmer weather (e.g., violent crime).
In essence, confounding is the mixing up of the effects of more than one exposure factor when evaluating its effect on a disease outcome. Where confounding occurs, the effect estimate measured (like Relative Risk) contains components due to both the exposure of interest and the confounder(s), and will therefore be biased. In situations where known causal factors are recognized (e.g., smoking and many outcomes such as pancreatic cancer in humans), these factors should be both measured and adjusted for in the design or analysis of a study. A bigger problem is that of unknown confounders that are not measured in a study or evaluation. If any factor is associated with the exposure of interest and has a causal role in disease, the effect estimate (e.g., Relative Risk) for the exposure of interest will be inaccurate.
Confounding examples
- ‘I must be allergic to leather. Every time I go to bed with my leather shoes on, I wake up with a headache.’ Why is this the case? What is a confounder for the association between going to bed with leather shoes on and experiencing a headache the next morning?
- Carrying matches is associated with higher risk of lung cancer. Can you identify a potential confounder for this association?
- Higher rates of preweaned mortality among bull (male) calves compared to heifer (female) calves on many dairy farms. Are bull calves inherently at higher risk of death, or can you identify a potential confounder for this association?
Answers:
- Excess alcohol consumption can be a potential confounder for the association between going to bed with leather shoes on and waking up with a headache.
- Smoking can be a confounder for the association between carrying matches and lung cancer.
- Failure to provide adequate colostrum (quantity and quality) to bull calves due to the lower value of these animals to dairy operations can be a potential confounder for the association between sex and preweaning mortality in dairy cattle.
Several approaches can be taken to limit the impacts of confounding:
- Exclusion: Make all study subjects uniform for the confounding variable (e.g., limit inclusion criteria by age, breed, farm type, herd size, sex, etc). This may affect the representativeness of the study results and does not allow evaluation of the effects of potential confounders which may also influence the biological effect of the association being studied (Effect modification – see below).
- Matching: Ensure groups in the study are equivalent for the confounder (e.g., matching by age or year of diagnosis). Matching on multiple variables introduces logistical difficulties, but some matching is often done in case-control studies. Matching does have shortcomings in both cohort and case-control studies (different study designs are covered below).
- Analysis: Both simple (stratification by confounder) and more complex (multivariable methods) analytical tools are available to adjust for known or potential confounding factors in observational studies and evaluations. However, these rely on these factors being both identified and measured. With some analytical methods, the presence of confounding can be inferred if a variable added to a model materially influences the effect estimate of the exposure of interest (e.g., addition of ‘smoking’ (yes or no) in a study of pancreatic cancer of humans might markedly alter the Relative Risk estimate of the effect of coffee consumption).
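Stratification, the simple analytical approach listed above, can be sketched with the calf mortality example. All counts below are hypothetical and were chosen so that sex has no effect within either colostrum stratum; the Mantel-Haenszel summary is one standard way to pool stratum-specific Relative Risks, used here as an illustration rather than as the method prescribed by this module.

```python
def relative_risk(exposed_cases, exposed_total, unexposed_cases, unexposed_total):
    return (exposed_cases / exposed_total) / (unexposed_cases / unexposed_total)


def mantel_haenszel_rr(strata):
    """Mantel-Haenszel pooled Relative Risk across strata of a confounder."""
    numerator = 0.0
    denominator = 0.0
    for exp_cases, exp_total, unexp_cases, unexp_total in strata:
        stratum_size = exp_total + unexp_total
        numerator += exp_cases * unexp_total / stratum_size
        denominator += unexp_cases * exp_total / stratum_size
    return numerator / denominator


# Hypothetical counts: (bull deaths, bulls, heifer deaths, heifers).
strata = {
    "adequate colostrum": (5, 100, 20, 400),
    "inadequate colostrum": (80, 400, 20, 100),
}

# Crude analysis ignores colostrum and simply adds the strata together.
totals = [sum(column) for column in zip(*strata.values())]
crude_rr = relative_risk(*totals)
adjusted_rr = mantel_haenszel_rr(strata.values())

print(f"Crude RR (bulls vs. heifers): {crude_rr:.2f}")
for name, counts in strata.items():
    print(f"RR within '{name}': {relative_risk(*counts):.2f}")
print(f"Mantel-Haenszel adjusted RR: {adjusted_rr:.2f}")
```

In this constructed example the crude Relative Risk suggests bull calves are at roughly twice the risk, yet within each colostrum stratum the risk is identical for both sexes, so the adjusted estimate returns to the null value.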
Effect modification
The term ‘Effect modification’ is used when the true effect of an exposure of interest (measured, for example, by the Relative Risk) differs depending on the level of another variable. For example, the association between age and systolic blood pressure may differ between men and women. Unlike confounding, which distorts estimates of effect and needs to be corrected for, effect modification (similar to an interaction in experimental studies) makes up part of the biological reality that needs to be described. Research conducted to estimate the association between a given vaccine in pregnant animals and the occurrence of reproductive failure may find different results if performed on younger vs. older animals. If only one age group is analyzed, results can be incomplete (and should not be extrapolated to the other group). Instead, it can be important to evaluate the association between an exposure (like vaccination) and disease outcome (like reproductive failure) in both younger and older animals, which reveals any effect-modifying impact of age on this association.
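Effect modification shows up as stratum-specific estimates that genuinely differ from each other. The sketch below uses hypothetical counts for the vaccination and reproductive failure example; the numbers and the helper name `relative_risk` are assumptions made for illustration only.

```python
def relative_risk(exposed_cases, exposed_total, unexposed_cases, unexposed_total):
    return (exposed_cases / exposed_total) / (unexposed_cases / unexposed_total)


# Hypothetical counts: (failures in vaccinated, vaccinated,
#                       failures in unvaccinated, unvaccinated).
by_age_group = {
    "younger animals": (10, 100, 10, 100),
    "older animals": (30, 100, 10, 100),
}

rr_by_age = {
    group: relative_risk(*counts) for group, counts in by_age_group.items()
}
for group, rr in rr_by_age.items():
    print(f"RR of reproductive failure in {group}: {rr:.1f}")
```

Here the two Relative Risks should be reported separately rather than pooled, because a single summary value would describe neither age group correctly.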
Other sources of bias frequently mentioned in epidemiologic literature:
- Selection bias: Bias due to the process of selection of subjects in a study.
- Follow-up bias: In prospective studies, subjects may be lost to follow up for different reasons according to disease status.
- Response bias: Particular subjects may be more or less likely to respond to a survey according to their interest or history related to the disease or exposure.
- Recall bias: In survey studies, cases and controls may differ in their recall of exposures. Cases may search for (or believe in) a cause for their disease and be more likely to report an exposure than controls.
- Publication bias: Some analytical studies (meta-analyses) are performed using existing published studies that are collectively biased (e.g., studies showing positive results (e.g., p<0.05) are more likely to be published than studies showing results of no effect).
Apply hierarchy of evidence to evaluate results from epidemiologic studies
Evidence-based Veterinary Medicine
Regardless of our job as professionals, veterinarians often need to integrate ‘Best evidence available’ with their own clinical expertise to decide on a course of action (to make a decision). In order to do this, a veterinarian needs to be able to critically appraise the existing evidence (including peer-reviewed literature) in terms of validity, impact, and applicability, and will have to be up-to-date with new developments in his/her field of work. There has been a rapid increase in the availability of sources of information in recent years, but information quality is highly variable, thus highlighting the need for a critical mind.
When looking at the information available, it is helpful to follow a systematic approach that includes the following steps (from Stevenson, 2008):
- Describe the evidence
- Assess the internal validity of the study
- Assess the external validity of the study
- Compare the results with other available evidence
You have now learned how to assess the internal validity (misclassification errors and bias) and external validity (how well does the sample represent the population) of the study. There are a few additional things that you should consider to describe the evidence (to make sure that you understand the information presented, and how the conclusions were reached).
When describing a study, the first step is to identify its purpose (objectives) and the means to achieve it (the methods). In epidemiology, all studies will usually involve the description of something happening to someone (or some animal) at ‘some time and place.’
Remember: The case definition describes the population evaluated in terms of individual, space, and time factors.
Definitions:
- Exposure: any trait, behavior, environmental factor or other characteristic that can potentially cause or prevent an outcome. Also referred to as a risk or protective factor, or independent variable or predictor.
- Outcome: an effect measured in the population, potentially as a result of an exposure. In epidemiology, usually the outcome is the occurrence of disease or a specific clinical manifestation. Also sometimes called the effect or dependent variable.
Depending on their design, studies can be broadly classified as:
Experimental studies (or non-observational studies):
The design of the study involves deliberately changing population parameters and applying a treatment or measure (sometimes called ‘exposure’) and then assessing its effect (‘outcome’). Here, the researcher controls the allocation of study subjects to study groups (e.g., vaccination vs. non-vaccination).
- The ‘Randomized Controlled Clinical Trial’ (RCT) is an experimental study design that involves the random assignment of study subjects to different experimental groups (vaccinated vs. non-vaccinated, treated vs. non-treated, exposed vs. non-exposed), thus providing better (unbiased) information to assess the effect of the treatment or measure under evaluation on the outcome.
Observational studies:
The researcher(s) only collect data from a population in a given space and time without interfering with the subjects of study. They ‘observe’ the individuals under study and record data on variables of interest (often through the collection of samples) without assigning treatments to the subjects. Depending on their objective, observational studies can be further subdivided into descriptive and analytic studies.
B1. Descriptive observational studies:
These studies are conducted with no specific hypothesis to be tested. There is no formal comparison between study groups and no conclusion on an association between an exposure and an outcome. These studies, however, often serve as the basis for formulating a hypothesis to be tested in future studies. For example: All cases of disease X in population Y were observed in males, so is sex associated with the disease?
There are three major types of descriptive studies:
- Case reports: Description of a clinical occurrence that is noteworthy due to its unusual presentation.
- Case series: Description of a series of cases that identifies common and variable features among them.
- Surveys: A study conducted to measure the frequency of occurrence of an event (usually a disease) in a population with no further hypothesis to be evaluated.
B2. Analytical observational studies:
These studies are conducted specifically to test a hypothesis (usually whether exposure X causes outcome Y). Although analytical studies usually aim to prove a causal relationship between exposure and outcome, they sometimes cannot go beyond demonstrating an association between them. The major classes of analytical observational studies (note that experimental studies are analytical by definition) are:
- Ecological studies: In these studies, the unit of analysis in which both exposure and outcome are measured is a group of individuals (for example, a cattle population in a county or all the dogs in a city). These studies may be affected by ecologic fallacy, a fallacy in the interpretation of data that occurs when inferences about the nature of individuals are deduced from inferences about the group to which those individuals belong.
- Cross-sectional studies: In these studies, the exposure and outcome are determined in a population at a given point in time (a snapshot in time).
- Case-control studies: In these studies, individuals under study are selected and classified as cases or controls based on the presence (cases) or absence (controls) of the outcome of interest, and the frequency of exposure in each group is then compared.
- Cohort studies: In these studies, individuals under study are divided into different groups (different cohorts) based on differential exposures and then followed over time to measure the occurrence of disease.
The table attached below (Dohoo et al., 2007) provides an illustration of characteristics of various study design types, and also provides a summary of the Quality of Evidence (Strength of proof of causal association) generated from each.
B3. Systematic Reviews and Meta-analyses:
Systematic Reviews provide the highest level of information (the top of the Quality of Evidence pyramid, see the Figure below) as they involve a systematic method of review of pre-appraised (previously evaluated) information. A systematic review is different from a Narrative Review of literature which does not involve a systematic process and instead is influenced by the interpretation of the lead author(s).
Meta-analysis is “the statistical analysis of a large collection of analysis results from individual studies for the purpose of integrating the findings” (Glass, 1976).
So, how to evaluate the quality of information from different studies? Answer: By evaluating evidence from different types of studies and comparing the quality of information generated. The figure below provides a guide, with the highest ranked evidence for causation on the top of the pyramid and the lowest ranked evidence on the bottom.
How to compare studies based on hierarchy of evidence?
We now use some examples to demonstrate how to compare studies based on the quality of information provided. To do this, it is very helpful to answer the following questions:
- What is the study objective?
- What is the study outcome?
- What is the exposure evaluated?
- Which type of study design is this?
Note: These questions do not directly address the study findings, but instead the study design and quality of evidence generated.
Case 8.1. Evaluation of safety and immune response of vaccination
Pérez-Sancho M, Adone R, García-Seco T, Tarantino M, Diez-Guerrier A, Drumo R, Francia M, Domínguez L, Pasquali P, Álvarez J. 2014. Evaluation of the immunogenicity and safety of Brucella melitensis B115 vaccination in pregnant sheep. Vaccine. Apr 1;32(16):1877-81.
Abstract: In spite of its limitations, Rev.1 is currently recognized as the most suitable vaccine against Brucella melitensis (the causative agent of ovine and caprine brucellosis). However, its use is limited to young animals when test-and-slaughter programs are in place because of the occurrence of false positive-reactions due to Rev.1 vaccination. The B. melitensis B115 rough strain has demonstrated its efficacy against B. melitensis virulent strains in the mouse model, but there is a lack of information regarding its potential use in small ruminants for brucellosis control. Here, the safety and immune response elicited by B115 strain inoculation were evaluated in pregnant ewes vaccinated at mid-pregnancy. Vaccinated (n=8) and non-vaccinated (n=3) sheep were periodically sampled and analyzed for the 108 days following inoculations using tests designed for the detection of the response elicited by the B115 strain and routine serological tests for brucellosis [Rose Bengal Test (RBT), Complement Fixation Test (CFT) and blocking ELISA (ELISAb)]. Five out of the 8 vaccinated animals aborted, indicating a significant abortifacient effect of B115 inoculation at midpregnancy. In addition, a smooth strain was recovered from one vaccinated animal, suggesting the occurrence of an in vivo reversion phenomenon. Only one animal was positive in both RBT and CFT simultaneously (91 days after vaccination), confirming the lack of induction of cross-reacting antibody responses interfering with routine brucellosis diagnostic tests in most B115-vaccinated animals.
From Materials and Methods: ‘They were mated after oestrus synchronization and later randomly divided into two experimental groups: (1) vaccinated group (n = 8) and (2) control animals (non-vaccinated group, n = 3). All animals were kept together in the same isolated pen with food and water provided ad libitum.’
Questions:
- What is the study objective?
Evaluation of the safety and immune response elicited by B115 strain inoculation in pregnant ewes vaccinated at mid-pregnancy.
- What is the study outcome?
Abortion risk and response to serologic tests for brucellosis.
- What is the exposure evaluated?
B. melitensis B115 vaccination
- Which type of study design is this?
Randomized controlled clinical trial. This type of epidemiologic study provides the highest quality of information for disease causation among primary study types.
Case 8.2. Factors associated with Coxiella burnetii
Alvarez J, Perez A, Mardones FO, Pérez-Sancho M, García-Seco T, Pagés E, Mirat F, Díaz R, Carpintero J, Domínguez L. 2012. Epidemiological factors associated with the exposure of cattle to Coxiella burnetii in the Madrid region of Spain. Vet J. Oct;194(1):102-7.
Abstract: Domestic ruminants are considered to be the major source of Coxiella burnetii, the causative agent of Q fever. Even though Q fever is considered to be present worldwide, its distribution in many areas and countries remains unknown. Here, a serological assay was used to estimate the seroprevalence of C. burnetii in cattle in the Madrid region of Spain, to assess its spatial distribution, and to identify risk factors associated with positive results. Ten animals from each of 110 herds (n=1100) were randomly selected and analyzed using an ELISA test. In addition, epidemiologic information, at both the herd and individual level, was collected. Variables for which an association with test results was detected in a bivariate analysis were included as predictors (main effects) in a multivariable logistic regression model. Herd and individual seroprevalences were 30% (95% CI=22.2-39.1) and 6.76% (95% CI=5.42-8.41), respectively, and a strong spatial dependence was identified at the first neighbour level using the Cuzick-Edwards test. Production type (dairy >beef >bullfighting) and age of animals (old vs. young) were the only variables significantly associated (P<0.05) with positive serological results at the herd and individual levels, respectively. These results indicate that cattle are exposed to C. burnetii in the Madrid region. The high herd seroprevalence found in dairy herds (75%) indicates a higher risk of infection (probably for management reasons) whereas no C. burnetii positive bullfighting herds were identified.
Questions:
- What is the study objective?
To estimate the seroprevalence of Coxiella burnetii in cattle in the Madrid region of Spain, to assess its spatial distribution, and to identify risk factors associated with positive results.
- What is the study outcome?
Seropositive test results for Coxiella burnetii in tested cattle.
- What were the exposures evaluated?
Production type and age of animal, among other variables.
- Which type of study design is this?
Cross-sectional study, based on testing of animals in study herds at a single point in time and collection of epidemiologic data at the same point in time.
Case 8.3. Vaccination to control Salmonella
De la Cruz ML, Conrado I, Nault A, Perez A, Dominguez L, Alvarez J. 2017. Vaccination as a control strategy against Salmonella infection in pigs: A systematic review and meta-analysis of the literature. Res Vet Sci. Oct;114:86-94.
Abstract: Consumption or handling of improperly processed or cooked pork is considered one of the top sources for foodborne salmonellosis, a common cause of intestinal disease worldwide. Asymptomatic carrier pigs may contaminate pork at slaughtering; therefore, pre-harvest reduction of Salmonella load can contribute to reduced public health risk. Multiple studies have evaluated the impact of vaccination on controlling Salmonella in swine farms, but results are highly variable due to the heterogeneity in vaccines and vaccination protocols. Here, we report the results of an inclusive systematic review and a meta-analysis of the peer-reviewed scientific literature to provide updated knowledge on the potential effectiveness of Salmonella vaccination. A total of 126 articles describing the use of Salmonella vaccines in swine were identified, of which 44 fulfilled the inclusion criteria. Most of the studies (36/44) used live vaccines, and S. Typhimurium and S. Choleraesuis were the predominant serotypes evaluated. Vaccine efficacy was most often measured through bacteriological isolation, and pooled estimates of vaccine efficacy were obtained as the difference in the percentage of positive animals when available. Attenuated and inactivated vaccines had similar efficacy [Risk Difference=-26.8% (-33.8, -19.71) and -29.5% (-44.4, -14.5), respectively]. No serotype effect was observed on the efficacy recorded for attenuated vaccines; however, a higher efficacy of inactivated vaccines against S. Choleraesuis was observed, though in a reduced sample. Results from the meta-analysis here demonstrate the impact that vaccination may have on the control of Salmonella in swine farms and could help in the design of programs to minimize the risk of transmission of certain serotypes through the food chain.
Questions:
- What is the study objective?
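To provide updated knowledge on the potential effectiveness of vaccination against Salmonella in swine, based on a systematic review and meta-analysis of the peer-reviewed literature.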
- What is the study outcome?
Isolation of Salmonella from swine samples.
- What is the exposure evaluated?
Use of Salmonella vaccines in swine.
- Which type of study design is this?
Systematic review and Meta-analysis. This involves a systematic review of pre-appraised (previously evaluated) information from other studies, and provides the highest level of information, including meta-analysis to generate summary effect (Risk Difference) estimates.