In previous articles, we have learned how to access & read journal articles, how to interpret validity as a measure of the strength of evidence, and what the various types of studies are. With this foundation, we should have a basic idea of what to look for in a study. The next step is developing our skill at interpreting the actual data in an article, which can be presented as charts, tables, graphs, or text. The next two articles will focus on the two major types of results in studies: sample data & endpoint data.
Sample data are information about the subjects who participated in the study. Although not what most people think of when they think of study results, sample data can profoundly influence one’s interpretation of endpoint data.
The first thing to consider in sample data is the sample size (n), or how many people participated in the study. Ideally, studies should enrol enough individuals to detect a treatment effect. For example, drug A may slightly improve cognitive function, but perhaps only in 15% of people. If only 50 patients are enrolled in the study, with 25 receiving the study drug & the other 25 receiving placebo, then on average only 3.75 treated individuals (15% of 25) will experience the slight improvement in cognitive function. A change this small, in so few people, is unlikely to be detected, especially against the background of a placebo effect.
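To make the arithmetic in this example concrete, here is a minimal Python sketch. The 15% response rate & the 25/25 split come from the example above; the function names are hypothetical, & the binomial model assumes each treated subject responds independently with the same probability. Besides the average of 3.75 responders, it shows how often chance alone would leave even fewer responders in the drug group.

```python
from math import comb

def expected_responders(n_treated: int, response_rate: float) -> float:
    """Average number of treated subjects expected to show the improvement."""
    return n_treated * response_rate

def prob_at_most(k: int, n_treated: int, response_rate: float) -> float:
    """Binomial probability that k or fewer treated subjects respond.

    Assumes each subject responds independently with the same probability.
    """
    return sum(
        comb(n_treated, i) * response_rate**i * (1 - response_rate) ** (n_treated - i)
        for i in range(k + 1)
    )

n_treated = 50 // 2   # 25 subjects on drug A, the other 25 on placebo
rate = 0.15           # assumed share of people who respond to drug A

print(expected_responders(n_treated, rate))  # average number of responders
print(prob_at_most(2, n_treated, rate))      # chance of only 0, 1 or 2 responders
```

With these numbers, roughly a quarter of such trials would see two or fewer responders on drug A, which is part of why a small trial can easily miss a real but modest effect.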
How many subjects are needed, then, to reliably detect a treatment effect? Statistical methods guide investigators in all aspects of study design, including calculating the minimum sample size needed to detect a treatment effect (a power or sample size calculation). This process usually involves:
- estimating the magnitude of the treatment effect based upon previous studies,
- estimating the number of subjects needed to truly see that treatment effect, &
- minimizing the effects of chance & randomness
The first two parts should seem fairly intuitive after the previous example. The third requirement is a new concept. This area of biostatistics can get complicated if you overthink it, but it can be understood with enough time (comment if you have questions!). It rests on the following components:
- The hypothesis (H1) is what is predicted to happen in the experiment. It should be explicitly defined, such as: drug A shortens rapid recall time in comparison to placebo.
- The null hypothesis (H0) is the opposite of the hypothesis: it is what is predicted to happen if there is no treatment difference, for example: drug A does not affect rapid recall time in comparison to placebo.
Next we have the concepts of true positives/negatives & false positives/negatives. These are usually discussed in terms of the null hypothesis, which can be confusing if you overthink it.
The leftmost column represents the real result. If drug A really shortens rapid recall time, then in the best-case scenario the researchers will detect this difference & claim a treatment difference (a true positive). This means rejecting the null hypothesis in favour of the hypothesis. If they fail to detect this difference (perhaps because they did not enrol enough participants), we would see the false negative scenario.
On the other hand, if drug A really is ineffective, we would see either the true negative scenario or the false positive scenario. Both false positives & false negatives can arise from chance, bias & confounding. For example, some subjects on placebo may simply have a good day while taking the recall test (chance); some may start taking phenylpiracetam without telling the investigators; or some subjects in the treatment group may take the recall test after a night of heavy drinking. The last two are examples of biases that could compromise internal validity.
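A false positive needs no bias at all; chance alone produces one now and then. The sketch below simulates many hypothetical trials of an ineffective drug A, in which both groups' recall times come from the same distribution, & counts how often a test still declares a difference. It assumes normally distributed recall times with a known standard deviation & a two-sided z-test; the mean of 10 seconds, the standard deviation, the group size & the function names are illustrative assumptions, not values from any real study.

```python
import random
from statistics import NormalDist, mean

def null_trial_is_positive(rng, n_per_group, mu, sigma, alpha):
    """Simulate one trial in which drug A truly has no effect.

    Both groups are drawn from the same normal distribution (an assumption),
    so any 'significant' result is a false positive caused by chance.
    """
    drug = [rng.gauss(mu, sigma) for _ in range(n_per_group)]
    placebo = [rng.gauss(mu, sigma) for _ in range(n_per_group)]
    # Standard error of the difference in means, treating sigma as known.
    se = sigma * (2 / n_per_group) ** 0.5
    z = (mean(drug) - mean(placebo)) / se
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided cut-off, ~1.96 for alpha = 0.05
    return abs(z) > z_crit  # True means H0 is (wrongly) rejected

def false_positive_rate(n_trials=2000, n_per_group=25, mu=10.0, sigma=2.0, alpha=0.05, seed=42):
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    positives = sum(
        null_trial_is_positive(rng, n_per_group, mu, sigma, alpha) for _ in range(n_trials)
    )
    return positives / n_trials

print(false_positive_rate())  # should be close to 0.05
```

The printed rate should land near 0.05, the significance threshold used in the test; that threshold is the type I error rate (α) described next.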
- A false positive is called a type I error. The chance of a type I error happening is denoted as α (alpha) and is conventionally set at 0.05 (5%).
- A false negative is called a type II error. The chance of a type II error happening is denoted as β (beta).
- When β is subtracted from 1 (1 – β), the result is called the power of a study. Power is the chance of detecting a treatment effect when one truly exists (a true positive). To return to sample size: power is a key component in determining how many subjects we need to detect a treatment effect. The conventional target is 80% power (1 – β = 0.80). At 80% power, we have an 80% chance of correctly rejecting H0 & claiming that there is a treatment difference when, in fact, there is one (of the size assumed in the calculation). We achieve sufficient power by enrolling enough subjects, as power increases with sample size (larger n → higher 1 – β), although not in direct proportion.
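The sample-size step can be sketched with a common normal-approximation formula for comparing two means with equal groups: n = 2 × (z(1 – α/2) + z(1 – β))² ÷ (Δ ÷ σ)² per group. This is a simplified sketch, not the exact method any particular trial used: it assumes a continuous outcome, a two-sided test & a known standard deviation, & the function name is hypothetical. Dedicated power software using the t distribution typically returns slightly larger numbers.

```python
from math import ceil
from statistics import NormalDist

def n_per_group(effect_size: float, alpha: float = 0.05, power: float = 0.80) -> int:
    """Approximate subjects needed per group to compare two means.

    effect_size is the standardised difference Delta / sigma.
    Normal approximation, two-sided test, equal group sizes:
        n = 2 * (z_{1-alpha/2} + z_{1-beta})**2 / effect_size**2
    """
    z = NormalDist().inv_cdf
    z_alpha = z(1 - alpha / 2)  # ~1.96 for alpha = 0.05
    z_beta = z(power)           # ~0.84 for 80% power
    return ceil(2 * (z_alpha + z_beta) ** 2 / effect_size**2)

for d in (0.8, 0.5, 0.2):
    print(d, n_per_group(d))
```

The effect sizes 0.8, 0.5 & 0.2 are Cohen's conventional "large", "medium" & "small" standardised differences. Because n scales with 1 ÷ (Δ ÷ σ)², halving the expected effect roughly quadruples the required sample, which is why an overly optimistic effect estimate leads to an underpowered study.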
*Although power higher than 80% is possible, it is usually not targeted because:
- Power shows diminishing returns: initial increases in sample size raise power more than further increases do, as shown in the following graph.
- Enrolling many subjects requires a lot of funding.
- For a fixed sample size, α & β are inversely related: lowering β (raising power) without enrolling more subjects requires a less strict significance threshold, which raises α, meaning a higher risk of a false positive. With the conventional α of 0.05 & 80% power, there is a 5% chance of a false positive & a 20% chance of a false negative; this combination is considered a reasonable balance between the type I & type II error rates.
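Both points can be illustrated with an approximate power function built from the non-centrality parameter δ = (Δ ÷ σ) √(n ÷ 2). This minimal sketch assumes a two-sided comparison of two means with equal groups & normally distributed data, & ignores the negligible chance of a significant result in the wrong direction; the function name & the effect size of 0.5 are illustrative.

```python
from math import sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal distribution (mean 0, sd 1)

def power(effect_size: float, n_per_group: int, alpha: float = 0.05) -> float:
    """Approximate power of a two-sided comparison of two group means.

    effect_size is Delta / sigma; delta is the non-centrality parameter.
    The tiny chance of a significant result in the wrong direction is ignored.
    """
    delta = effect_size * sqrt(n_per_group / 2)
    z_crit = STD_NORMAL.inv_cdf(1 - alpha / 2)
    return STD_NORMAL.cdf(delta - z_crit)

# Diminishing returns: each extra 25 subjects per group buys less power.
for n in (25, 50, 75, 100, 125):
    print(n, round(power(0.5, n), 3))

# Trade-off at a fixed 50 per group: a stricter alpha lowers power (raises beta).
for a in (0.10, 0.05, 0.01):
    print(a, round(power(0.5, 50, alpha=a), 3))
```

In this range each extra 25 subjects per group adds less power than the last (roughly +0.28, +0.16, +0.08, +0.03), & at a fixed 50 per group a stricter α lowers power, i.e. raises β.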
So, we have established that power is a key driver of how many patients to enroll. We need to enroll enough patients that a real treatment effect, if present, emerges & can be reliably detected, with minimal risk that chance alone hides or mimics it.
A few other odds & ends to consider in sample data:
- Ideally, all treatment groups will be equal or similar in size
- All treatment groups should remain similar in size throughout the study: if more subjects drop out of one group than another, that could lead to attrition bias. If a disproportionate number of patients drop out of the study drug group, that is a warning sign; investigate why they dropped out (side effects?).
- Baseline characteristics are traits of the subjects at the start of the study, before the study drug or placebo has been administered. As previously mentioned, these data are typically presented in a standard Table 1. It is important to skim this table for (1) meaningful differences between the groups & (2) generalisability/external validity: whether the results could apply to you.
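Checking for differential dropout is simple arithmetic once a study reports how many subjects were in each group at each stage. The counts below are invented purely for illustration, & the helper name is hypothetical; the point is to compare retention between groups rather than look at the final numbers alone.

```python
# Invented counts for illustration: subjects remaining at baseline, interim and final visits.
counts = {
    "drug A": (100, 85, 72),
    "placebo": (100, 95, 91),
}

def retention(series):
    """Share of the baseline group still in the study at each time point."""
    baseline = series[0]
    return [n / baseline for n in series]

for group, series in counts.items():
    shares = ", ".join(f"{r:.0%}" for r in retention(series))
    print(f"{group}: {shares}")
```

Here the drug A group loses far more subjects than placebo (28% vs 9% by the end), the kind of imbalance that should prompt a look at why people left, e.g. side effects.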
- Sample data describe the characteristics of participants in a study.
- Sample size (n) is the number of subjects enrolled in a study. It is far from arbitrary: the number of people needed is based on the anticipated magnitude of the treatment effect, the number of subjects needed for that effect to emerge, & minimising "statistical noise" by limiting the risk of false positives (α) & false negatives (β).
- Power (1 – β) is the chance of a true positive, i.e. of detecting a real treatment effect. Researchers typically aim for 80% power as a balance between the chance of a false negative & the chance of a false positive (for a fixed sample size, lowering one error rate raises the other). Power is mainly increased by enrolling more patients. For those interested, power & its determinants can be expressed as: ↑Δ, ↓σ, ↑n → ↑δ → ↑(1 – β), where Δ is the true difference between groups, σ is the standard deviation, n is the sample size per group, & δ is the non-centrality parameter. In equation form: δ = (Δ ÷ σ) √(n ÷ 2).
- Other key sample data to consider are the baseline, interim, & final sample sizes between the groups as well as the baseline characteristics.
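The relationship between δ & power can be written out as a worked equation. This is a standard normal approximation for a two-sided comparison of two means with n subjects per group, stated here as an assumption rather than the method of any specific trial; Φ is the standard normal cumulative distribution function, & z at 1 – α/2 is about 1.96 when α = 0.05.

$$
\delta = \frac{\Delta}{\sigma}\sqrt{\frac{n}{2}}, \qquad 1 - \beta \approx \Phi\left(\delta - z_{1-\alpha/2}\right)
$$

Worked example with Δ ÷ σ = 0.8, n = 25 per group & α = 0.05:

$$
\delta = 0.8\sqrt{12.5} \approx 2.83, \qquad 1 - \beta \approx \Phi(2.83 - 1.96) = \Phi(0.87) \approx 0.81
$$

So about 25 subjects per group give roughly 80% power for a large effect, while a smaller Δ or a larger σ shrinks δ & therefore power.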