You can enhance your skills in evidence-based practice using these PEDro tutorials:
Before you start searching for clinical research it is best to spend some time thinking about the question you want to answer. This is because forming and refining your question makes it easier to find research to answer it. This video tutorial explains how to ask clinical questions in PICO (Patient-Intervention-Comparison-Outcome) format.
Is low-energy laser an effective treatment for lateral epicondylitis? Do stretching programs prevent the development of contracture following stroke? Can the use of the flutter valve reduce postoperative respiratory complications? Rigorous answers to these questions can only be provided by properly designed, properly implemented clinical trials. Unfortunately the literature contains both well performed trials which draw valid conclusions and badly performed trials which draw invalid conclusions. The reader must be able to distinguish between the two. This tutorial describes key features of clinical trials (or “methodological filters”) which confer validity.
Some studies which purport to determine the effectiveness of physiotherapy treatments simply assemble a group of subjects with a particular condition and take measures of the severity of the condition before and after treatment. If subjects improve over the period of treatment, the treatment is said to have been effective. Studies which employ these methods rarely provide satisfactory evidence of treatment effectiveness because it is rarely certain that the observed improvements were due to treatment, and not to extraneous variables such as natural recovery, statistical regression (a statistical phenomenon whereby people become less “extreme” over time simply as a result of the variability in their condition), placebo effects, or the “Hawthorne” effect (where subjects report improvements because they think this is what the investigator wants to hear). The only satisfactory way to deal with these threats to the validity of a study is to have a control group. Then a comparison is made between the outcomes of subjects who received the treatment and subjects who did not receive the treatment.
The logic of controlled studies is that, on average, extraneous variables should act to the same degree on both treatment and control groups, so that any difference between groups at the end of the experiment should be due to treatment. By way of example, it is widely known that most cases of acute low back pain resolve spontaneously and rapidly, even in the absence of any treatment, so simply showing that subjects improved with a course of a treatment would not constitute evidence of treatment effectiveness. A controlled trial which showed that treated subjects fared better than control subjects would constitute stronger evidence that the improvement was due to treatment, because natural recovery should have occurred in both treatment and control groups. The observation that treated subjects fared better than control subjects suggests that something more than natural recovery was making subjects better. Note that, in a controlled study, the “control” group need not receive no treatment. Often, in controlled trials, the comparison is between a control group which receives conventional therapy and an experimental group which receives conventional therapy plus treatment. Alternatively, some trials compare a control group which receives conventional treatment with an experimental group that receives a new therapy.
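The logic of a control group can be illustrated with a small simulation. This is only a sketch under invented assumptions: the function name `simulate_trial`, the amount of natural recovery (3 units), the true treatment effect (1 unit) and the noise level are all hypothetical and do not come from any real trial.

```python
import random
from statistics import mean

def simulate_trial(n_per_group=500, natural_recovery=3.0,
                   treatment_effect=1.0, noise_sd=1.0, seed=0):
    """Simulate improvement scores for a treated and a control group.

    Every subject improves by `natural_recovery` (hypothetical value);
    treated subjects improve by an extra `treatment_effect`.
    """
    rng = random.Random(seed)
    control = [natural_recovery + rng.gauss(0, noise_sd)
               for _ in range(n_per_group)]
    treated = [natural_recovery + treatment_effect + rng.gauss(0, noise_sd)
               for _ in range(n_per_group)]
    return mean(treated), mean(control)

treated_change, control_change = simulate_trial()

# An uncontrolled before-after study sees only the treated group's change,
# which mixes natural recovery with the effect of treatment.
uncontrolled_estimate = treated_change

# A controlled study subtracts the control group's change, which removes
# the natural recovery common to both groups.
controlled_estimate = treated_change - control_change

print(f"Before-after estimate: {uncontrolled_estimate:.2f}")
print(f"Controlled estimate:   {controlled_estimate:.2f}")
```

In this sketch the before-after estimate lands near 4 units even though the treatment itself contributes only about 1 unit, which is the figure the controlled comparison recovers.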
Importantly, control groups only provide protection against the confounding effects of extraneous variables in so far as treatment and control groups are alike. Only when treatment and control groups are the same in every respect that determines outcome (other than whether or not they get treated) can the experimenter be certain that differences between groups at the end of the trial are due to treatment. In practice this is achieved by randomly allocating the pool of available subjects to treatment and control groups. This ensures that extraneous factors such as the extent of natural recovery have about the same effect in treatment and control groups. In fact, when subjects are randomly allocated to groups, differences between treatment and control groups can only be due to treatment or chance, and it is possible to rule out chance if the differences are large enough – this is what statistical tests do. Note that this is the only way to ensure the comparability of treatment and control groups. There is no truly satisfactory alternative to random allocation.
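Simple random allocation can be sketched in a few lines. This is a minimal illustration only: the function name `randomly_allocate` and the subject numbering are hypothetical, and real trials typically use concealed, independently generated allocation schedules rather than an ad hoc script.

```python
import random

def randomly_allocate(subject_ids, seed=None):
    """Shuffle the subject pool and split it into two groups.

    With an odd number of subjects the control group gets the extra one.
    """
    rng = random.Random(seed)
    shuffled = list(subject_ids)
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return {"treatment": shuffled[:half], "control": shuffled[half:]}

# Hypothetical pool of 20 subjects, numbered 1 to 20.
groups = randomly_allocate(range(1, 21), seed=1)
print("Treatment:", sorted(groups["treatment"]))
print("Control:  ", sorted(groups["control"]))
```

Because chance alone decides who goes where, neither the investigator nor the subject can influence group membership, so characteristics that affect outcome tend to balance out across groups as the sample grows.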
Even when subjects are randomly allocated to groups, it is necessary to ensure that the effect (or lack of effect) of treatment is not distorted by “observer bias”. This refers to the possibility that the investigator’s belief in the effectiveness of a treatment may subconsciously distort the measurement of treatment outcome. The best protection is provided by “blinding” the observer – making sure that the person who measures outcomes does not know if the subject did or did not receive the treatment. It is generally desirable that patients and therapists are also blinded. When patients have been blinded, you can know that the apparent effect of therapy was not produced by placebo or Hawthorne effects. Blinding therapists to the therapy they are applying is often difficult or impossible, but in those studies where therapists are blind to the therapy (as, for example, in trials of low-energy laser where the device emits either laser or coloured light, but the therapist is not informed which), you can know that the effects of therapy were produced by the therapy itself, rather than by the therapist’s enthusiasm for the therapy.
It is also important that few subjects discontinue participation (“drop out”) during the course of the trial. This is because dropouts can seriously distort the study’s findings. A true treatment effect might be disguised if control subjects whose condition worsened over the period of the study left the study to seek treatment, as this would make the control group’s average outcome look better than it actually was. Conversely, if treatment caused some subjects’ condition to worsen and those subjects left the study, the treatment would look more effective than it actually was. For this reason dropouts always introduce uncertainty into the validity of a clinical trial. Of course the more dropouts, the greater the uncertainty – a rough rule of thumb is that if more than 15% of subjects drop out of a study, the study is potentially seriously flawed. Some authors simply do not report the number of dropouts. In keeping with the established scientific principle of guilty until proven innocent, these studies ought to be considered to be potentially invalid.
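The 15% rule of thumb is easy to check from the numbers in a trial report. The sketch below is illustrative only: the function names and the example figures (60 subjects randomised, 48 followed up) are hypothetical.

```python
def dropout_rate(n_randomised, n_followed_up):
    """Proportion of randomised subjects lost before outcome measurement."""
    if n_randomised <= 0:
        raise ValueError("n_randomised must be positive")
    if not 0 <= n_followed_up <= n_randomised:
        raise ValueError("n_followed_up must be between 0 and n_randomised")
    return (n_randomised - n_followed_up) / n_randomised

def exceeds_rule_of_thumb(n_randomised, n_followed_up, threshold=0.15):
    """True if the dropout rate is above the rough 15% threshold."""
    return dropout_rate(n_randomised, n_followed_up) > threshold

# Hypothetical trial: 60 subjects randomised, 48 measured at follow-up.
rate = dropout_rate(60, 48)
flagged = exceeds_rule_of_thumb(60, 48)
print(f"Dropout rate: {rate:.0%}, above rule of thumb: {flagged}")
```

The threshold is a rough guide, not a sharp boundary: a trial with 14% dropouts concentrated in one group may be more worrying than one with 16% spread evenly across groups.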
To summarise, valid clinical trials:
- randomly allocate subjects to treatment and control groups
- blind observers, and preferably patients and therapists as well
- have few dropouts.
The next time you read a clinical trial of a physiotherapy treatment, ask yourself if the trial has these features. As a general rule, those trials which do not satisfy these criteria could be invalid and should not be considered to constitute strong evidence of treatment effectiveness (or ineffectiveness). Those trials which do satisfy these criteria should be read carefully and their findings should be committed to memory!
If you want to read further about assessing trial validity, try:
Guyatt GH, Sackett DL, Cook DJ, et al. User’s guide to the medical literature: II. How to use an article about therapy or prevention: A. Are the results of this study valid? JAMA 1993;270:2598-601.
The preceding tutorial presented a list of criteria which readers can use to differentiate studies that are likely to be valid from those that may not be. Studies which do not satisfy most of the methodological filters are usually best ignored. This section considers how therapists should interpret those trials which satisfy most of the methodological filters. The message is that it is not sufficient to look simply for evidence of a statistically significant effect of the therapy. You need to be satisfied that the trial measures outcomes that are meaningful, and that the positive effects of the therapy are big enough to make the therapy worthwhile. The harmful effects of the therapy must be infrequent or small so that the therapy does more good than harm. Lastly, the therapy must be cost-effective.
Of course, for a trial to be useful it must investigate meaningful effects of treatment. This means that the outcomes must be measured in a valid way. In general, because we usually judge the primary worth of a treatment by whether it satisfies patients’ needs, measurement outcomes should be meaningful to patients. Thus a trial which shows that low-energy laser lowers serotonin levels is much less useful than one which shows that it reduces pain, and a trial which shows that motor training reduces spasticity is much less useful than one which shows it enhances functional independence.
The size of the therapy’s effect is obviously important, but often overlooked. Perhaps this is because many readers of clinical trials do not appreciate the distinction between “statistical significance” and “clinical significance”. Or perhaps it reflects the preoccupation of many authors of clinical trials with whether “p < 0.05” or not. Statistical significance (“p < 0.05”) refers to whether the effect of the therapy is bigger than can reasonably be attributed to chance alone. That is important (we need to know that the observed effects of therapy were not just chance findings) but on its own tells us nothing about how big the effect actually was. The best estimate of the size of the effect of a therapy is the average difference between groups. Thus, if a hypothetical trial on the effects of mobilisation reports that shoulder pain, as measured on a 10 cm visual analogue scale (VAS), was reduced by a mean of 4 cm in the treatment group and 1 cm in the control group, our best estimate of the mean effect of treatment is a 3 cm reduction in VAS (as 4 cm minus 1 cm is 3 cm). Another hypothetical trial on muscle stretching before sport might report that 2% of patients in the stretch group were subsequently injured, compared to 4% in the control group. In that case our best estimate is that stretching reduced the absolute risk of injury by 2 percentage points (as 4% minus 2% is 2%). Readers of clinical trials need to look at the size of the reported effect to decide if the effect is big enough to be clinically worthwhile. Remember patients often come to therapy looking for cures (of course this generalisation may not hold in all areas of clinical practice) – most are not interested in therapies which have only small effects.
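The two hypothetical examples above amount to simple subtractions, which can be written out as follows. The function names are illustrative choices, and the figures are the hypothetical ones from the text.

```python
def between_group_difference(treatment_change, control_change):
    """Best estimate of effect for a continuous outcome:
    mean change in the treated group minus mean change in the controls."""
    return treatment_change - control_change

def absolute_risk_reduction(control_risk, treated_risk):
    """Best estimate of effect for a dichotomous outcome:
    risk in the control group minus risk in the treated group."""
    return control_risk - treated_risk

# Mobilisation example: pain fell by 4 cm (treated) and 1 cm (control).
vas_effect = between_group_difference(4.0, 1.0)

# Stretching example: 4% of controls and 2% of stretchers were injured.
injury_effect = absolute_risk_reduction(0.04, 0.02)

print(f"Estimated effect on pain: {vas_effect:.1f} cm on the VAS")
print(f"Estimated reduction in injury risk: {injury_effect:.1%}")
```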
There is an important subtlety in looking at the size of a therapy’s effects. It applies to studies whose outcomes are dichotomous (dichotomous outcomes can have one of two values, such as dead or alive, injured or not injured, admitted to nursing home or not admitted; this contrasts with variables such as VAS measures of pain, which can have any value between and including 0 and 10). Many studies that measure dichotomous outcomes will report the effect of therapy in terms of ratios, rather than in terms of differences. (The ratio is sometimes called a “relative risk” or “odds ratio” or “hazard ratio”, but it comes by other names as well). Expressed in this way, the findings of our hypothetical stretching study would be reported as a 50% reduction in injury risk (as 2% is half of 4%). Usually the effect of expressing treatment effects as ratios is to make the effect of the therapy appear large. The better measure is the difference between the two groups. (In fact, the most useful measure may well be the inverse of the difference. This is sometimes called the “number needed to treat (NNT)” because it tells us, on average, how many patients we need to treat to prevent one adverse event – in the stretching example the NNT is 1/0.02 = 50, so one injury is prevented for every 50 subjects who stretch).
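The contrast between the ratio and the difference can be made concrete with the stretching figures from the text (2% injured with stretching, 4% without). This is a minimal sketch; the function names are illustrative.

```python
def relative_risk(treated_risk, control_risk):
    """Risk in the treated group as a fraction of risk in the controls."""
    if control_risk <= 0:
        raise ValueError("control_risk must be positive")
    return treated_risk / control_risk

def relative_risk_reduction(treated_risk, control_risk):
    """Proportional reduction in risk (the 'ratio' way of reporting)."""
    return 1 - relative_risk(treated_risk, control_risk)

def number_needed_to_treat(treated_risk, control_risk):
    """Inverse of the absolute risk difference."""
    risk_difference = control_risk - treated_risk
    if risk_difference <= 0:
        raise ValueError("treatment must lower risk for an NNT to be defined")
    return 1 / risk_difference

rrr = relative_risk_reduction(0.02, 0.04)  # the "50% reduction"
nnt = number_needed_to_treat(0.02, 0.04)   # about 50 subjects

print(f"Relative risk reduction: {rrr:.0%}")
print(f"Number needed to treat:  {nnt:.0f}")
```

The same data therefore read as a “50% reduction in risk” or as “about 50 people must stretch to prevent one injury”; the second framing usually gives a more sober picture of how worthwhile the therapy is.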
Many studies do not report the harmful effects of therapies (ie, the “side effects” or “complications” of therapy). That is unfortunate, because the absence of reports of harmful effects is often interpreted as indicating that the therapy does no harm, but clearly that need not be so. Glaziou and Irwig (BMJ 1995;311:1356-9) have argued that the effects of therapy are usually most pronounced when given to patients with the most severe conditions (for example, bronchial suction might be expected to produce a greater reduction in risk of respiratory arrest in a head-injured patient with copious sputum retention than in a head-injured patient with little sputum retention). In contrast, the risks of therapy (in this case, from raised intracranial pressure) tend to be relatively constant, regardless of the severity of the condition. Thus a therapy is more likely to do more good than harm when it is applied to patients with severe conditions, and therapists should be relatively reluctant to give a therapy which has potentially serious side effects when the patient has a less serious condition.
In practice, it is often difficult for clinical trials to detect harmful effects, because harmful effects tend to occur infrequently, and most studies will have insufficient sample sizes to detect harmful effects when they occur. Thus, even after good randomised controlled trials of a therapy have been performed there is an important role for large scale “monitoring” studies which follow large cohorts of treated patients to ascertain that harmful events do not occur excessively. Until such studies have been performed, therapists should be wary about applying potentially harmful therapies, particularly to patients who stand to gain relatively little from the therapy.
An extra level of sophistication in critical appraisal involves consideration of the degree of imprecision of estimates of effect size offered by clinical trials. Trials are performed on samples of subjects that are expected to be representative of certain populations. This means that the best a trial can provide is an (imperfectly precise) estimate of the size of the treatment effect. Clinical trials on large numbers of subjects provide better (more precise) estimates of the size of treatment effects than trials on small numbers of subjects. Ideally readers should consider the degree of imprecision of the estimate when deciding what a clinical trial means, because this will often affect the degree of certainty that can be attached to the conclusions drawn from a particular trial. The best way to do this is to calculate confidence intervals about the estimate of the treatment effect size, if these are not explicitly supplied in the trial report. A tutorial on how to calculate and interpret confidence intervals about common measures of effect size is given in:
- Herbert RD. How to estimate treatment effects from reports of clinical trials. I: Continuous outcomes. Aust J Physiother 2000;46:229-35
- Herbert RD. How to estimate treatment effects from reports of clinical trials. II: Dichotomous outcomes. Aust J Physiother 2000;46:309-13.
Readers who are confident (sorry) with confidence intervals may find it useful to download PEDro’s confidence interval calculator. The calculator is in the form of an Excel spreadsheet.
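For readers who prefer code to spreadsheets, the sketch below shows one common way to attach an approximate 95% confidence interval to a between-group difference. It uses the standard large-sample normal approximation, which is an assumption here: it is not necessarily the method used in the papers above or in PEDro’s calculator, and it becomes unreliable for small samples or rare events. The standard deviations and sample sizes are invented for illustration.

```python
from math import sqrt
from statistics import NormalDist

def ci_mean_difference(diff, sd_treated, n_treated, sd_control, n_control,
                       level=0.95):
    """Approximate CI for a difference between two group means."""
    z = NormalDist().inv_cdf(0.5 + level / 2)
    se = sqrt(sd_treated**2 / n_treated + sd_control**2 / n_control)
    return diff - z * se, diff + z * se

def ci_risk_difference(control_risk, n_control, treated_risk, n_treated,
                       level=0.95):
    """Approximate CI for a difference between two proportions."""
    z = NormalDist().inv_cdf(0.5 + level / 2)
    se = sqrt(control_risk * (1 - control_risk) / n_control
              + treated_risk * (1 - treated_risk) / n_treated)
    diff = control_risk - treated_risk
    return diff - z * se, diff + z * se

# 3 cm VAS difference; hypothetical SD of 2 cm and 50 subjects per group.
vas_ci = ci_mean_difference(3.0, 2.0, 50, 2.0, 50)

# 4% vs 2% injury risk; hypothetical trials of 100 and 1000 per group.
small_trial_ci = ci_risk_difference(0.04, 100, 0.02, 100)
large_trial_ci = ci_risk_difference(0.04, 1000, 0.02, 1000)

print("VAS difference CI:     %.2f to %.2f cm" % vas_ci)
print("Small trial risk diff: %.3f to %.3f" % small_trial_ci)
print("Large trial risk diff: %.3f to %.3f" % large_trial_ci)
```

With these invented numbers the smaller trial’s interval spans zero while the larger trial’s does not, which illustrates why the same point estimate can warrant different degrees of certainty.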
The last part of deciding the usefulness of a therapy involves deciding if the therapy is cost-effective. This is particularly important when health care is paid for, or subsidised, by the public purse. There will never be enough resources to fund all innovations in health care (probably not even all good innovations). Thus the cost of any therapy is that money spent on it cannot be spent on other forms of health care. Sensible allocation of finite funds involves spending money where the effect per dollar is greatest. Of course a therapy cannot be cost-effective if it is not effective. But effective therapies can be cost-ineffective. The methods used to determine cost-effectiveness are outside this author’s expertise, and it is probably better if I defer to more authoritative sources. If you are interested, you might like to read:
- Drummond MF, Richardson WS, O’Brien BJ, Levine M, Heyland D. User’s guide to the medical literature: XIII. How to use an article on economic analysis of clinical practice: A. Are the results of the study valid? JAMA 1997;277:1552-7.
- O’Brien BJ, Heyland D, Richardson WS, Levine M, Drummond MF. User’s guide to the medical literature: XIII. How to use an article on economic analysis of clinical practice: B. What are the results and will they help me in caring for my patients? JAMA 1997;277:1802-6.
To summarise this section:
Statistical significance does not equate to clinical usefulness. To be clinically useful, a therapy must:
- affect outcomes that patients are interested in
- have big enough effects to be worthwhile
- do more good than harm
- be cost-effective.
If you want to read further on assessing effect size, you could consult:
Guyatt GH, Sackett DL, Cook DJ, et al. User’s guide to the medical literature: II. How to use an article about therapy or prevention: B. What were the results and will they help me in caring for my patients? JAMA 1994;271:59-63.