The Persuasive Power of Percentages
Tim Johnson
You see it in the paper, you hear it on TV: treatment X reduces heart attack risk by 30%; therapy Y increases the risk of breast cancer by 25%. If you’re like most people, you translate these figures into something you know: a substantial 30% discount on a new flat-screen TV (great!), a 25% rise in your rent (ouch!). At first glance, it seems like simple percentage arithmetic. If a pair of $100 jeans is marked down 50%, you’ll take them home for $50. So when a pharmaceutical company announces that its new miracle drug reduces the risk of stroke by 50%, that must mean 50 people in 100 will benefit from taking it, right? Wrong. That’s because one of the most common ways researchers present study results is as a reduction in relative risk. This is the decrease in risk in the group receiving the treatment being tested, relative to the risk in the control group. The control group is deliberately not given the new treatment (it gets a placebo or sometimes an older drug) so that researchers can see whether the new drug is effective.
Let’s say scientists are testing a novel medication designed to prevent heart attacks in men over 50 years of age; we’ll call it Panacea. They assemble a sample of 200 subjects, who are randomly divided into two roughly equal groups: a treatment group (given Panacea) and a control group (given a placebo, an inactive dummy pill, for comparison). Let’s say that by the trial’s end, 2% of the men in the placebo group had a heart attack versus just 1% in the Panacea group. The researchers divide the heart attack rate in the treatment arm by the rate in the control arm (1% divided by 2% equals 0.5) and subtract the result from 1 to get the reduction in relative risk: 50%. They announce that Panacea cuts heart attack risk in half. But the absolute reduction in risk, 2% minus 1%, amounts to just one man in 100. Of every 100 men who took Panacea, 98 would not have had a heart attack anyway, and one had a heart attack despite the drug.
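The Panacea arithmetic can be written out as a short Python sketch. The function name `risk_summary` is a hypothetical helper. The inputs are the article's made-up event rates (2% on placebo, 1% on Panacea), expressed as fractions.

```python
def risk_summary(control_rate: float, treated_rate: float) -> dict[str, float]:
    """Compare event rates in the control and treatment groups (fractions, e.g. 0.02 for 2%)."""
    relative_risk = treated_rate / control_rate  # 0.01 / 0.02 = 0.5
    return {
        "relative risk": relative_risk,
        # 1 - 0.5 = 0.5: the "cuts risk in half" headline
        "relative risk reduction": 1 - relative_risk,
        # 0.02 - 0.01 = 0.01: one man in 100
        "absolute risk reduction": control_rate - treated_rate,
    }


panacea = risk_summary(control_rate=0.02, treated_rate=0.01)
for measure, value in panacea.items():
    print(f"{measure}: {value:.2f}")
```

Both the relative risk and the relative risk reduction come out at 0.50, while the absolute risk reduction is only 0.01.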
Understandably, drug companies like to emphasize relative risk, but that’s problematic because relative risk tends to overstate the benefit to an individual patient, says Dr. Jonathan Lomas, CEO of the Canadian Health Services Research Foundation in Ottawa. “So something looks to have a major benefit when, in fact, it has a marginal benefit.” Relative risk can also mislead when it is used to report potential harms. In 2002, part of a major U.S. study of hormone replacement therapy (HRT) in healthy post-menopausal women, the Women’s Health Initiative (WHI), was stopped about three years before the end of its planned eight-year run. The WHI set out to study how well HRT reduced the risk of various diseases. Participants took daily doses of estrogen plus progestin, or a placebo. The study was halted when researchers concluded that the risks of heart attack, stroke, blood clots and breast cancer outweighed the potential benefits. At first glance, avoiding HRT would seem like an easy decision, even for women suffering severe menopausal symptoms. The WHI found that HRT increased the risk of both breast cancer and coronary heart disease by about one quarter. (Think of a 25% increase in the number of kids in your six-year-old’s already crowded Grade 1 class.) But the actual WHI numbers weren’t nearly so scary. Some women in the study’s age group would have had heart attacks or developed breast cancer anyway, so the comparison that matters is with that baseline. For every 10,000 women taking estrogen-progestin over one year, seven more would have heart attacks (37 women on estrogen-progestin versus 30 on placebo). Eight more would develop breast cancer (38 versus 30). These are single-digit differences per 10,000 women per year. Some women may therefore reasonably conclude that HRT poses an acceptable risk when taken to relieve the symptoms of menopause, at least for a few years.
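The same relative-versus-absolute comparison can be run on the WHI figures quoted above, as a minimal Python sketch working only from the per-10,000 counts. These simple ratios of the quoted rates (about 23% and 27%) are consistent with the "about one quarter" figure, though the study's own published estimates may have been calculated differently. The helper name `compare` is hypothetical.

```python
PER_WOMEN = 10_000  # the WHI figures above are per 10,000 women per year


def compare(label: str, on_hrt: int, on_placebo: int) -> tuple[float, int]:
    """Relative and absolute increase for events per 10,000 women per year."""
    relative_increase = on_hrt / on_placebo - 1  # 37/30 - 1 is about 0.23
    extra_cases = on_hrt - on_placebo  # 37 - 30 = 7 more per 10,000
    print(
        f"{label}: relative increase {relative_increase:.0%}, "
        f"{extra_cases} extra cases per {PER_WOMEN:,} women per year "
        f"(absolute increase {extra_cases / PER_WOMEN:.2%})"
    )
    return relative_increase, extra_cases


heart = compare("Heart attack", on_hrt=37, on_placebo=30)
breast = compare("Breast cancer", on_hrt=38, on_placebo=30)
```

A relative increase in the mid-20s (percent) corresponds here to an absolute increase of less than a tenth of one percentage point per year.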
To truly evaluate a drug’s effectiveness (or harmfulness), you need to know the actual rate of events in both the treatment group and the control group, stresses Dr. Peter Austin, a senior scientist at Toronto’s Institute for Clinical Evaluative Sciences. “And if the relative risk reduction with a treatment is 50%, you want to know if you’re going down from 50% to 25% of patients having an adverse event, or from 2% to 1%, or from half of 1% to a quarter of 1% — all of which, of course, have the same relative risk reduction.” If you know the risks and benefits in their larger context, you are better equipped to weigh them and decide whether taking a drug is worth exposing yourself to potential side effects or shelling out big time at the pharmacy. One useful reporting tool, known as the number needed to treat (NNT), attempts to roll this information into one handy number. Simply put, NNT tells you how many people would need to be treated for one of them to benefit. It is calculated as 1 divided by the absolute risk reduction. In the example of our fictional drug, Panacea, the absolute risk reduction was 1% (0.01). So 100 men had to receive treatment to prevent one heart attack, and the NNT for Panacea is 100. “Two considerations drive the NNT,” says Dr. Andreas Laupacis, director of the Li Ka Shing Knowledge Institute at St. Michael’s Hospital in Toronto. “On the one hand, how effective is the therapy; on the other, how likely is it that the person will get into trouble without the therapy?” Laupacis and his colleagues were instrumental in introducing this measure almost 20 years ago. He adds that an NNT should ideally be accompanied by the length of treatment over which one adverse medical event is prevented. For urgent treatments, such as coronary-artery bypass surgery for patients with life-threatening arterial blockages, the NNT is very low, as you would expect. But some NNTs are surprisingly high.
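Dr. Austin's three scenarios, and the NNT for each, can be checked with a short Python sketch. It uses exact fractions to avoid rounding noise. The function name `nnt` is a hypothetical helper that simply applies NNT = 1 ÷ (control event rate − treatment event rate).

```python
from fractions import Fraction


def nnt(control_rate: Fraction, treated_rate: Fraction) -> Fraction:
    """Number needed to treat to prevent one event (rates as fractions of 1)."""
    absolute_risk_reduction = control_rate - treated_rate
    if absolute_risk_reduction <= 0:
        raise ValueError("no absolute benefit, so the NNT is undefined")
    return 1 / absolute_risk_reduction


# (label, control event rate, treatment event rate) from Dr. Austin's quote
scenarios = [
    ("50% -> 25%", Fraction(50, 100), Fraction(25, 100)),
    ("2% -> 1% (Panacea)", Fraction(2, 100), Fraction(1, 100)),
    ("0.5% -> 0.25%", Fraction(5, 1000), Fraction(25, 10000)),
]

for label, control, treated in scenarios:
    relative_reduction = 1 - treated / control  # 1/2 in every scenario
    print(
        f"{label}: relative reduction {float(relative_reduction):.0%}, "
        f"absolute reduction {float(control - treated):.2%}, "
        f"NNT {nnt(control, treated)}"
    )
```

All three scenarios share the same 50% relative risk reduction, but the NNT ranges from 4 to 400.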
Many of us, for example, assume that taking an antibiotic will always cure a child’s ear infection. But the NNT for antibiotics just to shorten a fever is more than 20, says Dr. Darshak Sanghavi, a pediatric cardiologist and an assistant professor of medicine at the University of Massachusetts Medical School in Worcester. Other time-honoured practices can also be measured in terms of NNT, he notes. Studies report that breastfeeding, the optimal form of infant nutrition, lowers the risk of childhood diseases ranging from type 1 diabetes to leukemia. But the number of infants who would need to be “treated” with breastfeeding to prevent one case of these conditions is extremely high, says Sanghavi: at least 10,000 for diabetes and one million or more for leukemia. The cardiovascular benefits of acetylsalicylic acid (ASA) are practically gospel. Yet the NNT for ASA therapy to prevent a single stroke over one year is 102, and it runs well into the hundreds to ward off one heart attack. Laupacis cites another example: taking statin drugs to lower cholesterol and reduce heart attack risk. For people at the low end of the cardiac risk scale, the NNT is around 200, so a low-risk person may reasonably conclude that these medications aren’t worth the cost, bother and side effects. But for a high-risk patient, the NNT might be closer to 10. “So I wouldn’t be having a big value-laden discussion with that patient about whether or not to go on a statin,” Laupacis says. “I would probably say the evidence that a statin will reduce his risk of heart attack is pretty convincing.” NNT’s clarity in communicating complicated numbers has won it many fans. “Number needed to treat should be a routine measure,” says Lomas. “It’s easy for everybody to understand, and it’s a way that very neatly takes account of this relative-versus-absolute risk phenomenon.” Another important criterion in judging a study’s results is its power: did the trial include enough people to reach a solid conclusion?
Did its numbers really have the statistical power to show that its results were due to the treatment tested and not pure chance? And did it even have people in it? There’s a parable that pops up in panel discussions on evaluating medical statistics, and it goes something like this. News report: study finds a 33% cure rate with new cancer drug. Study analysis: one mouse was cured, one mouse died, and the other mouse ran away (or, in the parlance of medical trials, “was lost to follow-up”). Conscientious researchers have long been keen to drive home the role of chance. Thirteen years ago, the venerable British Medical Journal published the findings of a Scottish study from the Department of Clinical Neurosciences at Western General Hospital in Edinburgh. The study reported the apparent benefits of a new cerebrovascular treatment in an article entitled “The Miracle of DICE Therapy for Acute Stroke.” One of the interventions tested, designated “white therapy,” was said to reduce the odds of death in stroke patients by a miraculous 93%. But the actual findings were less than miraculous. Part joke and part cautionary tale, the “treatment” consisted of the rolling of white, green and red dice by 24 “patients.” Every time a patient rolled a six, the researchers recorded a death, and any other number counted as a survival. In the most successful “arm” of the trial, the white group, the dice were rolled a total of 20 times in the so-called treatment group, supposedly receiving therapy, without once coming up six. But in the control group, which received no therapy, the dice came up six a total of six times. Thus, six “deaths” were recorded in the imaginary untreated group versus none in the treatment arm. The article included an account of the reaction to the 24-patient study: ecstatic media reports and immediate adoption of white therapy by enthusiastic doctors.
“This fantasy is perhaps more common than we care to admit,” the researchers wrote. As it turned out, DICE was actually an acronym for Don’t Ignore Chance Effects. Neither doctors nor members of the public are as aware as they should be of how chance alone can sway results. “If you study a small number of people, the results that you think are due to the treatment could just be due to chance,” Laupacis says. So how do scientists decide whether results are statistically significant? One tool is the p-value (p = probability). This is the probability of getting results at least as striking as those observed if the treatment actually had no effect. So the smaller the p-value, the harder it is to blame the results on chance alone. “So a low p-value suggests that an intervention might be working,” says Dr. Brian Haynes, a professor of clinical epidemiology and medicine at McMaster University in Hamilton, Ont. A p-value below 0.05 (stated as P<0.05) means that, if the treatment had no real effect, results at least this extreme would turn up by chance less than one time in 20 (less than 5% of the time). The 0.05 cutoff is the near-universal threshold used to judge the conclusions of trials, and the p-value is something every savvy reader should look for when assessing them. Another measure of how far to trust a study’s result is the confidence interval (CI). This is a range of values, calculated from the study’s data, that is likely to contain the true effect. “If you had the resources, you would include all the people in the world in a study of a problem,” says Haynes. But since that’s not possible, researchers study a sample of the population and use it to estimate how far their result might stray from the true effect in the whole affected group. Say a study of drug X reported a 95% CI (the usual setting) of 30% to 40% for the reduction in stroke risk. That means that if the study were repeated 100 times with different samples, and a 95% CI were calculated each time, about 95 of those intervals would be expected to contain the true reduction in risk.
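The ideas in this section can be tried out in a few lines of Python. The sketch below is illustrative only and rests on several assumptions:

* The dice simulation imitates the spirit of DICE rather than reproducing the BMJ study. Every "patient" in both arms rolls a fair die, so the "therapy" cannot work.
* Fisher's exact test is one standard way to compute a p-value for two event counts. It is not necessarily the method any study mentioned here used.
* The confidence intervals use the simple normal-approximation (Wald) formula for a single proportion, chosen for brevity.

All function names are hypothetical.

```python
import math
import random


# 1. Chance alone: a DICE-style simulation (hypothetical, not the BMJ authors' method)
def share_with_apparent_benefit(n_trials: int, rolls_per_arm: int, seed: int = 0) -> float:
    """Both arms roll fair dice (a six counts as a 'death'), so the 'therapy' does nothing.
    Returns the share of trials in which the treated arm still shows >= 50% fewer deaths."""
    rng = random.Random(seed)  # fixed seed so results are reproducible
    hits = 0
    for _ in range(n_trials):
        treated = sum(rng.randint(1, 6) == 6 for _ in range(rolls_per_arm))
        control = sum(rng.randint(1, 6) == 6 for _ in range(rolls_per_arm))
        if control > 0 and treated <= control / 2:
            hits += 1
    return hits / n_trials


# 2. A p-value: two-sided Fisher's exact test on a 2x2 table of events
def fisher_exact_two_sided(a: int, n1: int, c: int, n2: int) -> float:
    """P-value for a events out of n1 (group 1) vs c events out of n2 (group 2), assuming
    no true difference. Sums the probabilities of every table with the same totals that
    is no more likely than the observed one."""
    events = a + c
    total = math.comb(n1 + n2, events)

    def prob(k: int) -> float:
        # Hypergeometric probability that k of the events fall in group 1
        return math.comb(n1, k) * math.comb(n2, events - k) / total

    p_obs = prob(a)
    probs = [prob(k) for k in range(max(0, events - n2), min(n1, events) + 1)]
    # Small tolerance so tables exactly as likely as the observed one are counted
    return min(1.0, sum(p for p in probs if p <= p_obs * (1 + 1e-9)))


# 3. What "95% confidence" means: repeat a hypothetical study many times
def wald_ci_95(events: int, n: int) -> tuple[float, float]:
    """Normal-approximation 95% CI for a proportion (rough when events are rare)."""
    p = events / n
    half_width = 1.96 * math.sqrt(p * (1 - p) / n)
    return p - half_width, p + half_width


def ci_coverage(true_rate: float, n: int, n_studies: int, seed: int = 1) -> float:
    """Share of repeated studies whose 95% CI contains the true event rate."""
    rng = random.Random(seed)
    covered = 0
    for _ in range(n_studies):
        events = sum(rng.random() < true_rate for _ in range(n))
        low, high = wald_ci_95(events, n)
        covered += low <= true_rate <= high
    return covered / n_studies


small_trials = share_with_apparent_benefit(n_trials=10_000, rolls_per_arm=20)
large_trials = share_with_apparent_benefit(n_trials=500, rolls_per_arm=500)
p_panacea = fisher_exact_two_sided(1, 100, 2, 100)  # 1% vs 2% with 100 men per arm
p_large = fisher_exact_two_sided(100, 10_000, 200, 10_000)  # same rates, 100x the men
coverage = ci_coverage(true_rate=0.30, n=500, n_studies=2_000)

print(f"No-effect dice trials showing deaths halved, 20 rolls per arm: {small_trials:.1%}")
print(f"Same, 500 rolls per arm: {large_trials:.1%}")
print(f"Panacea trial (1/100 vs 2/100): p = {p_panacea:.2f}")
print(f"Same rates, 10,000 men per arm: p = {p_large:.1e}")
print(f"95% CIs that contain the true 30% rate: {coverage:.1%}")
```

With 20 rolls per arm, roughly a fifth to a quarter of these useless "trials" show deaths cut at least in half. With 500 rolls per arm, that essentially never happens.

The hypothetical Panacea trial, with only 200 men, gives p = 1.00, meaning it cannot tell one heart attack from two. The same rates in a trial 100 times larger give a very small p-value.

The share of intervals containing the true rate should land close to 95%, which is what the "95%" in a confidence interval refers to.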
So tonight, as you flip through the channels, you’ll encounter news, commercials and testimonials touting the curative effects of drugs based on clinical trials. Take these with a grain of salt and learn to read the recipes. Medical research is filled with confusing numbers, but if you understand the basics, you’ll be a more enlightened patient and better able to make sensible decisions with your doctor about treatment. And while having the tools to pierce the numbers is important, you still have to keep your head on straight. Put any study to the eyeball test. “If you can’t see an interesting picture in the actual numbers of events for the groups being compared, ignore the stats,” Haynes says. Never suspend your common sense. “If you suspend your judgment,” he adds, using the phrase dear to Mark Twain, “you get lost in the territory of ‘lies, damned lies and statistics.’”
