Elementary Bayesian Biostatistics: Fundamentals for Investigators

Elementary Bayesian Biostatistics: Fundamentals for Investigators introduces the clinical investigator to the background, advantages, disadvantages and implications of the use of Bayes statistics in healthcare research. Through clear explanations of the principles of Bayesian analysis, this book provides a basic introduction to the Bayes approach in contemporary healthcare research issues. The book’s numerous explanations, figures and examples clarify for the investigator the perspective offered by the Bayesian philosophy and how to apply it to modern research issues. Upon completing this easy-to-follow book, its readers will understand the language, ideas, and application of the Bayesian approach to biostatistics in healthcare research.

Excerpt from Elementary Bayesian Biostatistics

Celia Hanson, M.D. momentarily hesitated as she stood quietly at the back of the darkened auditorium. Unconsciously postponing the upcoming battle, she allowed herself a leisurely look first to the left and then to the right at the long, winding rows of empty gray seats that curved off into the dark corners of the huge room. Then, involuntarily sighing, she looked forward, letting her eyes follow the long, sloping aisle as it descended away from her down and through the quiet, dim chamber to the stage hundreds of feet away. On it sat a long table where two men congregated, caught in the bright light that flooded down from the high ceiling to the stage. The huge auditorium, soon to be jammed with hundreds of eager investigators and research nurses, held only the three scientists.

Beginning her long walk to the stage, passing row upon dark row of empty seats, she acknowledged that her two colleagues awaiting her on the dais appeared amiable enough. Leon Thomas, tall and thin, wearing his trademark white vest, sat comfortably on the edge of the table, one leg resting on the floor, the other dangling free, slowly swinging back and forth. Leaning his head with its full mane of gray hair back, he laughed gently at a comment uttered by his counterpart, Jeremy Stone. Several feet from Leon, Jeremy stood, feet spread apart, arms loosely crossed over his tieless shirt and blue blazer, shaking his head in amusement.
This shared moment of humor completely masked the well-known fact that the two statisticians were fierce philosophical opponents.
Well, I couldn’t tell that now, Celia thought to herself as she emerged from the shadows of the cavernous auditorium, climbing the steps to join her collaborators on the well-lit stage.
“Sorry I’m late,” she explained, “but it seems like you’re getting along just fine without me.”
            “Of course we are!” Jeremy answered with a smile as he ambled over to shake Celia’s hand. “We always do when we’re not discussing anything important, and we can always use a neurologist!”
            “Nice to see you, as always, Celia,” Leon called over from the other side of the table, “although I’m surprised that you wanted to get together just two hours before the steering committee meeting. I’d have thought you’d have a million things to do before this first major planning session,” Leon finished, his smile having retreated some from his eyes.
“To the point, as always, Leon,” Celia responded, nodding to him. “I wanted the three of us to meet because I received these two statistical analysis plans, one from each of you, and…”
                “Is there anything wrong with them?” Jeremy interjected immediately.
“That’s the problem,” she replied, pulling them out of her valise and tossing them onto the smooth table, where they slid for a few feet before stopping. “I just don’t know.”

●

“Each of you was asked to submit a report about the design of this clinical trial,” Celia continued, moving to the table, while motioning to Leon and Jeremy to join her.
“You know the importance of this study. It’s the first major trial since the use of aspirin to study the effect of a new medication that we believe will actually prevent strokes.”
“Yes,” Jeremy agreed. “There are 600,000 strokes in the US each year alone. Dramatically reducing that burden will have tremendous public health implications. But the intervention has important adverse effects, and the final risk-benefit assessment will be critical. Whatever the results,” Jeremy quietly finished, “your study’s impact will be profound.”
“Well, I think my group did a decent job with the stat analysis component,” Leon offered, leaning forward over the table. “We expect that the annual stroke event rate of patients in the control group of the study will be about 10%, producing a two-year cumulative event rate of 19%. Based on Celia’s estimation,” Leon continued, nodding an acknowledgement to the neurologist, “the active therapy will produce a 20% reduction in the event rate. This means that patients who are randomized to the active group will have 80% of the strokes that will occur in the control group.”
“That’s right,” Celia added. “You computed that the sample size for this study, assuming a two-sided type I error rate of 0.05 and 90% power, would be 4118 people. So 2059 recruited to each of the study’s two treatment arms and then followed for two years is what you have argued we need to answer the scientific question.”
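The figures Leon and Celia quote can be roughly checked with a short calculation. What follows is a minimal sketch, assuming the standard normal-approximation formula for comparing two proportions with unpooled variances; the excerpt does not say which formula Leon's group actually used, and the function name `per_arm_sample_size` is hypothetical.

```python
import math
from statistics import NormalDist


def per_arm_sample_size(p_control, relative_reduction, alpha=0.05, power=0.90):
    """Approximate patients per arm for a two-sided comparison of two proportions.

    Assumption: normal approximation with unpooled variances. The excerpt
    does not name the formula behind its figure of 2059 per arm.
    """
    p_treated = p_control * (1.0 - relative_reduction)
    z_alpha = NormalDist().inv_cdf(1.0 - alpha / 2.0)  # two-sided type I error
    z_beta = NormalDist().inv_cdf(power)               # desired power
    variance_sum = p_control * (1 - p_control) + p_treated * (1 - p_treated)
    n = (z_alpha + z_beta) ** 2 * variance_sum / (p_control - p_treated) ** 2
    return math.ceil(n)


annual_rate = 0.10
# Probability of at least one stroke over two years at 10% per year.
two_year_rate = 1.0 - (1.0 - annual_rate) ** 2
n_per_arm = per_arm_sample_size(two_year_rate, relative_reduction=0.20)
print(round(two_year_rate, 2), n_per_arm, 2 * n_per_arm)
```

Under these assumptions the result lands within a patient or two of the 2059 per arm quoted in the conversation; gaps of that size typically come from rounding conventions for the normal quantiles.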
“That’s exactly right,” Leon replied with a smile. “The analysis of the trial will be uncomplicated, following well-accepted procedures. I don’t know what the problem is, Celia. It’s pretty straightforward.”
“Yes, Leon,” Celia responded, “it’s a well-considered plan. Yet, Jeremy,” she continued, nodding to the other statistician, “has another perspective.”
“Yes,” Jeremy began. “I understand what Leon has generated. As always, it’s well written, but in my view, it contains some cumbersome reasoning. Plus, it’s more expensive than necessary. We can do just as good a study with fewer patients.”
“Uh oh,” Leon replied, wrinkling his nose at the new aroma he pretended to detect. “Isn’t that a Bayes scent I smell?”
“Sure is. It’s sweet, isn’t it?” Jeremy asked. They all laughed, breaking the new tension that had just joined them on the stage.
                Leon physically twisted, turning from Celia to Jeremy, giving full attention to his intellectual adversary. “Jeremy, these Bayes arguments, however fashionable, would just be a distraction from the results of a major clinical experiment. We want the point of the study to be the results about stroke, not some new-fangled statistical analysis.”
“Bayes procedures are innovative, Leon,” Jeremy responded, “but they certainly aren’t new. In fact, they’ve been around and available for far longer than the frequentist approach.” Standing, Jeremy began to walk some as he talked. “It’s a clear, simple approach. Celia, as a scientist and practicing physician, you understand its basic appeal. We start with a scientific idea that you believe. In this case, it would be that this new method of stroke prevention reduces the stroke rate. You collect relevant data, and then see how the data influence your early belief. Your initial evaluation of the hypothesis’ truth either increases or decreases based on the data. It’s quite natural.”
“Many good workers are skeptical of that approach, as you know, Jeremy,” Leon interrupted.
“That’s a real shame, Leon,” Jeremy replied, stopping behind his friend and putting his hands gently on Leon’s shoulders, “and news to me, because what I’ve done so far is simply describe the scientific method. Are you saying many scientific workers have taken a stand against this fundamental principle?”
Waving his hand and smiling in mock surrender, Leon responded, “Go on, go on, Jeremy. I don’t think you’re finished.”
“Sure,” Dr. Stone replied, “Bayes procedures parallel the scientific method nicely. As scientists, in fact, as humans, we formulate ideas. We believe concepts. We Bayesians demand that we place prior probabilities on those beliefs. Those are called ‘the prior’. The data that we subsequently collect updates these prior probabilities.”
“I follow that, Jeremy,” Celia said, “but the scientific method has been around for four or five hundred years. If Bayes procedures follow it so closely, then where have they been all of this time?”
“Patiently waiting for the right tools, Celia,” Jeremy said, taking a seat across from the neurologist. “The Bayesian concept started in the eighteenth century. In fact, it was commonly used for over 130 years. However, the computations required to solve realistic problems can be complicated, and it’s taken many more years to develop the computing devices we need. The underlying mathematics can be difficult. Sixty years ago, the computations were beyond anyone’s ability to carry out. Modern-day computing has changed that.”
“Well, how complicated are they?”
“Just complicated enough to need a computer, but not so complex that we can’t solve them. Celia, I freely admit that Bayesian computations can be complicated, commonly far more complex than frequentist evaluations,” Jeremy continued, holding his open hands out in front of him. “But complicated computations, in and of themselves, are no reason to avoid a good approach. You know what logistic regression is, right?”
                “Yes,” Celia responded.
“And you know how to interpret the regression coefficients, odds ratios, confidence intervals, p-values, etc., that come from it, right?” Jeremy continued.
                “Sure. We commonly write papers that discuss these results.”
                “Great. How do you compute the information matrix?”
“What?” Celia asked, taken aback by the question.

“I’m asking if you can carry out the necessary linear algebra to derive and compute Fisher’s information matrix. After all, it’s an important background calculation that must be completed to get the odds ratios and other quantities you used. Can you do that?” Jeremy asked conversationally.
Nonplused, Celia responded, “I can’t do that.” Then, more defensively, “Why should I?”
“That’s exactly my point, Celia. It’s a background computation that’s both necessary and complicated, but it’s not one you yourself do. You merely interpret the end result. That’s exactly how the Bayes background calculations work. You can’t condemn Bayes work because it’s complicated. We do a lot of complicated things in science. This is just one more.”
“I get it,” the neurologist responded.
“Jeremy,” Leon interrupted. “You are completely missing the point. I’m not rejecting Bayes procedures merely because they are complicated. I reject them here because they’re unfamiliar to the clinical community. They’re alien. Foreign. No one knows how to either react to them, or to interpret them. Frankly, I’m sorry to say, the medical community just doesn’t trust them.”
                “I don’t know about the medical community, but you sure don’t,” Jeremy responded. “You don’t trust them because you don’t know them. The way to gain trust is to start with simple problems, gain some insight from that initial work, and then use that insight to help solve more complicated problems.”
“You mean,” Leon asked, “we should get more experience. ‘Learn as we go’?”
“That sums it up nicely, Leon.”
“Well, Jeremy, you may be right, but this is neither the time nor the experiment for that! The sponsor is spending millions of dollars on this research project in stroke. We just can’t hand that huge investment of money, time, and resources over to an untested procedure hoping to learn a little something about it. There’s too much at stake for that.”
“It’s not untested, Leon, and in any event, if we follow your proposal, we are handing the same resources to a poorly formulated concept.”
                “Poorly formulated?” Leon repeated, arching his eyebrows. “How so?”
“Bear with me.” Turning to Celia, Jeremy asked, “Before you could get funding for this trial we’re here to discuss today, there was a phase II clinical trial that was carried out first, right?”
“Sure. We executed a much smaller study in the same types of patients we plan to evaluate in this larger study,” she responded, thinking to herself, Finally, here’s a question that I know how to answer!
               “That’s called a Phase II study, right, Celia?” Jeremy asked.
“Right,” the neurologist responded instantly.
“What was the effect size?”
“It showed an 18% reduction in strokes.”
“Was it positive?”
“We got a good result ─ ‘p’ of point one six.”
               “Ah…reporting study results as p-values. I guess we can thank Leon and his frequentist friends for that. OK. Oh point one six. That’s greater than 0.05, right?”
                “Yesssss…,” Celia replied, letting her voice trail off.
“So the study was negative, right?”
“Not really,” the neurologist instantly protested, “it was ─”
“But the p-value was greater than 0.05, and it had to be less than 0.05 to be positive.”
                “Yes, but it wasn’t powered…”
                “Well, how could you design a huge, expensive phase III study on the foundation of a negative phase II result?”
                “Because we didn’t expect the p-value to be less than 0.05, Jeremy,” Celia responded a little testily despite herself. Taking a deep breath, she relaxed some.
“Well, why not say in the beginning of that study that you expect the effect size to be about 15-20% and the p-value between 0.10 and 0.30?”
                Celia was quiet.
“I’m pushing you to make a point,” Jeremy gently smiled. “I’m just saying that it seems silly to have an unrealistic expectation from the study. Why design a study to be positive at the 0.05 level when you know that result is close to impossible, only to backtrack in the end, making a more appropriate argument when you see the data? Why not design the study more reasonably to begin with?
“The answer is that you are stuck with the traditional frequentist interpretation of p-values,” the Bayesian continued. “The medical community has accepted the false message that ‘p-values had better be less than 0.05 before the study is considered positive.’ As you pointed out, the p-value paradigm didn’t make any sense for you in your phase II study. But you were compelled to use it anyway. One of my colleagues says, next to nuclear weapons, p-values are the worst invention of the 20th century.”
“Fortunately,” Jeremy concluded, “you chose to interpret your phase II effort reasonably. Bayesians want more reasonable designs as well. We don’t have p-values.”
“Well,” Celia asked, “what do you have in their place?”
                “We have posterior probabilities. These probabilities are the result of beginning with prior probabilities, and then updating them with relevant data.”
“A process that is no better than the prior probabilities on which they are based, Celia,” Leon inserted. “Don’t be fooled. When Jeremy describes this prior probability, what he’s really saying is ‘guesswork’. Can I get a word or two in here, Jeremy?”
                “Sure.”
“So,” Leon continued, “this posterior distribution is based on prior information, right? Tell me, Jeremy, what prior information would you use for this trial that will study strokes?”
“We have some useful information for the control group event rate. We also have it for the treatment effect.”
“That’s fair enough. And, what probability do you place on it?”
“That depends on how strong the neurologists’ belief in this information is.”
“Belief?” Leon responded, with mock incredulity. “Don’t you know?”
“No,” Jeremy replied. “We have to choose a probability based on what the literature says, how the community perceives the effect, how the investigators react to different effect sizes ─”
“I get it – you guess!” Leon responded. “And, if you guess wrong, since the posterior distribution is based on the prior distribution, the posterior distribution will also be wrong. That’s true too, isn’t it?”
                “Yes, Leon, that’s a possibility, but you frequentists run the same risk.”
                “Not by guessing about a prior or posterior distribution!”
“I disagree,” Jeremy replied. “Say that you’re designing a clinical trial. How do you choose the effect size? From prior information. The same as we Bayesians do. And what happens if you get it wrong? Suppose that you design a clinical trial to detect a sixty percent reduction in the control group event rate. Now you don’t really know that the effect size will be sixty percent. You just ‘arrive’ at it, using it to compute the required sample size.
“Now,” Jeremy continued, “suppose when you carry out the study, you don’t find a sixty percent reduction, but instead, discover a much more modest effect size ─ say, thirty percent. Because you ‘guessed wrong’ about the effect size, the observed thirty percent reduction is not large enough to fall into the critical region, but it’s nevertheless clinically significant. So, while the clinical investigators believe the discovered effect size is clinically relevant, the thirty percent reduction is not statistically significant. Your wrong ‘guess’ about the effect size has ruined the chance of the study to show that a clinically significant effect is actually statistically significant. Frequentists can make good or bad initial estimates, just like Bayesians.”
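Jeremy's example can be made concrete with a rough power calculation. The sketch below is illustrative only: the 19% control-group event rate is borrowed from the stroke trial as an assumption (Jeremy's hypothetical trial states none), the formulas are the usual normal approximations for two proportions, and both function names are hypothetical.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()


def per_arm_sample_size(p_control, relative_reduction, alpha=0.05, power=0.90):
    """Patients per arm, normal approximation with unpooled variances."""
    p_treated = p_control * (1.0 - relative_reduction)
    z_total = STD_NORMAL.inv_cdf(1.0 - alpha / 2.0) + STD_NORMAL.inv_cdf(power)
    variance_sum = p_control * (1 - p_control) + p_treated * (1 - p_treated)
    return math.ceil(z_total ** 2 * variance_sum / (p_control - p_treated) ** 2)


def approximate_power(p_control, relative_reduction, n_per_arm, alpha=0.05):
    """Approximate power of the two-sided test when the true reduction is given."""
    p_treated = p_control * (1.0 - relative_reduction)
    std_error = math.sqrt(
        (p_control * (1 - p_control) + p_treated * (1 - p_treated)) / n_per_arm
    )
    z_effect = (p_control - p_treated) / std_error
    # Ignores the negligible chance of rejecting in the wrong direction.
    return STD_NORMAL.cdf(z_effect - STD_NORMAL.inv_cdf(1.0 - alpha / 2.0))


P_CONTROL = 0.19  # assumed control-group event rate (not given in Jeremy's example)
n_design = per_arm_sample_size(P_CONTROL, relative_reduction=0.60)
power_if_60 = approximate_power(P_CONTROL, 0.60, n_design)
power_if_30 = approximate_power(P_CONTROL, 0.30, n_design)
print(n_design, round(power_if_60, 2), round(power_if_30, 2))
```

Under these assumptions, a trial sized to detect a sixty percent reduction keeps its planned 90% power only if the true effect really is that large; if the true reduction is thirty percent, the chance of a statistically significant result falls well below one half.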
“First of all, Jeremy,” Leon responded, “the effect size chosen would have been based on the minimal clinical difference, not guesswork. Second ─”
“It’s still an estimate that you make in the absence of firm knowledge.”
“Well,” Leon asked, “if we frequentists are not so different than you Bayesians, then why do you reject the entire frequentist argument out of hand?”
“Because frequentists act like they actually know the values of control group event rates and effect sizes when they don’t. They therefore confuse investigators, regulators, and certainly the medical community.”
                “Well, what prior probability would you put on these effect size ‘guesses’ of yours, Jeremy?” Leon inquired.
“We would choose a reasonable value.”
“And that value is…?”
                “In this problem, since the neurologists are quite uncertain about the effect size, we would use a non-preferential prior.”
“Whoa!” Celia exclaimed. “What’s that?” Leon was grinning unabashedly.
“It’s a distribution that does not put more weight on one prior estimate than another. All prior estimates of the effect size are treated the same. So, there’s no preference.”
“Ah yes,” Leon replied with a huge smile. “This is the part I love! Bayesians argue about the important contribution of choosing a prior distribution. They tell us how ‘flexible’ it makes their approach to the problem. How ‘adaptable’ the procedures are. How ‘honest’ the process is! But, when their time comes, and they are asked to commit, Bayesians throw all of that flexibility and adaptability right out the window by choosing a non-preferential prior. Kind of like ‘prior non-information’. It’s really laughable.”
“Unlike frequentists, a Bayesian doesn’t pretend to have information when no information is available,” Jeremy responded easily. “All that I’m saying is that some problems have more prior information than others. When we have it, we utilize it with an appropriately weighted prior. When we don’t, then we don’t, and we use a non-preferential prior.” Turning to Leon, Jeremy continued, “The point is, one doesn’t have to retreat to the frequentist approach just because of the absence of prior information. The Bayesian procedures are useful when prior information is either plentiful or absent.”
                “OK,” Leon replied after a moment. “Tell me how you compute the weights. That is, when you have this prior information.”
                “The weights are based on the beliefs of the neurologists.”
“What you really mean is that you assign probabilities, right?”
“Yes, we listen to the clinicians, study the literature, speak to the experts in the field, and then attach a probability.”
                “So, you make it up.”
                “It’s based on the relative strengths of the beliefs of the scientists,” Jeremy explained.
“Like I said ─ you make it up,” Leon persisted. “That’s a critical difference between you and me. You’re content to make the probability up. I need to measure it. To me, probability is corporeal, while to the Bayesian, it’s ethereal.”
“I’m afraid that I’m not following you, Leon,” Celia said.
“Probability to me is concrete, Celia. Here’s an example. You give a coin to a Bayesian and ask, ‘Is the coin fair?’ He replies, ‘What do you think?’ and then makes up a probability based on your belief. He then flips the coin 100 times and combines the result with your assessment. What do I do? I skip the preliminary question, take the coin from you and flip it. The only information I need is the result of the tosses. Jeremy needs your subjective assessment, an evaluation that I think should remain between you and your conscience. Give me the coin, and keep your beliefs to yourself!”
“And, Leon, if every toss cost you $1,000, you would be foolish not to use prior information to help reduce the number of tosses,” Jeremy interjected.
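The coin exchange can be sketched with the simplest Bayesian model for a coin, a Beta prior updated by binomial data. This is a minimal illustration, not a method described in the excerpt: the toss counts are invented, Beta(1, 1) stands in for a non-preferential prior, Beta(50, 50) stands in for a fairly strong belief that the coin is fair, and the function name is hypothetical.

```python
import math


def beta_binomial_update(prior_heads, prior_tails, heads, tosses):
    """Update a Beta(prior_heads, prior_tails) prior with coin-toss data.

    Returns the posterior mean and standard deviation of P(heads).
    """
    a = prior_heads + heads              # posterior Beta parameter for heads
    b = prior_tails + (tosses - heads)   # posterior Beta parameter for tails
    mean = a / (a + b)
    sd = math.sqrt(a * b / ((a + b) ** 2 * (a + b + 1)))
    return mean, sd


# Hypothetical data: a flat prior with 100 tosses, 60 of them heads.
flat_mean, flat_sd = beta_binomial_update(1, 1, heads=60, tosses=100)

# Hypothetical data: an informative "probably fair" prior with only 20 tosses.
info_mean, info_sd = beta_binomial_update(50, 50, heads=12, tosses=20)

print(f"flat prior, 100 tosses:       mean={flat_mean:.3f} sd={flat_sd:.3f}")
print(f"informative prior, 20 tosses: mean={info_mean:.3f} sd={info_sd:.3f}")
```

With these made-up numbers, the informative prior plus 20 tosses gives a posterior about as tight as the flat prior plus 100 tosses, which is Jeremy's point about saving tosses. The same run also shows Leon's worry: the informative prior pulls the estimate toward 0.5, which helps only if that prior belief is sound.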
“Probability makes its greatest contribution when anchored to reality, Celia,” Leon responded, directing his attention to the physician again.
“Yes,” Jeremy replied, “but the perception of reality is subjective.”
“Oh no you don’t, Jeremy,” Leon answered. “This is not a philosophy class.” Turning again to Celia, Leon asserted, “Here’s the difficulty. You are, right this minute, in the midst of designing a large clinical trial that will study thousands of patients. You will direct the expenditure of hundreds of thousands of man hours of effort, and millions of dollars, to do… what? Evaluate the effectiveness of a new stroke therapy, or to put a new statistical procedure to the test?”
                “And so, with that as an argument,” Jeremy interrupted, “clinical trial methodology would never have developed. Sixty years ago it was both innovative and threatening, with little precedent to guide the investigators. However, because it was well considered and used cautiously, clinical trials have risen to the pinnacle in clinical research methodology. That was certainly a chance worth taking, wasn’t it Leon?”
“Yes it was,” the frequentist conceded. “However, it developed slowly over time, allowing investigators and regulators to become familiar with it.”
                “A process which precisely describes the evolution of Bayes procedures,” Jeremy interjected. “They have been studied for decades. With each passing year, workers become more comfortable with them, computing tools advance, and the correct use of prior information improves.”
“But your reliance on prior information is a critical weakness in clinical trials,” Leon replied. “The frequentist approach, the perspective that accepts probability only as relative frequency, is comprehensible, well established, and works. Even Bradford Hill, who was a premier epidemiologist and very skeptical of the frequentist approach to statistics, did not argue for the Bayes approach. This eloquent scientist, who convinced skeptical doctors to accept the concept of randomization and the use of blinding in clinical studies, knew better than to get them involved in Bayes procedures.
                “And, it’s no surprise why,” Leon continued, focusing again on Celia. “Bayesian procedures are too subjective. They deliberately mix subjective, sometimes even capricious, shifting impressions with objective data. Constructing a clinical trial using Bayes procedures is like building a mansion on sand. It won’t take long before the entire edifice, no matter how elegant, crumbles. Clinical trials must be evidence-based, Celia. That’s what the frequentist requires. We take one fact-based step at a time. We don’t even let you start with what you believe. You must start with what you don’t believe.
“We insist,” Leon continued, “that you begin with what is established, not what you want, or hope, or dream will be true. You start with what is accepted, because that is the closest community approximation to the truth. That’s the null hypothesis. In order to take the next step forward, you must build a fact-based case. We require that the scientist affirmatively and clearly disprove the current community standard. When you have rejected this standard, you have set a new standard. The next scientist cannot just assume that what he or she believes is true; that scientist must prove that the new standard that you helped to create is wrong.”
                “Setting my other concerns aside for the moment,” Jeremy interjected, “what you have described is a wasteful process if there is good prior information that you ignore.”
“Well, Jeremy, if there’s that much prior information that informs the process so completely, then why bother with the experiment at all?” Leon paused for a moment, and then continued. “Of course we all know the sad answer to that question, don’t we? We execute the experiment because there really isn’t good prior information. There is only prior belief. Bayesians embrace these soft prior beliefs while frequentists reject them. Frequentists in biostatistics understand the many blind alleys false prior beliefs lead to.
“My goodness, Celia,” Leon said, leaning across the table to engage the neurologist, “look at how many times prior beliefs have been wrong in the past. Did you know that, for fifteen hundred years, physicians believed that the presence of pus in a wound was a good sign? They carried out procedures that would increase the likelihood that pus would be generated. Why? Because their belief was based on an erroneous statement in an early Roman medical text.”
“People weren’t doing any real evaluations of what they did then,” Celia commented.
“And what would Bayesians have done in a clinical trial designed to test whether producing pus was good?” Leon speculated, leaning back in his chair. “They would have searched the literature to gauge current opinion, eh Jeremy? They would have spoken to experts and learned that the overwhelming consensus was that pus was beneficial. These strong but false beliefs about the benefit of pus would have been the foundation of their prior ‘information’ that would have been built into the research effort, making it even tougher to demonstrate the worthlessness of pus-producing therapies.”
                “And Leon, your null hypothesis would have been that pus was good as well, right?” Jeremy inquired. “After all, that idea was the state of the science at the time, and the null hypothesis assumes the state of the science is true.”
“That’s exactly my point, Jeremy,” Leon responded instantly. “It most definitely would have been our null hypothesis. But the final mathematical formula that estimates the effect size of the therapy would not have included that belief. The effect size would be based on the data. The data would speak for itself, and not be muzzled by the misleading prior information.”
                “Thanks for the history lesson,” Celia acknowledged, checking her watch, “but we need to get back to the issue at hand.”
“My point is simply this,” Leon asserted. “The time-tested frequentist approach makes you uncomfortable as a physician precisely because it does an admirable job of protecting you from your weakness, Celia.”
“What weakness is that?”
“Your need to believe that the therapy you are studying is going to work!”
“I didn’t think that was a weakness,” Celia replied, taken aback.
“Celia,” Leon explained, carefully softening his voice, “you are a devoted physician and specialist, but you are also human. You care about the patients that you see, with a compassion that is uniquely yours and amplified by the oath you took. As a neurologist, you know what strokes can do to people ─ you’ve seen the ravaging effects of this disease regularly. The early deaths. Its crippling effects in survivors. Strong people reduced to helplessness. Dynamic personalities enfeebled. You see that, Celia, day after day, and you want to do something about it. You don’t just look for a cure, you ache for one! I’m not criticizing you for this. We non-physicians rely on your drive to do anything and everything to improve our lives. I don’t fault you for it. In fact, I honor you.
“So,” Leon continued, “when you as an investigator learn of a potential new treatment that promises safe, effective prevention of strokes, you can’t help but give it the benefit of the doubt because you’ve seen so much pain and suffering produced by the current, inadequate standard of care. It’s only human to believe the new therapy can make a difference. But, unfortunately, this human feeling can be misleading.”
“It’s the nature of practicing medicine,” Celia agreed.
“Now, look what the research process does to your belief, Celia. As a clinical scientist, you want to work with this potentially new therapy, study its potential, and hopefully develop it into a useful tool in the war against strokes. But first you must build support for this new therapy, convincing others that your therapy is useful. So you speak with other investigators about this therapy, email them, meet with them, debate and argue with them, eventually persuading several that what is now known as ‘your approach’ is a promising one. You defend your point of view with conviction. You must persuade other believers with different beliefs. How can you possibly get them to align with you without first being a believer yourself? You must believe in the therapy you advocate.
                “Now, you and your team search for funding. You go to private companies, philanthropic organizations, granting agencies. Each of these funding sources is overwhelmed with ideas and protocols such as yours. They want their resources to have the greatest effect, to do the most good. So who are they going to fund, investigators who fervently believe in an idea, or scientists who blandly state ‘we don’t really know what the result will be, so we want to simply make an objective assessment’?”
“Look at your situation now, Celia. In about twenty minutes you’re going to present this protocol to a room full of investigators and nurses who want to prevent strokes. They are connected to each other and to you by the conviction that this medication has a good chance of preventing strokes. You are believers in this therapy. My point is that in the beginning you may have been objective, but the nature of the process converts you to a true believer by this time in the protocol.”
                “What’s the point, Leon?” Jeremy asked.
“Not a point, but a problem,” Leon concluded. “Fervent, well-intentioned beliefs are commonly wrong in medicine. And if the believers believe vehemently, then when they’re wrong, they’re vehemently wrong. The role of the frequentist statistician is to protect you from your strong convictions about the effects of an untested therapy. You can’t know whether your belief is right or wrong, so why rely on it in a study designed to objectively test it? That’s why we reject the notion of subjective probability.”
“I can’t speak about other fields,” Leon added, “but in medicine, subjective probability is just allowing non-evidence-based opinions, which can no longer get in the front door, to sneak in the back one. And you don’t have to look far to find clinical trial examples that have overturned commonly held ‘beliefs.’ The CAST investigators didn’t see how the use of new antiarrhythmic therapy could make people with dangerous heart rhythms worse. But that’s exactly what the medicines did. All of the prior information suggested these medicines would save lives. The medicines killed people.
                “Many surgeons believed osteoarthritic knees would benefit from a procedure that cleaned out the joint space of its loose, dysfunctional cartilage remnants. The prior information said this would be the most beneficial. It wasn’t.”
                “Well, we have to have beliefs in the practice of medicine, Leon.”
                “Why?”
                “Because without them, we would be indecisive. How could we advocate therapy to patients if we didn’t?”
                “And that’s the trap,” Leon replied. “Your beliefs and ‘priors’ get you clinicians into trouble because you can’t separate out the good ones from the bad ones. You don’t have prior information, you have prior suspicions.
                “Bayesians say that you should accept these prior suspicions, allowing them to alter the result. Frequentists say ‘Prove it!’ To frequentists there is no middle ground. You either know something or you don’t!”
                “‘No pale pastels,’ eh, Leon?” Jeremy replied, quoting the ex-president and getting a chuckle out of all of them. “I understand your enthusiasm for frequentist-based clinical trials, but they produce misleading results. In fact, they do it so often that some divisions of the FDA require two well-designed and well-conducted clinical trials to show that an intervention is effective. Not one, but two.”
                “That’s right, Jeremy,” Leon responded instantly. “So why make the situation worse by weakening the clinical trial corpus with a Bayes injection?”
“Because we don’t have the luxury of your presumptions, Leon. There is always middle ground. The time for your simplistic approach to prior information is over. It served a useful role for a short time, but now, it’s over. It’s time for it to leave the stage.”
“Speaking of leaving the stage,” Leon said to Celia, “how much more time do we have?”
“The meeting starts in a half hour,” Celia replied, checking her watch. “We have some time left.”
“Clinical investigation,” Jeremy began, “was a mess in the 1930’s and 40’s. Articles contained 10% data, 90% belief. The introduction of frequentist-based statistical analyses changed that, by providing a clear, solid structure to the design and analysis of clinical research. It was the right thing at the right time. But now,” Jeremy continued, “its time is over. Leon, frequentists have done a good service for the medical community for several decades. A very short time in the history of science, but it was an important contribution nevertheless. Things were out of balance, and your presence helped to balance them. Frequentists reacted understandably to the bad use of prior information sixty years ago. That was a good lesson to learn.”
“Well,” Leon responded. “At least we can agree on something today. I thought you were going to say that we learned the wrong lesson from this experience.”
“No,” Jeremy responded. “You learned the right lesson too well. You say the medical community hasn’t grown up, Leon. You’d better take another look around. It’s a whole new set of numbers! This community whose insight you disparaged has pioneered some remarkable research.”
“With clear catastrophes.”
“Catastrophes that we have learned from. Just like catastrophes in economics, in space travel, in meteorological prediction. Yet despite the setbacks ─ setbacks that, by the way, occurred on the frequentists’ watch ─ we have advanced. Many sophisticated clinical investigations that are underway now were beyond the wildest dreams of the established clinical scientists sixty years ago. We now carry out multi-armed clinical trials. Merged phase II / phase III clinical trials. Trials with complex statistical monitoring rules. Trials within trials, that include prospectively declared, adequately powered subgroup analyses.
“Yet despite these advances in methodology and philosophy, Leon, you would deny clinical investigators the right to take the next step they have earned. You’re like an overprotective parent who won’t let a son move from the children’s clothing section up to the young men’s section, because you’re comfortable with where he is and don’t want to run the risk of change. Well, Leon, like it or not, change is here. You can play an important role in this evolution of thought, smoothing the transition, but, whether you’re active or not, the transformation is underway. The time is now.”
“Using untested prior information to fuel future studies is a destructive force, Jeremy.”
“Just because bad automobile drivers cause accidents doesn’t stop you from driving, Leon. Just where would science be if it followed your philosophy of ‘once bad ─ always bad’? Should innovative surgeons choose not to inform the community of new procedures just because they fear that a few inferior surgeons will harm patients with this new tool? Should Celia cease her investigation because she knows some doctor at some point will misuse it? Leon, if it had been left to you, the modern world would have given up on the idea of flight because it knew that at some point an airplane would crash. You argue that we shouldn’t grow because growth is dangerous. But we have to grow anyway. The use of prior information can be watched, monitored, and controlled. We need flexibility in its use. Bayesians provide that flexibility.”
“So do frequentists,” Leon responded.
“Really? You mean, like the ‘flexibility’ that says a p-value must be less than 0.05 in all research endeavors? Here we are, eighty years later, still trying to dig out from the results of Fisher’s manure experiment.”
“I don’t understand,” Celia interjected. “Manure experiment?”
“Yes, the father of frequentist statistics carried out an agricultural experiment evaluating the effects of manure. The 0.05 level came from that.”
Celia shook her head in amazement.
“Leon and his frequentist friends foisted Fisher’s 0.05 proclamation on us all,” Jeremy continued. “They watched it happen. Heavens! They helped it happen, by writing manuscripts that showed investigators how to carry out procedures that would lower the p-value, rather than how to carry out better investigation.”
                “Listen, Jeremy…” Leon responded.
                Dr. Stone held his hand up. “I don’t want to fight. I’m just acknowledging, as I’m sure you would, that the p-value threshold of 0.05 was never meant to be set in stone. We are stuck with oh-five not because of frequentist flexibility but because of your hyper-rigidity. Many frequentists contend that this criterion should be thoughtfully reconsidered. And they are right to do so. But it requires good, clear, prospective thinking, and that’s what Bayesians argue for, Celia,” Jeremy continued. “Sadly, disciplined thought will be rejected by bad researchers. They will always be with us. But nevertheless the field of clinical investigation must advance. With self-control and with clear thought, but it must make progress. The technology is here, and the research community is ready. So, what are you waiting for?”
                “Celia,” Leon responded, “the clinical community is not ready for these hypermodern arguments. We’re not just talking about mathematics here; we’re talking about the fate of patients at risk of having strokes. Keep in mind that Bayes procedures have been around for two hundred years. They never got off the ground. They are untested assessments ─ runaway evaluations. Even though your clinical therapy payload may be valuable, you will have used the wrong rocket, and it will never get off the ground.
“If you use them in your trial, Celia,” Leon warned, “they will attract criticism like magnets attract iron bars. The issue won’t be the effects of your treatment ─ it will be the mathematics that you used to measure the therapy’s effects. The only way your treatment will be accepted as beneficial is if its benefit is confirmed by a time-tested, accepted, standard, frequentist procedure, a confirmation procedure that will add weeks to the study and thousands of dollars to the overall bill. And, heaven help you if the Bayesian result is in one direction, and the frequentist evaluation in the other! Your study will die a painful, suffocating death in the quicksand of controversy.
                “Well, Celia?” Leon inquired. “Do you want to be an established neurologist, or a renegade?”
“What he’s really asking,” Jeremy interposed, “is whether you want to use old-fashioned tools that don’t give you the flexibility you need. Sure, we use prior information. When it’s solid, we incorporate it. We’d be silly not to use reliable information when it’s readily available. However, it’s the Bayesian who makes decisions based on the observed data. I bet you never heard of the likelihood principle.”
“That’s something else new to me this morning,” Celia answered.
“The likelihood principle states that the only data relevant to answering your scientific question are the data that have been collected. Observations that could have occurred but did not are irrelevant. Does that make sense to you?”
                “Yes,” Celia responded. “Seems like it’s hardly worth bothering about. Why draw conclusions from data you don’t have?”
                “Actually, it’s kind of self-evident, isn’t it?” Jeremy agreed. “Why make a decision based on data you didn’t observe? You don’t buy a car based on cars that you didn’t examine. You don’t choose a job based on jobs you didn’t apply for, or never heard of, so why make decisions based on data that we didn’t observe?”
                “Seems silly to do that,” Celia replied.
                “Well, don’t just tell me,” Jeremy answered, “tell my friend, Leon. That’s what he does. Frequentists violate this principle all the time.”
                “How so?” the neurologist asked.
Taking his glasses off to clean them, Jeremy began. “Significance testing is based on the rejection or acceptance of the null hypo─”
“You mean rejection or non-rejection,” Leon interrupted.
                Jeremy smiled at Celia while pointing to Leon. “Do you really want to turn your clinical experiment over to this kind of double talk, Doctor? According to the frequentist, you don’t accept the null – you just can’t reject it. That kind of syntax may make sense to a frequentist, but it hasn’t, doesn’t, and won’t make sense to the practicing community. Research is difficult enough without having to jump through the semantic hoops the frequentists have created. Working with the frequentist perspective is like walking through a hall of mirrors. One positive result is a type I error, but another is not ─ that one’s real. Another result appears to be negative, but they say ‘No, it’s not really negative ─ it’s simply a type II error.’ They have created a confusing maze based on reverse thinking. It’s time the research community just left the amusement park.”
                “You mean, leave the frying pan and leap into the fire, don’t you?” Leon added.
                “No, but I need to get back to the likelihood principle. To the frequentist, the same data, using different underlying distributions, can produce different results. That’s a violation of the likelihood principle. Here’s an example. Let’s say you measure blood pressure with a digital meter. Make sense to use a normal distribution for these pressures?”
“I guess so,” Celia responded.
“That’s not a trick question, Celia. Tricks live in Leon’s bag, not mine. Using an electronic meter, you measure the DBP’s of all of your patients. Each patient has a diastolic blood pressure less than 100, and you easily calculate the mean of these pressures. Later, you learn that the meter you used to measure DBP wouldn’t work if the diastolic blood pressure was greater than 100. Does that fact change how you compute the mean?”
“No,” Celia answered.
“Why not?” Jeremy asked.
“Well, because no blood pressures occurred in the range where the meter was broken.”
“It would change Leon’s calculation. He’d say you couldn’t use a normal distribution; you’d have to truncate it.”
“Even though the DBP’s were never in the range where the meter was broken?” Celia asked, turning to Leon with a look of surprise.
                Nodding, Leon responded, “Jeremy’s right. That’s exactly what I’d do.”
“Let’s make sure that we all see what just happened here,” Jeremy responded. “The distribution that Leon would use to describe your data was selected not by the data, but by the process that produced the data. Data that you didn’t observe affected the distribution Leon was going to use. This, Celia, is a clear violation of the likelihood principle. Do you want to commit your multimillion dollar trial to that kind of philosophy?”
                “The methodology, not the data, should determine the analysis plan,” Leon responded, shaking his head. “We have rules that we go by. Clinical investigation would be lost without them. We rely on them, and so do you, Celia. What you just heard from Jeremy is exactly why you shouldn’t use the procedures he advocates. He has, in one fell swoop, just abandoned the two-tailed test.”
“What?” Celia asked. “How’d he do that?”
            “Two-tailed testing ensures that there is adequate protection for the finding that a therapeutic intervention may produce harm,” Leon explained. “At the beginning of the study, the investigator creates their critical region so that the null hypothesis is rejected for hazard as well as for benefit. Commonly, this means dividing the type I error in two, placing half in the hazard side of the distribution and half on the benefit side. However, at the study’s conclusion, the test statistic can only fall on one side of the distribution – it can’t be in both. Doctors commonly look at this and ask ‘Why, when we now know what side of the distribution the test statistic is on, must we bother with the type I error allocated in the other tail?’ But statisticians have successfully argued that since the investigators said prospectively that they would divide the type I error in two, they should follow through on their original plan. Bayesians argue that since the data do not identify harm, then regardless of what you determined up front, you can put all of your type I error on the benefit side.”
“Right!” Jeremy exclaimed. “Even when the investigation is over, and the test statistic falls clearly in the benefit side, revealing that no harm has been done, you frequentists still argue that, since you were concerned about harm in the beginning, you must be concerned about harm in the end, even though there was no evidence of harm. You’re shackled by a well-meaning, but false a priori belief, Leon. If that’s not being misled by a prior belief, then nothing is!
“But actually,” Jeremy added, smiling at his friend, “we don’t have type I error in the Bayesian paradigm, right? All we have is the likelihood that the scientific hypothesis is true. The conclusions are much easier to understand. The intervening computations can be more complex, I admit, but the conclusions make more sense.”
“We’d better sum this up now,” Celia responded, noting that investigators were beginning to file into the auditorium.
                “Sure,” Leon answered. “Here’s the difficulty. Doctors have reported on the treatments they have used for hundreds of years. Many of those assessments are worthless because they’re degraded by either poor observation or built-in biases. For centuries, even the idea of testing some of their own central tenets was anathema. That’s why Celia’s predecessors bled patients for so long. And, when they did try to collect data, they didn’t understand how their point of view affected the results of their work.
                “When clinical trials appeared in England in the 1940’s, clinicians were this methodology’s strongest opponents. Many doctors hated the idea of blinding, and rejected the concept of randomization. It’s not hard to understand why. Who among us would want their loved ones to have their treatment option selected by a doctor flipping a coin? I know that’s simplistic, but that’s the view they took.
                “One of the most important occurrences in the development of clinical trials,” Leon continued, “is the change wrought in clinical investigators. While the technology has advanced, the thinking of medical researchers has evolved. They have come to understand the importance of objective measures. When they use therapy based on belief, and not objective data, disaster strikes, e.g., the harmful effects of hormone replacement therapy in post-menopausal women. Medicine, at long last, was making huge strides because it was finally testing the beliefs on which therapy was founded. Your immediate predecessors, Celia, were no longer content to rely on history or tradition to educate you. They demanded data ─ data that was objectively obtained with clear, consistent rules, and objectively analyzed according to national, and more recently, international standards.
                “And now, after all of this effort and perseverance, along come the Bayesians. They say that you can incorporate prior information into your answer. You can rely on your untested intuition because it’s your intuition. Be honest, Celia, doesn’t that approach strike a sympathetic chord in you?”
                “Yes,” she agreed. “I have to admit that it does.”
                “It is that chord that you must disavow,” Leon responded, his voice suddenly flat and emotionless. “Obliterate it. Letting your belief influence your analysis is a giant step backward in medical science. Maybe we’ll be ready for it soon, but only when we have good, clear ways to measure and accurately assess prior information. That’s not now. Trying to use this novel tool today is like a 17th-century man trying to fly. He’s two hundred years too early, and he’s going to get hurt.”
                “But what about using prior information that’s available?” Celia inquired.
                “That’s nothing but intellectual seduction,” Leon answered, dismissing the concept with a contemptuous wave of his hand. “Bayesians wrap this notion of prior distribution in elegant mathematics, but it’s the same dark assertion that has pulled at the shadowed corners of investigators’ minds for centuries, namely, ‘Your belief is correct, self-evident, and needn’t be completely tested.’ Now, in all fairness to Jeremy, that may not be the message that he and his fellow Bayesians mean to transmit, but I assure you, that’s the answer that doctors receive.
“And, by the way, Celia, where were these Bayesians when frequentists were patiently instructing clinical investigators these past sixty years? Bayesians were nowhere to be found when frequentists were educating clinical investigators about research discipline. They were AWOL when frequentists argued repeatedly that clinical research methodology should be structured, that research questions should be stated objectively, that null findings be as easily interpretable as positive ones, and that doctors must be concerned about the possibility of harm even when they ‘believe’ that no harm will occur. Bayesians were curiously absent for the five decades that these arguments raged furiously.
“And now, just when the investigator community finally understands the problems that stem from relying on their own untested assumptions, along come the Bayesians who have the temerity to suggest that ‘Well, after all, prior information is a good thing’. It’s just the same old argument, that is, providing mathematical justification for physicians to toss away their research discipline and once again embrace their old, first love – prior belief! That’s great in religion, Jeremy, but it’s poisonous to science. What you’re doing to clinical scientists is akin to giving a ten-year-old a handgun, and then saying, ‘Now, be careful’. They just can’t be careful enough because they’re not ready.”
                “But Leon,” Celia responded, “it’s undeniable that these procedures are increasing in popularity.”
                “Have you ever been to the beach, Celia?” Leon asked.
                “Well sure, what does that ─”
                “Just because the tide is high, doesn’t mean it will always be so. It always recedes. It may shape the landscape a little, but the waters always withdraw.”
Turning to Jeremy, Celia asked, “Any final words?”
                “Only this,” Jeremy asserted. “Frequentist thinking is doomed to the ash heap of history for the simple reason that it denies the one persistent, common trait of humans. We learn by updating our prior experience. If it won’t die today, then it will die tomorrow, or the next day. But it’s on the way out. We’ve grown beyond it.”
                Investigators were now streaming into the room. Celia thanked each of her friends, coming away from the discussion with much more knowledge, but the same level of confusion. As she walked to the podium to prepare for her first address as Principal Investigator, the same uneasiness that had prompted her to meet with the statisticians remained, and as the audience of scientists looked to her for leadership, she wondered, Is my problem a Bayes problem or not?