Parachutes, belief and intellectual curiosity

“There is no evidence that jumping out of a plane with a parachute improves outcome”

“If you go to PubMed, you will find no publications that breathing air is good for you”

“There’ll never be a trial, we are beyond that”

Have you ever heard these statements made when someone discusses the evidence about a particular new (or old) therapy? The statements might be true, but are they useful? Do they advance an argument? What do they mean?

A paper in the 2018 Christmas issue of the British Medical Journal found no evidence of benefit from parachute use when people jumping out of an aeroplane (which happened to be stationary at ground level) were randomised to wear either a parachute or an empty North Face rucksack. This built on a 2003 systematic review, which found no randomised evidence on the usefulness of parachutes for high-altitude exits. Both articles are written in a somewhat tongue-in-cheek style, but make the point that “…under exceptional circumstances, common sense must be applied when considering the potential risks and benefits of interventions”.

It is self-evident that wearing a parachute when jumping out of a plane in flight, or being in an atmosphere with enough air to breathe, is good for you. When people quote arguments about parachutes or air (or similar) in response to a query about a lack of evidence for a particular intervention, they are implying that the intervention under discussion is similarly self-evidently safe, effective or cost-effective, and that common sense must be applied.

The issue is that the benefits of most medical interventions are clearly not in this category. To give some examples from my own field: it is not self-evident that dosimetric methods will improve the outcomes of selective internal radiation therapy sufficiently to make a difference to trial outcomes, that endovascular intervention for acute deep vein thrombosis improves long-term outcomes compared with anticoagulation, or that for complex aneurysms endovascular aneurysm repair is better than open surgery or conservative management… I could go on.

And here we come to the crux of the matter, which is that such comments add nothing to a discussion about an intervention’s evidence base. Rather, their effect is to stifle debate into a confused silence. Whether this is done intentionally or out of embarrassment is irrelevant; the effect is the same: intellectual curiosity is suppressed and questioning is discouraged. This is the opposite of the empiricism that underpins the whole of Western scientific thought. Before people asked questions, it was self-evident that the Earth was flat, that it was the centre of the universe and that it was orbited by the sun. That was just common sense.

A strategy related to appeals to common sense is the weaponisation of the weight of collective opinion. Clinical trial design depends on equipoise, meaning clinicians do not know which of several options is better. Equipoise depends on opinion, and opinion is swayed by much more than evidence. Medical professionals are just as receptive to marketing, advertising, fashion and halo bias as anyone else. Nihilistic statements denying that a trial is possible (or even desirable), on the grounds that an intervention has become too popular or culturally embedded, are only true if we allow them to be. The role of senior ‘key opinion leaders’ is critical here: they have a responsibility to openly question the status quo, to use their experience to identify and highlight the holes in the evidence, to point out the ‘elephant in the room’. But too often these leaders (supported in some cases by professional bodies and societies) become a mouthpiece for industry and vested interest, promoting dubious evidence, suppressing debate and inhibiting intellectual curiosity. There are notable examples of trials overcoming the hurdle of entrenched clinical practice and assessing deeply embedded cultural norms. This requires committed leaders who create a culture where doubt, equipoise and enquiry can flourish.

Given the rapid pace of technological development in modern healthcare, it is not unreasonable to hold an opinion about an intervention that is not backed by the evidence of multiple congruent randomised controlled trials. But this opinion must be bounded by a realistic uncertainty. A better word for this state of mind is a reckoning. To reckon allows for doubt. When an opinion hardens into a belief, however, doubt is squeezed out. ‘Can this be true?’ becomes ‘I want this to be true’, then ‘it is true’ and ultimately ‘it is self-evidently true’. Belief becomes orthodoxy; questioning becomes heresy and is actively (or passive-aggressively) suppressed.

Karl Popper’s theory of empirical falsification states that a theory is only scientifically valid if it is falsifiable. In his book on assessing often incomplete espionage intelligence, David Omand (former head of the UK electronic intelligence, security and cyber agency, GCHQ) comments that the best theory is the one with the least evidence against it. A powerful question, therefore, is not “what evidence do I need to demonstrate that this view of the world is right?” but its opposite: “what evidence would I need to demonstrate that this view of the world is wrong?”. Before the second Gulf War in 2002-3, an important question was whether Iraq had an ongoing chemical weapons programme. As we all know, no evidence was found (before or after the invasion). The theory with the least evidence against it is that Iraq had, indeed, destroyed its chemical weapons stockpile. More prosaically, that all swans are white is self-evident until you observe a single black one.

If someone is so sure that an intervention is self-evidently effective, proposing an experimental design to test this should be welcomed, not seen as a threat. But belief (as opposed to a reckoning) is tied up in identity, self-worth and professional pride. What, then, does an impassioned advocate of a particular technique have to gain from an honest answer to the question “what evidence would it take for you to abandon this intervention as ineffective?” if that evidence is then produced?

Research is hard. Even before the tricky task of patient recruitment begins, a team with complementary skills in trial design, statistics, decision making, patient involvement, data science and much more must be assembled. Funding and time must be identified. Colleagues must be persuaded that the research question is important enough to be prioritised amongst their other commitments. This process is time-consuming, expensive and often results in failure, as my fruitless attempts at getting funding from the National Institute for Health and Care Research for studies on abdominal aortic aneurysm attest. But this is not to say that we should not try. We are lucky in medicine that many of the research questions we face are solvable by the tools we have at our disposal, if only we could deploy them rapidly and at scale. Unlike climate scientists, we can design experiments to test our hypotheses. We do not have to rely on observational data alone.

The 2018 study on parachute use is often cited as a criticism of evidence based medicine. That a trial can produce such a bizarre result is extrapolated to infer that all trials are flawed (especially if they do not produce the desired result). My reading of the paper is that the authors have little sympathy for these arguments. After discussing the criticisms levelled at randomised trials they write with masterly understatement “It will be up to the reader to determine the relevance of these findings in the real world” and that the “…accurate interpretation [of a trial] requires more than a cursory reading of the abstract.”

As I wander around the device manufacturers at medical conferences I wonder: if more of the resource used to fund the glossy stands, baristas and masseuses were channelled into rigorous, independent research, generating the evidence to support what we do would be so much easier. And I wonder why we tolerate a professional culture that so embraces orthodoxy, finds excuses not to undertake rigorous assessments of the new (and less new) interventions we perform, and is happy to allow glib statements about trial desirability, feasibility and generalisability, about parachutes and air, to go unchallenged.

Registry Data and the Emperor’s New Clothes

Registries. They’re a big thing in interventional radiology. Go to a conference and you’ll see multiple presentations describing a new device or technique as ‘safe and effective’ on the basis of ‘analysis of prospectively collected data’. National organisations (e.g. the Healthcare Quality Improvement Partnership [HQIP] and the National Institute for Health and Care Excellence), professional societies (like the British Society of Interventional Radiology) and the medical device industry promote them, often enthusiastically.

The IDEAL collaboration is an organisation dedicated to quality improvement in research into surgery, interventional procedures and devices. It has recently updated its comprehensive framework for the evaluation of surgical and device based therapeutic interventions. The value of comprehensive data collection within registries is emphasised in this framework at all stages of development, from translational research to post-market surveillance.

Baroness Cumberlege’s report into failures in the long-term monitoring of new devices, techniques and drugs concluded that this lack of vigilance contributed to a system that is not safe enough for those being treated using these innovations. She recommended that a central database be created for implanted devices, for research and audit into their long-term outcomes.

This is all eminently sensible. Registries, when properly designed and funded and with a clear purpose and goal, are powerful tools for generating information about the interventions we perform. But I feel very uneasy about many registries, because they often have an unclear purpose, are poorly designed and are inadequately funded. At best they create data without information. At worst they cause harm by obscuring reality or suppressing more appropriate forms of assessment.

A clear understanding of the purpose of a registry is crucial to its design. Registries work best as tools to assess safety, but in a crowded and expensive healthcare economy safety alone is an insufficient metric by which to judge a new procedure or device. Evidence of effectiveness relative to the alternatives is crucial. If the purpose of a registry is to make some assessment of effectiveness, its design needs to reflect this.

The gold standard tool for assessing effectiveness is the randomised controlled trial (RCT). RCTs are expensive, time-consuming, and complex to set up and coordinate. As an alternative, a registry recruiting on the basis of a specific diagnosis (equivalent to RCT inclusion criteria) is ethically simpler and frequently cheaper to instigate. While still subject to selection bias, a registry recruiting on this basis can provide data on the relative effectiveness of the various interventions (or no intervention) offered to patients with that diagnosis. Such registry data supports shared decision making by providing at least some data about all the options available.

Unfortunately, most current UK and international interventional registries use the undertaking of the intervention (rather than the patient’s diagnosis) as the criterion for entry. The lack of data collection about patients who are in some way unsuitable for the intervention or opt for an alternative (such as conservative management) introduces insurmountable inclusion bias and prevents the reporting of effectiveness and cost-effectiveness compared with alternatives. The alternatives are simply ignored (or assumed to be inferior) and safety is blithely equated with effectiveness without justification or explanation. Such registries are philosophically anchored to the interests of the clinician (interested in the intervention) rather than to those of the patient (with an interest in their disease). They are useless for shared decision making. 
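The distortion this produces can be shown with a minimal simulation (the numbers are invented for illustration): suppose an intervention has no effect whatsoever, but clinicians preferentially offer it to fitter patients. A registry that enrols only treated patients will report outcomes flattering to the intervention, while the disease population it cannot see fares worse.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 100_000

# Disease severity varies across the patient population.
severity = rng.normal(size=n)

# Clinicians intervene preferentially in fitter (lower-severity) patients.
p_treated = 1 / (1 + np.exp(2 * severity))
treated = rng.random(n) < p_treated

# Outcome depends only on severity; the intervention has zero true effect.
p_good = 1 / (1 + np.exp(severity))
good_outcome = rng.random(n) < p_good

registry_rate = good_outcome[treated].mean()   # what a procedure-entry registry sees
population_rate = good_outcome.mean()          # what it cannot see

print(f"registry good-outcome rate:   {registry_rate:.2f}")
print(f"population good-outcome rate: {population_rate:.2f}")
```

The registry’s headline rate exceeds the population rate purely through selection, with no treatment effect at all; a diagnosis-entry registry recording all patients would expose the gap.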

This philosophical anchoring is also evident in choices about registry outcome measures, which are frequently those easiest to collect rather than those that matter most to patients: a perfect example of the McNamara (quantitative) fallacy. How often are patients involved in registry design at the outset? How often are outcome metrics relevant to them included, rather than surrogate endpoints of importance to clinicians and device manufacturers?

Even registries where the ambition is limited to post-intervention safety assessment or outcome prediction, and where appropriate endpoints are chosen, are frequently limited by methodological flaws. A lack of adequate statistical planning at the outset, and the collection of multiple baseline variables without consideration of the number of outcome events needed to support modelling, risk overfitting and shrinkage: fundamental statistical errors.
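The overfitting trap is easy to demonstrate. In this minimal sketch (illustrative numbers only), an ordinary least-squares model with 20 baseline variables is fitted to 50 patients whose outcome is pure noise: the apparent in-sample fit looks respectable, yet the model has learned nothing that transfers to new patients, which is exactly why planning the variable count against the number of outcome events, and applying shrinkage, matters.

```python
import numpy as np

rng = np.random.default_rng(0)

# 50 patients, 20 baseline variables, outcome unrelated to any of them.
n_train, n_test, n_vars = 50, 1000, 20
X_train = rng.normal(size=(n_train, n_vars))
y_train = rng.normal(size=n_train)
X_test = rng.normal(size=(n_test, n_vars))
y_test = rng.normal(size=n_test)

# Ordinary least squares with an intercept.
design = np.column_stack([np.ones(n_train), X_train])
beta, *_ = np.linalg.lstsq(design, y_train, rcond=None)

def r_squared(X, y, beta):
    """Proportion of outcome variance explained by the fitted model."""
    pred = np.column_stack([np.ones(len(y)), X]) @ beta
    ss_res = np.sum((y - pred) ** 2)
    ss_tot = np.sum((y - y.mean()) ** 2)
    return 1 - ss_res / ss_tot

print(f"apparent (in-sample) R^2:  {r_squared(X_train, y_train, beta):.2f}")
print(f"new-patient (test) R^2:    {r_squared(X_test, y_test, beta):.2f}")
```

The in-sample R² is well above zero despite the outcome being noise, while performance on unseen patients collapses to zero or below: the registry has produced data without information.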

Systematic inclusion of ‘all comers’ is rare, but failure to include all patients undergoing a procedure introduces ascertainment bias. Global registries often recruit apparently impressive numbers of patients, but scratch the surface and you find rates of recruitment that suggest a majority of patients were excluded. Why? Why include one intervention or patient but not another? Such recruitment problems also affect RCTs, resulting in criticisms about ‘generalisability’ or real-world relevance, but it’s uncommon to see such criticism levelled at registry data, especially when it supports pre-existing beliefs or procedural enthusiasm, or endorses a product marketing agenda.

Finally, there is the issue of funding. Whether the burden of funding and conducting post-market surveillance should fall primarily on professional bodies, the government or the medical device companies that profit from the sale of their products is a subject for legitimate debate; in the meantime, registry funding rarely includes provision for the systematic longitudinal collation of long-term outcome data from all registrants. Pressured clinicians and nursing staff cannot prioritise data collection without the time or funding to do so. Instead the assumption is (for example) that the absence of notification of an adverse outcome automatically represents a positive one. Registry long-term outcome data is therefore frequently inadequate. While potential solutions such as linkages to routinely collected datasets and other ‘big data’ initiatives are attractive, these data are often generic and rarely patient-focussed, and the information governance and privacy obstacles to linking such sensitive information are substantial.

Where does this depressing analysis leave us?

Innovative modern trial methodologies (such as cluster, preference, stepped-wedge, trial-within-cohort or adaptive designs) provide affordable, robust, pragmatic and scalable alternatives for the evaluation of novel interventions, and are deliverable within an NHS environment, though registries are still likely to have an important role to play. HQIP’s ‘Proposal for a Medical Device Registry’ defines key principles for registry development, including patient and clinician inclusivity and ease of routine data collection using electronic systems. When these principles are adhered to, registries can be powerful sources of information about practice and novel technologies: when they are conceived and designed around a predefined hypothesis or purpose, based on appropriate statistical methodology with relevant outcome measures, coordinated by staff with the necessary skillsets to manage site, funding and regulatory aspects, and budgeted to ensure successful data collection and analysis. This is a high bar, but it is achievable, as the use of registry data during the COVID-19 pandemic highlighted. Much effort is being expended on key national registries (such as the National Vascular Registry) to improve the quality and comprehensiveness of the data collected and to create links to other datasets.

But where these ambitions are not achieved, we must remain highly sceptical about any evidence registry data purports to present. Fundamentally, unclear registry purpose, poor design and inadequate funding guarantee both garbage in and garbage out.

Registry data is everywhere. Like the emperor’s new clothes, is it something you accept at face value, uncritically, because everyone else does? Do you dismiss the implications of registry design if the data interpretation matches your prejudice? Instead perhaps, next time you read a paper reporting registry data or are at a conference listening to a presentation about a ‘single arm trial’, be like the child in the story and puncture the fallacy. Ask whether there is any meaningful information left once the biases inherent in the design are stripped away.