RDoC: Outcomes to Causes and Back
Imagine a world where your psychiatrist runs a panel of tests—behavioral and brain function tests—in addition to her clinical assessment. She gives you a diagnosis and realistic prognosis, and helps you choose between treatments with the knowledge of your individualized chance of responding to each of them. This is the ambition of the Research Domain Criteria (RDoC) project.
In my last message—addressing a question I’m asked frequently, which is what I’m going to do with RDoC—I discussed the notion that it is too early to determine the full utility of the RDoC project, but that we at NIMH remain committed to doing so. The first step (in the short term) is to evaluate the RDoC hypothesis—will it be useful in characterizing brain-behavior relationships, including those that underlie psychiatric disorders? How do the results from RDoC studies compare and align with results from studies classifying disorders only by traditional DSM diagnoses? I also discussed an important second step in dealing with the tremendous challenge posed by any attempt to deconstruct human behavior into its component parts—to leverage bottom-up, data-driven approaches to deriving the natural domains of behavior. In this way, we can ask whether the bottom-up approach to characterizing behavioral domains agrees with the top-down, expert-driven approach we have initially pursued, and modify the domains accordingly.
Once we have these data-driven domains, we can do some powerful things, because the domains are not all we will have. We will also have the large dataset derived from web-based behavioral testing in the All of Us Research Program, a cohort of a million or more volunteers who will have agreed to participate in perhaps the greatest longitudinal health study of all time. They will have submitted biological samples and given researchers access to their de-identified electronic medical records. This resource gives us tremendous potential to use a computational approach to integrate and continuously refine RDoC and the DSM into a comprehensive tool that I believe can revolutionize psychiatric research and practice.
The Even Longer Answer: What a Big Data Approach to RDoC Really Buys You
To set the stage, I first need to turn back to the plan I discussed in my last message. Take a set of behavioral tests covering the full spectrum of behaviors represented in RDoC. These tests would comprise a number of behavioral measures, each corresponding to an RDoC construct and its related neurobiological correlates. Develop versions of these tests suitable for the web. Pick a subset of the All of Us cohort, and send out this package of tests via the web so that selected volunteers, 100,000 or more of them, can complete them at home. Using clustering algorithms and other methods, extract from all these data something analogous to social networks but pertaining to behaviors rather than people: which elements of behavior are related to and influenced by each other, and how can they be traced back to brain systems? This will help revise RDoC to better reflect the true natural structure of behavior.
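To make the "social network of behaviors" idea concrete, here is a minimal sketch in Python. It is not the method the project would necessarily use: it stands in for "clustering algorithms" with a deliberately simple rule (link two measures when their scores correlate strongly, then read off the connected groups). The measure names, the scores, and the 0.7 threshold are all hypothetical and chosen only for illustration.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def group_measures(scores, threshold=0.7):
    """Group measures whose scores correlate at or above `threshold`.

    Builds a graph with one node per measure and an edge for each strongly
    correlated pair, then returns the connected components of that graph.
    """
    parent = {name: name for name in scores}

    def find(name):
        # Follow parent links up to the representative of this group.
        while parent[name] != name:
            name = parent[name]
        return name

    for a, b in combinations(scores, 2):
        if pearson(scores[a], scores[b]) >= threshold:
            parent[find(a)] = find(b)  # merge the two groups

    groups = {}
    for name in scores:
        groups.setdefault(find(name), []).append(name)
    return sorted(sorted(members) for members in groups.values())


# Hypothetical scores for six participants on four web-based measures.
scores = {
    "reward_learning": [1, 2, 3, 4, 5, 6],
    "effort_for_reward": [2, 1, 4, 3, 6, 5],
    "working_memory": [5, 1, 4, 2, 6, 3],
    "attention_span": [6, 2, 3, 1, 5, 4],
}

clusters = group_measures(scores)
for members in clusters:
    print(members)
```

With real data one would use far more participants and a proper clustering or factor-analytic method. The point of the sketch is only that groupings of measures emerge from the data rather than being specified in advance, and can then be compared with the expert-defined domains.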
The real power of this data-driven approach is the link with electronic medical records, however. The synergistic combination of dimensional behavioral assessment—the RDoC approach—with categorical diagnoses—defined in the DSM—and longitudinal clinical information has tremendous potential to revolutionize how we think about and manage patients—and eventually, to predict and prevent the onset of disorders. To illustrate this, I’m going to describe one possible way to use this dataset. It is an idea that grew out of the computational psychiatry meeting I mentioned in one of my early messages. It is described in full in two introductory chapters from the book that grew out of that meeting, as well as an article in the journal Computational Psychiatry.1 The key idea is to use an established statistical approach—Bayesian causal modeling—that allows us to find relationships between causes and outcomes, working both ways, from causes to outcomes and also from outcomes to causes.
This new approach relies first on the recognition that DSM diagnoses are not actual disease processes happening in the brain. The DSM has always comprised categorizations of symptoms. At that meeting, we realized that this means the DSM describes observations made by expert clinicians. These observations are outcomes that arise from some underlying disease process or set of processes happening in the brain. Since we know the outcomes (DSM-based observations) but not the causes (underlying disease processes), we need a way of working backwards from outcomes to causes if we are to understand the neurobiology of mental illnesses.
The key is that the relationship between the underlying causes and the symptoms is both multifaceted and probabilistic. The same disease processes can produce different outcomes (symptoms or diagnoses) in different patients and the same outcomes can come from different causes. But these relationships are not random. If a patient comes to a doctor with chest pain, that could come from either indigestion or a heart attack, but it probably does not come from a broken leg. Statistical methods provide a formal, systematized way of calculating the chances that one or another condition (indigestion or heart attack) caused the chest pain.
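The chest pain example can be written out as a small calculation using Bayes' rule. The sketch below is illustrative only: the prior probabilities and the chance of chest pain under each condition are made-up numbers, not clinical estimates, and the function name `posterior` is my own.

```python
def posterior(priors, likelihoods):
    """Return P(cause | symptom) for each cause using Bayes' rule.

    priors[c]      = P(cause c) before seeing the symptom
    likelihoods[c] = P(symptom | cause c)
    """
    joint = {cause: priors[cause] * likelihoods[cause] for cause in priors}
    evidence = sum(joint.values())  # P(symptom), summed over all causes
    return {cause: joint[cause] / evidence for cause in joint}


# Hypothetical numbers, for illustration only (not clinical estimates).
priors = {"indigestion": 0.30, "heart_attack": 0.05, "broken_leg": 0.65}
chest_pain_given = {"indigestion": 0.40, "heart_attack": 0.80, "broken_leg": 0.001}

post = posterior(priors, chest_pain_given)
for cause, probability in sorted(post.items(), key=lambda item: -item[1]):
    print(f"P({cause} | chest pain) = {probability:.3f}")
```

Under these assumed numbers, indigestion comes out as the most probable cause, heart attack remains a serious possibility, and broken leg is all but ruled out, even though it was the most common condition to begin with. That is the sense in which the relationships are probabilistic but not random.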
Conceptualized this way, DSM diagnoses have a very interesting potential relationship to RDoC domains. The RDoC domains and constructs are hypotheses about how specific behaviors relate to underlying brain processes. If those constructs are a useful categorization of these processes (note the point I made earlier that we have to test this!), then we should be able to relate those RDoC constructs and their potential dysfunctions to the DSM observations—in a multifaceted and probabilistic way.
The recognition that the DSM diagnoses are observations that probabilistically arise from underlying brain processes (RDoC domains) is particularly advantageous if one would like to learn something about those processes. Why? Because if a given set of disrupted processes is likely (probabilistically) to produce a given set of observations (DSM diagnoses) then the reverse is also true: for any given set of DSM diagnoses, there is a likely (probabilistic) underlying set of disrupted processes. And if you have enough data, you can construct a model of what those probabilities are, and, furthermore, you can ask how accurate that model of that relationship is. And even more importantly, you can test the accuracy of multiple models, that is, you can use observational data to tell you which of your models best explains the data. In other words, we can employ a systematic way of working from outcomes to causes. Quantitatively.
How would this work concretely? Let’s consider a specific example. Imagine we have a dataset based on a whole bunch of people with a diagnosis of major depressive disorder. Some of these individuals get better with fluoxetine, and some with cognitive behavioral therapy (CBT). Furthermore, suppose we can demonstrate aberrant function specific to one or more RDoC constructs that can be measured through brain or behavioral observations. Brain scanning might, for example, reveal hyper- or hypo-activity in different areas of the brain, or disrupted connectivity between different brain areas; or behavioral measures might detect impaired working memory. We can quantitatively measure the degree of influence that dysfunction in each of these constructs has on the two different subtypes of depression. Are the subtypes better described as each arising from dysfunction in one of these constructs, or more than one? Or are the constructs and diagnostic subtypes more likely independent observations? With enough data, you can test which of these models best explains what you see.
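Here is one way the "which model best explains the data" question could be posed in code. This is a toy sketch, not the analysis described in the cited work: it compares just two models of a single hypothetical behavioral measure (low versus typical motivation scores) against treatment response, using Beta-Bernoulli marginal likelihoods with uniform priors. The counts are invented for illustration.

```python
from math import lgamma


def log_beta(a, b):
    """Log of the Beta function B(a, b)."""
    return lgamma(a) + lgamma(b) - lgamma(a + b)


def log_evidence(n_fluoxetine, n_cbt, a=1.0, b=1.0):
    """Log marginal likelihood of the observed responses when the unknown
    fluoxetine-response rate has a Beta(a, b) prior (uniform by default)."""
    return log_beta(a + n_fluoxetine, b + n_cbt) - log_beta(a, b)


def log_bayes_factor(counts):
    """Log Bayes factor for 'dependent' over 'independent'.

    counts maps each group to (fluoxetine responders, CBT responders).
    Independent model: one response rate shared by all groups.
    Dependent model: a separate response rate for each group.
    """
    total_flx = sum(flx for flx, _ in counts.values())
    total_cbt = sum(cbt for _, cbt in counts.values())
    independent = log_evidence(total_flx, total_cbt)
    dependent = sum(log_evidence(flx, cbt) for flx, cbt in counts.values())
    return dependent - independent


# Hypothetical counts: (fluoxetine responders, CBT responders) per group.
observed = {"low_motivation": (40, 10), "typical_motivation": (15, 35)}

log_bf = log_bayes_factor(observed)
print(f"log Bayes factor (dependent vs. independent): {log_bf:.2f}")
```

A positive value means the data favor the model in which the behavioral measure and treatment response are linked; a negative value favors independence. Because the marginal likelihood averages over all possible response rates, the more flexible model is not automatically preferred, which is what makes the comparison meaningful.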
Even better, this approach allows you to improve your model iteratively. Remember that in addition to clinical assessments, we have behavioral tests on these patients with depression. Suppose you find that most of the fluoxetine-sensitive patients also score low on tests of motivation, but only a few of the CBT-sensitive patients do. You can then ask whether applying other tests relevant to the same behavioral domain (in this case, positive valence, which has to do with motivation and reward) would improve your ability to predict which treatment the patient will respond to, again with a quantitative evaluation of how much the addition of these measures improves that prediction.
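What might a "quantitative evaluation" of that improvement look like? One common choice, assumed here rather than taken from the source, is to compare prediction error on held-out patients with and without the extra measure. The sketch below uses mean log loss and a single hypothetical yes/no motivation measure; all patient counts are invented.

```python
from math import log


def fit_rate(outcomes):
    """Laplace-smoothed probability that an outcome is True."""
    return (sum(outcomes) + 1) / (len(outcomes) + 2)


def mean_log_loss(probabilities, outcomes):
    """Average negative log-probability assigned to what actually happened
    (lower is better)."""
    total = sum(log(p if y else 1 - p) for p, y in zip(probabilities, outcomes))
    return -total / len(outcomes)


def make_records(low_flx, low_cbt, typical_flx, typical_cbt):
    """Build (low_motivation, responds_to_fluoxetine) records from counts."""
    return ([(True, True)] * low_flx + [(True, False)] * low_cbt
            + [(False, True)] * typical_flx + [(False, False)] * typical_cbt)


# Hypothetical patients: a training set and a held-out test set.
train = make_records(32, 8, 12, 28)
test = make_records(8, 2, 3, 7)

# Baseline: predict the overall fluoxetine-response rate for everyone.
baseline_rate = fit_rate([responds for _, responds in train])

# Extended model: a separate rate for each motivation group.
rate_by_group = {
    group: fit_rate([responds for low, responds in train if low == group])
    for group in (True, False)
}

actual = [responds for _, responds in test]
loss_baseline = mean_log_loss([baseline_rate] * len(test), actual)
loss_with_measure = mean_log_loss([rate_by_group[low] for low, _ in test], actual)
improvement = loss_baseline - loss_with_measure

print(f"log loss without the measure: {loss_baseline:.3f}")
print(f"log loss with the measure:    {loss_with_measure:.3f}")
```

The difference between the two losses is the number to watch: if adding further positive valence tests keeps shrinking the held-out loss, they are earning their place in the model; if not, they can be dropped.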
Finally, imagine you had this complete dataset—with hundreds of thousands of patients, each with diagnoses, clinical response, longitudinal course, and behavioral tests across the RDoC domains. One can make quantitative predictions not only of outcome—how a patient will do with a given medication, what that patient’s prognosis is, etc.—but also about the nature of the underlying structure of psychological and biological processes: Are the neural substrates (the brain processes on which treatments act) for fluoxetine-sensitive and CBT-sensitive patients separable? How many different kinds of processes are there, and which brain regions might relate to each? One can build models of the underlying processes, and quantify how much they explain about the symptoms and behavioral measures, which can help refine those quantitative predictions of prognosis and outcome.
Combine the features of this approach—integration of RDoC behavioral constructs and DSM observational diagnoses, quantitative comparisons of model accuracy, iterative improvements in the combined model, and integration with longitudinal data (including developmental history, illness progression, and treatment response)—and you have the makings of an evidence-based diagnostic and clinical tool that has built-in automatic updating capability.
Of course, there are a lot of things that have to go right for this to work. Getting the data will be challenging, especially data inclusive of hard-to-reach people like children and individuals with serious mental illnesses. Early editions of a comprehensive diagnostic tool will likely be rudimentary.
Nonetheless, with enough data, creativity, and computation, we can build the tool of the future, sooner than you might think. Let’s get busy.
1 See also: Friston KJ, Redish AD, Gordon JA. Computational nosology and precision psychiatry. Computational Psychiatry. 2017. https://doi.org/10.1162/cpsy_a_00001