Webinar: Analyzing and Using RDoC Data in Your Research
UMA VAIDYANATHAN: So I think I'll go and get started. It's 12:01. All right. So, hi, everyone. Thank you for logging into today's webinar. I'm Uma Vaidyanathan, the scientific program manager for the RDoC unit here at NIMH, and in today's webinar, we're going to address the topic of how to analyze RDoC data in your research. We are very happy to have Aristotle Voineskos, Lisa McTeague, and Meredith Wallace here with us today for that purpose. So say hi, folks, to everyone.
ARISTOTLE VOINESKOS: Hi.
UMA VAIDYANATHAN: All right. So before we get started, I just want to cover some basic Zoom controls so that you, our viewers, are familiar with them. So on the top left-hand corner of your Zoom window, right about there, you see a bunch of buttons. One of them will be an audio settings button, which, as the name implies, allows you to play around with that. Next to that, you should see a Q&A button. If you click on that, it will open up a window into which you can type in questions and send us your question at any point in time during the webinar, and we'll try to answer them live today as much as possible. And then right next to that is a Chat button. That's disabled because we prefer to get all your questions through the Q&A window. And that's pretty much it for controls. It should be pretty simple. So in terms of the order of things today, I'll first cover what RDoC is and give you a very brief overview of it. Then I'll have each of our panelists present briefly on their work, after which we'll have an integrative discussion about their studies and how they illustrate ways to analyze and integrate data obtained using multiple methodologies. So let me start by sharing my slides here. Here we go. All right. So here we go now. So just in case we have folks who are listening in who are not familiar with RDoC, here's some sort of RDoC 101, and I've underlined in red and bolded the keywords on the slide that you should pay attention to. So RDoC, or Research Domain Criteria, is an initiative that was begun by NIMH in 2009. It is a strategy or set of principles for research in mental disorders. It is not intended for clinical use at the moment, just for research, and so that's why I've underlined those words there. And the core of RDoC is centered around the notion of constructs; that is to say, concepts or theoretical entities that are relevant to mental disorders. These constructs are based on both biology and behavior.
RDoC does not prioritize any one methodology above the other, and such constructs are also measured dimensionally; that is to say, they cover both the normal and abnormal range of the trait you're trying to get at. And keep in mind the reason I've underlined the word "measured" is because I'm referring to the measurement of that construct. So, in reality, your construct may have discontinuities at some latent or underlying level, but you want to measure it dimensionally so you get as much information on it as possible, so you can use the information to determine where to draw those cut points or to determine if something is truly categorical. The next is that the presence of constructs is inferred by using multiple units of analysis, which is RDoC parlance for the "methodologies" used to get at it. Again, we don't value any single methodology, biology or behavior, above the other. And finally, such constructs may help parse heterogeneity within or across a disorder. And the idea is that these constructs are getting at mechanisms that lead to symptoms or symptom sets within or across disorders, and the meaning of this will become a little bit more clear when you listen to our speakers present today. So just keep these sort of basic principles of RDoC in mind as our panelists go through their presentations today. And some other things to note about RDoC are as follows. So many of you are likely familiar with our RDoC matrix, the grid-like structure on our website. The matrix is actually a tool for helping implement the principles of RDoC in your research. It's a list of the major constructs and domains that are thought to be relevant to mental disorders and was derived from a series of workshops that were held between 2010 and 2012, where over 200 expert researchers convened here at NIMH to define and delineate those constructs. The constructs and domains in the matrix are not considered to be entirely independent of one another.
They are distinct in that they're distinct dimensions of human behavior and functioning but are thought to function interdependently and affect each other. So, for example, the RDoC domains of positive valence systems and negative valence systems do affect each other, and emotion affects cognition as well, but they're still distinct domains of behavior. And finally, the matrix is not set in stone. It is expected to evolve as research accumulates, based on the constructs that we have defined here so far. And a very important point to keep in mind is that RDoC is not the matrix. A lot of people sort of conflate the two. They think RDoC equals the matrix, which is not the case. RDoC, again, is a set of principles that I showed you on the previous slide, and the matrix is merely a way of implementing or operationalizing those principles. And, lastly, this is a question we get often, which is, “What about development and environment?” And those do matter very much to RDoC. They're not specified in the matrix at the same level of detail as constructs and domains, but this was done deliberately because we wanted to leave it up to investigators to decide how best to specify them in their work. So this picture summarizes what I just talked about in terms of how RDoC is a set of principles that aims to get at constructs or mechanisms which lead to heterogeneity within or across mental disorders by using multiple methodologies. So one question we often get is, "Well, how exactly do you do that? I mean, you say integrate data, but how do you integrate data from multiple methodologies? What statistical method or methods do you use?" So I'm going to go ahead and spoil the surprise ending of our webinar now for you, and the answer is that there is no single method we would advocate. It all depends on your study and the question you ask, as long as you follow the principles of RDoC that I talked about earlier.
What we would like to do in the next hour is to show you why this is the case, why there is no single statistical method or analysis that can address all the problems, and how to go about doing your study in that case. So during today's webinar, we will have three researchers present data from their studies that are along the lines of RDoC, and they'll discuss how they approach the issue of data analysis in their work. So we're very happy to have Lisa, Aristotle, and Meredith here with us today. Aristotle will start us off by presenting his work on social processing deficits in psychosis spectrum disorders. He'll talk about integrating various types of data using a statistical method called partial least squares. Lisa will present next and will show some psychophysiological data on work she's done in relation to fear and anxiety. And her work focuses on a more theoretical approach to integrating data. And then, finally, we'll have Meredith, who will discuss some new clustering methods that she has been working on and will talk about why it is very important to pay attention to features such as skew in your data before you run some analysis. And in between each presenter, we will also have a couple of minutes to focus on any questions that you might be interested in asking specifically about their presentations, and as I mentioned, you can type these to us in the Q&A window at any point in time. We'll also have a broad integrative discussion at the end, in the last 15 minutes or so, where you can ask questions about sort of common themes amongst the research or questions that apply to any of them. Again, what we'd like you to pay attention to during each of our panelists' talks is how their work fits the RDoC principles and sort of how they approach the research questions they had in their studies. So let's get started now. So I'm going to stop sharing my slides. Aristotle will be our first presenter.
Aristotle, it would be great if you could talk about your research very briefly and then start discussing your study. Thanks.
ARISTOTLE VOINESKOS: Okay, thanks. So, yeah, I'm Aristotle Voineskos. I'm at the Center for Addiction and Mental Health in Toronto, at the University of Toronto. My research program largely uses brain imaging approaches to understand more about psychiatric disorders and also uses brain imaging in the context of intervention studies. But today I'll be talking about one specific study which is a multicenter brain imaging study that uses the RDoC framework. So I guess I'll just get started. Okay. Does that look okay to everyone? Yep? Okay. Got the thumbs-up from Uma. All right. So I can get started. Right. So the title of our grant application was "Social Processes Initiative in Neurobiology of the Schizophrenia(s)," acknowledging the fact that there may be many schizophrenias and also really underscoring the point that there's a lot of heterogeneity in this disorder. And I think as might have been mentioned already, this is a study that is looking at schizophrenia spectrum disorders and healthy people. It's not a study that looks at a number of different diagnostic categories other than those in the schizophrenia spectrum disorders. And that was, in part, because it was a starting point but also because we felt there's a lot of heterogeneity already in the social cognitive constructs that we're using within the social process domain to at least get started. And certainly, we know there are other disorders of social cognitive impairment and others that may have less social cognitive impairment. But the point was really to have a range of performance, from a significant amount of impairment to people who are very good at performing social cognitive tasks. So this is a three-site collaborative R01, so there's an R01 to my center in Toronto, one to Zucker Hillside Hospital, and one to the Maryland Psychiatric Research Center. So we're trying to do the exact same thing in each site and then combine all the data at the end of the study for analyses.
So we're collecting 60 healthy controls and 100 people with schizophrenia spectrum disorders at each site. And just to note that the study just started, actually, so there isn't a whole lot of data to present to you, but I will get into a little bit of pilot data. But we just started up in late 2014, and so we're not wrapping up until 2019. And as I mentioned, it's a neuroimaging study, so we have a number of structural and functional neuroimaging acquisitions we're doing when people get into the scanner, and detailed clinical and neurocognitive assessment, because we know that those things may actually be related to social cognitive performance, so we want to be able to disentangle the relationships between these variables to understand aspects of shared variance and unique variance. And we're also really interested in how all this relates to social function or functional outcome in the real world, actually outside of a scanner, outside of a lab. And as Uma mentioned, we're using the partial least squares multivariate approach; the reason is that we think a multivariate approach is necessary to relate a lot of data that may be collinear or that might have a lot of shared variance. So I'll just move to the second slide. So we just built a model when we were writing the grant of a set of hypotheses that we were testing. We may or may not disprove them; we'll see.
But basically just to have some kind of conceptualization of what we were testing, we are hypothesizing that there are some circuits, including the frontal parietal circuit, on the right side of the brain, that might be better known by some as the mirror neuron system or the simulation system, that might be related to what we call lower-level social cognitive processes, so these are basic emotion understanding, and then a cortical midline circuit and some lateral parietal temporal regions that might be related to higher social cognitive processes that might be more related to understanding high-level intent and attitudes of others. We have a number of tasks that we're doing in the scanner and a number of tasks outside of the scanner, including some dynamic tasks where people watch videos and have to really get at what's going on in a more real-world type setting, and then we're also relating that to functional outcome. So that's just the model that we're testing, and we'll see. So as I mentioned, there's going to be many imaging and many behavioral variables that we're dealing with, and that's, you know, a blessing and a curse at the same time. The partial least squares approach has a few advantages. It's not a perfect method, but I think it's one that's fairly well suited to the design of the study. Basically, it pulls out latent variables from—if you think of two sets of data, you kind of think of the left side and the right side, sort of the X's and the Y's, maybe to put it really simply, you have all the imaging data on one side and all the behavioral data on the other side. And we really want to try and pull out what the latent variables are that relate these blocks of data. So the pro of that is you really get to see all the imaging variables that might be related to a number of behavioral variables at once, and there's going to be an independent series of those types of relationships. 
The con is that you're not going to get single brain region to single behavior relationships like you would in more univariate style analyses. But the nice thing is you can get measures of significance of these latent variables and also measures of reliability of the data, and I think that's really important in heterogeneous disorders because sometimes, especially in smaller-n studies, your findings may be driven by a subset of individuals who are sort of way off to the side in one direction or something along those lines, and that's sometimes hard to detect when you're looking at your data yourself or writing up the paper. And as I mentioned already, PLS can take into account dependent measures that are highly correlated. We know the activity of hundreds of thousands of brain voxels is going to be correlated all at once, so it's important to be able to take that into account. So just to talk about this a little bit more, without getting into any math, because I'm not a mathematician, but as I understand it, partial least squares is basically the least squares decomposition of part of a covariance matrix, and so, as I already mentioned, you're basically explaining the relationship between two or more blocks of data. And then you do statistical assessment through resampling algorithms, both through permutation testing and bootstrap estimation of standard error, to determine the reliability of the latent variables you've detected. Okay. So I'm just going to take you through sort of an example of a task and some very preliminary results. Unfortunately, it's only in a small number of people, because, as I mentioned, we're just early on into the study, and at the end, we're hoping to have a much larger number of people. And I'll try and get to how we might use that to our advantage.
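As an illustration of the approach described in the talk (a decomposition of the cross-covariance between two blocks of data, followed by permutation testing), here is a minimal sketch in Python with NumPy. It is not the study's pipeline: the function names `pls_svd` and `permutation_pvalue`, the synthetic data, and the choice to test only the first latent variable are all assumptions made for the example.

```python
import numpy as np

def pls_svd(X, Y):
    """Decompose the cross-covariance between two data blocks.

    X: (subjects, brain variables), Y: (subjects, behavioral variables).
    Returns brain saliences U, singular values s, behavior saliences V.
    """
    Xc = X - X.mean(axis=0)
    Yc = Y - Y.mean(axis=0)
    cross_cov = Xc.T @ Yc / (X.shape[0] - 1)
    U, s, Vt = np.linalg.svd(cross_cov, full_matrices=False)
    return U, s, Vt.T

def permutation_pvalue(X, Y, n_perm=200, seed=0):
    """Permutation test for the first latent variable.

    Shuffling the rows of X breaks the subject-wise link between blocks,
    giving a null distribution for the first singular value.
    """
    rng = np.random.default_rng(seed)
    s_obs = pls_svd(X, Y)[1][0]
    exceed = 0
    for _ in range(n_perm):
        shuffled = rng.permutation(X.shape[0])
        if pls_svd(X[shuffled], Y)[1][0] >= s_obs:
            exceed += 1
    return (exceed + 1) / (n_perm + 1)

# Synthetic stand-in data (hypothetical): one shared latent factor drives both blocks.
rng = np.random.default_rng(42)
n_subjects = 60
latent = rng.normal(size=n_subjects)
X = np.outer(latent, rng.normal(size=20)) + rng.normal(size=(n_subjects, 20))
Y = np.outer(latent, rng.normal(size=4)) + rng.normal(size=(n_subjects, 4))

U, s, V = pls_svd(X, Y)
p_value = permutation_pvalue(X, Y)
```

Each column of `U` paired with the matching column of `V` is one latent variable; real imaging data would have far more columns in `X` than this toy example.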
So one of the two functional tasks we're doing in the scanner is a very simple task known as the imitate/observe task that was pioneered by Marco Iacoboni, and he's actually a consultant on our grant. He's at UCLA and has really pioneered this work in people with autism spectrum disorder, and we think that's an interesting relationship worth exploring down the road, between schizophrenia spectrum and autism spectrum disorders. So this task is believed to activate the mirror system. Basically, someone goes into the scanner and there are two five-and-a-half-minute runs. They're either observing a number of faces with different, very prominent facial emotions, as you can see on the image down below, and then, in a second run (or it can be in the first run because we counterbalance things), they are imitating the faces. And so the idea, basically through mirror neuron theory, is that when you imitate these faces you're activating that circuit a little more intensively than you might be when you're simply observing the faces. So these are some of the early data we based our hypotheses on, but also I think are useful as an illustration of where we were and where we want to get to. So this is Marco's seminal Nature Neuroscience paper in 2006, where he studied younger people with autism spectrum disorder and healthy controls. And in Image A, you can see basically what is a map of the activation in the brain during the imitate task in one of the groups. In that paper, he also showed the same activation in another group and basically tried to contrast the difference, and you can get kind of a regional assessment of what region might activate differently than another. And then he followed that up looking at some voxel behavior correlations. So he looked at some peak areas of activation, correlated them in a univariate fashion with some measures of social function in people with autism.
So I think that's the way functional MRI studies have gone for quite a while, and I think that this is critical early work on which we base our hypotheses, but what we'd like to do is take things a step further. And so I hope this isn't a slide that shocks the system too much, but I'm going to take a minute or two to explain it because this is my second-to-last slide, and then I'll just conclude briefly. So not much of a transition here, because of time, but basically this is an example of one type of PLS, and I'll take a minute or two to explain this, as I mentioned. So we were able to construct different contrasts of fixation, neutral, and emotional faces within the imitate condition and then within the observe condition. You can see those labels on the first panel. And then labeled on the bottom of the first panel are the different social cognitive tasks. So instead of simply correlating activation of a voxel with a single task, we're getting correlations of brain activation with a number of behavioral social cognitive tasks. So ER-40 is the emotion recognition task, RMET is Reading the Mind in the Eyes, RAD is Relationships Across Domains, and TASIT 1, 2, and 3 are parts of The Awareness of Social Inference Test. On the bottom panel, you can actually see parts of the brain that activate at different time points during the task, and the red basically means there's a positive correlation with the bars that are going up in Panel A, and the blue means there's a positive correlation with bars that are going down in Panel A. And so you have inverse correlations, though, with blue blobs and bars that are going up and similarly with red blobs and bars that are going down. So what you can really see is what parts of the brain are positively correlated with performance on these tasks in specific conditions, both during imitate and observe, and what parts are inversely correlated.
So rather than spending a lot of time explaining what these results mean, this is just an example. It's a very rich amount of data. It's just in 20 healthy controls, just to kind of use as a proof of principle or proof of concept, but you can really see here that some of the tasks are highly correlated, and some brain regions are highly correlated with each other. But you can really get the sense that things are looking quite different, depending on whether someone is imitating an emotional face or a neutral face, and you can really see that the extent of activation correlations with the behavioral task is diminished in the observe condition compared to the imitate condition, because on the Y axis, you have the degree of correlation. And that's kind of what we'd expect as well, that you're going to have stronger correlations during imitation because you're activating that mirror neuron circuit more intensely. What we hope to do is to do this sort of thing across patients and controls, because we know, and this is part of the RDoC idea, that some patients will perform more poorly than controls; others will perform better. So it's really not about comparing what's going on in schizophrenia patients and healthy controls, but it's really about disentangling subgroups of people who perform better or worse and who might use different circuits than each other during performance of these tasks. So that just takes me to some of the things we hope to achieve by grant end; I just picked three. So one is to detect the neural circuitry underpinning the full range of social cognitive performance and function across our sample. So we do hope to be able to detect what had been called tipping point subsamples.
So there may be a few people on one end of the spectrum who are really poor performers, who use altogether different neural circuitry, and by dividing up our sample into different levels of social cognitive performance and running PLS on them, we might be able to detect what unique circuits those people use or don't use during these tasks. And that's really relevant for interventions when you're using target engagement-based approaches, because you want to know what circuits different individuals are actually using during the task. Point two, I think it relates to the issue of shared variance or correlations. You know, neurocognition and social cognition, negative symptoms are all correlated in some way, and so we want to be able to disentangle what those unique and shared aspects of variance are in relation to brain circuit structure and function. And, sorry, I got to the tipping point subsamples already, because I think that's really important for the intervention component which we hope to do subsequently when this grant finishes, because we don't want to be designing interventions for all groups of schizophrenia patients. We want to be tailoring them to subsamples, depending on their circuit structure properties or circuit activation properties during these tasks. So thanks very much.
UMA VAIDYANATHAN: Thank you, Aristotle. If you want to hit Stop Share on your screen, we'll get back to the video for all the participants. Thank you for the great talk. It was very interesting. We had a couple of questions come in. So it sounds like your hypotheses are a mix of sort of more theory-driven and sort of exploratory. Would that be correct?
ARISTOTLE VOINESKOS: Yep.
UMA VAIDYANATHAN: Okay. So you're just kind of hoping to use analyses to get at these different circuits without necessarily being—
ARISTOTLE VOINESKOS: It would be interesting. The analyses themselves, you know, PLS, the approach is kind of agnostic in a way, so we're not picking out regions like you might want to do in a hypothesis-driven approach. But we believe we're going to find certain things, based on the literature and the biology and how we understand things. But the approach we're using will tell us which circuits are related to which social cognitive performance and social cognitive tasks, and they may be the circuits we suspect and they may be other ones. But because we're using a whole-brain voxelwise approach, the method will detect what are the relevant circuits during that time.
UMA VAIDYANATHAN: And you mentioned your study was a multi-site study, right? Did you have any sort of particular issues in designing it, to allow for combined data across sites?
ARISTOTLE VOINESKOS: Right, so that's a whole other talk, but a good point. We spent a lot of time and put a lot of effort into preparing for this application, probably a year and a half to 2 years, including people being scanned at each site, including consulting with people who had done this before. Jessica Turner is a co-investigator on our grant. She was the manager of the FBIRN, which was, I think, maybe the first multi-site imaging study attempted in schizophrenia. So Jessica's been very helpful. We basically did as much up-front work as possible in terms of preparing things, running phantoms on the scanners, trying to detect site differences, and I think one thing that was really important for us was to accept the fact that there are going to be inter-site differences, not to pretend like we're going to get things identical across each site, particularly from the scanning point of view. And the important thing for us was to be able to understand what those differences were and how to quantify them, and then how to deal with them after the fact. I didn't get into that in this particular presentation but that was, I think, really important for us to be realistic about that and to come up with approaches to handle those issues.
UMA VAIDYANATHAN: Okay. One last quick question before I move on to our next presenter. How do you plan to identify your tipping points statistically, the tipping points you mentioned?
ARISTOTLE VOINESKOS: I think the next two presenters are going to probably give some ideas. But one really easy approach that isn't statistically sophisticated at all can simply be to divide your group into quintiles, or with an n of 300 we could probably even divide it up a little further. I mean, you want to have adequate power within each subsample to run your analyses, and I think the nice thing about PLS is so long as we're getting good bootstrapping results, suggesting our data are reliable and the findings are not problematic within each subset, then we'd feel confident in those results.
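The quintile idea in this answer, combined with the bootstrap check on reliability, could be sketched roughly as follows. This is a hedged illustration on synthetic data, not the study's method: `first_brain_salience`, `bootstrap_ratios`, the sign-alignment step, and every variable below are assumptions introduced for the example.

```python
import numpy as np

def first_brain_salience(X, Y):
    """Brain-side weights of the first latent variable (SVD of the cross-covariance)."""
    Xc = X - X.mean(axis=0)
    Yc = Y - Y.mean(axis=0)
    U, _, _ = np.linalg.svd(Xc.T @ Yc / (len(X) - 1), full_matrices=False)
    return U[:, 0]

def bootstrap_ratios(X, Y, n_boot=200, seed=0):
    """Salience divided by its bootstrap standard error, per brain variable."""
    rng = np.random.default_rng(seed)
    observed = first_brain_salience(X, Y)
    boots = np.empty((n_boot, X.shape[1]))
    for b in range(n_boot):
        idx = rng.integers(0, len(X), size=len(X))  # resample subjects with replacement
        salience = first_brain_salience(X[idx], Y[idx])
        if salience @ observed < 0:  # the sign of an SVD component is arbitrary, so align it
            salience = -salience
        boots[b] = salience
    return observed / boots.std(axis=0, ddof=1)

# Synthetic stand-in for n = 300 participants (hypothetical data).
rng = np.random.default_rng(1)
n_subjects = 300
latent = rng.normal(size=n_subjects)
X = np.outer(latent, rng.normal(size=10)) + rng.normal(size=(n_subjects, 10))
Y = np.outer(latent, rng.normal(size=3)) + rng.normal(size=(n_subjects, 3))
performance = rng.normal(size=n_subjects)

# Rank participants on performance and cut the ranks into five equal groups.
quintile = performance.argsort().argsort() * 5 // n_subjects
ratios = np.array([
    bootstrap_ratios(X[quintile == q], Y[quintile == q]) for q in range(5)
])
```

Large absolute ratios in a row of `ratios` would suggest that brain variable contributes stably within that quintile; with only 60 people per group, the power concern raised in the answer applies directly.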
UMA VAIDYANATHAN: Okay. Great. Thank you very much.
ARISTOTLE VOINESKOS: Thank you.
UMA VAIDYANATHAN: Lisa, do you want to get going on your presentation next? You're muted still. Oh, there we go.
LISA McTEAGUE: Okay.
UMA VAIDYANATHAN: You can go ahead.
LISA McTEAGUE: Can you see that okay? All right. Today, as Uma mentioned, I'll be talking about defensive reactivity across the anxiety disorder spectrum, taking more of a conceptual descriptive approach as opposed to a statistical one. I just want to mention that I'm currently at the Medical University of South Carolina, but this data was collected with Peter Lang, Margaret Bradley, and a host of colleagues at the University of Florida. Just to start, as I'm sure you all know, the defensive system is activated in the context of threat, and, essentially, neural structures with outputs to structures that mediate reactions in a host of autonomic as well as somatic physiological systems prompt a wide array of responses. And in animal models, the extent or the strength of defensive activation is, in large part, a function of predator imminence or proximity. So that is to say that stages of pre-encounter with threat, post-encounter, and circumscribed or overt action each have a characteristic series of coordinated responses. Peter Lang and colleagues, as well as other individuals, have suggested that our human laboratory-based paradigms are, in fact, akin to the post-encounter stage, so once threat has been detected, and, given that, we should be able to assess this coordinated defensive response across channels. And, in fact, we have shown, as well as others, across a range of different paradigms, that there is, in fact, a coordinated defense cascade across multiple measures in the case of healthy, adaptive, defensive mobilizations. One paradigm which we found particularly productive for prompting this multisystem defense cascade is narrative imagery, in which participants listen to or read a narrative script and they've been instructed to actively imagine themselves involved, in a subsequent period, as a protagonist, as opposed to an observer. And, in fact, narrative imagery reliably modulates subjective or self-reported arousal and aversion.
It increases fear potentiation consistent with perceived threat, increases heart rate as well as skin conductance, so autonomic measures, as well as corrugator EMG or facial frowning. So across multiple measures, what we see is a coordinated defensive response in the context of adaptive defensive mobilization or adaptive emotional processing. But what about disordered emotional processing? We've actually used narrative imagery extensively in anxiety patients, and I'll show some data of over 500 participants, and what you'll see is individuals that represent, basically, each of the principal anxiety spectrum disorders as well as a sample of demographically matched community controls. But before we jump into the physiological data, what I wanted to just show is what actually emerges from their symptom scores if we simply order these individuals, or these principal groups, based on the severity of their scores. What we see is a continuum of increasing negative affectivity with decreasing focal fearfulness, and by that I mean that on the left side here we have controls, followed by principal specific phobia, social phobia circumscribed to performance situations, so very focal fear disorders, followed by panic without agoraphobia, generalized social phobia, panic with agoraphobia, obsessive-compulsive disorder, generalized anxiety disorder, and PTSD, at the extreme. Now, it's very important to keep in mind that this continuum of increasing negative affectivity with decreasing focal fearfulness of the principal complaint is not limited to depression, to BDI or depression more broadly, but is evident also in nonspecific anxiety, functional interference, as well as comorbidity. So to see how physiological patterns line up with this we'll actually start with PTSD, so a disorder at the extreme of this negative affectivity continuum, and what you see here, this is startle reactivity during imagery for controls and principal PTSD patients.
And this is not going to be surprising to anyone that what we see in PTSD, as a group, is exaggerated startle reactivity. Here, I've depicted just the startle data, but we also saw this pattern of exaggerated defensive engagement in heart rate, skin conductance, corrugator, as well as subjective responding. And so, essentially, we are, in fact, seeing coordinated exaggerated defensive responding across measures in PTSD. But what happens when we start looking at meaningful subtypes? Here we have a single-trauma PTSD group, and this is multiple trauma. This is startle reactivity to startle probes, and we saw a very divergent response with really pronounced reactivity in the single-trauma group, but actually incredibly obtunded reactivity in the multiple trauma, and this was evident in skin conductance and then, more modestly, also in heart rates. So a natural question is, well, did we just fail to activate them? Well, when we looked at their subjective arousal ratings we actually see similarly extreme arousal in both subtypes, so a discordance between their physiological reactivity and their self-reports. But to complicate matters more, when we actually look at facial frowning, we also see exaggerated reactivity in both subtypes. So, just to summarize, what we're finding here is that in a single-trauma group, which shows more limited comorbidity, a more focal fearful response, we actually see coordinated exaggerated defensive engagement across a whole host of different channels, whereas in the more comorbid, broadly negative affect, more broadly stressed, multiply traumatized individuals, we actually see prominent collapse of some defensive channels, but simultaneous with exaggerated reactivity, and so prominent discordance. Now I've talked about PTSD so far, and we just selected that as an example, but we also see this pattern within each of the anxiety disorders, as well as between them. 
So here I've just brought up the fear potentiation scores, for startle reactivity during personal threat imagery, ordered simply by magnitude. And what you see, reminiscent of what we see in their self-reported negative affectivity, is that focally fearful disorders tend to be more reactive, and as the distress generalizes and the overall weight of affective pathology increases, we actually see attenuation of their startle reactivity. But similar to what we saw when we looked at each of the individual measures in PTSD, what also changes along this continuum here is that we see greater discordance between different defensive measures as we move from the more focally fearful out to the more broadly distressed. So this pattern led us to wonder, were we going to see a different pattern if we shed the diagnostic labels and instead started to identify groups as a function of the system response, system concordance or discordance? What we simply did here was we created a composite variable of startle and heart rate reactivity, and then what we did was sort all the individuals or rank them on this, and created five equal bins or quintiles of responders. So these are hyperreacters out to the hyporeacters. And, not surprisingly, because that's the way that variable was defined, startle reflex reactivity shows a consistent decrement across this continuum, as does heart rate. But what was interesting for us, and what deviated from our analyses driven by principal disorder, was that we also started to see greater concordance in the propensity for physiological hyperreactivity or hyporeactivity in other physiological measures. So here, we have skin conductance, corrugator EMG, and orbicularis EMG. So how do these physiologically defined quintiles relate to some of our important clinical variables? And what we see here, this is the startle/heart rate responder composite again, and then on the right is subjective aversive arousal.
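As a rough illustration of the quintile approach described above, here is a minimal Python sketch. The talk does not say how the composite was computed, so averaging the z-scores of the two measures is an assumption, and the participant values and the function names `zscores` and `composite_quintiles` are hypothetical.

```python
from statistics import mean, pstdev


def zscores(values):
    """Standardize a list of scores to mean 0 and standard deviation 1."""
    m = mean(values)
    s = pstdev(values)
    return [(v - m) / s for v in values]


def composite_quintiles(startle, heart_rate):
    """Return (composite, labels); label 1 = most reactive fifth, 5 = least reactive.

    Assumption: the composite is the mean of the two z-scored measures.
    """
    composite = [(a + b) / 2 for a, b in zip(zscores(startle), zscores(heart_rate))]
    # Rank participants from most to least reactive on the composite.
    order = sorted(range(len(composite)), key=lambda i: composite[i], reverse=True)
    n = len(order)
    labels = [0] * n
    for rank, idx in enumerate(order):
        labels[idx] = rank * 5 // n + 1  # five (near-)equal bins
    return composite, labels


# Hypothetical reactivity scores for ten participants.
startle = [12.0, 9.5, 8.0, 7.2, 6.1, 5.0, 4.4, 3.0, 2.2, 1.0]
heart_rate = [6.0, 5.5, 5.1, 4.0, 3.6, 3.0, 2.1, 1.9, 1.0, 0.2]

composite, labels = composite_quintiles(startle, heart_rate)
print(labels)
```

Other measures, such as skin conductance or symptom scores, could then be summarized within each label to see whether they track the composite.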
So differences in subjective aversive arousal do not actually predict the differences that we're seeing in their physiological reactivity, but what does relate is an inverse pattern of increasing broad negative affectivity, as physiological reactivity decreases, as well as increases in functional impairment. Now one thing that's important to note is that this is not a simple one-to-one correspondence with principal disorder. So at every one of these quintiles, every single principal anxiety disorder was represented. There was, of course, the tendency for the hyperreacters to have a greater proportion of focal fear disorders, and at the hyporeactive end, the opposite pattern, that they tended to be more anxious misery. So where do we move from here? Well, we spent an extensive amount of time assessing multiple measures in multiple disorders, and found that it's very rich but it's also very complicated. This quintile analysis or these composite hyperreacters/hyporeacters was really just a first step, an exploratory step, on our docking approach, and, in fact, is really quite crude. And so as we've moved on, what we've done is try out different clustering techniques to try and capture the full dimensional variation among the multiple physiological and self-reported measures as well as symptom measures. And that has not been without hurdles, I have to say, and I'm not going to elaborate on that because Meredith can do a much better job. But it has, from our perspective, suggested that this could be really meaningful moving forward, if you take, for example, single-trauma PTSD, relative to multiple, for which the prognosis and treatment is much better across a number of interventions. This might, in part, be attributable to the fact that coordinated, albeit exaggerated, defensive reactivity across multiple systems might be easier to remediate than trying to reengage and resynchronize disrupted defense cascades.
I just want to say thank you for your attention and thank you very much to the team of individuals who helped with this data.
UMA VAIDYANATHAN: Thank you, Lisa. That was a fascinating talk, truly. It was very interesting to see that. We had a question coming in just now, and they asked, it seems like the decrease in multiple trauma PTSD might be associated with dissociation. Did you find that in your data, or did you look at that aspect?
LISA McTEAGUE: Actually, it's a really important question and we get it a lot. This is one of those reasons, in my opinion, that it's really essential to have multiple measures. So while we do see attenuation in startle, skin conductance, and heart rate, we actually see an exaggeration, even in multiple trauma, in corrugator EMG or subovert facial frowning, and we also see it in their subjective ratings of aversiveness, the experienced aversiveness and arousal. So from that perspective, we have been interpreting it not as a function of dissociation.
UMA VAIDYANATHAN: The same person also remarked that it probably varies by developmental phase of exposure to trauma.
LISA McTEAGUE: I'm sorry. I didn't hear that.
UMA VAIDYANATHAN: They remarked that it probably varies by developmental phase of exposure to trauma, so I don’t know if you have any data that speaks to that.
LISA McTEAGUE: That's absolutely true. The multiply traumatized individuals have a totally different developmental trajectory because their trauma exposure, on average, started, I believe, when they were 11 and it actually proceeded right through the time of the onset of their PTSD. And so they've experienced PTSD for three times as long but they had been experiencing cumulative trauma exposure over the lifespan. So that's a very important point.
UMA VAIDYANATHAN: Okay. Another person asked, can you give more information on the plans on how you will recruit in the future and analyze the data you have to examine these ideas in an RDoC manner?
LISA McTEAGUE: Actually, I'd have to say that I have to echo the sentiments I've heard from many people and we've talked about it also, just with the presenters here. We have also been struggling to find methods that actually work, and that's one of the reasons I've been really excited, actually, about Meredith's methods, and we've actually already talked about utilizing hers. And so we want to be doing a lot of clustering methods, and we've been looking— I have to say, it's really necessitated that we look outside of our typical colleagues into like engineering departments, people who have done large-scale network modeling as well. And so I think it's an open question and we are exploring a whole range of them. So I do apologize. There's no straightforward answer.
UMA VAIDYANATHAN: The other thing that was very interesting to note about your data was that the self-report didn't always quite jibe with physiological reactivity. What implications do you think that has, in general, for things like diagnosis, which is, you know, based mostly on self-report, for better or worse?
LISA McTEAGUE: I certainly wouldn't necessarily say that this calls into question the validity of the diagnosis, but I do think it means that we need to keep in mind what intervention means. When we ask them for, for example, subjective units of distress during exposure therapy, if it's completely inconsistent with what we're seeing in their physiology, are we actually tapping into the mechanisms that we believe are essential for symptom remediation? So I think that, you know, of course, ideally, as I am a psychophysiologist, I'd like to see these methods being used more also in the context of intervention.
UMA VAIDYANATHAN: Okay. Good. And one last question before we move on. Did you explore any racial or ethnic differences in your data?
LISA McTEAGUE: The sample is predominantly Caucasian, consistent with the demographics in Gainesville, Florida. Interestingly, SES does, in fact, covary along with this defensive diminution and discordance, and so the weight of affective psychopathology, but also cumulative life stress and overall deprivation in the environment seems also to track with this.
UMA VAIDYANATHAN: Great. Thank you again for the great talk. It was really an awesome job. Meredith, I think you're up now.
MEREDITH WALLACE: Lisa, thank you so much for that really fabulous introduction into my talk. I am a biostatistician by training, and I currently am an assistant professor in the psychiatry department at the University of Pittsburgh, and my research focuses primarily on developing and applying statistical methods for clustering, and I've been focusing a lot on methods for clustering within the RDoC framework. Just so everyone is on the same page, I just wanted to take a brief minute to talk about what clustering is. Clustering is a method that can be used to reveal subgroups of individuals with similar characteristics, and you can think of these subgroups as being separated by natural boundaries. For example, if everyone watching this webinar could type into a database the number of minutes it took them to fall asleep last night, and how many minutes they were awake after they first fell asleep, I could take that database and then use clustering methods to try to determine how many subgroups or clusters there might be within this sample. So, for example, I might find two clusters and maybe one of these clusters or subgroups would generally fall asleep very quickly, but then they're awake a lot in the middle of the night, and maybe the other cluster takes a long time to fall asleep but then once they're asleep they don't wake up at all. So that would be an example of using clustering to find subgroups of individuals with similar sleep characteristics. Clustering methods are really relevant for RDoC because within a sample you can use them to determine whether there really is a continuum of signs and symptoms, or whether there are actually more discrete subgroups within that sample. And if you're only looking at something like self-report, hopefully you might find subgroups that are similar to our existing DSM diagnoses. 
But the nice thing about RDoC is that it does encourage researchers to look at multiple different levels of information, so using clustering you could try to find new subgroups that are based on all these different types of information. This could be really informative, especially if you could relate these subgroups to relevant outcomes. This might help you to generate hypotheses about underlying disease mechanisms and maybe treatments that you could develop and then target to individuals matching the characteristics of each subgroup. There are a lot of different clustering methods out there. I have been focusing primarily on mixture modeling. Mixture modeling is a clustering method that's based on a likelihood, and because of that it does come with underlying distributional assumptions, and the most common assumption that people make is that their clusters are normally distributed. The nice thing about it being based on this likelihood is that it's easier to compare models, and, in my opinion, it's easier to select the number of clusters in your sample, which you wouldn't know ahead of time. In order to demonstrate and develop some of these clustering methods, as you might guess, based on my first example, I've been working with a lot of sleep data, and, in particular, I've been using the AgeWise data set. And the sample I'll be talking about today from AgeWise is 216 older adults, with and without insomnia, and on these older adults we identified 70 characteristics that may be relevant, and these characteristics were captured through self-reported sleep diary; actigraphy, which is a behavioral measure of sleep; and also polysomnography, which is a physiological measure of sleep. With these data, we wanted to use clustering to reveal potentially interesting subgroups that might be based on all of these different data types, and then look and see how those subgroups might relate to our a priori self-reported insomnia diagnoses.
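To make the idea of likelihood-based clustering concrete, here is a minimal Python sketch of a one-variable normal mixture model fitted by expectation-maximization, with the number of clusters compared by BIC. It is not the speaker's code: the simulated "minutes" data, the quantile-based starting values, and the function names `fit_gmm_1d` and `bic` are all assumptions made for illustration.

```python
import math
import random


def normal_pdf(x, mu, sigma):
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2 * math.pi))


def mixture_loglik(xs, weights, means, sds):
    """Log-likelihood of the data under a normal mixture."""
    total = 0.0
    for x in xs:
        dens = sum(w * normal_pdf(x, m, s) for w, m, s in zip(weights, means, sds))
        total += math.log(max(dens, 1e-300))
    return total


def fit_gmm_1d(data, k, n_iter=200):
    """Fit a k-cluster normal mixture by EM; return (loglik, weights, means, sds)."""
    xs = sorted(data)
    n = len(xs)
    grand_mean = sum(xs) / n
    grand_sd = math.sqrt(sum((x - grand_mean) ** 2 for x in xs) / n)
    weights = [1.0 / k] * k
    means = [xs[int((j + 0.5) * n / k)] for j in range(k)]  # start at quantiles
    sds = [grand_sd] * k
    for _ in range(n_iter):
        # E-step: probability that each observation belongs to each cluster.
        resp = []
        for x in xs:
            dens = [w * normal_pdf(x, m, s) for w, m, s in zip(weights, means, sds)]
            total = max(sum(dens), 1e-300)
            resp.append([d / total for d in dens])
        # M-step: update each cluster's weight, mean, and standard deviation.
        for j in range(k):
            nj = max(sum(r[j] for r in resp), 1e-12)
            weights[j] = nj / n
            means[j] = sum(r[j] * x for r, x in zip(resp, xs)) / nj
            var = sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, xs)) / nj
            sds[j] = max(math.sqrt(var), 1e-3)  # floor avoids degenerate clusters
    return mixture_loglik(xs, weights, means, sds), weights, means, sds


def bic(loglik, k, n):
    """BIC with k means, k standard deviations, and k - 1 free weights."""
    return -2 * loglik + (3 * k - 1) * math.log(n)


# Simulated data: two hypothetical subgroups of 150 people each.
random.seed(0)
data = [random.gauss(10, 2) for _ in range(150)] + [random.gauss(40, 5) for _ in range(150)]

fits = {k: fit_gmm_1d(data, k) for k in (1, 2, 3)}
scores = {k: bic(fits[k][0], k, len(data)) for k in fits}
for k in sorted(scores):
    print(k, round(scores[k], 1))  # lower BIC indicates the preferred model
```

In practice a multivariate model from an established statistics package would be used; the sketch only shows why a likelihood makes models with different numbers of clusters directly comparable.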
As Lisa alluded to, clustering can be a really frustrating methodology to use. There are a lot of challenges that come along with it, and when you have RDoC data, these challenges are only enhanced. One of the things that I think make RDoC data especially challenging for clustering is that the data are often very highly skewed. So as an investigator, it is really important to think about whether you believe that these variables would still be skewed, even in an extremely homogenous subsample, or, alternatively, whether you think that skewness that you observe is actually a result of multiple normally distributed subsamples. So to give you a little illustration, here we have a scatter plot. It's minutes to fall asleep versus minutes awake after sleep onset, and you can see that these two variables are highly skewed. If we want to assume that this skewness is caused, or results from a series of normally distributed clusters, we might see something like this after we fit our clustering model. So you can see that we have three clusters here, the red, the blue, and the green, and the skewness in this full sample is explained by having these three normally distributed clusters with successively increasingly large amounts of variability. However, when I see a result like this, I think that that's just not a great representation of what's really going on and what these underlying subgroups might really be in this sample, because, at least in my experience, no matter how homogenous the subgroup, these two variables are always highly skewed, which might lead me to believe that, in fact, the underlying data generation process itself is skewed. 
In this case, we can use a mixture model that's based on a skewed distribution —here, this is based on the skew normal distribution—and if we allow for a skewed distribution in our clustering model, we may actually see that this sample is actually one continuous and skewed sample, and that there are no actual discrete subgroups within the sample. So in addition to having to deal with the skewed data that you get with RDoC, another issue is that, based on the nature of RDoC which asks us to capture data across multiple different units of analysis, there are just a lot of potential clustering variables that you could use, and it's really not clear ahead of time which subsets of those clustering variables are actually going to be useful for clustering. And certainly it's also possible that depending on the specific subset of clustering variables you use, you may reveal different, but equally statistically plausible subgroups. So because of this, I do think it's important to use something like a variable selection algorithm, or dimension reduction, but as I'll be talking about later, there are also a lot of challenges with that as well. And finally, you can use the fanciest statistical model out there, but one frustrating thing about clustering is that just because you get a solution doesn't at all mean that it will be clinically useful and meaningful, to the extent to which it's actually related to something that you care about, or that it teaches you something new. I've been working to develop some solutions for these challenges to clustering, and first and foremost I have been working to really promote the use of skewed mixture model distributions. These currently exist. You can use them. They're available especially in a statistical program, R. However, unfortunately, I think I may be, like, the only person who's using them right now in this type of research. 
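For readers unfamiliar with the skew normal distribution mentioned here, the following Python sketch writes out its standard density and checks two properties numerically. The parameter names (`location`, `scale`, `shape`) and the helper `integrate` are choices made for this illustration, not the speaker's notation or software.

```python
import math


def std_normal_pdf(z):
    return math.exp(-0.5 * z * z) / math.sqrt(2 * math.pi)


def std_normal_cdf(z):
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))


def skew_normal_pdf(x, location=0.0, scale=1.0, shape=0.0):
    """Skew normal density; shape = 0 gives the ordinary normal density,
    shape > 0 gives a longer right tail, shape < 0 a longer left tail."""
    z = (x - location) / scale
    return 2.0 / scale * std_normal_pdf(z) * std_normal_cdf(shape * z)


def integrate(f, lo, hi, steps=20000):
    """Trapezoid rule, adequate for these smooth densities."""
    h = (hi - lo) / steps
    total = 0.5 * (f(lo) + f(hi))
    for i in range(1, steps):
        total += f(lo + i * h)
    return total * h


# With a positive shape parameter the density still integrates to 1,
# but its mean is pulled to the right of the location parameter.
area = integrate(lambda x: skew_normal_pdf(x, shape=5.0), -10, 10)
mean = integrate(lambda x: x * skew_normal_pdf(x, shape=5.0), -10, 10)
print(round(area, 4), round(mean, 4))
```

A skewed mixture model replaces the normal density inside each cluster with a density like this one, so a single skewed cluster can account for a long tail that a normal mixture would need several clusters to cover.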
They're used in other areas, but I do hope that people can begin to consider these skewed mixture model distributions when they really do think that underlying clusters might follow skewed distributions. The second thing I've been working on is to develop new variable selection algorithms, but specifically I want variable selection algorithms that are, themselves, based on underlying skewed distributions. And within that I developed two algorithms. The first one is used to reveal a set of variables for clustering, but, in particular, I want variables that are useful for skewed clustering, and this particular algorithm completely ignores the data type. So for my AgeWise example, you could use this algorithm on all 70 variables to pick a subset that's useful for skewed clustering, or you could also apply this algorithm within the self-report variables, within the actigraphy variables, within the polysomnography variables, to see what different clustering solutions arise, and then you could compare across. That could provide you with some really interesting information about the heterogeneity in your sample, and how that heterogeneity changes, depending on the instrument you're using to collect data. The second algorithm that I've developed, I like to think is a little more clinically intuitive, and this algorithm is based on the idea that you could have multiple statistically plausible sets of clustering variables within your array of all variables you're considering, and furthermore, I want to have a set of variables that incorporates at least one of each of the data types that I'm interested in. So, in my case, with the AgeWise data, it would be at least one self-report, one actigraphy, and one polysomnography variable. This slide shows that when I applied the first algorithm that completely ignores data type, it identified three polysomnography variables.
And I think this is really important to highlight because, anecdotally, what I have found is that individuals tend to cluster better within a type of data, rather than across types of data. So this algorithm did select just three polysomnography variables, even though it could have selected other data types as well. Using these three variables I then fit a skewed clustering model, and we see that we identified four clusters, essentially based on the amount of delta sleep that they were getting, and these clusters were completely unrelated to the self-reported insomnia diagnosis. This slide shows the variables that were selected when I used the second algorithm that I discussed, the one that actually considers data type and forces one of each data type to be used in the clustering model. Here there were five variables that were selected, and what I want to highlight here is that three of those five variables were sleep latency or minutes to fall asleep, measured by self-report, actigraphy, and polysomnography. So the fact that all three of these sleep latency variables together are being used to reveal the heterogeneity in the sample, and to identify these subgroups, suggests to me that there may be something there with this feature of sleep, and that this may be something to really focus on in future research, when we're trying to further investigate disease mechanisms and novel treatments that we could develop. In terms of future work, I am hoping to make the code that I wrote for these algorithms available, but we're not quite there yet. What I've done so far is really demonstrate and develop these methods, but the next step is to really apply them to a larger data set, maybe one that's even more directly relevant to RDoC, compare subgroups identified on various relevant outcomes, and, most importantly, to validate the findings that I get.
Clustering is extremely exploratory so it is really important to be able to have one development sample and then hopefully a validation sample, so you can see that those clusters are replicating.
UMA VAIDYANATHAN: Thank you, Meredith. That was a great presentation. You walked us through some really sort of complicated concepts in a really nice manner, honestly. So, yes, we have a number of questions for you that are lining up here. Some of them relate to features of data, such as, would using log normal distributions or transforming your data in some way be of some sort of help? And also, relatedly, how do you know when to use a normal distribution versus a skewed distribution?
MEREDITH WALLACE: Those are excellent questions. First, the question of whether or not to transform your data. Quite frankly, I think there are probably a lot of different opinions and views on this. I can give you my personal opinion. When you're doing clustering, the idea is that you want to understand the heterogeneity in your sample. When you do something like a log transformation, you are changing the heterogeneity in your sample. A log transformation, by its nature, does different things to low values, observations less than 1, than it does to observations greater than 1. So those types of transformations do change the heterogeneity. If there's a reason that you really believe that the transformed data are more meaningful than the original data, and you think that clusters revealed based on those transformed data would be meaningful, by all means do this, but you do have to be aware that those transformations do change the heterogeneity. And then, with the question of how do you know if you should use a normal distribution or a skewed distribution, I mean, I really simplified it and I just talked about skewed distributions, but there are tons of different asymmetric and skewed distributions out there. The nice thing about mixture modeling is that you can compare model fits. So with a given set of variables, you could fit a mixture model based on a normal distribution. You could fit it based on a T distribution. You could fit it based on a skewed normal, a skewed T, a Gaussian, I mean, all sorts of different things, and you could go and you could compare the BICs, the Bayesian information criterion values, of those models to see which one fits best. And you could also, honestly, look at how many clusters they each indicate. Are the solutions actually meaningful? What do they tell us? Do any of them tell us new things?
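The point about log transformations changing heterogeneity can be seen with a few numbers. This is a small Python sketch using made-up sleep-latency values in minutes; the values and the helper name `gaps` are hypothetical.

```python
import math

# Hypothetical observations in minutes, spanning values below and above 1.
minutes = [0.1, 0.2, 0.5, 1.0, 10.0, 60.0, 120.0]
logged = [math.log(m) for m in minutes]


def gaps(values):
    """Differences between neighbouring values."""
    return [b - a for a, b in zip(values, values[1:])]


raw_gaps = gaps(minutes)
log_gaps = gaps(logged)

for m, l in zip(minutes, logged):
    print(f"{m:7.1f} -> {l:7.3f}")

# On the raw scale, 0.1 vs 0.2 minutes is a tiny gap and 60 vs 120 minutes a
# huge one. On the log scale both are the same distance apart (a doubling),
# and every value below 1 becomes negative.
print(raw_gaps[0], raw_gaps[-1])
print(log_gaps[0], log_gaps[-1])
```

Because clustering works on distances between people, a transformation that stretches small values apart and squeezes large values together can produce different subgroups from the same sample.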
If there's actually one important take-home message I hope people can get from this, it is that clustering is really very exploratory, and I think for that reason it would be better, in my opinion, to just accept that exploratory nature of it, be honest and be up front about it, that the purpose of clustering is to generate hypotheses. You're not trying to solve any problems here. You're just trying to generate hypotheses and to use it in that exploratory nature.
UMA VAIDYANATHAN: One last question because we're at 12:56 and we're supposed to end at 1, so just comment on this briefly. People are asking questions about, what about using something like a latent class analysis, or a factor mixture model? What are your thoughts on those?
MEREDITH WALLACE: Yeah, those are good options too. Thinking about a type of latent class analysis, I think of that being more about grouping variables together versus grouping people together, so that may be something to think about there in terms of what do you want to do.
UMA VAIDYANATHAN: All right. I hate to cut you off here but we're at 12:57 so I want to make sure we end at the right time. So if you want to go ahead and stop sharing your slides at this point, we'll have all our presenters back on screen. Thank you guys for some wonderful presentations today. You know, what kind of struck me as a common theme among your presentations was the variety of samples you managed to recruit. You know, one of the questions we get a lot about RDoC is, "Well, that's great, you want to look at heterogeneity using all these disorders, cut across disorders and all that stuff, but practically speaking, recruiting people for studies like that is difficult. I mean, do you just take anybody who comes?" You know, that kind of thing, like a take-all-comers approach. I think, Aristotle and Lisa, both of you showed that that's not necessarily the case. It depends on what you're testing, basically. Your hypotheses and sort of the spectrum of disorders. Lisa, obviously you're concerned more with fear and anxiety so you didn't include a patient with schizophrenia, whereas for Aristotle it was the converse. I'm kind of answering my own question here but it's good to see that it's not that hard, in a way. Do you guys have any thoughts to add to that? All of you are muted right now, just FYI.
ARISTOTLE VOINESKOS: I could add one thing. One of the things we were doing at each site is every 6 months reviewing our recruitment and reviewing the constructs that we're interested in measuring, to make sure that we have a good range, and then if we need to we can alter our recruitment strategies as necessary. The other thing which I didn't mention, actually, which maybe is unrelated, is even within the schizophrenia spectrum, which sounds a little more limited, I mean, there could be pretty significant differences between first-episode and chronic patients, so our study is focusing on people more at the onset of the illness, and I think that's what the NIMH requested us to do so that's what we're aiming to do as well.
MEREDITH WALLACE: From a clustering standpoint, I do think that the clusters you identify are really only as helpful as your sample is generalizable. So if you put together a sample but it's totally unrepresentative of anything that you might observe out in the population, you can certainly still find clusters there but you have to be aware of what is the sample of which you're explaining the heterogeneity, whereas if you did do that all-comers approach, you might be limited in some of the things that you're doing and in some of the methods that Lisa and Aristotle were talking about. But with the all-comers approach, if you did clustering, that might be sort of a more natural indicator of what the real underlying phenotypes were in that area.
UMA VAIDYANATHAN: That's a very good point. So generally for any kind of statistics use, not just for clustering, any kind of method, you just want to have as much sort of variation as possible, because all our statistics rely on variance anyway. Just a couple more comments before we end for the day. Folks who are logged in, please do check our RDoC website regularly, especially the section called Funding Option, that you probably will be interested in. There is an R03 that's currently active for secondary data analysis where you can apply and do studies along the lines of all our presenters today. So do check our website, and also do check out our newly revamped RDoC matrix. It is really cool. It's way better than the old one. So just go to Google and type in "RDoC matrix." It's the first hit you'll see on Google. So click on that, feel free to click around. It's got a more Wikipedia-like structure now. Anyway, that's it for today so thank you so much for your time. Thank you to all our presenters, and we hope you found the webinar very useful. And feel free to contact us with any questions or comments you may have at rdocadmin@mail.NIH.gov. Again, that's rdocadmin@mail.NIH.gov. Thanks again, everybody. Have a great day. Bye, all.