Workshop on Advanced Statistical Methods and Dynamic Data Visualizations for Mental Health Studies: Day Two
DR. FERRANTE: Good morning, everyone. So I work at the National Institute of Mental Health, and I am the program director for the Computational Neuroscience program and the Computational Psychiatry program. One is in the Division of Basic Science and Basic Neuroscience, and the other one is in the Division of Translational Neuroscience.
This workshop is in two parts. The first one happened two days ago and was run by my colleague, Abera Wouhib. The main goal of this workshop is to showcase dynamic interactive visualization for neurobehavioral data. We're going to have a packed agenda.
The main thing that we're going to go through is showing all of you how we can use these figures in papers and in grants that we actually submit, to have more dynamic interactions, both between the figures and the readers and between the figures themselves. We will also showcase dense and multidimensional trajectories for neurobehavioral data, figures that can show exploration of model parameters, or deep phenotyping of mental health cohorts. The idea would be to identify potential use cases and gaps for these tools in mental health, both in basic and translational and in services and intervention research.
I'm not sure if our institute director is online right now, so I will introduce him probably at the end of the talks.
It's the 10-year anniversary for RDoC, and one of the things that I've recently been asked to do was to reimagine RDoC through the use of this new data science technology. The main goal of this presentation, of this workshop, is basically to move our figures from what's on the top, which is a figure that requires a lot of labels and requires a lot of explanations, to the figure on the bottom.
The figure on the bottom is more dynamic and captures all phenomena in a much clearer way. In order to do this, a couple of things would be necessary. One is to paint the current framework for RDoC in a different way. At the moment, if you are familiar with RDoC, it's pretty much in the format of an Excel spreadsheet, where on the left side there are the domains of function, and on the top there are the units of analysis.
The units of analysis include things such as genes, molecules, circuits. The framework also encompasses concepts such as neurodevelopment and environment. In order for us to make this change, we require two concepts, basically the addition of time and the addition of function, function that would be able to connect things in the matrix.
Why this is necessary. So NIMH is spending a lot of money and resources and time on collecting more dense, multidimensional data that are continuous or longitudinal. It's kind of like an interesting concept that we collect this data, and still we are actually showing them in a static way.
Why is this so important? Is it only about the way the data look? Not really. It's also about the fact that we can actually have a true impact. This is, for instance, a success story from NHLBI. It would be great to have some example like this in a couple of years at NIMH, where we could actually build parametric models for risk predictions, and those models could lead to dynamic decisions that clinicians can use in their office.
In order to make this change, we need to understand something that is obvious but is almost never represented in our articles: the path that our measures actually paint over time. For instance, if we give a task today, we might get a score; the day after, we might get another score. If these scores change over time, the matrix that I showed you before, which was previously an Excel spreadsheet, can change to something like this.
So why is this a better version, in my view, of the current matrix? Because each one of these arches is actually quantitative. So it can represent whatever quantity we want to represent of RDoC. It can be task scores, it can be bibliometric information, dollars spent, number of grants funded.
It could amount to everyone, because that's all the information that we have about that knowledge. And we can expand and collapse the arches for us to actually reveal all the different concepts in the matrix and explore more in-depth. By design, there are empty spaces here that show all the concepts that were not previously represented. They are not currently included in the matrix, almost like saying RDoC-like constructs, they are not being explored or they are not still yet in the matrix, can actually be inserted in these empty spaces.
As I said, this is quantitative, but it's not relational. So in order to make it relational, we actually need to build the connections between these constructs. And you can imagine that each one of these lines represents a theory. There is reinforcement learning, for instance, or drift-diffusion models, that can connect the different constructs in the matrix and allow us to make predictions and classifications. So what this framework would change is that it would basically allow us to add a quantitative relational view of the matrix.
I should say that all of these examples use made-up data, so none of this is real. They are just examples of how we could use these tools if we had them available for our publications.
Another interesting type of visualization could be this one, which is like a representation of real-world evidence for RDoC. Imagine that we have different types of interventions, and over time, our U.S. population is switching from one intervention to the next, or staying with the same intervention. And this can change as a function of new interventions getting discovered, policy changes, and trends among our mental health practitioners. So it would be more interesting if something like this could actually be used as a primary figure in an article, for instance, or when you submit grants.
As I showed at the beginning, these are population-level data, but the same type of representation can be used at the individual level where in this case, we're tracking how individual dots or individual people switch from one activity to the next during the day, and you can imagine that this would enable next-generation, for instance, clinical trials, in which you actually give an intervention that you can see at the large population level, how the activity of those subjects during the day change, with things like sleep, leisure, activity, personal care, and things that are like more or less related to mental health concepts.
Another concept that I found particularly useful for this dynamic data visualization is the ability to connect figures within the paper. Imagine that you have something that is at the individual level, here on top, and something that is at the population level. And you want to see, while you read the paper, where the individual-level data maps into the population level. This kind of tool will allow you to actually connect and show for instance for each one of the individual level, what is the strength and the significance of that correlation, and it could allow you also to go from the individual to the population level, and see where at the individual level that group is represented.
Obviously, this would be a big step for us in the way that it would enable our readers to understand our data. That said, most of our data in mental health is still focused on the top two levels of analysis, which are behavior and brain signals, most of the time. Obviously, there is a lot more that could be studied and modeled in relation to mental health. The thing that needs to be understood is that although all these descriptions might be very mechanistic, they obviously end up affecting things that are at a much higher level -- so, the population-level changes that I was describing before when I was describing transitions between different interventions.
So how do you do this? The way that you would do this is like you collect the data, and you make mathematical functions grow from those data collections. Once you have those mathematical functions for each one of the domain or function of interest -- so, imagine in this case, this is my recording for an RDoC measurement, and it tends, like I record myself for one month, and I happen to be bimodal, whether the way that my function is distributed for a specific mental health function, let's say, executive function.
The reasons why something like this would be interesting is because the moment that you start collecting large data, you can actually in the paper itself start exploring, like if I score in this range, what are the people that score like me scoring in other range or domain of function? So this is positive valence, this is negative valence, this is sensorimotor. Knowing the score in one range allows you to predict how people like you would score in other range.
So this is while I'm reading the paper. So that would be like from my perspective a way to actually for the readers, for us to discover a lot more while we're reading the paper.
Another good representation is the idea that most of our measures actually look like this: they go up and down in a random, pseudorandom way, over time. But it would be nice to actually identify dimensions that can explain that chaotic rhythm with something that can be modeled. The reason why something like this would be interesting in a paper is basically this: imagine that in a paper you have a text box like this. This is the Lorenz attractor, a three-parameter model. You can imagine how that would be interesting, but I would like to see how this model would change while I change one of these three parameters. You can actually, in the paper itself, input another parameter value and see how the Lorenz attractor would change, and how the patient population would ideally change their dynamics in this multidimensional space.
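To make the idea of changing one parameter and watching the attractor change concrete, here is a minimal sketch in Python. It is not the interactive figure from the talk. The parameter names sigma, rho and beta and the default values 10, 28 and 8/3 are the conventional textbook choices for the Lorenz system, and the function names, step size and starting point are assumptions made for illustration.

```python
def lorenz_step(state, sigma, rho, beta, dt):
    """Advance the Lorenz system by one fourth-order Runge-Kutta step."""
    def deriv(s):
        x, y, z = s
        return (sigma * (y - x), x * (rho - z) - y, x * y - beta * z)

    def shifted(s, k, h):
        # State moved a distance h along slope k.
        return tuple(si + h * ki for si, ki in zip(s, k))

    k1 = deriv(state)
    k2 = deriv(shifted(state, k1, dt / 2))
    k3 = deriv(shifted(state, k2, dt / 2))
    k4 = deriv(shifted(state, k3, dt))
    return tuple(
        s + dt / 6 * (a + 2 * b + 2 * c + d)
        for s, a, b, c, d in zip(state, k1, k2, k3, k4)
    )


def lorenz_trajectory(sigma=10.0, rho=28.0, beta=8.0 / 3.0,
                      start=(1.0, 1.0, 1.0), dt=0.01, steps=5000):
    """Return a list of (x, y, z) points along one trajectory."""
    trajectory = [start]
    for _ in range(steps):
        trajectory.append(lorenz_step(trajectory[-1], sigma, rho, beta, dt))
    return trajectory


# Same model, one parameter changed (hypothetical values for illustration).
chaotic = lorenz_trajectory(rho=28.0)  # classic chaotic regime
settled = lorenz_trajectory(rho=10.0)  # spirals into a fixed point
```

With rho lowered to 10 the trajectory settles onto a fixed point instead of wandering over the two lobes, which is the kind of qualitative change a reader could explore by typing a new value into the figure.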
It would be interesting to give patients more complex type of stimuli, and we will have several examples of this, such as movies and real-life type of stimuli that might be labeled with different types of, in this case, colors represents different domain of RDoC function. You can imagine that there is -- you give Forrest Gump, there are scenes that are social, scenes that are positive valence, scenes that are negative valence.
Once you run different people against the matched population, you might see that there are significant differences in the parameters of this model, and these differences can be represented with graph theory, and the graph would actually suggest where the next-generation circuit intervention could act.
Of course, we don't need the brain. You can actually do this completely via behavior. And you could actually see for instance in this case, we are recording negative valence and digital activity in mental health patients in a city, like all of this is made up data, but just to give a sense of how this tool can be used.
You could see, for instance, the production and emergence and consumption of this type of activity, and you can actually see how this positive valence or negative valence propagates, both at the local level and the global level. It doesn't need to be, obviously, geography, but just to give a sense. Like it can be mapped in a space.
These type of graphs can be also particularly useful in case we want to portray multiscale phenomena, in which something is happening at a very early time phase, and very refined time and ends up affecting largescale developmental processes. So in this case, I'm showing a phenomenon that ends up happening around the developmental time in early lifetime and ends up affecting the developmental, the specific domain of function. And you can imagine like if you catch that switch early on, you would be able to intervene in the plastic window and affect the outcome.
Similar thing here. I have a few RDoC domains of function as dimensions, and I'm trying to connect here a biomarker, which in this case is represented by the size of the dot, that is improving after treatment. You can see that as time passes by, this dot's function is improving, and the patients move from a state of poor function here on the lower left to a state of higher function here on the top right.
Another thing: we're finally starting to collect longitudinal data in large sample sizes, but one thing that I think is often missing is that we're not able to disentangle developmental changes from generational effects. So you could imagine that we're studying, for instance, ABCD right now, and we're starting with a population that started maybe at 9 years old, and we follow them until they become 20 years old.
In that case, you will not be able to disentangle what is actually a developmental effect from what is a generational effect, and the reason for that is that we are not continuing to recruit participants as time moves on. If we are able to do something like this, we can actually show that the phenomenon that we see, for instance this switch between biotype A and biotype B, is moving over time, which suggests that this is actually a generational effect rather than a developmental effect.
I introduced a few concepts here regarding RDoC and the way that it could be represented, and it's really an exciting field for me, and I would be happy to take any questions if there are in the chat.
And if not, I would like to take a minute to introduce our institute director. Josh Gordon is the director of the National Institute of Mental Health. He oversees a large portion of grants in our space, mental health is the leading institute, and he's been a tremendous supporter of computational approaches. Josh?
DR. GORDON: Thanks, Michele, for what really was a thorough introduction to the day, and to everyone for joining us today. I was meant to give these remarks before Michele's talk, but I apologize; I was delayed by another meeting running over. But I really enjoyed watching, and literally watching, his explication of what we're doing here today.
Let me just briefly add that I think, number one, there's a lot of interest in this area. We have an international audience, as well as international speakers, so I'll say good morning, good afternoon, good evening, and maybe even good night for those of you who might be in the wee hours.
And second, I wanted to say that, as Michele suggested, the fact that we have opportunities now to really change how we present scientific data with the availability of web-based publishing methods, I think it's really exciting to imagine what we might be able to learn by visualizations such as those that Michele showed just now, and those that we'll be visualizing the rest of this meeting.
With that, I'll turn it back over to Michele and have him introduce the speakers and get us on our way. Thank you.
DR. FERRANTE: Thanks, Josh, for the kind introduction. It is my pleasure to introduce Janice Chen. Janice is an assistant professor in the Department of Psychological and Brain Sciences at Johns Hopkins University. She completed her bachelor's at MIT, her PhD at Stanford, and her postdoc at Princeton. She investigates neural systems, processes and representations underlying real-world episodic memory as it manifests in naturalistic settings. She uses dynamic stimuli, such as audiovisual movies and interactive stories, and daily life activities, such as spoken conversation and web browsing. She is one of the four members of the organizing committee that put together this day on dynamic data visualization.
Janice, please take it away.
DR. JANICE CHEN: Thank you, Michele. I am looking forward to a great day of talks and tutorials today. This is a really exciting workshop. As Michele said, I'm going to be talking about brain dynamics during movies and naturalistic stimuli.
I work on human memory and perception using behavioral and brain imaging methods, mainly functional magnetic resonance imaging, and today I'll be talking about how and why we use audiovisual movies and spontaneous speech to study the brain and how we depend on dynamic visualization for understanding and communicating our findings.
So why do we use audiovisual movies and spontaneous speech to study the brain? Well, for decades now, most of brain imaging research in humans has relied on very simplified stimuli in randomized trials. These are very minimal, stripped-down experimental paradigms that we try to make clean and elegant, and scientists have to do this in order to control confounds and isolate specific variables that they want to study.
But what we learn from simplified experiments is supposed to remain true in the real world, too, where things are complicated and messy. We always hope that our findings will scale up to apply to situations that people encounter in their normal lives, but we're often not sure that they will. And that's why we sometimes have to do naturalistic experiments, where we try to create a laboratory experience that's less controlled but much closer to normal life. And what we observe here could be surprising and it could take us to places that we wouldn't have gone otherwise.
I think that clean, elegant, scientific experiments are extremely important. There are some things that we can't learn without them. But I advocate for both of these approaches, and I hope that they can inform each other.
So how do we study realistic experience in people who are trying to hold very still? We want to scan people's brains while they're having naturalistic experiences and remembering them, and that means that they won't be able to move. So we emulate real-world input using a movie, and we elicit realistic behavior by asking people to speak freely about what they've seen. Cinematic movies have animated images and sounds, and they depict real environments and social situations and emotions. Because our stimuli and our behaviors are dynamic, and the brain response is dynamic, oftentimes our understanding of the data benefits from analyses and visualizations that capture those properties.
It turns out that when different people watch the same movie, their brain responses become synchronized, and this is a very useful tool for studying the brain. Here's what I mean by synchronized. Look at just one region of the brain. I'm circling early auditory cortex here. Auditory cortex produces a complex response time course while the movie is playing, driven by the volume and different sound textures in the movie audio. And that complex response time course is the same across different people, because everyone is hearing the same sound stream. If people are just lying in the scanner listening to nothing, you don't see synchrony across people in auditory cortex.
Each part of the brain has its own response time course, depending what features of the movie stimulus it cares about, and if you calculate this correlation across people for every part of the brain, you can make a map of synchrony across people. The correlation map is a static visualization of the synchronization across people's brains. It's a useful visualization.
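The synchrony calculation described here can be sketched in a few lines of Python. This is an illustrative leave-one-out version under stated assumptions, not the lab's actual analysis code: each subject's time course for one region is correlated with the average time course of the remaining subjects. The function names and the simulated data (a shared sine-wave "stimulus-driven" signal plus independent noise) are hypothetical.

```python
import math
import random
from statistics import mean


def pearson(a, b):
    """Pearson correlation between two equal-length sequences."""
    mean_a, mean_b = mean(a), mean(b)
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)


def leave_one_out_isc(timecourses):
    """Correlate each subject with the average of all other subjects."""
    scores = []
    for i, own in enumerate(timecourses):
        others = [tc for j, tc in enumerate(timecourses) if j != i]
        average_of_others = [mean(values) for values in zip(*others)]
        scores.append(pearson(own, average_of_others))
    return scores


# Simulated data for one region (made up, for illustration only).
rng = random.Random(0)
n_subjects, n_timepoints = 4, 200
shared = [math.sin(0.3 * t) for t in range(n_timepoints)]

# "Movie" condition: everyone gets the shared signal plus private noise.
movie = [[s + rng.gauss(0, 0.5) for s in shared] for _ in range(n_subjects)]
# "Silence" condition: private noise only, nothing shared across subjects.
silence = [[rng.gauss(0, 1) for _ in range(n_timepoints)]
           for _ in range(n_subjects)]

isc_movie = leave_one_out_isc(movie)      # high values expected
isc_silence = leave_one_out_isc(silence)  # values near zero expected
```

Repeating this for every region, or every voxel, and plotting the resulting values on the brain would give a static synchrony map of the kind described above.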
To see the brain activity that gives rise to those correlations, we can look at this animation of the raw fMRI signal. Each of these two images shows the average brain activity of one of two independent groups of people as they all watched the same movie, and in some regions, like here in visual cortex, you can clearly see how activity is mirrored across the two groups. You can see waves of activity moving and changing at the same time, in the same direction. And if I slow this down and I break it into different movie scenes, you can see how each frame is different from the next, but similar across the two groups of people.
In this kind of intersubject analysis, one person's brain activity is the model for another person's brain activity. You don't know exactly what in the stimulus is driving the brain response, but you know how reliable the response is. So something in the stimulus evoked brain activity in this region. Not necessarily a pulse of activity, but a consistent response. And this is how you examine what kinds of stimuli drive responses in any given brain region.
In these kinds of experiments we find that a nonvisual stimulus does not drive visual cortex reliably, and nonauditory stimulus doesn't drive auditory cortex reliably, and so on. So we probe the function of less-understood brain regions by asking what conditions cause them to respond reliably, and the reverse, and trying to build models for their activity from the natural stimuli themselves or from the produced behavior.
The behavior that people produce in studies like this is often their speech. After people watch a movie, for example, we could ask them to then talk about what they remember. And by collecting brain data as people spontaneously speak about their memories, we can study how and where brain patterns appear during recollection. We can also examine the trajectories of the words themselves, and Jeremy is going to talk about this in more detail in a couple of minutes.
Using movies and stories and spontaneous speech has led scientists to a really exciting suite of findings about the human brain. I'm listing a bunch of topic areas here. This is not a comprehensive list, and I sincerely apologize to the many people with excellent papers that are not on this slide. I focused more on recent papers. These are just some pointers for the audience, so that you can find out more about each topic if you're interested.
Lastly, the naturalistic experiment community is very enthusiastic about open data, and I want to highlight this beautiful recently released public fMRI dataset, spearheaded by Sam Nastase. These data come from hundreds of subjects listening to hours and hours of auditory stories, and it's all free to download. In the spirit of today's workshop, you can try this at home.
Thanks to my wonderful lab members and to all of you for listening, and I'm looking forward to a really fantastic day of talks and tutorials. I'm going to introduce the next speaker. I think we're a little bit ahead of schedule, so we could take a question or two, right?
DR. FERRANTE: I can answer mine. From Chris Pawnee(Ph.): these are great tools; to what extent do you see visualization tools as supporting, one, engagement and understanding with other scientists, and two, engagement and understanding with the broader public?
The idea is really to try to do both. We want to bring in other scientists that are adjacent to our field, not just, for instance, computer scientists and data scientists, but also people from other disciplines like physics and so forth. And we also want not only the broader public, but also our patient populations and our clinicians, to familiarize themselves with these tools, because I think they are going to be more and more pervasive as we move forward. It's a way to intuitively understand our data and our mechanisms, and it would be great to jump on this as soon as possible, because that would basically allow us to have more people thinking about our problems and issues, both in mental health and in behavioral neuroscience.
DR. CHEN: I can answer the next question here. How much variability is there in the duration of free-speech epochs, given fMRI time constraints and expenses, how can we ensure that free speech is long enough but not too long?
This is a great question. There's a lot of variability, because these are people doing what comes naturally to them. I think the answer here is piloting. So if you pilot and develop your instructions and your paradigm in such a way that people know what level of detail that you are expecting and how long you want them to talk for, people are perfectly capable of generating a pretty large amount of recall. I typically find that for a movie, you'll get the length of time that people spend describing the movie will be anywhere from a half to 100 percent of the time that they spent watching it.
I'm going to go on to introduce our next speaker. Our next speaker is one of the co-organizers of this workshop, Manish Saggar. He's an assistant professor in the department of psychiatry and behavioral sciences and the Institute of Design at Stanford University, where he works on a variety of cool topics, including the resting brain, creativity, and meditation. Please join me in welcoming Dr. Manish Saggar.
DR. SAGGAR: Thanks so much, Janice. Thanks so much for having me. Thanks to the NIMH staff, especially Michele, for this opportunity.
Today, I will focus on the data visualization methods that we have developed in our lab in the last couple of years, to essentially create representations for brain activity dynamics, during both task and resting state paradigms.
Here's a broad mission statement for our lab: we want to create methods that can distill high-dimensional data into simple yet vibrant and clinically relevant representations that can be interactively explored to discover new aspects of the data without necessarily averaging data across space, time, or people at the outset.
It sounds very wordy, but the main idea is to somehow create as accurate as possible, or rather as useful as possible, a representation of data that can be interacted with. The reason I say without averaging data at the outset is that a lot of times, just to increase the signal-to-noise ratio in our data, especially in the neuroimaging world and especially using traditional approaches, we end up collapsing data either across people, creating group stats, or across time, looking at the entire scan, or across brain regions. But by averaging at the outset, we could miss critical insights that we could otherwise study with a data-driven approach, where we let the data tell us where to look.
So that's our goal, how to create such representations, and to do that one of the tools we have borrowed from the subfield of algebraic topology, it's called topological data analysis. This TDA has been developed here at Stanford by me and by Gunnar Carlsson, one of my co-mentors for the K99, where the main idea is to somehow learn the shape of the data.
Just to give an example, if you have this 2D sample, two-dimensional sample of a bunch of points that to a human would instantly look like, you look like you're sampling from a circle or something like a loop. To a machine it's not as easy, and the TDA allows us to essentially create insights about what the shape might be. Here the insight is that it's a loop. The hope is once we get some aspect, some understanding of the actual shape of the data in its original dimensional space, then we can create much better insights about what might be happening under the hood.
We have applied this topological data analysis, TDA-based approach, to a bunch of neuroimaging applications. In the interest of time today, I'll probably only have enough time to talk about the evoked transitions during task-fMRI data. But we have also looked at intrinsic or resting state activity. We are also currently looking at social communication, when people interact with each other in a live hyper-scanning environment: how does the combined neural space, or the dynamical landscape, evolve? We're also looking at some decoding approaches, and Emily Finn, the speaker after me, might show you some results from there.
And obviously down the line the plan is to apply these methods to examine changes in landscapes during interventions or in psychiatric disorders. Eventually the hope is to anchor psychiatric methodology in spatial-temporal-dynamical-biological features.
Here's a rough outline, pipeline, for the entire method. Obviously I don't have time to take you through the entire pipeline, but the goal is you take the same case fMRI data, four-dimensional, in its entirety, so it's all thousands of voxels space and across time, and come with a representation that as faithfully as possible represents this high-dimensional data as a low-dimensional -- as a graph, where the nodes represent whole-brain activity patterns that are connected if the activity is similar for some definition of similarity.
One of the highlights of the TDA-based approach is that, using some of these filtering and partial clustering steps, it tries to reduce the information loss that usually occurs with dimensionality reduction. And the second good thing about the TDA-based approach is that we get this graph at the end of the day, rather than getting a bunch of point clouds, as you might get in, let's say, t-SNE or PSNI(?) or some other approach. A graph allows you to essentially apply all kinds of network science measures, and it's much more robust.
Just a quick example of how we applied this to fMRI data. This data was collected in Peter Bandettini's lab, where they essentially had this continuous multi-task paradigm, where one after another a participant went through task blocks: resting state for three minutes, then a three-back or two-back working memory task, then watching a video of a Nemo fish in the aquarium -- which looks more like resting state -- and then math, where they just do some math operations.
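As a rough illustration of how filtering plus partial clustering can turn a point cloud into a graph, here is a toy Mapper-style sketch in Python. It is an assumption that the pipeline on the slide resembles this construction; the function names, the choice of the x coordinate as the filter, and the values for the number of intervals, overlap and distance threshold are all hypothetical. The input is the earlier example of points sampled from a circle.

```python
import math
from itertools import combinations


def single_linkage(indices, points, eps):
    """Cluster indices by chaining together points closer than eps."""
    clusters = []
    unvisited = set(indices)
    while unvisited:
        seed = unvisited.pop()
        cluster, frontier = {seed}, [seed]
        while frontier:
            current = frontier.pop()
            near = {j for j in unvisited
                    if math.dist(points[current], points[j]) < eps}
            unvisited -= near
            cluster |= near
            frontier.extend(near)
        clusters.append(frozenset(cluster))
    return clusters


def mapper_graph(points, lens, n_intervals=4, overlap=0.25, eps=0.3):
    """Build a graph: nodes are local clusters, edges mark shared points."""
    lo, hi = min(lens), max(lens)
    step = (hi - lo) / n_intervals
    pad = overlap * step  # how far each interval extends into its neighbors
    nodes = []
    for i in range(n_intervals):
        a, b = lo + i * step - pad, lo + (i + 1) * step + pad
        # Filtering: keep points whose lens value falls in this interval.
        members = [k for k, value in enumerate(lens) if a <= value <= b]
        # Partial clustering: cluster only within this slice of the data.
        nodes.extend(single_linkage(members, points, eps))
    edges = {(u, v) for u, v in combinations(range(len(nodes)), 2)
             if nodes[u] & nodes[v]}
    return nodes, edges


# 60 points evenly spaced on a unit circle; the lens is the x coordinate.
circle = [(math.cos(2 * math.pi * k / 60), math.sin(2 * math.pi * k / 60))
          for k in range(60)]
lens = [x for x, _ in circle]
nodes, edges = mapper_graph(circle, lens)

degree = {n: 0 for n in range(len(nodes))}
for u, v in edges:
    degree[u] += 1
    degree[v] += 1
```

For this input the result is six nodes joined in a single ring, so the graph recovers the loop shape without ever being told the data came from a circle. For brain data, each point would be a whole-brain volume at one timepoint rather than a 2D coordinate.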
The reason we picked this particular data is because it provides us ground truth about transitions and times, so we know if our method can pick up those transitions accurately with high fidelity or not. And secondly, we have behavior for each block, so we can predict performance based on the measures we extract.
The other thing that I may have forgotten to mention is that all these methods create representations at the single participant level first, to avoid the collapsing of data across people as well. So here's an example of four-dimensional data again, about 1,000 timepoints, and when you pass this data through this TDA-based approach, we get this kind of graph. It's a grayscale graph, but we have the meta information at every node: for whatever brain volumes become part of that node due to similarity, we know in time what task people were doing.
So we can color these nodes based on whether they're instructions -- the TRs are going from instructions or resting state or the other tasks, and we get a graph like this where it looks like a spider shot, but it has a core and has some peripheries and it turns out the peripheries are usually where the resting state lies, so, the mind wandering. And the core is where the very cognitive effort requiring tasks lie. So, working memory, math, and the idea behind the core is that your spatial representations across time are so similar, hence they end up being very close to each other, especially if you're doing the task correctly, whereas if you have this kind of mind-wandering resting state, then you kind of stumble around all over the place.
So we find these core-periphery structures, and then we use this pie-chart-based visualization, again not averaging the information within each node, but just representing the proportion of timeframes coming from each task. And the cool thing is that when you apply this to a single person, you can look at the individual differences. Here I'm showing you two participants at the extreme ends of behavioral performance: one of them is the best performer and the other one is the worst performer. I would love to have done a quiz on this, but in the interest of time, very quickly: this person turned out to be the best performer, and this was the worst performer. If you look at the best performer's graph, it has very high modularity, meaning that whichever task this person was doing, they were engaged in task-specific representations. The other person, who was maybe sleeping inside the scanner, had very similar spatial representations no matter what task they did. Just taking this property of the landscape, of the manifold, of the graph, we can then associate it with behavioral variables.
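One way to read "modularity" here is as the standard network-science quantity Q, computed for a graph whose nodes are grouped by a label such as the dominant task at each node. The sketch below is a minimal Python version of that formula; using task labels as the communities is an assumption about the analysis, and the tiny graph and labels are made up.

```python
from collections import defaultdict


def modularity(edges, labels):
    """Newman modularity Q of an undirected graph for a given labeling.

    Q = sum over groups of (within-group edges / m) - (group degree / 2m)^2,
    where m is the total number of edges.
    """
    m = len(edges)
    degree = defaultdict(int)
    within = defaultdict(int)
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
        if labels[u] == labels[v]:
            within[labels[u]] += 1
    degree_sum = defaultdict(int)
    for node, d in degree.items():
        degree_sum[labels[node]] += d
    return sum(within[c] / m - (degree_sum[c] / (2 * m)) ** 2
               for c in degree_sum)


# Toy graph: two triangles joined by a single edge (hypothetical data).
toy_edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]

# Task-specific case: each triangle holds timepoints from one task.
separated = {0: "memory", 1: "memory", 2: "memory",
             3: "math", 4: "math", 5: "math"}
# Mixed case: the two tasks are scattered across both triangles.
mixed = {0: "memory", 1: "math", 2: "memory",
         3: "math", 4: "memory", 5: "math"}

q_separated = modularity(toy_edges, separated)
q_mixed = modularity(toy_edges, mixed)
```

On this toy graph the task-specific labeling gives Q = 5/14 (about 0.357) and the mixed labeling gives Q = -3/14, which mirrors the contrast between a participant whose brain states separate by task and one whose states look alike regardless of task.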
So here's a rough demonstration of this tool. It's a web-based thing, so you crunch the data and then you can interact with the shape graphs or the graphs that pop out of the approach. This is just one subject's data, and I will just play the movie in a second, in this video, and then you will see here the spatial representation, the anatomical representation, that we can get at the single TR level.
Down at the bottom there's a slider, which is showing increasing numbers, which includes the TR and the highlighting nodes are showing you as the person is going through different tasks, it's resting state, it stays in the green node area. On the right, you can see activation, deactivation in the brain, in real-time. And now the person is moving on to the memory task, they will stay more in the core, apart from a few excursions.
A shout-out to Caleb, my grad student, who hopefully soon will become a postdoc as well, who has taken this version and created a nice Python version for this toolbox; we're calling it DyNeuSR. So please check that out as well.
And looking forward, we want to go from not just representing these dynamic representations, but taking them a step forward to try to see if we can help design interventions to go from, let's say, this so-called worst performer to best performer, but especially in clinical settings, this could be -- just making it up -- but this could be somebody with a lot of rumination and this could be a person who is jumping around thoughts much faster.
So could be potentially attention deficiency, and how can we go from one to the next? And that's where we develop using biophysical network models to do that, and we're calling that tool DyNeuMo. We just love these names.
But thanks to lab members, postdocs Samir, Yinming, Hua Xie, Mengsen, and then grad students Caleb and Kaitlin, who have helped with this whole journey, and thanks to funding, of course. One of the tools has come out, as you can see, DyNeuSR, but the other two are coming soon. And I have a shameless plug for the TDA workshop that's happening July 6, so if you're interested in this space, please join us.
With that, hopefully I'm still in time, I would like to introduce our next speaker. I don't know if there are any questions. It's my pleasure to introduce our next speaker, Dr. Emily Finn. Emily is an assistant professor in the Department of Psychological and Brain Sciences at Dartmouth. She completed her PhD from Yale, received her postdoctoral training from NIMH. Her current work focuses on understanding individual variability in brain activity and behavior, especially as it relates to the appraisal of ambiguous information under naturalistic conditions.
With that, please welcome Dr. Finn.
DR. EMILY FINN: Thanks, Manish, and thanks again for the opportunity to be here. Also very excited for today. And actually both Janice and Manish did a really nice job of setting up and giving some background for some of the stuff which I'll talk about which is quite related to what they're working on. But my title today is individual differences in the appraisal of social information.
Some of the big questions that my lab is interested in are what makes us unique? How do our intrinsic traits bias our perceptions and judgments? And why might different people arrive at different interpretations of the same experience?
In particular, we've become really interested in combining neuroimaging with naturalistic stimuli, which Janice so nicely introduced, where you have people, for example, watch a movie or listen to a story in the scanner, and then we relate differences in people's ultimate interpretations of those stimuli to patterns of brain activity as they're experiencing the stimulus. Because these are continuous time series data and they're very rich and high-dimensional, dynamic data visualizations can really help us get a handle on what's going on.
I should say that my lab is still in the exploratory phase of seeing how these tools can help us, but we're seeing some very interesting findings, so I'll just give you a couple of snapshots of where we are applying these tools.
In a first example, this was a study that we did a couple of years ago now, where we actually created an original fictional narrative that was designed to stratify people along an axis of trait paranoia. The story described an ambiguous social scenario, and the idea was if you are someone who is more naturally paranoid, if you are higher in trait paranoia, you might get a suspicious or a nefarious read on the events going on in the story, whereas if you were less paranoid you might have a neutral or even a positive impression of the story.
We also had two additional conditions in the scanner, and this is all fMRI data, by the way. A resting state run, and a run with just an abstract visual stimulus. So these graphs here, which were created in collaboration with Manish Saggar, who just nicely introduced his tool here, these are basically low-dimensional topological embeddings of whole-brain activity from two individual subjects, as they're undergoing these different conditions. The resting state run in dark purple here, and then this sort of abstract visual stimulus in blue, and then the story is split into three parts, so dark green, pink, and beige, and then another resting state run after the story.
Again, everyone is getting the exact same perceptual input in the form of this story, but the idea is that they might be forming different impressions of this as they're listening. So this subject on the left has high trait paranoia, and you can see this kind of loopy structure, so to speak, in their graph where certain events in this story might trigger some excursions out into these parts of the graph, and I can actually play this movie here, hopefully. I can scrub through until we hit the story.
This is where the story starts, and hopefully people can see this, where they are in the graph is highlighted in white on the nodes, and so they're entering the story now, in this first part, and you can see they have this excursion out into this loop, and then come back to the middle, and then in the second part of the story they do this even larger loop out to the outside of the graph, and then finally, in the third and final part of the story, in beige, they're kind of going off into this other part of this space. Even in this subject, you can see that the two resting state runs in dark purple and orange are quite different. So their rest, where they are in their graph is different before and after they've listened to the story.
We can qualitatively at least for the moment contrast that with this low-paranoia participant over here on the right, and you can just see by looking at this that this person's graph is much more clustered here. Actually their biggest excursion comes during the -- not the story itself, but this abstract visual stimulation that they also underwent. And if I play this visualization here, and just scrub along for reasons of time, you can see this is now when the story is starting, they're kind of in this middle part. You can kind of see the dark green and the pink and the beige corresponding to the three parts of the story, but again it's all sort of more clustered together. They don't seem to be having these excursions that, again, could be triggered by particular suspicious events that were inserted into the story.
We're still working out the best way to quantify this and what this might mean, but I think being able to start with some exploratory data visualization like this is becoming an increasingly important part of our workflow.
This is another example from a different dataset. This is a large dataset of people watching simple geometric shape animations. So these are sort of a classic stimulus in psychology, often referred to as the Heider and Simmel animations, and basically, the task here is just to decide if these -- I'll play this movie in a second -- this is the actual stimulus that people saw. But the task they were given was a simple one and it was just to decide if the shapes were having a social interaction or were simply moving randomly. This particular stimulus was intended to be perceived as random motion by the experimenters, but it turned out that it did actually evoke the perception of a social interaction in a substantial number of people.
So what I'm about to play, again, is the actual stimulus that people saw, and then another version of a low-dimensional embedding of whole-brain activity, and how the trajectory differs between people that ultimately perceived this as social versus random versus not sure. So these are averaged across multiple individuals, and this was a visualization created with Dr. Jeremy Manning's lab's HyperTools package, which you'll hear more about from him in a second.
You can kind of see that all three groups start in a similar place, and then the people that ultimately end up reporting a social interaction kind of quickly diverge off into a different part of the space, whereas the people that perceive this as random versus say that they're not sure, kind of stick together for longer and then ultimately maybe diverge a little bit, and then maybe start to come back together towards the end.
This is, I think, again, just a really interesting way to get a handle on how in real time different people are arriving at different interpretations of this, and we can do lots of things where we try to map this back onto brain activity and things like that, which I won't show you for reasons of time, but Jeremy will talk more about this tool as well.
In the interest of time, I think I may actually skip this, I'll just move through this really quickly. This was another example where we had people watch longer live-action films, this is not real-time now. The movie's too long to play this, but these are different individuals' trajectories as they watched this film with social content, and each line is a participant, and they're colored by basically this trait social function score. You can also think of it as sort of how lonely they tend to be. There's a lot going on here, and we're still kind of piecing through this, but it does seem like in certain parts in the movie, like maybe the people that are more lonely, there may be more variability in those individuals. And again I'm still sort of piecing through and quantifying that.
Another example here comes from a nonsocial movie. This was a movie of a Rube Goldberg machine here. There's just one human there at the start, but then sort of this mechanical trajectory rather than a social trajectory, and here the participants are colored by how engaging they ultimately found this movie. This was initially intended as our control stimulus. It turned out people really liked watching this, most people, so they rated it very engaging, in red, and then there was one person that didn't really like it, and they rated it as less engaging in blue. And here's a 2D visualization of that. The subject in blue you can sort of maybe loosely start to interpret this as sort of mind-wandering; they're less engaged in the stimulus, per their ultimate report.
With that, I would like to acknowledge my lab here at Dartmouth and my funding and the collaborators who have built these awesome tools that you can see here, and I will stop sharing.
Let's move on to Dr. Jeremy Manning. It's my pleasure to introduce my colleague here at Dartmouth, Dr. Jeremy Manning, who is an assistant professor in our department of psychological and brain sciences. He has dual BSs in computer science and neuroscience from Brandeis University, a PhD in neuroscience from the University of Pennsylvania, and he did his postdoc training at Princeton. Here at Dartmouth he directs the Contextual Dynamics Lab, which uses computational models, behavioral experiments, and brain recordings to track and also manipulate the ever-changing thoughts that we carry into each new moment. In the course of doing this really innovative research, his lab is also developing some awesome tools that I've already previewed for you that they share with the world.
With that, take it away, Jeremy.
DR. MANNING: My name is Jeremy Manning, and I'm an assistant professor of psychological and brain sciences at Dartmouth College. I also direct the Contextual Dynamics Lab there. I'm going to be using some of the key questions my lab studies as examples to illustrate some ways that we have used dynamic data visualizations in our own projects and to hopefully spark some ideas about how you might use these sorts of approaches in your own work.
My lab studies the underpinnings of human thought. A lot of how we study and think about thoughts ends up being about the brain network dynamics that support learning and memory and communication and other high-level mental operations. One tool we use to help quantify thoughts is geometric spaces called thought spaces, like the one I'm showing here. Each coordinate in this space is like one thought or set of thoughts you might potentially have, and the geometry of the space is set up so that conceptually-related thoughts are nearby. So duck and goose are assigned to nearby coordinates in this space, whereas duck and truck live further apart. We define these spaces using text embedding models that are typically fit to huge text corpora.
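As an illustration of what "nearby" can mean in such a space, here is a small sketch that compares words by the cosine similarity of their coordinates. The three-dimensional vectors below are made up for the example and do not come from any fitted text embedding model; cosine similarity is one common choice of closeness measure and is an assumption here, not necessarily the measure the lab uses.

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two vectors: 1.0 means same direction."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

# Hypothetical coordinates, invented so that the two birds point the same way.
thought_space = {
    "duck": (0.90, 0.80, 0.10),
    "goose": (0.85, 0.75, 0.15),
    "truck": (0.10, 0.20, 0.95),
}

duck_goose = cosine_similarity(thought_space["duck"], thought_space["goose"])
duck_truck = cosine_similarity(thought_space["duck"], thought_space["truck"])
```

With these invented coordinates, `duck_goose` comes out larger than `duck_truck`, which mirrors the duck, goose, and truck example in the talk.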
Because these spaces are often visually complicated, it's often helpful to visualize them dynamically. In this slide, I'm showing low-dimensional embeddings of the text of roughly 7,500 research articles. Each article appears as a dot, and when we color the articles by subject area, we can see from how dots of the same color tend to clump together that this approach is capturing the notion that articles about similar things are embedded into this space at nearby locations.
When we rotate this static image, we can really start to get a sense of the full three-dimensional structure. We can start to see that there's quite a bit of structure to the set of embedded coordinates. For example, if you think about a bounding box tightly enclosing the set of coordinates for these articles, that bounding box would represent the space of possible locations that articles could be embedded into. But in practice, articles almost always live on this complicated-looking surface that emerges visually when we rotate this cloud of points. You can even imagine creating a document classifier that uses the distance from this surface, within this space, as a heuristic for estimating whether a particular held-out document is a member of this collection, or not.
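The classifier idea can be sketched roughly as follows. This is only a stand-in: it uses the distance to the nearest already-embedded article as a proxy for the distance to the surface, and the threshold value and function names are hypothetical choices rather than anything specified in the talk.

```python
import math

def distance_to_collection(point, collection):
    """Distance from a point to its nearest neighbor among the embedded items."""
    return min(math.dist(point, other) for other in collection)

def looks_like_member(point, collection, threshold):
    """Heuristic: a held-out point counts as a member if it lies near the collection."""
    return distance_to_collection(point, collection) <= threshold

# Toy "surface": 50 points on a unit circle inside a 3D space.
ring = [
    (math.cos(2 * math.pi * k / 50), math.sin(2 * math.pi * k / 50), 0.0)
    for k in range(50)
]

near_ring = (1.02, 0.0, 0.0)   # just off the ring
box_center = (0.0, 0.0, 0.0)   # inside the bounding box, far from the ring

on_surface = looks_like_member(near_ring, ring, threshold=0.1)
in_box_only = looks_like_member(box_center, ring, threshold=0.1)
```

The toy ring makes the point from the talk concrete: the center of the bounding box is a possible location, but it is far from where the items actually live, so the heuristic rejects it.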
If we're careful about how we define these spaces, we can also always follow the mapping from an arbitrary embedded location back to the original conceptual meanings of each coordinate. Sometimes, those meanings are best described by individual words, like the dots shown in this particular example. But when a particular location falls between individual words' coordinates, it's often more accurate to think of locations in these spaces as word clouds that reflect weighted blends of thoughts about many atomic concepts.
When our thoughts change over time, we can characterize those dynamics using thought trajectories. The shapes of those trajectories can help us test detailed theories about cognition using the tools of the field of geometry. Here are some real thought trajectories for people who were listening to a 10-minute story. Each color represents a different person, and just by looking at this animation, you can start to get a sense of what's happening in people's thoughts. For example, you can see that there are certain points in the story where everyone's thought trajectories kind of deflect at the same moment. Maybe those are like high-level plot changes in the story.
You can also see that even though these people all have similar thoughts, they're not identical, so we can start to ask about how spread out different people's thoughts are at different points in the story, or about why one particular person's thoughts diverge from the group at a particular moment. Even though the full trajectory would look overwhelming if I showed the full thing in a static plot, visualizing the trajectories dynamically can be a really compelling way to focus attention on just one small part of the dataset in each frame. It helps us illustrate the structure of the dataset in a digestible and intuitive way.
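One simple way to put numbers on those two questions is sketched below. It assumes each person's trajectory is a list of coordinates sampled at the same timepoints, measures spread as the mean distance from the group centroid, and flags the person farthest from the centroid at a given moment. These definitions and function names are illustrative assumptions, not the lab's actual measures.

```python
import math

def centroid(points):
    """Coordinate-wise mean of a list of equal-length points."""
    dims = len(points[0])
    return tuple(sum(p[d] for p in points) / len(points) for d in range(dims))

def spread_over_time(trajectories):
    """Mean distance from the group centroid at each shared timepoint."""
    n_timepoints = min(len(traj) for traj in trajectories)
    spreads = []
    for t in range(n_timepoints):
        points = [traj[t] for traj in trajectories]
        center = centroid(points)
        spreads.append(sum(math.dist(p, center) for p in points) / len(points))
    return spreads

def most_divergent(trajectories, t):
    """Index of the person farthest from the group centroid at timepoint t."""
    points = [traj[t] for traj in trajectories]
    center = centroid(points)
    return max(range(len(points)), key=lambda i: math.dist(points[i], center))

# Hypothetical toy data: three people, two timepoints, two dimensions.
people = [
    [(0.0, 0.0), (1.0, 0.0)],
    [(0.0, 0.0), (1.0, 0.0)],
    [(0.0, 0.0), (1.0, 3.0)],
]
spreads = spread_over_time(people)
outlier = most_divergent(people, t=1)
```

In the toy data everyone starts at the same place, so the spread is zero at the first timepoint and grows at the second, where the third person moves away from the other two.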
Of course, it’s not strictly necessary to visualize these trajectories dynamically. Here are two examples of trajectories that we can get a good sense of even in static form, since there aren't many places where the trajectories intersect themselves. The thought trajectory on the left shows how the conceptual content of a television episode unfolds over time, and then the thought trajectory on the right shows the thought trajectories derived from people as they were trying to remember what had happened in the episode.
Even without a formal statistical test, you can see that the shapes really look visually similar, which means that there's some correspondence between the episode that people watched and how people are thinking about the episode when they remember it later. In other words, our memory systems are picking up on something about the global structure of the episode's trajectory in a way that enables us to reconstruct it later.
We can also look for subtle distortions or disagreements between these shapes, to help us understand when our memories are inaccurate or incomplete. The trajectories I'm visualizing here come from a fantastic public dataset collected by one of my fellow workshop organizers, Janice Chen. Her Sherlock dataset is a wonderful example of how data sharing, particularly of really clever and rich experimental datasets, can give a project an extended life beyond its original intended scope. Janice's Sherlock dataset has remained one of my favorites, but we're lucky as scientists today to have access to many thousands of public datasets that are just begging to be analyzed and visualized in new ways.
Another point that visualizing thought trajectories highlights is the need to consider the full scope and structure of our experiences beyond individual moments in isolation. If we think that our memory systems are picking up on the global structure of our experiences, then in order to understand the dynamics and underpinnings of our thoughts, we need to consider how each moment and how we think about that moment relates to the rest of our experiences.
There's also a growing body of evidence that different brain regions are sensitive to the statistical structure of our experiences at different timescales and levels of conceptual detail. For example, sensory regions seem to respond to low-level perceptual aspects of our experiences that unfold over short timescales, and as you move to higher-order cortex, you see brain regions that seem to respond to higher-level conceptual aspects of our experiences that often unfold over much longer timescales.
If we want to understand how our brains support our thoughts, then we also need to think about how the representations maintained by different brain areas interact and influence each other. That means that a key aspect of studying the neural basis of thought is about modeling and characterizing patterns of brain network dynamics under different cognitive circumstances. Network dynamics are another key area that is ripe for dynamic data visualization approaches. So to make this movie, I've applied a model to estimate a set of nodes throughout the brain, which are shown here as gray spheres, and then in each frame, I'm showing a red line when the associated regions' responses are positively correlated and a blue line when the regions' responses are negatively correlated. From the animation, even without knowing what’s happening in this experiment, and without doing any formal statistical test, you can see this lightning storm of correlated activity that appears periodically, and the animation lets us intuit what's going on, which can then lead us to define and formally test specific hypotheses.
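The red and blue lines in an animation of this kind could be derived along the following lines. This sketch assumes a plain sliding-window Pearson correlation between every pair of node time series and a hypothetical cutoff of 0.5 for drawing a line; the talk does not say which model or cutoff was used, so both are stand-ins.

```python
import math
from itertools import combinations

def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / math.sqrt(var_x * var_y)

def edge_colors(series, window, step=1, threshold=0.5):
    """One dict per animation frame mapping a node pair to 'red' or 'blue'."""
    names = sorted(series)
    length = min(len(series[name]) for name in names)
    frames = []
    for start in range(0, length - window + 1, step):
        frame = {}
        for a, b in combinations(names, 2):
            r = pearson(series[a][start:start + window],
                        series[b][start:start + window])
            if r >= threshold:
                frame[(a, b)] = "red"    # positively correlated in this window
            elif r <= -threshold:
                frame[(a, b)] = "blue"   # negatively correlated in this window
        frames.append(frame)
    return frames

# Hypothetical toy signals: B follows A in the first half and opposes it in the second.
toy = {
    "A": [1, 2, 3, 4, 1, 2, 3, 4],
    "B": [2, 4, 6, 8, 8, 6, 4, 2],
}
frames = edge_colors(toy, window=4, step=4)
```

Pairs whose correlation stays between the two cutoffs get no line in that frame, which is what lets bursts of coordinated activity stand out when the frames are played in sequence.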
These examples using thought spaces and brain network dynamics are part of a broader question space my lab is tackling related to how different people transmit information to each other and how we verbally describe our past experiences to ourselves and to other people. Essentially, we're building models of individual people's thought spaces, and we're trying to use the geometric alignment between those spaces across people to describe how efficiently those people might be able to communicate with each other. We're looking at scenarios like teachers communicating with students, or doctor-patient interactions, and other things like that.
If you're interested in any of these ideas or if you'd like to learn more, I hope you'll check out my lab website, where you'll find links to our papers, code, and data. Thanks so much for participating in this workshop, and I'm looking forward to more dynamic data visualization discussions with you throughout the day.
For our next speaker, it's my pleasure to introduce Tim Behrens, who will be giving our first keynote lecture today. Dr. Behrens has been at Oxford since he carried out his doctoral training there, and then stayed on as a postdoctoral researcher and then a lecturer, and now as a professor of computational neuroscience, and he also holds an honorary position at University College, London.
He's led groundbreaking research in functional neuroimaging, brain connectivity, neuroanatomy, learning, decision-making and reward, and more. And he's also won many awards for his work, including the UK Life Sciences Blavatnik Award for Young Scientists, and he's also a fellow of the Royal Society. He also holds leadership positions at PLOS Computational Biology, PLOS Biology, and eLife, and today he's going to be speaking to us from the perspective of his role as deputy editor at eLife, where he's working on new approaches for evaluating and communicating scientific research.
So please join me in welcoming Professor Tim Behrens.
Agenda Item: Keynote 1 – Dr. Tim Behrens
DR. BEHRENS: Thanks very much for that ridiculous-sounding introduction, but I'm very grateful. You can forget all the neuroscience today. I'm not a neuroscientist today, I'm an editor, and talking to you about stuff that's happening at eLife.
I've been massively impressed with all these amazing new ways of visualizing data that people are coming up with. It's a great thing to be thinking about. But the big problem is that when you come to share all this work, you'll be sharing it in PDF, which is going to be tough to make those amazing visualizations, so I'm here today to tell you what eLife are trying to do to solve that problem.
But first, since everybody sort of starts off explaining what their lab does at the start of these kinds of talks, I'm going to spend just two or three minutes explaining what eLife is. ELife is a journal that publishes really high-quality science. That's most of what we do, and all the decisions are made by scientists, and we try to publish rigorous high-quality science openly. But it's also a research hub for how we should do publishing. Lots of people at eLife think that publishing doesn't work very well, so we try to innovate. Part of our mission is to innovate both how we run the business of publishing, and how we communicate our results.
We're funded to do that. In fact, we're not really funded to run a journal, anymore. We were originally also funded to run a journal, but mostly the journal is now self-running in terms of finances, and almost all of our funding goes towards innovating and understanding what the community needs.
So we innovate in several different spheres. I'll just tell you about a couple of them, quickly. We innovate historically in how science publishing works. We put a lot of effort into increasing transparency in peer review, into making new ways of peer review work, such as consultative review, which leads to quick and clear decisions for authors and reduces the time in revision. Many of you may have experienced that, working with eLife.
We are a big promoter of open science, so everything we do is open access. And we were a big early adopter of broad open science publishing. That's a lot of historical things that eLife are doing. And eLife are now going a lot further in this direction, and so we're implementing a new model, which effectively says instead of reviewing a paper before publishing it, we're going to take advantage of what's happening in the preprint world, where papers are basically all published before they're even submitted to journals, and we are going to try to work out how it's possible to run a completely different way of publishing where you publish before reviewing. And then you curate that published literature.
Here's a kind of example of the kinds of things we're doing now. We're putting a lot of energy, we're taking preprints seriously, trying to get rid of gatekeeping, and putting a lot of energy into curating at the preprint space. So we're trying to publish concise takes on preprints, along with detailed reviews, to try to help people understand the contributions of preprints.
Right now, obviously, we're still a journal that publishes, makes gatekeeping publishing decisions, but we're trying to push the sphere towards making a world in which everybody doesn't need gatekeeping journals anymore, and so we're trying to do those innovations, and that's where a lot of our editorial practice is happening right now.
I'm just going to change to my first demo. This is the only demo that isn't about a visualization, but this is a demo showing why it can be useful. ELife also makes infrastructure for other people and it shares that infrastructure broadly. Here's this thing, sciety, that we're making, which is going to end up being a broad tool for curating the literature. This is a paper that ended up being published in June about SARS-CoV, about COVID, so it was published on June 21 in Nature, and this is our website sciety curating opinions from lots of authoritative people, groups, who are reviewing the preprint literature, and you can see opinions and clear opinions were present in April, a long time before the paper was published, even though it was under Nature's rapid publishing. So that's a major thing that's happening at eLife right now, that is not to do with data visualization, but I just thought I was allowed three minutes to tell you that.
Let's get back to the presentation, which is here. We don't just innovate, or try to innovate, in how the mechanisms of publishing works. We also are funded to innovate in how science is communicated. So much in the spirit of today's talks, we can lose the legacy of print publishing; we can try to make new ways of sharing our work, which can communicate the science in the clearest, most transparent way that current technology allows; and critically, not only do we do that for ourselves, but we also try to write open-source software that makes it easy for other journals to borrow this approach. So we're trying to change how science publishing works broadly, not just within eLife.
I'm going to show you two examples of that in the space of data visualization, which I think would be useful to many of the people who've presented already today. The first is quite a simple but nevertheless impactful thing, which is making articles that can embed richer media than just figures. I'm going to show you an example of a video, but you can imagine audio, et cetera, being inside figures, and then the most exciting thing, I think, is these executable research articles, which I think will allow you to publish most of the figures that were in Michele's talk at the start.
Let's start with embedding videos in papers, and again, I'm just going to flip. I'm sorry for all this flipping, but I thought it was better, given that we're a publisher, to show you actual papers that we've published. So I'm going to flip out, I'm going to find the eLife paper.
This is an eLife paper about some monkeys that are grasping, reaching and grasping some objects, and the critical thing is how do they grasp them, and how does the brain control how they grasp them. And if you were writing a traditional paper, you would spend a long time explaining this task, and probably your reader wouldn't really understand what you're talking about. But if you have the opportunity to put a video, embed a video, inside your paper, then you can do something like this. This is literally just showing the monkey doing the task.
In some sense you can lose a whole section of the paper which is just describing in longwinded terms how the task is set up. Just you see that, and you know what the whole paper is about.
And then here's this one I really like as well, underneath it. Now the question is okay, how did the musculature enact that? So there's EMGs everywhere, and you can see as the different grasps are happening, you can see the EMGs -- in fact, this one actually isn't, this is the angles of all the joints, but below is the EMGs. This is the angles of all the joints, and you can imagine it's just so much easier to align with this video, it's so much easier to see what's going on. Here it's about to grasp a cylinder.
That pretty simple step of allowing, forgetting the PDF and allowing authors to embed more complicated media inside of a web article, already I think gets a lot of power for people to play interesting games with their articles, and it’s always fun to see what people are doing.
I'm going to stop this show and go back to the presentation again. Here's the second example. This second example is a much bigger beast than that. This is called an executable research article, and effectively we're allowing you to publish your data and your code, and I'll show you how that works.
This is the basic premise that we've all been discussing already today. Traditional research manuscripts still are useless in terms of what modern technology could do in terms of sharing, communicating science. One thing that hasn't been said, I suppose, many of these points have been discussed already, so I won't belabor them, but one thing that has not been discussed is that it's possible to hide an awful lot in a figure, right? You can play with the thresholds and present the best ones, et cetera. And we all know people do it.
If you can make a richer explication of your data, a more interactive explication of your data, then it's going to be a more auditable, more reproducible article. It's going to favor more reproducible science, because people can really dig around and see what's in the data. So that's a real incentive for us to be doing this as well.
We want to make articles which are themselves executable, which have code embedded in them, that you can execute that code, change that code, so these articles are going to encapsulate usable code and data within the flow of a manuscript, deliver progressive enhancement from a static research article to full data and code interaction, and I'll show you in a minute how far we are along that progression.
Critically, we'll make them future-proof, so just because they're -- the coding languages might move to a new version, all that kind of stuff, that isn't going to -- it's important that that shouldn't make the articles that we publish defunct. And we want to be able to do this stuff in a way that anybody who knows basic coding can make use of it. If we can do something like that, articles that really embed data and code, it'll make for much more transparent and trustworthy science. And I'll show you where we are with that now.
Here is a paper that we published in eLife. What this paper is is a meta-analysis. You don't need to know what it's a meta-analysis of; it's a meta-analysis that we published in eLife. And you can see this is what it looks like in its traditional form. Here's a nice figure. It's very beautiful, but they're flat figures. Here's another one down here. This one really is a complicated beast, and you'd be terrified of this figure if you wanted -- if you saw this in a normal picture. This is describing all the different papers that have gone into this meta-analysis, in some detail. And the kind of detail that you might tell your graduate student you could never possibly share with the world. So just make some summary graphs for me.
Excellent. So you can go to this article and then you can just press this thing up here where it says see this research in executable code view. So I'm going to press that and pray that it works. It's working.
There's this really fun thing at the top, saying run document. Because you're running it, it's a program now. Here's that figure again I just showed you. Can you see what's happening now? It's now interactive. It tells me, if I look at excluded articles, it tells me what the criteria were for exclusion when I hover over there. Labels are appearing online. This is an article that's published online at eLife.
Here you can go and have a look at every single study, how many patients they had. This contains an enormous amount more information than the flat article did, because, as everyone says, you can focus your attention on one piece at a time and build up your understanding of the graph sequentially, in a way that you can't in a single flat page.
And here's that horrible figure that everybody hates, that the supervisor wouldn't even let the student present. But now you can go in there and zoom just before, and now it turns into a really useful thing. It's something that you can flip around and say, okay, I can interact with it hierarchically. I want to see all the MTR ones, okay, here's one down here, it tells me this one's got a low R-squared, and it tells me the number of samples, et cetera. You can imagine how this is a much more useful thing than you could do in flat.
Okay, again and again, and then this is kind of fun, right? So here I'm just going to show you this fun feature. So all of these figures, and it's not just figures, by the way, but I'm just going to show you figures, all these figures you can click on this little eye up here, and you can see the code that runs them.
And you can go down and you can change it and you can see what happens if I change the threshold or change this, that, or the other, or in this case, if I change this subplot title to NIMH DataVis Workshop. Now let's see if this works. Now I could run the whole document again, or I could just maybe even run this figure. I just run this figure, requesting a session, session starting. Is it going to work? Right, let's have a look at what the figure looks like now.
Oh, no. Let's run the document instead. This worked a minute ago. Maybe it doesn't work when I'm on Zoom.
Look at that. This figure is now titled NIMH DataVis Workshop. So you can go in there and you can play with the data. You can edit it, see what it does to the graphs, play with the code. This is an executable research article.
Cool, and then if you want to just go back and share the article by emailing it or something like that, you can return to the original and you have a normal article which you can get a PDF of or something like that.
Okay, I don't know if that demonstration, it's always difficult to know what people have understood or not in Zoom. But let me go back to my presentation.
So to make one of these ERAs, it's easy, right? The hardest bit is getting published in eLife. That's the first thing you have to do. So you don't worry about the ERA. I mean, obviously it's great if you can be thinking about that from the time when you were designing your paper, but actually before you start the ERA process, you need to have an eLife paper.
And then you go to Stencila, who are our collaborators that are building this with eLife, and you can automatically convert your eLife article, which will be in eLife format, with the tool into R markdown or a Jupyter notebook, and then you can add code. So if you're coding in R or Python, you can just add code from your R markdown or your Jupyter notebook locally, and then you can upload that enriched code, and that's going to make your ERA, your executable research article. And actually right now the eLife team will then work with you to optimize it and make it look cooler, and all that kind of stuff.
So we think this is scalable, and we think that other people will be able to use it. So the team have tried to make sure that it operates well with lots of authoring and conversion tools, it's reliable, and it's minimally disruptive from the publisher side. So what that means is other journals will be able to just take it, because it starts the process after they've got your paper. They do whatever they do, and then the ERA process starts then. So we hope we made it easy for other journals to borrow this.
It's open source and modular and so we hope it will be easy for people to contribute to it to make it -- give it new functionality, et cetera. As I said, this has all been developed together with Stencila.
There's one more thing I wanted to maybe show you, which is -- let me just move out, there are a bunch of things that we have more information, I'll show you them quickly on the web, and then these websites are here.
So we published, on our labs website, we published a paper by Emmy Tsang and Giuliano Maciocci, explaining how you go about getting an eLife thing and with a button saying I want to turn my eLife paper into an ERA, and a whole bunch of frequently asked questions, so that you can figure out how to go about it. There is a webinar explaining how you go about it, as well. Stencila have detailed instructions on how to do it. So hopefully it will be easy for people who want to do it.
These are some of the ones we've done so far. So we've already published 15, 20 or so executable research articles, and these are the ones in progress that people are working on right now. So you can see there are a few more coming, including some neuro ones.
So the last thing I wanted to do was flip back to the start and just say that I am really just the face of eLife today. I haven't done any of this work. The work has been done by the product and innovation team and Giulia Guizzardi from that team is here as well, and so will be able to answer or help me answer any questions that you may have.
Thanks very much indeed. I think I'm probably getting towards the end of my time.
DR. FERRANTE: There are a few questions in the Q&A.
DR. BEHRENS: Okay. I love this idea, how would this type of publication be catalogued? How would you find the lit review? So exactly the same way that we're cataloguing right now. It's going to have a DOI, it will be indexed in PubMed, it's an eLife paper. In fact, there's a version of this eLife paper that's just a PDF, but then it also comes along with this enriched executable version which is presumably much, much clearer. So I don't think there's any problem in terms of PubMed indexing, et cetera.
Next, well done, this is the right direction. Thanks very much. We think so.
For ERAs, there must be access to the full dataset? Correct. Where are those data stored? Well, how much of your data do you need to upload? In the cases that you saw, maybe Giulia, do you want to answer that question?
DR. GUIZZARDI: Sure. Hi, everyone, by the way. I'm Giulia Guizzardi, innovation officer at eLife, and I'm a little bit here as a technology expert behind the executable research articles. For now, the datasets are usually stored in a repository on GitHub. So usually we can actually upload those repositories and retrieve them from GitHub to the Stencila hub, which is like a middle ground where the executable research article actually gets translated into its new form. So that's where, like, the magic happens basically.
So (indiscernible) from that, but now eLife has also a new integration with Dryad on the website. So data could be stored also in Dryad in that case.
DR. BEHRENS: So, you guys should stop me when we have taken our time, because there seem to be a few questions.
DR. FERRANTE: You can take another one, and maybe like the rest you can answer by typing.
DR. BEHRENS: So the answer to Forrest is yes. So ERAs are like Jupyter -- in fact, it's Python, it's like Jupyter or R markdown. That's the simple answer to Forrest Schuster's question. That's the way it's working, yeah. I think that's right.
DR. GUIZZARDI: I think it would be interesting to answer maybe to Theodora's question live. Oh, no, okay. I read it wrong. Sorry. It was asking about the editor position.
DR. BEHRENS: Oh, Richard's question is interesting. Are the innovation team working on anything like that, Giulia? I mean, obviously we have ways of just sharing the data and code, but Richard's asking whether there are interesting new ideas about how to share all of the data in an interactive and interesting way that can be peer reviewed. So I don't think we are -- well, Giulia, maybe you can comment on whether the team are thinking beyond ERAs about how to share the bulk of the data.
DR. GUIZZARDI: Yeah, sure. As far as I know, the direction we're going is that we'd like to integrate as many libraries and as many data types as possible. So right now we're actually trying to translate into ERAs those articles that have already been published by eLife. So for now, those were the only types of data that we were challenged to translate into ERAs, but as far as I know, the team is now working on introducing into an ERA a 3D model of the brain. So we're going a lot farther with the types of data and visualization. We're going to introduce some --
DR. BEHRENS: That was kind of what I was -- I was hoping you'd say that. At the moment, it's noticeable that the ERA things that you can mostly do are plots, interactive plots, that don't have the data, where the analysis has already been done, and this is how we present it. But it seems to me that it's even more interesting if you can do some of the data analysis online inside the article, yeah.
DR. GUIZZARDI: For sure, now everything is accessible, so if you go on the Stencila hub, everything is published there, you can download it, you can link the code and play with it or add charts and use the code that was used in the eLife papers, so everything is completely accessible.
DR. BEHRENS: I think maybe we've had our time now. So thanks for inviting us, and thanks, Giulia, for saving me.
DR. FINN: Excellent. Thank you to Tim and Giulia. It's my pleasure to introduce our next keynote speaker, who is Dr. Aaron Alexander-Bloch. Dr. Alexander-Bloch received a bachelor's degree in philosophy from Harvard, a PhD in computational biology from the University of Cambridge, and an MD from UCLA. He did his psychiatry residency at Yale before moving to Children's Hospital of Philadelphia at UPenn, where he is an attending psychiatrist and also directs the Brain Gene Development Lab. He has won several awards, including the NIMH Outstanding Resident Award, and broadly his research investigates both normal brain development and the altered developmental trajectories that lead to mental illness.
Most recently, he and his team have been working to integrate big data from publicly available imaging and genomics resources, like for example UK Biobank, with deep phenotyping of individuals that they have in the UPenn system, with the ultimate goal of translating polygenic risk scores for psychosis and other neurodevelopmental psychiatric disorders into pathophysiologic mechanisms that can inform therapeutic targets and improve risk assessment.
So please join me in welcoming Dr. Alexander-Bloch.
Agenda Item: Keynote 2 – Dr. Aaron Alexander-Bloch
DR. ALEXANDER-BLOCH: Thanks very much, Emily, and thanks very much for this opportunity to speak to all of you at this workshop. In particular, I want to thank Michele and Josh for the vision that these topics are really important for NIMH. I am a practicing psychiatrist and also an assistant professor at Penn where I lead a multidisciplinary research group focused on psychiatric neuroimaging, and today I'm presenting projects that we've been working on in our lab along with many collaborators to characterize brain growth charts from MRI methodology across the lifespan.
Part and parcel of the scientific goals of this project is the development of a usable online platform where the data and models can be explored interactively, and also applied to new datasets by the researchers. So my two overarching goals are to both describe the science of this project, but also to showcase this platform itself.
With that in mind, there are four parts to this talk, each of which includes a demonstration of the online resource. So first, I'm going to introduce why we think brain charts are important and the data we used to create our preliminary models. Second, I'm going to show how we use brain charts to describe new neuroimaging-based developmental milestones. Third, I'll show how we use brain charts to characterize clinical alterations in patient groups, and finally, I'm going to discuss how brain charts can be used in tandem with novel data and also where we want to take this work in the future.
So first, the why and how of brain charts. Growth charts in some form have been around since the late 18th century. The first known growth chart was developed for height, and since then, this simple way to quantify age-related changes against a reference standard has been a cornerstone of pediatric care and also research in many disciplines. Growth charts remain a powerful example of personalized or precision medicine, but widely used growth charts exist mainly for a small set of anthropometric variables such as height, weight, and head circumference.
The lack of a brain reference standard is particularly relevant for psychiatric disorders, which are generally accepted to be disorders of brain development. So this classic graphic shows hypothesized alterations at the brain cellular level in people with psychosis risk and how these cellular changes may map onto structural neuroimaging findings during typical development.
Despite many advances in our understanding of psychiatric illness and its brain correlates, it's also true that we in psychiatric neuroimaging have yet to provide breakthroughs with clinical impact on par with other areas of medical science, and one contributing factor for this may be the continued difficulty in establishing reference standards to anchor findings of age-related changes.
Many important discoveries have yielded a general understanding of how the brain grows in typical development, some of which is summarized in this figure from an excellent recent review. There's too great a body of work to do it justice in such a short talk. Our challenge remains, though, to continue to work towards practically useful neuroimaging growth charts.
This is partly due to difficulties in data harmonization across studies, studies that often target disjointed developmental periods and separate clinical conditions. A big part of this challenge is also that in contrast with standard pediatric growth charts, such as those for height and weight, MRI is more sensitive to technological variation in things like scanner platforms, acquisition and analytic strategy, which is one of the reasons why the period from fetal growth through early postnatal development, through the preschool years, is rarely incorporated into multisite studies, even those that have a lifespan focus, despite evidence that early processes shape growth trajectories and vulnerability to psychiatric conditions.
Another important point is that brain growth and maturation continue through adulthood, well beyond the developmental period covered by anthropometric charts. So rather than charts of brain growth, per se, what we really want is charts of age-related changes across the whole lifespan.
Notwithstanding all these hurdles, the building blocks are in place to create brain charts for the human lifespan, and that's thanks to the investment from NIH and other funding bodies in large-scale imaging datasets and the support for collaborative multisite initiatives, and also recent advances in image processing and statistical frameworks for data harmonization.
So I'm going to present some of our work in this area, which is extremely recent. It hasn't yet been peer reviewed, but it is available on bioRxiv, and as you can see, this is a true team science approach using data from multiple consortia as well as data shared directly for this project. In particular, I want to stress the work by the equal contribution first authors on this paper, Jakob Seidlitz, who is a postdoc in my lab, and our international collaborators at the University of Cambridge, Richard Bethlehem and Simon White, who worked closely with Professor Ed Bullmore in the Cambridge psychiatry department.
As I mentioned before, and I think in keeping with the goals of this workshop, we really see the development of a usable resource as a major part of this effort, and I'll walk through the current version of this resource, which is online at brainchart.io, built using Shiny in the R environment. I do encourage people to explore this yourselves, although maybe not while I'm talking, as the server may not yet be ready to withstand quite that level of curiosity.
This is a quick illustration of the data that went into these models on the back end. We incorporated over 95 studies, including over 100,000 individual brain MRI scans from the prenatal period through to the very end of life. To our knowledge, this is the largest MRI dataset to date and the most comprehensive in terms of age range across the lifespan.
Every participant here contributed structural MRI data. They also had what we call biological covariates, like age and biological sex, as well as what we call technical covariates, which encompass information about the MRI platform and image processing pipeline. It's important to note that so far we focused on global volume phenotypes: total cerebral gray matter, illustrated here in purple, subcortical gray matter here in yellow, white matter volume in blue, and ventricular cerebrospinal fluid volume in orange.
The focus on these global phenotypes is both a weakness, but also a strength of this study. It's a weakness because of course we're interested, like everybody else, in more complex phenotypes like folding and cortical expansion at the millimeter scale, not even to mention things like fMRI and diffusion MRI, but this focus is also a strength, because technical artifacts are less profound for the global volume features, which makes them an ideal test case for this framework.
The datasets that went into our models can be explored using the interactive resource. So if we accept all the disclaimers, getting us into the main site, we go to data selection. We can look at just subsets of studies or even a single study or you can stick with the whole data set. We can also just look at specific classes of pipelines, or include all of them, which is the default.
You can also look at the data geographically, so if you go into the map of studies tab to look at sample size geographically, and the color scale here is in terms of thousands of subjects, it won't be surprising to people in the field that the single largest study is the UK Biobank, and if we click on a study, we get some basic information about the study, including data access requirements when the study site has made that information available.
So that's the sort of basic motivation and the data behind these models. Now I'll talk about our work using brain charts to characterize neuroimaging developmental milestones.
Briefly, the statistical approach that we use is called GAMLSS, which is a robust and flexible approach to model nonlinear growth trajectories, which is recommended by the World Health Organization and implemented in R. To show the general approach, here we're looking at some public data on head circumference, and using this data as a reference, we can derive these colored centile lines in terms of age and sex specific distributions. So now we can take a new individual's raw head circumference data from this population and reinterpret it as a percentile, or sometimes we just say centile against the reference data.
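As a rough illustration of what reading a raw measurement as a centile means, here is a minimal Python sketch. It is not GAMLSS, which fits far more flexible age-dependent distributions; it assumes a plain normal distribution per age and sex, and the `REFERENCE` table and `centile` function are hypothetical names with made-up values.

```python
from statistics import NormalDist

# Illustrative reference table: (sex, age in months) -> (mean, SD) in cm.
# These numbers are made up for the example, not real growth-chart values.
REFERENCE = {
    ("F", 12): (45.0, 1.3),
    ("M", 12): (46.0, 1.3),
}

def centile(value_cm, sex, age_months):
    """Return the centile (0-100) of a measurement against the reference."""
    mean, sd = REFERENCE[(sex, age_months)]
    # Simplifying assumption: measurements are normally distributed
    # around the age- and sex-specific reference mean.
    return 100.0 * NormalDist(mean, sd).cdf(value_cm)

# The same raw value lands on different centiles for each sex.
print(centile(46.0, "F", 12), centile(46.0, "M", 12))
```

The point of the sketch is only that the same raw number maps to a different centile depending on which age- and sex-specific reference distribution it is compared against.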
And our goal is to do something analogous but for our more complex data, the global volume phenotypes from anatomical MRI, and we want to do this while accounting for technical covariates that are potentially so problematic.
It's a good sign that even prior to any modeling the imaging features across studies show clear age-related trends, but there was also a lot of heterogeneity between studies, which are shown in different colors in these total tissue volume plots. This shows the importance of using the full multisite data to achieve a reference that isn't biased by individual studies.
These population trajectories show the 50th centile line as well as the 5th and 95th centile lines for males and females after removing study and processing pipeline effects. These models had high stability under cross-validation and high validity against non-MRI metrics of brain size, such as postmortem brain weight across the lifespan.
I think this is also a good place to stress that we stratified by sex for similar reasons as pediatric growth charts: males have larger brain volumes in absolute terms, but this isn't associated with any clinical or cognitive difference, which is why sex-specific growth charts are likely to be more informative.
One way our models extend previous work is modeling age-related changes in variance across individuals. For example, on the bottom left, we see an early developmental increase in gray matter variance that peaks at age 5. In contrast, white matter variance peaks during the fourth decade of life and CSF variance peaks at the end of the lifespan. And in line with prior literature on the subject, variance in males is generally higher than variance in females.
If we look directly at the rate of growth across the lifespan, for example, we see that the increase in gray matter from mid-gestation peaks at age 6 where the first derivative on the bottom left crosses zero. This observed peak occurs two to three years later than the peak reported in prior studies that relied on smaller age-restricted samples.
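The idea of locating a peak where the first derivative crosses zero can be sketched in a few lines of Python. This is only an illustration on a toy curve: the `peak_age` helper and the curve `age * exp(-age / 6)` are assumptions chosen so the maximum falls at 6, and they are not the fitted gray matter trajectory.

```python
import math

def peak_age(ages, values):
    """Return the age where the finite-difference slope turns from + to -.

    Slopes are estimated at the midpoints between consecutive ages, and the
    zero crossing is located by linear interpolation between two midpoints.
    Returns None if the slope never changes sign in that direction.
    """
    mids = [(a0 + a1) / 2 for a0, a1 in zip(ages, ages[1:])]
    slopes = [
        (v1 - v0) / (a1 - a0)
        for a0, a1, v0, v1 in zip(ages, ages[1:], values, values[1:])
    ]
    for i in range(len(slopes) - 1):
        s0, s1 = slopes[i], slopes[i + 1]
        if s0 > 0 >= s1:
            return mids[i] + (mids[i + 1] - mids[i]) * s0 / (s0 - s1)
    return None

# Toy trajectory (not real data): rises early, then declines slowly.
ages = [i * 0.1 for i in range(301)]
values = [a * math.exp(-a / 6) for a in ages]
print(peak_age(ages, values))
```

On real centile curves the derivative would come from the fitted model rather than from finite differences on a grid, but the zero-crossing logic is the same.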
Here's another way of visualizing the information about peak rate of change and peak absolute size. This plot shows the 50th centile for each phenotype, for females with the solid line and males with the dashed line. Cerebral gray matter is red. White matter is light blue. The subcortex is green and CSF is purple. For each imaging feature, the circles show the age of peak absolute size while the triangles show the age of peak rate of change.
Only gray matter volume peaks in absolute size prior to adolescence, but rates of growth in general peak much earlier in infancy and early childhood, and these early peaks in rate of growth haven't previously been well demarcated, because datasets haven't really spanned the perinatal period, which was necessary for us to accurately model early growth.
It's been hypothesized that cellular changes are reflected in these neuroimaging milestones, even at the level of relative growth of global features. The initial postnatal increase in gray matter relative to white matter has been argued to be due to increasing complexity of neuropil and synaptic proliferation. Subsequently, gray matter declines relative to white matter, which is likely due to synaptic pruning and also continued myelination.
But the exact timing of this gray/white differentiation hadn't yet been clearly shown, again partly due to the lack of datasets spanning the perinatal period. Our models clearly demarcate this early period of gray/white differentiation shown by the horizontal gray lines in this plot. This period begins with the switch from white to gray as the majority tissue compartment in the first month after birth, and ends when the absolute difference between gray and white matter reaches its peak in the fourth year of life.
All of this information can be explored interactively with the brain chart resource. If we go to charts on the top panel, we see the population trajectories with different centile lines. So this is showing gray matter, and we can switch features. For example, the ventricles or CSF volume. And we can look specifically at age-related variance.
Now, I'll switch the phenotype back to gray matter volume, and we can use it specifically to look at the rate of change and to look at a specific developmental window, I'm going to use this scroll bar to ignore all the adult data from the end of the age range to mid-adolescence, and zoom in on the brain chart prior to adulthood, showing the main charts above and the rate of change for males and females below. So this platform can be used to interactively visualize and further explore the imaging milestones I just described.
In addition to looking at typical development, a major goal for brain charts is to look at clinical alterations in individuals with neuropsychiatric illness. Brain charts allow us to take a study with both cases and controls and use the controls to model study-specific variation, which then allows us to extract centiles for the clinical groups that leverage the full reference dataset while also controlling for site-specific technical confounds. One thing this framework does is allow for cross-disorder comparisons between disorders that occur in different developmental periods. This is important, given evidence of shared risk factors across psychiatric illnesses, even those that don't necessarily occur during the same developmental period.
Relative to individuals without diagnoses, we found highly significant differences in centiles across diagnostic groups. Here I'm showing results for gray matter for seven conditions where there are more than 500 scans for multisite data for each condition included in our dataset.
We see the results for males above and females below. From left to right, we're showing Alzheimer's disease, ADHD, anxiety disorders, autism spectrum disorder, mild cognitive impairment that may precede dementias, and major depressive disorders, and finally schizophrenia. The circles indicate differences from the control median and the asterisks just indicate statistical significance after correction using false discovery rate.
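For readers unfamiliar with false discovery rate correction, here is a minimal Python sketch. The talk does not say which FDR procedure was used; this assumes the common Benjamini-Hochberg step-up procedure, and the p-values are invented for illustration rather than taken from the study.

```python
def benjamini_hochberg(p_values, alpha=0.05):
    """Return a list of booleans: True where the test survives BH correction."""
    m = len(p_values)
    # Indices of the p-values, sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    # Find the largest rank k such that p_(k) <= (k / m) * alpha.
    max_rank = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * alpha:
            max_rank = rank
    # Every test ranked at or below that k is declared significant.
    significant = [False] * m
    for rank, idx in enumerate(order, start=1):
        if rank <= max_rank:
            significant[idx] = True
    return significant

# Seven made-up p-values, one per hypothetical group comparison.
example_p = [0.0001, 0.03, 0.20, 0.045, 0.001, 0.012, 0.0005]
print(benjamini_hochberg(example_p))
```

In this toy input the value 0.045 is below 0.05 on its own but does not survive correction, which is the kind of case the asterisks in a corrected plot are meant to exclude.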
Notably, schizophrenia ranked second overall behind Alzheimer's in terms of the effect size of gray matter deficits when measured in terms of centiles. Of course, different mechanisms underlie the gray matter deficits observed across disorders, and in the case of schizophrenia, the cellular basis isn't yet fully known. But while brain MRI is part of the diagnostic workup for dementia with the potential to help discriminate between pathological processes, our results underscore the potential diagnostic yield for a wider scope of human diseases perhaps, with the use of appropriate reference models.
In addition to considering phenotypes separately, we generated a cumulative deviation metric, the centile Mahalanobis distance, across all brain phenotypes, summarizing the cumulative deviation from the 50th percentile. A nice thing about this measure is that it can be readily scaled to incorporate additional phenotypes when, as we hope, they become available in the future. As we'd expect, this measure of centile deviation was consistently greater in patients compared to controls.
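A generic Mahalanobis distance of a centile vector from the median can be sketched as follows. The exact definition used in the study is not given in the talk, so this is an assumption: centiles are expressed on a 0 to 1 scale, the center is 0.5 for every phenotype, and the covariance matrix is a placeholder that would in practice be estimated from a reference group. The `solve` and `centile_distance` names are hypothetical.

```python
import math

def solve(matrix, rhs):
    """Solve matrix @ x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [list(row) + [rhs[i]] for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(aug[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (aug[i][n] - tail) / aug[i][i]
    return x

def centile_distance(centiles, covariance, center=0.5):
    """Mahalanobis distance of a centile vector from the given center."""
    deviation = [c - center for c in centiles]
    # Solving covariance @ w = deviation avoids forming an explicit inverse.
    weighted = solve(covariance, deviation)
    return math.sqrt(sum(d * w for d, w in zip(deviation, weighted)))

# Placeholder covariance for four phenotypes: independent, SD of 0.2 each.
cov = [[0.04 if i == j else 0.0 for j in range(4)] for i in range(4)]
print(centile_distance([0.9, 0.1, 0.5, 0.5], cov))
```

Because the deviations are squared inside the distance, a phenotype far above the median and one far below it both add to the total, and adding a phenotype only means extending the vector and the covariance matrix.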
One benefit of this measure is that it incorporates deviation in both directions. So for example, if you look at ASD in the gray matter plot on the left, you can see some evidence of bimodality, suggesting some people with ASD have less gray matter and others have more gray matter compared to the control median, which is also consistent with prior literature. And both increases and decreases are incorporated into the measure of centile deviation, which is a potential advantage.
For this cumulative measure of centile deviation, schizophrenia was third behind Alzheimer's and mild cognitive impairment in terms of overall effect size.
These, as well as other clinical alterations, can be explored using the interactive brain chart resource. If we go back to the data selection tool, where at this point we're only including the CN group, which stands for control or cognitively normal, we can add other diagnoses from a wide range included in the available datasets. So here I've added ASD, schizophrenia, and Alzheimer's.
Then we go to diagnostics, and you can see the individual participant data with site and pipeline effects removed, plotted against the reference charts on top, and you can see the group level distributions in box plots below, including, on the bottom right, tests of significance for the deviation from the control group. This is for gray matter, but we could also explore other phenotypes by changing the phenotype of interest in this panel.
Now let's move from clinical alterations to the incorporation of novel data and future directions. A key extension of the present growth charts is the estimation of centiles for data not included in the original models. What we want is for a novel study to be able to use our resource to derive centile scores for their subjects, anchoring the novel study against the full reference dataset while controlling for technical covariates.
Option a to do this is just to refit the model using the new data. But this has some serious drawbacks. First, just in terms of computational feasibility, it would require too many computational resources to be practical, especially as an open resource. Privacy restrictions are also likely to prevent the sharing of individual participant data in many cases where brain charts would otherwise be useful.
So we sought another option, which I call option b here, and we implemented a maximum likelihood approach to estimate the study parameters of a new study, a study not already included in the reference model, using only the pre-derived model parameters but not the individual participant data that went into the reference model.
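To make the idea behind option b concrete, here is a deliberately simplified Python sketch. It is not the authors' implementation: it assumes the new study differs from the reference only by an additive offset and a residual spread, assumes normally distributed residuals so the maximum likelihood estimates have a closed form, and uses hypothetical function names and made-up numbers. The reference model enters only through its predictions, never through its participant-level data.

```python
import math
from statistics import NormalDist

def fit_study_parameters(observed, expected):
    """Closed-form normal MLE of a study's additive offset and residual SD.

    `expected` holds the reference model's prediction for each new subject
    (for example from age and sex); no reference participant data is needed.
    """
    residuals = [y - mu for y, mu in zip(observed, expected)]
    offset = sum(residuals) / len(residuals)
    variance = sum((r - offset) ** 2 for r in residuals) / len(residuals)
    return offset, math.sqrt(variance)

def study_adjusted_centiles(observed, expected):
    """Centiles (0-100) after removing the estimated study offset."""
    offset, sd = fit_study_parameters(observed, expected)
    # Assumes the residuals are not all identical, so sd is positive.
    dist = NormalDist(0.0, sd)
    return [100.0 * dist.cdf(y - mu - offset) for y, mu in zip(observed, expected)]

# Made-up example: the new study reads about 5 units high on average.
expected = [100.0, 110.0, 120.0, 130.0]
observed = [103.0, 117.0, 123.0, 137.0]
print(fit_study_parameters(observed, expected))
print(study_adjusted_centiles(observed, expected))
```

With only a handful of subjects these estimates are very noisy, which is the intuition for why a minimum number of scans is needed before the study-specific parameters can be trusted.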
We tested the accuracy of this approach on simulated data and on four independent real-world datasets. In the real data, the results from options a and b corresponded almost perfectly in terms of the centiles output from the brain chart models, suggesting that option b is a good approach, as long as there is enough data to robustly estimate the study specific parameters in the new dataset. And simulations suggested a benchmark of about 100 scans in a new study to robustly estimate study specific parameters.
So this may seem like a minor and very technical point, but the ability to use option b is critical, because it's computationally feasible and also obviates many future privacy concerns.
In fact, it was possible to implement a fully automated version of this approach into the online resource. Once we're in the app, we can go to upload data, and the question mark here just reminds us to make sure the novel data is correctly formatted, and then we can go ahead and actually upload some novel data, in this case from one of our computers, and after the upload is complete, you can visualize the new data the same way you can visualize data included in the original model.
Here we're just switching to look at female participants, and you can also visualize diagnostic groups and download the centile scores from the novel data directly, for whatever use someone is interested in for their own data.
So with that, where do we see this work going in the future? Perhaps the most important goal is to continue to gather more data and more phenotypes. Early development in particular is an area where there's expected to be new large datasets available to be incorporated into models in the not-so-distant future. In addition, we want to move from the four global volume phenotypes to regional and millimeter-scale measures of brain size and shape, including cortical thickness, surface area, and folding. This is achievable although maybe not quite as straightforward as it sounds, because technical covariates are expected to have a more complex effect on some of these phenotypes, and we need to address this thoroughly to make sure that the brain charts are really usable.
Another area of interest is in combined imaging genetic studies. There's been so much recent success in psychiatric genetics in identifying risk loci for disorders, and a central challenge for imaging is to help translate that highly polygenic risk into neurobiological pathways. A lot of the challenge there is on the phenotypic side of combined imaging genetic studies, optimizing the signal conveyed by imaging features. We do have some preliminary data, shown here, suggesting increased genetic heritability in centile scores as opposed to non-centile imaging features, which, if it holds, has the potential to increase statistical power across many contexts in imaging genetic studies.
Another area of particular interest is risk trajectories, i.e. tracking individuals along their centile trajectories, where the theory is that brain charts could increase the sensitivity of assessments that track individual deviation from reference norms. Although our current reference models are based on cross-sectional data, we've shown that they can be used to track longitudinal variation for an individual over time in terms of their centile scores.
And there's a high degree of stability in longitudinal centile scores for individuals without diagnoses, which is promising in terms of the potential to track deviation in at-risk individuals, and that's the kind of work that needs to be done if brain charts are going to fulfil their promise to yield clinical insights in the future.
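One simple way to picture centile stability is to look at how far each individual's centile score moves across visits. The sketch below is only an illustration under stated assumptions: the participant IDs, the centile values, and the 20-point threshold are all hypothetical and are not taken from the study.

```python
def centile_range(centiles):
    """Spread of one individual's centile scores across visits."""
    return max(centiles) - min(centiles)


def flag_unstable(trajectories, threshold=20.0):
    """Return IDs whose centile range exceeds a (hypothetical) threshold."""
    return sorted(
        pid for pid, centiles in trajectories.items()
        if centile_range(centiles) > threshold
    )


# Toy longitudinal data: centile score at each visit, per participant.
trajectories = {
    "p01": [48.0, 52.0, 50.0],  # stays close to the same centile
    "p02": [60.0, 45.0, 30.0],  # drifts downward across visits
}

flagged = flag_unstable(trajectories)
```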
So with that, I'm going to wind down. I think I'm on track. I made up a little bit of time, and I hope I've introduced brain charts and their application to study typical development as well as neuroanatomical deviations in clinical populations, and some of our hopes for the continued development of the kind of interactive resource that I demonstrated for you today.
I want to thank all of our collaborators, especially the members of the Brain Gene Development Lab, our colleagues at CHOP and Penn, our national and international collaborators, and most notably again, Jakob, Richard, and Simon, who led this work.
I do invite you to read our paper, which again was a massive team science effort, including far too many people to thank individually, and also please go to brainchart.io and explore these data and models yourself and be in touch if you have any questions or if you want to be involved in the future iterations of this work. Thanks a lot.
DR. FINN: Great. Thank you so much, Dr. Alexander-Bloch. That was really interesting. I think we probably have a couple of minutes for questions. We started your slot a little bit late. So if people do have questions, feel free to pop them in the Q&A.
Let's see, I see one question here asking have you looked at data-driven phenotypes that cut across disorders? For example, the B-SNIP study.
DR. ALEXANDER-BLOCH: That's a really good question. We have not yet done that, although it's definitely something that we're interested in doing, in keeping with work that we've done in other contexts. At this point, it's kind of -- to some extent, it's not really true, but to some extent you have the choice between going shallow and broad and going deep and narrow and at this point, this resource is kind of definitely in the broad shallow category, right? So you could take these quantiles or centiles and kind of limit your investigation to a specific study where you have very deep phenotyping and use them in that context. But if you're trying to look at the broad cross-study dataset, you're sort of limited by the most common denominator in terms of phenotyping across all the studies, in some sense.
I should say, Richard mentioned this in the chat, but I may have sort of made it seem like you can download the centiles right now from the app, and we decided to wait for peer review to allow that feature in the app. So you couldn't actually download centiles right now. I don't want to give a false impression.
He also mentioned -- I'm a live puppet here -- one of the important things is that there's no -- none of the uploaded data would ever be stored on the back end. So we would be able to kind of ensure the privacy of that data, if you uploaded your data in order to download your own centiles, we wouldn't be taking any of that data or looking at it, and it would be completely protected.
DR. FINN: Thanks, that's an important point. Let's see, we have another sort of question or comment, suggestion, saying it would be interesting to map new potential RDoC domains onto the brain chart using consistent measures across studies.
DR. ALEXANDER-BLOCH: Absolutely. That sounds like more of a comment. I completely agree with you, and that's so great. I think this is the kind of thing where I've presented it a number of times now, and people have thankfully a lot of their own ideas about how they could take this kind of thing and apply it to their own research, and that's really the goal.
DR. FINN: Yeah, it's a really impressive platform and seems like there's a lot more that could be included, and hopefully as you guys roll it out, people can start contributing both data, but also sort of back end contributions of how to incorporate all these other data types.
I have another question, just out of curiosity. It's again such a huge undertaking. I'm wondering how you guys think about like anatomical normalization and how you actually pull out measures from these anatomical scans and just what your pipeline for going from a T1 MRI to the measures that you have and whether that changes at all with the different ages, like are you using sort of age-specific templates, are you using templates at all? How do you guys do that?
DR. ALEXANDER-BLOCH: That's another good question. So overall, we're trying to model the effect of processing pipeline. That's one of the things we're trying to do, but with that being said, we definitely focused on FreeSurfer as a processing pipeline. And we've allowed data into the models under a few different kind of tiered systems. One is getting data ourselves, in which case we process it all in FreeSurfer version 6.0, but we also -- or if it's younger data, like from 0 to 2, we process it using infant FreeSurfer and we also allow data that other people ran with FreeSurfer and they just gave us the output in terms of phenotypes. We also -- and this was particularly important for the fetal data -- we allowed custom processing pipelines that led to phenotypes that were harmonized at the level of what the phenotypes are meant to capture.
I think what that also sort of speaks to is that it worked for total cerebral gray matter volume and constructs like that, but it would definitely be harder if you -- and hopefully when we move to finer-grained features. There's a tension for sure, like between wanting to have consistency in the methodological pipeline across all studies, and then wanting to use the best processing pipeline for the specific age range, right? Because especially in the younger ages, that's just not possible, and we've definitely, at this point at least, we think about it in terms of you want the best pipeline to be applied for the data. So it's kind of on you, on us as a sort of an attempt to harmonize and merge all this data, to use different processing, to allow different processing pipelines as opposed to saying we're going to use the same processing pipeline for all datasets even when it's not the best processing pipeline to apply to that dataset.
DR. FINN: Yeah, absolutely. That makes a lot of sense.
I don't see any more open questions at the moment. So I think in the interest of trying to stay somewhat on time, we'll move on to the next talk. But maybe Dr. Alexander-Bloch can stick around for a few more minutes. If people have more questions, feel free to pop them in the Q&A. So thanks again.
DR. SAGGAR: Great, thanks again. Now it's my great pleasure to introduce our next keynote speaker, Katy Borner. She is a Victor H. Yngve Distinguished Professor of Intelligent Systems Engineering and Information Science at Indiana University. She is also the founding director of the Cyberinfrastructure for Network Science Center at Indiana. Dr. Borner is a curator of the famous international Places & Spaces: Mapping Science exhibit. Dr. Borner's research focuses on the development of data analysis and visualization techniques for information access, understanding, and management. So without further ado, please welcome Dr. Borner.
Agenda Item: Keynote 3 – Dr. Katy Borner
DR. BORNER: Good afternoon, ladies and gentlemen. I was hoping to present in person, but if you see this, that means my originally canceled and then rescheduled flight didn't make it on time. So I'm sorry for this, but I'm very pleased to be here.
I will present on registering, visualizing, and exploring biomedical data, and this is for the NIMH Workshop on Advanced Statistical Methods and Dynamic Data Visualizations for Mental Health Studies. I have been truly enjoying the presentations by other speakers, and it's also wonderful to have everything recorded and all the slides available.
In my presentation, I will give a brief overview of the Mapping Science Exhibit. I will then go over two biomedical data projects. The first is Mapping SPOKE: 3 Million Nodes and 30 Million Edges, a humongous effort to interlink publicly open ontologies. I will then present on HuBMAP, the ambition to map the human body, the healthy adult body, at single cell resolution, 33 trillion cells, to create a reference atlas of the healthy human body.
And I will also tell you a little bit more about the Data Visualization Literacy Framework, which my team has been developing for many years now and is also actively teaching in several courses. One of them I will also entice you to participate in, which is the Visual Analytics Certificate, and I would like you to empower yourself by making your own data visualizations.
A little bit more about the Mapping Science Exhibit. Some of you might have encountered that exhibit in public libraries and science museums. It's an ambition to bring high quality data visualizations to many different environments. Here, you see it on the screen, it's the annual meeting of the Association of American Geographers, back in 2005. So the very first year that we had the exhibit on display. And since then, it has been going to many places, including also the CDC Museum in Atlanta, Georgia or Duke University in North Carolina.
The first decade of the exhibit was all about maps, static maps, and each year, we had many maps submitted and reviewed by an international team of experts, and then the best ten maps were picked, and after ten years we had 100 maps. That's a lot of maps, and there's a lot of information displayed in them. So there are a number of atlases now which explain these maps in detail, and we can actually sit down with a book in hand to explore them.
In the second decade, going on right now, we have interactive data visualizations, and some of you might like to explore those, and in total we now have 100 maps, more than four times six, 24 macroscopes, and we also have many, many display venues that we have the exhibit on display.
Among those maps, you have maps of scientific collaborations, here using Elsevier data. You have maps of NIH funding, and all of these maps are available online at scimaps.org. You also have the Structure of Science, the very first map back in 2005 that showed all of the sciences, and you can now use that as a base map to overlay for instance where nanotechnologies papers are, where proteomics or pharmacogenomics are. So you can use this like a base map of the world to then overlay additional datasets. And all the information on how all these maps come into existence are provided with the maps themselves, and I won't have time to go into details here.
You can map patents, U.S. patents, and you can get to see how they draw on prior art and how they impact future work. You can map Wikipedia and try to understand how far Wikipedia also captures mathematics, science, and technology, and there are lots of relevant Wikipedia entries on those three topics.
You can of course also map the human disease network, or the diseasome, as it's called, and this is an interactive data visualization from 2009, which still works and you are welcome to explore it in much detail.
Or, we have mapped here the history of science fiction. This is a map by Ward Shelley. It's a hand-drawn map, and it's very much like Amazon. If you find a book you like, then those which are close by you might also enjoy. So again, if you go to scimaps.org, you can zoom in and find your next science fiction book to read.
Going over to the macroscopes, you now have interactive data visualizations that are fed by live data. You have visualizations such as this one of London where I should be now, but maybe I'm not yet there, which show you how different areas in London smell. It's called Smelly Maps. It's using Twitter feed data to help you all understand what kinds of smell might exist in a certain area based on lookup tables. It also does sentiment analysis over tweets so that you get to see which areas in London are more happy, more joyful, more sad, more trustworthy, filled with anger or anticipation or fear, and I think all of these linguistic techniques now exist and you can do this for customer data or for mental health datasets, but you can of course also do it for social media data streams.
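The lookup-table approach just described can be sketched in a few lines: each word is looked up in a small lexicon and the matching category is counted per area. This is a minimal sketch, not the Smelly Maps implementation; the lexicon entries, area names, and example texts are all invented for illustration.

```python
from collections import Counter

# Hypothetical word-to-emotion lookup table (a real lexicon would be far larger).
LEXICON = {
    "happy": "joy",
    "lovely": "joy",
    "angry": "anger",
    "scared": "fear",
}


def emotion_counts(texts):
    """Count lexicon hits per emotion category over a list of short texts."""
    counts = Counter()
    for text in texts:
        for raw_word in text.lower().split():
            word = raw_word.strip(".,!?")  # drop simple trailing punctuation
            if word in LEXICON:
                counts[LEXICON[word]] += 1
    return counts


# Toy posts grouped by (made-up) area names.
posts_by_area = {
    "area_1": ["Lovely day, so happy!", "Angry traffic"],
    "area_2": ["Scared of the dark street."],
}

area_profiles = {area: emotion_counts(texts) for area, texts in posts_by_area.items()}
```

The same counting pattern would apply to a smell vocabulary instead of an emotion vocabulary; only the lookup table changes.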
So here you see the river. There's all kinds of different bridges and some of them are pedestrian bridges, and they are very different colors than those which are used for traffic as well.
Another team mapped the megaregions of the United States. They used commuting patterns from before the pandemic, and as you know, some people used to commute two hours each way, and they used these patterns to redraw the boundaries of the U.S. states, and as you see for instance for Chicago, Gary, Indiana, which is in the northern part of Indiana, just becomes part of Chicago, because so many people are moving back and forth.
Again, all of these are available online, and you can interact with and explore them. This is an effort that involves many, many mapmakers and macroscope makers, but also advisory board members, and we have been proud to serve as curators for that exhibit for the last 16, 17 years now, and again, the exhibit goes to many, many places. If you have a place which would benefit from maps, let us know. We are very happy to bring it to public places.
Going over to more precision medicine datasets, some of you might have heard of SPOKE. If you go to spoke.ucsf.edu, you will encounter a very, very large network that captures the essential structure of biomedicine and human health for discovery, and this effort is aiming to get anyone access to this data in a way that is easy to understand, not just to patients and caregivers and doctors, but also to biomedical researchers. So the investigative team is listed here. Major institutions are involved, and Sergio Baranzini is a very, very great leader for all of us together with his other co-PIs.
As part of this team effort, we developed an explorer that helps you to envision or to visualize SPOKE, 3 million nodes and 30 million edges, which federates about 19 open datasets into a public common dataset for health relevant knowledge, and if you click on that explore SPOKE button, you go over to a visualization of the many different types of knowledge that are in that knowledge graph. So you get to see that from disease you can go to symptoms, you can also go over to compounds, you can go over to proteins, from the compounds, you can go over to food and to nutrients inside of the food items.
And you can then start exploring this network, and the more connections exist between two types of knowledge or different types of nodes in this exploration graph, the thicker the link is. So as you see, from gene to disease, there are many, many linkages.
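The "thicker link for more connections" rule can be sketched as counting edges between node types and scaling the count to a line width. This is a toy sketch under assumptions: the node names, the edge list, and the width range are hypothetical and are not taken from SPOKE or its explorer.

```python
from collections import Counter

# Toy knowledge graph: node -> node type (placeholder names, not SPOKE data).
node_type = {
    "gene_1": "Gene",
    "gene_2": "Gene",
    "disease_1": "Disease",
    "compound_1": "Compound",
    "food_1": "Food",
}

edges = [
    ("gene_1", "disease_1"),
    ("gene_2", "disease_1"),
    ("compound_1", "disease_1"),
    ("food_1", "compound_1"),
]


def type_link_counts(edges, node_type):
    """Count edges between each unordered pair of node types."""
    counts = Counter()
    for source, target in edges:
        pair = tuple(sorted((node_type[source], node_type[target])))
        counts[pair] += 1
    return counts


def link_width(count, max_count, min_width=1.0, max_width=10.0):
    """Map an edge count linearly onto a drawing width."""
    return min_width + (max_width - min_width) * count / max_count


counts = type_link_counts(edges, node_type)
widths = {pair: link_width(n, max(counts.values())) for pair, n in counts.items()}
```

For a graph with 30 million edges, a logarithmic rather than linear width scale might be a more practical choice, since counts between type pairs can differ by orders of magnitude.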
You can then query this graph by, for instance, entering a food item and disease, let's say coronary artery disease, you want to understand what kind of food items are beneficial or not beneficial for that kind of disease, and thanks to synonym lookup, you can type in heart and you will still get to go to coronary artery disease, even though it doesn't necessarily have heart as a term in this query. You can then search for this pattern and you get to see it in lookup of what entities are involved. So to get from food to a disease, we can either go through compounds or through genes, and going on, you can then zoom into the landscape from the potato -- I'm very German and I like potatoes, so that is a very good choice -- to coronary artery disease, and again, you can get there either via compounds or genes.
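The query just walked through has two parts: resolving a typed-in term through synonyms, and finding the intermediate nodes that connect a food to a disease. The sketch below mimics both on a toy graph. The synonym table, the node names `compound_A` and `gene_X`, and the edges are placeholders for illustration only; they are not real associations and this is not the SPOKE query API.

```python
# Hypothetical synonym table: free-text term -> canonical node name.
SYNONYMS = {
    "heart": "coronary artery disease",
}

# Toy typed graph with placeholder intermediates (NOT real biomedical findings).
NODE_TYPE = {
    "potato": "Food",
    "compound_A": "Compound",
    "gene_X": "Gene",
    "coronary artery disease": "Disease",
}

EDGES = [
    ("potato", "compound_A"),
    ("compound_A", "coronary artery disease"),
    ("potato", "gene_X"),
    ("gene_X", "coronary artery disease"),
]


def resolve(term):
    """Normalize a search term and map it through the synonym table."""
    key = term.strip().lower()
    return SYNONYMS.get(key, key)


def neighbors(node):
    """All nodes sharing an (undirected) edge with the given node."""
    linked = set()
    for a, b in EDGES:
        if a == node:
            linked.add(b)
        elif b == node:
            linked.add(a)
    return linked


def two_step_paths(source, target):
    """Paths source -> intermediate -> target, in sorted order."""
    return [
        (source, middle, target)
        for middle in sorted(neighbors(source))
        if middle != target and target in neighbors(middle)
    ]


paths = two_step_paths("potato", resolve("heart"))
via_types = [NODE_TYPE[middle] for _, middle, _ in paths]
```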
We can then start probing this knowledge graph in many different ways. It seems to be beneficial to have two different types of nodes, a little bit more and most humans are having a harder time understanding it all, but then you can of course go through multiple different nodes in order to connect those two types. But then you might also have one of one type and multiple of another type, and this way you can query the entire knowledge graph and you can of course zoom in, very much like a Google map, and you can zoom in again to get more information ultimately. You can also then share that information with others.
This work, which was just presented, requires that you layer very large multi-level graphs, and so we recently had a lecture seminar in Germany and there is a special issue in IEEE Computer Graphics and Applications coming out on multi-level graph representations for big data in science. So if you work in that area, please consider submitting. If you are interested in using these algorithms, consider reading the papers as they become available in 2022.
The next project I wanted to introduce to you is HuBMAP, mapping 30-plus trillion healthy cells in the human body, male or female. This is an NIH-funded effort. There's also a marker paper out that is listed here. It provides more information on the overall effort.
The goal is to generate foundational 3D tissue maps of the healthy human adult body, again male and female, to establish this as an open data platform that is there, to coordinate and collaborate with other funding agencies, programs, and of course the biomedical research community at large, and to also ultimately support use cases that demonstrate the utility of this data for advancing health and biomedical research itself.
As you see here, there are different tissue mapping centers, but also TTDs and RTIs, which generate data. The data is then collected and there are many different assay types that need to get compiled and harmonized. Then data is compiled into a human reference atlas, and ultimately is served to the world so that anyone can get access except for this sensitive data, where you would need to log in.
My team is part of the HIVE, the integration, visualization, and engagement team, and I'm leading one of the two mapping centers. So if we go over, you get to see that we have many different organs. The latest count is 28 organs in the human body. So many of your favorite organs would be in that set. There are many different types of single cell and omics assays run, many multiplex spatial assays also that are then used for the atlas generation based on landmarks and also based on anatomical structures and cell types as it can also then serve as a pattern that can be used as a landmark, and ultimately so-called common coordinate framework is developed.
Zooming into this so-called CCF, you get to see that we believe that the CCF must capture major anatomical structures, cell types, and biomarkers, but also their interrelations across multiple levels of resolutions. So from the human body, down to the single cell level, that's a lot of scales. We are using functional tissue units, such as for instance here shown the glomeruli in the kidney as a way to bridge between the whole body and down to the single cell level.
We believe that the CCF must be semantically explicit but also spatially explicit. So we are working on 2D and 3D representations of major anatomical structures and cell types. Basically, it's an anatomical atlas, but you can run an API query again. It's a computable atlas. We can also identify certain cell types and ask what is in their vicinity and their immediate neighborhood. You will be able to query this atlas for what cell types are commonly found in certain anatomical structures, given certain sex and age group and disease versus non-disease, whereas HuBMAP is all focusing right now on non-disease.
Here you see one of those so-called ASCT+B tables that aim to capture the partonomy of anatomical structures, but also information on how cell types are typically located in those anatomical structures, and then ultimately these ASCT+B tables are used to create AS partonomies and cell type typologies, and those are then crosswalked over onto existing ontologies. They are also used to then create a reference object library in 2D and in 3D that represents anatomical structures and cell types.
Here you see a different representation of the ASCT+B tables, where you see anatomical structures, partonomy tree on the left and the cell types typology tree is in the middle, and then the biomarkers from genomic and proteomic biomarkers, but also lipids and metabolites and now also proteoform biomarkers on the righthand side.
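To make the table structure concrete, the sketch below models each row as an anatomical-structure path (the partonomy), a cell type, and its biomarkers, and then answers "which cell types are found in this structure?". It is a toy sketch under assumptions: apart from the kidney, glomerulus, and podocyte names, the rows use placeholder labels and are not copied from the published ASCT+B tables.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Row:
    structure_path: tuple  # partonomy path, from organ down to substructure
    cell_type: str
    biomarkers: tuple


# Toy rows with mostly placeholder names (not the published table content).
ROWS = [
    Row(("kidney", "glomerulus"), "podocyte", ("marker_1", "marker_2")),
    Row(("kidney", "glomerulus"), "cell_type_B", ("marker_3",)),
    Row(("kidney", "structure_C"), "cell_type_D", ("marker_2",)),
]


def cell_types_in(structure):
    """Cell types located in a structure or in any of its substructures."""
    return sorted({row.cell_type for row in ROWS if structure in row.structure_path})


def cell_types_with(biomarker):
    """Cell types that list the given biomarker."""
    return sorted({row.cell_type for row in ROWS if biomarker in row.biomarkers})


in_glomerulus = cell_types_in("glomerulus")
in_kidney = cell_types_in("kidney")
```

Storing the full path in each row means a query on the organ automatically includes everything in its substructures, which mirrors how a partonomy tree is read from left to right.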
We also just published 11 organs and their 3D anatomical structures. This DOI is online for free use for anyone to use, and you will get to see that there of course is a correspondence between the anatomical structures in the ASCT+B tables and the 3D anatomical structures in the 3D reference bodies.
In order to facilitate the authoring and review and validation of ASCT+B tables, my team developed the so-called ASCT+B reporter, and this reporter is also freely available online, and then you can hover over one of those nodes and you get to see how certain anatomical structures are connected to cell types and what biomarkers are commonly used to identify certain types of cells.
We now also have a new release, which supports search and table comparison, and there are a few new features that just became available. So check out the ASCT+B reporter.
Also, typically, cell tissue samples are photographed, like what you see here. In order to make this more systematic and to support the registration of tissue across different organs, we developed a registration user interface which you can use to uniquely identify not only the position, but also the AS and CTs that are commonly found in tissue block, and so if you use the registration user interface you can identify the size, the position, and the rotation of the tissue block and where it came from, but also via collision detection, you automatically get semantic annotations and IDs first over two ontologies associated with your tissue blocks.
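The collision-detection step can be pictured as a box-overlap test: the registered tissue block is one box, each anatomical structure is another, and every structure the block overlaps contributes an annotation. The sketch below is a simplification under stated assumptions: it uses axis-aligned boxes and ignores the block's rotation, and the structure names, ontology IDs, and coordinates are placeholders rather than real registration data.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    name: str
    ontology_id: str   # placeholder ID, not a real ontology term
    min_corner: tuple  # (x, y, z)
    max_corner: tuple  # (x, y, z)


def overlaps(a, b):
    """True if two axis-aligned boxes overlap along all three axes."""
    return all(
        a.min_corner[i] < b.max_corner[i] and b.min_corner[i] < a.max_corner[i]
        for i in range(3)
    )


def annotate(block, structures):
    """IDs of every anatomical structure the tissue block collides with."""
    return [s.ontology_id for s in structures if overlaps(block, s)]


# Toy reference organ made of two adjacent placeholder structures.
structures = [
    Box("structure_A", "ID:0001", (0, 0, 0), (10, 10, 10)),
    Box("structure_B", "ID:0002", (10, 0, 0), (20, 10, 10)),
]

# A block straddling both structures, and one fully inside structure_A.
block_1 = Box("block_1", "", (8, 2, 2), (12, 4, 4))
block_2 = Box("block_2", "", (1, 1, 1), (2, 2, 2))

tags_1 = annotate(block_1, structures)
tags_2 = annotate(block_2, structures)
```

Handling rotation and real organ shapes would need oriented boxes or mesh intersection; the axis-aligned test is just the simplest instance of the idea.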
We now have that for many different organs. Four of them are shown here. We also get to see in the lower righthand part here how different tissue blocks were extracted from here, kidney and spleen.
You can then use that data in the exploration user interface, and you can zoom into the human body, and you get to see the tissue blocks that were registered and you can select one, and you can get to see that there are tissue blocks from HuBMAP but also from the Kidney Precision Medicine Project, and you get some more information and you can click on one of those and go over to the Vitessce tissue browser from Nils Gehlenborg's team at Harvard University, Harvard Medical School, to explore these tissue sections in more detail.
If you wanted to learn more about the HuBMAP project and the new types of data that now become available, you can go over to the Visible Human MOOC, which is a free massive open online course, which has many different learning modules that introduce HuBMAP, its data, technologies, but also some of these interfaces I just showed. Some of the modules are seen here.
This is a massive team effort involving not only many tissue mapping centers, but also a large team here at IU, and of course we would like to thank all the patients that agreed to volunteer healthy tissue and open use of their data.
The last ten minutes, I wanted to introduce the Data Visualization Literacy Framework to you. This is a framework we have been using and optimizing and refining over the last 15 years. I have been teaching data visualization courses at IU for 17 years now, and when we started, there was really no general guidance for this, and it was hard to then teach it. So over the many, many years here, we had an opportunity to first of all agree on different types that are needed, but also on names and terminology.
Specifically, we believe that data visualization literacy, the literacy to make and explain data visualizations, actually requires that you are able to have real literacy. You are able to read and write text. You have visual literacy, the ability to find, interpret, evaluate, use, and create images and visual media, but of course also mathematical literacy. We believe that you need to not only be able to read data visualizations, but you actually benefit from making visualizations, because it's only then that you truly understand how they come into existence and how they could be used or abused in some cases.
So the framework also then focuses on reading and construction. We take human perception and cognition into account. We try to build on prior work in cartography, psychology, cognitive science, statistics, scientific visualization, data visualization, learning sciences, there's really a lot of good work that has been done, pioneering work which can be used for such a framework, and we wanted to have a framework that's theoretically grounded, practically useful, and easy to learn and use. Plus, it needs to be modular and extendable, because new algorithms, new datasets, new challenges, new use cases become available on almost a daily basis.
Here is the development process, and in the interest of time, maybe I just introduce the framework itself to you.
The framework has two parts, DVL typology, so just types and instances, and proper names and explanations and examples for each. And then there is the workflow process of how you actually go about making a visualization. So here you see seven types, and you have some workflow on the right, and you can then overlay these seven types here. So you have stakeholders on the left-hand side. You have insight needs by those stakeholders. You then acquire the best data that you can afford or that you have time for or that you can budget for and that you can get your hands on. It's really important to have good data.
Then you have data scale types here. Then you analyze that data; there are different analysis types. You visualize the data. You have different visualization types that come with different graphic symbol types and different graphic variable types, and then you deploy your visualization and, again, there are different interactivity types. Then ultimately, you interpret your visualizations. You might realize that one year is missing or there's some erroneous data in there, or you get really interested to zoom into an area, and so you need to get more data for that high resolution inset, so to say.
Oftentimes, if you really did a good job as a data visualization expert, your stakeholders will have new questions. They will look at the data and they will see it in a way that they have never seen their data before, and they will come back with new questions.
So by going through this process again and again and again, you have an iterative refinement of your data visualizations, and it's very important to understand that there is this operationalization project step here from stakeholder needs to what can be operationalized, and at the end, your data visualization has to be translated back to stakeholders so that they know what the next action should be on their end.
You also will see that the visualization types could be mapped, could be a scatter plot, and then we overlay different graphic symbol types, nodes and edges here, and the graphic variables are used to size and shape and color-code your variables, graphic variables, based on the data variables.
So we have developed a Data Visualization Literacy Framework which makes it easy for you to upload data to then make a visualization and then pick a visualization type, then select a graphic symbol type, and then also to select a graphic variable type. So it implements the framework itself, helping students to understand how you get from your data variables to the graphic variables.
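The step from data variables to graphic variables can be sketched as a small encoding function: a quantitative data variable is scaled to a size, and a categorical data variable is looked up in a color table. This is a minimal sketch, not the tool described in the talk; the record fields, the color palette, and the size range are all hypothetical.

```python
def linear_scale(value, domain, output_range):
    """Map a value from a data domain onto an output range, linearly."""
    d0, d1 = domain
    r0, r1 = output_range
    if d1 == d0:  # avoid division by zero when all values are equal
        return r0
    t = (value - d0) / (d1 - d0)
    return r0 + t * (r1 - r0)


# Hypothetical palette for a categorical data variable.
FIELD_COLORS = {"biology": "green", "physics": "blue"}

# Toy data records: "papers" is quantitative, "field" is categorical.
records = [
    {"label": "A", "papers": 10, "field": "biology"},
    {"label": "B", "papers": 50, "field": "physics"},
    {"label": "C", "papers": 30, "field": "biology"},
]


def encode(records, size_range=(4.0, 20.0)):
    """Turn data variables into graphic variables (size and color) per symbol."""
    values = [r["papers"] for r in records]
    domain = (min(values), max(values))
    return [
        {
            "label": r["label"],
            "size": linear_scale(r["papers"], domain, size_range),
            "color": FIELD_COLORS.get(r["field"], "gray"),
        }
        for r in records
    ]


marks = encode(records)
```

The resulting `marks` list could then be handed to whichever drawing library places the graphic symbols in the chosen reference system.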
So now a deep dive into these seven types. So you have different insight need types. If you already know what insight needs you have, that makes it much easier to then pick all the other elements. For instance, if you have a geospatial question, a where question, then oftentimes map-based visualization and geospatial analysis are most relevant. If you have a temporal question, a trend question for instance, then temporal analysis is more likely what you'll need. Oftentimes graph visualizations are then relevant, but in some other cases you want to animate networks over time, because you want to see the evolution of scholarly work, for instance.
Then you also have these graphic symbols and graphic variables, and there are some examples of those, and then ultimately you have the interactivity types.
For the insight need types and for all the other types, we have built on prior work. So here you start with Jacques Bertin in 1967, who identified four different insight need types, and then there were many after him, pioneering works, that tried to bring order to all of this, and then on the righthand side, you have the types of insight needs that are captured in the DVL Framework.
Number four, you have different visualization types and I'm just zooming into some of those. So here you have charts, you have graphs, you have maps, you have trees and networks, and to be honest, it's not the case yet that experts would agree on this kind of classification. Some might actually think that a pie chart is a graph. So there are still discrepancies among experts on this, but I think you have to bring order to the language and terminology before you can teach these things and before you can empower others.
We are also very adamant about the fact that it's important to agree on a reference system. So here you see different reference systems, for instance a table, a graph, a map, and a network. If you have these reference systems, you can then see that all of them actually have an x-y aspect to them. So you have a column and a row. You have an x and a y axis. You have latitude and longitude, and even in the network, which is laid out so that there are few edge crossings and the distances correspond to similarity between nodes or to distances between nodes, still, you can actually freeze your layout and you still then can refer to a node using x and y positions.
So as soon as we have that reference system, you can then overlay other data, and that's also shown here: again in the lower part we see a reference system, and then there are different data overlays, first the graphic symbols, and then the graphic variables.
Let's go look at them in more detail. You can actually take those two, the graphic symbols and variables, and try to understand what they are, first of all. So here graphic variable types, you have position, you have form, you have color, optics, texture, and motion, and many of them are supported by today's data visualization tools, but not all of them. As you also know, we know from cognitive science and psychology literature, which ones are easier to distinguish and which ones are harder, and which ones are more accurately distinguishable by human beings. So position we are very accurate, angle not so much. That's why the pie charts are harder on us.
You can then take the graphic variable types and the graphic symbol types and create these tables, and you can even identify which ones of those graphic variable types are more qualitative or quantitative, and these tables are rather large. So in the Atlas of Knowledge, you would get the entire set of details and all the examples if you are interested to learn more.
So last but not least, I wanted to make sure that you saw that there is an entire course made for busy professionals to empower yourself and others to become more literate in terms of data visualization literacy. The next course will start September 20th this year, and you will have different experts present to you different data visualizations. You will have your own my project, where you bring your own data and you visualize it in many different ways, and then you also get to work on my project and get feedback of course on the way from many of us. So we have many U.S. employers presenting this to their students, and we had the true pleasure to have students from The Boeing Company, from Lilly, DOE, CDC, and many others in this course, really helping us also in creating wonderful expert networks, and they're still in existence, long after the course has ended.
So please go to visanalytics.cns.iu.edu and check it out, if it is for you. It's a six-week course, and we would welcome you.
If you like books, there are now three atlases, the Atlas of Science, of Knowledge, and of Forecasts. The trilogy is completed, and of course there are quite a number of other books as well, including textbooks. So feel free to check those out as well.
I think I'm out of time now. Thanks so much, and I would be happy to answer questions; again if I haven't joined by now, please send it to Katy@indiana.edu, and I will follow up later on. Thank you all.
DR. CHEN: Great. Thanks to Katy. She was unable to get here due to her delayed flight, unfortunately, but as she said, if you would like to contact her with your questions, it sounds like she would welcome them.
So I'm going to introduce our next speaker, Dr. Lindsay Zimmerman. I'm delighted to introduce Dr. Zimmerman, who received her PhD in clinical and community psychology at Georgia State University and then completed her postgraduate training at the University of Washington School of Medicine. She is now a clinical and community psychologist and implementation scientist in the Office of Mental Health and Suicide Prevention at the National Center for Posttraumatic Stress Disorder. She also holds affiliations at the University of Washington and Stanford University schools of medicine, as well as her position at the Veterans Affairs Palo Alto Healthcare System.
In Dr. Zimmerman's work, she leads research efforts which use participatory system dynamics to increase timely patient access to evidence-based pharmacotherapy and psychotherapy for depression, PTSD, alcohol, and opioid abuse disorder. Please join me in welcoming Dr. Lindsay Zimmerman.
Agenda Item: Keynote 4 – Dr. Lindsay Zimmerman
DR. ZIMMERMAN: Thank you, Dr. Chen. I'm really excited to pick up where the last talk left off with a use case that I hope will be interesting to folks. The talk here, Modeling to Learn: Test. Don't Guess., comes out of my work as an implementation scientist. I'm joining you all from Silicon Valley, California, to talk about this national quality improvement initiative in the VA and how we're encouraging staff at the point of care to test out their heuristics using dynamical and interactive team resources. So specifically multidisciplinary teams of nurses, social workers, psychiatrists, psychologists, and so on.
I work with an amazing group of scientists and partners across the country that make this possible. If you're interested I'll use my mouse to highlight down at the bottom under the purple map, if you're interested in a little bit more about who we are and what we do, you can check it out at mtl.how/team.
So as an implementation scientist, we're always wanting to reach more patients at the point of care with the research evidence that so many researchers at the National Institute of Mental Health and NIH in general work so hard to produce, and this open source comic from XKCD, you may recognize the pattern of vaccine results published by Pfizer earlier this year to combat our pandemic. The joke is the tip in the caption. Always get data good enough that you don't need statistics.
Well, I'm not going to bury our lede. I want to kind of start with what we've been doing for the last six years and show some things in a static way in the deck and then show things in an interactive way at the end. I know we're running about 10 minutes late, so I'll try to keep a good clip here to cover this work.
So first of all, these resources, Modeling to Learn, are effective by quality improvement standards. What do I mean? So if I just show this figure here, we've been looking at when we do these interactive and dynamic data visualization exercises with these multidisciplinary mental health provider teams at the point of care in their outpatient clinics, then if you look here where the y-axis is showing what proportion of the patients are getting evidence-based psychotherapy around these clinics and you're seeing several years of time across the x-axis, this is statistical process control. So used to determine when you've brought a system process to sort of a new case. Of course, you would normally be adjusting these upper control limits that reflect a 3-sigma improvement. So we're seeing that among other clinics that may share some staff and the same leadership in a regional healthcare system, clinics that used modeling depicted in the middle and righthand panel were able to in some cases double or triple the number of veterans, really in this case, it's like a 15-fold increase, and maintain that improvement over months.
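As an illustration of the control-chart logic described here, the following is a minimal sketch, not the analysis behind the figure. It assumes a proportion chart (p-chart) with 3-sigma limits computed from a baseline period; the function names and all monthly counts are hypothetical.

```python
import math

def p_chart_limits(baseline_counts, baseline_totals, sigma=3.0):
    """Return (center line, lower limit, upper limit) for a proportion chart.

    Limits use the average monthly sample size, a common simplification.
    """
    p_bar = sum(baseline_counts) / sum(baseline_totals)
    n_bar = sum(baseline_totals) / len(baseline_totals)
    spread = sigma * math.sqrt(p_bar * (1 - p_bar) / n_bar)
    return p_bar, max(0.0, p_bar - spread), min(1.0, p_bar + spread)

def points_above_ucl(counts, totals, ucl):
    """Indices of months whose observed proportion exceeds the upper limit."""
    return [i for i, (c, n) in enumerate(zip(counts, totals)) if c / n > ucl]

# Hypothetical data: patients receiving evidence-based psychotherapy per month,
# out of all patients seen that month.
baseline_counts = [4, 5, 3, 6, 4, 5]
baseline_totals = [100, 100, 100, 100, 100, 100]
later_counts = [5, 14, 16, 15]
later_totals = [100, 100, 100, 100]

center, lcl, ucl = p_chart_limits(baseline_counts, baseline_totals)
signals = points_above_ucl(later_counts, later_totals, ucl)
print(f"center={center:.3f} lcl={lcl:.3f} ucl={ucl:.3f} signals={signals}")
```

A sustained run of points above the upper limit is the kind of signal that would prompt recalculating the limits around the new level.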
So just starting there, modeling to learn with interactive and dynamic tools can lead to those types of real-world effects at the point of care. My goal is to answer some questions during this talk about why we think this works. So what works to increase the reach of evidence-based psychotherapy and pharmacotherapy? Assuming that most of us who are here today sort of are already involved in trying to make these data resources more interactive, more dynamic, I'd like to talk about pushing ourselves to think about who needs to be involved in developing these, and in particular, why participatory learning works to upgrade decisions that healthcare providers, mental health professionals, make in really dynamically complex environments.
Because I know that I can't represent this entire process in this amount of time, I just want to point you to a website that's available to you, mtl.how/demo. If you use this course code here, nimh_2021_data_vis, all underscores, all lowercase, then you can play around with this now during this talk. You can play around with it later. The code just keeps it open for longer for you. So you can do that.
There's information there; there's videos from veterans with lived experience in recovery talking about what they hope their providers will get from doing modeling to learn, and if you have any questions that we can't get to with the chat, I know I said we're running behind, email us firstname.lastname@example.org, and myself or someone from the team will respond right away.
So as we go through, I'm going to answer the questions I outlined for the audience, but I'm also going to ask you some questions assuming that we all know how important it is to make these tools more interactive and dynamic for people to actually get the insights they need.
So I'd just like to point out some orienting questions back that we've used to help people think about this, if, say, you haven't yet tried to measure whether people's interaction with your resources is working the way you intend or you haven't scaled it at national production in healthcare, or maybe you haven't thought about the implementation science side of this problem before. So I'm just going to use these questions throughout to kind of encourage you and prompt your thinking.
So for an implementation scientist, we have to think right away what's our working definition of our mental healthcare problem, and Dr. Borner talked about how problematic it is in terms of graphic variables and our data literacy skills, to use pie charts. This is kind of like the typical pie chart of the implementation improvement scientist. The question is sort of how do we get more patients to our highest quality care, and in our case on our team, we're talking about evidence-based psychotherapies for PTSD, depression, alcohol use disorder, and opioid use disorder. If we cover those four presenting concerns, we're covering about 80 percent of the reasons that people come into outpatient mental health care, and we have very strong evidence-based pharmacotherapies as well as psychotherapies for those presenting concerns.
The question is why, depending on how you slice it, you might get only one out of three of your patients to that.
By the way, I'll tell you, this is common in healthcare. This isn't unique to the VA by any means. And when I say depending on how you slice it, this is where dynamic and interactive data becomes important, because we mean over time. We mean do they get one touch, like they've even had a possibility of getting exposed to this treatment? Do they actually engage in care over time in a way that would meet their need?
These all involve the dynamic aspect or the time component, and most frontline staff don't have insights about how these things work over time in a way that they can access at the point of care in the clinic.
So we also think Modeling to Learn is effective by implementation science standards, where the difference is quality improvement is improving things globally overall in these local hospitals, and implementation science is what makes it work everywhere? Could we create generalizable knowledge about how people can interact with these modeling to learn resources we developed and could it work everywhere?
I just kind of want to show a few more static views that are better than the pie chart but still not good enough. So you'll see if you're studying dynamical systems, you're going to see some really common behaviors here, where I've taken on the left and we've put initiating an evidence-based practice like psychotherapy or pharmacotherapy on the y-axis and completing it, and if you know that you have an inability to wave a magic wand and grow staff where they don't exist, in rural Utah, or create hours in the day in a busy high-volume clinic in St. Louis, then you know you're going to see these characteristic behaviors of systems that include oscillations whenever there's a balancing feedback and a quantity that needs to be conserved.
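The peaking and troughing described here can be reproduced with a very small model. The sketch below is an illustration only and is not one of the Modeling to Learn models: it assumes a single caseload that a team steers toward a target using reports that are several weeks old, and the names, gain, and delay are hypothetical.

```python
def simulate_caseload(weeks=80, target=100.0, gain=0.3, delay_weeks=4, initial=40.0):
    """Balancing feedback with an information delay.

    Each week the caseload changes by gain * (target - caseload as reported
    delay_weeks ago). Acting on stale reports makes the caseload overshoot
    the target and oscillate instead of settling smoothly.
    """
    history = [initial] * (delay_weeks + 1)
    for _ in range(weeks):
        reported = history[-1 - delay_weeks]  # what the team sees today
        history.append(history[-1] + gain * (target - reported))
    return history[delay_weeks:]  # week 0 onward

def count_target_crossings(series, target):
    """Number of times the series moves from one side of the target to the other."""
    above = [value > target for value in series if value != target]
    return sum(1 for a, b in zip(above, above[1:]) if a != b)

caseload = simulate_caseload()
crossings = count_target_crossings(caseload, 100.0)
print(f"peak={max(caseload):.1f} trough={min(caseload):.1f} crossings={crossings}")
```

Setting delay_weeks to 0 in this sketch removes the oscillation, which is one way to see that the delay, not the staff, produces the pattern.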
So you'll see some sites where they're just kind of peaking and troughing, and they really have no insight into what is causing this, because they've been flying blind. They don't have the resources that they need to understand these patterns and know when they should change course or even how to take those patterns and move it in a trend that they'd like to see.
And this is really disempowering. It's not only bad for patients, but I'm using another joke here from XKCD where the first panel says I used to think correlation implied causation. I took a statistics class, now I don't. Sounds like it helped. Well, maybe.
And yet, as some of the other talks have pointed out today, these learners, these busy professionals, need to understand whether something is likely to work to the benefit of their patients in their local clinic right away, and they have limited time to get those insights.
So I would encourage all of us if we're hoping that more and more people will catch on to what we're building, think about who needs to be engaged early to refine and co-define those tools that you're building. How would you engage them and what makes it scalable and feasible for them?
We concluded that we really needed to establish infrastructure for participation in co-defining problems and modeling terms and that that would actually be something that, given that models are always an approximation, we would be doing for years, that we would just continue to be moving toward better and better tools to upgrade these team decisions.
This really comes out of a participatory epistemology where if we're thinking about complex problems and we only have ourselves and our favorite colleague at the computer coding, we're probably missing some very important things that each stakeholder brings to understanding what the common dynamics of a system problem are. So we worked in our project over the last several years, we still do this, with all kinds of offices from policymaker levels to certainly provider levels, and we have a veteran advisory patient group and so on, across all the disciplines, to try to continue to further calibrate and refine and validate how these models can be used to improve care.
We didn't do that in a vacuum. If you are not familiar with the system dynamics tradition out of MIT, I do recommend and commend to you Scriptapedia. When we started this work, I was not agnostic to the idea that complex systems probably needed lots of stakeholders in the room to select features and to improve the dynamics that were addressed in the models, but fortunately, especially if you're early career, as I was when I started this work, there are off-the-shelf scripts for how to develop models of system dynamics problems and so forth at Scriptapedia, and I commend them to you that you could apply to some of the problems you might be focused on in your work.
If you're curious about, like, what it looks like when we very first started that, I just will point out this paper from six years ago where we talked about the quantities that staff really wanted included in their models. So when we were trying to figure out how to reach more veterans with these evidence-based practices, part of their concern was the places where patients were accumulating in undesired states of care, having extra stops, and as you can see here, they've even really focused on the difference between how they allocate their time, like their clinical schedule, versus how they allocate their time in terms of what they actually do. For example, consulting with each other, calling a caregiver, if you're worried about lethal means, access to lethal means for suicide, and a number of other ways in which the quantity of time is critical to improving things locally and absolutely hard to access without computational power and interactive tools that show how things change over time.
I think it's really important for us to be thinking about that end user of the model from the very beginning. So we're talking often about who will use these resources and what decisions do they make in what decision-making context. So are we talking about people that are responsible for patients? We are in this project, and although we work with people from Washington, D.C. on down, we're not focusing on the decisions they make about policies and dollars and so forth, but we're really careful to think through how learning from these interactive tools maps to the decisions our decisionmakers can make.
And most of the time what people have in healthcare quality improvement is a bunch of retrospective data reviews that tell them how well they've been doing against some sort of benchmark or standard, and when people see something that they like, that's good. When they see something they don't like, obviously that's bad. When they have no idea what causes either, it's really bad, because what happens is even when you see something you like in the data, if you don't know how it emerges over time, if you can't understand those causal feedbacks and dynamics that contribute to that system behavior in your team, even when you see something you like, you might be afraid to try another improvement and that can actually prevent any solutions.
So we've developed both data user interfaces and simulation user interfaces, mainly related to the regulated data stores we use in terms of why they're not fully integrated yet into one platform, and we encourage them to use simulation to try to explore all those questions they would like to explore about what if we tried this locally, what if we made this decision in our clinical practice.
So if we're thinking about who needs that ongoing decision support, we really started getting local fast. So I'm not going to read all the details of this table. I just kind of want to highlight some ways in which the bold rows might differ clinic from clinic, the different demand for your services. You have a different local mix of providers, not all providers provide every evidence-based mental health service, and so you're always trying to take this dynamic, constantly changing local context, and it would be reasonable for clinic 2 to just doubt whether something that worked in clinic 1 works for them, because they would be fully aware of what all these differences are, but they would not be made manageable or tractable for their decision-making.
So I really want us all to kind of keep going and just be part of the voice with all of you, because I know that's why you're here, about thinking about ways to make this accessible and transparent and why scaling it is important at the point of care, not just in our research grants but even research grants that occur at the point of care in our health systems.
So we want this to be empowering. We're trying to upgrade the decisions that healthcare teams can make. Why we think this will work, this is an adaptation of John Sterman at MIT's, what he calls double loop learning, and if you're not familiar with this, I really commend the American Journal of Public Health paper from 2006 that will be in the references at the end called Learning from Evidence in a Complex World.
I'm just going to give a quick overview. You may not know that they have been writing about graphical user interfaces, plain English definitions of your partial differential equations, in the system dynamics tradition, genuinely for decades, and the distinction that John Sterman makes is that it's really, really hard to make decisions when in the real world you have all of this dynamic complexity and time delays between your actions and your ability to observe them.
As a result, as a learner, I'm using my mouse work to trace here, if you can follow it, I'm in the information box -- in the real world, the cloud on the righthand side means you have missing data. You have observations about some things and other things with the delays, it's hard to know that the quality improvement initiative you did in the clinic in March explains what you're seeing in October, and something that should strike fear in all of our hearts as patients but we know is totally true as providers, our providers are making decisions all day every day by heuristics that may be flawed.
So system dynamicists have cited Herbert Simon and his Nobel prizewinning work on bounded rationality, and said, listen, without being able to upgrade these decisions safely, via simulation learning, then it's really scary in the VA to make a mistake. When we have high risk of suicide, we have relapse and other chronic impairments that are top of mind for providers, it's very scary to just learn by trial and error.
Fortunately in a virtual world you can control experiments. You can get complete real-time immediate feedback about the impacts of decisions, which is critical to learning and generalizing your learning. Much as we as children benefit from our pain receptor system's efficiency in telling us don't touch a hot stove, oh, and maybe that generalizes to the campfire and to that hot radiator over there, this is what providers who are busy in a clinic need, is complete, accurate, immediate feedback where no veterans or patients are harmed, in order to make more correct inferences about what's likely to happen over time in their common problem.
And that's why with Modeling to Learn, learning can be the goal and it can be safer, which for any of you who do clinical work in one of your hats, you may realize how disempowering the quality improvement infrastructure can be, where it's evaluative and learning is not the goal.
So we really started focusing on frontline teams making evidence-based practice care decisions, and we realized that learning from modeling conferred several advantages where people are not able to adapt to those dynamic decisions, and they're not able to coordinate their mental models, and they're also not really able to evaluate their EBP-specific constraints, like how much time in the day they have or how much staff they have to deliver a given EBP to a patient.
So we really encourage the teams to consider the physics of their own local problems, where the main constraint is their time. So most teams, and I've adapted this slide because my mentees have told me it's helpful, but most of our frontline staff, social work, nursing, psychiatry, psychology, peer support specialists, it's not clear to them that you could generalize the same modeling infrastructure and same data sources and come up with a local solution. I will often grab whatever's on my desk, a stapler, a piece of paper, and say, well, you know, we rely on this all the time. We rely on system engineering to help us understand how if I were to chuck both of these across the room right now, then they would land absolutely in different places. They would follow absolutely different trajectories.
But the variables that I should account for in accounting for that flight path over time are the same, and we should be able to account for the differences in their math, their aerodynamics, and come up with insights that can help upgrade our decisions, in your own team, in your own clinic. In fact, we might be able to do it faster than just the really disempowering trial and error.
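The stapler and paper analogy can be made concrete: one model, the same variables, different parameter values. The sketch below is illustrative only; it assumes simple linear air drag and hypothetical masses and drag coefficients.

```python
import math

def landing_distance(mass_kg, drag_coeff, speed=6.0, angle_deg=30.0,
                     height=1.0, dt=0.001):
    """Horizontal distance travelled by a thrown object before it lands.

    Uses Euler integration with drag proportional to velocity. The model is
    identical for every object; only mass_kg and drag_coeff differ.
    """
    g = 9.81
    vx = speed * math.cos(math.radians(angle_deg))
    vy = speed * math.sin(math.radians(angle_deg))
    x, y = 0.0, height
    while y > 0.0:
        ax = -(drag_coeff / mass_kg) * vx
        ay = -g - (drag_coeff / mass_kg) * vy
        vx += ax * dt
        vy += ay * dt
        x += vx * dt
        y += vy * dt
    return x

# Hypothetical parameter values for the two objects.
stapler = landing_distance(mass_kg=0.3, drag_coeff=0.01)
paper = landing_distance(mass_kg=0.005, drag_coeff=0.05)
print(f"stapler lands at {stapler:.2f} m, paper at {paper:.2f} m")
```

The clinic analogue is initializing one shared model with each team's local values rather than building a separate model per clinic.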
So it's empowering to think that something that currently just feels bigger than you at the point of care in the healthcare system might actually be the accumulated result of your own decisions. It's just adding up to something that maybe nobody wants. It's better to actually be able to get far more nuanced understandings of the tradeoffs and trends and take that timescale of two years and rather than wait until two years have gone by and realize, oh, my gosh, we've just got more turnover because we've just got more burnout because we've just got more inability to deal with the dynamic complexity of care here, and instead, start to get insights about time-based patterns in a clinic meeting, in a staff meeting.
So for example, here you can see I'm using the left upper panel and my mouse work to talk about the time between visits, frequency of return visits, and in this case the red line here was talking about a team that currently wasn't able to get people, due to low resources, in for psychotherapy any faster than every 16 weeks, every four months, which as we know is not our evidence base for cognitive behavioral therapy, for prolonged exposure, any of our evidence-based psychotherapies typically would want near weekly or in some cases even a more intensive frequency of visits.
So the question is we know it's going to get worse, but how long would it get worse before it got better, or how would it impact the other services? Those kinds of questions, most of our providers know the complexity of the problem, but they have really not been trained in understanding the dynamics.
For those of us working in this space, I think it's really important for us to always look for those common data sources that can feasibly scale so people are not reliant on us in the future, and think about very carefully what are the units that really matter to our decisionmakers. So in our case in healthcare, we benefit from looking at data sources that are common across all of U.S. healthcare, coming from the Centers for Medicare and Medicaid Services. So basically if you're interested in the models, you can test them. You can try them in your healthcare system. If you use Current Procedural Terminology for encounters, you can use it. If you have a scheduling system that tracks time, you can use them.
But what's really powerful for the team, again sort of sticking with a static view for a moment, is being able to initialize this with local values and tell a system story about the states of care where patients accumulate and about the rates of change that govern the transitions between those states of care. Thinking about complexity feedback and change over time.
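A system story of this kind, with states of care where patients accumulate and rates that move them between states, can be sketched as a small stock-and-flow simulation. This is a simplified illustration, not the Modeling to Learn model: the three states, the weekly time step, and every initial value are assumptions.

```python
def simulate_states_of_care(weeks, referrals_per_week, start_capacity_per_week,
                            weeks_in_treatment, waiting, in_treatment):
    """Track patients across three states: waiting, in treatment, completed.

    Flows per week:
      referrals   -> waiting
      starts      waiting -> in treatment (limited by start capacity)
      completions in treatment -> completed (stock / average duration)
    """
    completed = 0.0
    rows = []
    for week in range(1, weeks + 1):
        starts = min(waiting + referrals_per_week, start_capacity_per_week)
        completions = in_treatment / weeks_in_treatment
        waiting += referrals_per_week - starts
        in_treatment += starts - completions
        completed += completions
        rows.append({"week": week, "waiting": waiting,
                     "in_treatment": in_treatment, "completed": completed})
    return rows

# Hypothetical local values used to initialize the model.
rows = simulate_states_of_care(weeks=52, referrals_per_week=12.0,
                               start_capacity_per_week=10.0,
                               weeks_in_treatment=12.0,
                               waiting=30.0, in_treatment=80.0)
final = rows[-1]
print({key: round(value, 1) for key, value in final.items()})
```

Because patients only move between states, the total across the three stocks always equals the starting total plus referrals, which is a useful sanity check on any model of this form.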
So what does that look like? Well, we have a lab notebook that everyone interacts with when they go into the user interface, and we also coach them through this process of clarifying what learning question they have, their dynamic hypothesis. So if we made these new decisions in the clinic, then we expect this over time. Of course, what we're trying to do is move people from very simple linear cause and effect understandings to more complexity, to move from like stories that have a clean line like a beginning, middle, and end to stories where there's a loop. So the beginning can come back and influence, the end can influence the beginning, as effects are reinforced or attenuated over time, and to really think in movies, as others have talked about today, rather than snapshots. So we just guide them through this interface, clarifying what they find and thinking about how they could upgrade some of the decisions they make all day every day.
The theory of change that we're testing in two large multisite national phase III cluster trials. Let me say that again. Multisite implementation trials and they're phase III cluster randomized trials. So we're randomizing in our NIH R01 some sites to just get interactive dynamic data tools without the what-if scenarios and the simulating the two years ahead. So that key idea of being able to grasp what's likely ahead of us and what's the impact of our decisions, we're trying to isolate those what-if scenarios over time and instead just look at the value of the graphics and the visual interfacing with their local data. Two-arm trial, and we're going to be measuring changes in system thinking among the staff.
That's consistent with the system dynamics literature where there's lots of evaluation of the ability to increase your skills, managing complexity, feedback, and system behavior over time. But there's also just the possibility that when you can go into an interface and interact with it naturally, then maybe you're just engaged more, and if you don't have prebaked solutions set up by your auditors or your accreditation bodies or your quality improvement business office, if it's really something that you can use towards your own ends related to the stressors you have in your clinic, maybe that's just what's driving it, the ability to engage in a mutual learning process. So we're actually isolating some other aspects of the user interfaces in our VA trial to test that out.
I'm not going to go through all of this, but I'll just kind of give you some snapshots before I kind of wrap up and show a little bit or see what questions we have. So we developed a secure website where teams could start to get a sense of trends over time. That's kind of where we begin in our partner phase, and then we start to ask them what types of questions they would really be bummed if there was all of this, like, Cadillac souped-up technology, and they didn't get an answer to.
Usually what people know is they know the complexity, but they don't know the dynamics. So they'll have a question like we know we should be getting more patients with opioid use disorder medications, but we only have some providers with these Drug Enforcement Administration X-waivers who can provide that care. It's a smaller population that we think should come back more frequently to assess for therapeutic response to methadone or buprenorphine, and of course the modal presenting concern we have around here is we have a bunch of veterans at high risk for suicide with depression and high risk for relapse with alcohol use disorder, and we have the same set of providers in this town to deliver these evidence-based pharmacotherapies.
So we know there's these tradeoffs here. We know populations vary. We know that how often they should come back to be evaluated vary, but we have really no way to develop a good insight about exactly what we should do with our local staff mix to serve this local community.
And it's important that, against this base case, the staff are able to experiment with a lot of different changes they might make. So we initialized the model with the local data, and they experiment up or down from the base case to see what they think is feasible.
The exciting thing is that you can start to move people from sort of hypotheses that they have that are sort of scary to them, like you might have a prescriber say I really know that we have an opioid crisis in the United States, but I am really, really burned out and I'm really, really, worried that if I start taking on more patients with opioid use disorder medication needs that I'm just going to be overwhelmed and swamped and it makes me not even want to think about this problem. I do like having some well-managed depression sometimes on my caseload, or other fears.
So the nice thing about being able to explain things in terms of feedback dynamics and being able to explore them safely in a simulation is that you can kind of address those fears and help people think about those tradeoffs and they might find a smaller or lighter lift makes a big difference.
For example, for this team, this is one of our real teams, they found that if they just reallocated about 20 percent of their x-waivered slots from their depression patients to focus on medication for opioid use disorder, that actually some of their patterns due to all the balancing feedbacks in the system would actually even right back out over time, and they could get through their backlog with a relatively smaller change than they feared. And they concluded basically exactly in this relatively small team, how many referrals (indiscernible) over time.
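A what-if experiment of this kind compares a base case against a changed allocation of the same fixed resource. The sketch below is a deliberately simplified queue model with hypothetical numbers. It is not the team's model and does not reproduce their result; it only shows the shape of the comparison.

```python
def backlog_after(weeks, backlog, referrals_per_week, slots_per_week):
    """Backlog remaining after a number of weeks, never dropping below zero."""
    for _ in range(weeks):
        backlog = max(0.0, backlog + referrals_per_week - slots_per_week)
    return backlog

def run_scenario(shift_fraction, weeks=26):
    """Move a fraction of weekly depression slots to opioid use disorder care.

    All slot counts, referral rates, and starting backlogs are hypothetical.
    """
    depression_slots, oud_slots = 40.0, 10.0
    moved = shift_fraction * depression_slots
    return {
        "oud_backlog": backlog_after(weeks, 60.0, 12.0, oud_slots + moved),
        "depression_backlog": backlog_after(weeks, 10.0, 30.0,
                                            depression_slots - moved),
    }

base_case = run_scenario(0.0)  # current allocation
what_if = run_scenario(0.2)    # shift 20 percent of depression slots
print("base case:", base_case)
print("what if:  ", what_if)
```

With these assumed values the shift clears the opioid use disorder backlog without creating a depression backlog; with different local values the tradeoff could go the other way, which is the reason to test rather than guess.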
So you can see here, you're zooming in and out on an entire flow of patients and trying to help them connect the causal dynamics to the system behaviors they produce.
So wrapping up, I do want us to think about our long game, about how decisionmakers will interact with these things. One of the things I was finding was that the biggest sell was that people could find these insights faster with simulation learning. So we developed a 12-session plan that's now accredited by all those disciplinary bodies in psychiatry, psychology, social work, and so on, and we basically join a team huddle twice a month for six months to walk them through interacting with their data, learning how to tell causal systems stories and running through a number of decision scenarios they'd like to try to improve what's going on in their clinic.
I think for many of us what will be sort of the state, for those of us who already know how important interactive and dynamic data is, is to clarify not only what needs to be in those graphic variables or what time dimensions and units need to be accounted for, but exactly how we expect people to interact with them and what we think will be happening for them in those cognitions and to stake a claim about them in our NIH grants and in our research so that we can develop generalizable knowledge about how they work.
So I mentioned the trials I had earlier, and I know Michele mentioned this in his opening today. We did for this R01 submit one of those two-minute interactive videos showing how this worked for reviewers to consider, and it was funded. I do think it helps for people to see that quick dynamic interaction, and I'm just calling out for you these two aims here where we're really trying to test whether that participatory learning from simulation is shifting the heuristics that providers are using to accomplish and account for more dynamic complexity over time and greater engagement with the visualizations that we produce.
So to that end, if you do have questions about this, I would encourage you to check out that mtl.how, knowing -- I'm happy to answer what ones we have here, but go to the demonstration website, log in there, check out all the guides. We've been developing this on GitHub open science for years. So all of our deliberations, even ones from yesterday, are available for transparent review. All the code is downloadable, the models are downloadable. Our guides for facilitating folks, the accredited guides, are all available for use by anyone.
And if you just want like a really quick view of that partner, build, and apply view of how to get folks through this process and really try to focus on shifting those heuristics, you can also just watch videos at mtl.how and see what you see there.
On the one hand, some of this may feel like you're -- we may feel like we're really trying to capitalize on new computational power and new capacities for visualization and interactive dynamic tools, and that may seem very innovative, but in fact, at least in many of our applications, when we think about the stakes of just hard work, trial and error, in the real world, we're probably glad that our surgeons have simulation training before we get under their knife, that our pilots have simulation training before we get in the plane, and it makes sense that similarly some of the insights about these dynamic complex care problems in mental health also benefit from looking before we leap and helping the team of providers to share a mental model that accounts for dynamic complexity over time.
With that, I want to thank all of our funders, all of our team members. I'm going to leave up a couple of key slides of references while I get to the questions. And I do have the tool cued up in the background to show anything that people may have, if we have time for that. But I do know we are running a little bit late. So I'll take my cues from the other panelists about that.
DR. SAGGAR: Thanks so much, Dr. Zimmerman. Really great talk. I think we are running really out of time, but you can answer the questions live on the chat Q&A thing. But we'll probably move on to the next speaker. But thanks again. Wonderful.
Okay, it's with great pleasure I introduce our next keynote speaker and my gold mentor, Dr. Lucina Uddin. Dr. Uddin is an associate professor at the University of Miami, where she directs the Brain Connectivity and Cognition Lab. She received her PhD from UCLA and completed her postdoctoral training at the Child Study Center at NYU. Before joining Miami, she was a faculty member here in our psychiatry department at Stanford. Her current work is focused on understanding dynamic network interactions underlying cognitive inflexibility in neurodevelopmental disorders such as autism.
Without further ado, please welcome Dr. Uddin.
Agenda Item: Keynote 5 – Dr. Lucina Uddin
DR. UDDIN: Thanks, Manish, for the introduction, and thanks, Michele, for organizing this. I've already learned a lot, and thanks to the other organizers. Today I wanted to just share some thoughts about data visualization for network neuroscience.
So when I was thinking about preparing for this workshop, just this month there was an article in the New Yorker that was a little bit sensationalized, but was really interesting. It talks about when graphs are a matter of life and death. So it's really talking about the importance of data visualization, and here we see a figure that was created back in 1824 that is supposed to be the first time series graph that we know at least, plotting the prices of different things over different quarters, and it was just showing that at some point, someone had to invent this type of graph. Like we didn't think about time series data in this way, visualizing it, and in fact, when it was first introduced, people didn't know what it was and we had to really walk everyone through what this visualization meant.
So these things take time for people to adopt and to understand, but in this particular article in the New Yorker, they talk about how things like the train schedules and the ways that trains go across tracks have been plotted out. This is a plot from 1878, basically showing how it really avoids them crashing into each other, if we can visualize them in a way that lets us know what's happening when.
I thought another nice example came from racecar driving, where they can often plot instances of when engines fail versus the temperature outside. The red dots are showing incidence of damage or destruction in the race scenarios, and the blue dots are when no accidents happened, so you can see which temperatures outdoors resulted in no accidents. Without plotting both the sort of positive data and the negative data, you might miss an important point about these relationships. So I thought it was a nice introduction to the topic of data visualization, how we may not necessarily see how important it can be in different aspects of really life and death, as this article nicely points out.
But in the case of network neuroscience and cognitive neuroscience more broadly, which is where I work, I wanted to think about why data visualization is particularly important for us, and I think to convey things like anatomical specificity where exactly in the brain are we talking about, dynamics and time varying properties of the brain, which I'll definitely get into, but really knowing that the brain isn't sort of a static organ, it's doing things at millisecond and second and minutes in terms of the timescale and trying to really capture those is a challenge for data visualization.
Of course, it's a three-dimensional structure, and any time you try to map a three-dimensional structure into a two-dimensional space, something is lost. So how can we get it back, or what's the best way of retaining the three-dimensionality in our depictions?
And finally, complexity. The brain, with something like over 100 billion neurons is not going to be something easy for us to plot. No matter what we do, we're going to lose something in the data reduction, but there's still ways as we've seen in the talks all throughout the day, there's still ways of keeping information. We're getting better and better at this. So I think that's a positive development.
There was a paper from Vince Calhoun's group a few years ago talking about data visualization in the neurosciences, overcoming the curse of dimensionality, and there were some nice figures in that paper talking about how you can convey more information than means and standard deviations, how you can convey more information using things like violin plots than bar plots. Since this paper was published, I think about ten years ago, I've seen more and more, as an editor and as a reviewer of papers, people really opting for the more rather than less information, like box plots or the violin plots, which give you more information about all the individual data points that went into a chart. So I think the field has largely adopted these practices already. These are just some ways of displaying activation maps on brains; instead of just showing something above a particular threshold, there's ways of showing the gradations of the variation in the plot you can see on B, and of course error bars and confidence intervals and all of these things are more and more, I think, almost expected in our figures, and I think that's obviously a positive development, if the goal is to convey more and more information.
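To make the point about bar plots versus box or violin plots concrete, here is a minimal sketch using only the Python standard library. The two samples are made up for illustration and are not from the paper discussed; the function name `summarize` is likewise hypothetical. It shows two groups that a bar of means would draw at the same height, while the quartiles that a box plot displays reveal very different spreads.

```python
import statistics


def summarize(sample):
    """Return the mean and SD (what a bar plot with error bars shows)
    plus the quartiles (what a box plot shows)."""
    q1, median, q3 = statistics.quantiles(sample, n=4)
    return {
        "mean": statistics.mean(sample),
        "sd": statistics.stdev(sample),
        "q1": q1,
        "median": median,
        "q3": q3,
    }


# Hypothetical data: both groups have the same mean but different shapes.
unimodal = [4, 5, 5, 6, 6, 6, 7, 7, 8]    # one clump around the mean
bimodal = [2, 2, 3, 3, 6, 9, 9, 10, 10]   # two clumps far from the mean

for name, sample in [("unimodal", unimodal), ("bimodal", bimodal)]:
    stats = summarize(sample)
    iqr = stats["q3"] - stats["q1"]
    print(f"{name:>8}: mean={stats['mean']:.2f} sd={stats['sd']:.2f} IQR={iqr:.2f}")
```

A violin plot or a plot of the individual points would go one step further and show the two clumps in the second group directly, which neither the mean nor the quartiles can do on their own.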
There's a nice chart here in that paper where they're talking about some suggestions for improving clarity and completeness, sort of not trying to hide anything in the data as we heard about in earlier talks today, and there's just some very good recommendations in this paper, how to convey uncertainty and how to use colors in a way that is easiest for interpretation.
The funny thing about it is that as neuroscientists, we are not usually experts in data visualization. We're not even very good at science communication. Maybe I'm just speaking for myself. But we're not trained to do this, but we have to do it. We have to put figures in our papers and our grants and in our talks, and we have to communicate our science.
So it does behoove us to take a little bit of time to think about these issues, because if we can't convey the information, we really are at a loss. So there's another paper that was out about a few years back that looked at data visualization for inference in tomographic brain mapping that talked again about the advantages of different ways of viewing brains, and of course you would have to sacrifice precision sometimes in order to increase interpretation or -- it's sort of a tradeoff at many times when you're trying to decide how to depict an activation, for example, whether you give it a slice view or a whole brain view or a glass brain view.
And there's again this discussion about thresholding and what that reveals and what that hides, and another table that gives an overview of the different ways of showing sort of brain findings, and how you might want to of course tailor your approach to visualization based on what you're trying to convey. There's not a one-size-fits-all exactly approach for any of these things, as you can imagine.
I always like to use this figure that Daniel Margulies created when I give talks, because it just highlights something that we know from single unit recordings. We know from EEG and we know from fMRI that there's a whole lot of spontaneous activity that the brain sort of produces, and this in particular is showing the low frequency fluctuations that contribute to what we call nowadays resting-state functional connectivity, and the idea that you can see the recapitulation of brain networks that we know to be involved in cognitive tasks like memory and attention and vision and motor processes. All of these networks that subserve these things are actually coherent in the resting state, and so you can see that really nicely here.
These kinds of visualizations, whether they're time lapsed, whether they're sped up, convey something that you cannot possibly convey I think from a static image, even if you have a series of images showing the change, the gradualness and the anatomical specificity and everything, I think, is really best conveyed by movies.
So I was excited to see, for example, Tim Behrens talking about how these kinds of movies can be embedded in papers at eLife, and these kinds of movies, as Michele mentioned, you can -- I didn't know that you could submit movies as part of grants. Now that I know that, everything I submit from now on is going to have some kind of movie in it. Not just because it's cool, but because I think it conveys more information than the static image ever could when it comes to thinking about brain dynamics in particular.
Our lab studies how brain networks develop from childhood and adolescence into adulthood and how that supports cognitive development, particularly processes like executive function and cognitive flexibility. So any time you're trying to convey change over time, all the issues we've talked about during this workshop about visualization are I think even more pertinent.
The field of human connectomics, if you want to call it that, presents a particular challenge for visualization, because we have all of these dimensionality (inaudible) as we talked about. We have just many, many parts of the brain. A lot of times we're taking our images and we're already making a bunch of simplifying assumptions. So instead of the hundreds of thousands of voxels which are each a data point in their own right, as Manish mentioned in his talk, there may be ways of getting back to recovering all of that data, but by and large, we're actually summing over large brain regions, perhaps using an atlas, some type of parcellation that someone has agreed upon to divide the brain into these kinds of anatomical or functional areas, and then doing summaries to look at how those particular areas change in their activation levels over time.
Perhaps we then go on to depict this as a connectivity matrix where we're looking at each brain region by each brain region and how strong, for example, the correlation is between any two given brain regions. This connectivity matrix that you see here is something that you'll see in now hundreds and hundreds of published papers, and sometimes we don't even really describe what's on the matrix, like the labels are too small on the y and x-axes for us to see what brain regions are actually being referred to.
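As a rough sketch of how such a region-by-region connectivity matrix might be computed, the example below takes one averaged time series per region and fills each cell with the Pearson correlation between two regions. The signal values are invented for illustration, the region abbreviations are used only as example labels, and the function names `pearson` and `connectivity_matrix` are hypothetical rather than taken from any particular toolbox. Row labels are printed alongside the values, which addresses the unlabeled-axes problem mentioned above.

```python
import math


def pearson(x, y):
    """Pearson correlation between two equal-length time series."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (norm_x * norm_y)


def connectivity_matrix(timeseries):
    """Return (labels, matrix) where matrix[i][j] is the correlation
    between region i and region j."""
    labels = list(timeseries)
    matrix = [
        [pearson(timeseries[a], timeseries[b]) for b in labels]
        for a in labels
    ]
    return labels, matrix


# Hypothetical region-averaged signals (one value per time point).
timeseries = {
    "PCC": [0.1, 0.5, 0.9, 0.4, -0.2, -0.6],
    "mPFC": [0.2, 0.6, 0.8, 0.3, -0.1, -0.7],
    "dACC": [-0.3, -0.4, -0.8, -0.2, 0.3, 0.5],
}

labels, matrix = connectivity_matrix(timeseries)
print(" " * 6 + " ".join(f"{name:>6}" for name in labels))
for name, row in zip(labels, matrix):
    print(f"{name:>6} " + " ".join(f"{r:+6.2f}" for r in row))
```

In a real study the matrix would have tens to hundreds of regions and would normally be drawn as a heatmap, but the structure is the same: a symmetric matrix with ones on the diagonal, where each row and column corresponds to a named region.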
So I think we have a long ways to go as far as making sure we actually do our visualizations in a way that conveys the information clearly without too much room for ambiguity. I'm saying all this to say that we have a lot of challenges in visualization for network neuroscience, but I think you can see there's a lot of people working to make really beautiful and clear pictures that show what's actually going on.
Then in connectomics, we often do these network kinds of analyses, where we treat each brain region as a node, and we can treat either a functional or structural connection between two brain regions as an edge, and of course all kinds of metrics can then be computed on those graphs at that point. On the right you can see just some things like path length, how many links must be traversed to go from point A to point B, which is one metric; clustering coefficient, how tightly clustered a particular node is in a network; and things like rich clubs and hubs and modules. All of these concepts are now sort of widely applied in network neuroscience research, but the visualization of them is just really critical for conveying what the findings are.
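The two metrics named here, path length and clustering coefficient, can be sketched on a toy graph. The five-node graph below is hypothetical and stands in for brain regions (nodes) and their connections (edges); the function names are illustrative, and dedicated graph libraries provide equivalent, more complete implementations.

```python
from collections import deque


def shortest_path_length(graph, source, target):
    """Number of links traversed on the shortest path between two nodes
    (breadth-first search); returns None if no path exists."""
    if source == target:
        return 0
    seen = {source}
    queue = deque([(source, 0)])
    while queue:
        node, dist = queue.popleft()
        for neighbor in graph[node]:
            if neighbor == target:
                return dist + 1
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append((neighbor, dist + 1))
    return None


def clustering_coefficient(graph, node):
    """Fraction of a node's neighbor pairs that are themselves connected."""
    neighbors = list(graph[node])
    k = len(neighbors)
    if k < 2:
        return 0.0
    links = sum(
        1
        for i in range(k)
        for j in range(i + 1, k)
        if neighbors[j] in graph[neighbors[i]]
    )
    return 2 * links / (k * (k - 1))


# Hypothetical undirected graph: a triangle A-B-C with a tail C-D-E.
graph = {
    "A": {"B", "C"},
    "B": {"A", "C"},
    "C": {"A", "B", "D"},
    "D": {"C", "E"},
    "E": {"D"},
}

print("path length A to E:", shortest_path_length(graph, "A", "E"))
for node in graph:
    print(node, "clustering:", round(clustering_coefficient(graph, node), 2))
```

Node A sits inside the triangle, so all of its neighbors are connected to each other, whereas node D only bridges C and E, so none of its neighbors are. That contrast is the kind of difference a well-drawn network figure should make visible at a glance.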
So I think the more we spend time getting that right, the better our insights I think will be into these network structures of the brain. So I think it's worth spending the extra time as it were to get the visualizations right.
If you were old enough, you might have seen images like this, or if you had looked back on papers, neuroimaging papers, published PET and fMRI papers from the late 1990s, early 2000s, the output of the SPM software looks like this. It puts on a glass brain the activations, the contrasts of interest. Here's basically some task fMRI result, and this was what people did. These are the figures you would see in a paper, and your guess would be as good as mine exactly what brain regions were being activated by the task here, and of course you'd have the tables showing the peak clusters and their coordinates. But it wasn't long ago that this was our sort of state-of-the-art for making a figure for brain imaging.
So obviously we've come a long way, and it's nice to see how many tools there are, how many different ways there are now of conveying the full scope of the information from a neuroimaging study. This is not very recent. This is something I did I guess ten years ago now, but when I was a postdoc and I submitted a paper here just trying to show there's functional connectivity differences between two brain regions that are adjacent to each other, but the reviewer came back and said -- I had shown some surface maps, and the reviewer said if you're talking about brain regions that are hidden in the sulcus, you really need to show a flat map, and I thought, great, now I have to spend like another two weeks learning a new software, figuring out how to make flat maps and that was a lot of -- at the time, I thought that was a lot of work.
But the truth is, nowadays there are so many tools that make these things easier. At the time, I used a tool called Caret, but there are many, many different software packages that have been developed to let you look at the brain in these different ways, and some people, rightly so, will insist that if you don't use a flat map depiction to visualize areas in the sulcus, you're missing out on some important information. So I've since come to appreciate that reviewer's insistence that (inaudible).
But this is also a figure that I was using a lot from a paper that was published by Sridharan and colleagues at Stanford a while back now. I used to use this figure a lot in many of my talks, because this work was a starting point for a lot of the work that I was doing during that time. What it's doing is showing Granger causal analysis, basically causal interactions between brain regions. The networks are sort of labeled by colors, blue for what's called the salience network, yellow here is the default mode network, and green is what's called the central executive network, and these are just showing causal interactions between the brain areas of these networks in a task-free resting state, during an auditory event segmentation task, and during a visual oddball attention task.
I used this figure a lot in talks, but then at one point someone gave me some advice, I think -- I don't remember who it was but I should really thank them -- about how you don't have to just use the figures that are in a paper in your talk. If you want to simplify them, you can actually make them again the way you want to make a point in your talk or to be consistent with some other overarching theme, and at the time, I'd started doing studies on these same brain regions as well, and I started using BrainNet Viewer, a MATLAB-based tool for showing all these interactions between brain regions on an actual brain surface.
So I remade his figure to use in talks, and it looks better, I would say. Like in a sense, like you can see where the brain regions are that we're talking about. You can see more clearly the anatomy of the networks and the interactions between them, how consistent some of these findings are across task and rest, but just remaking someone else's figures, that was like a flashbulb moment for me. I didn't realize that you could do that as a scientist, because your goal here in the talk is data visualization and science communication. I think it's okay to realize that we often simplify results when we're trying to present them to the public.
So I did this going forward. If I saw a figure I wanted to include and I wanted to make it look a particular way without losing too much of the detail of the findings, I would just make them again in some software that has since been developed. So this is all sort of part of what we do and maybe to make things clearer to our audiences.
There's a lot of work in developmental cognitive neuroscience and developmental network neuroscience that really benefits from some of these dynamic visualizations we've talked about. Damien Fair published a paper a number of years ago showing differences in whole brain network structure from childhood to adolescence into adulthood, and the static figure is of course nice for conveying the point that graph structures change over development, but he also has in supplementary materials a lovely video that shows you how this actually happens from the young age, the eight-year-olds up through the adults, and you can kind of see where the nodes of the different networks change affiliation and how they result in a more segregated system over the lifespan or over that early part of the lifespan.
I really like this, because it's more fun to watch in some ways, and also really gives you a better sense for how these brain regions kind of change their connections over a period of time. So when we can do these things, even before we had some of the tools we'd use today, I think they added a lot, but in the past, of course, they were all relegated to supplementary materials, and we're hearing now how there's maybe ways to embed them in the primary publications.
We also, of course, have seen these kinds of figures over and over again, just brain maps looking at functional connectivity of a particular brain region and negatively correlated brain regions. This is from Mike Fox's seminal work. I like this figure that comes from one of Thomas Yeo's papers in 2011, I believe, where he's just basically taking a seed region of interest and moving it around and showing you how functional connectivity changes gradually in interesting ways as you go across the cortex.
Not only is it fun, but it gives you a feel for how dramatic the changes are across some particular borders, for example, and how there are smoother transitions and gradients of functional connectivity at other points. So visualization is fun and games, but it really does give a better way of understanding what's going on in terms of the neuroscience.
While I'm at it, I'll just show another one of Thomas's great visualizations. He has this tool for data exploration. In one of his 2014 papers, where on the left is kind of a static image and on the right, if you go to the website and you click on one of those components on the edge of the circle, it changes dynamically to show you what's unique to that particular component. So any user can get on the website and play around with that, and it's really a better way in some ways to understand what this really complex data set is showing.
These are some images from a paper that one of my former graduate students, Taylor Bolt, is currently working on where he's talking about traveling waves of activation across the cortex, and what better way to understand what he calls -- he and Shella Keilholz -- call quasiperiodic patterns. What better way to visualize them than to really have a movie that tells you what's going on at a particular point in time. You can see transitions between particular brain states. You can see how long a particular configuration persists and many other things. I'm just not doing justice to this paper, but it's on bioRxiv if anyone is interested in checking it out.
But it's a tool for data exploration and oftentimes lets us see commonalities across different tools that we might not have seen otherwise. So I'll leave the details of this out for the interests of time.
When I tweeted that I had to give a talk on data visualization, a lot of people offered up some nice images that I could show for this. I appreciate that. This comes from Faruk Gulban, who's talking about cortical flattening and has some software here to do that, and you can see how really nicely that conveys some of that information that you wouldn't know from a static image. This is another example of data visualization from Eduarda Zampieri, who also kindly provided this in response to my desperate tweet.
So to sort of learn how to do this, I was impressed to find how many free tools there are online. On Coursera there's something like 700 different courses. I'm sure one of them would be relevant for what we would want to do. There's all these different universities. I think both Stanford and Harvard have data visualization courses online, and I'm assuming that others do as well.
And these are some of the tools that I used, for example, in some of the images I showed you earlier. BrainNet Viewer, many of these things can be found on GitHub. There's just too many to list. MRIcron has been around for a long time, but there are so many that I've just left out of here, and we all know that if we do a little bit of searching, we can find ways to make our figures better.
And the final thing I'd like to mention is that sometimes the figures we make for our paper, they're art. They end up being visually very stunning and beautiful, and in recognition of this, the Organization for Human Brain Mapping many years ago started the Brain Art Special Interest Group and every year has a competition for people to submit their beautiful images in very creative forms, but there's even a category called Beautiful Mistake, where if you use software and you tried to make a figure for a paper but you ended up making a mistake but it looked really cool, you can submit that as part of the competition.
I think part of what we do, it is art when we're -- it's graphic design, it's all of these different components and the kinds of things we end up with are really quite beautiful. So I think we should sort of be proud of that art that we're creating for our figures when we're doing data visualization for network neuroscience.
So since we're out of time or close to out of time, I'll just say thanks and if there's any questions, I'm happy to take them.
DR. FERRANTE: Thank you, Lucina. This was a very impressive view of the field of neuroimaging and all the data visualizations that, like you've said, you made a very good impression.
So that said, I want to thank all the organizers and all the speakers. I learned a lot today, and I was very impressed with all the presentations so far. I want to also thank all the attendees. We had over 1,500 registrants from all over the world.
And I wanted to just let you know we have a break, and we will come back here at 2:15 Eastern time, and during that time, we will have the tutorials. So if you want to learn how to do some of these beautiful presentations, you are more than welcome to join us.
I see that Janice and Manish are online, and Jeremy, all the DataVis organizers team, and Emily. Thank you, everybody.