Director’s Innovation Speaker Series: All of Us Research Program: Improving Health Through Innovative Technology, Large and Diverse Cohorts, and Precision Medicine
DR. JOSHUA GORDON: Welcome, everyone, to the National Institute of Mental Health Director’s Innovation Speaker Series. I am really pleased to have with us today Dr. Joshua Denny, the Chief Executive Officer of the National Institutes of Health’s All of Us Research Program, to talk about that program and what it offers for precision medicine.
Before we hear from Dr. Denny let’s just go over some usual housekeeping notes. Please, if you require technical assistance, use the Q&A box to communicate with the event production staff and they will do their best to help you.
At any time during the webinar you can enter any questions or comments you might have into that Q&A box. Josh and I are going to engage in a fireside chat of sorts -- no fire, but the internet I suppose serves as that -- but we will sprinkle your questions in throughout our conversations, so feel free to enter them at any point in time and I will do my best to introduce those questions into our conversation as we go along.
The third point is that this webinar is being recorded and the recording will be made available in the coming weeks on our website, nimh.nih.gov, and I encourage you, if you enjoy today’s session and think you know somebody who would like to learn more about it, to point them to the recording so that they can enjoy it at their next opportunity.
With that, I am going to give Josh a chance to introduce himself. Josh Denny is the Chief Executive Officer, as I noted, of the National Institutes of Health’s All of Us Research Program. Josh, feel free to introduce yourself in any way you see fit, though it is always nice for us to hear about how our colleagues have gotten to the point where they are and what they like about what they are doing. Go ahead and introduce yourself.
DR. JOSHUA DENNY: Josh, thank you so much for inviting me to be part of this fireside chat with you, and I certainly am excited to engage with your audience and take some questions when there is time. And I welcome the opportunity to introduce myself in a less formal capacity.
I am an internist and informatician by training. I did actually all of my training at Vanderbilt from medical school, residency, fellowship in informatics. I got started doing stuff on medical education actually and processing a large corpus of medical documents and doing natural language processing on those education documents as a source of information.
That work found its way into medical records, and after using this with medical records, I started looking at genomics, and that just found its way into large-scale biobanks and the opportunity to engage in what became known as the All of Us Research Program, starting really as just a concept in late 2014 and then the President’s announcement in 2015 at the time of the Precision Medicine Initiative. I have been involved with it ever since and it has been just an incredible, exciting ride through that time period.
I am a dad of four kids as well, so that means soccer games and football games and a ballerina wanna-be, a six-year-old at this point.
DR. GORDON: I think even your Google calendar is big data.
DR. DENNY: So even my Google calendar gets big data, and we all get all hours of the day, right. Anyway, it’s a real pleasure to talk to you about the All of Us Research Program, to learn from your audience, and to engage on how I can help advance mental health, amongst other topics.
DR. GORDON: Great, Josh. We will get into that right away, although I might come back to some of the details of the origin story. It always intrigues me why and how doctors end up being informaticians and actually what informatician means. It sounds a little bit like magician so we’ll figure that out.
I am just going to ask you a question I know you’re prepared for. Tell us about the All of Us Program. Many people on the call probably know what it’s about but many don’t, so if you could just give us a general introduction, and I know you have some slides to share.
DR. DENNY: Yes. I will share my screen and just go through some introductory slides of the All of Us Research Program. Our real goal is, writ large and across really all disease and health statuses, to make precision medicine a reality for everyone. This is our mission statement: to accelerate health research and medical breakthroughs, enabling individualized prevention, treatment and care for all of us.
Those words were intentionally chosen, really as with all mission statements, but to really hone in on that first branch: nurture partnerships for decades with at least a million participants who reflect the diversity of the United States. A real strength of our country is its diversity. It’s something that has not been well represented in most of our research studies and we really wanted to partner with our participants, which means things like returning value to them, which is in some cases information but it also means doing research that matters and delivering real change, and how we can be a positive catalyst for change in the research community.
We want this program to last for a long time, decades, and really deliver to as many researchers as possible a large, rich biomedical dataset. We are a platform that others come in and do research on, and we need those of you in the audience to come in and do work and make discoveries.
We launched nationally in 2018. We have since enrolled over 543,000 participants who have consented. People can participate in different ways, but regardless of how they come into the system, they share the same kinds of information and sign the same consents. Right now, we have over 400,000 who have contributed biospecimens.
You can see that, like all systems that have a significant in-person component, which we did at the time, Covid-19 hit us hard and we had to pause all those in-person activities, and we used that to make ourselves stronger. We have launched with now the ability to reach all 50 states and US territories through things like saliva kits, more national partners, safer ways to engage, and really can get biospecimens everywhere now and, of course, have restarted our in-person activities as well.
This gives you a sense of our distribution. We have all 50 states and US territories represented. We have more representation where we have large medical centers that are recruiting through brick-and-mortar, and like I said, we have really focused on diversity. Fifty percent of those contributing biospecimens are diverse by race and ethnicity. Over 80 percent are diverse by the larger measure of under-representation in biomedical research, and you can see those categories on the bottom right. They have to do with age, race and ethnicity, sexual and gender minorities, educational attainment, income, rural location.
We recently added disability to that metric and our participants are engaging in an ongoing way, so we did things like send an updated survey to them to assess disability because those were not questions that we asked at the very beginning of the program. And that is a theme; we continue to ask questions of our participants as things go on.
These are the five major data streams and it starts with consent. A key part of that is an authorization for sharing electronic health records. It becomes a really powerful measure of someone’s health trajectory and provides a lot of data without them having to provide it.
We had sets of surveys at launch and then we typically release a new survey about each year. We did some during Covid, for instance, that were repeated measures. We have different kinds of surveys across measures.
We have a brief set of physical measurements, and during in-person visits we collect DNA, plasma, serum, cell-free DNA, RNA and urine, and we have ways that people can donate data through Fitbit or other mobile wearable technologies through things like Apple HealthKit into the resource. And we are also doing a pilot where we’re giving out Fitbit devices to individuals as well.
This gives you a sense of the data we have. I mentioned the capture of electronic health records, and one real power there is we have a lot of longitudinal information. In some cases, we have up to 40 years of information on people that they have shared with us. That means that even though we enroll only adults over 18 at the current time, we are going to be expanding soon to pediatrics. We have 20,000 virtual kids that have shared data in some cases even to birth from their extant electronic health records.
You can see we have about 13,000 Fitbit records already available in that dataset, lots of other kinds of information, about one-half a billion information points, if you exclude Fitbit -- and you can see Fitbit is already in the billions information points -- we have on our individuals that they have shared with us.
We are really excited that we have recently launched our genetic information. Our first release of genomic information came out in March and includes about 100,000 whole genome sequences and 165,000 arrays. Just like our whole cohort, that’s about 50 percent diverse by race and ethnicity.
If you really think about the diversity of this dataset and the size of this dataset, you can see in the number of variants we have observed, about 600 million unique variants, that 400 million of those are not in gnomAD, which is a common aggregation service of a lot of these variants, and 100 million of those variants occur in three or more of our participants. So they are not super-rare. They are very rare but they haven’t been observed yet.
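The counting described here (unique variants, variants absent from a reference catalog such as gnomAD, and variants seen in three or more participants) can be sketched in a few lines of Python. This is only a toy illustration: the variant IDs, participant labels, and the `reference_catalog` set are made up, and it is not how the program's genomic pipeline is actually implemented.

```python
from collections import Counter

# Toy, made-up data: each participant maps to the set of variant IDs they carry.
participants = {
    "p1": {"chr1:100:A>G", "chr2:200:C>T", "chr3:300:G>A"},
    "p2": {"chr1:100:A>G", "chr3:300:G>A"},
    "p3": {"chr1:100:A>G", "chr3:300:G>A", "chr4:400:T>C"},
    "p4": {"chr3:300:G>A", "chr5:500:A>T"},
}

# Hypothetical stand-in for a public aggregation resource such as gnomAD.
reference_catalog = {"chr1:100:A>G", "chr2:200:C>T"}

# Number of participants carrying each variant.
carrier_counts = Counter(v for variants in participants.values() for v in variants)

unique_variants = set(carrier_counts)
novel = unique_variants - reference_catalog                      # not in the catalog
novel_recurrent = {v for v in novel if carrier_counts[v] >= 3}   # seen in 3+ people

# Share of variants not in gnomAD, using the round figures quoted in the talk
# (about 400 million of about 600 million).
quoted_novel_fraction = 400 / 600

print(len(unique_variants), len(novel), sorted(novel_recurrent))
print(round(quoted_novel_fraction, 2))
```

With the figures quoted in the talk, roughly two-thirds of the observed variants are absent from gnomAD.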
And so this is a really new contribution that I think we can provide, and since these people also have other kinds of information like electronic health records and surveys, Fitbit, et cetera, we can start to interrogate what these variants actually mean in order to provide knowledge on which ones might be benign, which will be most of them of course, and which ones might have a health effect.
This just gives you a sense of where we are compared to all global GWASs. Probably a lot of you know this data, but 96 percent of all genome-wide association studies have been on people of European ancestry. You can see our data are dramatically different than that, and even amongst those who are of European ancestry, about 60 percent of them are under-represented with another characteristic that I mentioned before.
These data are available to researchers now. You can explore them in real time through our public data browser. You don’t need a login. You can get senses of what the data look like for a given diagnosis across all EHR information, what those distributions look like, some very simple cross-type information giving a sense of whether or not we have counts that are likely to support a research study that you might want to do. In those kinds of data you can also explore the genetic variants in there as well.
If you want to do research and do a study, you come in through the Researcher Workbench and this gives you access to row-level data on individual participants to do studies. It’s a centralized cloud resource. We operate under what we call a passport access model that means you come in and just describe your workspace. It’s a central, non-human subjects research IRB approval, so you describe what you do, and that description, parts of it, become public. It’s available for us to review as well, and there’s a resource access board that passively and then subset-actively reviews some of these. But once you create that description and say what you’re going to do, you can get started doing work immediately.
It is currently open to US nonprofits, academic institutions, and healthcare organizations. We have about 3500 researchers using it now from over 400 institutions across the US. More than 20 percent of those represent patient advocacy nonprofits or minority-serving institutions, including HBCUs, which are a particular focus for us in this resource.
And of note, if you are one of those 400 institutions that have signed a master agreement, a new researcher can come on from that institution and, from the first time they sign up to the website, enter in and create an account to start doing work, it can be in as little as two hours. You can complete all those registration steps and actually define your first workspace and start building a cohort and start doing an analysis in under two hours, which we think is a really exciting advance of where we are with other cohorts.
I want to mention a little bit about the return of information as a cornerstone of what we want to do, and participants told us what they were most interested in was getting health-related genetic results. We have already been releasing other kinds of information to them, some survey results and how that compares to others, genetic ancestry and some non-health related traits.
But we are really close to launching in a national way health-related genetic results, and the two things we’re returning are hereditary disease risk, which follows the so-called ACMG 59 (inherited cancer syndromes, cardiomyopathy, arrhythmias), and then pharmacogenomics, which is called Medicine and Your DNA. Those results follow a rigorous pathway. We have an FDA investigational device exemption, and these results are all supported by genetic counselors. We will also provide clinical results off of the hereditary disease risk report if someone has an actionable variant.
As we think about diverse audiences and reaching them, everything across the interface is in English and Spanish, and we also have language learning support for more than 200 languages in this genetic counseling support.
This gives you a sense of what some of those reports look like across pharmacogenetics and health-related results. These are available to take to your provider in provider-friendly formats as well.
I just want to give just one example of the many research studies. There are 2600 or so research projects that have been started on the platform already. This is one example of looking at what can be different in terms of treatment for people diagnosed with depression. In this analysis, it started with people diagnosed with depression and looked at common medications used to treat depression and then simply divided it by those who self-identified as white versus non-white in an earlier release of the dataset.
This is what we call a demonstration project so it wasn’t actually designed to necessarily find something new, but what it did find, as you can see, is there’s a different sequencing of medications in terms of frequency in what the first medication was versus the second medication someone would start based on these two populations. Some of this could be driven by all sorts of things. This could be driven by disparities like insurance status, it could be driven by regions of the country. It could be driven by things like pharmacogenomics and what actually works for participants, and potential side effects people have had. We don’t know the answer to that.
This, like many other resources, because this was a demonstration project, if you are an approved researcher you can actually go and grab the workspace and get the code that generates these things and go investigate it more deeply. You can generate all sorts of analyses like this, and people have been doing analyses looking at mental health and suicidality, for instance, during Covid and how that was affected by discrimination -- which are some of the surveys we did and launched during Covid. So there are lots of things out there.
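To make the idea of comparing medication sequences concrete, here is a minimal Python sketch that tallies which drug each person started first and second, split by group. It is not the demonstration project's workspace code. The records, group labels, and the `medication_order_counts` function are hypothetical and use made-up data.

```python
from collections import Counter, defaultdict
from datetime import date

# Made-up records: (person_id, group, medication, start_date).
records = [
    ("a", "group_1", "sertraline", date(2015, 1, 1)),
    ("a", "group_1", "bupropion", date(2016, 3, 1)),
    ("b", "group_1", "sertraline", date(2014, 5, 1)),
    ("c", "group_2", "fluoxetine", date(2013, 2, 1)),
    ("c", "group_2", "sertraline", date(2013, 9, 1)),
    ("d", "group_2", "fluoxetine", date(2017, 7, 1)),
]


def medication_order_counts(rows):
    """Count first and second distinct medications per group.

    Returns a dict keyed by (group, position), where position is 1 or 2,
    mapping to a Counter of medication names.
    """
    starts_by_person = defaultdict(list)
    group_of = {}
    for person, group, medication, start in rows:
        starts_by_person[person].append((start, medication))
        group_of[person] = group

    counts = defaultdict(Counter)
    for person, starts in starts_by_person.items():
        ordered = []
        for _, medication in sorted(starts):  # earliest start date first
            if medication not in ordered:     # ignore restarts of the same drug
                ordered.append(medication)
        for position, medication in enumerate(ordered[:2], start=1):
            counts[(group_of[person], position)][medication] += 1
    return counts


counts = medication_order_counts(records)
for key in sorted(counts):
    print(key, counts[key].most_common())
```

A tally like this only describes the ordering. As noted above, it cannot say whether differences come from insurance status, region, pharmacogenomics, or side effects.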
I mentioned some of the future things that I’m excited about related to mental health in partnership with NIMH. One is our next survey module that we will be releasing on mental health and wellbeing. The goal here is to engage and derive more data related to mental health. It has 17 different domains and you can see what some of those are and some of the sampling of what some of the instruments are. We are trying where possible to use standardized instruments. We already have, for instance, PHQ-9, GAD-7 and a number of these other kinds of instruments available, and historically as well. So this will be added information on these that you can look at over time, for instance.
Then we are launching an ancillary study with NIMH called Exploring the Mind, and this will provide more quantitative information around behavioral cognitive traits. There are five different modules that participants will be doing initially in a pilot form and we will evaluate the data and see if there are any tweaks we need to make as we think about launching it in a broader way. So we are excited to have more of a computational quantitative measure that we can provide, as well as the survey approach as well from this information.
I just want to end with two callouts. You see the URLs there on the right. Anyone is welcome across the US to join our program, and I welcome you and others to come in as researchers as well. And, obviously, biggest thanks to our participants who are the foundation for All of Us.
With that, I will stop sharing and am excited to engage in a conversation.
DR. GORDON: Thanks, Josh, for that overview. I’ve got so many questions and some of them arcane scientific details, but I will try to rein myself in and appeal to the broader audience today.
One of the things, of course, that puzzled me and I’m sure is puzzling some of the listeners, is how do you have data from 40 years ago if you were launched just a couple years back? Tell us about that, tell us how you have data on things like diagnoses and treatments, et cetera, that happened decades ago.
DR. DENNY: Great question. It’s exciting. That is one of the exciting powers of electronic health record information, especially if people have been longitudinally engaged with a healthcare system for a long time. Some of the systems that are partnered with our network have had EHRs for a long time. That even includes electronic prescribing information, laboratory results. Of course, billing codes are the most common element there. Some of them represent decades, and that is a really powerful aspect.
The other thing that’s exciting is, even the Fitbit data you can see goes back about a decade for some of our participants that have been wearing Fitbits for a long time. There was actually a recent paper that came out in Nature Medicine that looked at number of steps per day in a phenome-wide investigation of different outcomes and really was able to quantify prospectively how walking a certain number of steps per day helps protect against risk of diabetes and obesity and even gastro-esophageal reflux disease from happening in the future.
So you are able to do a lot of these by linking in and participants sharing those data with us.
DR. GORDON: Right. When you’re signing up, you then have the ability to give permission for All of Us to collect your data going back some time, and that presents a tremendous opportunity for scientists.
Naively, I’m thinking that as you are accumulating those million people, this is going to be great because we’re going to be able to learn things 20 years from now. And you’re saying no; we can learn things already even if they happened 20 years ago based upon the data that you’re aggregating for the scientific community, and that is fantastic. The program has come so far from 2014 when it was really just a germ of an idea.
But what has it done for us lately? What are you most proud about in terms of accomplishments of All of Us over the past year or so?
DR. DENNY: I think we are really on the front edge of that kind of work. We released the first insights into the Researcher Workbench in 2020 where people could start doing work, and there have been a lot of initial investigations. Most have looked at health disparities. One of note was looking at the so-called Hispanic/Latino paradox of cardiovascular disease risk and showed that that didn’t actually look to be true in our data.
I mentioned the one on the Fitbit data which I think is pretty cool. That 10,000 steps a day is a metric we carry in our heads and isn’t actually based on a lot of science. It was based on the number of digits on the screen of the original pedometer that was used. You see that there’s an inflection point that happens more around 8,000 steps per day in the data. As a guy who practiced as an internist for a number of years and knows how hard it is to encourage people to exercise, one of the things I was reassured by is there’s a big effect if you can make someone go from 1,000 to 2,000, or 2,000 to 3,000 or 4,000 steps per day. These small differences actually reduce, in a linear fashion or sometimes even more than that, the risk of really important health outcomes. So that's an example.
DR. GORDON: So we don’t all have to go out and run a marathon you’re saying. We can walk around the block and that has tremendous benefits.
DR. DENNY: That’s exactly right.
And then the genomics. I think the genomics will be really powerful on diverse populations because it’s just not there.
DR. GORDON: Talk to me about that. So you have an instant population of how many thousands of whole genome sequences?
DR. DENNY: One hundred thousand now.
DR. GORDON: One hundred thousand whole genome sequences. And is the plan to whole genome sequence the entire All of Us cohort, if people want to be sequenced?
DR. DENNY: That’s right. We are going to sequence one million.
DR. GORDON: Wow, one million genomes. Let’s suppose I want to look at depression. Depression is a fairly common illness; something like 10 to 11 percent of individuals have depression at any one point in their lifetime, and the lifetime prevalence is probably higher than that. If I wanted to know does this collection have enough people with depression to do some whole genome sequence study, are we going to have that level of coverage or is it going to be that these are all healthy people?
DR. DENNY: We have looked at the population incidences across a phenome-wide set of diseases -- this isn’t published yet but it is something that we have been looking at internally -- compared to estimates across the US population, and what we find is for common diseases we are pretty close to national averages but usually a little bit higher prevalence. As you get to rare diseases, we tend to over-represent those more than you would expect nationally. So whatever the national prevalence is for most conditions, we are usually a little bit higher for those.
I mentioned the databrowser.researchallofus.org. Anyone can go in there right now and it works on your phone, a website, and just type in a condition and you will see instantly how many cases we have and the sex breakdown of those cases and the age breakdown, and you can get some sense of what the exposures are.
Another thing this week at ASHG: a young scientist is presenting work looking at the syndrome of inappropriate ADH (SIADH) in people who were exposed to SSRIs. He did a genetic study on people who take SSRIs and have low sodium values, and so there is already power to do that. I feel like he had about 1,000 patients in that study, just to give you a sense of all those things together --
DR. GORDON: So 1,000 people who happen to have SSRIs and sodium data -- sorry. 1,000 people had the syndrome.
DR. DENNY: I’m going to start to get fuzzy on that --
DR. GORDON: There’s got to be more than 1,000 people out of your 500,000 who are on SSRIs.
DR. DENNY: Way more, way more than 1,000.
DR. GORDON: There must have been 1,000 who had the syndrome, I guess.
DR. DENNY: That’s right. It must be cases.
DR. GORDON: That is pretty impressive.
We talked about genomics, but of course, genomics is fairly easy. We have done a pretty good job getting numbers of cases of individuals with schizophrenia or depression, whatever, into the hundreds of thousands, right. But what’s really hard for us is looking at environmental factors in mental illness, and particularly social determinants of health and working that in because the numbers of different things to think about are so high and also unknown, and it’s hard to get that deep level data. What are the prospects of using All of Us to address environmental factors in health in general and mental illness in particular?
DR. DENNY: That’s a great question, so important, and really one of the things we thought of as a foundation for our program. I came from places where we had the diagnoses but we didn’t have anything else. I’m sorry, we had the electronic health record but that was it, and so you are limited in knowing the whole picture of someone’s health.
During Covid we launched two modules, one was this repeated measure that gets at some social determinants of health and then a more comprehensive social determinants of health module that gathers a lot of those elements. And then at baseline we get information like insurance status and income, and, of course, we know some basics of how many people are in the household and things like that, as well as these deeper measures of resilience and optimism and religious preferences and engagement, and discrimination metrics, and then environment.
With the current dataset we have Zip-3 level geography and that --
DR. GORDON: What is Zip-3 level?
DR. DENNY: The first three digits of the zip code. This is an evolving process so nothing is a stop. We were really committed to protecting the privacy of our individuals and also figuring out models where we can bring in more and more environmental data. We have actually linked in the initial American Community Survey data, and we intend to link in a lot more environmental data over time. We have been working with NIEHS with that and we had a workshop to get ideas. We may look at linkages, we may also look at things that we measure directly from the biospecimens we have as well.
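The Zip-3 generalization described here amounts to keeping only the first three digits of a ZIP code. A minimal Python sketch follows. The `zip3` helper and the sample ZIP codes are hypothetical and shown only to illustrate the idea, not the program's actual de-identification code.

```python
def zip3(zip_code: str) -> str:
    """Return the first three digits of a US ZIP or ZIP+4 code.

    The code is handled as a string so that leading zeros are preserved.
    """
    digits = "".join(ch for ch in zip_code if ch.isdigit())
    if len(digits) < 5:
        raise ValueError(f"not a valid ZIP code: {zip_code!r}")
    return digits[:3]


for sample in ["37203", "02139-4307", "20892"]:
    print(sample, "->", zip3(sample))
```

Coarsening location this way trades geographic detail for privacy, since many more people share a three-digit prefix than a full five-digit ZIP code.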
DR. GORDON: That is a lot of data. Do you have a number? How many petabytes is it right now?
DR. DENNY: I will be wrong. At some point in the recent past I think it was 7 petabytes, but it is definitely bigger than that.
DR. GORDON: So this is not going to fit on your local hard drive on your laptop.
DR. DENNY: That is right.
DR. GORDON: Where does it live?
DR. DENNY: It lives in the cloud. We use Amazon and Google cloud services. The data analysis right now is on the Google cloud, and the Researcher Workbench lives there. It is really a great example of how we can use these centralized resources to both empower researchers and accelerate research and make it more democratized as to who can do research, as well as provide better protection. We can centralize the protection instead of everyone downloading the data and provide more on-ramps into it.
I just want to tell you one story that I find really compelling. One of the first GWASs I did across a network was in what is called the eMERGE Network, and we looked at Type 2 diabetes. Five sites found people in the electronic health record, pulled together a range of phenotyping data of about one million loci -- actually it was less than that -- and it took us about three years, honestly, to do this. And we had our own built expensive computing systems to do that which we had to pull together, and not everyone can do that.
I had a data scientist replicate this experiment on our platform and he did the whole thing, soup to nuts, while actually doing some other things during that time, in three days.
DR. GORDON: Wow. Three years to three days.
DR. DENNY: Just pointed and clicked. The GWAS I think cost $37.00 to run. People, by the way, start off with $300 in compute credits, so you know it could be done for free.
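As a quick worked check of that arithmetic, the sketch below uses the two figures quoted in the conversation (about $37 per GWAS run and $300 in initial credits). The `runs_covered` function is a hypothetical helper, and real costs will vary with the analysis.

```python
GWAS_COST_USD = 37.00         # approximate cost quoted for one GWAS run
INITIAL_CREDITS_USD = 300.00  # initial compute credits per new researcher


def runs_covered(credits: float, cost_per_run: float) -> int:
    """Number of complete runs that a credit balance can pay for."""
    if cost_per_run <= 0:
        raise ValueError("cost_per_run must be positive")
    return int(credits // cost_per_run)


full_runs = runs_covered(INITIAL_CREDITS_USD, GWAS_COST_USD)
leftover = INITIAL_CREDITS_USD - full_runs * GWAS_COST_USD
print(full_runs, leftover)
```

At those figures, the initial credits would cover eight such runs with a few dollars to spare.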
DR. GORDON: That was my next question, about resourcing the research. I know that you put out some calls recently for applications to do studies in the All of Us ecosystem, and we are working with you and other institutes are working with you on figuring out how to get our investigators.
What is the plan? You have already said there are some central resources. You get a certain amount of compute time for free. How are we going to resource investigators to be able to come in and do what they want to do with this data?
DR. DENNY: Every new researcher that comes on gets those initial $300 of compute credits. You mentioned some of the things we already have in place. We are looking at ways and we continue to do this. I certainly encourage people to come on and use those initial credits and put it in their grants, in the future to include that as a component.
What we hope to do over time is think especially about those who would be under-resourced and things like that, how can we better support them. Those are things we are actually thinking about as well.
DR. GORDON: There are a few questions coming in on the Q&A. Please keep them coming in. I am going to turn to those in just a moment, but first I want to ask one more question myself, Josh. I asked you what you are most excited about and you are most proud of what you have done in the last year. What is coming in the next year? What are you most excited about that is on the horizon?
DR. DENNY: There will be another data release, and that data release will take our genome count -- we don’t know the exact number but it will be certainly above 200,000 whole genomes. It will easily more than double the number of genomes that we have in that dataset with, of course, the same diversity mix, so that will just really up the power to do some of these kinds of analyses.
DR. GORDON: Cool. Looking forward to the next data release, more genomics, more, I would imagine, data of all sorts in releases.
DR. DENNY: More of everything, right.
DR. GORDON: And we are already starting to see some of the amazing results that you can get in the All of Us space.
I am going to turn to some questions from the viewing audience. Keep them coming into the Q&A if you’d like. After those questions maybe we will turn to mental health for a bit and talk about what we are trying to do together.
How are you handling the confidentiality issues, for example, about genetic risk data, about insurance and insurance companies? And what are your views on what we need to do in that space, confidentiality, moving forward?
DR. DENNY: Let’s start with genomics on this question. One of the things that’s important and we thought about from the beginning was someone doesn’t have to get genomic results back if they don’t want to, and not everyone does. One of the things we do is we have an educational process on that and tell them of the risks. Right now, GINA protects us for health insurance but not life and disability insurance, so we educate on that component and tell them what the potential risks are.
I am certainly not a lawyer, but we don’t actually know what all the risks are. Certainly, anecdotally we know of people who have had findings and have found that their life insurer didn’t care that they had such genetic results, but that doesn’t necessarily mean that is universal. So we educate on that process and try to let them know more about what that looks like.
In terms of things like privacy around electronic health records and stuff like that, it is always important to say that our protection of that privacy and security of the data is really job one because we have to begin with trust and really engender trust in all ways, and sometimes that’s telling people and letting them know we are going to protect their data. It is certainly a rigorous security process. Certificates of confidentiality to protect against law enforcement access, for instance, is an important thing we talk about, as well as always the data security stuff.
And then we are up front that we can’t promise there will never be a data breach but we are going to work as hard as we can, and we certainly will let you know if we notice anything like that. And we look at the research projects. There are certain rules around what kind of research projects can be done. It’s pretty expansive, but there are some things that we watch out for, especially what would be potentially stigmatizing research.
DR. GORDON: Great. So you care about it both from the perspective of safeguarding the data itself from outside intruders but also from the perspective of what researchers do with that data.
DR. DENNY: Exactly.
DR. GORDON: This question sort of follows up on a question I asked about moving beyond genomic data. Besides genome sequencing analysis what else can you do? Can you do combination analyses such as comparing mental status and physical conditions? What else can you do in the context of these datasets?
DR. DENNY: There are so many emergent studies that people can think of. I will never ever be able to think of them all. Yes, you can do all sorts of things.
Think about any of these data streams, 500 million health-related datapoints at this point over decades of information, plus the Fitbit data plus genomics plus how you can link in new environmental data or stuff that will come. These include surveys across mental health, some of them repeated measures. You can look at changes in substance use before and after the pandemic. You could look at interactions of physical with mental health, depression with race and ethnicity, and how prior exposure to a cardiovascular medication like a beta blocker plays out with and without depression present in the medical record. You could come up with all sorts of combinations.
DR. GORDON: And that’s all just what you thought of off the top of your head.
DR. DENNY: Rich data streams; you can combine them however you want.
DR. GORDON: And you said there’s how many thousands of investigators already working with the data thinking of creative things to do?
DR. DENNY: There’s about 3,500 now.
DR. GORDON: Fantastic. There are two different questions that really get at combining the data resources that All of Us aggregates with external sources. One of them, for example: are there plans to perform genotype imputation, for which apparently -- and this is beyond me -- the best imputation panel is TOPMed, which obviously is not in your research portfolio? That one is a very specific question.
Another question is can your information capture or be used with data that, say, collects peer support services -- or I was thinking about if people are using these apps that provide talk-based psychotherapy. But really the question is, how much is All of Us really -- it’s an isolated ecosystem. How much are we going to be able to combine what’s going on in All of Us with other data streams or processing streams?
DR. DENNY: The first question is pretty direct and then the second question is really exciting as we think about where we could go.
The first question -- I will just remind the audience that we are generating sequence and array data, and the imputation process is most relevant for the array data. Right now we have more arrays than sequences, but the sequences will rapidly catch up. With this next release the two will be closer. Most people are using the sequence data, not the array data, and we imagine that will continue, so we are not actually imputing the array data. We have directly sequenced 600 million variants at this point already, which is way more than you could ever impute off of an array backbone.
We’re looking at creating an imputation server because we have such a diverse panel, and we are going to grow so large that we think it could be beneficial to do that. And whether we can add that or collaborate with TOPMed and maybe even learn from both of our sequences is something I will certainly think about.
Thinking about the second question, we think of ourselves as a platform right now. It’s stuff that we are doing and generating. But as we think about going forward we really want to develop a rich portfolio of what we’re calling ancillary studies. These are things that we are going to build on top of us as a platform, and that’s what Exploring the Mind is. It’s one of these ancillary studies in partnership with you, and it’s a novel technology for us. This idea of an interactive kind of module that gives something out of it. So I don’t know what’s in the future of those ancillary studies.
We have another one called Nutrition for Precision Health. One of the cool parts of that is they’re going to have people take pictures of their food and it’s going to develop AI algorithms to figure out what people are eating and what the caloric count is and stuff like that based on other metrics, as well as looking at like the microbiome.
So the future possibilities are really close to endless with what we will be able to do. Let’s have those conversations over time.
DR. GORDON: I am going to ask more questions of my own, but please, folks, keep the Q&As coming in. I want to turn our attention to the goal of what All of Us at least initially was, and I think still is, and it is precision medicine.
We have talked about this data that you have and the wonderful questions that you can ask, the tremendous genomics resource, and the ability there is going to be to understand social determinants and other environmental factors leading to disease. And our own studies that you mentioned try to dissect behavior on a large scale and link that to health.
But if we think moving forward about precision medicine, about affecting our ability to really design treatments and make treatment decisions for individual patients, how does all this data get us there? Can you see some near-term stuff that might be illustrative of that?
DR. DENNY: I’m glad you asked this question and focused us towards some of those directions. I see two streams of information coming out of this that will actually help. One that may be less thought of I will mention first, which is that we are directly offering participants a chance to get personal genomic information that has an impact on their health and that they can take to their providers, around inherited cancer and cardiac conditions and things like that, as well as pharmacogenetics. And, by the way, 96 percent of people across all ancestries -- they are different variants, different medications that would be affected, but it’s basically 96 percent across ancestries -- will have a pharmacogenetic variant that will alter what drug would be recommended if they were prescribed a drug in that class.
So these kind of things across a national population with people walking in across all backgrounds and geographics to their health providers with these reports, I hope it has kind of a secular benefit of getting a broad range of doctors thinking about precision medicine with their patients. So it could have this effect of education and awareness in a large way. That is something that isn’t maybe quite as obvious and is an indirect effect of our program.
The more direct effect would be, you know, the studies are done in these populations and what we will end up learning about variants that aren’t in European backgrounds is an early win that helps us understand pathogenicity and risk of things like mendelian or non-mendelian disease across different populations and studies of those things.
I mentioned a bunch of epidemiological things that you can imagine as well.
DR. GORDON: Yes. I want to focus in for a moment on that diversity in genomics thing. This has been a really important area that we recognized for mental health as well as, of course, other areas of genetics and genomics.
In addition to identifying new variants, apparently -- this is what I have learned from my colleagues here at NIMH who are the genomics experts -- it also lets you fine-map existing variants better. Folks of one genetic ancestry are going to have one set of what we call haplotypes, or blocks of variants that all go together, and folks from different ancestries will have either smaller ones or different ones that enable you to then dissociate the disease risk from a whole long block of variants that are all linked in one ancestry to really get it down finer.
And for mental health anyway that is incredibly important because what we have now is not really hundreds of genes, for example, that are linked for schizophrenia; we have hundreds of loci, hundreds of places, and diversifying the genetic ancestry of our data will really help us get from places to genes. That is why I’m tantalized by 200,000 genomes. That’s pretty exciting.
That is tempered by the fact that probably, at population risk, less than 1 percent of individuals in those 200,000 will have schizophrenia. But it is important to recognize that for more common illnesses we are talking about a tremendous advantage by diversifying the genetic data.
Let’s delve into the precision medicine piece a little bit longer. You touched on this somewhat, but I would love to hear from you what is your definition of precision medicine. How would you define it?
DR. DENNY: I will just briefly comment. At Vanderbilt one of my titles was Vice President of Personalized Medicine, and I led a Center for Precision Medicine, so I’ll just talk about the intersection of those things.
For a while we used the term personalized medicine, and I like to argue that doctors probably always have tried to do personalized medicine. I wouldn’t have wanted to say 20 years ago that I wasn’t trying to be personalized with my patients in their care, right.
So I think precision medicine is about layering a dimensionality of data that you may not be able to observe or sort of cognitively keep in your head directly into your care of the patient in a personalized way. I think that is the arc that you get in personalization with precision that you may not have had before. It has lots of other implications: more exact diagnoses, getting beyond maybe Type 1 or Type 2 diabetes to lots of subtypes, or thinking about the many factors that may influence depression, which includes even factors like getting that med right the first time, even just from a side effect perspective and hopefully from a treatment perspective as well.
DR. GORDON: I like that characterization a lot. I think nowhere in medicine are we more personalized than in psychiatry where we really delve deeply into people’s pasts, people’s presents, the particular symptoms they happen to have. But what we lack in that, even though we might try to personalize our treatment approaches, are evidence-based approaches to use that information to guide treatment selection so we can understand our patients deeply by personalizing our information about them.
But then turning that understanding into practice -- well, I think the psychodynamic psychotherapists might say that that is what they do all the time. But in terms of evidence-based approaches to say if you find X, then you do Y -- that is where we really need some work.
And you mentioned our efforts around characterizing behavior and cognition in All of Us folks with this new ancillary study that you’re helping us with, and it has really been a pleasure to work with your people to get it up and going -- that is what it’s about. It’s about trying to get deeper information about our patients from a behavioral and cognitive perspective and really ask the question whether we can then provide an evidence base for making treatment decisions using that additional depth of phenotyping.
So I appreciate you bringing us into that space both with your definition of personalized versus precision medicine and also by facilitating our work together.
I am going to turn back to some questions from the audience now. We have about 10 minutes so if folks want to get in a last-minute question please put them in.
Here is a very practical one but an important one. For those of us who already have a workbench account and workspace -- fantastic, glad that you have it. Wonderful to see some of our NIMH folks who are working in this space.
“If we move to another institution are we able to transfer the account and workspace over or do we have to start from scratch?” Have you figured that one out yet?
DR. DENNY: I don’t know that I know the answer to that question. I can guess what I think it is. What I would say is that’s a question to put into the little Info box to ask a question to the system to make sure you would get the right answer.
DR. GORDON: So there are places people can get help with those kinds of questions.
DR. DENNY: Right. When you are in there, even before you are logged in, on the bottom right you can provide feedback. When you are logged in, under the menu there’s an option to contact us and I would encourage you to do that directly. That way you would get the right answer.
DR. GORDON: And you must be learning things all the time about what your system can and cannot do. How many people work at All of Us right now?
DR. DENNY: It’s always hard to know exactly what the count is. Within the NIH team I think we have probably upwards of 200 people, and then across the consortium there’s probably, if you look at our list that combines them and us, close to 3,000. Most of that is going to be enrollment.
But our EHR team that was curating and collating and putting these things in common data models and stuff, the email list that went out to that team at one point was around 200 people across all the sites that were pulling data from their individual sites.
We had some sites when we started this that had Excel spreadsheets for EHRs that they had to pull together. So there was a lot of benefit locally as well and then people said this was painful in the beginning but we’re really glad -- this helps us operationally as well.
DR. GORDON: That raises a question in my mind. You must have faced all kinds of challenges in bringing this all to fruition, and I wonder if you might share one that sticks in your mind as being particularly interesting or poignant.
DR. DENNY: I think you always have hats that you wear to get a goal done that you didn’t necessarily think you would be wearing. One of those was needing all the legal agreements that went back and forth and security agreements. When I was at Vanderbilt we had about 100 of these, and really having to carefully think about that.
A big “A-ha” for us was how we managed the consents in a way that would be understandable to everyone coming in all these different ways. We use a HIPAA right-of-authorization, which means the participant is saying, I want you to give my health information to All of Us to do work. That flip of how we did it I think is really empowering, too. It’s a right-of-access approach from the participant to direct it somewhere else, and that is a much more powerful thing than trying to do it with consent.
Those are just a couple observations. I already talked about the harmonization thing. I will just say, when we wrote the blueprint for this back in 2015 what it could look like, I thought we were pretty ambitious. It’s exciting to me that we have gone further than what we wrote out in terms of what we could harmonize, pull together, centralize, make available. And the participant population we could recruit -- I am really proud of the work that the engagement partners and healthcare provider organizations and all these groups that have engaged diverse communities and helped build and win the trust of diverse partners who have good reason to not always be completely trusting of us, without us doing the work to prove ourselves.
DR. GORDON: You mentioned Nashville. Are you still in Nashville?
DR. DENNY: Right now I’m in Los Angeles, but I live in Bethesda.
DR. GORDON: So you made the move to DC. How has the pandemic been treating you?
DR. DENNY: It was hardest on our kids. I mentioned I have kids. Moving and then immediately going virtual -- they wanted to move back to Nashville during most of the pandemic for sure. But it is better now. School is in session, sports, all that stuff. That’s helpful.
DR. GORDON: Good. Listen, Josh, I want to thank you tremendously not only for joining us here today but for being such an excellent partner with NIMH. We really appreciate the work we have been able to get done together and are looking forward to being able to open up the workspace to our researchers.
There is one final question and I think you told us this but since someone asked I will go ahead and ask it. How do I get access to the Researcher Workbench? Where do I go to sign up?
DR. DENNY: Awesome. Researchallofus.org. You will see on the top right a Register button and it will take you through the steps.
DR. GORDON: Researchallofus.org, Register, and then all of a sudden you have access to hundreds of thousands of datapoints -- actually billions of data points, hundreds of thousands of genomic resources. And I have done a little bit of playing around with not the Researcher Workbench but the public-facing part and I know that, just like Josh suggested, there is a good representation of individuals with various mental illnesses there, including schizophrenia and depression and anxiety disorders at rates that are around or better than the overall rate -- I shouldn’t say better than -- higher than the overall rate in the general population.
It really is going to be a tool, a resource, for precision psychiatry and precision medicine, and so we are really excited to be able to support scientists to get into this data and teach us new things.
Thanks for joining us today, Josh. I really appreciate the overview you gave, the opportunity for us to chat and for you to field questions from our audience.
And let me thank everyone who came. At one point there were close to 200 folks on the webinar, which is really great. I look forward to seeing many of you at our next NIMH Director’s Innovation Speaker Series event. Thanks, Josh.
DR. DENNY: Thank you, Josh. It’s a pleasure being on here, and I really appreciate the excellent partnership with NIMH. It’s been great.
DR. GORDON: And I do think this is the first time I have had a Josh on the program.