Guidance for Applicants Following the Report of the National Advisory Mental Health Council Workgroup on Genomics

Developed by the NIMH Office of Genomics Research Coordination

Table of Contents

Summary
Common genetic variation
Rare genetic variation
Genetic syndromes
Points to consider when following up human genetics findings in experimental systems
Frequently asked questions
Example case studies

Summary

In pursuit of its mission to advance mental health research, the NIMH supports a broad research program of fundamental neuroscience and clinical studies. One specific area of focus is human genetics and its potential to uncover disease biology and mechanisms. A question frequently posed to NIMH staff is: What gene or genes should I study experimentally that are relevant to psychiatric disorders? Unlike Mendelian diseases, for psychiatric disorders there is no clear one-to-one relationship between a gene and disease. For any given individual, disease risk is a unique combination of many rare and common genetic variants, as well as other factors, that are often shared across psychiatric diagnoses. Even in the case of rare disorders and highly penetrant mutations, the genetic background of the individual has a strong influence on the type and severity of the clinical presentation. Thus, to understand the genetic etiology of psychiatric disorders, we need to move away from the one-gene-one-disease paradigm and seek to understand the complex interplay of genomic risk factors and how they shape phenotypic outcome (Geschwind & Flint 2015).

The Genomics Workgroup of the National Advisory Mental Health Council (NAMHC) recently issued a set of recommendations for advancing the NIMH psychiatric genetics research program and prioritizing subsequent follow-up studies. They emphasized the primacy of rigorous statistical support from properly designed, well-powered studies for pursuing genetic variants reliably associated with disease. In light of these recommendations, here we provide broad guiding principles for investigators to consider prior to submission of applications motivated in whole, or in part, by an association between human DNA sequence variation (common or rare, single nucleotide or structural) and a disease or trait relevant to the mission of the NIMH. Program staff weigh these points in the context of reviewer comments, the existing literature, and NIMH portfolio balance. Following the NAMHC report, the statistical strength and robustness of the underlying genetic discovery weigh heavily in our funding considerations, as does the suitability of the proposed experimental approach.

Importantly, discovery in human genetics is proceeding rapidly and genetic risk factors identified through various rigorous study designs and analytic methods across multiple cohorts and studies are more likely to stand the test of time. We strongly advise all applicants to consult with relevant program staff in NIMH research areas well in advance of submission to determine how their application aligns with institute priorities and the latest findings in human genetics.

Common Genetic Variation

Genome-wide association (GWA) studies identify statistical relationships, in an unbiased manner, between common single nucleotide variants across the genome and a phenotype of interest. Critically, they implicate regions of the genome (loci) and do not necessarily pinpoint the causal variant(s), gene(s) or mechanism(s) underlying the association. Thus, caution is warranted since some published GWA studies may list genes within risk regions and give the false impression these are the relevant disease genes. For complex traits, hundreds of regions and potentially thousands of genes may be involved, and identifying them requires tens to hundreds of thousands of subjects (Visscher et al 2017). Approaches that take into account the cumulative effects of these many, subtle changes in gene activity may thus be more informative with regard to disease risk than those focusing on individual variants or genes associated with disease. When assessing applications based on GWA findings we consider:

1. Is the locus (or loci) genome-wide significant, and is the finding replicated across cohorts and studies?
2. If the answer to #1 is yes, has the underlying causal variant or gene(s) been identified (fine-mapped)?
3. If the answer to #2 is yes, is the experimental approach being proposed appropriate for the research question of interest (see examples)?

Rare Genetic Variation

Rare variant association studies identify genes across the genome that harbor more mutations than expected by chance in one group of subjects (e.g., Autism Spectrum Disorder - ASD - cases) versus another (e.g., unaffected siblings). Due to limited power, sequencing studies of protein coding (“exome”) variants often implicate genes, but not necessarily single variants. Given that each of us carries hundreds of protein altering mutations, some of which appear to be damaging, strict statistical methods are required to guard against biologically interesting yet false positive associations (MacArthur et al 2014). Rare variant discovery methods differ in their assumptions, power, error rates and how they leverage genetic and non-genetic information to identify genome-wide statistical associations (Sanders et al 2015, Krishnan et al 2016, Deciphering Developmental Disorders Study 2017, Werling et al 2018). With limited resources, a conservative approach does not rely on any one method or ranked gene list to select genes of high confidence for further follow up. When assessing applications based on sequencing findings we consider:

Is the gene genome-wide significant and the finding replicated across cohorts and studies?
What diseases and/or phenotypes are most commonly associated with mutations in the gene?
Is the experimental approach appropriate for the question of interest (see examples)?
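
The idea of a gene harboring "more mutations than expected by chance" can be illustrated with a simple Poisson tail probability. The following is a minimal sketch in Python, not the method used by any of the studies cited above. The function name `poisson_upper_tail`, the expected count of 0.5 and the observed count of 8 are hypothetical values chosen for illustration; real analyses derive the expected count from calibrated mutation-rate models and account for multiple mutation classes and sources of bias.

```python
import math


def poisson_upper_tail(observed: int, expected: float) -> float:
    """P(X >= observed) for X ~ Poisson(expected), summed term by term."""
    if observed < 0 or expected <= 0:
        raise ValueError("observed must be >= 0 and expected must be > 0")
    # First term of the upper tail: P(X == observed).
    term = math.exp(-expected) * expected ** observed / math.factorial(observed)
    total = 0.0
    k = observed
    # Add successive terms until they no longer change the running sum.
    while term > 0.0 and total + term != total:
        total += term
        k += 1
        term *= expected / k  # P(X == k) from P(X == k - 1)
    return min(total, 1.0)


# Hypothetical gene: 0.5 damaging mutations expected across the cohort, 8 observed.
p_value = poisson_upper_tail(observed=8, expected=0.5)

# Hypothetical correction: 20,000 genes x 2 mutation classes tested.
threshold = 0.05 / (20_000 * 2)

print(f"P = {p_value:.2e}, threshold = {threshold:.2e}, "
      f"genome-wide significant: {p_value < threshold}")
```

Summing the upper tail directly, rather than computing one minus the lower tail, avoids losing precision at the very small P-values that matter for genome-wide thresholds.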

Genetic Syndromes

Symptoms of rare diseases and syndromes may overlap with those of more common and etiologically more complex disorders. For example, individuals with Fragile X or Velocardiofacial syndromes often have features of autism or schizophrenia, respectively. Although these syndromes have well-defined genetic causes, the manifestation of different symptoms is influenced by other genetic factors. Thus, mutations in FMR1 and copy number variants (CNVs) of 22q11 that cause these syndromes should not be considered synonymous with autism or schizophrenia. In fact, the more common neurological feature for both these syndromes is intellectual disability. Even in the case of fairly penetrant variants such as CNVs, unbiased genetic associations are required to establish a definitive link between these variants and psychiatric diagnosis (Marshall et al 2017). This holds true for individual genes within large, multi-genic CNVs. Currently, for psychiatric disorders, there is no consensus experimental approach for determining which gene or genes within or near an associated CNV is driving disease risk. When assessing applications based on genetic syndromes we consider:

Is the gene or CNV genome-wide significant for a psychiatric diagnosis and the finding replicated across cohorts and studies?
What diseases and/or phenotypes are most commonly associated with the CNV or mutations in the gene?
Is the experimental approach appropriate for the question of interest (see examples)?

Points to Consider when Experimentally Following Up Human Genetic Findings

Assessing the Strength of Genetic Association with Disease

Is the genetic association signal statistically robust genome-wide? Has it been replicated?
For large, multi-genic CNVs, have any individual gene(s) been statistically associated with disease genome-wide?
Has the genome-wide significant signal been fine-mapped to a causal variant and/or gene(s)?
What other diseases or traits has the genetic signal or gene been associated with? How does that influence the design and interpretation of the proposed experimental study?

Experimental Study Design Based on the Variant(s) or Gene(s) of Interest

Has the functional variant been identified? Does it cause a gain or loss of function? Does it increase or decrease gene expression? Splicing? Transcription factor binding?
In what human tissues and cell types is that variant expressed? During which developmental period(s)?
How conserved is the implicated gene across species? How about its regulatory regions? Or its co-expression and/or protein interaction network across cells, tissues and developmental periods?
Is the function of the gene or effect of the gene variant robust across multiple cell lines and animal strains?
Do genome-wide enrichment analyses for the trait or disease identify those same cells, tissues and developmental periods?
Is the proposed experimental system appropriate given the conservation of the gene and its expression across cell, tissues, developmental periods and/or species?
If studying a set of risk variants or genes collectively, is there an appropriate set of matched control variants or genes?

Maximizing Utility and Impact of the Approach

Is the proposed experimental approach best suited to address the mechanism of disease risk conferred by a particular gene variant or investigate the basic biology and function of that gene?
Given the question of interest, what is the trade-off between a deep dive into the biology and function of a particular gene or variant versus a comprehensive assessment of all genome-wide significant, disease or trait-associated genes/variants?
Have the experimental effects of genes or variants been considered in the context of all clinical and non-clinical traits associated with those same variants as well as whether non-associated genes or variants have similar effects?

Frequently Asked Questions

Which disease gene(s) does the NIMH support for functional follow-up?

The NIMH does not maintain a curated list of established psychiatric risk genes. Human genetics is a rapidly evolving field and the set of potential risk genes is dynamic as studies increase in size and new statistical methods are applied. Each application is evaluated based on the strength of genetic evidence and rationale as described above. We strongly urge applicants not to base their proposal on any one genetic finding from any single study and to consult with program staff prior to submission.

How do I know if my gene is genome-wide significant?

A variant or gene is genome-wide significant (GWS) when it surpasses a statistical threshold corrected for the effective number of independent tests performed. In common variant association studies this is set at P < 5 × 10⁻⁸, which is a Bonferroni correction of P < 0.05 by the approximately one million independent genomic regions tagged by common single nucleotide variants (i.e., SNPs). In association studies of rare coding variants (“exome” studies), gene burden tests are corrected for the number of total genes and mutation classes tested (e.g., 20K genes, loss-of-function and missense variants). This threshold is typically on the order of P < 2 × 10⁻⁶. Robust methods are currently being developed for whole-genome sequence association studies of rare, non-coding variants (Werling et al 2018). Close attention should be paid to how studies determine GWS and if they account for potential sources of bias and confounds in the study design.
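
As a worked illustration of the arithmetic behind these thresholds, the short Python sketch below divides the nominal alpha of 0.05 by the number of tests. The function name `bonferroni_threshold` is hypothetical, and the test counts are the round figures quoted above rather than values taken from any particular study.

```python
def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Per-test P-value threshold that holds the family-wise error rate at alpha."""
    if n_tests <= 0:
        raise ValueError("n_tests must be positive")
    return alpha / n_tests


# Common variants: roughly one million independent regions tagged by SNPs.
gwas_threshold = bonferroni_threshold(0.05, 1_000_000)

# Exome gene-burden tests: about 20,000 genes x 2 mutation classes
# (loss-of-function and missense), following the example in the text.
exome_threshold = bonferroni_threshold(0.05, 20_000 * 2)

print(f"Common variant threshold: {gwas_threshold:.1e}")
print(f"Exome burden threshold:   {exome_threshold:.2e}")
```

With these assumed test counts the exome figure comes out at 1.25 × 10⁻⁶, consistent with a threshold "on the order of" 2 × 10⁻⁶; the exact value in a given study depends on how many genes and mutation classes were actually tested.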

Depending on the disorder, rare variant association studies may require sample sizes as large as or larger than common variant studies (N >> 10K). Given current sample sizes, there are different approaches for increasing power to detect disease-associated genes including Bayesian (Sanders et al 2015) and machine learning (Krishnan et al 2016) methods. These approaches use false discovery rates or assign probabilities at various thresholds and do not employ strict, Bonferroni-corrected thresholds with fixed error rates. Thus, while ranked genes derived with these methods provide important biological information when studied collectively in bioinformatic analyses or high throughput assays, they must not be overinterpreted on a gene-by-gene basis. Those individual genes with increased mutation burden that both surpass GWS and are highly ranked are prioritized for investment with deep, targeted follow-up.
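
To make the contrast between a strict Bonferroni threshold and a false discovery rate concrete, the sketch below applies both to the same set of P-values. It uses the generic Benjamini-Hochberg procedure as a stand-in for FDR control; it is not the Bayesian or machine learning method of the studies cited above. The gene names and P-values are invented for illustration.

```python
def bonferroni_hits(pvalues: dict[str, float], alpha: float = 0.05) -> set[str]:
    """Genes whose P-value passes a Bonferroni-corrected threshold."""
    threshold = alpha / len(pvalues)
    return {gene for gene, p in pvalues.items() if p < threshold}


def benjamini_hochberg_hits(pvalues: dict[str, float], q: float = 0.05) -> set[str]:
    """Genes selected by the Benjamini-Hochberg step-up procedure at FDR q."""
    ranked = sorted(pvalues.items(), key=lambda item: item[1])
    m = len(ranked)
    cutoff_rank = 0
    # Find the largest rank whose P-value lies under the BH line rank * q / m.
    for rank, (_, p) in enumerate(ranked, start=1):
        if p <= rank * q / m:
            cutoff_rank = rank
    return {gene for gene, _ in ranked[:cutoff_rank]}


# Hypothetical P-values for ten genes (illustrative only).
example_pvalues = {
    "GENE_A": 1e-6, "GENE_B": 0.006, "GENE_C": 0.009, "GENE_D": 0.012,
    "GENE_E": 0.2, "GENE_F": 0.35, "GENE_G": 0.5, "GENE_H": 0.6,
    "GENE_I": 0.8, "GENE_J": 0.95,
}

strict = bonferroni_hits(example_pvalues)
fdr = benjamini_hochberg_hits(example_pvalues)

print("Bonferroni:", sorted(strict))
print("FDR (BH):  ", sorted(fdr))
```

In this toy example only one gene passes the Bonferroni threshold while four pass at a 5% false discovery rate. The longer list is useful when the genes are analyzed as a set, but any single gene on it carries a weaker guarantee, which is why such lists should not be interpreted gene by gene.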

How do I know if a genetic finding has been successfully replicated? Do meta-analyses and mega-analyses count?

The replicability and reproducibility of studies is an active discussion among the larger research community (Leek & Jager 2017). Within the area of human genetics, replication has a circumscribed definition. Well-designed genetic association studies include internal replications. Genome-wide significant (GWS) associations in a discovery cohort are evaluated in an independent replication cohort. If a GWS association in the discovery cohort is nominally significant in the replication cohort (corrected for the number of discovery GWS associations) and GWS by meta-analysis of the discovery and replication cohort together, then the finding is considered to be replicated. (Note, meta-analyses combine summary data across cohorts/studies while mega-analyses combine individual data across cohorts/studies.) A genetic association may also be replicated in an independent study using a separate discovery cohort and meta-analyzed with previous studies. Genetic associations that are internally and externally replicated (GWS) are considered the most robust.
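
The replication criteria above can be sketched as a short check on summary statistics. This is an illustrative Python example under stated assumptions: it uses a standard inverse-variance fixed-effect meta-analysis and a normal approximation for P-values, it adds a same-direction-of-effect requirement that the text does not state explicitly, and the function names and effect sizes are hypothetical.

```python
import math

GWS_THRESHOLD = 5e-8  # common variant genome-wide significance threshold


def two_sided_p(beta: float, se: float) -> float:
    """Two-sided P-value for an effect estimate under a normal approximation."""
    z = abs(beta / se)
    return math.erfc(z / math.sqrt(2))


def fixed_effect_meta(estimates: list[tuple[float, float]]) -> tuple[float, float]:
    """Inverse-variance weighted (fixed-effect) combination of (beta, se) pairs."""
    weights = [1.0 / se ** 2 for _, se in estimates]
    beta = sum(w * b for w, (b, _) in zip(weights, estimates)) / sum(weights)
    se = math.sqrt(1.0 / sum(weights))
    return beta, se


def is_replicated(discovery: tuple[float, float],
                  replication: tuple[float, float],
                  n_discovery_hits: int,
                  alpha: float = 0.05) -> bool:
    """Apply the three criteria: GWS in discovery, nominal significance in
    replication (corrected for the number of discovery hits), GWS in the
    combined meta-analysis. Same effect direction is an added assumption."""
    same_direction = discovery[0] * replication[0] > 0
    p_discovery = two_sided_p(*discovery)
    p_replication = two_sided_p(*replication)
    p_meta = two_sided_p(*fixed_effect_meta([discovery, replication]))
    return (p_discovery < GWS_THRESHOLD
            and same_direction
            and p_replication < alpha / n_discovery_hits
            and p_meta < GWS_THRESHOLD)


# Hypothetical (beta, standard error) pairs; 10 GWS loci in the discovery cohort.
strong = is_replicated((0.10, 0.017), (0.08, 0.025), n_discovery_hits=10)
weak = is_replicated((0.10, 0.017), (0.01, 0.025), n_discovery_hits=10)

print("Consistent replication cohort:", strong)
print("Null replication cohort:      ", weak)
```

The second case shows why the meta-analysis alone is not sufficient: a strong discovery signal can keep a combined P-value small even when the independent cohort shows essentially no effect.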

In some cases, meta-analyses are applied to smaller candidate gene studies that focus on particular genes or sets of genes. Meta-analyses, however, are only as reliable as the underlying studies and do not overcome the inherent limitations of candidate gene study designs. Thus, a ‘replicated’ candidate gene either across studies or via a nominally significant (not GWS) meta-analysis is not considered a reliable, robust or unbiased association.

What is fine-mapping? And how do I know whether a genetic risk region has been mapped finely enough?

Fine-mapping procedures aim to determine which variants in a genomic region of interest are most likely causally related to a trait when considering how all variants in the region are correlated (for a recent review see Schaid et al 2018). In most cases, the likely causal variant or gene is not the same as or nearest to the variant used to discover the association. Different approaches may leverage additional information from external reference data such as gene expression, functional annotations or patterns of correlations across diverse populations (Pasaniuc & Price 2017). There are concerted efforts to establish comprehensive genetic and genomic reference data sets, and bioinformatic integration methods are expected to become increasingly important.

Fine-mapping approaches may consider all genome-wide significant regions simultaneously (Huang et al 2017) or focus on specific regions (Sekar et al 2016). Confidence that the causal variant(s) or gene(s) within a risk region has been identified increases when the finding is replicated by different groups in independent samples and is robust with the use of multiple statistical methods, software packages, and functional datasets.
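
To give a sense of what a fine-mapping output looks like, the sketch below computes posterior inclusion probabilities and a 95% credible set for one region. It is a deliberately simplified illustration, not the approach of the studies cited above: it assumes exactly one causal variant in the region, equal standard errors for all variants, and a Wakefield-style approximate Bayes factor with an assumed prior effect standard deviation of 0.2, and it does not model the correlation between variants explicitly. The variant identifiers and z-scores are invented.

```python
import math


def posterior_inclusion_probabilities(z_scores: dict[str, float],
                                      se: float,
                                      prior_sd: float = 0.2) -> dict[str, float]:
    """Posterior probability that each variant is the single causal variant,
    from approximate Bayes factors computed in log space."""
    v = se ** 2          # sampling variance of the effect estimate
    w = prior_sd ** 2    # assumed prior variance of the true effect
    shrink = w / (v + w)
    log_abf = {
        snp: 0.5 * math.log(1.0 - shrink) + 0.5 * z * z * shrink
        for snp, z in z_scores.items()
    }
    max_log = max(log_abf.values())  # subtract the max to avoid overflow
    weights = {snp: math.exp(value - max_log) for snp, value in log_abf.items()}
    total = sum(weights.values())
    return {snp: weight / total for snp, weight in weights.items()}


def credible_set(pips: dict[str, float], coverage: float = 0.95) -> list[str]:
    """Smallest set of top-ranked variants whose probabilities sum to coverage."""
    chosen: list[str] = []
    cumulative = 0.0
    for snp, pip in sorted(pips.items(), key=lambda item: item[1], reverse=True):
        chosen.append(snp)
        cumulative += pip
        if cumulative >= coverage:
            break
    return chosen


# Hypothetical z-scores for four variants in one risk region.
region = {"variant_1": 6.5, "variant_2": 6.1, "variant_3": 4.0, "variant_4": 2.0}
pips = posterior_inclusion_probabilities(region, se=0.02)
cs95 = credible_set(pips)

for snp, pip in pips.items():
    print(f"{snp}: PIP = {pip:.3f}")
print("95% credible set:", cs95)
```

In this toy region the lead variant does not reach 95% posterior probability on its own, so the credible set contains two variants. A region can be genome-wide significant and still leave the causal variant unresolved, which is the situation the question about mapping "finely enough" is meant to probe.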

My gene of interest is associated with a developmental disorder and/or intellectual disability, but there are reported cases with ASD or psychosis. Would NIMH support research into my gene?

While case studies of psychiatric diagnosis in individuals with defined developmental disorders may be informative, further evidence is required to establish that psychiatric features are a common clinical manifestation associated with mutations in the causal gene. Genes that show a robust statistical association with psychiatric diagnosis across the population or families are prioritized for follow-up. See our referral guidelines for applications related to developmental disorders and intellectual disability.

I found a gene variant in an individual that affects gene function. Although this is a single case study, can I use that variant to further understand the biology of the gene?

In principle, yes. Each of us, however, carries variants that alter gene function but are nonetheless clinically irrelevant. As more individuals are sequenced, these observations will become more common and caution is warranted to avoid overinterpreting such findings. A strong rationale is needed for studying the gene independent of the chance observation of the variant. The integral question is, given available tools, would this variant be the most appropriate to interrogate gene function in the space of all possible variants? It’s possible more precise gene targeting manipulations will be better for unraveling gene biology.

My gene falls short of genome-wide significance, but if I knock it out it produces disease-like molecular, cellular and/or behavioral changes in an animal. Isn’t that corroborating evidence that warrants further investment and follow-up?

Variants or genes that do not reach GWS may become significant with larger samples. Conversely, currently significant genes near the threshold may become non-significant. It is not possible to determine, a priori, which marginal genes or variants are likely to withstand increases in study power and subsequent meta-analyses. Complementary biological information applied on an ad hoc basis does not provide an unbiased and rigorous assessment of potential disease relevance. The sensitivity, specificity and predictive power of common assays in basic psychiatric research have not been established (e.g., across the genome, how many associated and non-associated genes do and do not produce similar changes). Thus, if the rationale for the study is based on a genetic association, it must rely on a statistically robust one.

Gene X and its signaling pathway are involved in various processes, cell types, brain regions and/or behaviors relevant to NIMH’s stated interests and priorities. This gene has been associated with psychiatric disease in small, hypothesis-driven candidate gene studies, but not larger, unbiased GWA studies. Would NIMH support my hypothesis-driven study of gene X?

The strength of such an application will depend on the evidence and rationale for studying gene X, rather than any claim of disease association from candidate gene studies. Candidate gene association findings are not reliable and often do not replicate in larger, well-powered and rigorously designed studies. Strong applications will leverage what is known and unknown about the gene and provide convincing support for why the various processes, cell types, brain regions and/or behaviors are the most appropriate for understanding the function and biology of that particular gene, irrespective of any putative role in disease. Care should be taken to prevent previous weak and unsubstantiated claims of disease relevance from biasing the study design.

Aren’t findings from human genetics just one type of relevant information? Shouldn’t convergent lines of evidence be used from different sources and methods (i.e., triangulation) to prioritize functional follow-up?

This guidance is specific to proposals that are based on human genetics findings. There are other lines of evidence that may suggest a gene, process or other biological phenomenon warrants investment and deeper investigation and would be evaluated based upon the strength of that supporting evidence. In pursuing functional follow-up, such evidence, however, is not a substitute for a strong and replicated genetic association with disease. With the availability of large, high-dimensional datasets and a vast literature, it’s surprisingly easy to find seemingly confirmatory and supporting evidence for uncertain genetic associations. Weaving several, individually weak strands of evidence together, however, does not necessarily make a study stronger.

Psychiatric diagnoses are not biologically based. Do these gene discovery and follow-up guidelines apply to more objective brain and behavioral measures that are closer to the function of genes?

The number of underlying genetic variants, their frequency in the population and their effect sizes vary across traits and diseases. For complex traits, however, these are not vastly different. Thus, other outcome measures, even those that may be considered more intermediate phenotypes than diagnosis, still require substantial sample sizes to identify robust genetic associations. Gene discovery for these important traits may be beyond the capability of any single laboratory and may require team science collaborations or access to large datasets of electronic health records. There are no alternatives to rigorous, well-powered study designs.

Example Case Studies

Below we provide five examples of proposed studies, and our evaluations, that highlight some of the issues discussed above. These are very general examples for illustrative purposes only and do not reflect the full spectrum of applications we would consider higher or lower in priority – these examples are neither exclusive nor exhaustive. For specific feedback on an application, please consult with relevant program staff prior to submission.

[1] Analyses of well-powered GWA and exome studies show an enrichment of disease genes expressed in region X of the human brain compared to all other regions. Region X consists of a mixed population of cells that project to different downstream brain regions. Our preliminary data indicate that well-powered GWA and sequencing studies from related diseases also show an enrichment, albeit a weaker one, in region X. Our data further indicate an enrichment of orthologous genes in brain region X of species Y. Here we use our method of isolating cells based on their projection targets in species Y to better resolve the genetic enrichment observed across these diseases into projection-specific cell types. We will further develop transcriptional signatures of these cells and validate our findings in human brain. This study will increase our understanding of how genetic risk across related diseases maps onto specific cells and circuits in the brain.

This application does not focus on any one gene or gene variant, but rather follows up on well-powered analyses that take into account many potential genes associated with disease. Furthermore, given the known overlap in the causes and symptoms of psychiatric disorders, it does not focus on any one disease. It takes a finding based on human genetics and human tissue, identifies a similar biological signal in another species, then leverages that experimental system to further refine that signal and validate it back in humans. Using genetic information in this manner may reveal common biological mechanisms that would otherwise be missed by single gene and single disease studies.

[2] Gene X has been fine-mapped as the likely causal gene within a genome-wide significant disease risk locus and replicated across multiple cohorts. (Alternatively, gene X has a genome-wide significant enrichment of mutations in cases versus controls across multiple cohorts using various methods). Here we propose to further understand the function of gene X in the brain. Our preliminary data show that in both an experimental organism and humans gene X is most highly expressed in a specific cell type known to be important for a particular process. We will transcriptionally profile this cell type across development in the organism to identify the time course of gene X expression and determine if it coincides with the maturation of the relevant cellular process. We will also transcriptionally profile wildtype and conditional, cell-type selective gene X knockout cells, assess the impact on the cellular process and identify potential molecular mechanisms via rescue of differentially expressed targets. This study will advance our understanding of gene X biology.

Although this application is based on a statistically robust genetic finding, it is intended as a basic science study into the biology of gene X. The strength of the application will depend on how much and how well it advances our understanding of gene X and/or fundamental principles of biology. These findings may or may not be relevant for understanding how human genetic variation in gene X contributes to variation in disease risk, but they will contribute basic knowledge whose ultimate impact will be determined in the future.

[3] A recent genome-wide association study identified 12p34 as a genome-wide significant disease risk locus. Interestingly, one of the three genes within the risk locus is gene X, which is part of a pathway long hypothesized to be involved in disease pathogenesis. How gene X is involved in this pathway is still not fully understood. Our preliminary data show that manipulating expression of gene X produces changes in the pathway similar to those observed in patients. Here we will use a series of loss and gain of function experiments to characterize how the function of gene X influences the function of the pathway. This study will advance our understanding of gene X and how it contributes to disease.

This application is based on the proximity of gene X within a genome-wide significant risk region. There is, however, no indication that the association of disease risk is operating via that gene as opposed to the other two genes within the region or possibly other more distal genes. This would require a formal fine-mapping procedure. The prior information that gene X is part of a disease-relevant pathway may appear to provide a strong rationale for focusing on that particular gene. Yet, in the absence of a formal unbiased assessment of how many pathways have been associated with disease, how many genes are within each of those pathways, how these genes are distributed across the genome as well as how many risk regions and genes within those regions there are, it is not possible to determine whether this is a chance occurrence or not. There may be other lines of evidence that indicate gene X and its pathway are important to study as fundamental biology, but in this case the human genetic evidence is circumstantial at best and may in fact detract from the potential basic science value of the application.

[4] Gene X has a genome-wide significant enrichment of loss-of-function mutations in cases versus controls across multiple cohorts using various analytic methods. In order to understand disease pathogenesis and pathophysiology, here we propose to characterize the effects of gene X knock-out at the molecular, cellular and behavioral levels in an experimentally tractable organism. Our preliminary data show disease-like deficits in behavior that are rescued by expressing gene X in adult animals. This study will identify the molecular and cellular mechanisms by which gene X causes disease and identify potential therapeutic targets.

Although this application is based on a statistically robust genetic finding, there are limitations to the experimental approach. The application proposes a phenotypic characterization of an animal knock-out (null allele) that purportedly mimics heterozygous loss-of-function mutations observed in patients. Given that the goal is to explore disease pathogenesis and pathophysiology and not the basic biology of the gene, it assumes that in the context of the behavioral deficits, changes at the molecular or cellular level observed in mutants will be relevant for understanding disease mechanisms. These changes, however, must be interpreted in a wider biological and clinical context. How many null mutations in other genes not associated with the disease cause similar changes in the organism? This may be mitigated by collectively studying multiple disease-associated genes to look for common patterns of effects. Yet even under this design, there needs to be an appropriate set of well-matched control genes assessed for similar changes.

It is important to keep in mind that loss-of-function mutations in particular genes are often associated with multiple neurological and psychiatric phenotypes in people, are not fully penetrant and are modified by genetic background. This lack of specificity at both the genetic and phenotypic level should guide and constrain the interpretation of phenotypes observed in mutant organisms.

[5] Gene X has been fine-mapped as the likely causal gene within a genome-wide significant disease risk locus and replicated across multiple cohorts. In order to understand disease pathogenesis and pathophysiology, here we propose to characterize the effects of gene X knock-out at the molecular, cellular and behavioral levels in an experimentally tractable organism. Our preliminary data show disease-like deficits in behavior that are rescued by expressing gene X in adult animals. This study will identify the molecular and cellular mechanisms by which the gene causes disease and identify potential therapeutic targets.

This application is similar to the previous example except that it is based on a common variant association mapped to a gene. In addition to the limitations noted above, the approach has the added issue of attempting to recapitulate, with a null allele, what are often subtle, cell-type specific regulatory effects of common non-coding variants. These risk variants in humans are typically of low risk and low effect and operate in the context of hundreds of other such variants. The biological effect of a null allele in an experimental system is not necessarily related to a single risk variant in the context of this polygenic background in humans. It is for this reason and those mentioned previously that studies attempting to create a genetic ‘model of disease Q’ are potentially more problematic than those focused on either basic biology or a more integrative approach that accounts for the multitude of risk variants.
