


Workshop: Advanced Statistical Methods and Dynamic Data Visualizations for Mental Health Studies


June 28–30, 2021



Introduction and Vision

Abera Wouhib, Ph.D., NIMH
Dr. Abera Wouhib began by discussing the importance of statistics in mental health studies. In biomedical research, data are commonly analyzed to highlight trends, advance the creation of valid diagnostic methods, and develop new therapies or prognoses for a disease. Accordingly, investigators in the biomedical field should seek direct training or professional advice to apply statistical methodologies in their research.

Dr. Wouhib emphasized that the nature of mental health and behavioral studies poses challenges to standard statistical methods. For instance, inadequately designed studies may be irreproducible and may suffer from inflated Type I and Type II errors, small treatment effects, high individual response variability or heterogeneity, and instability and noise. Grant applicants may plan statistical analyses and power calculations incorrectly, yielding marginal effect sizes and insufficient statistical power. As a result, such studies can be overly optimistic in claiming statistical superiority and may fail to adequately distinguish a treatment effect from control.
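To make the power-planning point concrete, the following is a minimal Python sketch, assuming a two-arm comparison of a standardized mean difference (Cohen's d) with equal group sizes and a two-sided test under the normal approximation. The function names `two_sample_power` and `n_for_power` are hypothetical illustrations, not tools presented at the Workshop.

```python
from statistics import NormalDist


def two_sample_power(effect_size: float, n_per_group: int, alpha: float = 0.05) -> float:
    """Approximate power of a two-sided two-sample test of a standardized
    mean difference (Cohen's d), using the normal approximation."""
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1 - alpha / 2)
    # Noncentrality for equal group sizes: d * sqrt(n / 2)
    ncp = effect_size * (n_per_group / 2) ** 0.5
    # Probability of landing in either rejection tail
    return (1 - std_normal.cdf(z_crit - ncp)) + std_normal.cdf(-z_crit - ncp)


def n_for_power(effect_size: float, target: float = 0.8, alpha: float = 0.05) -> int:
    """Smallest per-group sample size whose approximate power reaches the target."""
    if effect_size <= 0:
        raise ValueError("effect_size must be positive")
    n = 2
    while two_sample_power(effect_size, n, alpha) < target:
        n += 1
    return n


# An inflated pilot estimate (d = 0.8) versus a more modest effect (d = 0.3)
for d in (0.8, 0.3):
    print(f"d = {d}: {n_for_power(d)} per group")
```

Under these assumptions, the sketch asks for 25 participants per group when d = 0.8 but 175 per group when d = 0.3, which shows how planning around an inflated pilot estimate can leave a study badly underpowered if the true effect is modest.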

Since 2018, NIMH has supported the Statistical Methods in Psychiatry Program, which aims to foster novel statistical methods and analytical plans to identify and validate biomarkers and novel treatment targets corresponding to mental health disorders. The program encourages and facilitates the development of new methods and applications that advance statistical techniques most beneficial in psychiatric research. The primary goal of the program is to create innovative statistical methods to assess and nurture reproducibility in translational mental health studies.

Dr. Wouhib explained that the purpose of today’s Workshop is to discuss a wide range of statistical applications in mental health research and to highlight recent statistical innovations in the field of mental health disorders. Such advances include methods to generate reliable and reproducible findings from neuroimaging data, statistical testing and power analysis for high-dimensional neuroimaging mental health data, and developments in imaging genetics. The vision for the Workshop is to stimulate meaningful conversation about key questions in mental health studies towards a set of actionable solutions to advance this important area of study.



Recent Advances in Statistical Methods for Mental Health Services Research

Chair: Elizabeth Stuart, Ph.D., Johns Hopkins University
Dr. Elizabeth Stuart introduced the topic of statistical methods and their importance to NIMH.

Data Science as the Engine for a Learning Health Care Service System for First Episode Psychosis in Coordinated Specialty Care

Melanie Wall, Ph.D., Columbia University
Dr. Melanie Wall introduced an NIMH-funded project called OnTrackNY, which leverages learning health care systems and data science to improve care for first-episode psychosis. In people ultimately diagnosed with schizophrenia, a first psychotic episode typically occurs between the ages of 15 and 30. Evidence indicates that people with psychotic disorders have better outcomes when they are identified and connected to care as adolescents or young adults, during or soon after a first episode of psychosis. Before her involvement with the current EPINET project, Dr. Wall used OnTrackNY administrative data to show group-level improvement over the course of treatment. The OnTrackNY cohort showed improvements in social and occupational functioning during their year-long involvement with the network. Additionally, the rate of hospitalization decreased dramatically, from about 70 percent to 10 percent, while participation in education and employment nearly doubled.

Machine Learning Approaches for Optimizing Treatment Strategies for Mental Disorders

Yuanjia Wang, Ph.D., Columbia University
Dr. Yuanjia Wang described the value of precision medicine, which offers targeted, tailored, dynamic treatment plans in lieu of one-size-fits-many care models. The RDoC framework is a useful conceptual model of precision psychiatry for tailoring treatment using biological, behavioral, and psychosocial measures. Machine learning is important for developing and optimizing precision medicine approaches. Current practice is to test for interactions between treatment and covariates, but each covariate may contribute only a small effect, and there is no guarantee that the relationship is linear. In reality, there are many covariates, and an effective model should aim to predict outcomes rather than classify patients as responders or non-responders; the latter approach fails to capture negative effects of a new treatment.
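The contrast between modeling outcomes and labeling responders can be illustrated with a deliberately simple sketch: a linear outcome model with a treatment-by-covariate interaction, used to recommend whichever treatment has the higher predicted outcome. This is a toy stand-in for the general idea, not Dr. Wang's machine learning methods; the simulated trial, the single covariate `x`, and the helper names are all hypothetical.

```python
import random


def fit_line(xs, ys):
    """Ordinary least squares fit of y = intercept + slope * x."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return my - slope * mx, slope


def learn_rule(data):
    """data: list of (covariate, treatment in {0, 1}, outcome) tuples.

    Fitting one line per arm is equivalent to a single linear model with a
    treatment-by-covariate interaction. The rule recommends whichever arm
    has the higher predicted outcome, so it can also flag patients for whom
    the new treatment is predicted to do harm.
    """
    fits = {}
    for arm in (0, 1):
        xs = [x for x, a, _ in data if a == arm]
        ys = [y for _, a, y in data if a == arm]
        fits[arm] = fit_line(xs, ys)

    def rule(x):
        predicted = {arm: b0 + b1 * x for arm, (b0, b1) in fits.items()}
        return max(predicted, key=predicted.get)

    return rule


# Hypothetical simulated trial: the new treatment (arm 1) helps when x > 0
# and harms when x < 0; the true treatment effect is 1.5 * x.
random.seed(1)
trial = []
for _ in range(2000):
    x = random.uniform(-2, 2)
    a = random.randint(0, 1)
    y = 1.0 + 0.5 * x + a * 1.5 * x + random.gauss(0, 1)
    trial.append((x, a, y))

rule = learn_rule(trial)
print(rule(1.5), rule(-1.5))  # recommends arm 1 at x = 1.5 and arm 0 at x = -1.5
```

Because the rule compares predicted outcomes, it recommends the control arm where the simulated treatment is harmful (x < 0), information that a responder versus non-responder label would not convey. Real applications would use many covariates and flexible, nonlinear learners.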

Employing Social Media to Improve Mental Health: Harnessing the Potentials and Avoiding the Pitfalls

Munmun De Choudhury, Ph.D., Georgia Tech
Dr. Munmun De Choudhury discussed the role of social media in monitoring and improving mental health using computational approaches. She highlighted social media in the context of the COVID-19 pandemic, noting that the internet is a valuable tool for sharing information during crises; by the same token, misinformation circulates quickly and widely. Dr. De Choudhury and her team studied how exposure to misinformation, operationalized through social media behavior, affects mental health.

Discussant: Benjamin Lê Cook, Ph.D., Harvard Medical School
Dr. Benjamin Lê Cook highlighted key points from each presentation in the session and remarked on how these tools can improve patient care and enhance equity in mental health treatment.


Statistical Methods for Generating Reliable and Reproducible Findings from Neuroimaging Data

Chair: Ying Guo, Ph.D., Emory University
Dr. Ying Guo introduced the next session, which focused on improving the rigor of neuroimaging data analysis in mental health studies and the tools needed to support reproducible findings.

Veridical Data Science for Biomedical Research with an Application to Deep Tune for Characterizing V4 Neurons

Bin Yu, Ph.D., University of California, Berkeley
Dr. Bin Yu described a framework her research team developed for data science, with an application in basic neuroscience. Data should be viewed as part of a lifecycle that spans every step from problem formulation and data cleaning to analysis and interpretation. Dr. Yu emphasized the importance of veridical data science, using the predictability, computability, and stability (PCS) framework as a workflow for addressing these problems. Her team is writing an interactive data science book that will be freely available online by this summer, and Berkeley has a new Division of Computing, Data Science, and Society (CDSS).
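Stability, the S in PCS, can be probed by re-running an analysis under reasonable perturbations of the data and checking whether the conclusion holds. The sketch below, assuming bootstrap resampling as the perturbation and a simple regression slope as the quantity of interest, is a minimal illustration of that idea rather than the PCS workflow itself; the data and function names are hypothetical.

```python
import random
import statistics


def slope(xs, ys):
    """Least-squares slope of ys on xs."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx


def stability_check(xs, ys, n_perturb=200, seed=0):
    """Re-estimate the slope under bootstrap resampling (one possible data
    perturbation) and report the mean estimate, its spread, and the share
    of perturbed fits whose sign agrees with the full-data fit."""
    rng = random.Random(seed)
    n = len(xs)
    full_sign = slope(xs, ys) > 0
    estimates = []
    for _ in range(n_perturb):
        idx = [rng.randrange(n) for _ in range(n)]
        estimates.append(slope([xs[i] for i in idx], [ys[i] for i in idx]))
    agree = sum((e > 0) == full_sign for e in estimates) / n_perturb
    return statistics.fmean(estimates), statistics.stdev(estimates), agree


# Hypothetical data with a modest true slope of 0.3
rng = random.Random(42)
xs = [rng.gauss(0, 1) for _ in range(100)]
ys = [0.3 * x + rng.gauss(0, 1) for x in xs]
print(stability_check(xs, ys))
```

In practice, the perturbations would also include alternative preprocessing and modeling choices, not only resampling.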

The Impact of Methodological Variation in fMRI Studies

Thomas Nichols, Ph.D., University of Oxford
Dr. Thomas Nichols talked about methodological variation in fMRI. The goal of his research was to address the question of which software to use for fMRI analysis. One reason fMRI research has thrived is the availability of user-friendly, validated, end-to-end turnkey software packages. But each package also embeds a number of analytical choices, and each tool has different advantages and disadvantages. In a study comparing different methodologies, a researcher found that different software packages produced different results, and in some cases, different versions of the same software produced different results. The programs differ fundamentally in their default settings, and many of the choices built into these methodologies can affect outcomes.

Taking a systematic approach, his team found evidence of variation between software packages. Although these differences were found in analyses with small sample sizes, many fMRI studies do use small samples. Dr. Nichols outlined the way forward, which includes a comprehensive validation of the many software packages using older data that were never validated, large comparison efforts to find a “best practice” methodology, and a consensus preprocessing pipeline. The last of these may be challenging, but that was the purpose of fMRIprep. A more practical way forward may be multiverse analysis, which acknowledges methodological variation by sampling across analytic choices and averaging the results.
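Multiverse analysis can be sketched in a few lines: enumerate every combination of defensible analytic choices, compute the estimate in each "universe," and report the whole distribution alongside its average. The toy example below assumes two hypothetical choices (an outlier-exclusion rule and the summary statistic) as stand-ins for fMRI pipeline decisions such as software package, smoothing, or thresholding; the data are invented for illustration.

```python
import itertools
import statistics


def exclude_outliers(values, z_cut):
    """Drop values more than z_cut standard deviations from the mean (None keeps all)."""
    if z_cut is None:
        return list(values)
    m, s = statistics.fmean(values), statistics.stdev(values)
    return [v for v in values if abs(v - m) <= z_cut * s]


def multiverse(group_a, group_b):
    """Compute the group difference under every combination of two
    hypothetical analytic choices and return all results."""
    choices = {
        "outlier_rule": [None, 3.0, 2.0],
        "summary": ["mean", "median"],
    }
    summarize = {"mean": statistics.fmean, "median": statistics.median}
    results = {}
    for z_cut, summary in itertools.product(choices["outlier_rule"], choices["summary"]):
        f = summarize[summary]
        a = exclude_outliers(group_a, z_cut)
        b = exclude_outliers(group_b, z_cut)
        results[(z_cut, summary)] = f(a) - f(b)
    return results


# Hypothetical data; group_a contains one extreme value
group_a = [5.1, 4.8, 5.5, 6.0, 5.2, 12.0]
group_b = [4.0, 4.2, 3.9, 4.5, 4.1, 4.3]
results = multiverse(group_a, group_b)
for universe, estimate in results.items():
    print(universe, round(estimate, 2))
print("multiverse average:", round(statistics.fmean(results.values()), 2))
```

Reporting the spread across universes, rather than a single pipeline's result, makes the influence of analytic choices visible; here, excluding the extreme value roughly halves the estimated mean difference.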

The Role of Statistics in Large Complex Neuroimaging Studies

Martin Lindquist, Ph.D., Johns Hopkins University
Dr. Martin Lindquist talked about how the move from small-N to large-N studies affects statistical analysis and reproducibility. In the last decade, there has been much discussion about how small neuroimaging studies tend to undermine the reliability of research. The argument is that small studies not only have low statistical power to detect true effects but also are less likely to produce statistically significant results that reflect true effects. Recently, very large-scale, diverse lifespan data, such as those from the Human Connectome Project, have become increasingly available. With this increase in large-scale data, there is now a need to re-evaluate methods that were validated for small sample sizes.
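The link between low power and unreliable findings can be made concrete with the standard positive predictive value (PPV) formula, which gives the probability that a significant result reflects a true effect. The Python sketch below assumes a given significance level alpha and pre-study odds R that a tested effect is real; the particular values are illustrative, not estimates from the neuroimaging literature.

```python
def positive_predictive_value(power: float, alpha: float, prior_odds: float) -> float:
    """Probability that a statistically significant finding reflects a true
    effect, given power (1 - beta), the significance level alpha, and the
    pre-study odds R that a tested effect is real:
        PPV = power * R / (power * R + alpha)
    """
    true_hits = power * prior_odds
    return true_hits / (true_hits + alpha)


# Same alpha and prior odds; a typical small-N study versus a well-powered one
for power in (0.2, 0.8):
    print(power, round(positive_predictive_value(power, alpha=0.05, prior_odds=0.25), 3))
```

With alpha = 0.05 and pre-study odds of 0.25, the PPV is 0.5 at 20 percent power but 0.8 at 80 percent power, so half of the significant findings from the underpowered design would be false positives.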

Discussant: Todd Ogden, Ph.D., Columbia University
Dr. Todd Ogden reviewed some items that he thought would be good discussion points, including effect sizes and sources of variance, not only in imaging but also in psychiatric rating scales. There are a number of challenges in ensuring that results are meaningful and reliable, and this creates difficulty when making population-based claims.


Statistical Testing and Power Analysis for High-Dimensional Mental Health Data

Chair: Dulal K. Bhaumik, Ph.D., University of Illinois at Chicago
Dr. Dulal Bhaumik introduced the session on high-dimensional data and how heterogeneity is accounted for in power analyses.

Power Analyses for High-Dimensional Neuroimaging Studies

Dulal K. Bhaumik, Ph.D., University of Illinois at Chicago
Dr. Bhaumik addressed the key factors to consider in large-scale multiple testing. He began by describing an fMRI study, analyzed with a linear mixed model, to determine which functional brain networks were associated with late life (>55 years of age) depression and the effect of depression on functional connectivity measured in 87 brain regions across the left and right hemispheres. In designing this study, the team needed to determine the appropriate sample size and how to control false discoveries. They also considered spatiotemporal correlation in developing their model and accounted for heterogeneity by incorporating covariates, which required a large set of parameters. They compared the late life depression group and the control group by examining differences in intercept parameters. Dr. Bhaumik then reviewed their chosen power and sample size, looking at how different factors affected power. Power decreases as the target false discovery rate is made more stringent (smaller) and increases as it is relaxed. The marginal false discovery rate takes into account the proportion of true null hypotheses: the larger the null proportion, the more samples are needed to reach the desired power. For mental health studies, such as those of late life depression, both the null proportion and the p-values tend to be high. Using negative discovery data, they showed that as sample size increases, higher false discovery rates provide more power.
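False discovery rate control across many brain regions is commonly done with the Benjamini-Hochberg step-up procedure, sketched below in Python. This is the classic procedure, not necessarily the one Dr. Bhaumik's team used, and the p-values are hypothetical. Because the procedure controls the FDR at the level q multiplied by the proportion of true nulls, adaptive variants that estimate the null proportion can gain power.

```python
def benjamini_hochberg(p_values, q=0.05):
    """Return the (sorted) indices of hypotheses rejected by the
    Benjamini-Hochberg step-up procedure at false discovery rate level q."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    # Find the largest rank k with p_(k) <= k * q / m
    k_max = 0
    for rank, i in enumerate(order, start=1):
        if p_values[i] <= rank * q / m:
            k_max = rank
    # Reject the k_max hypotheses with the smallest p-values
    return sorted(order[:k_max])


# Hypothetical p-values from ten region-level tests
p = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205, 0.212, 0.216]
for q in (0.05, 0.2):
    print(q, benjamini_hochberg(p, q))
```

Relaxing q from 0.05 to 0.2 raises the number of rejections from two to seven in this example, mirroring the trade-off between the target false discovery rate and power.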

A Semi-Parametric Approach to Solve the Multiple Comparison Problem in Analyzing Functional MRI Data

Rajesh Ranjan Nandy, Ph.D., University of North Texas
Dr. Rajesh Nandy talked about the challenges of hypothesis-based approaches in fMRI studies. These include strong temporal autocorrelation in the time courses and residuals, the multiple comparison problem, and inherent low-frequency processes in the human brain; the last of these is less well known than the first two. When drawing inferences from task-activation data, researchers need to account for these low-frequency processes, which become problematic when a study uses periodic tasks. Dr. Nandy and his team sought to determine whether the normalized spacings are independent and identically distributed (i.i.d.). They used Fourier basis functions to overcome phase mismatch, compared threshold estimates from the proposed method with those from Random Field Theory, and examined both resting-state and activated-state data.
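The phase-mismatch problem, and why a Fourier basis helps, can be seen in a short sketch: regressing a voxel time series on both a sine and a cosine at the task frequency recovers the response amplitude whatever its phase, whereas a single fixed-phase regressor does not. The Python example below assumes a periodic block design whose run spans a whole number of cycles; it illustrates the general Fourier-basis idea only, not the semi-parametric method Dr. Nandy presented, and the signal is synthetic.

```python
import math


def fourier_fit(y, cycles):
    """Least-squares fit of y(t) = c + a*sin(w t) + b*cos(w t), with
    w = 2*pi*cycles/N. Assumes the run spans a whole number of task cycles
    (0 < cycles < N/2), so the regressors are orthogonal and each coefficient
    is a simple projection. Returns (a, b, amplitude, phase)."""
    n = len(y)
    mean = sum(y) / n
    w = 2 * math.pi * cycles / n
    a = 2 / n * sum((v - mean) * math.sin(w * t) for t, v in enumerate(y))
    b = 2 / n * sum((v - mean) * math.cos(w * t) for t, v in enumerate(y))
    return a, b, math.hypot(a, b), math.atan2(b, a)


# Hypothetical voxel response: amplitude 2, shifted by 1.1 rad relative to
# the task regressor, on top of a baseline of 3.
n, cycles = 200, 8
y = [3 + 2 * math.sin(2 * math.pi * cycles * t / n + 1.1) for t in range(n)]
a, b, amplitude, phase = fourier_fit(y, cycles)
print(round(a, 3), round(amplitude, 3), round(phase, 3))
```

Here the sine coefficient alone, which is what a fixed-phase regressor would estimate, is about 0.91, less than half the true amplitude of 2, while the sine and cosine pair recovers the full amplitude and the 1.1 radian shift.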

Adjusting for Confounders in Cross-Correlation Analyses of Resting-State Networks

Deepak N. Ayyala, Ph.D., Augusta University
Dr. Deepak Ayyala talked about how fMRI identifies active regions in the brain using the blood-oxygen-level-dependent (BOLD) signal, from which functional connectivity between regions can be inferred. Resting state networks are useful to study because they show a moderate to high degree of reliability and reproducibility. There are two well-established methods for analyzing resting state networks: seed-based correlation analysis (SCA), which correlates the time series of a given voxel with those of all other voxels, and independent component analysis (ICA), a technique for separating a multivariate signal into additive subcomponents. Voxel-level analysis is computationally challenging because the number of pairwise connections grows with the square of the number of voxels. Reducing the data from the voxel level to the region level allows each region to be described by a single value at each time point. Dr. Ayyala said that the goal is not to identify activated regions but to assess the reliability of connectivity between any two activated regions. Common study-design confounders, such as visit (different days) or scan (different numbers of scans per visit), may be introduced into resting state networks. There is also temporal dependence in region-level time series that may inflate the standard errors of the statistics. His team built a comprehensive method to test the reliability and reproducibility of resting state networks while accounting for confounders. Because no existing method handled cross-correlations between different regions, they derived the covariance structure of the cross-correlations and used it to build a multivariate analysis of variance (MANOVA) that tests the effects of the different confounders on reliability. Dr. Ayyala said that the method currently allows only a very small number of connections to be analyzed, and his team is working on properties that would accommodate larger networks.
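The reduction from voxels to regions, and the reason temporal dependence matters, can be sketched in Python as follows. The example averages voxel time series into a region-level series, computes a lagged cross-correlation between two regions, and uses a Bartlett-type approximation for two independent AR(1) series to show how autocorrelation inflates the sampling variance of a correlation. This is a hypothetical illustration of the ingredients, not the MANOVA-based test Dr. Ayyala's team developed.

```python
import statistics


def region_mean(voxel_series):
    """Collapse one region's voxel time series (a list of equal-length lists)
    into a single region-level series by averaging across voxels."""
    return [statistics.fmean(values) for values in zip(*voxel_series)]


def cross_correlation(x, y, lag=0):
    """Sample correlation between x[t] and y[t + lag], for lag >= 0."""
    x, y = x[: len(x) - lag], y[lag:]
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def ar1_variance_factor(phi_x, phi_y):
    """Bartlett-type approximation: for two independent AR(1) series with
    coefficients phi_x and phi_y, the variance of their lag-0 sample
    correlation is about (1 + phi_x*phi_y) / (1 - phi_x*phi_y) times the
    value 1/n that would hold for independent time points."""
    p = phi_x * phi_y
    return (1 + p) / (1 - p)


# Hypothetical toy data: two voxels in region A; region B lags A by one step
region_a = region_mean([[0.5, 2.5, 1.5, 4.5, 3.5, 5.5], [1.5, 3.5, 2.5, 5.5, 4.5, 6.5]])
region_b = [0.0] + region_a[:-1]
print(cross_correlation(region_a, region_b, lag=1))
print(round(ar1_variance_factor(0.8, 0.8), 2))
```

With an autoregressive coefficient of 0.8 in both regions, the variance factor is about 4.6, so the standard error of a correlation is roughly twice what an analysis assuming independent time points would report.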

Discussants: Olu Ajilore, M.D., University of Illinois at Chicago
Nicole Lazar, Ph.D., Penn State University
Dr. Olu Ajilore suggested that the panel address how the described methods apply to multimodal neuroimaging data, the applicability and generalizability for data combined across different time scales and from different sources, and the variability in outcomes from using different parameters.
Dr. Lazar talked about multiplicity as a longstanding issue and how these presentations addressed it in innovative ways. She asked the panelists to consider whether it is more critical in a connectivity study to guard against a Type I or a Type II error, since increasing sample size is not always an option.


Recent Statistical Developments in Imaging Genetics

Chair: Wesley Thompson, Ph.D., University of California, San Diego
Dr. Wes Thompson introduced the session on imaging genetics, which could be interpreted as using genetic data to predict imaging data. The session took this interpretation more broadly, drawing on lessons learned from genetic studies, in which large sample sizes can lead to highly replicable results but very small effect sizes. He believes the field should look toward large sample sizes as a partial solution and shift expectations about effect size.

Estimating the Fraction of Variance of Cognitive Traits Explained by High-Dimensional Genetic and Neuroimaging Measures

Armin Schwartzman, Ph.D., University of California, San Diego
Dr. Armin Schwartzman talked about using genome-wide association studies (GWAS) to describe cognitive traits or phenotypes from large numbers of single-nucleotide polymorphisms (SNPs). GWAS collect information from a very large number of subjects to determine variability across several quantifiable traits. A simple way to describe these traits is a polygenic linear model, summarized by the coefficient of determination (R²), in a situation with many more SNPs than subjects, which is a high-dimensional data problem. To overcome this problem, Dr. Schwartzman and his team developed an estimator of GWAS heritability (GWASH). He reviewed how the GWASH estimator works, including simulations and an application to estimating the heritability of quantitative traits such as IQ. He then talked about translating the method to neuroimaging data as a brain-wide association study for heritability (Brain-WASH). The challenge is that genetic correlation in GWAS is local, whereas brain imaging data exhibit long-range correlation. One solution was to assess whether long-range correlations can be captured by removing the first principal components. Dr. Schwartzman summarized by explaining that the Brain-WASH approach allows for consistent estimation of the fraction of variance explained (FVE) for different sets of predictors. This could potentially be extended to other high-dimensional predictors, such as in microbiome studies. Challenges remain, and his team aims to ensure that the theoretical conditions and diagnostic methods hold in real-world situations. Long-range correlations need to be calibrated carefully and then translated into a formal procedure for removing the first principal components. They also hope to work on stratifying different populations across traits, genetics, and brain characteristics so that these variables can be taken into account.
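A moment-based estimator in the spirit of GWASH can be sketched from marginal association z-scores alone: when predictors are uncorrelated, the average squared z-score exceeds 1 by roughly n times the fraction of variance explained divided by p. The Python sketch below is a simplified rendering under that assumption, with the average LD score `mu2` fixed at 1 for independent predictors; it is not necessarily the exact published estimator, and the simulation settings and function names are hypothetical.

```python
import random


def moment_fve(z_scores, n, mu2=1.0):
    """Moment-based estimate of the fraction of variance explained (FVE):
        h2_hat = p * (mean(z^2) - 1) / (n * mu2)
    where p is the number of predictors, n the number of subjects, and mu2
    the average LD score, (1/p) * trace(R^2), which is 1.0 for uncorrelated
    predictors."""
    p = len(z_scores)
    mean_z2 = sum(z * z for z in z_scores) / p
    return p * (mean_z2 - 1) / (n * mu2)


def simulate_z_scores(n, p, h2, seed=0):
    """Simulate p independent standardized predictors, a trait whose expected
    FVE is h2, and per-predictor marginal z-scores sqrt(n) * r."""
    rng = random.Random(seed)
    X = [[rng.gauss(0, 1) for _ in range(p)] for _ in range(n)]
    beta = [rng.gauss(0, (h2 / p) ** 0.5) for _ in range(p)]
    y = [sum(x * b for x, b in zip(row, beta)) + rng.gauss(0, (1 - h2) ** 0.5) for row in X]
    my = sum(y) / n
    sy = (sum((v - my) ** 2 for v in y) / n) ** 0.5
    z = []
    for j in range(p):
        col = [row[j] for row in X]
        mx = sum(col) / n
        sx = (sum((v - mx) ** 2 for v in col) / n) ** 0.5
        r = sum((a - mx) * (b - my) for a, b in zip(col, y)) / (n * sx * sy)
        z.append(n ** 0.5 * r)
    return z


n_subjects, n_predictors = 800, 200
z = simulate_z_scores(n_subjects, n_predictors, h2=0.5, seed=1)
print(round(moment_fve(z, n_subjects), 3))  # should land near 0.5
```

With correlated predictors, such as SNPs in linkage disequilibrium or voxels with long-range correlation, `mu2` would have to be estimated from the predictor correlation matrix, which is where the long-range correlation challenge described above arises.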

Efficient Vertexwise/Voxelwise Imaging GWAS for Large-Scale Heterogeneous Population Imaging Data and Enabling Downstream Multivariate Inference

Chun Fan, Ph.D., University of California, San Diego
Dr. Chun Fan discussed the challenges in imaging genetics, including effects that are small and widely distributed in both genetic and imaging data, the need to improve reproducibility and interpretability, the power needed to detect genetic effects, integrating results to describe a given function, and making all of this accessible to other researchers. The practical solution to these challenges is large-scale imaging genetic analysis using mixed effects models that capture the diversity and complexity of large populations. A linear mixed effects model accounts for complex nesting and dense, complex relatedness, but applying it voxel by voxel is computationally expensive. Dr. Fan therefore combined genetic and imaging methods in a two-pronged approach and showed how combining these results with voxel-wise results improves power. To address the challenge of replicability, his team created a prediction model that uses scores for imaging and genetic data based on the linear sum of effects. Using this prediction model increased replicability by 50 to 70 percent. Then, to improve inference about which brain regions contribute most to the effect, they combined voxel-wise summary statistics to approximate regions.
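A score built as a linear sum of effects, as in the prediction model described above, can be sketched in Python as a weighted sum of allele counts. The sketch assumes effect sizes estimated in a separate discovery sample; the variants, effect sizes, genotypes, and function names are hypothetical, and an analogous weighted sum over voxel-wise values would give an imaging score.

```python
def linear_score(genotypes, effects):
    """Score each subject as the linear sum of allele counts (0, 1, 2)
    weighted by per-variant effect sizes from a separate discovery sample."""
    return [sum(g * b for g, b in zip(row, effects)) for row in genotypes]


def standardize(scores):
    """Center and scale scores so they can be compared or combined across samples."""
    n = len(scores)
    m = sum(scores) / n
    sd = (sum((s - m) ** 2 for s in scores) / (n - 1)) ** 0.5
    return [(s - m) / sd for s in scores]


# Hypothetical effect sizes for three variants and genotypes for four subjects
effects = [0.10, -0.20, 0.05]
genotypes = [[0, 1, 2], [2, 2, 0], [1, 0, 1], [0, 0, 0]]
raw = linear_score(genotypes, effects)
print([round(s, 2) for s in raw])
print([round(s, 2) for s in standardize(raw)])
```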

A Platform for Imaging Genetic Study of Brain Aging

Kevin Anderson, Ph.D., Harvard University
Dr. Kevin Anderson said that there is an opportunity to study the biology of the aging brain using large, public data repositories such as the Biobank. Brain aging occurs even in the absence of disorders such as Alzheimer’s disease and dementia. This neurodegeneration can cause general atrophy and volume loss in both gray and white matter, increased cavity size, and white matter lesions. His team used Biobank health care records and MRI images from 500,000 individuals, as well as genetic data integrated from large biobanks of postmortem brain gene expression data; both datasets represented individuals between the ages of 50 and 80. They used these data to build a brain aging data platform that aggregated phenotypic, genetic, and gene expression data; provided dynamic analytic capabilities; quantified age effects in the data; and identified the genetic, lifestyle, and environmental moderators of brain aging. This platform did not bypass data regulations; rather, it was a complementary research tool. Dr. Anderson summarized that age is a major source of variance in biological and behavioral data, and it is important to understand how age might influence data and analyses. His team hopes to help integrate insights about the aging brain into other neuroimaging and genetic studies.
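Quantifying an age effect, and adjusting for it, can be sketched with a simple linear regression of a phenotype on age. The Python example below reports the slope per year, the share of variance explained by age (R²), and age-adjusted residuals; the volumes and the function name are hypothetical, and a real platform would model nonlinear age trends and additional covariates.

```python
import statistics


def age_effect(ages, values):
    """Regress a phenotype on age; return (slope per year, R^2, age-adjusted residuals)."""
    ma, mv = statistics.fmean(ages), statistics.fmean(values)
    sxx = sum((a - ma) ** 2 for a in ages)
    sxy = sum((a - ma) * (v - mv) for a, v in zip(ages, values))
    syy = sum((v - mv) ** 2 for v in values)
    slope = sxy / sxx
    # Residuals are the phenotype with the linear age trend removed
    residuals = [v - (mv + slope * (a - ma)) for a, v in zip(ages, values)]
    r2 = 1 - sum(r * r for r in residuals) / syy
    return slope, r2, residuals


# Hypothetical regional volumes (arbitrary units) for participants aged 50 to 80
ages = [52, 57, 61, 66, 70, 74, 79]
volumes = [101.0, 99.5, 98.9, 96.8, 96.1, 94.0, 92.7]
slope, r2, adjusted = age_effect(ages, volumes)
print(round(slope, 3), round(r2, 3))
```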

Discussant: Thomas Nichols, Ph.D., University of Oxford
Dr. Nichols gave an overview of the presentations and asked the panel members about details of their models. He asked Dr. Schwartzman whether there could be interesting variance in the components that were removed and whether population stratification accounts for known familial dependence. He asked Dr. Fan to comment on the common spike of relatedness, which might be higher for siblings. Finally, he asked Dr. Anderson to address provenance, and how to capture the exact analysis performed, as a challenge in building data platforms. The panelists responded to each of his questions.


Panel Discussion on the Roles of Statistical Methods in Improving Mental Health Studies

Holly Lisanby, M.D., NIMH
Dr. Holly Lisanby began by reiterating the importance of rigor, reproducibility, and adequate power in high-quality research. Increasingly complex data streams and high-dimensional data demand novel approaches to transform findings into insights that advance science and, ultimately, inform care. Dr. Lisanby emphasized the need to engage statistical experts and data scientists in the earliest stages of study design to ensure the quality and rigor of the analytic approach in the broader context of experimental design. The purpose of this last session was to summarize and highlight themes from the four previous sessions. Each session chair gave a brief synopsis, followed by a large panel discussion.

Dr. Stuart highlighted the need to harness complex data such as electronic health records (EHR), extensive longitudinal data, and social media data. These large-scale studies are necessary to achieve sufficient power, and they may allow researchers to answer questions about prediction, causal inference, and description. She pointed out that the theme that emerged from Session 1 was the integration of complex data across multiple sources. Dr. Stuart talked about the importance of bridging gaps between clinicians, public health practitioners, and statisticians, and pointed out the need for statisticians to serve on review panels.

Dr. Guo reviewed Session 2, which focused on generating reliable, reproducible findings. She suggested that assessing stability and reproducibility should become a standard step in future studies. Researchers should consider how findings will be reproduced in different studies as well as how they will generalize to different data sources. Data perturbations include differences across studies, data sources (e.g., imaging modalities), and pre-processing procedures. Dr. Guo pointed out that large datasets can be used to validate results from smaller studies and to build priors, improving the power and validity of smaller studies such as local clinical trials.

Dr. Bhaumik reviewed Session 3. He began by discussing the false discovery rate and the need to settle on a definition of power itself before drawing statistical inferences from neuroimaging data. The panelists compared various methods for controlling the false discovery rate, demonstrating that deep exploration of the data to discover “ingredients” can help control the false discovery rate.

Dr. Thompson reviewed Session 4, pointing out substantial similarities between genetic and imaging data (e.g., effects are small but widely distributed, and population stratification can substantially bias associations). However, there are also meaningful differences between genetic and imaging data that limit porting methods from one to the other. For instance, imaging effects are spatially correlated, imaging phenotypes can be noisy, and imaging data come in several modalities. Dr. Thompson pointed out that no single imaging modality can capture every aspect of interest; consequently, small effect sizes in imaging do not indicate unimportance. Dr. Thompson emphasized the need for large studies, statistical methods that leverage these large datasets (such as GWAS), and methods that address the paradigm of small but widely distributed effects. He recommended developing a simple measure of genetic merit or propensity for small studies, as well as efforts to ensure that small imaging studies can be harmonized with larger studies.

Dr. Josh Gordon, NIMH Director, asked the discussants to speak about methods to identify and disseminate approaches that should be considered part of good scientific practice. Dr. Stuart said that it remains crucial to include diverse participants on panels such as this meeting and on interdisciplinary review panels. PCORI is one model of an effort to develop consensus on clear standards and guidelines. Dr. Yu added that she worked with CCSF doctors who agreed on minimum requirements in a tiered system (i.e., elements of some tiers are debatable; others are not).

Dr. Thompson expressed concern that the power section of grant applications is often an exercise in justifying a sample size chosen a priori. Applicants often use pilot data with highly exaggerated effect sizes to justify yet another small study. He wondered whether applicants should be required to show confidence intervals for their effect sizes.
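Dr. Thompson's suggestion can be illustrated with an approximate confidence interval for Cohen's d, using the common large-sample variance formula (n1 + n2) / (n1 n2) + d² / (2 (n1 + n2)). The Python sketch below applies it to a hypothetical pilot study with 12 participants per arm; the function name is illustrative.

```python
from statistics import NormalDist


def cohens_d_ci(d, n1, n2, level=0.95):
    """Approximate confidence interval for Cohen's d using the large-sample
    variance (n1 + n2) / (n1 * n2) + d**2 / (2 * (n1 + n2))."""
    se = ((n1 + n2) / (n1 * n2) + d ** 2 / (2 * (n1 + n2))) ** 0.5
    z = NormalDist().inv_cdf(0.5 + level / 2)
    return d - z * se, d + z * se


# Hypothetical pilot: d = 0.8 with 12 participants per arm
low, high = cohens_d_ci(0.8, 12, 12)
print(round(low, 2), round(high, 2))
```

An apparently large pilot effect of d = 0.8 comes with an interval of roughly -0.03 to 1.63, wide enough to include no effect at all, which is why a pilot point estimate alone is a weak basis for a sample-size calculation.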

Dr. Yang highlighted the ComBat method, a mixed model that neuroimagers borrowed from genetic studies to harmonize data across studies. The field grows organically when a great measure is recognized. Another way to broadcast best practices is through NIH-funded software and platforms such as NITRC.

Dr. Bhaumik pointed out that most researchers can agree on fundamental components of a study, such as limiting variability and heterogeneity, good modeling, and large sample sizes. He agreed with Dr. Thompson that the power analysis section of most grant applications is useless. Although there is not yet a single method to answer all of these questions, Dr. Bhaumik expressed optimism for the future.

Dr. Yu pointed out the potential value of simulation studies. In her own work, data-driven simulations have been helpful, but they are underused in the statistical world. Dr. Thompson agreed that such simulations would be helpful and noted that they would also require large sample sizes.
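Simulation-based power analysis, one use of the simulation studies Dr. Yu described, replaces a closed-form formula with many simulated studies. The Python sketch below assumes normally distributed outcomes with unit variance and a two-sided test using a normal critical value; a data-driven version would swap the `rng.gauss` calls for resampling from real pilot or archival data. The function names and settings are hypothetical.

```python
import random
from statistics import NormalDist


def mean_and_variance(xs):
    """Sample mean and unbiased sample variance."""
    m = sum(xs) / len(xs)
    return m, sum((x - m) ** 2 for x in xs) / (len(xs) - 1)


def simulated_power(effect_size, n_per_group, alpha=0.05, n_sims=2000, seed=0):
    """Monte Carlo power: simulate many two-arm studies and count how often
    a two-sided test (normal critical value) rejects the null."""
    rng = random.Random(seed)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    rejections = 0
    for _ in range(n_sims):
        # Data-generating model (an assumption): unit-variance normal outcomes
        treated = [rng.gauss(effect_size, 1) for _ in range(n_per_group)]
        control = [rng.gauss(0, 1) for _ in range(n_per_group)]
        m1, v1 = mean_and_variance(treated)
        m0, v0 = mean_and_variance(control)
        se = (v1 / n_per_group + v0 / n_per_group) ** 0.5
        if abs(m1 - m0) / se > z_crit:
            rejections += 1
    return rejections / n_sims


print(simulated_power(0.5, 64))  # close to the analytic value of about 0.8
print(simulated_power(0.0, 64))  # close to alpha = 0.05
```

For d = 0.5 with 64 participants per group, the simulated power should be close to the textbook value of about 0.8, and with no effect the rejection rate should sit near alpha.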

Dr. Michael Freed (NIMH) said that the Division of Services and Intervention Research supports a program of methods research that focuses on refining measures and analytical approaches. The program encourages applicants to consider end users in some way, to help make sense of and disseminate findings. Dr. Robert Heinssen added that NIMH aims to shorten the interval between scientific study and implementation in practice, and he recommended shifting the traditional research approach toward the service system. Dr. Pim Brouwers spoke on behalf of the Center for Global Mental Health and the Division of AIDS Research. He appreciated the ongoing work to expand prediction models to include providers and suggested using these prediction tools to change policy. He expressed hope that such efforts will facilitate communication with health departments in various countries to promote integration of mental health services into their general health systems.

Sponsored by

The Divisions of Translational Research and Neuroscience and Basic Behavioral Science