An Emerging Era of Big Data
By Thomas Insel on February 15, 2012
Every era has its technological icons. In the early 20th century, it was the airplane and the automobile. In the mid-20th century, it was the television and the telephone. Now in the early 21st century, the smart phone and social networking appear to be the defining technologies. And increasingly, the current era is beginning to look like the era of “big data” — a term that refers to the explosion of available information, a byproduct of the digital revolution.
“Explosion” is not too strong a term, especially in the mental health field. Watching Google or Wikipedia, it’s easy to accept the estimates that the amount of information doubles every two years. But in biomedical research, the rate of growth is much faster. One of the drivers is the advent of inexpensive, fast sequencing of DNA and RNA. In 2002, if you wanted to sequence a megabase (1 million bases) of DNA, you needed $5,292 and several weeks of manual labor to do it. Today, you only need $0.19 and a few hours of machine time.1 This difference is even more striking at the level of the human genome. While the first human genome was a $3B project requiring over a decade to complete in 2002, we are now close to being able to sequence an entire genome in a few days for only $1,000.2
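To put the per-megabase figures above in perspective, the short calculation below works out the fold reduction in cost and the halving time it implies. It is only a back-of-the-envelope sketch: it assumes that "today" means 2012 (the year of this post) and that costs fell at a constant exponential rate, which is a simplification. The function names are illustrative.

```python
import math

# Cost per megabase of DNA sequence, as quoted in the post.
COST_2002 = 5292.00
COST_2012 = 0.19
# Assumption: "today" is taken to be 2012, the post's publication year.
YEARS_ELAPSED = 2012 - 2002


def fold_reduction(old_cost, new_cost):
    """How many times cheaper the new cost is than the old one."""
    return old_cost / new_cost


def halving_time_years(old_cost, new_cost, years):
    """Years per halving of cost, assuming a constant exponential decline."""
    halvings = math.log2(old_cost / new_cost)
    return years / halvings


fold = fold_reduction(COST_2002, COST_2012)
halving = halving_time_years(COST_2002, COST_2012, YEARS_ELAPSED)

print(f"Cost fell about {fold:,.0f}-fold")
print(f"Implied halving time: about {halving * 12:.0f} months")
```

Under these assumptions the cost fell roughly 28,000-fold, which corresponds to halving about every eight months, well ahead of the two-year doubling estimate cited for information in general.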
We’ve seen similar changes in brain imaging, where higher resolution instruments are generating massive data sets that can provide more precise pictures of brain structures. In fact, the big data revolution is found at every level of NIMH research, including clinical studies that are now able to capture inputs from digital sensors. In addition, studies of social networks can, for the first time, combine information from millions of people, surveying what some have called “humanity’s dashboard,” a tool that may help us combat many diseases and other social ills.3 A now-famous example of social network studies linked a spike in emergency room visits for the flu to an increase in the number of people searching Google for “flu symptoms” and “flu treatments” two weeks before the ER spike.4
These revolutionary changes in data acquisition create profound challenges for storage. Indeed, it may now be less expensive to generate the data than to store it. The National Center for Biotechnology Information (NCBI) has been our control tower for directing big data efforts in biomedical science, but neither the NCBI nor anyone in the private sector has a comprehensive, inexpensive, and secure solution to the problem of data storage.
Even more challenging than storage is the task of translating big data into better knowledge. Sequencing of the genome or mapping of the brain gives us the opportunity to discover new, important frontiers, including genes and brain areas we did not even know existed. But vast data sets also may elicit faulty science, potentially tempting an investigator to search for the data that supports his or her own theory. There are safeguards to preclude such “false discoveries,” but even these may fail to prevent a biased use of selective data sets.
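One widely used statistical safeguard of this kind is the Benjamini-Hochberg procedure, which limits the expected share of false discoveries when many hypotheses are tested at once. The post does not say which safeguards it has in mind, so the sketch below is only an illustration; the p-values are hypothetical and the function name is an assumption.

```python
def benjamini_hochberg(p_values, alpha=0.05):
    """Return one boolean per p-value: True where the hypothesis is rejected.

    Implements the Benjamini-Hochberg step-up rule: find the largest rank k
    such that the k-th smallest p-value is <= (k / m) * alpha, then reject
    the k hypotheses with the smallest p-values.
    """
    m = len(p_values)
    # Indices of the p-values, ordered from smallest to largest p-value.
    order = sorted(range(m), key=lambda i: p_values[i])

    max_rank = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * alpha:
            max_rank = rank

    rejected = [False] * m
    for rank, idx in enumerate(order, start=1):
        if rank <= max_rank:
            rejected[idx] = True
    return rejected


# Hypothetical p-values from ten tests (illustrative only, not real data).
example_p = [0.001, 0.008, 0.039, 0.041, 0.042, 0.060, 0.074, 0.205, 0.212, 0.216]

naive_hits = sum(p < 0.05 for p in example_p)
bh_flags = benjamini_hochberg(example_p)

print(f"Uncorrected p < 0.05: {naive_hits} findings")
print(f"After Benjamini-Hochberg: {sum(bh_flags)} findings")
```

With these made-up values, five tests pass an uncorrected 0.05 threshold but only two survive the correction. As the paragraph above notes, a correction like this does not protect against choosing which data sets to analyze in the first place.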
These caveats notwithstanding, the big data revolution can be transformative for mental health research, but only if much of this data becomes public. After all, if knowledge is power, then making scientific and health data public can become empowering. We are already seeing this happen with “public access” scientific and medical journals, as well as PubMed Central, which was created to make the results of all publications from NIH-funded studies available for free.
Of course, having places in which to share information only helps if scientists are willing to share. Biomedical science has a proprietary tradition that has been slow to change in the face of NIH’s increasing focus on data sharing.5,6 But as more scientists see the successes of sharing, such as the Psychiatric Genomics Consortium and the 1000 Functional Connectomes Project, the proprietary culture should give way to a more transparent and collaborative one.
Some of the most innovative vanguard efforts to harness the power of big data are found outside of government and outside of mental health. The Personal Genome Project, Patients Like Me, NextBIO, and some of the projects within Sage Bionetworks are among the current efforts connecting individuals to big data related to their health. As these crowd-sourced efforts give individuals information about their own health, they are also creating knowledge for all of us. In a classic example, data registered on Patients Like Me indicated that using lithium to treat amyotrophic lateral sclerosis (ALS) was futile, years before the completion of prospective trials.7
The mental health community has been slower to join this revolution, but this could change. It just requires a passion to share information, a capacity to develop data repositories, and a vision for turning individual data into collective knowledge. We have some unique challenges in the mental health community: lack of a central organization, inconsistent quality of information, and in some cases, a denial of illness. But — as we are seeing in areas as diverse as robotics and baseball — big data has a way of overcoming big challenges. In fact, big data may be the solution for a field that has been lacking in metrics of performance or success.
References
1. DNA Sequencing costs table. http://www.dnasequencing.org/achievements/60-dna-sequencing-costs-table. Accessed Feb 13, 2012.
2. International Human Genome Sequencing Consortium et al. Initial sequencing and analysis of the human genome. Nature. 15 Feb 2001. 409:860-921.
3. Quote attributed to Rick Smolan, in Lohr, S. 11 Feb 2012. The Age of Big Data. New York Times. Accessed http://www.nytimes.com/2012/02/12/sunday-review/big-datas-impact-in-the-world.html.
4. Carneiro HA and Mylonakis E. Google Trends: A web-based tool for real-time surveillance of disease outbreaks. Clinical Infectious Diseases. 2009. 49:1557-64.
5. Tenopir C et al. Data sharing by scientists: practices and perceptions. PLoS ONE. 29 Jun 2011. 6(6):e21101.
6. Savage C and Vickers A. Empirical study of data sharing by authors publishing in PLoS journals. PLoS ONE. 18 Sept 2009. 4(9):e7078.
7. Wicks P, et al. Accelerated clinical discovery using self-reported patient data collected online and a patient-matching algorithm. Nature Biotechnology. Apr 2011. 29:411-414.




