There are so many reasons not to share scientific data – in industry, among academics, and even for some patients. For pharmaceutical companies, data are usually considered proprietary, with sharing limited by intellectual property rules. Research and development is a competitive process with success defined by being the first to bring a lucrative product to market. In universities, research is no less competitive. Scientific success in universities is built around attaining tenure through individual promotion and publications. Publications require research and research requires funding. With only 1 in 7 grant applications receiving funding from NIH, the process has never been more competitive. Sharing data (or products of research, like a genetically engineered mouse) before publication or, in some cases, even after publication, may feel like giving away the family jewels. And patients may not want their private medical information shared. Particularly when that information includes a diagnosis of a mental illness or personal history, protecting privacy becomes even more important.
But all of these very good reasons not to share must be balanced by the urgency of need. Patients and families need answers. For a parent with a non-verbal child with autism or an adult with a parent developing Alzheimer’s disease, time matters.
The consequences of not sharing data – data hoarding – are simply unacceptable. When no one shares roadmaps, others continue to waste time and money following dead ends and all of us lose out on possible successes. In the case of biomedical research in universities, it is usually taxpayers who paid (through NIH) for those maps to be made. The public deserves to have its interests outweigh those of individual scientists striving to get ahead. How can we rein in the stifling competitive cultures of industry and academia, and build a culture of sharing and collaboration?
Fortunately, data sharing has now become a hot topic in the scientific community. Beginning with successes in genomics, where large numbers were required to achieve statistical power, scientists have come to realize that sharing data is sometimes the only road to success. There are still logistical challenges to overcome, such as how to fairly give credit when more than 50 scientists all contribute significantly to a research effort. Nevertheless, the culture is changing, and now genomic data are being made available rapidly, sometimes even before publication, as was done with the set of three papers on rare genetic mutations in autism published recently in Nature 1, 2, 3 or as supported via the Psychiatric Genomics Consortium, a federation of over 200 scientists at 60 institutions in 19 countries committed to sharing data.
Similar trends are emerging in the field of neuroimaging. The 1000 Functional Connectomes Project involves fMRI data, in a standard format, from over 33 sites around the world, and is now prospectively adding phenotypic data. 4 In six months, these data were accessed by scientists from over 75 countries, 4 an indication of how excited the scientific community is to have such a valuable resource. The importance of these data sharing efforts is not only that science goes faster, but also that the data are available to scientists in the broader community (e.g., computer scientists, statisticians). That means that more brilliant minds will have the chance to ask and find answers to questions, both now and in the future, that might never have been asked by the original investigator.
Surprisingly, despite a reputation for high security and a closed corporate culture, pharmaceutical companies are not shying away from these data sharing and mining efforts; they’re taking the lead. 5 Forced by the high failure rate of drug research and development (recent figures show greater than 95% failures in clinical trials) 6, major companies are joining forces to create “precompetitive partnerships.” 7 Two weeks ago, three companies shared patented medications with NIH for a new initiative to permit academic scientists to have both data and compounds that were previously proprietary. Several companies have formed partnerships with NIH to develop biomarkers, and some companies are joining forces to create collaborative teams to work on projects in early stages of development. 7 A group of academic, NIMH, and industry scientists plans to work with the FDA to mine individual-level results from the more than 100 previous antidepressant trials to guide future trial design. And an ambitious new public-private effort, CommonMind, will make genomic data available broadly.
NIMH, like all of NIH, expects a data sharing plan for all large projects. In some areas, such as autism, NIMH, together with several other NIH Institutes and Centers, has created the National Database for Autism Research, which provides standardization and access to both federal and non-federal human autism research data. In addition to autism data sharing, NIMH is creating a cross-walk across current studies, from biomarkers to predictors of psychosis, to ensure that data can be shared. Published results from NIH research must be submitted to PubMed Central, providing free, public access to journal articles. And clinical trial data are now required to be posted on ClinicalTrials.gov. But we clearly need to do much more to enforce these expectations if we are to ensure the maximal impact of the research we fund.
Along with initiatives from industry and NIH, we are seeing research advocates as well as private research institutes jump into the data sharing pool. In its inaugural year, One Mind for Research is working to bring funding agencies, labs, research organizations, and advocacy groups together to encourage sharing of scientific, financial, and technological resources in the treatment of TBI and PTSD. Sage Bionetworks has become a hub for developing public access tools for data sharing and data mining. For neuroscience, the Allen Institute for Brain Science has led the way with atlases of the mouse, monkey, and human brain, available to anyone with a browser.
Data sharing is not without risks. Bad data remain bad even if shared, large numbers do not guarantee better insights, and big research teams may dilute accountability. Some kinds of scientific data (cellular neurophysiology, for instance) may be difficult to interpret or may be frankly misunderstood without access to the original methods. Without standardized methods, it may be impossible to integrate data across studies. For many areas of biomedical and behavioral science, we lack standardized methods for data collection, meaning that shared data could create more misunderstanding than insight.
But the risks of not sharing are arguably much greater. We need to find ways to spur progress and to open access without removing the incentives for promotion. That this is happening in industry more than in academia speaks to the challenge of changing a culture built on individual promotion. How much should we continue to allow the public good to suffer because of individuals’ desires to get ahead? We need to find creative solutions to ensure that the public derives maximum benefit from its investment in biomedical and behavioral science, because rapid, broad, accurate access to information matters.
1. Sanders SJ, Murtha MT, Gupta AR, et al. De novo mutations revealed by whole-exome sequencing are strongly associated with autism. Nature. Apr 2012. 485(7397):237-41.
2. Neale BM, Kou Y, Liu L, Ma'ayan A, et al. Patterns and rates of exonic de novo mutations in autism spectrum disorders. Nature. Apr 2012. 485(7397):242-5.
3. O'Roak BJ, Vives L, Girirajan S, et al. Sporadic autism exomes reveal a highly interconnected protein network of de novo mutations. Nature. Apr 2012. 485(7397):246-50.
4. Functional Connectomes Mission statement: http://fcon_1000.projects.nitrc.org/indi/docs/INDI_MISSION_STATEMENT.pdf
5. Munos BH, Chin WW. A call for sharing: adapting pharmaceutical research to new realities. Sci Transl Med. Dec 2009. (9):9cm8.
6. Arrowsmith J. Trial Watch: Phase II failures: 2008-2010. Nat Rev Drug Discov. May 2011. 10(5):328-9.
7. Cain C. A mind for precompetitive collaboration. SciBX. May 2012. 5(19):1-5.