Policy for Distribution of Limited Access Datasets From NIMH Clinical Trials

The NIMH supports data collection from participants in numerous clinical trials. These data from well-characterized clinical samples constitute an important scientific resource. It is the view of the NIMH that their full value can only be realized if they are made available, under appropriate terms and conditions, in a timely manner to the wider scientific community. To this end, limited access datasets from clinical trials supported under NIMH research contracts, grants and cooperative agreements will be made available for distribution.

Responsibilities of Investigators Seeking Access to Study Data

To assure that the confidentiality and privacy of study participants are protected, all investigators seeking access to data from NIMH-supported studies that are in the possession of the Institute must execute and submit, as their request, the appropriate standard Data Use Certification (DUC). Requesters must certify that their institution or organization is covered by a Federalwide Assurance (FWA) issued by the Department of Health and Human Services (HHS) Office for Human Research Protections. To assure that privacy and distribution guidelines will be followed, the DUC also requires investigators seeking access to study data to obtain the signature of the Institutional Official. As a condition of receiving the limited access dataset, the requesting investigator must agree to the terms specified in the DUC.

The NIH expects investigators and institutions to meet the conditions for access specified and agreed to in the DUC. Failure to comply with these conditions may result in adverse consequences, denial of further access to NIMH datasets, and legal action by study participants, their families, or the U.S. Government.

Description of What Is Provided

Included as part of the distribution of a limited access NIMH dataset are the individual data files that comprise the "dataset" for a particular clinical trial, as well as dataset documentation, i.e., description of all variables and their definitions. All personal identifiers have been excluded and other data elements modified so as to reduce the likelihood that any individual participant can be identified.


Documentation for limited access datasets is comprehensive and sufficiently clear to enable investigators who are not familiar with a specific dataset to use it. The documentation includes a brief description of the study, including a general orientation to the study, its components and its examination and assessment time points; data collection forms; study protocol/procedures; descriptions of variable recoding performed; and a list of major study publications.

In addition, a "readme" file may be included that explains how to use the overviews, documentation and data files. Also included will be a listing of all limited access files being provided, a description of system requirements, and a description of all data formats provided (e.g., SAS, ASCII). The overview and documentation are intended to be complete and clear enough that investigators who are not familiar with the dataset can find everything they need to use it on their own.

Data Storage and Format

The limited access dataset is stored on an encrypted CD ROM. All documentation noted above is prepared in a consistent format (e.g., MS Word, ASCII, or PDF file) and included on the same CD ROM. In general, datasets are formatted as transportable SAS files. In some instances, other formats are provided, such as Excel or ASCII files.

Content of Limited Access Data

In addition to summary information, limited access datasets include, for each participant, those individual data elements that have not otherwise been processed into summary information. At a minimum, these include baseline, interim assessment visit(s), and outcome data, along with laboratory measurements (if applicable) not otherwise summarized.

Data files without personal identifiers are submitted to NIMH by the clinical trial's research team, which is responsible for the content and accuracy of the data elements and documentation. NIMH further reviews the datasets to ensure that they are properly de-identified and ready for distribution.

Safeguards for Limited Access Data Sets

The following guidelines indicate the steps that data submitters take to maintain participant privacy in the limited access public use datasets.

  • Direct participant personal identifiers (e.g., name, addresses, social security numbers, place of birth, city of birth, contact data) are not included.
  • New identification numbers replace original identification numbers. Codes linking the new and original data are not included on the CD ROM. Clinical center identifiers are removed. Clinical centers are distinguished by a code, but identification of the particular centers is not included.
  • Direct clinician, interviewer, or technician personal identifiers are removed and are replaced by recoded numbers, where appropriate.
  • Sensitive data, including illicit drug use, risky behaviors (e.g., carrying a gun or exhibiting violent behavior), sexual behaviors, and selected medical conditions (e.g., alcoholism, HIV/AIDS) are deleted if the size and focus of the trial are such that knowledge of these variables could lead to loss of participant privacy.
  • Regional variables, if present, that show little or no variation within a center are deleted if they could be used to identify that center.
  • Verbatim responses stored as text data (e.g., specified in "other" category) are deleted, or are edited to remove identifying text and then included.
  • Dates: All dates may be coded relative to a specific reference point (e.g., date of randomization or study entry) if knowledge of actual dates could lead to a loss of participant privacy.
  • Variables that have low frequencies for some values, and that might therefore be used to identify participants, may be recoded, if appropriate. Any variables recoded will be identified in the documentation. These might include:
    • Socioeconomic and demographic data (e.g., marital status, occupation, income, education, language, number of years married);
    • Household and family composition (e.g., number in household, number of siblings or children, ages of children or step-children, number of brothers and sisters, relationships, spouse in study);
    • Numbers of pregnancies, births, or multiple children within a birth;
    • Anthropometric measures (e.g., height, weight, waist girth, hip girth, body mass index);
    • Physical characteristics (e.g., missing limbs);
    • Detailed medical conditions such as HIV/AIDS or medical conditions with low frequency (e.g., group specific cancers into broader categories) and related questions such as age at diagnosis and current status;
    • Parent and sibling medical history (e.g., parents' ages at death); and
    • Race/ethnicity and sex information when very few participants are in certain groups or cells.
  • There may be other variables identified by the submitters that may make it easy to identify individuals. All such variables are recoded or removed.
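
As an illustration only, three of the safeguards listed above (new identification numbers, dates coded relative to a reference point, and recoding of low-frequency values) can be sketched in a few lines of Python. This is not the procedure used by NIMH or by any study team. The function names, the "P0001" identifier format, the "Other" label, the frequency threshold and the sample records are all hypothetical, chosen only to make the example run.

```python
import secrets
from collections import Counter
from datetime import date


def assign_new_ids(original_ids):
    """Map each original ID to a new ID that carries no trace of the original.

    The returned mapping is the linking code; it would be kept by the
    submitter and never distributed with the dataset.
    """
    unique_ids = list(dict.fromkeys(original_ids))  # de-duplicate
    secrets.SystemRandom().shuffle(unique_ids)      # break any ordering link
    return {orig: f"P{n:04d}" for n, orig in enumerate(unique_ids, start=1)}


def days_from_reference(event_date, reference_date):
    """Code a date as days elapsed since a reference point (e.g., randomization)."""
    return (event_date - reference_date).days


def recode_rare_values(values, min_count, other_label="Other"):
    """Replace values seen fewer than min_count times with a broad category."""
    counts = Counter(values)
    return [v if counts[v] >= min_count else other_label for v in values]


# Hypothetical records, invented for this example.
records = [
    {"id": "A-101", "randomized": date(2004, 3, 1), "visit": date(2004, 3, 29), "marital": "married"},
    {"id": "A-102", "randomized": date(2004, 3, 8), "visit": date(2004, 4, 5), "marital": "married"},
    {"id": "B-201", "randomized": date(2004, 4, 2), "visit": date(2004, 4, 30), "marital": "widowed"},
]

id_map = assign_new_ids(r["id"] for r in records)
marital = recode_rare_values([r["marital"] for r in records], min_count=2)

# Only the new ID, the relative day and the recoded value are released.
released = [
    {
        "id": id_map[r["id"]],
        "visit_day": days_from_reference(r["visit"], r["randomized"]),
        "marital": m,
    }
    for r, m in zip(records, marital)
]
```

In this sketch the original identifiers and calendar dates never appear in the released records, and the single "widowed" value is folded into "Other". A real submission would choose thresholds and categories based on the size and focus of the trial, and would list every recoded variable in the dataset documentation.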

For information about how to request limited access datasets from NIMH, see "Limited Access Datasets from NIMH Clinical Trials."