Army STARRS Historical Data Study: Building the Data Enclave
The Army STARRS Historical Data Study focuses on data gathered by the Army and the Department of Defense (DoD). These data will help researchers look for factors that protect soldiers’ well-being and factors that put their mental health at risk.
The Army and DoD routinely gather and store information on soldiers and their experiences throughout their Army service. The data used in the Historical Data Study were collected for all Regular Army, Army Reserve, and Army National Guard soldiers who served on active duty from 2004 to 2009. These data are kept in databases run by the many offices that collect information for administrative or operational purposes, including medical and personnel offices. Until now, these databases had not been linked outside the Army. The Historical Data Study works with data on soldiers' characteristics, experiences, and exposures that the Army and DoD already collect.
Every research study faces challenges and opportunities to develop new methods, and Army STARRS is no exception. Below are some of the more interesting aspects of launching the Historical Data Study.
More than a Billion
The Army and the study team have identified 38 databases with information that may relate to suicide risk among soldiers. Together, these data sources include more than 1 billion records on active-duty soldiers and more than 3,000 different types of information. As Army STARRS receives each database, it stores the data in a secure computer system called the Army STARRS data enclave. The enclave is housed securely and managed by Army STARRS investigators at the University of Michigan.
The Army STARRS Historical Data Study will not use data that could identify a soldier. For example, the data do not include names, Social Security numbers, or addresses. Before the Army transfers a dataset to the Army STARRS data enclave, one of its final and most important steps is removing all information that could identify an individual soldier. When the Army STARRS team receives a database, its first step is to confirm that the data contain no identifying information.
Putting It All Together
Assembling and linking these databases is complicated. It takes a great deal of time, attention to detail, and coordination among different organizations. Finding, adding, and documenting each data source involves many complex procedures and steps. The process includes:
- getting a table of contents for the data,
- getting data samples, and
- performing quality checks.
This enormous effort requires the combined resources of the Army STARRS research team, the U.S. Army Public Health Command (Provisional), the Army's Chief Information Officer/G-6, the Army Data Center Fairfield, and many others.
Researchers have already begun to work with data in the enclave. Findings from the Historical Data Study will be combined with data from the All Army Study, the New Soldier Study, and the Soldier Health Outcomes Study to help researchers identify the factors that protect soldiers' emotional well-being and the factors that put it at risk.