NIMH Data Archive (NDA) Populating your Data Expected List Webinar
LORRAINE SIOCHI: Hello everyone and thank you for joining our Data Expected office hours!
Today, we are here to learn how to update your Data Expected list, specifically the Data Structures per Research Aim section. By the end of the session, you will understand how the Data Expected list is used in NDA, how to add a Data Structure to your Data Expected list, how to request changes to an existing Data Structure, and how to request an entirely new Data Structure by completing the Data Structure Template.
In this presentation, we'll start with a general overview of the Data Expected list. Then we'll go over how to search for existing Structures in NDA, how to add existing Structures with and without changes, and how to request entirely new Data Structures.
The Data Expected list is a list of all the Data Structures you'll submit data to in your NDA Collection. It can be found under the Data Expected tab of your NDA Collection. To edit the Data Expected list and populate it with Data Structures, you must have admin permissions for the NDA Collection. If you're the PI, you'll gain these permissions after you submit the NDA Data Submission Agreement. If you're not the PI, your PI can grant you these permissions via the Permissions tab of the NDA Collection.
NDA expects you to populate this list within six months of your grant start date with all the Data Structures you know of and expect to submit data to. Do note that you can still modify your Data Expected list after these six months. The six-month deadline allows time for any requested changes to existing Structures, and any requested new Data Structures, to be mapped to your NDA Collection in time for your first submission deadline.
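Working out the target date from your grant start date is simple date arithmetic. Below is a minimal Python sketch, assuming "six months" means six calendar months and clamping to the end of the month when the target month is shorter; NDA does not specify this rule, and the grant start date is hypothetical.

```python
import calendar
from datetime import date


def add_months(start: date, months: int) -> date:
    """Return the date `months` calendar months after `start`.

    If the target month is shorter (e.g. Aug 31 + 6 months), the day is
    clamped to the last day of that month.
    """
    month_index = start.month - 1 + months
    year = start.year + month_index // 12
    month = month_index % 12 + 1
    last_day = calendar.monthrange(year, month)[1]
    return date(year, month, min(start.day, last_day))


# Hypothetical grant start date, for illustration only.
grant_start = date(2024, 8, 31)
print(add_months(grant_start, 6))  # 2025-02-28
```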
Your Data Expected list is divided into two different sections. You have your Mandatory Data Structures up at the top. Just below that you have your Data Structures per Research Aim section.
Your Mandatory Data Structures are prepopulated by NDA when your NDA Collection is created. You are required to submit for all the Data Structures listed under your Mandatory Data Structures section. All NDA Collections will have this Research Subject and Pedigree Data Structure, but depending on the types of data you're collecting, you may also see additional Mandatory Data Structures like this NIMH CDE or Common Data Element Data Structure. More information on Mandatory Data Structures can be found in our Data Expected tutorial linked at the bottom of this slide.
The second section of the Data Expected list is where you come in. You populate this section with all the Data Structures you collect data for and will submit data to. So again, your task is to update this section with the Data Structures you use for data collection within six months of your grant start date.
So, before we get into adding these Structures to your Data Expected list, let's first go over what a Data Structure is. A Data Structure is NDA's version of your data collection entities, such as your measures, your surveys, or your assessments. NDA strives for rigor and reproducibility, and Data Structures are our approach to harmonizing data collected by hundreds of different labs.
Each Data Structure is composed of Data Elements plus some metadata. Data Elements are the individual items in your original assessment, such as your questions, variables, or parameters.
Each Data Structure is described by a few pieces of metadata. The Data Structure has its own unique Title and Short Name. It has a Data Type identifying what type of data the Data Structure expects. You may also see one or two Categories, which are brief descriptions of the subjects being assessed in that Data Structure. You'll also see a Description outlining the overall purpose of the data being collected in that Data Structure. Lastly, you'll see a Source, which designates the specific project the Data Structure is for. You'll only be able to use Data Structures with the NDA (NIMH Data Archive) or NIAAA sources. Data Structures with sources such as ABCD, OAI, HEP, or CCF are specific to those projects and are only usable for those projects.
Getting into Data Elements: again, these are the individual items, typically the questions or variables in your original assessment, and they provide more information on the subjects you're collecting data from. Each Data Element consists of an Element Name, Data Type, Size, Required, Description, Value Range, Notes, and Aliases.
So, let's go through each of these, with some examples you'll see in a Data Structure. First, we have the Element Name, which is simply the identifier of the Element. Then we have the Data Type, which specifies the type of values accepted for that Data Element. Some examples you'll see are string, date, integer, float, and even file. String Data Elements will have an indicated Size, which shows the maximum number of characters allowed for that Data Element. Again, Size applies only to string Elements.
Then we have this Required column. Under this Required column, a Data Element will either be listed with Required, Recommended and sometimes Optional or Conditional. If you see Required for a Data Element, this means that there must be a value provided for that Data Element. If you see Recommended, Conditional or Optional Data Elements, these can be ignored completely if they're not applicable to your study. So, this is important when you're looking for Data Structures to use for your NDA Collection.
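The Required rule can be expressed as a simple check on one record. This is a minimal Python sketch, not an NDA tool: the requirement levels mirror the Required column, the element names other than subjectkey and src_subject_id are hypothetical, and it does not model the conditions under which a Conditional Element becomes mandatory.

```python
def missing_required(record: dict, requirements: dict) -> list:
    """List Elements marked Required that have no value in `record`.

    `requirements` maps Element name -> "Required", "Recommended",
    "Conditional", or "Optional". Only "Required" Elements are enforced here;
    the others may be left blank when they don't apply to your study.
    """
    missing = []
    for element, level in requirements.items():
        value = record.get(element)
        if level == "Required" and (value is None or str(value).strip() == ""):
            missing.append(element)
    return missing


# Hypothetical requirements and record, for illustration only.
requirements = {
    "subjectkey": "Required",
    "src_subject_id": "Required",
    "example_item_1": "Recommended",
}
record = {"subjectkey": "GUID_PLACEHOLDER", "src_subject_id": ""}
print(missing_required(record, requirements))  # ['src_subject_id']
```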
Then we have our Description, outlining what that Data Element identifies. Next to that we have our Value Range and Notes. The Value Range shows the values accepted for that Data Element. In this example, we see a Value Range of 0 to 1260, which means that for this Data Element you can only enter integers between 0 and 1260, inclusive. The Notes column explains what those values mean.
And lastly, we have our Aliases. These are alternate Element names for a Data Element that are specific to the NDA Collections that requested them. In this example, you see the alias erq_date for the Data Element interview_date. If you requested this alias for your NDA Collection, you're able to use it. However, if you did not request the alias listed here, you should use the original Element name, interview_date.
All Data Structures, including any that you may need to create and request, include these five Required Data Elements: subjectkey, src_subject_id, interview_date, interview_age, and sex.
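To make the inclusive bounds concrete, here is a small Python sketch that checks a value against a Value Range. The notation is an assumption on our part: "::" for an inclusive range and ";" between alternatives (e.g. "0::1260" or "1::5;999"); confirm against the actual Structure before relying on it.

```python
def parse_value_range(spec: str):
    """Split a Value Range string into inclusive integer ranges and discrete values.

    Assumed notation: "::" marks an inclusive range and ";" separates
    alternatives, e.g. "0::1260" or "1::5;999".
    """
    ranges, discrete = [], set()
    for part in spec.split(";"):
        part = part.strip()
        if not part:
            continue
        if "::" in part:
            low, high = part.split("::")
            ranges.append((int(low), int(high)))
        else:
            discrete.add(part)  # exact, case-sensitive match
    return ranges, discrete


def in_value_range(value: str, spec: str) -> bool:
    """Return True if `value` is allowed by the Value Range `spec`."""
    value = value.strip()
    ranges, discrete = parse_value_range(spec)
    if value in discrete:
        return True
    try:
        number = int(value)
    except ValueError:
        return False  # not an integer, so it cannot fall inside a numeric range
    return any(low <= number <= high for low, high in ranges)


print(in_value_range("0", "0::1260"))     # True  (bounds are inclusive)
print(in_value_range("1260", "0::1260"))  # True
print(in_value_range("1261", "0::1260"))  # False
```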
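The alias rule (use an alias only if it was requested for your Collection, otherwise use the official Element name) can be sketched as a column-renaming step before submission. This is a hypothetical helper, not how NDA's validation works internally; only erq_date and interview_date come from the example above.

```python
def resolve_columns(columns, aliases, approved_aliases):
    """Translate submitted column names to official Element names.

    aliases:          alias -> official Element name, as listed in the Data Dictionary
    approved_aliases: the aliases that were requested for *your* NDA Collection
    Aliases not approved for your Collection are left unchanged, so they will
    not match any Element and can be caught before submission.
    """
    resolved = []
    for column in columns:
        if column in approved_aliases and column in aliases:
            resolved.append(aliases[column])
        else:
            resolved.append(column)
    return resolved


aliases = {"erq_date": "interview_date"}
print(resolve_columns(["erq_date"], aliases, approved_aliases={"erq_date"}))  # ['interview_date']
print(resolve_columns(["erq_date"], aliases, approved_aliases=set()))         # ['erq_date']
```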
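Because these five Elements appear in every Structure, a quick pre-check is whether a data file's header includes them. A minimal Python sketch; the header and the "example_item_1" column are hypothetical.

```python
REQUIRED_ELEMENTS = (
    "subjectkey",
    "src_subject_id",
    "interview_date",
    "interview_age",
    "sex",
)


def missing_required_columns(header):
    """Return the five required Data Elements that are absent from `header`."""
    present = set(header)
    return [name for name in REQUIRED_ELEMENTS if name not in present]


# "example_item_1" is a hypothetical assessment item.
header = ["subjectkey", "src_subject_id", "interview_date", "sex", "example_item_1"]
print(missing_required_columns(header))  # ['interview_age']
```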
So now that we understand the Data Expected lists and Data Structures a bit more, let's go over the tools NDA has to search for existing Structures to add to your Data Expected list and Data Elements that you may need to request to add to a Data Structure.
When searching for a Data Structure, you may come across three different paths. We have our happiest path, the path of least resistance, which is finding a Data Structure that fits your assessment perfectly. And by perfect, I mean the Data Structure has all the questions or variables that your original assessment has, but it may also have some additional Data Elements that are only listed as Recommended, so you can just leave those blank.
The next path you may find yourself in is that you do find an existing Structure, but it's missing some of those questions or variables in your original assessment, so you need to request some additions to that Data Structure.
And the least happy path is that you can't find any Structure at all that matches your assessment, and you need to request an entirely new Structure whose Elements may partly exist in NDA already and partly not. But before we get into the step-by-step details of adding Structures to your list, requesting changes, or requesting an entirely new Data Structure, let's go over how to best use NDA's search tools so you have all the information you need for any of these three paths.
NDA offers two different tools. We have our Home Page Search Bar, and we also have the NDA Data Dictionary. The Search Bar is best used if you're just starting out and you're not exactly sure of naming conventions used in NDA because it has a broad scope.
The Search Bar uses Boolean logic, with "OR" as the default. By this I mean that when you enter a phrase, the Search Bar assumes an "OR" between each word. Take this phrase, for example. We're searching for a Data Element that identifies whether a participant is forgetful in daily activities. If we enter the phrase "is forgetful in daily activities" in the Search Bar, it will assume "OR" between each of those words. So it will look for results that contain the word "is" OR "forgetful" OR "in" OR "daily" OR "activities". When we put this in the Search Bar, we get 8,000 results. 6,000 of those results are Data Elements, and that's a lot to sift through. To narrow your search, we suggest putting "AND" between keywords instead of searching for the entire phrase.
So, let's take that same example, "is forgetful in daily activities". Let's pull out the keywords and put "AND" between them: "forgetful AND daily AND activities". This returns only 36 search results, all of them Data Elements, so this narrows our search considerably.
Another tip to narrow your search is to use our advanced filtering options. We have "filter by types", where you can select just Data Structures and Data Elements. We also have "filter by sites". This is where Source comes in. As I said before, you'll only be able to use Data Structures with the NDA (NIMH Data Archive) or NIAAA sources, so you can select just those two to narrow your search.
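The difference between the default OR and an explicit AND can be seen on a toy example. This Python sketch uses five made-up Element descriptions and a simplified whole-word match; it only mimics the OR/AND behavior described above and is not NDA's search engine, which returns very different counts on real data.

```python
import re


def words(text):
    """Lower-cased set of whole words in `text`."""
    return set(re.findall(r"[a-z0-9_]+", text.lower()))


def search(descriptions, terms, mode="or"):
    """Return descriptions containing ANY term (mode="or") or ALL terms (mode="and")."""
    wanted = [term.lower() for term in terms]
    hits = []
    for text in descriptions:
        present = words(text)
        found = [term in present for term in wanted]
        if any(found) if mode == "or" else all(found):
            hits.append(text)
    return hits


# Five made-up Element descriptions, for illustration only.
descriptions = [
    "Is forgetful in daily activities",
    "Daily step count",
    "Participant is left-handed",
    "Forgetful of appointments",
    "Hours of physical activities per week",
]
phrase = "is forgetful in daily activities".split()
print(len(search(descriptions, phrase, mode="or")))  # 5
print(search(descriptions, ["forgetful", "daily", "activities"], mode="and"))
# ['Is forgetful in daily activities']
```

With OR, every description that shares even one common word such as "is" matches; with AND, only the description containing all three keywords survives.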
Our second tool for searching for Data Structures and Data Elements is the NDA Data Dictionary. This is a database of all the Data Structures and Data Elements that exist in NDA. This search tool uses an exact search logic, so that's why I said if you're unsure of NDA’s naming conventions, we advise using the Search Bar first, so you have that broader scope to work with.
When searching for Data Structures, again, it will only return exact matches. To increase your search flexibility, we suggest using "OR", "AND", or "NOT" between your keywords. For example, if you're looking for a Data Structure with the words "autism" and "disorder" in its title, you can enter "title: autism AND disorder" in the text search field to return only results with both autism and disorder in the title. Similarly, for the description, you can search using "OR", "AND", or "NOT".
It's very similar for Data Elements: again, exact matches only, so use "OR", "AND", and "NOT" to increase your search flexibility.
With that, we're now going to go step by step through how to add Structures to your Data Expected list, how to request changes to an existing Structure, and how to request an entirely new Data Structure to be added to the NDA Data Dictionary and your NDA Collection.
MASON INGALLS: Thanks, Lorraine! So far, we've discussed the components of a Data Structure and how to find Data Structures in the NDA Data Dictionary. Next, we'll go over how to add Data Structures to your NDA Collection’s Data Expected list.
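To make the AND/OR/NOT semantics concrete, here is a toy Python filter over made-up Structure titles. It only mimics the behavior described above (exact, whole-word matches scoped to one field such as title); it is not the Data Dictionary's query engine or syntax.

```python
import re


def words(text):
    """Lower-cased set of whole words in `text`."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def field_query(records, field, all_of=(), any_of=(), none_of=()):
    """Filter records on one field using whole-word, exact matches.

    all_of  ~ term AND term   (every term must appear)
    any_of  ~ term OR term    (at least one must appear)
    none_of ~ NOT term        (none may appear)
    """
    results = []
    for record in records:
        present = words(record[field])
        if not all(term.lower() in present for term in all_of):
            continue
        if any_of and not any(term.lower() in present for term in any_of):
            continue
        if any(term.lower() in present for term in none_of):
            continue
        results.append(record["title"])
    return results


# Made-up Structures, for illustration only.
structures = [
    {"title": "Autism Diagnostic Interview", "description": "Clinician-administered interview"},
    {"title": "Autism Spectrum Disorder Symptom Checklist", "description": "Parent-report checklist"},
    {"title": "Anxiety Disorder Scale", "description": "Self-report anxiety scale"},
]
print(field_query(structures, "title", all_of=["autism", "disorder"]))
# ['Autism Spectrum Disorder Symptom Checklist']
print(field_query(structures, "title", all_of=["autism"], none_of=["disorder"]))
# ['Autism Diagnostic Interview']
```

Note that an exact search for "autistic" would match none of these titles, which is why the broader Search Bar is the better starting point when you're unsure of NDA's naming.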
So, there are three paths we mentioned that you can have when you're looking to add Data Structures to your Data Expected list. The first path is considered the easiest of the three. It's what we call the happy path, in which you found a Data Structure in the NDA Data Dictionary that works for you as is with no changes or modifications necessary.
So, say you found an existing Data Structure that works for you, and you want to submit data to this Data Structure. First you go to your NDA Collection page, then you navigate to your Data Expected tab and click the green ‘New Data Expected’ button. Then you'll enter your Targeted Enrollment, your Initial Submission and Initial Share Dates, all based on the info from your grant or your data sharing agreement.
Next, in the Data Structure search field, enter either the exact title or the exact short name of the Data Structure you're looking to add. In this example, the search is for the short name emrq01. You should then see your Data Structure appear in the list; in this case, it's the Emotional Regulation Questionnaire. Select this Data Structure from the list, indicate that the Data Structure meets research needs as is, and then click ‘Add Data Expected’. Once you've done these steps, you'll see the Data Structure added to your Data Expected list. Since this Data Structure works for you as is, with no modifications needed, you can start submitting data to it whenever you're ready.
So next, we'll go over how to request changes to an existing Data Structure in the NDA Data Dictionary. We started with the happy path, where you find a Data Structure that works as is and you can use it right away without any modifications. The second path is when you find a Data Structure that mostly works for you, but some changes are needed for the Structure to fit your data. For example, maybe the Structure contains only some of your questions, but not all of them.
So, we'll start by listing the types of requested changes we see most often. First is adding existing Data Elements from the NDA Data Dictionary. For example, if you find an Element in one Data Structure but you want that Element in a different Data Structure, we can add it for you. Second is requesting brand new Data Elements that don't yet exist in the Data Dictionary. We can also add new values to the Value Range of existing Data Elements; for example, if an Element allows values of 1 to 5, but you need values of 1 to 10 instead. You can also request aliases for Data Elements. This is mostly done when you want to use a custom Element name that differs from the official Element name. And there are some other types of changes we see occasionally, such as requests to increase the maximum size of a string Element and requests to change the data type of an Element.
Next, we'll discuss some rules and limitations for requested changes. For Value Ranges of Data Elements, we cannot modify the existing values in any way that impacts other data that's already been submitted. What this means is that you should only request to add new values to the Value Range. Since the Data Element Notes often provide info on the different values, these rules apply to the Notes field as well. So, if you have a Data Element you want, but its Value Range conflicts with the data you've collected (maybe the NDA version is 1 to 5 and yours is 1 to 4), you should request a new Data Element rather than request to modify the existing one.
We also cannot change an existing Element name under any circumstances. If you want to use a different Element name, request a custom alias instead. Similar to the Value Range, we cannot change the Element Description in a way that impacts other data that's been submitted. This often applies to things like the time frame, the context, or the specific wording of an Element Description. In the example listed here, you cannot request changing "how have you felt in the last week?" to "how have you felt in the last month?" since that changes the time frame of the original Element. Again, in this case you should request a new Data Element instead.
So now we'll go over how to request these changes. The general outline of the steps is as follows. First, search for the Data Structure in the NDA Data Dictionary. Then find the Structure's Definition file and download it. Next, edit the Definition file with your requested changes and save it as an Excel file. Lastly, add the Data Structure to your Data Expected list.
Now we'll go over these steps in detail. First, visit the NDA Data Dictionary that Lorraine talked about earlier and search for the Data Structure you'd like to change. As Lorraine mentioned, the NDA Data Dictionary search uses exact search terms, so please make sure you use the exact title or short name of the Structure you'd like to change. Once again, in this example we're searching for emrq01. After you search, you should find the Structure in the search results; click the Data Structure title to go to the Structure's Data Dictionary page. On this page you can see all of the metadata and Data Elements in the Structure, but we're interested in the download section at the top right of the page. Here you'll see a link to download the Data Structure definition as a .csv file. Click this link to download the .csv Definition file, and once it's downloaded, open the file and begin editing it to request your changes.
Now that we've downloaded our file, we can start making edits to request the different types of changes. First, we'll start with requesting changes to a Data Element that is already in the Data Structure, such as adding a new value to the Value Range. To start, find the Data Element in the Definition file that needs modifications. Then modify that Data Element's row with your requested changes and highlight the changed cells in yellow. In this example, changes were made to the Value Range and Notes fields, so these two cells are highlighted in yellow. Optionally, you can add a new column (you'll see it as column ‘I’ in this example) to leave notes for the Data Curators, providing additional info on your requested changes.
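The "only add values" rule amounts to a subset check: every value the current Value Range allows must still be allowed after your change. A minimal Python sketch, assuming the same "::" (inclusive range) and ";" (alternatives) notation as before, which is our assumption rather than a documented NDA format.

```python
def allowed_values(spec):
    """Expand an integer Value Range string into the set of allowed values.

    Assumes "::" marks an inclusive integer range and ";" separates alternatives.
    """
    values = set()
    for part in spec.split(";"):
        part = part.strip()
        if not part:
            continue
        if "::" in part:
            low, high = (int(x) for x in part.split("::"))
            values.update(str(v) for v in range(low, high + 1))
        else:
            values.add(part)
    return values


def is_additive_change(old_spec, new_spec):
    """True if new_spec keeps every value old_spec allowed, i.e. it only adds values."""
    return allowed_values(old_spec) <= allowed_values(new_spec)


print(is_additive_change("1::5", "1::10"))     # True: request the change
print(is_additive_change("1::5", "1::4"))      # False: request a new Element instead
print(is_additive_change("0;1", "0;1;999"))    # True
```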
Next, we'll go over how to add an Element that exists in the Data Dictionary, in another Data Structure, but not in the one you're using. Start by adding a new row to the bottom of the Definition file. In this new row, provide the Element name exactly as it appears in the Data Dictionary and highlight the Element name in blue. That's about it. But again, feel free to add a new column with notes for the Data Curators.
Now we’ll go over how to add an existing Element that you would also like to request changes for. This combines the last two scenarios, and we have an example for this one. Say you want to add the existing Data Element fhs_02 to your Structure, but you also want to add another value to its Value Range. Same as last time, add a new row to the bottom of the Definition file. Then provide the Element name exactly as it appears and highlight it in blue. Next, fill in the changes you'd like to make in the appropriate fields and highlight them in yellow. In our example, you'll see fhs_02 highlighted in blue, and the Value Range and Notes filled in and highlighted in yellow. As always, you can include notes for the Data Curators as needed.
Before going over how to add new Data Elements, we'll discuss some info and rules for Data Elements. There are four data types that can be requested, and each allows a different type of data input. Integer accepts positive and negative whole numbers, float accepts positive and negative decimals, string accepts numbers, letters, and some special characters, and date accepts dates in the specific format you see here.
Integer Elements are always preferred, so use them whenever possible. For example, if you were going to request a string Element with only YES and NO values, we ask that you instead request an integer Element with 0 = No and 1 = Yes. Also note that string Element Value Ranges are case sensitive, so pay extra attention when filling out their Value Ranges. On the other hand, date Elements should not have anything in the Value Range field.
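The four data types and the YES/NO recoding tip can be sketched in Python as below. The type labels "Integer", "Float", "String", and "Date" are our assumed spellings, and the required date format is deliberately not hard-coded because it was shown on the slide; pass the format NDA documents as `date_format`.

```python
import math
from datetime import datetime


def is_valid(value, data_type, size=None, date_format=None):
    """Check one value against an Element's data type (and Size, for strings).

    data_type is one of "Integer", "Float", "String", or "Date".
    date_format must be supplied for Date Elements; use the format NDA
    documents for Date Elements (it is not hard-coded here).
    """
    value = value.strip()
    if data_type == "Integer":
        try:
            int(value)  # positive or negative whole numbers
            return True
        except ValueError:
            return False
    if data_type == "Float":
        try:
            return math.isfinite(float(value))  # positive or negative decimals
        except ValueError:
            return False
    if data_type == "String":
        return size is None or len(value) <= size
    if data_type == "Date":
        if date_format is None:
            raise ValueError("a date_format is required for Date Elements")
        try:
            datetime.strptime(value, date_format)
            return True
        except ValueError:
            return False
    raise ValueError(f"unknown data type: {data_type!r}")


# Recode YES/NO strings to the preferred integer coding: 0 = No, 1 = Yes.
YES_NO_CODES = {"NO": 0, "YES": 1}


def recode_yes_no(answer):
    # Normalize case first: string Value Ranges are case sensitive, so
    # "Yes" and "YES" would otherwise be different values.
    return YES_NO_CODES[answer.strip().upper()]


print(is_valid("-3", "Integer"))   # True
print(is_valid("3.5", "Integer"))  # False
print(recode_yes_no("Yes"))        # 1
```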
Element names must be unique across the entire Data Dictionary, and Element names can only contain up to 30 characters. They should only contain letters, numbers, and underscores, and the first character in the name must be a letter.
The Size of the Data Element indicates the maximum number of characters allowed in the data submitted to the Element. The maximum value you can put for a Size is 4000, but we ask that you enter an appropriate but not overly limiting size when possible. For example, if you have a Data Element asking for something like a serial number or maybe the name of a manufacturer, then the Size should not be 4000 characters, it should be something more appropriate.
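The naming and Size rules translate directly into a regular expression and a bounds check. A minimal Python sketch: the pattern encodes "starts with a letter; letters, digits, and underscores only; at most 30 characters", and the 4000 maximum comes from the rule above. Two choices are our assumptions: names that differ only by case are treated as duplicates, and a Size must be at least 1. The set of existing names is hypothetical; the real uniqueness check is against the whole NDA Data Dictionary.

```python
import re

# Starts with a letter; then letters, digits, or underscores; 30 characters max.
ELEMENT_NAME_PATTERN = re.compile(r"[A-Za-z][A-Za-z0-9_]{0,29}")
MAX_STRING_SIZE = 4000


def element_name_problems(name, existing_names):
    """Return a list of reasons `name` can't be used for a new Data Element."""
    problems = []
    if not ELEMENT_NAME_PATTERN.fullmatch(name):
        problems.append("must start with a letter, contain only letters, digits, "
                        "and underscores, and be at most 30 characters")
    # Assumption: treat names that differ only by case as duplicates.
    if name.lower() in {n.lower() for n in existing_names}:
        problems.append("already exists in the NDA Data Dictionary")
    return problems


def size_problems(size):
    """Return reasons a string Element's Size is invalid (empty list if OK)."""
    # The lower bound of 1 is an assumption; the upper bound is NDA's 4000 maximum.
    if not 1 <= size <= MAX_STRING_SIZE:
        return [f"Size must be between 1 and {MAX_STRING_SIZE}"]
    return []


existing = {"interview_date", "interview_age"}
print(len(element_name_problems("2nd_visit_score", existing)))  # 1
print(element_name_problems("visit2_score", existing))          # []
```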
So, with those rules out of the way, we'll now go over how to request a brand-new Data Element in a Data Structure. First, as usual, add a new row to the bottom of your file. Then provide the following fields for your new Element: the Element Name, the Data Type, the Size (only if requesting a string Element; otherwise leave it blank), an Element Description, and the Value Range and Notes if applicable. Once you've done this, select that whole row and change its text to red.
So, once you've formatted all of your requests in the Definition file, save that file in Excel format. It's important to save it in this format rather than as a .csv, because that ensures all of the highlighting and color coding is saved in the file. The remaining steps should look pretty familiar. Go to your NDA Collection’s Data Expected list and click the green ‘New Data Expected’ button. Enter your Targeted Enrollment, Submission, and Share Dates appropriately. Then, in the Data Structure search field, enter the exact title or short name of the Structure. Once you select your Data Structure from the resulting list, click ‘No, it requires changes to research needs’, which allows you to upload the file with the changes you just finished working on.
From here, click ‘select file’ and select the requested-changes Excel file you just worked on. Once you've verified that the correct file is attached and everything else looks good, click ‘Add Data Expected’. You'll now see the existing Structure in your Data Expected list along with the requested-changes file. Your request will then be reviewed by the NDA Data Curators, and you'll need to wait to submit data to that Structure until a Data Curator confirms your requests have been implemented.
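If you prefer to script the new rows before opening the file in Excel, the Python sketch below appends a new-Element row to the downloaded Definition file's text. The column names are assumptions (check the header of your own download); the function reuses whatever header the file has and raises an error if a field name doesn't match. A .csv cannot store the red text or highlighting, so that formatting still has to be applied in Excel before saving as an Excel file. The element name, description, and Notes values are hypothetical.

```python
import csv
import io


def new_element_row(name, data_type, description, size=None, value_range="", notes=""):
    """Build a row for a brand-new Data Element (fields as listed in the webinar)."""
    if data_type == "String" and size is None:
        raise ValueError("String Elements need a Size")
    if data_type != "String" and size is not None:
        raise ValueError("Size is only used for String Elements")
    return {
        "ElementName": name,
        "DataType": data_type,
        "Size": "" if size is None else str(size),
        "ElementDescription": description,
        "ValueRange": value_range,
        "Notes": notes,
    }


def append_rows(definition_csv, rows):
    """Append rows to the bottom of a Definition file's text, keeping its header."""
    reader = csv.DictReader(io.StringIO(definition_csv))
    existing = list(reader)
    out = io.StringIO()
    # extrasaction="raise" flags any field name that isn't in the file's header.
    writer = csv.DictWriter(out, fieldnames=reader.fieldnames, restval="", extrasaction="raise")
    writer.writeheader()
    writer.writerows(existing + rows)
    return out.getvalue()


# Hypothetical header and existing row, for illustration only.
definition = (
    "ElementName,DataType,Size,Required,ElementDescription,ValueRange,Notes,Aliases\n"
    "interview_date,Date,,Required,Date of interview,,,\n"
)
row = new_element_row("example_new_item", "Integer", "Hypothetical yes/no item",
                      value_range="0;1", notes="0 = No; 1 = Yes")
updated = append_rows(definition, [row])
```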
Lastly, we'll go over requesting a new Data Structure. So, this is the third and last of our paths in which you aren't able to find any Structure that matches your assessment well, and in this case you can request a brand new one instead.
So, to request a new Data Structure, we ask that you provide a zip file containing the following three things. First is the New Data Structure Template in Excel format. We ask for this so that your original assessment or questionnaire, whatever it is you're using, is defined in a way that's consistent with the format required for NDA Data Structures.
Next is a copy of the assessment or questionnaire as a PDF or Word document. This lets the Data Curators ensure that the New Data Structure Template reflects all of the source documentation, and it also provides context while the Data Curators are working on your Structure. Third is either information on the DOI or publication, or a file that we call the Category and Description Template. This is used to provide additional metadata that the Data Curators need to add your requested Structure to the NDA Data Dictionary.
So now we'll go over how to obtain a copy of both the New Data Structure Template and the Category and Description Template. Go to our homepage at nda.nih.gov and navigate to ‘About us’ in the header; you'll see a drop-down menu with a page called ‘Forms and Templates’ at the bottom of the list. Clicking this takes you to a page with a lot of helpful links, forms, and templates, but the two templates we want are in the ‘Quick Reference for Investigators: Data Submission’ section. In this section, you'll first see a link to the Data Structure Template; click it to download the template and begin requesting a new Data Structure. In the same section, you can click the link to the ‘Category and Description Template’, which is what you'll need to provide the additional metadata for your new Structure.
So next we'll go over how to fill in the New Data Structure Template. Once you've downloaded the Data Structure Template and opened it, you'll see some valuable info that will be really helpful when formatting your requested Data Structure. There's a ReadMe sheet that contains instructions, rules, and formatting standards, as well as details on the components and various fields in the Data Structure Template. At the bottom of the file, you'll see a second sheet that contains the actual Structure Template; click this sheet to begin filling in the Data Structure Template.
First, you'll notice that the template is prepopulated with the five required Data Elements. These are required to be in every single Data Structure exactly as is, so please do not remove or modify them in any way. Below those, you'll see examples and instructions on how to add an existing Data Element, request changes to existing Elements, and add new Elements.
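Bundling the three files into one zip can be done with Python's standard `zipfile` module. A minimal sketch: the file names in the usage comment are hypothetical, and each file is stored under its own name so all three documents sit at the top level of the archive.

```python
import zipfile
from pathlib import Path


def bundle_request(zip_path, files):
    """Zip the files for a new Data Structure request into one archive.

    Each file is stored under its own name (no folders), so the three
    documents sit at the top level of the zip.
    """
    zip_path = Path(zip_path)
    with zipfile.ZipFile(zip_path, "w", compression=zipfile.ZIP_DEFLATED) as archive:
        for file in map(Path, files):
            if not file.is_file():
                raise FileNotFoundError(f"missing request file: {file}")
            archive.write(file, arcname=file.name)
    return zip_path


# Hypothetical file names; use whatever you named your own files.
# bundle_request("new_structure_request.zip", [
#     "new_data_structure_template.xlsx",
#     "original_assessment.pdf",
#     "category_and_description_template.xlsx",
# ])
```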
So now we'll go over how to format your requested Structure Template, but you'll probably notice that all of the formatting is pretty much the same as the stuff we've gone over before. So first we'll go over how to add an existing Data Element to your template. You want to make a new row. Make sure to provide the Element name exactly as is, and then highlight it in blue. And if you're requesting any changes to the Element, fill in the appropriate cells and highlight the cells in yellow.
And here we have those same rules that I went over previously when making new Data Elements. I'll just touch on the important things to remember. First, make sure to use the appropriate data type for your new Element, and try to use integer Elements whenever possible. Date Elements should not have any Value Range and string Elements have case sensitive Value Ranges. Element names need to be formatted appropriately and they must be unique across the entire NDA Data Dictionary, not just your one Data Structure. And when you're requesting a string Element, please be sure to include an appropriate size that does not exceed 4000 characters.
To request your new Data Element, add a new row as usual and fill in the fields you see here. Element Name, Data Type, and Description are always required, but Size, Value Range, and Notes should only be included as needed. Once you've filled in all of the fields for your new Element, select the entire row and change the row's text to red.
So now here's a recap of the info we have on the new Data Structure Template. Remember that every new template will contain those five required Elements. These should not be altered or removed in any way. You've heard me go over the color format plenty of times now. Blue is for existing Elements, Yellow is for existing Elements with changes, and then red is for brand new Elements. And as always, if you want to add that new column with notes for your Data Curator, please feel free to do so.
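The color scheme can be summarized as a small decision rule: red text for a brand-new Element, a blue Element name for an existing Element you're adding to this Structure, and yellow for each changed cell of an existing Element. This Python sketch is only a way to plan the formatting, not an NDA tool; the current definition shown for fhs_02 is hypothetical.

```python
def formatting_plan(row, dictionary, structure_elements):
    """Return which cells of a requested row to color, following the webinar's scheme.

    row:                the requested row (Element name plus any fields you filled in)
    dictionary:         Element name -> current definition, for Elements already in NDA
    structure_elements: Element names already in the Data Structure you're changing
    """
    name = row["ElementName"]
    if name not in dictionary:
        return {"entire row": "red text"}  # brand-new Element
    plan = {}
    if name not in structure_elements:
        plan["ElementName"] = "blue highlight"  # existing Element added to this Structure
    current = dictionary[name]
    for field, value in row.items():
        if field != "ElementName" and value and value != current.get(field, ""):
            plan[field] = "yellow highlight"  # requested change to an existing Element
    return plan


# Hypothetical current definition for fhs_02, for illustration only.
dictionary = {"fhs_02": {"ValueRange": "0;1", "Notes": "0 = No; 1 = Yes"}}
request = {"ElementName": "fhs_02", "ValueRange": "0;1;999",
           "Notes": "0 = No; 1 = Yes; 999 = Unknown"}
print(formatting_plan(request, dictionary, structure_elements={"interview_date"}))
# {'ElementName': 'blue highlight', 'ValueRange': 'yellow highlight', 'Notes': 'yellow highlight'}
```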
So now I'll go over how to complete the Category and Description Template that we mentioned before. Again, this provides NDA with the metadata required to create a new Data Structure. You do not need to fill this in if there's a DOI or publication that contains your assessment or questionnaire in full. Otherwise, you will need to fill in this template.
So first, provide the Data Structure title that you would like to use. Then provide a brief description that outlines the Data Structure, and if there are any relevant DOIs or publications that relate to your new Structure, include them here as well. And optionally you can include any other references or notes for the Data Curators.
Then, at the bottom of this template, you'll see a second sheet called ‘Categories’, which contains a fairly extensive list of categories you can select for your Data Structure. We ask that you select the one or two categories from this list that best relate to your Data Structure.
So, after you've picked your categories, go back to the template, and fill in the category column with your selected categories. And after this is done and you've confirmed the other fields have been filled in, save your file in Excel format.
So now that we've filled in both the Data Structure Template and the Category and Description Template, we're in the home stretch. Once you've made all of your additions to the templates, zip the two templates and your original assessment documentation together into one file. The next step should look familiar again. Go to the Data Expected list on your NDA Collection page and click the green ‘New Data Expected’ button. But this time, in the pop-up window, make sure you select ‘Request New Data Structure’ rather than searching for an existing Data Structure.
Fill in your Targeted Enrollment and other dates appropriately, then include your requested Data Structure title. But please note that your Data Structure title cannot closely match an existing Data Structure title. It also needs to be formatted properly. So, we ask that you use a descriptive, full title with proper capitalization and without abbreviations.
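Before submitting, you could screen a proposed title against existing titles and for obvious abbreviations. This Python sketch uses the standard library's `difflib.SequenceMatcher` ratio on case-folded titles; the 0.85 cutoff and the all-caps abbreviation heuristic are arbitrary illustrations, not NDA rules, and the proposed title is hypothetical.

```python
import difflib


def similar_titles(proposed, existing_titles, threshold=0.85):
    """Return existing titles whose similarity to `proposed` is at or above `threshold`.

    Uses difflib's ratio (0.0 to 1.0) on case-folded titles; the 0.85 cutoff is an
    arbitrary choice for illustration, not an NDA rule.
    """
    target = proposed.casefold()
    return [
        title for title in existing_titles
        if difflib.SequenceMatcher(None, target, title.casefold()).ratio() >= threshold
    ]


def looks_abbreviated(title):
    """Heuristic: flag all-caps words of two or more letters, such as "ERQ"."""
    return any(len(word) > 1 and word.isupper() for word in title.split())


existing = ["Emotional Regulation Questionnaire", "Research Subject and Pedigree"]
print(similar_titles("Emotion Regulation Questionnaire", existing))
# ['Emotional Regulation Questionnaire']
print(looks_abbreviated("ERQ Short Form"))  # True
```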
After filling in your title, click ‘Select File’ to open the File Explorer, and select the zip file containing your Data Structure Template, Category and Description Template, and original assessment documentation. Then, after confirming that everything's filled in properly, click ‘Add Data Expected’. You will then see your requested Data Structure listed, with the zip file listed in the Data Structure file section. From here, the NDA Data Curators will review your Structure at a later time. We ask for your patience during the review, and please wait to submit data to the Structure until you hear from a Data Curator regarding your request.
And that wraps up our three different paths for updating your Data Expected list. On the slide, we'll have a brief overview of everything we've gone over. First is that happy path we mentioned earlier where an existing Data Structure contains everything you need. We have the second path, where an existing Data Structure contains some of your questions, but maybe not all of them. And then the last path where you can't find any Structures that work for you, and you need to request a brand new one instead.
And with that, I will pass things back to Lorraine so she can wrap up our presentation.
LORRAINE SIOCHI: Thanks, Mason. That was a lot of information to take in, so here are some key takeaways from this session. First, start updating your Data Expected list as soon as possible. There are a few steps to take before you can start submitting data to Data Structures, so don't wait until your first submission deadline to start. Second, you can always modify your Data Expected list after you've added to it; it's not an end-all, be-all situation. Next, search for existing Structures using our NDA Data Dictionary or the home page Search Bar, and only request new Structures if nothing matches your assessment.
Next, requesting changes is always preferred and can be implemented faster than creating a new Data Structure. When looking at Data Structures and their Data Elements, remember that required Data Elements must have a value provided, whereas recommended Data Elements can be completely ignored and left blank if not applicable to your study.
And lastly, wait for a Data Curator to confirm they've completed your request for changes or new Data Structures before submitting data to those Data Structures.
And that wraps up our Data Expected webinar. Thank you for listening in! If you have any further questions or would like a copy of the slide deck, please e-mail the NDA helpdesk at NDAHelp@mail.nih.gov. Thank you!