NIMH Data Archive - QA Resubmission Webinar
LORRAINE SIOCHI: Hello everyone.
Today we'll be discussing the NDA resubmission process for submissions that were identified to have QA errors.
Our agenda for today is who should be resubmitting data, how to resubmit your corrected
data, and how to identify the errors in your submissions using the QA results report provided by NDA.
MASON INGALLS: First, we'll be talking about who should be resubmitting data.
NDA expects you to resubmit your data if your NDA collection received a QA results e-mail from the NDA help desk.
And this will look something like this, with a subject saying “NIMH Data Archive QA Results”
with your PI's name and your NDA Collection ID.
Once you get your QA errors e-mail, you can find the QA report in this S3 link under “How
to Resolve QA Errors”. It is in step one right here.
So, you click this link, open it up and it will take you to this page. We advise you to right click on this page and save it as a CSV file so that it is more readable.
Once you save that, you can open the file and view all of your QA errors in a much more readable format.
Once you've reviewed the QA error report and corrected your data, you can follow our “How to
Provide Corrected Files” instruction guide, which is also linked in your QA results e-mail.
You will see it under step 6 right here of “How to Resolve your QA Errors”.
So, you can click this, and it will give you all the information you need to resubmit.
The general steps are as follows. First, you will launch the validation and upload tool
and then click “Fix QA Errors” on the right of the bottom banner.
Next, you will select the submission in which you have corrected data and click submit.
A note for this is that you should only be resubmitting for one submission at a time.
So, you will click the one checkbox here, then click submit.
Then it will take you to this page where you can select “Choose Files” and choose your corrected
data files for upload. Or, alternatively, you can drag and drop the files.
Something to note for this is you should only be resubmitting data for data structures that
have QA errors.
For example, there are two data structures here. The first one is et_subject_experiment_O1. This one does have QA errors. The second one, eeg_sub_files_O1, does not.
So, in this case you should only be resubmitting corrected data for et_subject_experiment_O1.
Another thing to note is that the same number of rows from the original submission must be
uploaded in your resubmission.
So, you will see up here on the same page, et_subject_experiment_O1 originally had 126 rows in your submission.
And then down here, the file that was uploaded to correct them only had eight of the 126 rows.
In most cases, you will want to have these uploaded rows matching. The only cases in which
the uploaded rows can be a little bit different is if you receive the “duplicate records” error or the “missing subjects” error. But we will talk about those soon.
Once you have checked the number of rows, validated your files, and made sure
that they have no other types of errors, you can click next, “Build Package”.
Then, you can verify that your listed information here is correct, and then you can
submit your data.
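As a rough illustration of the row-count check described above, here is a minimal Python sketch. The helper names, the column names, and the single header line are assumptions made for the example; they are not part of the NDA validation and upload tool, which performs its own checks.

```python
import csv
import io


def count_data_rows(csv_text, header_lines=1):
    """Count non-empty rows in CSV text, excluding the header line(s)."""
    rows = [
        row
        for row in csv.reader(io.StringIO(csv_text))
        if any(cell.strip() for cell in row)
    ]
    return max(len(rows) - header_lines, 0)


def row_counts_match(original_rows, corrected_csv_text, header_lines=1):
    """True when the corrected file has as many data rows as the original."""
    return count_data_rows(corrected_csv_text, header_lines) == original_rows


# Hypothetical corrected file with two data rows (column names assumed).
corrected = "subjectkey,src_subject_id\nGUID1,111\nGUID2,222\n"

# The original submission had 126 rows, so this upload would not match.
print(row_counts_match(126, corrected))
```

If your template carries more than one header line, pass a different `header_lines` value. The duplicate-records and missing-subjects cases are the exceptions where the counts may legitimately differ.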
DOMINICK DIMERCURIO: Now that you know how to resubmit your data, let us look at the different error types that you may encounter so that you will know how to interpret these errors and correct them for your resubmission.
One error type you may encounter is the “Duplicate Records Within a Submission”.
This error will indicate that there has been identical data submitted for a subject within a data structure.
In order to correct this error, you will need to identify the subject using the submitted source subject ID value in column L on your error report.
You will then remove the duplicate record from the data structure so that you can resubmit your data.
Let's see an example of what this looks like on your error report.
In this example, you can see in column J the data structure fMRI_results_01,
and in column L that subjects 111, 222, and 333 all have duplicate records.
So then if we go to our submission template, we will see this error clearly. GUID 1 has
111 listed twice.
Then GUID 2 has 222 listed twice, and 333 is likewise listed twice.
What we will need to do is remove these duplicates so that we only have 111, 222, 333 for a total of three records in this data.
Then, this corrected submission template can be resubmitted.
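The duplicate removal just described can be sketched in Python as follows. This is only an illustration: the function name and the column names (subjectkey, src_subject_id) are assumptions, and the GUID values are placeholders rather than real identifiers.

```python
def remove_duplicate_records(rows):
    """Drop rows that repeat an earlier row exactly, keeping the first copy."""
    seen = set()
    unique = []
    for row in rows:
        # Use the full set of column/value pairs as the identity of a record.
        key = tuple(sorted(row.items()))
        if key not in seen:
            seen.add(key)
            unique.append(row)
    return unique


# Placeholder records mirroring the example: each subject appears twice.
rows = [
    {"subjectkey": "GUID1", "src_subject_id": "111"},
    {"subjectkey": "GUID1", "src_subject_id": "111"},
    {"subjectkey": "GUID2", "src_subject_id": "222"},
    {"subjectkey": "GUID2", "src_subject_id": "222"},
    {"subjectkey": "GUID3", "src_subject_id": "333"},
    {"subjectkey": "GUID3", "src_subject_id": "333"},
]

cleaned = remove_duplicate_records(rows)
print(len(cleaned))  # three records remain, one per subject
```

Only rows that are identical in every column are treated as duplicates here. Rows that differ in any field are kept so that they can be reviewed by hand.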
Another type of error you may encounter is the “Miscalculated Age” error, in particular the subtype “interview age inconsistent with date”.
This will indicate that there are different interview ages that are listed for the subject
for the same interview date.
The interview age is the age at which the interview was done in months.
The age must be consistent across all data structures.
For instance, if in one data structure on January 1st, it says that the subject is 200 months old, and then in the second data structure for the same date January 1st, it says that they're 202 months old, then this would be inconsistent.
The suggested action here is to review the ages that were recorded for the subjects across the data structures noted on the report.
Ensure that the correct age has been recorded, paying careful attention to the rounding rule.
Once the records have been corrected, resubmit the data.
Here is an example of what this might look like in your error report.
Here, we have both data structures fMRI_results_01 and NDAR_subject_01.
In one data structure, there is a subject that was listed as 139 months old, and then in another they were listed as 130. These were both for January 14th, 2019.
We will then need to decide whether the 139 or the 130 is incorrect. When we find the incorrect submission template, we can then go through, find the interview ages that are incorrect, correct them, and resubmit.
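One way to spot this kind of inconsistency before resubmitting is sketched below in Python. The function name, the field names, and the assignment of each age to a particular data structure are assumptions made for the example; the sketch only flags the conflict and does not decide which age is correct.

```python
from collections import defaultdict


def find_inconsistent_ages(records):
    """Map (subject, interview date) to its ages when more than one age appears."""
    ages = defaultdict(set)
    for rec in records:
        ages[(rec["subjectkey"], rec["interview_date"])].add(rec["interview_age"])
    return {key: sorted(vals) for key, vals in ages.items() if len(vals) > 1}


# Placeholder records pooled from two data structures.
records = [
    {"structure": "fMRI_results_01", "subjectkey": "GUID1",
     "interview_date": "01/14/2019", "interview_age": 139},
    {"structure": "NDAR_subject_01", "subjectkey": "GUID1",
     "interview_date": "01/14/2019", "interview_age": 130},
    {"structure": "NDAR_subject_01", "subjectkey": "GUID2",
     "interview_date": "01/14/2019", "interview_age": 200},
]

conflicts = find_inconsistent_ages(records)
print(conflicts)  # only GUID1 is flagged, with ages 130 and 139
```

Each flagged subject still needs to be checked against the source documents to find out which recorded age is the wrong one.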
You may see a similar error where now the interview date is inconsistent with age, and this will indicate that there are different interview dates that are listed for a subject with the same interview age element.
Again, the interview age is the age at which the interview was done, and the age must be consistent across all data structures.
In this case, you may have someone who came to your study on January 1st, and they were 200 months old, and then they came back again on July 1st, and they were still 200 months.
In this case, you need to review the interview date recorded for the subjects across the data structures and ensure that the correct date has been recorded.
Again, pay careful attention to the rounding rule.
Once you have made the corrections, you can then resubmit the data.
Here's an example of what that might look like in your error report.
So now they are 139 months in both data structures, but you will see that this is happening both on January 14th, 2019, and November 14th, 2009.
You will need to identify which submission template has the error.
Then, go to the interview date and correct the dates.
Then, you can resubmit.
Another error that you may encounter is the “Inconsistent Source Subject ID for the Same GUID”.
This error indicates that a given GUID is associated with multiple source subject IDs.
The source subject ID is how the subject is defined in your lab, whereas the GUID is the global unique identifier used by NDA.
A single participant should have only one source subject ID and one GUID.
In your error report, “submitted subject key” is going to refer to the GUID and “source subject ID” is going to refer to your locally used ID.
The suggested action for this type of error is to review the source documents for the correct
source subject ID, then correct the data locally and resubmit.
You will also need to document this change and upload it to the supporting documentation.
For an example of how this may look on your error report, you can see here that in columns E through G we have GUIDs 1, 2 and 3, but GUID 1 in this case has been submitted with both source subject IDs 222 and 111.
You will need to determine which of those is the correct source subject ID, and then go to the submission template with the QA error.
In this case, GUID 1 should not be 222, it should be 111.
So, we correct this and now we have a corrected submission template that we can resubmit.
You may also encounter a similar error, “Inconsistent GUID for the Same Source Subject ID”.
In this case, it indicates that a source subject ID is assigned to multiple GUIDs in the data structure.
Again, there should only be one source subject ID and one GUID for every single participant.
And again, submitted subject key refers to GUID and source subject ID refers to the locally used ID.
The suggested action here is to review the source documents for the correct GUID.
If necessary, you may use the GUID tool to validate the GUID.
Then, correct the data locally and resubmit.
As for how this may appear on your error report, you can see in columns F and G that GUID
2 and GUID 1 are both linked to the source subject ID 111.
You will need to determine which of those is the correct GUID.
Then, go to the submission template with the QA error and correct.
So, in this case, the template lists GUID 2, then GUID 3, then GUID 1.
It should have listed GUID 1 with source subject ID 111, GUID 2 with 222, and GUID 3 with 333.
Once you have corrected your submission template, you can resubmit.
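Both identifier errors come down to the same rule: each GUID should pair with exactly one source subject ID, and each source subject ID with exactly one GUID. A minimal Python sketch of that check is below. The function name and the placeholder values are assumptions, and the sketch does not replace the GUID tool or a review of the source documents.

```python
from collections import defaultdict


def find_id_conflicts(pairs):
    """Return (GUIDs with several source IDs, source IDs with several GUIDs)."""
    ids_per_guid = defaultdict(set)
    guids_per_id = defaultdict(set)
    for guid, src_id in pairs:
        ids_per_guid[guid].add(src_id)
        guids_per_id[src_id].add(guid)
    guid_conflicts = {g: sorted(s) for g, s in ids_per_guid.items() if len(s) > 1}
    id_conflicts = {i: sorted(g) for i, g in guids_per_id.items() if len(g) > 1}
    return guid_conflicts, id_conflicts


# Placeholder (GUID, source subject ID) pairs; GUID1 was entered with two IDs.
pairs = [
    ("GUID1", "111"),
    ("GUID1", "222"),
    ("GUID2", "222"),
    ("GUID3", "333"),
]

guid_conflicts, id_conflicts = find_id_conflicts(pairs)
print(guid_conflicts)  # GUID1 is tied to both 111 and 222
print(id_conflicts)    # 222 is tied to both GUID1 and GUID2
```

A single wrong entry can show up in both results, as it does here, so correcting one row may clear both findings.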
Another possible error you may encounter is the “Missing Subjects for Data Structure Provided”.
This indicates that there are fewer subjects submitted than were in previous submissions.
Note that neurosignaling and omics subject data do not need to be submitted cumulatively and
thus are excluded from this report.
NDA expects all clinical data, that is all data structures with the clinical assessments data type, to be submitted cumulatively.
The suggested action here is to review the submission and confirm that all subjects collected so far were included in your cumulative submission, and then resubmit your data.
How may this appear on your error report?
You will see here that in columns E through G, the GUID is blank because this subject is missing, but then in column G, the previously submitted GUID, you see that GUIDs 1, 2 and 3 are the missing subjects.
You will also then see the previously submitted data structure was ndar_subject01, but the missing subjects are not included.
You can also see their source subject IDs were 111, 222 and 333.
The submission ID provided here shows you the submission in which these subjects were originally provided; they are missing from later submissions.
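Because clinical data are expected to be cumulative, a quick comparison of subject lists can catch this before you resubmit. The Python sketch below is an illustration only; the function name and the GUID placeholders are assumptions.

```python
def find_missing_subjects(previous_guids, current_guids):
    """Return GUIDs present in the earlier submission but absent from the new one."""
    return sorted(set(previous_guids) - set(current_guids))


# Placeholder GUID lists for one data structure across two submissions.
previous = ["GUID1", "GUID2", "GUID3", "GUID4"]
current = ["GUID4"]

missing = find_missing_subjects(previous, current)
print(missing)  # GUID1, GUID2 and GUID3 were dropped from the newer submission
```

An empty result means every previously submitted subject is still present in the newer file.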
You may also come across another similar error, “Missing Data Structure”, and this is going to
indicate that a data structure you submitted cannot be found in later submissions.
The data structure’s short name in question will be provided in the report.
Again, we expect you to submit all clinical data cumulatively.
So, what you will need to do is confirm that all cumulative data were submitted.
You will either need to upload the missing structures or confirm with NDA why this structure is missing to resolve this error.
Another potential error type you may encounter is “Inconsistent Sex Provided”.
This error will indicate that there is a discrepancy with the sex at birth value across data structures.
For instance, a subject might be listed as male with a letter M in one data structure, but female with the letter F in another.
You will then want to review the source documents for the correct sex.
You will then record the appropriate code M for male, F for female, O for other, or NR for not reported on all subject records locally and resubmit.
If this change in sex is a valid change, please document this change and upload it to the supporting documentation.
How might this appear on your error report?
So, you will look to columns J through O. You will see here that we are looking at the data structure fMRI_results_01, and we are looking at subjects 111, 222 and 333.
For subject 111, we see that in one submission, they were listed as M for male, and in another submission, they were listed as F for female, and likewise for 222 and 333.
You will need to determine which of these submissions has incorrect data, and then go to the sex column and correct with the appropriate code.
You may also receive a “Sex Value Threshold” warning. Note that this is a warning and not an error.
This warning will occur if the percentage of other or not reported values exceeds the acceptable thresholds.
The acceptable threshold for other is 25%, and the acceptable threshold for not reported is 50%.
In this case, you should review the data to confirm that the values reported in the sex element are accurate and update the data accordingly.
Note that when this warning occurs, the submission can continue without further action if the sexes were indeed recorded appropriately and your data genuinely contain more than 25% other or more than 50% not reported values.
Here is an example of what this may look like in your error report.
For instance, if all three subjects 111, 222 and 333 were listed as other, then the report will show the “Sex Value Threshold” warning type for each of these subjects.
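The threshold arithmetic can be sketched in Python as follows. The 25% and 50% limits and the O and NR codes come from the description above; the function name and the assumption that percentages are taken over all values passed in are illustrative, since the exact scope NDA uses for the calculation is not stated here.

```python
from collections import Counter

OTHER_THRESHOLD = 25.0         # percent of "O" values allowed
NOT_REPORTED_THRESHOLD = 50.0  # percent of "NR" values allowed


def sex_threshold_warnings(sex_values):
    """Return the codes ("O", "NR") whose share exceeds its threshold."""
    total = len(sex_values)
    if total == 0:
        return []
    counts = Counter(sex_values)
    warnings = []
    if 100 * counts["O"] / total > OTHER_THRESHOLD:
        warnings.append("O")
    if 100 * counts["NR"] / total > NOT_REPORTED_THRESHOLD:
        warnings.append("NR")
    return warnings


# All three subjects listed as other: 100% exceeds the 25% threshold.
print(sex_threshold_warnings(["O", "O", "O"]))
```

The comparison is strict, so exactly 25% other or exactly 50% not reported does not trigger a warning in this sketch.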
LORRAINE SIOCHI: If you have any questions about QA, the resubmission process, or you would like a copy of the presentation deck, please contact the NDA helpdesk at firstname.lastname@example.org.