FASTQ files are the files containing the raw sequencing data that are produced by Next Generation Sequencing platforms such as those made by Illumina. These files contain all the information produced by a sequencing run and are used in all downstream analyses.
As there is a drive to make research accessible, making these FASTQ files publicly available is becoming increasingly common. Here I provide a short outline of how to go about uploading FASTQ files to the Sequence Read Archive.
A more detailed description of this uploading process can be found here.
-
Create an NCBI account here
- Initialise a Bioproject
- Go to the Bioproject page
- Login to your NCBI account
- Create a New Submission and fill in the required fields, leaving the Biosample field blank
- Initialise a Biosample
- Go to the Biosample page
- Download the batch submission template as a TSV (tab separated values) file and fill in the required fields
- Create a New Submission and fill in the required fields, including uploading the TSV file
- Wait for the submission to be accepted
- Click on “Download attributes file with BioSample accessions” to get the assigned accession numbers
- Prepare metadata file for Sequence Read Archive submission
- Download a SRA_metadata_acc.xlsx file (I had to begin a SRA submission).
- Fill in the required fields - using Bioproject number (PRJNA…) and Biosample accessions
- You used to need the FASTQ file md5sum values (BUT I DIDN’T FOR MY RECENT SUBMISSION)
- You can get md5sum values using the following UNIX command:
md5sum [fileName]
- Create a Sequence Read Archive submission, if you haven’t already, and fill in the required fields
- At this point you can upload the metadata file you created in step 4
- Log onto SRA server
- If your files are <10GB and you have less than 300 of them, upload your FASTQ files via the web browser. This is so much easier! Or:
- On the SRA page click on the FTP upload link
- Note the server address, username, password and folder to navigate to
- On a UNIX machine navigate to the folder containing (only) your FASTQ files
- Log onto the server
ftp -i
open [server address]
- Upload the FASTQ files
- Note that once logged into server directories are hidden - use full path to folder
- Once in correct folder create new directory
- Upload all FASTQ files
# Navigate to server directory
cd [path to folder]
# Make a new directory to store FASTQ files in
mkdir [new directory name]
cd [new directory name]
# Copy all FASTQ files into new directory
mput *
- Completing the Sequence Read Archive submission
- Go back to the Sequence Read Archive submission page
- On the data uploading tab the folder you created to upload your FASTQ files into should exist.
- Select this folder and click continue.
- If no errors appear, it means all the FASTQ files were successfully linked to the metadata you provided and you can submit!
Took me more than a few attempts to get this to work. If you run into any issues there is a dedicated NCBI team who are happy to help. Here are their emails: