Convert fastq to sra (fastq-load)

alusa

Junior Member

Join Date: Aug 2015
Posts: 2

08-05-2015, 09:33 PM

Hello,

I have a problem converting a fastq file to sra. The only tool I found is fastq-load from the SRA Toolkit, but I can't work out the usage from its very brief help text, which is the entire documentation.

I managed to create run.xml and experiment.xml files, but I am also a little confused about which XML version fastq-load uses, because EBI provides several of them (http://www.ebi.ac.uk/ena/submit/preparing-xmls). The files I currently have do not make fastq-load stop, but there are other errors, so they must be incorrect:

Code:

alena@boxer:/media/Data/Alena/Data/Julie/rawdata$ fastq-load -r run.xsd -e experiment.xsd -o /media/Data/Alena/Data/Julie/rawdata/ -i /media/Data/Alena/Data/Julie/rawdata/ -v
2015-08-05T17:12:03 fastq-load.2.5.2 warn: file="JMJD3_1.fastq" offset="0"
2015-08-05T17:12:03 fastq-load.2.5.2 warn: quality_scoring_system attribute not set for this file, using Phred as default
2015-08-05T17:12:03 fastq-load.2.5.2 err: path incorrect while creating manager within database module - failed to create table with schema NCBI:SRA:Illumina:tbl:phred:v2
2015-08-05T17:12:03 fastq-load.2.5.2 err: path incorrect while creating manager within database module - accession="" status="failure"
2015-08-05T17:12:03 fastq-load.2.5.2 err: path incorrect while creating manager within database module - load failed: path incorrect

I would really appreciate any sort of help. Thanks in advance. Please do not advise me to submit the data to SRA - I only want the files locally for now.

For now I am trying the conversion on a fastq file that was originally downloaded as sra (the files I actually want in sra format are different ones - http://www.ncbi.nlm.nih.gov/geo/quer...acc=GSM1366930). It contains single-end reads, 51 bases long.

Code:

head JMJD3_1.fastq 
@SRR1232291.1 HISEQ:60:D1VB2ACXX:7:1101:1684:1983 length=51
CAGGCCCAGAACCACCTCAAGTCGGCCTCCCCANNNNCAGCTGCAGCCTCC
+SRR1232291.1 HISEQ:60:D1VB2ACXX:7:1101:1684:1983 length=51
B@BDDFFFFHHHGJJJJJJJHHIIHGIIJJJIJ####000BCGEHCDGHIJ
@SRR1232291.2 HISEQ:60:D1VB2ACXX:7:1101:2385:1976 length=51
ACCCNGGAGGTGGAGCTTGCAGTGAACCAAGATNNNNGTGCCACTTCACTC
+SRR1232291.2 HISEQ:60:D1VB2ACXX:7:1101:2385:1976 length=51
???D#2ABDD:ADEEEIIIIEIEEIEEIEIDID####00?DDEIIIEEIIE
@SRR1232291.3 HISEQ:60:D1VB2ACXX:7:1101:2562:1981 length=51
GCGCAATCCCTGGGAGCCAGGATGAGCAGCACCNNNNAGCCGTAGGAGCCC

Thanks in advance.

alusa


08-06-2015, 09:35 PM

I've just received a message from official NCBI support:

There are currently 2 ways to load fastq files, you can use either fastq-load or latf-load.

The advantage of latf-load is that it does not require XML files for the load process to occur and it can handle variable length reads. However, it does throw away the read names.

Fastq-load is an older loader and cannot handle variable length reads and requires XML files.
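Since fastq-load is said not to handle variable-length reads, it may be worth checking a file before choosing a loader. The following is only a rough sketch: `count_read_lengths` is a hypothetical helper name, and it assumes plain four-line fastq records (no wrapped sequence lines).

```shell
# Print how many distinct read lengths occur in a fastq file.
# Assumption: every record is exactly four lines, so the sequence
# is always the second line of each group of four.
count_read_lengths() {
    awk 'NR % 4 == 2 { seen[length($0)] = 1 }
         END { n = 0; for (len in seen) n++; print n }' "$1"
}

# Example (commented out): a result of 1 means all reads share one length.
# count_read_lengths JMJD3_1.fastq
```

If the result is greater than 1, the reads vary in length and, going by the support reply, latf-load would be the loader to look at.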

Fastq-load can have trouble handling fastq files that do not have the standard Illumina read header formats. For your example of SRR1232289 from GSM1366930, you will need to dump the data using the "-F" flag to get the original read header format. The latf-load command is a bit more resistant to non-standard Illumina read names if the data is not paired-end.
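To see whether a file still carries the accession-prefixed read headers (like `@SRR1232291.1 HISEQ:...` in the head output above), something along these lines might help. It is a sketch only: `has_accession_defline` is a made-up helper name, and the pattern is my guess based on that one sample, not an official description of the header format.

```shell
# Succeed if the first header line starts with a run accession plus
# spot number, e.g. "@SRR1232291.1 ".
# Assumption: the accession prefix is SRR, ERR or DRR followed by digits.
has_accession_defline() {
    head -n 1 "$1" | grep -Eq '^@[SED]RR[0-9]+\.[0-9]+ '
}

# Example (commented out):
# if has_accession_defline JMJD3_1.fastq; then
#     echo "headers are accession-prefixed; dump again with -F"
# fi
```

If the check succeeds, the file was presumably dumped without "-F", and dumping it again with that flag (as support suggests) should give the original read header format.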

The problem that caused the error message is with the output path. You need to specify a directory that does not exist because that is where all the files for the archive will be created and that area will be treated as archive folder. In your example I would use the following path "/media/Data/Alena/Data/Julie/rawdata/JMJD3".
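A small guard can catch this mistake before the loader runs. This is a minimal sketch, assuming the only requirement is that the output path must not exist yet; `is_fresh_output_dir` is a hypothetical helper name, and the fastq-load flags in the comment are simply the ones from my original command.

```shell
# Succeed only if nothing exists at the given path yet, so the
# loader can create the archive folder there itself.
is_fresh_output_dir() {
    [ ! -e "$1" ]
}

# Example (commented out), reusing the flags from the first post:
# out=/media/Data/Alena/Data/Julie/rawdata/JMJD3
# if is_fresh_output_dir "$out"; then
#     fastq-load -r run.xsd -e experiment.xsd -o "$out" \
#         -i /media/Data/Alena/Data/Julie/rawdata/ -v
# else
#     echo "output path already exists: $out" >&2
# fi
```

In my first attempt the output path was the existing rawdata directory itself, which is exactly the case this check would have rejected.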

Once the load is successful you will need to run the "kar" utility. For your example it would look like this:

Code:

kar -c JMJD3.sra -d /media/Data/Alena/Data/Julie/rawdata/JMJD3

This would create the SRA archive on your local system.