  • Convert fastq to sra (fastq-load)

    Hello,

    I have a problem converting a fastq file to sra. The only tool I found is fastq-load from the SRA Toolkit, but I can't work out its usage from the very brief help text, which is the whole documentation.

    I managed to create run.xml and experiment.xml files, but I am also a little confused about which XML version fastq-load uses, because EBI provides several of them (http://www.ebi.ac.uk/ena/submit/preparing-xmls). The files I currently have do not make fastq-load stop, but there are other errors, so they must be incorrect:

    Code:
    alena@boxer:/media/Data/Alena/Data/Julie/rawdata$ fastq-load -r run.xsd -e experiment.xsd -o /media/Data/Alena/Data/Julie/rawdata/ -i /media/Data/Alena/Data/Julie/rawdata/ -v
    2015-08-05T17:12:03 fastq-load.2.5.2 warn: file="JMJD3_1.fastq" offset="0"
    2015-08-05T17:12:03 fastq-load.2.5.2 warn: quality_scoring_system attribute not set for this file, using Phred as default
    2015-08-05T17:12:03 fastq-load.2.5.2 err: path incorrect while creating manager within database module - failed to create table with schema NCBI:SRA:Illumina:tbl:phred:v2
    2015-08-05T17:12:03 fastq-load.2.5.2 err: path incorrect while creating manager within database module - accession="" status="failure"
    2015-08-05T17:12:03 fastq-load.2.5.2 err: path incorrect while creating manager within database module - load failed: path incorrect
    I would really appreciate any sort of help. Please do not advise me to submit the data to SRA; I only want the files locally for now.

    Now I am trying the conversion on a fastq file that was originally downloaded as sra (although I want to have other files in sra format - http://www.ncbi.nlm.nih.gov/geo/quer...acc=GSM1366930). It contains single-end reads, 51 bases long.
    Code:
    head JMJD3_1.fastq 
    @SRR1232291.1 HISEQ:60:D1VB2ACXX:7:1101:1684:1983 length=51
    CAGGCCCAGAACCACCTCAAGTCGGCCTCCCCANNNNCAGCTGCAGCCTCC
    +SRR1232291.1 HISEQ:60:D1VB2ACXX:7:1101:1684:1983 length=51
    B@BDDFFFFHHHGJJJJJJJHHIIHGIIJJJIJ####000BCGEHCDGHIJ
    @SRR1232291.2 HISEQ:60:D1VB2ACXX:7:1101:2385:1976 length=51
    ACCCNGGAGGTGGAGCTTGCAGTGAACCAAGATNNNNGTGCCACTTCACTC
    +SRR1232291.2 HISEQ:60:D1VB2ACXX:7:1101:2385:1976 length=51
    ???D#2ABDD:ADEEEIIIIEIEEIEEIEIDID####00?DDEIIIEEIIE
    @SRR1232291.3 HISEQ:60:D1VB2ACXX:7:1101:2562:1981 length=51
    GCGCAATCCCTGGGAGCCAGGATGAGCAGCACCNNNNAGCCGTAGGAGCCC
    Thanks in advance.

  • #2
    I've just received a message from official NCBI support:

    There are currently 2 ways to load fastq files, you can use either fastq-load or latf-load.

    The advantage of latf-load is that it does not require XML files for the load process to occur and it can handle variable length reads. However, it does throw away the read names.

    Fastq-load is an older loader and cannot handle variable length reads and requires XML files.

    Fastq-load can have trouble handling fastq files that do not have the standard Illumina read header formats. For your example of SRR1232289 from GSM1366930, you will need to dump the data using the "-F" flag to get the original read header format. The latf-load command is a bit more resistant to non-standard Illumina read names if the data is not paired-end.

    The problem that caused the error message is with the output path. You need to specify a directory that does not exist because that is where all the files for the archive will be created and that area will be treated as archive folder. In your example I would use the following path "/media/Data/Alena/Data/Julie/rawdata/JMJD3".

    Once the load is successful you will need to run the "kar" utility. For your example it would look like this:
    kar -c JMJD3.sra -d /media/Data/Alena/Data/Julie/rawdata/JMJD3

    This would create the SRA archive on your local system.
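
    For anyone repeating this, the two steps above (load into a directory that does not exist yet, then pack it with kar) can be sketched as a small bash script. This is only a sketch under assumptions: the fastq-load and kar options are the ones already shown in this thread, while the function names and the DRY_RUN variable are hypothetical helpers, not part of the SRA Toolkit. By default it only prints the commands.

    ```shell
    #!/usr/bin/env bash
    # Sketch of the local load described above: fastq-load, then kar.
    # DRY_RUN, run, check_new_dir and load_and_pack are hypothetical helpers.
    DRY_RUN="${DRY_RUN:-1}"

    run() {
        # Print the command; execute it only when DRY_RUN=0.
        echo "+ $*"
        if [ "$DRY_RUN" = "0" ]; then
            "$@"
        fi
    }

    check_new_dir() {
        # The loader needs an output directory that does not exist yet.
        if [ -e "$1" ]; then
            echo "output path already exists: $1" >&2
            return 1
        fi
    }

    load_and_pack() {
        local run_xml="$1" exp_xml="$2" in_dir="$3" out_dir="$4" sra_file="$5"
        check_new_dir "$out_dir" || return 1
        # Step 1: load the fastq data into a fresh archive directory.
        run fastq-load -r "$run_xml" -e "$exp_xml" -i "$in_dir" -o "$out_dir" -v || return 1
        # Step 2: pack the archive directory into a single .sra file.
        run kar -c "$sra_file" -d "$out_dir"
    }

    # Example call with the paths from this thread (prints the commands only):
    # load_and_pack run.xml experiment.xml /media/Data/Alena/Data/Julie/rawdata \
    #     /media/Data/Alena/Data/Julie/rawdata/JMJD3 JMJD3.sra
    ```

    Setting DRY_RUN=0 runs the commands for real. The existence check mirrors the explanation of the "path incorrect" error above: pointing -o at the existing rawdata directory fails, while a new subdirectory such as JMJD3 should work.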
