  • Helical
    Junior Member
    • Mar 2013
    • 9

    Processing SOLiD data from SRA using TopHat

    Hello all,

    I'm attempting to run TopHat on SOLiD data from an SRA file and am running into problems with the FASTQ file formatting.

    After running fastq-dump on the SRA file, I get the following format:

    @SRR1119927.1 solid309_20110721_FRAG_BC_yadegari_1_55_1170 length=50
    T000002201013000130000000.01...20...2....2.....2...
    +SRR1119927.1 solid309_20110721_FRAG_BC_yadegari_1_55_1170 length=50
    !+,0,,/'*&/)&&)2%&+2.0%37!7%!!!1%!!!%!!!!5!!!!!5!!!

    Executing TopHat like this:

    tophat -C -o output --bowtie1 ColorIndex SRR.fastq

    Results in the following error:

    Error running bowtie:
    Too few quality values for read: 2899T33
    are you sure this is a FASTQ-int file?

    I researched this error and found that I may need to use the --quals option and provide a separate quality file. So, I split the FASTQ file into two separate files:

    @SRR1119927.1 solid309_20110721_FRAG_BC_yadegari_1_55_1170 length=50
    T000002201013000130000000.01...20...2....2.....2...
    +SRR1119927.1 solid309_20110721_FRAG_BC_yadegari_1_55_1170 length=50
    !+,0,,/'*&/)&&)2%&+2.0%37!7%!!!1%!!!%!!!!5!!!!!5!!!

    And ran:

    tophat -C --quals -o output --bowtie1 ColorIndex SRR.fastq SRR_qual.fastq

    That generates the following error:

    Error encountered parsing file SRR.fastq:
    Premature end of file (missing quality values for SRR1119927.1 solid309_20110721_FRAG_BC_yadegari_1_55_1170 length=50)

    I can't find any information on how to format the base and quality files, once they are separated, so that TopHat can read them. Is this my problem, or is it something else?

    <EDIT>

    I reformatted the two split files as FASTA:

    >SRR1119927.1 solid309_20110721_FRAG_BC_yadegari_1_55_1170 length=50
    T000002201013000130000000.01...20...2....2.....2...
    >SRR1119927.1 solid309_20110721_FRAG_BC_yadegari_1_55_1170 length=50
    !+,0,,/'*&/)&&)2%&+2.0%37!7%!!!1%!!!%!!!!5!!!!!5!!!

    But now I get the following error:

    Error running 'prep_reads'
    Error: beginning of quality values record not found! (!'/,<&.&&*'%1*%.2(%&20%'&!')!!!%&!!!1!!!!1!!!!!%!!!)
    Last edited by Helical; 06-19-2014, 06:43 AM.
  • mastal
    Senior Member
    • Mar 2009
    • 666

    #2
    TopHat is probably expecting the data to be in two files, .csfasta and .qual.

    I think there should be a command, 'abi-dump', that you can use instead of fastq-dump and that will produce the file formats you need.
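
To make the two-file layout concrete: a .csfasta file holds '>' header lines followed by the color-space read, and the matching .qual file holds the same headers followed by space-separated integer quality values. That is probably why an ASCII quality string under a '>' header is rejected. abi-dump is the cleaner route. If only the fastq-dump output is at hand, a minimal awk sketch of the conversion might look like the one below. fastq_to_cs is a made-up helper name, not part of TopHat or the SRA Toolkit. The sketch assumes Phred+33 qualities and that the first quality character is a placeholder for the primer base (the read posted above has 51 quality characters for length=50).

```shell
#!/bin/sh
# Sketch: split a color-space FASTQ (as written by fastq-dump) into a
# .csfasta file and a .qual file with numeric quality values.
# fastq_to_cs is a hypothetical helper name.
fastq_to_cs() {
    in_fastq=$1      # input FASTQ, assumed to use exactly 4 lines per record
    out_prefix=$2    # writes $out_prefix.csfasta and $out_prefix.qual
    awk -v fa="$out_prefix.csfasta" -v qv="$out_prefix.qual" '
        BEGIN {
            # Lookup table: printable ASCII character -> character code
            for (i = 33; i < 127; i++) ord[sprintf("%c", i)] = i
        }
        NR % 4 == 1 { name = substr($0, 2) }    # header without the leading "@"
        NR % 4 == 2 {                           # color-space read, copied as-is
            print (">" name) > fa
            print $0 > fa
        }
        NR % 4 == 0 {
            # Assumption: character 1 is a placeholder for the primer base, so skip it.
            line = ""
            for (i = 2; i <= length($0); i++) {
                q = ord[substr($0, i, 1)] - 33  # Phred+33 character -> integer
                line = line (i > 2 ? " " : "") q
            }
            print (">" name) > qv
            print line > qv
        }
    ' "$in_fastq"
}
```

Running fastq_to_cs SRR.fastq SRR would write SRR.csfasta and SRR.qual, which could then be passed to the --quals form of the command from the first post in place of SRR.fastq and SRR_qual.fastq. Check a few records by eye before trusting the output.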


    • mbblack
      Senior Member
      • Aug 2009
      • 245

      #3
      Did you use fastq-dump or abi-dump to generate your original files? If the SRA submission was actually color-space reads, then you should use "abi-dump", NOT fastq-dump, from the SRA Toolkit. The abi-dump command will give you matched csfasta/csqual files.
      Michael Black, Ph.D.
      ScitoVation LLC. RTP, N.C.
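
For anyone landing here later, the abi-dump route suggested above can be sketched as follows. The helper only prints the two commands so that the file names can be checked before anything is run. solid_tophat_plan is a made-up name, and the output names <accession>_F3.csfasta and <accession>_F3_QV.qual are an assumption about how abi-dump names single-fragment output, so list the directory after the dump to confirm. The TopHat options are the ones from the original post.

```shell
#!/bin/sh
# Sketch only: prints the commands instead of running them.
# solid_tophat_plan is a hypothetical helper name.
solid_tophat_plan() {
    acc=$1       # SRA accession, e.g. SRR1119927
    index=$2     # basename of the color-space Bowtie 1 index
    # Assumption: abi-dump writes <acc>_F3.csfasta and <acc>_F3_QV.qual
    echo "abi-dump $acc"
    echo "tophat -C --quals -o output --bowtie1 $index ${acc}_F3.csfasta ${acc}_F3_QV.qual"
}

solid_tophat_plan SRR1119927 ColorIndex
```

If the printed commands look right for your files, run them by hand, or pipe the output to sh.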
