  • scozza
    Member
    • Jan 2009
    • 16

    Illumina FASTQ format question...

    I am confused by an Illumina FASTQ formatted file I have been asked to assemble. I had expected two files, each containing one 36bp read of every pair. Instead I got one file whose sequence and quality lines are 77 characters long.

    I asked the originator of the file what was going on, and they said that the file simply hadn't been split and that the lines were in fact the paired-end reads concatenated. They suggested that I simply split each sequence and write the halves out into two files.

    My problem is with the math: 77 is not 36*2 = 72. This leaves me wondering what is going on with the remaining 5 bases. So I would like to see if someone can clear up my confusion by answering a couple of questions.

    Is this file a "standard" Illumina/Solexa sequence file?
    What is the deal with the concatenated reads?
    Why wouldn't I want the last 5 bases? Are they adaptors? Low-quality?

    For now I am going to do as suggested and just split the 77 bases into two 36-base sequences and toss the last 5.
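
A minimal sketch of that split, assuming standard four-line FASTQ records and assuming that the first 36 characters are read 1 and the next 36 are read 2. This is not the script the sequencing group provided; `read_fastq`, `split_record`, and the `/1` and `/2` name suffixes are hypothetical choices for illustration.

```python
from typing import Iterator, TextIO, Tuple

Record = Tuple[str, str, str]  # (name, sequence, quality string)


def read_fastq(handle: TextIO) -> Iterator[Record]:
    """Yield (name, seq, qual) from a FASTQ stream with 4 lines per record."""
    while True:
        header = handle.readline()
        if not header:
            return  # end of file
        if not header.strip():
            continue  # tolerate stray blank lines
        seq = handle.readline().rstrip("\n")
        handle.readline()  # the '+' separator line is ignored
        qual = handle.readline().rstrip("\n")
        yield header.rstrip("\n")[1:], seq, qual


def split_record(name: str, seq: str, qual: str,
                 read_len: int = 36) -> Tuple[Record, Record, int]:
    """Split one concatenated record into two reads of read_len bases.

    Returns (read1, read2, leftover), where leftover is the number of
    trailing bases that belong to neither read (5 for a 77-base line).
    """
    if len(seq) != len(qual):
        raise ValueError("sequence and quality lengths differ for " + name)
    if len(seq) < 2 * read_len:
        raise ValueError("record too short to split: " + name)
    read1 = (name + "/1", seq[:read_len], qual[:read_len])
    read2 = (name + "/2", seq[read_len:2 * read_len], qual[read_len:2 * read_len])
    return read1, read2, len(seq) - 2 * read_len
```

Reporting the leftover count rather than silently dropping it makes the unexplained 5 bases visible in every record, which is a hint that the paired-end assumption itself deserves a second look.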

    Thanks for any help you can provide in clearing up my confusion.

    -steve
  • drio
    Senior Member
    • Oct 2008
    • 323

    #2
    Originally posted by scozza
    Is this file a "standard" Illumina/Solexa sequence file?
    What is the deal with the concatenated reads?
    Why wouldn't I want the last 5 bases? Are they adaptors? Low-quality?
    First time I've seen something like that. I would expect, as you say, two
    separate files, one per read. Are you sure this is not a fragment-76bp
    run?

    I suggest you map the data treating the reads as FR-77 before doing
    anything else.
    -drd


    • scozza
      Member
      • Jan 2009
      • 16

      #3
      Originally posted by drio
      First time I've seen something like that. I would expect, as you say, two
      separate files, one per read. Are you sure this is not a fragment-76bp
      run?

      I suggest you map the data treating the reads as FR-77 before doing
      anything else.
      No, I am not sure. The info I have at this point comes from an email exchange I had with the group that sequenced it, in which they said that the reads had not been separated for me and that I should run a Perl script they provided to do the splitting.

      Still waiting to hear back from them.

      -steve


      • drio
        Senior Member
        • Oct 2008
        • 323

        #4
        Do you have the summary.(htm|xml) file? What's the % of alignment telling you? Are you seeing stats for READ1 and READ2?
        -drd


        • scozza
          Member
          • Jan 2009
          • 16

          #5
          Originally posted by drio
          Do you have the summary.(htm|xml) file? What's the % of alignment telling you? Are you seeing stats for READ1 and READ2?
          I didn't, but fortunately I didn't need it. My contact at the group that sequenced this got back to me. It turns out these are 77bp single-end reads. Somewhere some miscommunication happened. This is a load off my mind because I thought either I was crazy or the assembler I was using was buggy.

          Thanks Drio, I appreciate your help!

          -steve


          • jkbonfield
            Senior Member
            • Jul 2008
            • 146

            #6
            Build a histogram of the quality values per cycle (we use a local tool called fastqcheck to do this). In a concatenated paired-end file it clearly shows the end of the first read and the start of the second: the gradual decay in quality per cycle resets back up to high quality at the 1st cycle of the 2nd read. This indicates the actual number of cycles rather than the claimed number, and it's nicely independent of any HTML or Illumina QC, so you can run it on data passed to you from more random sources. It's also a good QC check and can show sudden dips or loss of signal.

            Is it possible that this run was a tagged/indexed run too? The index tag normally resides between the 1st and 2nd read, and again it'll be clearly visible as a sudden jump in the quality values.
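
A rough sketch of the per-cycle check described above, for anyone without such a tool to hand. This is not fastqcheck and does not reproduce its output; it only averages the quality at each cycle and reports where the average rises most from one cycle to the next. The function names are hypothetical, and the default offset of 64 is an assumption about the file's quality encoding (use 33 for Sanger-style encoding).

```python
from typing import Iterable, List, Optional, Tuple


def mean_quality_per_cycle(quals: Iterable[str], offset: int = 64) -> List[float]:
    """Return the mean quality score at each cycle (position in the read).

    quals is an iterable of FASTQ quality strings; offset is the ASCII
    offset of the encoding (64 is assumed here, 33 for Sanger-style).
    """
    totals: List[int] = []
    counts: List[int] = []
    for qual in quals:
        for i, ch in enumerate(qual):
            if i == len(totals):  # first time this cycle is seen
                totals.append(0)
                counts.append(0)
            totals[i] += ord(ch) - offset
            counts[i] += 1
    return [t / c for t, c in zip(totals, counts)]


def largest_jump(means: List[float]) -> Tuple[Optional[int], float]:
    """Return (cycle, rise) for the biggest cycle-to-cycle increase.

    cycle is 1-based and is the cycle at which quality has jumped up,
    i.e. a candidate first cycle of a second read or an index tag.
    Returns (None, 0.0) if the mean quality never increases.
    """
    best_cycle: Optional[int] = None
    best_rise = 0.0
    for i in range(1, len(means)):
        rise = means[i] - means[i - 1]
        if rise > best_rise:
            best_cycle, best_rise = i + 1, rise
    return best_cycle, best_rise
```

A small rise can be noise, so the jump is only a candidate boundary; plotting or printing the full list of means shows whether the quality really resets or just wobbles.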

            James
