MiSeq fastq output: 250-251 bp reads

  • MiSeq fastq output: 250-251 bp reads

    Hello,

    I'm getting reads from a MiSeq machine and noticing that many are 251 bp long instead of 250 bp, and that the last base has a highly skewed base composition. From what I read in the manual, the sequencer always runs read_length + 1 imaging cycles, but only read_length cycles are analyzed (the extra one is used for phasing, etc.). Shouldn't all reads in the final FASTQ be the same length, and only 250 bp? Is it OK to trim off the final base, or is this a sign that something is wrong?

    Also, a side note: some reads are shorter than 250 bp, which I assume is due to adapter trimming. If so, both read 1 and read 2 of a pair should be trimmed to the same length; otherwise the pair shouldn't be trusted, no?

    Thanks,
    Daniel
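One way to put a number on the skew at the last base is to count which bases occur at a given read position. The following is a minimal Python sketch, not a vetted tool: it assumes standard 4-line FASTQ records (no wrapped sequence lines), and the function names `open_fastq` and `base_composition_at` are made up for illustration.

```python
from collections import Counter
import gzip


def open_fastq(path):
    """Open a plain or gzip-compressed FASTQ file in text mode."""
    if str(path).endswith(".gz"):
        return gzip.open(path, "rt")
    return open(path, "rt")


def base_composition_at(path, position):
    """Count the bases seen at a 1-based read position.

    Reads shorter than `position` are skipped, so only reads that
    actually have that base contribute to the counts.
    """
    counts = Counter()
    with open_fastq(path) as handle:
        for line_no, line in enumerate(handle):
            if line_no % 4 == 1:  # second line of each record is the sequence
                seq = line.strip()
                if len(seq) >= position:
                    counts[seq[position - 1]] += 1
    return counts
```

Calling `base_composition_at(path, 251)` and `base_composition_at(path, 250)` on the same file would let the two positions be compared side by side.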

  • #2
    Hi Daniel,

    You're correct in that the actual read length of an Illumina run is always N+1 because the extra base is used for phasing/pre-phasing analysis. Ideally that last base should be trimmed off because it's not properly quality checked.

    As for reads < 250bp, if you're using a Nextera kit and had Trim Adapters checked in the sample sheet, then you're correct about why you have shorter reads. If you're seeing that the two reads of a pair aren't the same length, then you're also probably seeing that read 1 is shorter than read 2. This would be an issue with the trimming where the base quality of read 2 dropped low enough that the adapter sequence wasn't properly called and thus couldn't be recognized to be trimmed. Some third-party apps can do a much better job of trimming so you may want to try those.
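Trimming off the extra base, as suggested above, can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the input is an uncompressed FASTQ with exactly four lines per record, and `clip_fastq` and its `max_len` parameter are hypothetical names, not part of any Illumina software.

```python
def clip_fastq(in_path, out_path, max_len=250):
    """Copy a FASTQ file, truncating every read to at most max_len bases.

    Sequence and quality strings are cut at the same position so they
    stay in sync. Returns the number of reads that were shortened.
    """
    clipped = 0
    with open(in_path) as src, open(out_path, "w") as dst:
        # Group the file into consecutive 4-line records.
        for header, seq, plus, qual in zip(*[iter(src)] * 4):
            seq = seq.rstrip("\n")
            qual = qual.rstrip("\n")
            if len(seq) > max_len:
                clipped += 1
            # header and plus still carry their own trailing newline.
            dst.write(f"{header}{seq[:max_len]}\n{plus}{qual[:max_len]}\n")
    return clipped
```

Reads that are already 250 bp or shorter pass through unchanged, so adapter-trimmed reads are not affected.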


    • #3
      Thanks for the reply. Very useful information.

      The only thing I'm still puzzled about is why some reads are 250 bp and others are 251 bp.

      Daniel


      • #4
        Some facilities set up a run with n+1 cycles, where n is the number of bases you asked to have sequenced.

        If you did not set this run up yourself, it is possible that the original run was set up as 250 x 251 bp. That would fit if one read is consistently 250 bp or less and the other is 251 bp or less, depending on trimming.


        • #5
          I would understand if there were some obvious consistency.
          What I observe is that, within the same run, read 1 OR read 2 can be either 250 or 251 bp (and sometimes 249 bp!) with no apparent pattern. I suspect the behaviour comes from adapter trimming.

          Counts | Read1 | Read2
          4223 | 250 | 248
          7940 | 250 | 249
          58517 | 250 | 250
          130842 | 250 | 251
          10571 | 251 | 248
          21321 | 251 | 249
          145959 | 251 | 250
          331396 | 251 | 251
          ...
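A table like the one above can be produced by pairing up the read lengths from the two FASTQ files. Here is a rough Python sketch, assuming uncompressed 4-line FASTQ records and that read 1 and read 2 appear in the same order in both files; the function names are illustrative only.

```python
from collections import Counter


def read_lengths(path):
    """Yield the length of each read in a 4-line-per-record FASTQ file."""
    with open(path) as handle:
        for line_no, line in enumerate(handle):
            if line_no % 4 == 1:  # sequence line
                yield len(line.rstrip("\r\n"))


def length_pair_counts(r1_path, r2_path):
    """Count how often each (read 1 length, read 2 length) pair occurs."""
    return Counter(zip(read_lengths(r1_path), read_lengths(r2_path)))


def print_table(counts):
    """Print the counts in the same 'Counts | Read1 | Read2' layout."""
    print("Counts | Read1 | Read2")
    for (len1, len2), n in sorted(counts.items()):
        print(f"{n} | {len1} | {len2}")
```

Note that `zip` stops at the shorter file, so a mismatch in the number of records between the two files would go unnoticed here and should be checked separately.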


          • #6
            Adapter trimming can't be the cause of it (unless the run was originally set up to be longer than 250 bp).

            Did you run this yourself? If not, you should ask the facility that ran it how the original run was set up.


            • #7
              I didn't run it myself, but it used Nextera V2, 250x250.
              Adapter trimming was on (I guess by default).

              Thanks,
              Daniel
              Last edited by dsobral; 12-10-2013, 06:16 AM.


              • #8
                PS: although the data has these peculiarities, I used it for de novo assembly of a bacterial genome, and it gave good results...

                I only noticed because, when I ran Edena on the full data, it complained about the read sizes...

                I was just wondering what to make of it.

                Thanks
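If the complaint comes from the assembler expecting a single read length (an assumption here, since the exact message isn't quoted), one workaround is to bring every pair to a common length before assembly. The Python sketch below uses a hypothetical helper, `uniform_pairs`, that works on in-memory (sequence, quality) tuples: it truncates both mates to a chosen length and drops pairs in which either mate is too short.

```python
def uniform_pairs(pairs, length):
    """Yield read pairs in which both mates are exactly `length` bases.

    `pairs` is an iterable of ((seq1, qual1), (seq2, qual2)) tuples.
    Pairs with a mate shorter than `length` are dropped, so the two
    output streams stay paired; longer mates are truncated at the 3' end.
    """
    for (seq1, qual1), (seq2, qual2) in pairs:
        if len(seq1) < length or len(seq2) < length:
            continue
        yield (seq1[:length], qual1[:length]), (seq2[:length], qual2[:length])
```

The choice of length is a trade-off: a smaller value keeps more pairs but discards more bases from each read.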
