  • MiSeq fastq output: 250-251 bp reads

    Hello,

    I'm getting reads from a MiSeq machine and noticing that many are 251 bp instead of 250 bp, and that the last base has a highly skewed base composition. From what I read in the manual, the sequencer always runs read_length + 1 imaging cycles, but only read_length are analyzed (the extra one is for phasing etc.). Shouldn't the final fastq have all reads the same size, at only 250 bp? Is it OK to cut off the final base, or is it a sign that something is wrong?
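    For anyone wanting to check the same thing, the skew at the last base can be tallied roughly like this. It is only a sketch: the function name `last_base_composition` is made up for illustration, and it assumes plain 4-line FASTQ records (no wrapped sequence lines).

```python
from collections import Counter
import io


def last_base_composition(handle, length=251):
    """Count the final base of every read that is exactly `length` bases long.

    Assumes plain 4-line FASTQ records (no wrapped sequence lines).
    """
    counts = Counter()
    for i, line in enumerate(handle):
        if i % 4 == 1:  # the sequence is the 2nd line of each record
            seq = line.strip()
            if len(seq) == length:
                counts[seq[-1]] += 1
    return counts


# Toy input: made-up 5 bp reads standing in for the 251 bp ones.
demo = io.StringIO(
    "@r1\nACGTA\n+\nIIIII\n"
    "@r2\nACGT\n+\nIIII\n"
    "@r3\nTTTTA\n+\nIIIII\n"
)
print(last_base_composition(demo, length=5))
```

    On real data one would open the fastq file and leave `length=251`; a composition far from the rest of the read would match the skew described above.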

    Also, a side note: some reads are shorter than 250 bp, and I assume that's because of adapter trimming. If this is the case, then both read 1 and read 2 of a pair should be trimmed to the same size, otherwise it shouldn't be trusted, no?

    Thanks,
    Daniel

  • #2
    Hi Daniel,

    You're correct in that the actual read length of an Illumina run is always N+1 because the extra base is used for phasing/pre-phasing analysis. Ideally that last base should be trimmed off because it's not properly quality checked.

    As for reads < 250bp, if you're using a Nextera kit and had Trim Adapters checked in the sample sheet, then you're correct about why you have shorter reads. If you're seeing that the two reads of a pair aren't the same length, then you're also probably seeing that read 1 is shorter than read 2. This would be an issue with the trimming where the base quality of read 2 dropped low enough that the adapter sequence wasn't properly called and thus couldn't be recognized to be trimmed. Some third-party apps can do a much better job of trimming so you may want to try those.
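    Trimming off the extra base is easy to do yourself. Here is a minimal sketch, assuming plain 4-line FASTQ records; the helper names (`read_fastq`, `trim_extra_cycle`, `write_fastq`) are invented for this example and are not part of any Illumina tool.

```python
import io


def read_fastq(handle):
    """Yield (header, sequence, quality) from a plain 4-line FASTQ stream."""
    while True:
        header = handle.readline().strip()
        if not header:
            return
        seq = handle.readline().strip()
        handle.readline()  # skip the '+' separator line
        qual = handle.readline().strip()
        yield header, seq, qual


def trim_extra_cycle(records, n=250):
    """Cut every read (and its quality string) back to at most n bases."""
    for header, seq, qual in records:
        yield header, seq[:n], qual[:n]


def write_fastq(records, handle):
    """Write records back out as 4-line FASTQ."""
    for header, seq, qual in records:
        handle.write(f"{header}\n{seq}\n+\n{qual}\n")


# Toy input: n=4 stands in for 250, so the 5 bp read loses its last base.
demo_in = io.StringIO("@r1\nACGTA\n+\nIIII#\n@r2\nACG\n+\nIII\n")
demo_out = io.StringIO()
write_fastq(trim_extra_cycle(read_fastq(demo_in), n=4), demo_out)
print(demo_out.getvalue(), end="")
```

    Reads already at or below n bases pass through unchanged, so adapter-trimmed reads are not affected.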



    • #3
      Thanks for the reply. Very useful information.

      The only thing I'm still puzzled about is why some reads are 250 bp and others are 251 bp.

      Daniel



      • #4
        Some facilities set up a run as n+1 cycles, where n is the number of bases you asked to have sequenced.

        If you did not set this run up yourself, it is possible that the original run was set up as 250 x 251 bp (that would fit if one read is consistently 250 bp or less and the other is 251 bp or less, depending on trimming).



        • #5
          I would understand if there was some obvious consistency.
          What I observe is that for the same run, read1 OR read2 can be either 250 or 251bp (and sometimes 249bp!) with no apparently consistent pattern. I'm suspicious that the behaviour is coming from adapter trimming.

          Counts | Read1 | Read2
          4223 | 250 | 248
          7940 | 250 | 249
          58517 | 250 | 250
          130842 | 250 | 251
          10571 | 251 | 248
          21321 | 251 | 249
          145959 | 251 | 250
          331396 | 251 | 251
          ...
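          A table like the one above can be produced along these lines. This is a sketch under assumptions: plain 4-line FASTQ records, both files listing the pairs in the same order, and function names (`read_lengths`, `length_pair_counts`) invented for the example.

```python
from collections import Counter
import io


def read_lengths(handle):
    """Yield the length of each read in a plain 4-line FASTQ stream."""
    for i, line in enumerate(handle):
        if i % 4 == 1:  # the sequence is the 2nd line of each record
            yield len(line.strip())


def length_pair_counts(r1_handle, r2_handle):
    """Count (read1 length, read2 length) combinations across all pairs.

    Assumes both files list the pairs in the same order.
    """
    return Counter(zip(read_lengths(r1_handle), read_lengths(r2_handle)))


# Toy input: two made-up pairs with short reads.
r1 = io.StringIO("@a/1\nACGTA\n+\nIIIII\n@b/1\nACGT\n+\nIIII\n")
r2 = io.StringIO("@a/2\nACGT\n+\nIIII\n@b/2\nACGT\n+\nIIII\n")
print("Counts | Read1 | Read2")
for (len1, len2), count in sorted(length_pair_counts(r1, r2).items()):
    print(f"{count} | {len1} | {len2}")
```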



          • #6
            Adapter trimming can't be the cause of it (unless the run was originally set up to be longer than 250 bp).

            Did you run this yourself? If not, you should ask the facility that ran it how the original run was set up.



            • #7
              I didn't run it myself, but it was using Nextera V2 250x250.
              Adapter trimming was on (I guess by default).

              Thanks,
              Daniel
              Last edited by dsobral; 12-10-2013, 06:16 AM.



              • #8
                PS: although the data has these peculiarities, I used it for de novo assembly of a bacterium, and it gave good results...

                I only noticed because when I tried Edena on the full data, it complained about the read sizes...

                I was just wondering what to make of it.

                Thanks
