  • salamay
    Member
    • May 2014
    • 20

    FASTQC trends

    I have some metagenomic data obtained from whole-genome shotgun sequencing on an Illumina HiSeq. The reads are 100 bp paired-end, and when I examine them in FastQC I see a couple of things. First, the per-base sequence content and per-base GC content are very skewed at the beginning of the reads (~bp 1-16), and the per-base N content has a spike at bp 4. I also have overrepresented k-mers at the beginning of the reads that do not belong to any adapters (as far as I can tell). I know that these trends are sometimes seen in RNA-seq data because of (not so) random hexamer priming, but I am confused as to why I see them in whole-genome data. I am also not sure about the N spike at bp 4. I have attached images of what I mentioned and would appreciate any insight.

    thanks.
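
For anyone wanting to check plots like these against the raw reads, here is a minimal Python sketch (not FastQC's own implementation) that tallies the fraction of each base, including N, at every read position. The function names and the assumption of an uncompressed, four-lines-per-record FASTQ file are illustrative choices, not something taken from the post.

```python
from collections import Counter


def per_position_composition(sequences, max_len=None):
    """Return one dict per read position mapping base -> fraction of reads.

    `sequences` is any iterable of read strings; reads may differ in length.
    """
    counts = []
    for seq in sequences:
        seq = seq.upper()
        if max_len is not None:
            seq = seq[:max_len]
        # Grow the per-position counters to cover the longest read seen so far.
        while len(counts) < len(seq):
            counts.append(Counter())
        for i, base in enumerate(seq):
            counts[i][base] += 1

    fractions = []
    for counter in counts:
        total = sum(counter.values())
        fractions.append({base: n / total for base, n in counter.items()})
    return fractions


def read_fastq_sequences(path):
    """Yield sequence lines from an uncompressed FASTQ file.

    Assumes the plain four-lines-per-record layout (no wrapped sequences).
    """
    with open(path) as handle:
        for line_no, line in enumerate(handle):
            if line_no % 4 == 1:
                yield line.strip()
```

With a real file this might be called as `per_position_composition(read_fastq_sequences("reads_R1.fastq"), max_len=20)`, where the file name is a placeholder. A skew in the first ~16 positions would show up as base fractions that differ from those later in the read, and an N spike at bp 4 as a large `"N"` entry at index 3.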
  • TonyBrooks
    Senior Member
    • Jun 2009
    • 303

    #2
    I'm assuming these were sequenced on a HiSeq? The spike at cycle 4 is most likely a phenomenon known as Bottom Middle Swath (or BMS in Illumispeak). Before scanning, the HiSeq attempts to find focus at a fixed point near the inlet port. If a bubble is present over this point, the focus is off and that particular swath is scanned out of focus. You should be able to see this if you look at the thumbnail images for cycle 4. Basecalling can't be done on these images, so each affected cluster is given an N at this position.
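
If the thumbnail images are not available, one rough way to test this explanation is to check whether the Ns at cycle 4 are concentrated in a few tiles. The Python sketch below is only an illustration under stated assumptions: it expects read headers in the colon-separated layout `@instrument:run:flowcell:lane:tile:x:y ...`, and the function name `n_rate_by_tile` is hypothetical. On the HiSeq, tile numbers are usually four digits in which the first digit is reported to encode the surface and the second the swath, but that interpretation should be checked against Illumina's documentation for the instrument in question.

```python
from collections import defaultdict


def n_rate_by_tile(records, cycle=4):
    """Return {tile: fraction of reads with an N at the given 1-based cycle}.

    `records` is an iterable of (header, sequence) pairs. Headers that do not
    have at least seven colon-separated fields, and reads shorter than
    `cycle`, are skipped.
    """
    totals = defaultdict(int)
    n_calls = defaultdict(int)
    for header, seq in records:
        # Keep only the part before the first space, then split on colons.
        fields = header.lstrip("@").split(" ")[0].split(":")
        if len(fields) < 7 or len(seq) < cycle:
            continue
        tile = fields[4]  # assumed position of the tile field
        totals[tile] += 1
        if seq[cycle - 1].upper() == "N":
            n_calls[tile] += 1
    return {tile: n_calls[tile] / totals[tile] for tile in totals}
```

If the N rate at cycle 4 is close to 1 for a block of tiles sharing the same first two digits and close to 0 elsewhere, that would be consistent with a single out-of-focus swath rather than a library or chemistry problem.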


    • salamay
      Member
      • May 2014
      • 20

      #3
      Thanks TonyBrooks, yes, it was on a HiSeq. I had not heard about this issue before, so thanks for bringing it to my attention.


      • TonyBrooks
        Senior Member
        • Jun 2009
        • 303

        #4
        See here for more info

        Bridged amplification & clustering followed by sequencing by synthesis. (Genome Analyzer / HiSeq / MiSeq)


        • lac302
          Member
          • Dec 2012
          • 64

          #5
          I've seen the same fluctuation in GC content over the first 20 or so bases on samples run on both the HiSeq and the MiSeq. I typically have enough coverage to just trim them off, even though the Q scores are always above 30.


          • salamay
            Member
            • May 2014
            • 20

            #6
            Thanks lac302. So far I have trimmed the sequences up to bp 16 and worked from there, as you seem to have done, but I can't figure out the cause, or whether it is wasteful to trim off 15 bp of otherwise useful sequence.
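
To put a number on the cost of that trimming, here is a small Python sketch of a fixed 5' crop. The helper names `head_crop` and `fraction_lost` are hypothetical, and the default of 15 bases is simply the figure mentioned in this post, not a recommendation.

```python
def head_crop(seq, qual, n=15):
    """Drop the first n bases and their quality characters from one read."""
    if len(seq) != len(qual):
        raise ValueError("sequence and quality strings differ in length")
    return seq[n:], qual[n:]


def fraction_lost(read_length, n=15):
    """Fraction of sequenced bases discarded by cropping n bases per read."""
    return n / read_length
```

For 100 bp reads, cropping 15 bases leaves 85 bp per read and discards 15% of the sequenced bases, so whether it is wasteful depends mainly on how much coverage there is to spare.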


            • mastal
              Senior Member
              • Mar 2009
              • 666

              #7
              Was the library prep done using a Nextera kit?


              • salamay
                Member
                • May 2014
                • 20

                #8
                I believe so, but I am not sure and have asked those responsible for generating the data. Would using a Nextera kit explain what is seen?


                • mastal
                  Senior Member
                  • Mar 2009
                  • 666

                  #9
                  Yes. There was a recent thread discussing this. I will post a link if I can find it.
