  • FASTQC trends

    I have some metagenomic data from whole-genome shotgun sequencing on an Illumina HiSeq. The reads are 100 bp paired-end, and when I examine them in FastQC I see a few things. First, the per base sequence content and per base GC content are strongly skewed at the beginning of the reads (roughly bp 1-16), and the per base N content has a spike at bp 4. I also have overrepresented k-mers at the beginning of the reads that do not match any adapter (as far as I can tell). I know these trends are sometimes seen in RNA-seq data because of (not so) random hexamer priming, but I am confused as to why I see them in whole-genome data. I am also unsure about the N spike at bp 4. I have attached images of what I described and would appreciate any insight.

    Thanks.
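The per-position plots described above can be roughly reproduced outside FastQC, which makes it easy to check a particular file or subset of reads. The Python sketch below is a minimal, assumed reimplementation (not FastQC's own code): it assumes an uncompressed FASTQ with four lines per record and reports the fraction of A, C, G, T and N at each read position; per-position GC is then simply G + C.

```python
import io
from collections import Counter
from typing import Dict, Iterable, Iterator, List, TextIO, Tuple


def read_fastq(handle: TextIO) -> Iterator[Tuple[str, str, str]]:
    """Yield (header, sequence, quality) from an uncompressed 4-line FASTQ."""
    while True:
        header = handle.readline().rstrip("\n")
        if not header:
            return
        seq = handle.readline().rstrip("\n")
        handle.readline()  # '+' separator line, ignored
        qual = handle.readline().rstrip("\n")
        yield header, seq, qual


def per_position_composition(seqs: Iterable[str]) -> List[Dict[str, float]]:
    """Fraction of A, C, G, T and N at each read position (index 0 = base 1)."""
    counts: List[Counter] = []
    for seq in seqs:
        for i, base in enumerate(seq.upper()):
            if i == len(counts):
                counts.append(Counter())
            counts[i][base] += 1
    result = []
    for c in counts:
        total = sum(c.values())
        result.append({b: c[b] / total for b in "ACGTN"})
    return result


if __name__ == "__main__":
    # Tiny in-memory example; replace with open("your_reads.fastq") in practice.
    demo = io.StringIO("@r1\nACGTN\n+\nIIIII\n@r2\nACGAA\n+\nIIIII\n")
    comp = per_position_composition(s for _, s, _ in read_fastq(demo))
    for pos, frac in enumerate(comp, start=1):
        gc = frac["G"] + frac["C"]
        print(pos, {b: round(v, 2) for b, v in frac.items()}, "GC", round(gc, 2))
```

Plotting each base's fraction against position should show the same early-cycle skew and the N spike if they are present in the file.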

  • #2
    I'm assuming these were sequenced on a HiSeq? The spike at cycle 4 is most likely a phenomenon known as Bottom Middle Swath (or BMS in Illumispeak). Before scanning, the HiSeq finds focus at a fixed point near the inlet port. If a bubble is present at that point, focusing fails and that whole swath is scanned out of focus. You should be able to confirm this by looking at the thumbnail images for cycle 4. Bases can't be called from these images, so every cluster in the affected swath is given an N at this position.
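One way to test the BMS explanation from the reads alone is to check whether the cycle-4 Ns come from a specific group of tiles. This sketch assumes Casava 1.8+ style read headers (tile is the fifth colon-separated field) and HiSeq 2000/2500 four-digit tile naming (surface, swath, tile number, so bottom-middle-swath tiles would look like 22xx). Both are assumptions to verify against your own headers, and the helper names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def tile_from_header(header: str) -> str:
    """Tile field from an assumed Casava 1.8+ header:
    @instrument:run:flowcell:lane:tile:x:y read:filtered:control:index"""
    return header.lstrip("@").split()[0].split(":")[4]


def n_fraction_by_tile(records: Iterable[Tuple[str, str]], cycle: int) -> Dict[str, float]:
    """Fraction of reads in each tile that carry an N at a 1-based cycle."""
    n_counts: Dict[str, int] = defaultdict(int)
    totals: Dict[str, int] = defaultdict(int)
    for header, seq in records:
        if len(seq) < cycle:
            continue  # read too short to have this cycle
        tile = tile_from_header(header)
        totals[tile] += 1
        if seq[cycle - 1] in "Nn":
            n_counts[tile] += 1
    return {tile: n_counts[tile] / totals[tile] for tile in totals}


def is_bottom_middle_swath(tile: str) -> bool:
    """Assumes 4-digit HiSeq 2000/2500 tiles: surface (1 top, 2 bottom),
    swath (1-3), then a 2-digit tile number, e.g. 2205."""
    return len(tile) == 4 and tile[0] == "2" and tile[1] == "2"
```

If the N fraction at cycle 4 is high only in tiles for which `is_bottom_middle_swath` returns True and near zero elsewhere, that would be consistent with a focus failure in that swath rather than a library effect.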



    • #3
      Thanks TonyBrooks, yes, it was on a HiSeq. I had not heard about this issue before; thanks for bringing it to my attention.



      • #4
        See here for more info

        Bridged amplification & clustering followed by sequencing by synthesis. (Genome Analyzer / HiSeq / MiSeq)



        • #5
          I've seen the same fluctuation in GC content over the first 20 or so bases in samples run on both the HiSeq and the MiSeq. I typically have enough coverage to just trim those bases off, even though their Q scores are always above 30.
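Trimming a fixed number of leading bases, as described above, has to remove the same characters from both the sequence and the quality line. A minimal sketch for an uncompressed four-line FASTQ follows; in practice established tools do this (for example cutadapt's `-u` option or Trimmomatic's `HEADCROP` step). The function name is hypothetical.

```python
import io
from typing import TextIO


def head_crop_fastq(src: TextIO, dst: TextIO, n: int) -> int:
    """Remove the first n bases, and the matching quality characters, from
    every record of a 4-line FASTQ. Returns the number of records written."""
    written = 0
    while True:
        header = src.readline()
        if not header:
            break
        seq = src.readline().rstrip("\n")
        plus = src.readline()
        qual = src.readline().rstrip("\n")
        dst.write(header)
        dst.write(seq[n:] + "\n")   # cropped bases
        dst.write(plus)
        dst.write(qual[n:] + "\n")  # cropped qualities stay aligned with bases
        written += 1
    return written
```

Because no reads are discarded, applying the same `n` to both mate files of a paired-end run keeps the R1 and R2 records in step.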



          • #6
            Thanks lac302. So far I have trimmed the reads up to bp 16 and worked from there, as you seem to have done, but I can't figure out the cause, or whether it is wasteful to discard 15 bp of useful sequence.
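Whether trimming 15 bp is wasteful is partly arithmetic: on 100 bp reads it discards 15% of the sequenced bases. A data-driven alternative to a fixed cut is to trim only up to where composition settles. The sketch below is one hypothetical heuristic, not a standard method: given per-position base fractions (a list of dicts with keys A, C, G, T), it uses the mean composition of later positions as a baseline and returns the smallest number of leading bases to remove so that every remaining position lies within `tol` of that baseline. The defaults `tol=0.02` and `baseline_start=30` are arbitrary assumptions.

```python
from typing import Dict, Sequence


def suggest_head_trim(comp: Sequence[Dict[str, float]], tol: float = 0.02,
                      baseline_start: int = 30) -> int:
    """Number of leading bases to trim so that every remaining position's
    A/C/G/T fractions are within `tol` of the baseline composition
    (mean over 0-based positions >= baseline_start)."""
    tail = comp[baseline_start:]
    if not tail:
        raise ValueError("reads are shorter than baseline_start")
    baseline = {b: sum(p[b] for p in tail) / len(tail) for b in "ACGT"}
    trim = len(comp)
    # Walk backwards from the 3' end; stop at the first position that deviates.
    for i in range(len(comp) - 1, -1, -1):
        deviation = max(abs(comp[i][b] - baseline[b]) for b in "ACGT")
        if deviation > tol:
            break
        trim = i
    return trim
```

Metagenomic samples need not have uniform composition along the read, which is why the baseline is taken from the data itself rather than assumed to be 25% per base.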



            • #7
              Was the library prep done using a Nextera kit?



              • #8
                I believe so, but I am not sure and have asked the people who generated the data. Would using a Nextera kit explain what we see?



                • #9
                  Yes. There was a recent thread discussing this. I will post a link if I can find it.
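A commonly cited explanation for Nextera libraries is that the transposase used for fragmentation has a sequence preference at its insertion sites, so the first bases of each read are not a random sample of the genome. The sketch below, loosely in the spirit of FastQC's Kmer Content module but not its algorithm, ranks k-mers by how much more frequent they are in the first `window` positions than later in the read. The function name and the parameters `k`, `window` and `min_count` are illustrative assumptions.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def start_enriched_kmers(seqs: Iterable[str], k: int = 5, window: int = 10,
                         min_count: int = 2) -> List[Tuple[str, float]]:
    """Rank k-mers by their relative frequency in the first `window` start
    positions versus the rest of the read (highest enrichment first)."""
    head: Counter = Counter()
    rest: Counter = Counter()
    for seq in seqs:
        seq = seq.upper()
        for i in range(len(seq) - k + 1):
            kmer = seq[i:i + k]
            if "N" in kmer:
                continue  # skip k-mers containing uncalled bases
            if i < window:
                head[kmer] += 1
            else:
                rest[kmer] += 1
    head_total = sum(head.values()) or 1
    rest_total = sum(rest.values()) or 1
    scores = []
    for kmer, n in head.items():
        if n < min_count:
            continue
        # Pseudocount of 1 avoids division by zero for k-mers absent later on.
        ratio = (n / head_total) / ((rest[kmer] + 1) / rest_total)
        scores.append((kmer, ratio))
    return sorted(scores, key=lambda item: item[1], reverse=True)
```

If the top-ranked k-mers match the overrepresented sequences FastQC reports at the read starts, that points to a positional library-prep bias rather than adapter contamination.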
