
  • High duplication levels in FASTQC

    Hi,

    I was using FASTQC to QC my directional mRNA-Seq data obtained from suspension culture cells. I have about 30 million reads.

    Although most of my QC stats are fine, the "Duplicate sequences" section shows a big uptick in sequences with duplication levels > 10 (see below). The reported Sequence Duplication Level is >84.56%.

    I was wondering what could be wrong. There were 2 possibilities I could think of:
    1) Some amplification bias in PCR and/or
    2) Since the RNA is not very diverse (it's from suspension cells, all the same cell type) and was sequenced to high coverage, many sequences got sequenced multiple times.

    Does the second reason make sense? If it is true, then by extension it also means that we have successfully sequenced even the very low abundance transcripts. If it was PCR bias, however, that wouldn't be true. Is there a way to distinguish between these two possibilities?

    I'd appreciate any suggestions.

    Thanks
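
The second possibility in the question above (deep sequencing of a low-diversity library) can be illustrated with a small simulation. This is a minimal sketch under stated assumptions, not FastQC's own duplication calculation: reads are modelled as hypothetical (transcript, start position) pairs, the abundance profile is invented, and no PCR step is simulated at all. The function names `duplicate_fraction` and `simulate_reads` are made up for this example.

```python
import random
from collections import Counter


def duplicate_fraction(reads):
    """Fraction of reads that repeat a read already seen (0.0 for no reads)."""
    if not reads:
        return 0.0
    counts = Counter(reads)
    return 1.0 - len(counts) / len(reads)


def simulate_reads(n_reads, abundances, positions_per_transcript, rng):
    """Draw reads as (transcript, start) pairs with no PCR duplication.

    Transcripts are picked in proportion to their abundance and the
    start position within a transcript is uniform.
    """
    transcripts = list(range(len(abundances)))
    chosen = rng.choices(transcripts, weights=abundances, k=n_reads)
    return [(t, rng.randrange(positions_per_transcript)) for t in chosen]


rng = random.Random(0)
# Assumed, invented expression profile: a few abundant transcripts, many rare ones.
abundances = [1000.0 / (rank + 1) for rank in range(200)]

shallow = simulate_reads(2_000, abundances, 500, rng)
deep = simulate_reads(200_000, abundances, 500, rng)

shallow_dup = duplicate_fraction(shallow)
deep_dup = duplicate_fraction(deep)
print(f"shallow: {shallow_dup:.2%} duplicated, deep: {deep_dup:.2%} duplicated")
```

In this toy model the duplicated fraction rises with depth even though nothing was amplified, which is why a high duplication figure on its own cannot separate the two explanations.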

  • #2
    Have a look at the alignments and you will know. Generally I wouldn't trust FastQC duplication levels for mRNA-Seq too much.



    • #3
      The overall duplication level reported by FastQC needs to be taken in context with the shape of the profile you're seeing and also the results of the overrepresented sequence plot. There's a big difference between having a generally oversequenced sample (which often happens with RNA-Seq so you can see low expressed transcripts), and having a small number of sequences accounting for large chunks of your library.

      What FastQC can't do is put the duplication in any kind of context. For libraries with expected uneven coverage (such as RNA-Seq) you'd need to look at the positions of the mapped data. Even coverage over highly duplicated regions would suggest you simply have really high coverage, whereas duplicated, patchy coverage would indicate a technical problem.

      If you haven't seen it already I wrote up a more detailed explanation of this on my blog since this is such a common thing to come up (the duplicate sequence plot is probably the least intuitive module to interpret in the FastQC output).
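
The check on mapped positions described in the post above can be sketched roughly as follows. This is an illustration only: `start_site_summary` is a hypothetical helper, the input is assumed to be a plain list of read start coordinates for one region (extracting them from an alignment file is not shown), and no pass/fail thresholds are implied.

```python
from collections import Counter


def start_site_summary(start_positions, region_length):
    """Summarise how read start positions are spread across one region.

    start_positions: 0-based start coordinate of every read mapped to the region.
    region_length: number of possible start sites in the region.
    """
    if not start_positions or region_length <= 0:
        raise ValueError("need at least one read and a positive region length")
    stacks = Counter(start_positions)  # reads piled up on each start site
    n_reads = len(start_positions)
    return {
        # Close to 1.0 when reads start almost everywhere in the region.
        "fraction_of_sites_used": len(stacks) / region_length,
        # Close to 1.0 when a single start site holds most of the reads.
        "largest_stack_share": max(stacks.values()) / n_reads,
    }


# Toy inputs: 1000 reads each, over a region with 100 possible start sites.
even_coverage = list(range(100)) * 10        # every site used 10 times
patchy_coverage = [5] * 500 + [40] * 500     # two tall stacks, nothing else

even_summary = start_site_summary(even_coverage, 100)
patchy_summary = start_site_summary(patchy_coverage, 100)
print(even_summary)
print(patchy_summary)
```

Both toy inputs are heavily duplicated, but only the second shows the patchy pattern of a few tall stacks that the post associates with a technical problem.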



      • #4
        Fast QC Duplication

        Hello.

        I read your blog (http://proteo.me.uk/2011/05/interpre...lot-in-fastqc/) and found it helpful. I have the same problem.

        At the end of the blog post you suggest considering the per-base quality plot to get a realistic assessment of the duplication.

        In my case, my per-base sequence quality is great, but I get the same plot as the one posted above. What does this imply?

        If my per-base sequence quality passes and I have high sequence duplication levels caused by an overrepresented sequence (the TruSeq adapter), can I then conclude that the quality is okay?

        Thank you
