  • tahamasoodi
    Success
    • May 2012
    • 130

    Sequence Duplication

    I have some whole-genome FASTQ files, one for read 1 and one for read 2. Before analysing them, I checked their quality with FastQC, and surprisingly some of the samples show a very high level of duplication (around 90%). What might be the reason for this? Can these samples still be processed for analysis, or should I just discard them?

    Any help will be appreciated!
    Thanks,
  • tahamasoodi
    Success
    • May 2012
    • 130

    #2
    There seem to be no answers to my query!
    Thanks,


    • dpryan
      Devon Ryan
      • Jul 2011
      • 3478

      #3
      Contamination, a really small genome with a high sequencing depth... There are a number of possibilities; you would probably get more suggestions if you provided more details. Whether your data is useless or not will likely depend on the nature of the samples and what you intend to do with the data.
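
To put a rough number on the "small genome, high depth" case: if reads started at uniformly random positions, the expected share of reads that repeat an already-seen start position follows from a simple occupancy argument. The sketch below is only that idealised model. The function name and the example figures are made up for illustration, and it ignores strand, insert size, mappability and PCR bias, so treat it as an order-of-magnitude guide rather than a prediction for real libraries.

```python
import math


def expected_duplicate_fraction(n_reads, n_positions):
    """Expected fraction of reads that duplicate an earlier start position.

    Idealised model (an assumption, not a description of any tool):
    each read picks one of `n_positions` start sites uniformly at random.
    """
    if n_reads <= 0 or n_positions <= 0:
        raise ValueError("n_reads and n_positions must be positive")
    # Expected number of distinct start sites hit: G * (1 - exp(-N / G)).
    # -expm1(-x) equals 1 - exp(-x) and stays accurate for very small x.
    distinct = n_positions * -math.expm1(-n_reads / n_positions)
    return 1.0 - distinct / n_reads


# Hypothetical small genome: 5 million start sites, 50 million reads.
small_genome = expected_duplicate_fraction(50_000_000, 5_000_000)

# Hypothetical sparse case: very few reads relative to the start sites.
sparse = expected_duplicate_fraction(1_000, 1_000_000_000)

print(f"small genome, deep sequencing: {small_genome:.3f}")
print(f"sparse sampling: {sparse:.2e}")
```

With ten times more reads than possible start sites, the model lands at roughly 90% duplicates purely by chance, which is why depth relative to genome size is worth checking before blaming the library.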


      • fkrueger
        Senior Member
        • Sep 2009
        • 627

        #4
        Here is a nice blog post about interpreting the duplication plot of FastQC.


        • tahamasoodi
          Success
          • May 2012
          • 130

          #5
          Thanks dpryan,
          I have around 100 cancer samples with an equal number of controls, for which we are doing WGS. So far we have completed around 16 samples, but when I started analysing them I found a high level of duplication, and in some samples the base quality is also poor.
          Thanks,


          • tahamasoodi
            Success
            • May 2012
            • 130

            #6
            Can anyone throw more light on this?
            Thanks,


            • tahamasoodi
              Success
              • May 2012
              • 130

              #7
              Can I use Picard's MarkDuplicates command to remove the duplicate sequences?
              Thanks,


              • kopi-o
                Senior Member
                • Feb 2008
                • 319

                #8
                Yes, you can.


                • tahamasoodi
                  Success
                  • May 2012
                  • 130

                  #9
                  Will it remove all the duplicate sequences from the fastq file?
                  Thanks,


                  • kopi-o
                    Senior Member
                    • Feb 2008
                    • 319

                    #10
                    It will if you supply REMOVE_DUPLICATES=true (http://picard.sourceforge.net/comman...MarkDuplicates). Otherwise it will just flag them as duplicates in the output file.
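
For reference, a call along these lines might look like the sketch below. Note that MarkDuplicates works on aligned SAM/BAM input (typically coordinate-sorted), not on the FASTQ files directly, so the reads have to be mapped first. The file names and the `picard.jar` path are placeholders, and the `java -jar picard.jar MarkDuplicates` form is an assumption about the installed Picard version (older releases shipped one jar per tool). The snippet only builds and prints the command; it does not run it.

```python
import shlex


def build_markduplicates_cmd(in_bam, out_bam, metrics_file,
                             remove_duplicates=True,
                             picard_jar="picard.jar"):
    """Build a Picard MarkDuplicates command line as a list of arguments.

    `picard_jar` and all file names are placeholders for illustration.
    """
    return [
        "java", "-jar", picard_jar, "MarkDuplicates",
        f"INPUT={in_bam}",              # aligned SAM/BAM, not FASTQ
        f"OUTPUT={out_bam}",
        f"METRICS_FILE={metrics_file}",  # duplication metrics report
        # true: drop duplicates from the output; false: only flag them
        f"REMOVE_DUPLICATES={'true' if remove_duplicates else 'false'}",
    ]


cmd = build_markduplicates_cmd(
    "sample.sorted.bam", "sample.dedup.bam", "sample.dup_metrics.txt"
)
print(shlex.join(cmd))
# To actually run it: subprocess.run(cmd, check=True)
```

Leaving `remove_duplicates=False` keeps every read in the output and only sets the duplicate flag, which most downstream callers already respect.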


                    • tahamasoodi
                      Success
                      • May 2012
                      • 130

                      #11
                      I have a few samples with over 80% duplication (as reported by FastQC). Will Picard work for these samples?
                      Thanks,


                      • kopi-o
                        Senior Member
                        • Feb 2008
                        • 319

                        #12
                        I don't see why it wouldn't. By the way, it seems the numbers you get from FastQC usually overstate the duplication you detect with Picard.
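
One likely reason for the gap, as far as I understand the two tools: FastQC looks at each FASTQ file on its own and compares read sequences, while Picard judges duplicates on the aligned read pair as a whole. The toy example below is not either tool's actual algorithm (Picard works from mapping positions, not raw sequences, and FastQC samples and truncates reads). It only shows how counting one mate in isolation can report more duplication than counting pairs. The sequences are invented.

```python
def duplicate_fraction(keys):
    """Fraction of items that repeat an earlier item with the same key."""
    keys = list(keys)
    if not keys:
        return 0.0
    return 1.0 - len(set(keys)) / len(keys)


# Invented (read 1, read 2) pairs: read 1 is identical throughout,
# but read 2 differs, so most pairs are distinct fragments.
pairs = [
    ("ACGT", "TTTT"),
    ("ACGT", "GGGG"),
    ("ACGT", "CCCC"),
    ("ACGT", "TTTT"),  # the only true repeat of a whole pair
]

# Per-file view: look at read 1 alone, ignoring its mate.
read1_only = duplicate_fraction(r1 for r1, _ in pairs)

# Pair-level view: both mates must match to count as a duplicate.
pair_level = duplicate_fraction(pairs)

print(f"read 1 alone: {read1_only:.2f}")
print(f"whole pairs:  {pair_level:.2f}")
```

Here read 1 alone looks 75% duplicated while only 25% of the pairs are repeats, so a high FastQC figure on one file is a prompt to check the pair-level number after alignment rather than a final verdict.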
