
  • Sequence Duplication

    I have some whole-genome FASTQ files, one for read 1 and one for read 2. Before analysing them, I checked their quality with FastQC, and surprisingly some of the samples show a very high level of duplication (around 90%). What might be the reason for this? Can these samples still be processed for analysis, or should I just discard them?

    Any help will be appreciated!
    Thanks,

  • #2
    There seem to be no answers to my query!
    Thanks,


    • #3
      Contamination, or a really small genome sequenced at high depth... There are a number of possibilities; you would probably get more specific suggestions if you provided more details. Whether your data are usable will likely depend on the nature of the samples and what you intend to do with the data.
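
To make the "small genome, high depth" case concrete: mean coverage is roughly the total number of sequenced bases divided by the genome size, and at very high coverage many reads are bound to be identical. A minimal sketch, with purely hypothetical read counts and genome sizes chosen for illustration:

```python
def mean_coverage(n_reads: int, read_length: int, genome_size: int) -> float:
    """Approximate mean depth: total sequenced bases / genome size."""
    return n_reads * read_length / genome_size


# Hypothetical run: 600 million reads of 150 bp (illustrative numbers only).
human = mean_coverage(n_reads=600_000_000, read_length=150, genome_size=3_100_000_000)
small = mean_coverage(n_reads=600_000_000, read_length=150, genome_size=5_000_000)

print(f"~3.1 Gb genome: {human:.0f}x")
print(f"5 Mb genome:    {small:.0f}x")
```

With the same run, the small genome ends up at thousands-fold coverage, so a high duplication level there would not by itself indicate a library problem.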


      • #4
        Here is a nice blog post about interpreting the duplication plot of FastQC.


        • #5
          Thanks dpryan,
          I have around 100 cancer samples with an equal number of controls, for which we are doing WGS. So far we have completed around 16 samples, but when I started analysing them I found a high level of duplication, and in some samples the base quality is also poor.
          Thanks,


          • #6
            Can anyone throw more light on this?
            Thanks,


            • #7
              Can I use the MarkDuplicates command of Picard to remove the duplicate sequences?
              Thanks,


              • #8
                Yes, you can.


                • #9
                  Will it remove all the duplicate sequences from the fastq file?
                  Thanks,


                  • #10
                    It will if you supply REMOVE_DUPLICATES=true (http://picard.sourceforge.net/comman...MarkDuplicates). Otherwise it will just flag them as duplicates in the output file.
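
For reference, a rough sketch of what such a call could look like when driven from Python. Note that MarkDuplicates works on aligned SAM/BAM files rather than on FASTQ, so the reads have to be mapped and sorted first. The file names and the `picard.jar` path below are placeholders, and the `I=`/`O=`/`M=` arguments follow the classic Picard command-line style:

```python
import shlex


def mark_duplicates_cmd(in_bam, out_bam, metrics, remove=False, picard_jar="picard.jar"):
    """Build a Picard MarkDuplicates command line as a list of arguments.

    All paths are placeholders; adjust them to your own setup.
    """
    cmd = [
        "java", "-jar", picard_jar, "MarkDuplicates",
        f"I={in_bam}",    # aligned, sorted BAM (not FASTQ)
        f"O={out_bam}",   # output BAM
        f"M={metrics}",   # duplication metrics report
    ]
    if remove:
        # Without this option duplicates are only flagged, not dropped.
        cmd.append("REMOVE_DUPLICATES=true")
    return cmd


cmd = mark_duplicates_cmd(
    "sample.sorted.bam", "sample.dedup.bam", "sample.dup_metrics.txt", remove=True
)
print(shlex.join(cmd))
# To actually run it: import subprocess; subprocess.run(cmd, check=True)
```

Leaving `remove=False` keeps every read in the output and only sets the duplicate flag, which most downstream tools can be told to respect.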


                    • #11
                      I have a few samples with over 80% duplication (detected by FastQC). Will Picard work for these samples?
                      Thanks,


                      • #12
                        I don't see why it wouldn't. By the way, the duplication levels reported by FastQC usually seem to be higher than what Picard actually detects.
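
One reason the two numbers can differ is that FastQC compares read sequences, whereas Picard compares where reads (or read pairs) align. A minimal sketch of a sequence-only estimate is below; it is not FastQC's actual algorithm, it assumes plain four-line FASTQ records, and the function names are made up for illustration:

```python
import gzip
from collections import Counter


def read_sequences(path):
    """Yield the sequence line of each record in a 4-line-per-record FASTQ file."""
    opener = gzip.open if str(path).endswith(".gz") else open
    with opener(path, "rt") as handle:
        for i, line in enumerate(handle):
            if i % 4 == 1:  # second line of every record holds the bases
                yield line.strip()


def duplication_percent(sequences):
    """Percentage of reads that are exact copies of an earlier read."""
    counts = Counter(sequences)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return 100.0 * (total - len(counts)) / total


# Toy data: 5 reads, 3 distinct sequences.
demo = ["ACGT", "ACGT", "ACGT", "TTGA", "GGCC"]
print(duplication_percent(demo))  # 40.0
```

On a real file this would be called as `duplication_percent(read_sequences("reads_1.fastq.gz"))` (hypothetical file name); holding every sequence in memory is only practical for a subsample of a whole-genome run.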
