SEQanswers
  • Different duplicate results produced by MarkDuplicates and EstimateLibraryComplexity

    Hi:
    I compared Picard's MarkDuplicates.jar and EstimateLibraryComplexity.jar for estimating the optical duplication rate on the same SAM/BAM files. MarkDuplicates always reported an optical duplication rate of 0, whereas EstimateLibraryComplexity.jar reported different numbers. Has anyone else tried this? Which method can be trusted?
    Thanks.
    Jason
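For context, an optical duplicate is a duplicate read whose cluster sits physically close to another copy of the same fragment on the flow cell. Picard identifies these from the tile and x/y coordinates encoded in the read name, using an OPTICAL_DUPLICATE_PIXEL_DISTANCE threshold (default 100). The Python below is a simplified sketch of that idea, not Picard's implementation: it assumes CASAVA 1.8-style read names (instrument:run:flowcell:lane:tile:x:y), and the names `parse_location` and `count_optical_duplicates` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Location:
    """Flow-cell position of one cluster."""
    lane: str
    tile: int
    x: int
    y: int


def parse_location(read_name: str) -> Location:
    """Parse lane, tile and x/y from a read name.

    ASSUMPTION: CASAVA 1.8-style names, instrument:run:flowcell:lane:tile:x:y,
    optionally followed by a space and a comment such as '1:N:0:ATCACG'.
    """
    fields = read_name.split(" ")[0].split(":")
    if len(fields) < 7:
        raise ValueError(f"unexpected read name format: {read_name!r}")
    return Location(fields[3], int(fields[4]), int(fields[5]), int(fields[6]))


def count_optical_duplicates(read_names: list[str], pixel_distance: int = 100) -> int:
    """Count optical duplicates within ONE duplicate set.

    read_names must already share alignment start and orientation. A read counts
    as an optical duplicate if an earlier read in the set lies on the same lane
    and tile with both |dx| and |dy| <= pixel_distance (simplified rule).
    """
    locations = [parse_location(name) for name in read_names]
    optical = 0
    for i, current in enumerate(locations):
        for earlier in locations[:i]:
            if (earlier.lane == current.lane
                    and earlier.tile == current.tile
                    and abs(earlier.x - current.x) <= pixel_distance
                    and abs(earlier.y - current.y) <= pixel_distance):
                optical += 1
                break  # count each read at most once
    return optical


# Usage with made-up read names: the second read sits 50 px / 30 px from the first.
duplicate_set = [
    "M1:1:FC:1:1101:1000:2000",
    "M1:1:FC:1:1101:1050:2030",
    "M1:1:FC:1:1102:1000:2000",  # different tile
    "M1:1:FC:1:1101:5000:9000",  # same tile, far away
]
print(count_optical_duplicates(duplicate_set))  # 1
```

Each call handles a single duplicate set; summing over all sets gives the optical duplicate count for a library.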

  • #2
    Hi,

    The two tools use different algorithms for estimating library complexity. We downsampled files that we had sequenced to saturation, and neither estimate was trustworthy: one overestimated the complexity by a factor of 10 and the other underestimated it by a factor of 10. I can't remember off the top of my head which was which.

    It's interesting, though: the equation is solved mathematically exactly, yet empirically it works extremely poorly.
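For reference, Picard's DuplicationMetrics documents ESTIMATED_LIBRARY_SIZE as based on the Lander-Waterman equation: with N read pairs, of which C are distinct, the number of distinct molecules X satisfies C/X = 1 - exp(-N/X). There is no closed-form solution, so it has to be found numerically. The sketch below uses plain bisection; the function names are hypothetical and this is not Picard's code. The equation assumes every molecule is equally likely to be sequenced, so an exact numerical solution can still be far off for a real library, which is one plausible reading of the discrepancies described above.

```python
import math


def lander_waterman_gap(x: float, c: float, n: float) -> float:
    """f(X) = C/X - 1 + exp(-N/X), written with expm1 to limit cancellation.

    f is zero exactly when C = X * (1 - exp(-N/X)).
    """
    return c / x + math.expm1(-n / x)


def estimate_library_size(read_pairs: int, unique_read_pairs: int, iterations: int = 100) -> float:
    """Estimate the number of distinct molecules X by bisection.

    read_pairs = N (all read pairs), unique_read_pairs = C (distinct pairs).
    f is positive at X = C and negative for large X, with a single root in between.
    """
    n, c = float(read_pairs), float(unique_read_pairs)
    if not 0 < c < n:
        raise ValueError("need 0 < unique_read_pairs < read_pairs (i.e. some duplicates)")
    lo, hi = c, 100.0 * c  # the library holds at least C distinct molecules
    while lander_waterman_gap(hi, c, n) >= 0:
        hi *= 10.0  # widen until the root is bracketed
        if math.isinf(hi):
            raise ArithmeticError("could not bracket the root")
    for _ in range(iterations):
        mid = (lo + hi) / 2.0
        if lander_waterman_gap(mid, c, n) > 0:
            lo = mid  # still below the root
        else:
            hi = mid
    return (lo + hi) / 2.0


# Usage: 500,000 read pairs, 393,469 of them distinct (made-up numbers).
print(round(estimate_library_size(500_000, 393_469)))
```

With those inputs the estimate comes out at roughly one million molecules.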

    • #3
      Thanks for the comment. I found that in the version of Picard I was using (1.48), MarkDuplicates did not seem to calculate optical duplicates at all (always 0), and in the EstimateLibraryComplexity results READ_PAIR_DUPLICATES always, incorrectly, had the same value as READ_PAIR_OPTICAL_DUPLICATES. I tried the newest version, 1.86, and now the results seem more believable. EstimateLibraryComplexity reported more duplicates (about 10% more), but the ratio READ_PAIR_OPTICAL_DUPLICATES/READ_PAIR_DUPLICATES was quite close: in my sample it was 58% with EstimateLibraryComplexity and 56% with MarkDuplicates.
      Jason
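To repeat this comparison, the ratio can be read directly from the metrics files the tools write. The sketch below assumes the usual Picard metrics layout (a `## METRICS CLASS` line, then a tab-separated header and one row per library, ending at a blank line); `read_duplication_metrics` and `optical_fraction` are hypothetical helper names, and the file name in the usage comment is made up.

```python
def read_duplication_metrics(text: str) -> list[dict[str, str]]:
    """Return the rows of the first '## METRICS CLASS' table in a Picard metrics file.

    ASSUMED layout: the marker line, then a tab-separated header line, then one
    row per library until a blank line or the next '#' line.
    """
    lines = text.splitlines()
    for i, line in enumerate(lines):
        if line.startswith("## METRICS CLASS") and i + 1 < len(lines):
            header = lines[i + 1].split("\t")
            rows = []
            for row in lines[i + 2:]:
                if not row.strip() or row.startswith("#"):
                    break
                rows.append(dict(zip(header, row.split("\t"))))
            return rows
    raise ValueError("no '## METRICS CLASS' table found")


def optical_fraction(row: dict[str, str]) -> float:
    """READ_PAIR_OPTICAL_DUPLICATES / READ_PAIR_DUPLICATES for one library row."""
    duplicates = int(row["READ_PAIR_DUPLICATES"])
    optical = int(row["READ_PAIR_OPTICAL_DUPLICATES"])
    if optical > duplicates:
        raise ValueError("optical duplicates exceed total duplicates; check the input")
    return optical / duplicates if duplicates else 0.0


# Usage (hypothetical file name):
# with open("sample.duplicate_metrics.txt") as handle:
#     for row in read_duplication_metrics(handle.read()):
#         print(row["LIBRARY"], round(optical_fraction(row), 3))
```

For the sample described above, this would give about 0.58 for EstimateLibraryComplexity and 0.56 for MarkDuplicates; a fraction of exactly 1.0 for every library would match the 1.48 behavior, where the two columns were identical.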
