  • Depth higher after subsampling?

    Hi, I've noticed that my median deduplicated depth goes up after subsampling (with samtools view -s) compared with the median depth of the original, unsubsampled data. Has anyone else observed something similar, and could it be an artifact of samtools view -s? I don't think the problem is reusing the same seed across sequential subsamplings, because the original BAM being sampled from had not been subsampled before.
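For readers wondering why seed reuse is worth ruling out at all, here is a toy sketch in Python. It assumes a hash-based keep/drop rule keyed on the seed and the read name. The `keep` and `subsample` helpers and the SHA-256 hash are stand-ins made up for illustration, not samtools' actual implementation.

```python
import hashlib

def keep(read_name: str, seed: int, fraction: float) -> bool:
    """Toy rule: keep a read when a hash of (seed, read name),
    scaled to [0, 1), falls below `fraction`."""
    digest = hashlib.sha256(f"{seed}:{read_name}".encode()).digest()
    value = int.from_bytes(digest[:8], "big") / 2**64
    return value < fraction

def subsample(names, seed, fraction):
    """Return the read names that pass the toy keep rule."""
    return [name for name in names if keep(name, seed, fraction)]

reads = [f"read{i}" for i in range(100_000)]       # hypothetical read names
first = subsample(reads, seed=1, fraction=0.5)      # first round of subsampling
same_seed = subsample(first, seed=1, fraction=0.5)  # second round, seed reused
new_seed = subsample(first, seed=2, fraction=0.5)   # second round, fresh seed

print(len(first) / len(reads))      # close to 0.5
print(len(same_seed) / len(first))  # 1.0: every read already passed this test
print(len(new_seed) / len(first))   # close to 0.5
```

Under this rule, a second pass with the same seed and fraction removes nothing, so more reads are kept than intended. That only matters when the input was already subsampled, which is not the case in this workflow.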

    My workflow is:
    complete non-deduped bam file -> sub-sample or no sub-sampling -> barcode deduplication -> calculate median depth

  • #2
    emham I don't have an answer at the moment, but let me see if I can find something helpful.


    • #3
      Hello emham

      A few things I've been told: subsampling before deduplication will bias your downstream analysis. Subsampling eliminates many low-coverage areas by chance, so you should expect the median to go up.
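One way to see how losing low-coverage areas could raise a median is the toy simulation below. It rests on two loud assumptions that the thread does not confirm: the depth profile is invented (60% of positions at depth 1, 40% at depth 200), and the median is taken only over positions that still have at least one read. Deduplication is left out entirely, so each read stands for one unique molecule.

```python
import random
import statistics

random.seed(0)

# Invented depth profile: many sparsely covered positions, fewer deep ones.
depths = [1] * 6000 + [200] * 4000

def thin(depth, fraction):
    """Keep each read at a position independently with probability `fraction`."""
    return sum(random.random() < fraction for _ in range(depth))

def median_covered(values):
    """Median over positions that still have at least one read."""
    return statistics.median([v for v in values if v > 0])

subsampled = [thin(d, 0.1) for d in depths]

print("all positions:    ", statistics.median(depths), "->", statistics.median(subsampled))
print("covered positions:", median_covered(depths), "->", median_covered(subsampled))
```

In this model the median over all positions falls from 1 to 0, while the median over covered positions rises well above 1, because most depth-1 positions drop out of the calculation. Whether that applies here depends on how the median depth is computed, for example whether zero-coverage positions are reported.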

      Are you using UMIs for deduplication? They should reduce this kind of artifact.

      I was specifically told, "If your analysis is sensitive to depth, consider whether non-UMI deduplication could make it worse instead of better: if your reads aren't uniformly distributed over the whole genome (e.g. you did targeted sequencing or RNA-seq or ATAC-seq or anything else like that), non-UMI deduplication is likely to incorrectly distort high-coverage regions to have lower coverage (pre-PCR duplicate molecules that occur by chance will be lost as false-positive PCR duplicates) and if the reads are really densely concentrated even short UMIs can start to run into problems"
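The point in that quote can be sketched with a small simulation. It is only an illustration under stated assumptions: every molecule is distinct (no PCR duplicates at all), reads are single-end and identified by start position alone, UMIs are uniformly random, and the site counts and the `count_after_dedup` helper are made up for the example.

```python
import random

random.seed(1)

def count_after_dedup(n_molecules, n_start_sites, umi_length=None):
    """Count molecules left after dedup when all inputs are truly distinct.

    Without a UMI the dedup key is the start position only;
    with a UMI the key is (start position, UMI sequence).
    """
    keys = set()
    for _ in range(n_molecules):
        start = random.randrange(n_start_sites)
        if umi_length is None:
            keys.add(start)
        else:
            umi = "".join(random.choice("ACGT") for _ in range(umi_length))
            keys.add((start, umi))
    return len(keys)

n = 5000
sparse = count_after_dedup(n, n_start_sites=3_000_000)  # reads spread widely
dense = count_after_dedup(n, n_start_sites=500)         # reads packed into a small target
dense_umi8 = count_after_dedup(n, n_start_sites=500, umi_length=8)
dense_umi2 = count_after_dedup(n, n_start_sites=500, umi_length=2)

print(sparse, dense, dense_umi8, dense_umi2)
```

With widely spread reads almost nothing is lost. With the same reads packed into 500 start sites, position-only dedup can keep at most 500 of the 5000 molecules. An 8-base UMI recovers nearly all of them in this setup, while a 2-base UMI still loses a noticeable share to chance collisions.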

      Samtools shouldn't be wrong. If you want to understand what's happening, open your BAM file in IGV with duplicates marked but not removed, and take a look at the regions of interest.

      I have two final questions:

      Why are you subsampling? And why/how are you deduplicating?
