Depth higher after subsampling?


  • Depth higher after subsampling?

    Hi, I've been noticing that my median deduplicated depth goes up after subsampling (with samtools view -s) compared to the original median depth. Has anyone else observed something similar, and could it be an artifact of samtools view -s? I don't think it is the known issue of reusing the same seed across sequential rounds of subsampling, because the original BAM being sampled from had not been subsampled before.

    My workflow is:
    complete non-deduped bam file -> sub-sample or no sub-sampling -> barcode deduplication -> calculate median depth
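
For readers following along, here is a minimal sketch of how the subsampling step could be scripted. It relies on the documented behaviour of `samtools view -s`, where the integer part of the argument is the seed and the decimal part is the fraction of templates kept. The function name `subsample_command` and the file names are hypothetical and not taken from the original post.

```python
def subsample_command(in_bam, out_bam, fraction, seed=0):
    """Build a `samtools view -s SEED.FRACTION` command as an argument list.

    Hypothetical helper for illustration; it only builds the command,
    it does not run samtools.
    """
    if not 0.0 < fraction < 1.0:
        raise ValueError("fraction must be strictly between 0 and 1")
    if seed < 0:
        raise ValueError("seed must be a non-negative integer")
    # Keep up to six decimal places of the fraction, e.g. 0.25 -> "25".
    frac_digits = f"{fraction:.6f}".split(".")[1].rstrip("0")
    if not frac_digits:
        raise ValueError("fraction cannot be represented with six decimals")
    # Integer part = random seed, decimal part = fraction of templates kept.
    s_arg = f"{seed}.{frac_digits}"
    return ["samtools", "view", "-b", "-s", s_arg, "-o", out_bam, in_bam]


# Example: keep roughly 25% of read pairs using seed 42.
cmd = subsample_command("full.bam", "sub.bam", 0.25, seed=42)
print(" ".join(cmd))
```

Using a different seed for each round matters mainly when subsampling a file that was itself produced by `-s`, which is the situation the poster rules out.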

  • #2
    emham, I don't have an answer at the moment, but let me see if I can find something helpful.



    • #3
      Hello emham

      A few things I've been told: subsampling before deduplication will bias your downstream analysis. It eliminates many low-coverage areas by chance, so you should expect the median to go up.
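
One way this effect could appear, as a sketch only: if the median is computed over positions that still have at least one read (for example from `samtools depth` output without `-a`, which omits zero-depth positions), then positions that drop to zero after subsampling leave the calculation entirely. The function `median_depth` and the toy depth values below are made up for illustration and are not the poster's data.

```python
from statistics import median


def median_depth(depth_lines, include_zero=True):
    """Median of the depth column in `samtools depth`-style lines.

    Each line is expected to be tab-separated: chrom, pos, depth.
    With include_zero=False, zero-depth positions are ignored, which
    mimics output that omits uncovered positions.
    """
    depths = []
    for line in depth_lines:
        line = line.strip()
        if not line:
            continue
        depth = int(line.split("\t")[2])
        if depth > 0 or include_zero:
            depths.append(depth)
    return median(depths) if depths else 0


def as_lines(depths):
    # Hypothetical helper: format toy depths as samtools-depth-like lines.
    return [f"chr1\t{i + 1}\t{d}" for i, d in enumerate(depths)]


# Toy data: many low-coverage positions and a few high-coverage ones.
full = as_lines([1, 1, 1, 1, 10, 10, 12])
# After subsampling, the low-coverage positions happen to lose all reads.
sub = as_lines([0, 0, 0, 0, 5, 5, 6])

print(median_depth(full))                      # all positions, original
print(median_depth(sub, include_zero=True))    # zeros counted
print(median_depth(sub, include_zero=False))   # zeros dropped
```

In this toy case the median over covered positions rises after subsampling, while the median over all positions falls, so it is worth checking which of the two a given pipeline reports.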

      Are you using UMIs? They should reduce this kind of artifact and are used for deduplication.

      I was specifically told, "If your analysis is sensitive to depth, consider whether non-UMI deduplication could make it worse instead of better: if your reads aren't uniformly distributed over the whole genome (e.g. you did targeted sequencing or RNA-seq or ATAC-seq or anything else like that), non-UMI deduplication is likely to incorrectly distort high-coverage regions to have lower coverage (pre-PCR duplicate molecules that occur by chance will be lost as false-positive PCR duplicates) and if the reads are really densely concentrated even short UMIs can start to run into problems."
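
To get a rough feel for the "duplicates that occur by chance" point in the quote, here is a back-of-the-envelope sketch. It assumes a heavily simplified model that is not from the thread: independent molecules whose start positions are uniformly distributed over a target, with coordinate-only deduplication keeping one read per start position. The function name `expected_unique_starts` is hypothetical.

```python
def expected_unique_starts(n_molecules, n_positions):
    """Expected number of distinct start positions occupied.

    Simplified model: n_molecules independent molecules, each starting
    uniformly at random at one of n_positions sites. Coordinate-only
    deduplication would keep about this many reads, even though every
    molecule in the model is a genuine, distinct molecule.
    """
    if n_positions <= 0:
        raise ValueError("n_positions must be positive")
    p_empty = (1.0 - 1.0 / n_positions) ** n_molecules
    return n_positions * (1.0 - p_empty)


# Dense targeted region: far more molecules than possible start sites.
dense = expected_unique_starts(n_molecules=1000, n_positions=200)
# Sparse genome-wide case: collisions are very rare.
sparse = expected_unique_starts(n_molecules=10, n_positions=1_000_000)

print(f"dense:  ~{dense:.1f} of 1000 molecules survive deduplication")
print(f"sparse: ~{sparse:.4f} of 10 molecules survive deduplication")
```

Under this model the kept read count saturates at the number of start sites in dense regions, which is the distortion of high-coverage regions the quote describes. Paired-end data and UMIs enlarge the effective number of distinguishable "sites", so the loss is smaller in practice.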

      Samtools shouldn't be wrong. If you want to understand what's happening just open up your BAM file, with duplicates marked but not removed, in IGV and take a look at interesting regions.

      I have two final questions:

      Why are you subsampling? And why/how are you deduplicating?
