  • Depth higher after subsampling?

    Hi, I've noticed that my median deduplicated depth goes up after subsampling (with samtools view -s) compared with the median deduplicated depth of the original file. Has anyone else observed something similar, and could it be an artifact of samtools view -s? I don't think the cause is reusing the same seed across sequential subsampling runs, because the original BAM being sampled from had not been subsampled before.

    My workflow is:
    complete non-deduped bam file -> sub-sample or no sub-sampling -> barcode deduplication -> calculate median depth
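For reference, the subsampling step of this workflow could be scripted roughly as follows. This is only a sketch: the file names are hypothetical, and the helper just builds the argument list rather than running anything. It relies on the documented behaviour of `samtools view -s INT.FRAC`, where the integer part is the seed and the fractional part is the fraction of reads to keep.

```python
def samtools_subsample_cmd(in_bam, out_bam, fraction, seed=0):
    """Build a `samtools view -s` command that keeps roughly `fraction` of the reads.

    The -s value is written as SEED.FRACTION, e.g. seed 42 and fraction 0.25
    become "42.25". File names passed in are placeholders chosen by the caller.
    """
    if not 0.0 < fraction < 1.0:
        raise ValueError("fraction must be strictly between 0 and 1")
    # Take the digits after the decimal point, dropping trailing zeros.
    frac_digits = f"{fraction:.6f}".split(".")[1].rstrip("0")
    if not frac_digits:
        raise ValueError("fraction cannot be represented with 6 decimal places")
    return ["samtools", "view", "-b",
            "-s", f"{seed}.{frac_digits}",
            "-o", out_bam, in_bam]


if __name__ == "__main__":
    # Hypothetical file names; pass the list to subprocess.run() to execute it.
    print(" ".join(samtools_subsample_cmd("full.bam", "sub.bam", 0.25, seed=42)))
```

Using a different seed for each subsampling round avoids the repeated-seed problem mentioned above.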

  • #2
    emham I don't have an answer at the moment, but let me see if I can find something helpful.



    • #3
      Hello emham

      Here are a few things I've been told. Subsampling before deduplication will bias your downstream analysis. By chance, it removes all coverage from many low-coverage regions, so you should expect the median to go up.
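A toy simulation can make this effect concrete. Everything in it is an assumption for illustration, not a description of emham's data: a made-up library with 600 sparse positions (one molecule, one read) and 400 dense positions (50 molecules with 10 PCR copies each), perfect molecule-level deduplication, and a median taken either over all positions or only over positions that still have coverage.

```python
import random
from statistics import median


def dedup_depth(reads_per_molecule, keep_fraction, rng):
    """Count molecules at one position that keep at least one read after subsampling."""
    depth = 0
    for n_reads in reads_per_molecule:
        # A molecule survives deduplication if any of its copies is sampled.
        if any(rng.random() < keep_fraction for _ in range(n_reads)):
            depth += 1
    return depth


def simulate(keep_fraction, seed=0):
    """Return (median over all positions, median over covered positions only)."""
    rng = random.Random(seed)
    # Hypothetical library: sparse positions vs. dense, heavily duplicated ones.
    positions = [[1]] * 600 + [[10] * 50] * 400
    depths = [dedup_depth(p, keep_fraction, rng) for p in positions]
    covered = [d for d in depths if d > 0]
    return median(depths), median(covered)


if __name__ == "__main__":
    for frac in (1.0, 0.2):
        all_med, cov_med = simulate(frac)
        print(f"keep={frac}: median(all)={all_med}, median(covered)={cov_med}")
```

In this model the median over all positions cannot rise after subsampling, but the median over covered positions jumps, because most sparse positions vanish while dense, duplicate-rich positions lose few unique molecules. Whether this applies depends on whether the depth tool in use skips zero-coverage positions.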

      Are you using UMIs? They should reduce this kind of artifact and are used for deduplication.

      I was specifically told, "If your analysis is sensitive to depth, consider whether non-UMI deduplication could make it worse instead of better: if your reads aren't uniformly distributed over the whole genome (e.g. you did targeted sequencing or RNA-seq or ATAC-seq or anything else like that), non-UMI deduplication is likely to incorrectly distort high-coverage regions to have lower coverage (pre-PCR duplicate molecules that occur by chance will be lost as false-positive PCR duplicates) and if the reads are really densely concentrated even short UMIs can start to run into problems"
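The collision argument in that quote can be put into rough numbers. The sketch below assumes a simplified model that is not from the thread: molecules fall uniformly at random into a fixed number of possible start sites, and deduplication keeps one read per (start site, UMI) combination, with `n_umis=1` standing in for non-UMI deduplication. Real tools also use strand and mate position, so treat the values as illustrative only.

```python
def expected_unique(n_molecules, n_start_sites, n_umis=1):
    """Expected number of molecules that survive deduplication.

    Assumes each molecule lands uniformly at random in one of
    n_start_sites * n_umis bins and that all molecules sharing a bin
    are collapsed into one (a simplifying assumption).
    """
    bins = n_start_sites * n_umis
    # Expected number of occupied bins after n_molecules random draws.
    return bins * (1.0 - (1.0 - 1.0 / bins) ** n_molecules)


if __name__ == "__main__":
    # Hypothetical dense region: 200 possible start sites.
    for n in (100, 1000, 100000):
        no_umi = expected_unique(n, 200)
        short_umi = expected_unique(n, 200, n_umis=4 ** 4)  # 4 nt UMI
        print(f"{n} molecules: no UMI ~{no_umi:.0f}, 4 nt UMI ~{short_umi:.0f}")
```

Without UMIs the deduplicated depth can never exceed the number of start sites, which is how high-coverage regions get flattened. With a short UMI the ceiling is much higher, but it is still reached once the region is dense enough.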

      I doubt samtools itself is at fault. If you want to understand what's happening, open your BAM file in IGV, with duplicates marked but not removed, and look at a few interesting regions.
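Alongside looking in IGV, a quick tally of the duplicate flag can show how much of the file is marked. The sketch below assumes SAM text as input (for example the output of `samtools view -h`) and relies only on the SAM specification's FLAG bit 0x400, which means "PCR or optical duplicate". The example records are made up.

```python
DUPLICATE_FLAG = 0x400  # SAM FLAG bit: PCR or optical duplicate


def duplicate_fraction(sam_lines):
    """Return the fraction of alignment records with the duplicate flag set."""
    total = 0
    duplicates = 0
    for line in sam_lines:
        if line.startswith("@"):
            continue  # header line
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 2:
            continue  # blank or malformed line
        total += 1
        if int(fields[1]) & DUPLICATE_FLAG:
            duplicates += 1
    return duplicates / total if total else 0.0


if __name__ == "__main__":
    # Made-up records: only the first two columns matter here.
    example = [
        "@HD\tVN:1.6",
        "read1\t0\tchr1\t100",
        "read2\t1024\tchr1\t100",
        "read3\t1040\tchr1\t100",
        "read4\t16\tchr1\t250",
    ]
    print(duplicate_fraction(example))
```

Running this on the full and subsampled files, before duplicates are removed, shows whether the duplicate rate changes with subsampling.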

      I have two final questions:

      Why are you subsampling? And why/how are you deduplicating?
