  • bwa mem - low properly paired percentage

    After aligning paired-end 100 bp reads to a reference genome, I am getting a very low properly paired percentage:

    369208441 0 total (QC-passed reads + QC-failed reads)
    8985531 0 secondary
    289733341 0 mapped
    78.47% N/A mapped %
    360222910 0 paired in sequencing
    180111455 0 read1
    180111455 0 read2
    1393338 0 properly paired
    0.39% N/A properly paired %
    280747810 0 with itself and mate mapped
    0 0 singletons
    0.00% N/A singletons %
    39590468 0 with mate mapped to a different chr
    0 0 with mate mapped to a different chr (mapQ>=5)
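
To make these numbers concrete, here is a small Python worked example that recomputes flagstat's percentages from the counts above. It assumes, based on our reading of the samtools source rather than anything stated in this post, that "mapped %" is divided by all records (secondary alignments included) while "properly paired %" is divided by "paired in sequencing". It also computes two ratios flagstat does not print, under the assumptions noted in the comments.

```python
# Counts copied from the flagstat output above (QC-passed column).
total = 369_208_441
secondary = 8_985_531
mapped = 289_733_341
paired_in_sequencing = 360_222_910
properly_paired = 1_393_338
both_mates_mapped = 280_747_810  # "with itself and mate mapped"
mate_diff_chr = 39_590_468       # "with mate mapped to a different chr"


def pct(numerator: int, denominator: int) -> float:
    """Percentage rounded to two decimals, as flagstat prints it."""
    return round(100.0 * numerator / denominator, 2)


# Assumed flagstat denominators: "mapped %" uses all records (secondary
# included), "properly paired %" uses "paired in sequencing".
mapped_pct = pct(mapped, total)
proper_pct = pct(properly_paired, paired_in_sequencing)

# Ratios flagstat does not print directly.
primary_records = total - secondary      # equals paired_in_sequencing here
primary_mapped = mapped - secondary      # assumes every secondary record is mapped
both_mapped_pct = pct(both_mates_mapped, paired_in_sequencing)
# Assumes properly paired records are a subset of the same-chromosome,
# both-mates-mapped records.
same_chr_not_proper = both_mates_mapped - mate_diff_chr - properly_paired

print(f"mapped: {mapped_pct}%, properly paired: {proper_pct}%, "
      f"both mates mapped: {both_mapped_pct}%")
print(f"primary mapped: {primary_mapped:,}, "
      f"same chr but not proper: {same_chr_not_proper:,}")
```

With these counts the script reports 78.47% mapped, 0.39% properly paired and 77.94% of paired records with both mates mapped. Every primary mapped record also has its mate mapped (0 singletons), and roughly 240 million records have both mates on the same chromosome without being flagged as properly paired, so the low percentage is not explained by unmapped mates.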

    I followed the GATK Best Practices for aligning paired-end short-read data to a reference genome. I downloaded the reads from NCBI SRA as FASTQ files with the SRA Toolkit's fastq-dump, converted the FASTQ files to an unmapped BAM with Picard FastqToSam, and marked adapters with Picard MarkIlluminaAdapters. I then piped Picard SamToFastq into bwa mem and the bwa mem output into Picard MergeBamAlignment, and ran samtools flagstat on the result to get alignment statistics.

    For several of my samples the alignment went well (90% mapped, 80% properly paired). For a couple of samples, however, the properly paired percentage is well below 1%. How can a normal fraction of reads map (~78%) while only 0.39% of the paired reads are properly paired?
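
As far as we understand samtools' implementation, flagstat does not judge pairing itself: "properly paired" simply counts primary, mapped records whose 0x2 FLAG bit was set upstream (by the aligner or a later tool that rewrites flags), while "mapped" counts every record without the 0x4 bit. The Python sketch below is a simplified re-implementation of those counters written from our reading of the samtools source, not copied from it; the `Record` class and `tally` function are hypothetical, and QC-failed reads and duplicates are ignored. The toy example shows a pair that is mapped on the same chromosome yet not counted as properly paired.

```python
from dataclasses import dataclass

# SAM FLAG bits (from the SAM specification).
PAIRED = 0x1
PROPER_PAIR = 0x2
UNMAPPED = 0x4
MATE_UNMAPPED = 0x8
READ1 = 0x40
READ2 = 0x80
SECONDARY = 0x100
SUPPLEMENTARY = 0x800


@dataclass
class Record:
    """Hypothetical minimal view of one alignment record."""
    flag: int
    ref_id: int       # reference (chromosome) index of this read
    mate_ref_id: int  # reference index of the mate
    mapq: int


def tally(records):
    """Simplified flagstat-style counters (QC-fail split and duplicates ignored)."""
    c = dict.fromkeys(
        ["total", "secondary", "supplementary", "mapped", "paired", "read1",
         "read2", "proper", "both_mapped", "singletons", "diff_chr",
         "diff_chr_mapq5"], 0)
    for r in records:
        c["total"] += 1
        if r.flag & SECONDARY:
            c["secondary"] += 1
        elif r.flag & SUPPLEMENTARY:
            c["supplementary"] += 1
        elif r.flag & PAIRED:
            # Only primary records reach the pairing counters.
            c["paired"] += 1
            if r.flag & READ1:
                c["read1"] += 1
            if r.flag & READ2:
                c["read2"] += 1
            # "Properly paired" only reports the 0x2 bit set upstream.
            if r.flag & PROPER_PAIR and not r.flag & UNMAPPED:
                c["proper"] += 1
            if r.flag & MATE_UNMAPPED and not r.flag & UNMAPPED:
                c["singletons"] += 1
            if not r.flag & (UNMAPPED | MATE_UNMAPPED):
                c["both_mapped"] += 1
                if r.ref_id != r.mate_ref_id:
                    c["diff_chr"] += 1
                    if r.mapq >= 5:
                        c["diff_chr_mapq5"] += 1
        # "mapped" counts every mapped record, secondary ones included.
        if not r.flag & UNMAPPED:
            c["mapped"] += 1
    return c


# Toy data: both mates mapped to chromosome 0 without the 0x2 bit,
# plus one mapped secondary alignment.
records = [
    Record(PAIRED | READ1, 0, 0, 60),
    Record(PAIRED | READ2, 0, 0, 60),
    Record(SECONDARY | PAIRED | READ1, 1, 0, 0),
]
print(tally(records))
```

Because "mapped" and "properly paired" depend on different FLAG bits, a sample can have a normal mapping rate and almost no properly paired reads at the same time.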

    I have double-checked that the FASTQ files from fastq-dump have identical read counts and that the reads are properly interleaved after Picard FastqToSam.
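
Matching read counts alone do not prove that the two files are in the same order. A stricter check is to confirm that the record at position N in the read2 file has the same name as the record at position N in the read1 file. The Python sketch below does this for two uncompressed FASTQ files; it assumes 4-line records and that mate names differ at most by a trailing /1 or /2, which may not hold for every fastq-dump setting. The function names and toy read names are hypothetical.

```python
import io
import itertools
from typing import Iterator, Optional, TextIO, Tuple


def fastq_names(handle: TextIO) -> Iterator[str]:
    """Yield the read name of each record in a plain 4-line FASTQ stream."""
    for i, line in enumerate(handle):
        if i % 4 == 0:
            if not line.startswith("@"):
                raise ValueError(f"line {i + 1} is not a FASTQ header: {line!r}")
            name = line[1:].split()[0]
            # Drop an old-style /1 or /2 mate suffix if present.
            if name.endswith(("/1", "/2")):
                name = name[:-2]
            yield name


def first_pair_mismatch(r1: TextIO, r2: TextIO) -> Optional[Tuple[int, Optional[str], Optional[str]]]:
    """Return None if the two files list the same names in the same order,
    else (record index, name in r1, name in r2) for the first mismatch.
    A missing record (unequal counts) shows up as None on one side."""
    pairs = itertools.zip_longest(fastq_names(r1), fastq_names(r2))
    for index, (n1, n2) in enumerate(pairs):
        if n1 != n2:
            return index, n1, n2
    return None


# Toy data: the second records are out of order between the two files.
r1 = io.StringIO("@SRR0.1 1 length=4\nACGT\n+\nIIII\n@SRR0.2 2 length=4\nACGT\n+\nIIII\n")
r2 = io.StringIO("@SRR0.1 1 length=4\nTTTT\n+\nIIII\n@SRR0.3 3 length=4\nTTTT\n+\nIIII\n")
print(first_pair_mismatch(r1, r2))  # (1, 'SRR0.2', 'SRR0.3')
```

For real data the two `io.StringIO` objects would be replaced by `open(...)` handles on the read1 and read2 files.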

  • #2
    Cross-posted at Biostars: https://www.biostars.org/p/464101/
