  • bwa mem - low properly paired percentage

    After aligning paired-end 100 bp reads to a reference genome with bwa mem, I am getting a very low properly paired percentage:

    369208441 0 total (QC-passed reads + QC-failed reads)
    8985531 0 secondary
    289733341 0 mapped
    78.47% N/A mapped %
    360222910 0 paired in sequencing
    180111455 0 read1
    180111455 0 read2
    1393338 0 properly paired
    0.39% N/A properly paired %
    280747810 0 with itself and mate mapped
    0 0 singletons
    0.00% N/A singletons %
    39590468 0 with mate mapped to a different chr
    0 0 with mate mapped to a different chr (mapQ>=5)
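
As a quick sanity check on the numbers above, the percentages can be recomputed from the raw counts. This is only a minimal sketch: the counts are copied from the QC-passed column of the output, and the variable names and the `pct` helper are made up for illustration.

```python
# Counts copied from the QC-passed column of the flagstat output above.
total = 369_208_441            # all records, including secondary alignments
secondary = 8_985_531
mapped = 289_733_341
paired = 360_222_910           # "paired in sequencing"
properly_paired = 1_393_338
both_mapped = 280_747_810      # "with itself and mate mapped"
mate_diff_chr = 39_590_468     # "with mate mapped to a different chr"


def pct(part, whole):
    """Percentage of `whole` represented by `part`, rounded to two decimals."""
    return round(100.0 * part / whole, 2)


mapped_pct = pct(mapped, total)              # matches the reported 78.47%
proper_pct = pct(properly_paired, paired)    # matches the reported 0.39%
diff_chr_pct = pct(mate_diff_chr, both_mapped)

# Internal consistency of the counts themselves.
primary_mapped = mapped - secondary

print(f"mapped: {mapped_pct}%  properly paired: {proper_pct}%")
print(f"mate on a different chr, among pairs with both mates mapped: {diff_chr_pct}%")
print(f"primary mapped reads equal 'both mates mapped': {primary_mapped == both_mapped}")
```

Two things fall out of the arithmetic: the total is exactly the paired reads plus the secondary alignments, and every mapped primary read has a mapped mate (hence 0 singletons). So the low figure is not caused by mates failing to map.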

    I followed the GATK best practices for aligning paired-end short-read data to a reference genome. I downloaded the short-read data from NCBI SRA as FASTQ files using the SRA Toolkit's fastq-dump, converted the FASTQ files into unmapped BAM using Picard FastqToSam, and marked adapters using Picard MarkIlluminaAdapters. I then piped Picard SamToFastq into bwa mem and then into Picard MergeBamAlignment. To get stats on the alignment, I used samtools flagstat. For several of my samples the alignment went well (90% mapped, 80% properly paired). For a couple of samples, however, the properly paired percentage was well below 1%. I'm wondering how a sample can have a normal fraction of reads mapped (~78%) but only 0.39% of reads properly paired.

    I have double-checked that my fastq files from fastq-dump have identical read counts, and that they are properly interleaved after Picard FastqToSam.
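
One way such a sync check could be done is sketched below. It is not the poster's actual method. It assumes well-formed 4-line FASTQ records, takes the read name as the first whitespace-delimited token of each header, and strips a trailing `/1` or `/2` if present; the function names `read_names` and `first_mismatch` are hypothetical.

```python
from itertools import zip_longest


def read_names(lines):
    """Yield the read name from each 4-line FASTQ record (assumes well-formed input)."""
    for i, line in enumerate(lines):
        if i % 4 == 0:  # header line of each record
            name = line.strip().split()[0].lstrip("@")
            # Drop a trailing /1 or /2 mate suffix if present.
            if name.endswith(("/1", "/2")):
                name = name[:-2]
            yield name


def first_mismatch(fq1_lines, fq2_lines):
    """Return (record_index, name1, name2) for the first out-of-sync pair, else None.

    A missing record in the shorter file shows up as a name of None.
    """
    pairs = zip_longest(read_names(fq1_lines), read_names(fq2_lines))
    for idx, (n1, n2) in enumerate(pairs):
        if n1 != n2:
            return idx, n1, n2
    return None


# Tiny in-memory example; real use would pass open file handles instead.
r1 = ["@read1/1", "ACGT", "+", "IIII", "@read2/1", "TTGA", "+", "IIII"]
r2 = ["@read1/2", "TGCA", "+", "IIII", "@read2/2", "CCAT", "+", "IIII"]
r2_swapped = r2[4:] + r2[:4]

in_sync = first_mismatch(r1, r2)
out_of_sync = first_mismatch(r1, r2_swapped)
print(in_sync, out_of_sync)
```

Matching names only rule out a file-sync problem. The proper-pair flag also depends on mate orientation and insert size, which this check does not look at.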

  • #2
    Cross-posted at biostars: https://www.biostars.org/p/464101/
