Unconfigured Ad

Collapse
X
 
  • Filter
  • Time
  • Show
Clear All
new posts
  • Rangika
    Junior Member
    • Aug 2016
    • 6

    Selecting the best alignment BAM file

    Hi,

    I have a PE dataset 300bp inserts by illumina MiSeq. I aligned the raw data using BWA-mem. Mapping statistics generated using Samtools flagstat are below.

    5541008 + 0 in total (QC-passed reads + QC-failed reads)
    0 + 0 secondary
    76008 + 0 supplementary
    0 + 0 duplicates
    5413610 + 0 mapped (97.70% : N/A)
    5465000 + 0 paired in sequencing
    2732500 + 0 read1
    2732500 + 0 read2
    5266140 + 0 properly paired (96.36% : N/A)
    5319406 + 0 with itself and mate mapped
    18196 + 0 singletons (0.33% : N/A)
    32368 + 0 with mate mapped to a different chr
    8821 + 0 with mate mapped to a different chr (mapQ>=5)

    I also used Trimmomatic on the same dataset, ILLUMINACLIP to remove any adapter sequences, trimmed reads sliding window 4:10, leading & trailing bases <3, length <39bp. Aligned this set using BWA-mem and got the results as below.

    5529752 + 0 in total (QC-passed reads + QC-failed reads)
    0 + 0 secondary
    65642 + 0 supplementary
    0 + 0 duplicates
    5396698 + 0 mapped (97.59% : N/A)
    5464110 + 0 paired in sequencing
    2732055 + 0 read1
    2732055 + 0 read2
    5263982 + 0 properly paired (96.34% : N/A)
    5308488 + 0 with itself and mate mapped
    22568 + 0 singletons (0.41% : N/A)
    23856 + 0 with mate mapped to a different chr
    4865 + 0 with mate mapped to a different chr (mapQ>=5)

    1) Can I use this information to select a best alignment based on mapped %. Raw data gave 97.7% mapping which is higher than trimmed data. So can I select BAM I got from raw data as the best?

    2) I used "samtools view -c -f 3 data.bam" to find the properly paired reads. But the value I got is different to the value for that parameter by flagstat for both datasets. I checked some other parameters like itself & mate mapped they too gave different results. What could be the reason.

    Appreciate your answers.
    Thanks in advance.

    Regds
    Rangika
  • Michael.Ante
    Senior Member
    • Oct 2011
    • 127

    #2
    Hi Rangika,

    2nd first:
    You need to be aware of the fact that samtools flagstat produces statistics on alignments. Meaning, a read can align multiple time and will occur multiple times in the flagstat output. You may check your alignment file with e.g. bam_stat.py from the RSeQC tools.
    Furthermore, I'd check the read files with FastQC before and after trimming.

    So:
    1) I'd check a set of different data sets to choose which way to go. Also, I would not rely on the %mapped from samtools flagstat.

    Cheers,
    Michael

    Comment

    • Rangika
      Junior Member
      • Aug 2016
      • 6

      #3
      Thank you Michael. My dataset is DNA-seq. Can I use RSeQC tools to check alignment for DNA data as well. Do you suggest RSeQC statistics would lead in to better BAM selection?

      Appreciate if you would clarify this a bit more since I'm new to this.

      Regards
      Rangika

      Comment

      • Michael.Ante
        Senior Member
        • Oct 2011
        • 127

        #4
        The bam_stat.py was a suggestion since it also works for DNA-seq alignments (you'll hopefully don't see spliced reads).
        You can also have a look at the QC-metrics from Picard tools, or have a look at GATK.
        Or you can extract the aligned reads (samtools view) and count e.g. how often each read is aligned. Without trimming you might have a high %mapping rate given by samtools flagstat, but you don't know how many reads were aligned with a high confidence to a single or few positions.

        Most of the library preps have also a small section of how to deal with the analysis. Additionally, there are a plethora of publications describing their approach to DNA-Seq analysis.

        Cheers,
        Michael

        Comment

        • Rangika
          Junior Member
          • Aug 2016
          • 6

          #5
          Thank you Michael for your answer.

          Regards
          Sumudu

          Comment

          Latest Articles

          Collapse

          ad_right_rmr

          Collapse

          News

          Collapse

          Topics Statistics Last Post
          Started by SEQadmin2, Yesterday, 10:09 AM
          0 responses
          10 views
          0 reactions
          Last Post SEQadmin2  
          Started by SEQadmin2, 06-04-2026, 08:59 AM
          0 responses
          20 views
          0 reactions
          Last Post SEQadmin2  
          Started by SEQadmin2, 06-02-2026, 12:03 PM
          0 responses
          27 views
          0 reactions
          Last Post SEQadmin2  
          Started by SEQadmin2, 06-02-2026, 11:40 AM
          0 responses
          21 views
          0 reactions
          Last Post SEQadmin2  
          Working...