  • BWA sampe mapping result, what is "PROPER PAIR"?

    I have been using BWA to map paired-end reads (Illumina) recently, and I thought I could use some of your help with a few questions.

    1. What exactly does the term "proper pair" mean in BWA? Does BWA consider the orientation of the mapped pairs? Are we talking only about case i below, or is it based just on the insert size?
    case i) -----> <------
    case ii) -----> ------->
    case iii) <------ <------
    case iv) <------ -------->
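
For reference, the four cases can be told apart from the strand of the leftmost and rightmost read. The sketch below only labels a pair with one of the four cases; it does not reproduce BWA's own rule, which this thread does not spell out. The function name `classify_orientation` and its arguments are illustrative.

```python
def classify_orientation(pos_a, reverse_a, pos_b, reverse_b):
    """Label a pair by the strands of its leftmost and rightmost reads.

    pos_*     : 1-based leftmost mapping position (SAM column 4)
    reverse_* : True if the read maps to the reverse strand (flag bit 0x10)
    """
    # Order the two reads by their leftmost position on the reference.
    (_, left_rev), (_, right_rev) = sorted([(pos_a, reverse_a), (pos_b, reverse_b)])
    labels = {
        (False, True): "case i: -----> <------",
        (False, False): "case ii: -----> ------->",
        (True, True): "case iii: <------ <------",
        (True, False): "case iv: <------ -------->",
    }
    return labels[(left_rev, right_rev)]


if __name__ == "__main__":
    # Flag 83 has 0x10 set (reverse); flag 163 does not (forward).
    print(classify_orientation(1915637, True, 1859241, False))
```

Applied to the two records in question 2 (flags 83 and 163), this returns case i, so their orientation alone would not rule them out as a pair.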

    2. I have a whole bunch of mapping results that look quite odd to me.
    They look like they were paired by bwa sampe, but they obviously have different read names, so they can't be a pair.
    I321_1_FC30VWBAAXX:7:100:1611:994 83 gi|150002608|ref|NC_009614.1| 1915637 29 75M = 1859241 -56471 TGGCAAATTCCAATTGGGGCTTTTCAATGAATGTTTTTACTTTAAAGAATTCTACTTGTTTTTCTTCCTCAATCT AAALIENKEHOLOKJD>=HXOIUNbda_Sh`hLhah]h]haZhhhhShhhhhhhhhhhhhhhhhhhhhhhhhhhh XT:A:U NM:i:1 SM:i:29 AM:i:29 X0:i:1 X1:i:0 XM:i:1 XO:i:0 XG:i:0 MD:Z:53C21

    I321_1_FC30VWBAAXX:7:100:1615:784 163 gi|150002608|ref|NC_009614.1| 1859241 29 20M55S = 1915637 56471 GCTTGTTGTGACTTTGATAGATTGTGACGTGTACGAAAATATGCAAGAGGCGGGGATTGATTCGTCTAGCCCGTT hhhhhhhhhhhehhhhhhhhhhhhSheMhRh`hNhWXhWhehhP][\Sh^Ehh^hNW^[PK_<NJJNGN>AGRCH XT:A:M NM:i:2 SM:i:29 AM:i:29 XM:i:2 XO:i:0 XG:i:0 MD:Z:5C9T4


    3. When I grep for the read name "I321_1_FC30VWBAAXX:7:100:1611:994", I get:

    I321_1_FC30VWBAAXX:7:100:1611:994 129 gi|150002608|ref|NC_009614.1| 1915576 37 75M = 3763132 1847556 TTGATATTCCATAAGAATATTCCTGAGTTCCAATAGAATTCTCCACTTTCTACGAATACTTTGGCAAATTCCAAT hhhhhhhhhhhhhhhhhhhhhedhhhhhh`hh]hhhcZhcXhZR`ThhhPhhU]RQ`YJSX^PPLNPMWSLCIHU XT:A:U NM:i:0 SM:i:37 AM:i:37 X0:i:1 X1:i:0 XM:i:0 XO:i:0 XG:i:0 MD:Z:75

    I321_1_FC30VWBAAXX:7:100:1611:994 83 gi|150002608|ref|NC_009614.1| 1915637 29 75M = 1859241 -56471 TGGCAAATTCCAATTGGGGCTTTTCAATGAATGTTTTTACTTTAAAGAATTCTACTTGTTTTTCTTCCTCAATCT AAALIENKEHOLOKJD>=HXOIUNbda_Sh`hLhah]h]haZhhhhShhhhhhhhhhhhhhhhhhhhhhhhhhhh XT:A:U NM:i:1 SM:i:29 AM:i:29 X0:i:1 X1:i:0 XM:i:1 XO:i:0 XG:i:0 MD:Z:53C21


    where, as I read them, the flag of read 1 (129 --> 0b10000001) says that read 1 is mapped in a proper pair, and the flag of read 2 (83 --> 0b1010011) says the same. But they are not paired (I can tell from the different inferred insert sizes).
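
As a cross-check on the flag values above, here is a minimal Python sketch that decodes a SAM flag into named bits, using the bit definitions from the SAM specification. Decoded this way, 129 is "paired" plus "second in pair" (the 0x2 proper-pair bit is not set), while 83 is paired, proper pair, reverse strand, first in pair. The names `SAM_FLAG_BITS` and `decode_flag` are illustrative.

```python
# Bit meanings as defined in the SAM specification (FLAG column).
SAM_FLAG_BITS = {
    0x1: "paired",
    0x2: "proper_pair",
    0x4: "unmapped",
    0x8: "mate_unmapped",
    0x10: "reverse",
    0x20: "mate_reverse",
    0x40: "first_in_pair",
    0x80: "second_in_pair",
    0x100: "secondary",
    0x200: "qc_fail",
    0x400: "duplicate",
    0x800: "supplementary",
}


def decode_flag(flag):
    """Return the names of all bits set in a SAM flag, lowest bit first."""
    return [name for bit, name in SAM_FLAG_BITS.items() if flag & bit]


if __name__ == "__main__":
    for flag in (129, 83, 163):
        print(flag, bin(flag), decode_flag(flag))
    # 129 0b10000001 ['paired', 'second_in_pair']
    # 83 0b1010011 ['paired', 'proper_pair', 'reverse', 'first_in_pair']
    # 163 0b10100011 ['paired', 'proper_pair', 'mate_reverse', 'second_in_pair']
```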


    Can anyone help answer these questions? I am trying to keep only pairs that are mapped properly. I tried samtools view with the -f 2 option (since 0x02 is the proper-pair bit), but I still get many pairs like the ones described above.
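
Since `-f 2` only tests the 0x2 bit, a stricter filter can also look at the inferred insert size (TLEN, SAM column 9). The sketch below is one possible approach, not part of samtools or BWA: `keep_record`, `filter_sam`, and the `max_insert` cutoff of 1000 are hypothetical and would need tuning to the library.

```python
def keep_record(sam_line, max_insert=1000):
    """True if the proper-pair bit is set and |TLEN| is within max_insert."""
    fields = sam_line.rstrip("\n").split("\t")
    flag = int(fields[1])   # column 2: FLAG
    tlen = int(fields[8])   # column 9: inferred insert size (TLEN)
    # TLEN of 0 means the insert size is unavailable, so reject it too.
    return bool(flag & 0x2) and 0 < abs(tlen) <= max_insert


def filter_sam(lines, max_insert=1000):
    """Yield header lines unchanged and alignment lines that pass keep_record."""
    for line in lines:
        if line.startswith("@") or keep_record(line, max_insert):
            yield line


if __name__ == "__main__":
    import sys
    # Example: python filter_pairs.py < in.sam > out.sam
    sys.stdout.writelines(filter_sam(sys.stdin))
```

With this cutoff, the flag-83 record above (TLEN -56471) would be dropped even though its proper-pair bit is set.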

    Anyone? Thanks!
    Last edited by hl450; 07-29-2010, 08:03 AM.

  • #2
    Anyone? Please?

    • #3
      I have the same question...

      • #4
        Could anyone experienced in alignment via BWA please answer? Thank you!!

        • #5
          How can they look like they are paired if they have different read names? That doesn't make sense at all. Are you sure that your input FASTQ files are properly lined up in the correct order? I believe many of these tools read one record at a time from each of the input files and assume that corresponding records are part of a pair -- they don't necessarily check the read names.

          That being said, I have observed that occasionally I get outputs with the "properly paired" flag set, even when the alignments are to different chromosomes, which is weird, so I think there could be some bugs.
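
If the aligner pairs records purely by position in the two files, as suggested above, it is worth checking the inputs before mapping. The sketch below walks two FASTQ files in lockstep and reports the first record whose names disagree. It assumes plain four-line FASTQ records and that mates differ at most by a trailing `/1` or `/2`; the function names are illustrative.

```python
from itertools import zip_longest


def pair_key(header):
    """Reduce a FASTQ header line to a name shared by both mates."""
    name = header.strip().split()[0]
    if name.startswith("@"):
        name = name[1:]
    if name.endswith("/1") or name.endswith("/2"):
        name = name[:-2]
    return name


def read_names(path):
    """Yield the pair key of every record, assuming 4 lines per record."""
    with open(path) as handle:
        for index, line in enumerate(handle):
            if index % 4 == 0:
                yield pair_key(line)


def first_mismatch(names_1, names_2):
    """Return (record index, name 1, name 2) at the first disagreement, else None.

    A missing record (one file shorter than the other) shows up as None.
    """
    for index, (a, b) in enumerate(zip_longest(names_1, names_2)):
        if a != b:
            return index, a, b
    return None


if __name__ == "__main__":
    import sys
    # Example: python check_pairs.py reads_1.fastq reads_2.fastq
    print(first_mismatch(read_names(sys.argv[1]), read_names(sys.argv[2])))
```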

          • #6
            I found what was causing the problem... It was due to the data I downloaded from the EBI short read archive. The read 1 and read 2 FASTQ files had different numbers of reads. Shouldn't the short read archive check this upon submission of data? Argh...

            Originally posted by zlu
            My problem was actually due to the unequal number of reads in the input FASTQ files. I was doing some quality filtering, mainly artefact removal, on read 1 and read 2 separately, and this left the two files with different numbers of reads.
            Originally posted by lh3
            No. You must make sure the two files contain the same set of pairs with identical order in each file. Your input will fail all aligners to date, so far as I know.
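
One way to get back to the state lh3 describes after filtering each file separately is to keep only the reads whose names survive in both files. The sketch below is a minimal illustration, assuming each file still has its reads in the original relative order and that all names fit in memory; `resync` and the `(name, record)` tuple layout are hypothetical.

```python
def resync(records_1, records_2):
    """Keep only reads present in both inputs, preserving each input's order.

    records_* : list of (name, record) tuples, where name is the read name
                shared by both mates and record is the full FASTQ entry.
    """
    shared = {name for name, _ in records_1} & {name for name, _ in records_2}
    kept_1 = [(name, rec) for name, rec in records_1 if name in shared]
    kept_2 = [(name, rec) for name, rec in records_2 if name in shared]
    return kept_1, kept_2


if __name__ == "__main__":
    # Read "b" was filtered out of file 2, read "c" out of file 1.
    reads_1 = [("a", "a/1"), ("b", "b/1"), ("d", "d/1")]
    reads_2 = [("a", "a/2"), ("c", "c/2"), ("d", "d/2")]
    print(resync(reads_1, reads_2))
```

After this step both outputs list the same names in the same order, so record N in one file is the mate of record N in the other.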
