  • bwa sampe: proper pair but on different contigs!!??!!

    Dear all,


    Does anyone have an idea how the following is possible?
    I have reads flagged as mapped in a proper pair (according to the SAM flag), but the two mates map to different contigs:

    HWUSI-EAS300R:7:1:15:1404#0 147 FW_DM_LINE_Jockey 128 29 74M FW3_DM_LINE_Jockey 3131 0 TGCAAGATCGCTTAAATACATAGTGAATTGTTATCTTAAATAATAAAACTATGAGTCAGAATGACACTCGCGCC Y^S[]^\[]a_XSZ[_]]_`_`]```_^a^`^`[aa__`]V]```aa\a_`]aaaaaaaa`Ta\a`aaaba`aa XT:A:U NM:i:0 SM:i:29 AM:i:29 X0:i:1 X1:i:0 XM:i:0 XO:i:0 XG:i:0 MD:Z:74
    HWUSI-EAS300R:7:1:15:1495#0 147 Gypsy4_LTR_LTR_Gypsy 112 60 74M Gypsy4_I_LTR_Gypsy 6216 0 CATTCCACTGCCCGGAGCGTGTGAAGCGCAATGTCAGCATTCTGCCGTGAGCGCTGCTTCAAAAGACGGGCTAC XUPM^NHLSMW\SWSPM\MW]PW\TZ\aPMP^MS^S]]Z^M_^X]^Z^]Z^]`a]^Z_\aaS]Z`Sa]a`_a\a XT:A:U NM:i:3 XN:i:1 SM:i:37 AM:i:37 X0:i:1 X1:i:0 XM:i:3 XO:i:0 XG:i:0 MD:Z:5T32C22G12
    HWUSI-EAS300R:7:1:22:1504#0 147 FW_DM_LINE_Jockey 85 29 74M FW3_DM_LINE_Jockey 3125 0 AACTAAATAAAAAATCTGAAAGCGAAAGAGACGCTCTATGCGATGCAAGATCGCTTAAATACATAGTGAATTGT ]N^I_^WG[[[_YNFQP[XGM\_^^S\a__^``_Y[a^\_a_```aaa`a]a`a````ba_baa`a_bbaabaa XT:A:U NM:i:0 SM:i:29 AM:i:29 X0:i:1 X1:i:0 XM:i:0 XO:i:0 XG:i:0 MD:Z:74
    HWUSI-EAS300R:7:1:25:1975#0 83 BLOOD_I_LTR_Gypsy 145 29 13M3D61M BLASTOPIA_LTR_LTR_Gypsy 271 0
    I hope someone can help with this!
    Best, ro

  • #2
    Could be a bug in the mapping tool used. What tool and what version was it?


    • #3
      Mapper: bwa
      Version: 0.57
      Command: bwa aln -n 0.01 -o 2 -e 12 -d 12 -t 2 etc.


      • #4
        Is there any obvious link between the contigs, in particular are they subsequent entries in the FASTA reference file?


        • #5
          I was under the impression that BWA concatenates all the references together and aligns reads against that long string. Might it have something to do with that?


          • #6
            Yes they are subsequent entries in the fasta file! It is the insert of an LTR transposon followed by the LTR, i.e., these sequences are frequently found in exactly this order in the different species.
            That could explain the problem, then. If BWA concatenates the sequences and measures the distance between the mates, it finds that the distance is acceptable while ignoring the fact that a contig boundary is crossed, and so it sets the "mapped in a proper pair" flag.
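
To make the hypothesis above concrete, here is a toy sketch in Python. It is not BWA's code and makes no claim about how BWA is actually implemented; it only shows how a distance check done purely on concatenated coordinates could accept mates that sit on two adjacent contigs. The contig lengths and the `max_insert` value are hypothetical, chosen for illustration; the contig names and positions are taken from the second record in the first post.

```python
from itertools import accumulate

def concat_offsets(contigs):
    """Map each contig name to its start offset in the concatenated reference."""
    names = [name for name, _ in contigs]
    lengths = [length for _, length in contigs]
    starts = [0] + list(accumulate(lengths))[:-1]
    return dict(zip(names, starts))

def naive_is_proper(offsets, read, mate, max_insert):
    """Distance check on concatenated coordinates only (ignores contig boundaries)."""
    (read_contig, read_pos), (mate_contig, mate_pos) = read, mate
    distance = abs((offsets[read_contig] + read_pos) - (offsets[mate_contig] + mate_pos))
    return distance <= max_insert

def boundary_aware_is_proper(offsets, read, mate, max_insert):
    """Same check, but both mates must also lie on the same contig."""
    return read[0] == mate[0] and naive_is_proper(offsets, read, mate, max_insert)

# Hypothetical lengths; the real ones are not given in the thread.
contigs = [("Gypsy4_I_LTR_Gypsy", 6400), ("Gypsy4_LTR_LTR_Gypsy", 400)]
offsets = concat_offsets(contigs)

read = ("Gypsy4_LTR_LTR_Gypsy", 112)   # RNAME, POS from the post
mate = ("Gypsy4_I_LTR_Gypsy", 6216)    # RNEXT, PNEXT from the post

print(naive_is_proper(offsets, read, mate, max_insert=500))
print(boundary_aware_is_proper(offsets, read, mate, max_insert=500))
```

With these assumed lengths the two mates are only a few hundred bases apart in the concatenated string, so the naive check accepts them while the boundary-aware check rejects them.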


            • #7
              Originally posted by GoneSouth
              Yes they are subsequent entries in the fasta file!
              Given Lee Sam's post, you can probably see why I asked that.

              i.e. This is probably a bug in BWA, wrongly marking the reads as "properly paired".


              • #8
                Yes I do, many thanks for all your help!
                Now that I know what's going on, I can handle this in my SAM parser.
                And maybe the people from Sanger will find some time to fix this in one of the next versions. I will send a bug report.
                Thanks, ro
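
For anyone who wants to handle this in their own parser, a minimal sketch in Python might look like the following. It assumes standard tab-separated SAM records (FLAG in column 2, RNAME in column 3, RNEXT in column 7) and that clearing the proper-pair bit (0x2) is the desired fix; the function names are made up for this example and do not come from any library.

```python
PROPER_PAIR = 0x2  # SAM flag bit: read mapped in a proper pair

def is_cross_contig_proper_pair(sam_line):
    """True if the record is flagged as a proper pair but the mate is on another reference."""
    if sam_line.startswith("@"):          # header line
        return False
    fields = sam_line.rstrip("\n").split("\t")
    if len(fields) < 11:                  # not a complete alignment record
        return False
    flag = int(fields[1])
    rname, rnext = fields[2], fields[6]
    mate_elsewhere = rnext not in ("=", "*") and rnext != rname
    return bool(flag & PROPER_PAIR) and mate_elsewhere

def clear_proper_pair(sam_line):
    """Return the record with the proper-pair bit unset if the mates span two references."""
    if not is_cross_contig_proper_pair(sam_line):
        return sam_line
    fields = sam_line.rstrip("\n").split("\t")
    fields[1] = str(int(fields[1]) & ~PROPER_PAIR)
    return "\t".join(fields)

# Mandatory fields modelled on the first record in this thread (sequence and qualities shortened).
record = "\t".join([
    "HWUSI-EAS300R:7:1:15:1404#0", "147", "FW_DM_LINE_Jockey", "128", "29",
    "74M", "FW3_DM_LINE_Jockey", "3131", "0", "TGCA", "Y^S[",
])
fixed = clear_proper_pair(record)
```

Records whose RNEXT is `=` or equal to RNAME are returned unchanged, so the filter can be applied to every line of the file.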
