  • BWA sampe: weird pairing

    When running bwa (0.5.4) sampe, I got this line printed to the screen over and over again:

    [infer_isize] fail to infer insert size: weird pairing

    Should I be worrying about this? Does it mean the pairing is not correct?

    The following are the commands I used for alignment and sampe:

    bwa aln -l 32 -t 2 -q 4 Genomes/Btau_UMD3.fa s_1_1_sequence.fq > Run20_s_1_1_sequence.sai &
    bwa aln -l 32 -t 2 -q 4 Genomes/Btau_UMD3.fa s_1_2_sequence.fq > Run20_s_1_2_sequence.sai &

    bwa sampe -a 253 -o 1000 Genomes/Btau_UMD3/Btau_UMD3.fa s_1_1_sequence.sai s_1_2_sequence.sai s_1_1_sequence.fq s_1_2_sequence.fq > Run20_s_1_pe.bwa.sam

    Thank you.

  • #2
    When bwa fails to infer the insert size, it uses "-a" as the maximum insert size for pairing. This may happen if too few reads are mapped, if the insert size distribution is bimodal, or for similar reasons. You should check the distribution after mapping.
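A minimal sketch of checking the distribution after mapping, assuming plain-text, tab-separated SAM output such as Run20_s_1_pe.bwa.sam. It collects the absolute TLEN (column 9) of first-in-pair reads whose mate maps to the same reference and summarizes them. The helper names and filtering choices are illustrative assumptions, not part of bwa. A mean and SD alone will hide a bimodal library, so plot the values too.

```python
from statistics import mean, pstdev


def insert_sizes(sam_lines):
    """Collect |TLEN| (SAM column 9) for first-in-pair reads whose mate maps
    to the same reference. Header lines starting with '@' are skipped."""
    sizes = []
    for line in sam_lines:
        if not line.strip() or line.startswith("@"):
            continue
        fields = line.rstrip("\n").split("\t")
        flag, rnext, tlen = int(fields[1]), fields[6], int(fields[8])
        # 0x1 = paired, 0x4 = read unmapped, 0x8 = mate unmapped, 0x40 = first in pair
        if not flag & 0x1 or flag & 0x4 or flag & 0x8 or not flag & 0x40:
            continue
        if rnext == "=" and tlen != 0:
            sizes.append(abs(tlen))
    return sizes


def summarize(sizes):
    """Return (count, mean, population SD); mean and SD are None for an empty list."""
    if not sizes:
        return 0, None, None
    return len(sizes), mean(sizes), pstdev(sizes)
```

For example, `summarize(insert_sizes(open("Run20_s_1_pe.bwa.sam")))` gives the number of pairs used plus their mean and SD.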



    • #3
      The insert size specified was obtained from an ELAND alignment. Do you think increasing -a will help? And is there a tag in the SAM file that reports pairs with a wrong insert size?
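One field worth looking at is the SAM FLAG's 0x2 bit ("each segment properly aligned according to the aligner"), which stays unset for pairs the aligner did not accept as proper. Whether bwa 0.5.x bases that decision exactly on the inferred insert size bounds is an assumption to check for your version. A minimal sketch, with a hypothetical helper name, that counts first-in-pair records with and without that bit:

```python
from collections import Counter


def proper_pair_counts(sam_lines):
    """Count first-in-pair records with and without the SAM 'properly paired'
    FLAG bit (0x2). Header lines starting with '@' are skipped."""
    counts = Counter()
    for line in sam_lines:
        if not line.strip() or line.startswith("@"):
            continue
        flag = int(line.split("\t")[1])
        if flag & 0x1 and flag & 0x40:  # paired read, first segment only
            counts["proper" if flag & 0x2 else "not_proper"] += 1
    return counts
```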



      • #4
        You should plot the insert size distribution.
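If no plotting tool is at hand, a text histogram is enough to see whether the distribution has one peak or several. A minimal sketch, assuming you already have a list of insert sizes; `text_histogram`, the 100 bp bin width and the `reads_per_mark` scaling are illustrative choices.

```python
def text_histogram(sizes, bin_width=100, reads_per_mark=1):
    """One text row per bin, from the smallest to the largest occupied bin:
    'start-end count ###'. Empty bins are kept so gaps between peaks show up."""
    if not sizes:
        return []
    counts = {}
    for s in sizes:
        counts[s // bin_width] = counts.get(s // bin_width, 0) + 1
    rows = []
    for b in range(min(counts), max(counts) + 1):
        n = counts.get(b, 0)
        start, end = b * bin_width, (b + 1) * bin_width - 1
        # one '#' per reads_per_mark reads in this bin
        rows.append(f"{start:>6}-{end:<6} {n:>7} " + "#" * (n // reads_per_mark))
    return rows
```

Print the rows with `print("\n".join(text_histogram(sizes)))`; for a lane of reads, raise `reads_per_mark` so the bars fit on screen.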



        • #5
          Thanks for the quick reply.

          Looking closer at the output from the sampe below, am I right to assume that there are 1649 out of 262144 processed reads where the insert size cannot be inferred correctly?

          [bwa_read_seq] 0.0% bases are trimmed.
          [bwa_sai2sam_pe_core] convert to sequence coordinate...
          [infer_isize] fail to infer insert size: weird pairing
          [bwa_sai2sam_pe_core] time elapses: 160.55 sec
          [bwa_sai2sam_pe_core] change of coordinates in 1649 alignments.
          [bwa_sai2sam_pe_core] align unmapped mate...
          [bwa_sai2sam_pe_core] time elapses: 1.39 sec
          [bwa_sai2sam_pe_core] refine gapped alignments... 0.72 sec
          [bwa_sai2sam_pe_core] print alignments... 2.13 sec
          [bwa_sai2sam_pe_core] 262144 sequences have been processed.



          • #6
            Did you ever solve this issue? I'm encountering the same error message...



            • #7
              This message is usually caused by bad libraries. You should check the quality of your library first. As I said above, bwa still works if -a is about right, but to set a proper -a you should, again, plot the insert size distribution. This is less a problem with bwa than with your input data.
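As a sketch of turning the plotted distribution into a value for -a, assuming you already have a list of insert sizes: take a high percentile and add some slack. The 99th percentile and the 10% slack are arbitrary illustrative choices, not bwa recommendations, and `suggest_max_insert` is a hypothetical helper. For a bimodal or otherwise bad library, look at the plot first, since no single cutoff describes it well.

```python
import math


def percentile(sorted_vals, q):
    """Nearest-rank percentile, q in [0, 100], of an already sorted list."""
    if not sorted_vals:
        raise ValueError("no insert sizes given")
    rank = max(1, math.ceil(q * len(sorted_vals) / 100))
    return sorted_vals[rank - 1]


def suggest_max_insert(sizes, q=99.0, slack=1.1):
    """Hypothetical rule of thumb for sampe -a: the q-th percentile of the
    observed insert sizes, padded by a `slack` factor."""
    return int(round(percentile(sorted(sizes), q) * slack))
```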



              • #8
                My problem was actually due to an unequal number of reads in the two paired FASTQ input files. I had done some quality filtering, mainly artefact removal, on read1 and read2 separately, and this left the two files with different numbers of reads.
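If read1 and read2 have been filtered separately, one way to restore matching files is to keep only the pairs that survive in both. A minimal sketch, assuming 4-line FASTQ records whose names differ only by a trailing /1 or /2 (or by text after whitespace); the helper names are hypothetical, and holding the second file in memory may be too much for a full lane.

```python
def read_fastq(path):
    """Yield (header, seq, plus, qual) tuples from a 4-line-per-record FASTQ file."""
    with open(path) as fh:
        while True:
            rec = tuple(fh.readline().rstrip("\n") for _ in range(4))
            if not rec[0]:
                return
            yield rec


def pair_key(header):
    """Read name without '@', text after whitespace, or a trailing /1 or /2."""
    name = header[1:].split()[0]
    return name[:-2] if name.endswith(("/1", "/2")) else name


def resync_pairs(path1, path2, out1, out2):
    """Keep only pairs present in both files, written in file-1 order.

    Holds all of file 2 in memory; returns the number of pairs written."""
    mates = {pair_key(rec[0]): rec for rec in read_fastq(path2)}
    kept = 0
    with open(out1, "w") as o1, open(out2, "w") as o2:
        for rec1 in read_fastq(path1):
            rec2 = mates.get(pair_key(rec1[0]))
            if rec2 is None:
                continue  # mate was filtered out of file 2
            o1.write("\n".join(rec1) + "\n")
            o2.write("\n".join(rec2) + "\n")
            kept += 1
    return kept
```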



                • #9
                  No. You must make sure the two files contain the same set of pairs in identical order. As far as I know, input like yours will break every aligner available to date.
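Before running sampe, a quick way to confirm that the two files contain the same pairs in the same order is to compare read names record by record. This sketch assumes 4-line FASTQ records and names that differ only by a trailing /1 or /2 (or by text after whitespace); `read_names` and `first_mismatch` are hypothetical helpers.

```python
from itertools import zip_longest


def read_names(path):
    """Yield read names (header line of each 4-line record) without '@',
    text after whitespace, or a trailing /1 or /2."""
    with open(path) as fh:
        for i, line in enumerate(fh):
            if i % 4 == 0 and line.strip():
                name = line[1:].split()[0]
                yield name[:-2] if name.endswith(("/1", "/2")) else name


def first_mismatch(path1, path2):
    """None if both files list the same names in the same order, else
    (record_index, name1, name2); a name is None where one file ran out."""
    for idx, (a, b) in enumerate(zip_longest(read_names(path1), read_names(path2))):
        if a != b:
            return idx, a, b
    return None
```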



                  • #10
                    Thank you both for your help! It was indeed an issue with my library...



                    • #11
                      Dear Heng,

                      I aligned my mate-pair data with BWA (0.5.5) and observed weird pairing of reads. Here is what I see.

                      When I run bwa sampe, for one of the pairs I get:
                      Code:
                      HWUSI-EAS454:1:2:0:108#0        113     chr2    96713303        0       50M     =       96439877        -273426 GATCAGTGGACTTTATGTTAATGAAAAAGGAAATCATCCAGGGTGCATCT      :B?BC?A-357;67C@C<CC<9B>BC<BB>B:<7>B=-BCBBBC@BB@B@      XT:A:R  NM:i:2    SM:i:0  AM:i:0  X0:i:3  X1:i:0  XM:i:2  XO:i:0  XG:i:0  MD:Z:7T23C18
                      HWUSI-EAS454:1:2:0:108#0        177     chr2    96439877        23      50M     =       96713303        273426  GAGTCTCTTTTGCTGAGTGTTGTCATATATGGAGGTGATGCATGGAACTG      ?A95/5?@B;?:@7BB9959?'79BAC>@B?;@>;B(B8:/'>;9C:BBB      XT:A:U  NM:i:2    SM:i:23 AM:i:0  X0:i:1  X1:i:2  XM:i:2  XO:i:0  XG:i:0  MD:Z:28C12C8
                      So here the distance between the ends is 273,426 bp, even though I (and BWA) know that "inferred external isize from 157719 pairs: 3054.215 +/- 185.122".

                      When I run BWA in single-end mode ("bwa samse -n 30") for the same pair, I get:
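Decoding the FLAG values 113 and 177 from the two records above shows what bwa reported for this pair. The bit meanings follow the SAM specification; the short labels and `decode_flag` are illustrative names.

```python
# Short labels for the SAM FLAG bits; the labels are illustrative, the bit
# meanings follow the SAM specification.
FLAG_BITS = [
    (0x1, "paired"),
    (0x2, "proper_pair"),
    (0x4, "unmapped"),
    (0x8, "mate_unmapped"),
    (0x10, "reverse"),
    (0x20, "mate_reverse"),
    (0x40, "first_in_pair"),
    (0x80, "second_in_pair"),
    (0x100, "secondary"),
    (0x200, "qc_fail"),
    (0x400, "duplicate"),
    (0x800, "supplementary"),
]


def decode_flag(flag):
    """List the labels of the bits set in a SAM FLAG value."""
    return [label for bit, label in FLAG_BITS if flag & bit]
```

decode_flag(113) gives ['paired', 'reverse', 'mate_reverse', 'first_in_pair'] and decode_flag(177) gives ['paired', 'reverse', 'mate_reverse', 'second_in_pair']: both reads are on the reverse strand and neither record has the proper-pair bit set. The 273,426 figure is the difference of the two POS values (96713303 - 96439877).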
                      >HWUSI-EAS454:1:2:0:108#0 3 3
                      chr2 -96713303 2
                      chr2 -98220112 2
                      chr2 +96442725 2

                      on the left and

                      >HWUSI-EAS454:1:2:0:108#0 3 3
                      chr2 -96439877 2
                      chr2 +98222957 3
                      chr2 +96716152 3

                      on the right.

                      So my question is: why does BWA pair the ends in such a weird way when I could pair them as
                      left: chr2 +96442725 2
                      right: chr2 -96439877 2
                      with an insert size of ~2,800 bp?
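To see which pairings are available, the sketch below enumerates every combination of the three left and three right single-end hits listed above. It keeps combinations on the same chromosome with the requested orientation and checks the outer span against the low/high boundaries Valentina reports later in the thread (2284 and 3824). It assumes a read length of 50 (from the 50M CIGAR), that the samse -n positions are leftmost coordinates with the sign giving the strand, and it ignores the third column because its meaning is not stated here. All function names are hypothetical.

```python
def parse_hit(line):
    """'chr2 -96713303 2' -> ('chr2', '-', 96713303); the third column is ignored
    because its meaning is not stated in the thread (assumption)."""
    chrom, signed_pos = line.split()[:2]
    return chrom, signed_pos[0], int(signed_pos[1:])


def orientation(hit_a, hit_b):
    """'FR', 'RF' or 'same-strand', judged from the strand of the leftmost hit."""
    left, right = sorted([hit_a, hit_b], key=lambda h: h[2])
    if left[1] == right[1]:
        return "same-strand"
    return "FR" if left[1] == "+" else "RF"


def candidate_pairs(left_hits, right_hits, read_len, low, high, want):
    """All (left, right, span) combinations on one chromosome with the wanted
    orientation and an outer span (first start to last end) within [low, high]."""
    found = []
    for a in left_hits:
        for b in right_hits:
            if a[0] != b[0] or orientation(a, b) != want:
                continue
            span = max(a[2], b[2]) + read_len - min(a[2], b[2])
            if low <= span <= high:
                found.append((a, b, span))
    return found


left = [parse_hit(s) for s in ("chr2 -96713303 2", "chr2 -98220112 2", "chr2 +96442725 2")]
right = [parse_hit(s) for s in ("chr2 -96439877 2", "chr2 +98222957 3", "chr2 +96716152 3")]
for a, b, span in candidate_pairs(left, right, 50, 2284, 3824, "RF"):
    print(a, b, span)
```

Under these assumptions it finds three RF candidates (spans 2899, 2895 and 2898 bp, the last being the pairing suggested above) and no FR candidate inside the bounds. Because each read has three hits, the RF placement is itself ambiguous.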

                      And also, why does the output of "bwa samse -n 30" contain no information about mapping quality? Why can't it be printed in SAM format as well?

                      Thank you in advance,
                      Valentina



                      • #12
                        Could you show the low and high boundaries from the bwa output? Something like:

                        [infer_isize] low and high boundaries: 330 and 670
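The thread does not show how bwa derives these boundaries. For intuition only, here is one common way to get such bounds from a list of insert sizes: an interquartile-range (IQR) outlier rule, followed by the mean and SD of the sizes inside the bounds. This is not bwa's code; the multiplier k=2.0 and the helper names are assumptions.

```python
from statistics import mean, pstdev


def isize_bounds(sizes, k=2.0):
    """Hypothetical IQR outlier rule: [p25 - k*IQR, p75 + k*IQR], floored at 0.
    Not bwa's code; k=2.0 is an arbitrary illustrative choice."""
    vals = sorted(sizes)
    if len(vals) < 4:
        raise ValueError("need at least 4 insert sizes")
    p25 = vals[int(len(vals) * 0.25)]
    p75 = vals[int(len(vals) * 0.75)]
    iqr = p75 - p25
    return max(0, p25 - k * iqr), p75 + k * iqr


def trimmed_stats(sizes, k=2.0):
    """Bounds plus mean and population SD of the sizes that fall inside them."""
    low, high = isize_bounds(sizes, k)
    kept = [s for s in sizes if low <= s <= high]
    return low, high, mean(kept), pstdev(kept)
```

A single far outlier (say one pair at 100,000 bp) then falls outside the bounds and no longer inflates the mean and SD.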

                        EDIT: For a "proper read pair", you would expect the read with the smaller coordinate to map to the forward strand, but in your example it is the other way round. I guess you are aligning reads from an Illumina long-insert library, where a "proper pair" has RF orientation. Bwa does not support such read pairs. As far as I know, Maq is still the best tool for such alignments.
                        Last edited by lh3; 01-15-2010, 08:29 AM.
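Following the explanation above, the orientation of a pair can be read from SAM fields alone: FLAG bit 0x10 gives the read's strand, 0x20 the mate's, and POS/PNEXT (columns 4 and 8) say which one is leftmost. A minimal sketch, with hypothetical names, that tallies FR versus RF over mapped pairs as a quick check of what kind of library you have. With an FR-only aligner the chosen placements may bias the tally, so treat it as a rough indication.

```python
from collections import Counter


def pair_orientation(flag, pos, pnext):
    """'FR', 'RF' or 'same-strand' for one pair, from this record's FLAG,
    POS and the mate's PNEXT. The strand of the read with the smaller
    coordinate decides FR vs RF; ties use this read's strand (an assumption)."""
    read_rev, mate_rev = bool(flag & 0x10), bool(flag & 0x20)
    if read_rev == mate_rev:
        return "same-strand"
    left_rev = read_rev if pos <= pnext else mate_rev
    return "RF" if left_rev else "FR"


def orientation_counts(sam_lines):
    """Tally orientations over first-in-pair records whose mates map to the same reference."""
    counts = Counter()
    for line in sam_lines:
        if not line.strip() or line.startswith("@"):
            continue
        f = line.split("\t")
        flag, pos, rnext, pnext = int(f[1]), int(f[3]), f[6], int(f[7])
        # 0x1 paired, 0x40 first in pair, 0xC = read or mate unmapped
        if flag & 0x1 and flag & 0x40 and not flag & 0xC and rnext == "=":
            counts[pair_orientation(flag, pos, pnext)] += 1
    return counts
```

For the pair in post #11, pair_orientation(113, 96713303, 96439877) returns 'same-strand', since both records carry the 0x10 and 0x20 bits.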



                        • #13
                          Low and high boundaries are: 2284 and 3824.

                          You are right, these are Solexa mate-pair data, which should be aligned as "RF" instead of "FR".

                          I have too much data to use Maq on it. Alternatively, I could run Bowtie first and then use Maq to align whatever was not aligned. But it is a real pity that I cannot use BWA for this.

                          Maybe you could add a parameter specifying which pair orientation to expect, the way Bowtie can be run in "--rf" or "--fr" mode?
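A workaround sometimes used for RF mate-pair libraries, not proposed in this thread and therefore worth verifying on a small subset first, is to reverse-complement both reads before alignment, which turns an RF pair into an FR pair. A minimal sketch, with hypothetical function names, assuming 4-line FASTQ records. Alignment coordinates and strands then refer to the reverse-complemented reads.

```python
COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def revcomp_record(seq, qual):
    """Reverse-complement a read; qualities are reversed so they stay matched to bases."""
    return seq.translate(COMPLEMENT)[::-1], qual[::-1]


def revcomp_fastq(in_path, out_path):
    """Rewrite a 4-line-per-record FASTQ file with every read reverse-complemented."""
    with open(in_path) as src, open(out_path, "w") as dst:
        while True:
            header, seq, plus, qual = (src.readline().rstrip("\n") for _ in range(4))
            if not header:
                break
            rc_seq, rc_qual = revcomp_record(seq, qual)
            dst.write(f"{header}\n{rc_seq}\n{plus}\n{rc_qual}\n")
```

Run it on both s_1_1 and s_1_2 files, then align as usual.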

                          Thanks,
                          Valentina



                          • #14
                            Hi elalo,
                            How did you find out it was an issue with your library? How can I take care of this isize failure message?



                            • #15
                              Originally posted by zlu (quoting the first post in this thread)
                              Hi zlu,

                              Would you mind telling me how you got rid of the failure message from bwa? I keep getting it.
