Unconfigured Ad

Collapse
X
 
  • Filter
  • Time
  • Show
Clear All
new posts
  • myronpeto
    Member
    • Sep 2011
    • 10

    Odd bwa-mem alignment

    I'm trying to align a whole genome sample using bwa-mem. I've previously aligned using bwa aln and bwa sampe and it went through fine. The command I'm using is:

    bwa mem -t 8 -R '@RG\tID:Label\tLB:Label\tSM:Label' reference_v37.fasta read_1.fq.gz read_2.fq.gz > Sample.sam

    Followed by:

    samtools import reference_v37.fasta.fai Sample.sam Sample_unsorted.bam
    samtools sort Sample_unsorted.bam Sample_sorted

    The error I got from picard MarkDuplicates was the following:

    Exception in thread "main" net.sf.picard.PicardException: Value was put into PairInfoMap more than once. 1: 0749_7455:HS2000-1111A_168:5:1203:11571:84654
    at net.sf.picard.sam.CoordinateSortedPairInfoMap.ensureSequenceLoaded(CoordinateSortedPairInfoMap.java:124)
    at net.sf.picard.sam.CoordinateSortedPairInfoMap.remove(CoordinateSortedPairInfoMap.java:78)
    at net.sf.picard.sam.DiskReadEndsMap.remove(DiskReadEndsMap.java:61)
    at net.sf.picard.sam.MarkDuplicates.buildSortedReadEndLists(MarkDuplicates.java:294)
    at net.sf.picard.sam.MarkDuplicates.doWork(MarkDuplicates.java:117)
    at net.sf.picard.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:169)
    at net.sf.picard.sam.MarkDuplicates.main(MarkDuplicates.java:101)



    I found the offending reads from the bam file:

    HS2000-1111A_168:5:1203:11571:84654 81 2 243164263 1 100M 1 231669 0 AGCCTATGTGATGACTACATGTCGTGCGGGATCCTGGATGGGATCCTGGGTCAGAGTAAGATAGAACTAAGGGAATCCAAATGAAATATGAACTTTAGTT DDDDDCCCCDEEDDCEDDB?CDEE8IJHFBIJIGBBJIHDHGHGF=JIIG@HIJIJJGJJIGDIHEB?IGIHIHEEGJJJHEGHEG>DHFDADDBFD@C@ NM:i:0 AS:i:100 XS:i:95 RG:Z:0749_7455
    HS2000-1111A_168:5:1203:11571:84654 161 1 231669 0 51M49S 2 243164263 0 CAGAAAGTAAGTACTAAAAAAATTAAAATATATCAAACAAAAATAAAAGCCTAGAAATCTCCTTTGCAAAAGAATTCCAAATAACTGATGTAGACACTCA @@@FDFF2CFHFFEGGIEEHIJJIJJJIIGGCGGGGIJJGDIJIG>DCGEFGEHIIEHIHGGHIGIIHEEH@DFFEF@CA:@@AACCDCC@CDEDDDDDC NM:i:0 AS:i:51 XS:i:51 RG:Z:0749_7455 XP:Z:1,+231860,51S49M,0,0;
    HS2000-1111A_168:5:1203:11571:84654 161 1 231860 0 51S49M 2 243164263 0 CAGAAAGTAAGTACTAAAAAAATTAAAATATATCAAACAAAAATAAAAGCCTAGAAATCTCCTTTGCAAAAGAATTCCAAATAACTGATGTAGACACTCA @@@FDFF2CFHFFEGGIEEHIJJIJJJIIGGCGGGGIJJGDIJIG>DCGEFGEHIIEHIHGGHIGIIHEEH@DFFEF@CA:@@AACCDCC@CDEDDDDDC NM:i:0 AS:i:49 XS:i:49 RG:Z:0749_7455 XP:Z:1,+231669,51M49S,0,0;

    It seems that bwa-mem aligned the second of the pair twice. In addition, I got an insert size of zero. From a post a few years back (somewhere) I read that this indicates the alignment is unpaired or the mate reference ID is invalid.

    I've tried using FixMateInformation, which shuffled some of the information around but still left me with two mappings of what appears to be the same read.

    Anyone able to shed some light on this? I did a search but found very few threads related to bwa-mem, as it's a relatively new tool.

    Myron Peto
    Last edited by myronpeto; 05-13-2013, 01:21 PM. Reason: Minor formatting issue.
  • lh3
    Senior Member
    • Feb 2008
    • 686

    #2
    The 2nd read cannot be aligned as a whole, if, for example, it contains a long deletion or bridges a break point due to reference assembly or structural variations. Bwa-mem gives you the opportunity to identify such events. You can use "bwa mem -M". As the command line help says, this option is for picard/gatk compatibility.

    Comment

    • myronpeto
      Member
      • Sep 2011
      • 10

      #3
      Thanks for the response. I tried your suggestion and although it took a while to figure it out (30X whole genome samples) it did work. I'll know for future reference to use that parameter.

      Comment

      • bwubb
        Member
        • Jan 2012
        • 61

        #4
        I have been struggling with this error.

        I re-aligned with bwa mem using -M option. I assigned readgroups for each sequencing run before merging files. I still receive the 'Value was put into PairInfoMap more than once.' error

        I need to take a look at the reads themselves still, but Im in the process adding lane number info into the RGID field to distinguish reads from different lanes of the same sequencing run.

        If that does not work I am not sure what else I can try. Any thoughts? Thank you.

        Comment

        • lh3
          Senior Member
          • Feb 2008
          • 686

          #5
          You can drop all the secondary alignments in the file (also use the latest version). If that does not work, please show the part of SAM that causes the problem. Thanks.

          Comment

          • bwubb
            Member
            • Jan 2012
            • 61

            #6
            I dont think I have any. I used -M with bwa mem and I did not use -a to output all alignments.

            Here is an example:
            Code:
            Exception in thread "main" net.sf.picard.PicardException: Value was put into PairInfoMap more than once.  1: KTR308-FGC0298:HWI-ST970:298:C0MUAACXX:5:1310:1225:35905
            
            
            HWI-ST970:298:C0MUAACXX:5:1310:1225:35905       97      1       63484   0       100M    2       65480858        0       GTGATTTTCCTCGGGTTATTAAACTTGCATGCATGGACACTTATGGGCTAGAATTTGTGGTCACTGCCAACAGTGGATTCATATCGATGGGCACCTTCTT  @@@ADDFFHHHHHDGCFB@FGHEGCHIIIIIGGHIIIFFAFCHGCII@;FHIIHHGGFGGCHEHIGHI@DGEHHC@>B?@?BCAA?=A>@?==<?CCDDC  RG:Z:KTR308-FGC0298     NM:i:0  AS:i:100        XS:i:100
            HWI-ST970:298:C0MUAACXX:5:1310:1225:35905       177     1       63638   2       100M    2       120882793       0       AAATGATTTATCCAAAGCATTCTTCACTTCGTCGGCTCACATCACCGTAGTGGTTTTGTTTTTTGCTCCATGCATGTTTCTCTACGTGTGGCCTTTCCCT  >:CA4DCDDCACAA:>@CACCA@:>8B=;DBB@CCFEB?BEHHEBGHGECG@DF@HIHGJIHCGGEFECIHGBGF?C<EEAF>?EA<<+<:C;?DBD1@?  RG:Z:KTR308-FGC0298     NM:i:0  AS:i:100        XS:i:95

            picard-tools v 1.94

            I see they recently came out with 1.95 but I suspect I am not running into a bug. Ive analyzed some of this data a while back without issue. Only recently with additional sequencing data included.
            Last edited by bwubb; 07-17-2013, 02:41 PM.

            Comment

            • lh3
              Senior Member
              • Feb 2008
              • 686

              #7
              I think your read file contains reads with identical names. You should check that first.

              Comment

              • bwubb
                Member
                • Jan 2012
                • 61

                #8
                I do not understand how that could have happened though...

                Comment

                • lh3
                  Senior Member
                  • Feb 2008
                  • 686

                  #9
                  The two reads you are showing have different sequences. They cannot be the same read.

                  Comment

                  • bwubb
                    Member
                    • Jan 2012
                    • 61

                    #10
                    Yes that is the part I am confused about. I have checked the fastq files for that sample and run. There is only 1 read with that ID in the fastq, but two come up in the resulting sam file.

                    Perhaps I have set up my bwa mem command wrong?

                    Code:
                    bwa mem -M -t 16 ~/human_g1k_v37.fasta KTR308_FGC0298_5_1_GTCCGC.fastq.gz KTR308_FGC0298_5_2_GTCCGC.fastq.gz > DATA/KTR308.7.FGC0298.GTCCGC.sam

                    Comment

                    • dpryan
                      Devon Ryan
                      • Jul 2011
                      • 3478

                      #11
                      You should get two reads in the SAM file with the same name, one from each of the fastq files.

                      Comment

                      • lh3
                        Senior Member
                        • Feb 2008
                        • 686

                        #12
                        Originally posted by dpryan View Post
                        You should get two reads in the SAM file with the same name, one from each of the fastq files.
                        I am also thinking about this. Apparently bwubb's two read files are not paired together. John has provided a pull request to check read names, such that the users may know their input is wrong. Unfortunately, it is not in the released package.

                        Comment

                        • dpryan
                          Devon Ryan
                          • Jul 2011
                          • 3478

                          #13
                          Originally posted by lh3 View Post
                          I am also thinking about this. Apparently bwubb's two read files are not paired together. John has provided a pull request to check read names, such that the users may know their input is wrong. Unfortunately, it is not in the released package.
                          Ah, that'll cause no end of problems.

                          Comment

                          • bwubb
                            Member
                            • Jan 2012
                            • 61

                            #14
                            Have I done something wrong in the pairing?

                            I was use to using bwa aln to align each fastq and then use sampe to do the pairing. Once I ran into the picard-tools error, I went back and used bwa mem -M instead, but still am receiving the error.

                            If Im following correctly, I have the appropriate number of reads in my files. Am I missing an option or step for proper pairing?

                            After bwa mem alignment I import to bam and then sort, before Read groups and merge.

                            Edit:
                            I guess I should ask how the pairing is displayed in the sam files resulting from bwa mem. I do not know how to identify if my pairing is proper.
                            Last edited by bwubb; 07-18-2013, 07:37 AM.

                            Comment

                            • lh3
                              Senior Member
                              • Feb 2008
                              • 686

                              #15
                              The i-th read in your first file and the i-th read in the second must be in a read pair, i.e. they have the same read name.

                              Comment

                              Latest Articles

                              Collapse

                              ad_right_rmr

                              Collapse

                              News

                              Collapse

                              Topics Statistics Last Post
                              Started by SEQadmin2, Yesterday, 10:09 AM
                              0 responses
                              10 views
                              0 reactions
                              Last Post SEQadmin2  
                              Started by SEQadmin2, 06-04-2026, 08:59 AM
                              0 responses
                              20 views
                              0 reactions
                              Last Post SEQadmin2  
                              Started by SEQadmin2, 06-02-2026, 12:03 PM
                              0 responses
                              27 views
                              0 reactions
                              Last Post SEQadmin2  
                              Started by SEQadmin2, 06-02-2026, 11:40 AM
                              0 responses
                              21 views
                              0 reactions
                              Last Post SEQadmin2  
                              Working...