Seqanswers Leaderboard Ad

Collapse

Announcement

Collapse
No announcement yet.
X
 
  • Filter
  • Time
  • Show
Clear All
new posts

  • Odd bwa-mem alignment

    I'm trying to align a whole genome sample using bwa-mem. I've previously aligned using bwa aln and bwa sampe and it went through fine. The command I'm using is:

    bwa mem -t 8 -R '@RG\tID:Label\tLB:Label\tSM:Label' reference_v37.fasta read_1.fq.gz read_2.fq.gz > Sample.sam

    Followed by:

    samtools import reference_v37.fasta.fai Sample.sam Sample_unsorted.bam
    samtools sort Sample_unsorted.bam Sample_sorted

    The error I got from picard MarkDuplicates was the following:

    Exception in thread "main" net.sf.picard.PicardException: Value was put into PairInfoMap more than once. 1: 0749_7455:HS2000-1111A_168:5:1203:11571:84654
    at net.sf.picard.sam.CoordinateSortedPairInfoMap.ensureSequenceLoaded(CoordinateSortedPairInfoMap.java:124)
    at net.sf.picard.sam.CoordinateSortedPairInfoMap.remove(CoordinateSortedPairInfoMap.java:78)
    at net.sf.picard.sam.DiskReadEndsMap.remove(DiskReadEndsMap.java:61)
    at net.sf.picard.sam.MarkDuplicates.buildSortedReadEndLists(MarkDuplicates.java:294)
    at net.sf.picard.sam.MarkDuplicates.doWork(MarkDuplicates.java:117)
    at net.sf.picard.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:169)
    at net.sf.picard.sam.MarkDuplicates.main(MarkDuplicates.java:101)



    I found the offending reads from the bam file:

    HS2000-1111A_168:5:1203:11571:84654 81 2 243164263 1 100M 1 231669 0 AGCCTATGTGATGACTACATGTCGTGCGGGATCCTGGATGGGATCCTGGGTCAGAGTAAGATAGAACTAAGGGAATCCAAATGAAATATGAACTTTAGTT DDDDDCCCCDEEDDCEDDB?CDEE8IJHFBIJIGBBJIHDHGHGF=JIIG@HIJIJJGJJIGDIHEB?IGIHIHEEGJJJHEGHEG>DHFDADDBFD@C@ NM:i:0 AS:i:100 XS:i:95 RG:Z:0749_7455
    HS2000-1111A_168:5:1203:11571:84654 161 1 231669 0 51M49S 2 243164263 0 CAGAAAGTAAGTACTAAAAAAATTAAAATATATCAAACAAAAATAAAAGCCTAGAAATCTCCTTTGCAAAAGAATTCCAAATAACTGATGTAGACACTCA @@@FDFF2CFHFFEGGIEEHIJJIJJJIIGGCGGGGIJJGDIJIG>DCGEFGEHIIEHIHGGHIGIIHEEH@DFFEF@CA:@@AACCDCC@CDEDDDDDC NM:i:0 AS:i:51 XS:i:51 RG:Z:0749_7455 XP:Z:1,+231860,51S49M,0,0;
    HS2000-1111A_168:5:1203:11571:84654 161 1 231860 0 51S49M 2 243164263 0 CAGAAAGTAAGTACTAAAAAAATTAAAATATATCAAACAAAAATAAAAGCCTAGAAATCTCCTTTGCAAAAGAATTCCAAATAACTGATGTAGACACTCA @@@FDFF2CFHFFEGGIEEHIJJIJJJIIGGCGGGGIJJGDIJIG>DCGEFGEHIIEHIHGGHIGIIHEEH@DFFEF@CA:@@AACCDCC@CDEDDDDDC NM:i:0 AS:i:49 XS:i:49 RG:Z:0749_7455 XP:Z:1,+231669,51M49S,0,0;

    It seems that bwa-mem aligned the second of the pair twice. In addition, I got an insert size of zero. From a post a few years back (somewhere) I read that this indicates the alignment is unpaired or the mate reference ID is invalid.

    I've tried using FixMateInformation, which shuffled some of the information around but still left me with two mappings of what appears to be the same read.

    Anyone able to shed some light on this? I did a search but found very few threads related to bwa-mem, as it's a relatively new tool.

    Myron Peto
    Last edited by myronpeto; 05-13-2013, 01:21 PM. Reason: Minor formatting issue.

  • #2
    The 2nd read cannot be aligned as a whole, if, for example, it contains a long deletion or bridges a break point due to reference assembly or structural variations. Bwa-mem gives you the opportunity to identify such events. You can use "bwa mem -M". As the command line help says, this option is for picard/gatk compatibility.

    Comment


    • #3
      Thanks for the response. I tried your suggestion and although it took a while to figure it out (30X whole genome samples) it did work. I'll know for future reference to use that parameter.

      Comment


      • #4
        I have been struggling with this error.

        I re-aligned with bwa mem using -M option. I assigned readgroups for each sequencing run before merging files. I still receive the 'Value was put into PairInfoMap more than once.' error

        I need to take a look at the reads themselves still, but Im in the process adding lane number info into the RGID field to distinguish reads from different lanes of the same sequencing run.

        If that does not work I am not sure what else I can try. Any thoughts? Thank you.

        Comment


        • #5
          You can drop all the secondary alignments in the file (also use the latest version). If that does not work, please show the part of SAM that causes the problem. Thanks.

          Comment


          • #6
            I dont think I have any. I used -M with bwa mem and I did not use -a to output all alignments.

            Here is an example:
            Code:
            Exception in thread "main" net.sf.picard.PicardException: Value was put into PairInfoMap more than once.  1: KTR308-FGC0298:HWI-ST970:298:C0MUAACXX:5:1310:1225:35905
            
            
            HWI-ST970:298:C0MUAACXX:5:1310:1225:35905       97      1       63484   0       100M    2       65480858        0       GTGATTTTCCTCGGGTTATTAAACTTGCATGCATGGACACTTATGGGCTAGAATTTGTGGTCACTGCCAACAGTGGATTCATATCGATGGGCACCTTCTT  @@@ADDFFHHHHHDGCFB@FGHEGCHIIIIIGGHIIIFFAFCHGCII@;FHIIHHGGFGGCHEHIGHI@DGEHHC@>B?@?BCAA?=A>@?==<?CCDDC  RG:Z:KTR308-FGC0298     NM:i:0  AS:i:100        XS:i:100
            HWI-ST970:298:C0MUAACXX:5:1310:1225:35905       177     1       63638   2       100M    2       120882793       0       AAATGATTTATCCAAAGCATTCTTCACTTCGTCGGCTCACATCACCGTAGTGGTTTTGTTTTTTGCTCCATGCATGTTTCTCTACGTGTGGCCTTTCCCT  >:CA4DCDDCACAA:>@CACCA@:>8B=;DBB@CCFEB?BEHHEBGHGECG@DF@HIHGJIHCGGEFECIHGBGF?C<EEAF>?EA<<+<:C;?DBD1@?  RG:Z:KTR308-FGC0298     NM:i:0  AS:i:100        XS:i:95

            picard-tools v 1.94

            I see they recently came out with 1.95 but I suspect I am not running into a bug. Ive analyzed some of this data a while back without issue. Only recently with additional sequencing data included.
            Last edited by bwubb; 07-17-2013, 02:41 PM.

            Comment


            • #7
              I think your read file contains reads with identical names. You should check that first.

              Comment


              • #8
                I do not understand how that could have happened though...

                Comment


                • #9
                  The two reads you are showing have different sequences. They cannot be the same read.

                  Comment


                  • #10
                    Yes that is the part I am confused about. I have checked the fastq files for that sample and run. There is only 1 read with that ID in the fastq, but two come up in the resulting sam file.

                    Perhaps I have set up my bwa mem command wrong?

                    Code:
                    bwa mem -M -t 16 ~/human_g1k_v37.fasta KTR308_FGC0298_5_1_GTCCGC.fastq.gz KTR308_FGC0298_5_2_GTCCGC.fastq.gz > DATA/KTR308.7.FGC0298.GTCCGC.sam

                    Comment


                    • #11
                      You should get two reads in the SAM file with the same name, one from each of the fastq files.

                      Comment


                      • #12
                        Originally posted by dpryan View Post
                        You should get two reads in the SAM file with the same name, one from each of the fastq files.
                        I am also thinking about this. Apparently bwubb's two read files are not paired together. John has provided a pull request to check read names, such that the users may know their input is wrong. Unfortunately, it is not in the released package.

                        Comment


                        • #13
                          Originally posted by lh3 View Post
                          I am also thinking about this. Apparently bwubb's two read files are not paired together. John has provided a pull request to check read names, such that the users may know their input is wrong. Unfortunately, it is not in the released package.
                          Ah, that'll cause no end of problems.

                          Comment


                          • #14
                            Have I done something wrong in the pairing?

                            I was use to using bwa aln to align each fastq and then use sampe to do the pairing. Once I ran into the picard-tools error, I went back and used bwa mem -M instead, but still am receiving the error.

                            If Im following correctly, I have the appropriate number of reads in my files. Am I missing an option or step for proper pairing?

                            After bwa mem alignment I import to bam and then sort, before Read groups and merge.

                            Edit:
                            I guess I should ask how the pairing is displayed in the sam files resulting from bwa mem. I do not know how to identify if my pairing is proper.
                            Last edited by bwubb; 07-18-2013, 07:37 AM.

                            Comment


                            • #15
                              The i-th read in your first file and the i-th read in the second must be in a read pair, i.e. they have the same read name.

                              Comment

                              Latest Articles

                              Collapse

                              • seqadmin
                                Strategies for Sequencing Challenging Samples
                                by seqadmin


                                Despite advancements in sequencing platforms and related sample preparation technologies, certain sample types continue to present significant challenges that can compromise sequencing results. Pedro Echave, Senior Manager of the Global Business Segment at Revvity, explained that the success of a sequencing experiment ultimately depends on the amount and integrity of the nucleic acid template (RNA or DNA) obtained from a sample. “The better the quality of the nucleic acid isolated...
                                03-22-2024, 06:39 AM
                              • seqadmin
                                Techniques and Challenges in Conservation Genomics
                                by seqadmin



                                The field of conservation genomics centers on applying genomics technologies in support of conservation efforts and the preservation of biodiversity. This article features interviews with two researchers who showcase their innovative work and highlight the current state and future of conservation genomics.

                                Avian Conservation
                                Matthew DeSaix, a recent doctoral graduate from Kristen Ruegg’s lab at The University of Colorado, shared that most of his research...
                                03-08-2024, 10:41 AM

                              ad_right_rmr

                              Collapse

                              News

                              Collapse

                              Topics Statistics Last Post
                              Started by seqadmin, Yesterday, 06:37 PM
                              0 responses
                              8 views
                              0 likes
                              Last Post seqadmin  
                              Started by seqadmin, Yesterday, 06:07 PM
                              0 responses
                              8 views
                              0 likes
                              Last Post seqadmin  
                              Started by seqadmin, 03-22-2024, 10:03 AM
                              0 responses
                              49 views
                              0 likes
                              Last Post seqadmin  
                              Started by seqadmin, 03-21-2024, 07:32 AM
                              0 responses
                              66 views
                              0 likes
                              Last Post seqadmin  
                              Working...
                              X