Seqanswers Leaderboard Ad

Collapse

Announcement

Collapse
No announcement yet.
X
 
  • Filter
  • Time
  • Show
Clear All
new posts

  • bwa/picard issue

    Hi,

    I have a pair of paired end fastq files (lets say forward.fq and reverse.fq) from which I clipped for the adapters, trimmed for quality, and removed barcodes and now have the individual (balanced) files. Off these, I took one of the pairs, lets say, "pe1" and "pe2" and converted to a .SAM file using bwa tool (after indexing corresponding genomic reference etc..).

    Then I used ViewSam module of Picard to write "Aligned" and "Unaligned" output files separately. After that when I tried to convert the Unaligned (or Aligned for that matter) file back to fastq (paired end) with "SamToFastq" module of picard, it spits out this error:

    Code:
    Exception in thread "main" net.sf.picard.PicardException: Found 185855 unpaired mates
            at net.sf.picard.sam.SamToFastq.doWork(SamToFastq.java:153)
            at net.sf.picard.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:156)
            at net.sf.picard.sam.SamToFastq.main(SamToFastq.java:112)
    I tried converting the sam file generated from bwa tool back to fastq and that works!! So this means the ISSUE is with the PICARD ViewSam? I tried validating the Unaligned sam file and got the error log file which consisted of entries like the one shown below 185855 times, which I guess is what the number in the first error represents. I understand what it says, maybe. But I don't understand how or why.
    Code:
    ERROR: Read name SOLEXA1_0503:5:25:1600:20006#0, Mate not found for paired read
    How do I fix this error or how did this error occur? I also tried "FixMateInformation" (module from Picard) for both Aligned and Unaligned reads separately and also tried to "MarkDuplicates" (module from Picard).

    I am using bwa 0.5.9 and Picard 1.4.1 (both latest from their download pages).

    I would appreciate any ideas on fixing this.
    Thank you.
    Last edited by cedance; 03-18-2011, 01:29 PM. Reason: Better Title

  • #2
    Have you tried running Picard with VALIDATION_STRINGENCY=LENIENT?

    Comment


    • #3
      Hi,

      I just tried it. Got the same error!
      Code:
      Exception in thread "main" net.sf.picard.PicardException: Found 185855 unpaired mates

      Comment


      • #4
        me too i've the same error.. even with VALIDATION_STRINGENCY=SILENT

        Comment


        • #5
          Hi


          Code:
          [Thu Oct 13 00:08:39 CST 2011] net.sf.picard.sam.SamToFastq done. Elapsed time: 0.01 minutes.
          Runtime.totalMemory()=12255232
          Exception in thread "main" net.sf.samtools.SAMFormatException: Error parsing SAM header. @SQ line missing LN tag. Line:
          @SQ     SN:S; ; Line number 2234
                  at net.sf.samtools.SAMTextHeaderCodec.reportErrorParsingLine(SAMTextHeaderCodec.java:230)
                  at net.sf.samtools.SAMTextHeaderCodec.access$100(SAMTextHeaderCodec.java:39)
                  at net.sf.samtools.SAMTextHeaderCodec$ParsedHeaderLine.requireTag(SAMTextHeaderCodec.java:306)
                  at net.sf.samtools.SAMTextHeaderCodec.parseSQLine(SAMTextHeaderCodec.java:199)
                  at net.sf.samtools.SAMTextHeaderCodec.decode(SAMTextHeaderCodec.java:96)
                  at net.sf.samtools.SAMTextReader.readHeader(SAMTextReader.java:198)
                  at net.sf.samtools.SAMTextReader.<init>(SAMTextReader.java:79)
                  at net.sf.samtools.SAMTextReader.<init>(SAMTextReader.java:88)
                  at net.sf.samtools.SAMFileReader.init(SAMFileReader.java:518)
                  at net.sf.samtools.SAMFileReader.<init>(SAMFileReader.java:142)
                  at net.sf.samtools.SAMFileReader.<init>(SAMFileReader.java:112)
                  at net.sf.picard.sam.SamToFastq.doWork(SamToFastq.java:121)
          I have checked the line 2234, the line indeed has the LN tag, how was the error produced? Thanks.

          Comment


          • #6
            I have the same error
            ' Found unmapped mates'
            Any solution to this? No fastq file was generated...

            Comment


            • #7
              I have the same issue as following,
              Any suggestion ?!

              INFO 2013-03-03 21:41:42 SamToFastq Processed 40,000,000 records.
              Elapsed time: 00:10:58s. Time for last 1,000,000: 14s. Last read position:
              chrX:151,532,960
              [Sun Mar 03 21:41:46 EST 2013] net.sf.picard.sam.SamToFastq done. Elapsed time:
              11.40 minutes.
              Runtime.totalMemory()=4834525184
              FAQ: http://sourceforge.net/apps/mediawik...itle=Main_Page
              Exception in thread "main" net.sf.picard.PicardException: Found 3494637 unpaired
              mates
              at net.sf.picard.sam.SamToFastq.doWork(SamToFastq.java:185)
              at net.sf.picard.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:177)
              at net.sf.picard.sam.SamToFastq.main(SamToFastq.java:119)
              ~

              Comment


              • #8
                So you just filtered using, say, the 4 flag? So you will have a lot of reads where one end is mapped, and the other is not, right? That seems to be what the software is complaining about.

                Comment


                • #9
                  A friend recently experienced similar issues with Picard. So he found the following tool to work for him:



                  You'll have to use it's bam2FastQ tool. Might be worth a try.

                  Comment


                  • #10
                    Do some accounting. Are all reads paired? or just some. If you have all reads paired, then try "samtools fixmate" first.

                    If that doesn't work, I hacked fixmate to deal with tougher situations ...

                    Code:
                    -bash-3.00$ cat bamfixunmaps.c
                    
                    #include <stdlib.h>
                    #include <string.h>
                    #include "bam.h"
                    
                    /*
                    usage: ./bamfixunmaps  inputsortebyname.bam outputfixed.bam
                    
                    remember to resort output bam  by genomic location and re-index 
                    
                    this takes sorted by NAME (samtools sort -n") input,
                    
                    how to compile on my machine, fix -I to point to your samtools directory and specify location of libbam.a  and bam.h  ...
                    
                    vi +19 bamfixunmaps.c ; gcc -Wall -O2 bamfixunmaps.c -o bamfixunmaps -I..  ../libbam.a -lz 
                    
                    Example fixes for this error message:
                    ERROR: Record 94341, Read name NCI-GA4_1:2:90:1195:129, Mate unmapped flag does not match read unmapped flag of mate
                    */
                    
                    
                    // currently, this function ONLY works if each read has one hit
                    void bam_mating_core(bamFile in, bamFile out)
                    {
                        bam_header_t *header;
                        bam1_t *b[2];
                        int curr, has_prev;
                    
                        header = bam_header_read(in);
                        bam_header_write(out, header);
                    
                        b[0] = bam_init1();
                        b[1] = bam_init1();
                        curr = 0; has_prev = 0;
                        while (bam_read1(in, b[curr]) >= 0) 
                        {
                        	bam1_t *cur = b[curr], *pre = b[1-curr];
                        	if ((cur->core.flag&(BAM_FUNMAP)) || (cur->core.tid < 0))
                        	{
                                 cur->core.qual = 0;
                                 cur->core.tid  = -1;
                                 cur->core.pos  = -1; // prints as 0 in sam
                                 cur->core.flag |= BAM_FUNMAP;  // turn on unmapped bit
                        	}
                        	if (has_prev) {
                        		if (strcmp(bam1_qname(cur), bam1_qname(pre)) == 0) { // identical pair name
                        			cur->core.mtid = pre->core.tid; cur->core.mpos = pre->core.pos;
                        			pre->core.mtid = cur->core.tid; pre->core.mpos = cur->core.pos;
                    // rpf
                        		if (pre->core.flag&BAM_FUNMAP) // pre is not mapped
                                    {   // set cur's "minfo"
                        		   cur->core.flag |= BAM_FMUNMAP;  // turn on mate unmapped bit
                                       cur->core.mtid  = -1;
                                       cur->core.mpos  = -1;
                                    }
                        		else
                                    {
                        		   cur->core.flag &= ~BAM_FMUNMAP; // turn off mate unmapped bit
                                    }
                        		if (cur->core.flag&BAM_FUNMAP)     // cur is not NOT mapped
                                    {
                        		   pre->core.flag |= BAM_FMUNMAP;  // turn on unmapped bit
                                       pre->core.mtid  = -1;
                                       pre->core.mpos  = -1;
                                    }
                        		else
                                    {
                        		   pre->core.flag &= ~BAM_FMUNMAP;  // turn off unmapped bit
                                    }
                    // rpf 
                        			if (pre->core.tid == cur->core.tid && !(cur->core.flag&(BAM_FUNMAP|BAM_FMUNMAP))
                        				&& !(pre->core.flag&(BAM_FUNMAP|BAM_FMUNMAP)))
                        			{
                        				uint32_t cur5, pre5;
                        				cur5 = (cur->core.flag&BAM_FREVERSE)? bam_calend(&cur->core, bam1_cigar(cur)) : cur->core.pos;
                        				pre5 = (pre->core.flag&BAM_FREVERSE)? bam_calend(&pre->core, bam1_cigar(pre)) : pre->core.pos;
                        				cur->core.isize = pre5 - cur5; pre->core.isize = cur5 - pre5;
                        			} else cur->core.isize = pre->core.isize = 0;
                        			if (pre->core.flag&BAM_FREVERSE) cur->core.flag |= BAM_FMREVERSE;
                        			else cur->core.flag &= ~BAM_FMREVERSE;
                        			if (cur->core.flag&BAM_FREVERSE) pre->core.flag |= BAM_FMREVERSE;
                        			else pre->core.flag &= ~BAM_FMREVERSE;
                        			if (cur->core.flag & BAM_FUNMAP) { pre->core.flag |= BAM_FMUNMAP; pre->core.flag &= ~BAM_FPROPER_PAIR; }
                        			if (pre->core.flag & BAM_FUNMAP) { cur->core.flag |= BAM_FMUNMAP; cur->core.flag &= ~BAM_FPROPER_PAIR; }
                        			bam_write1(out, pre);
                        			bam_write1(out, cur);
                        			has_prev = 0;
                        		} else { // unpaired or singleton
                        			pre->core.mtid = -1; pre->core.mpos = -1; pre->core.isize = 0;
                        			if (pre->core.flag & BAM_FPAIRED) {
                        				pre->core.flag |= BAM_FMUNMAP;
                        				pre->core.flag &= ~BAM_FMREVERSE & ~BAM_FPROPER_PAIR;
                        			}
                        			bam_write1(out, pre);
                        		}
                        	} else has_prev = 1;
                        	curr = 1 - curr;
                        }
                        if (has_prev) bam_write1(out, b[1-curr]);
                        bam_header_destroy(header);
                        bam_destroy1(b[0]);
                        bam_destroy1(b[1]);
                    
                    //    fprintf(stderr,"rpf message %ld fixed by set unmap to qual zero \n", count_unmap_qualNZs);
                    }
                    
                    int main(int argc, char *argv[])
                    {
                        bamFile in, out;
                        if (argc < 3) {
                        	fprintf(stderr, "bamfixunmaps <in.nameSrt.bam> <out.nameSrt.bam>\n");
                        	return 1;
                        }
                        in = (strcmp(argv[1], "-") == 0)? bam_dopen(fileno(stdin), "r") : bam_open(argv[1], "r");
                        out = (strcmp(argv[2], "-") == 0)? bam_dopen(fileno(stdout), "w") : bam_open(argv[2], "w");
                        bam_mating_core(in, out);
                        bam_close(in); bam_close(out);
                        return 0;
                    }

                    Comment


                    • #11
                      picard - samtools discrepancy

                      I am also having this issue, and it appears to be a headache for many people.

                      I am not interested in paired-end information, since I am comparing unmapped reads to a library of de-novo assembled repeats identified from the raw reads (how much of the mapping issue is due to repetitive elements sampled in the reads?)

                      1) samtools view -f4 file.bam > unmapped.sam =30,161,064 unmapped reads
                      2) samtools view -f1 unmapped.sam = 29,415,609 paired, unmapped reads
                      3) samtools view -F3 unmapped.sam = 752,455 unpaired, unmapped reads

                      29,415,609 + 752,455 = 30,161,064 unmapped reads. Great, all accounted for.

                      4) picard-tools SamToFastq on #3 gives me 752,455 reads. Great.
                      5) picard-tools SamToFastq on #2 gives me 25,317,354 reads and the below error:

                      SAM validation error: ERROR: Found 4,098,255 unpaired mates.

                      Why on Earth does samtools tell me I have 29,415,609 paired, unmapped reads while Picard tools tells me that 4,098,255 of those reads are actually UNPAIRED?

                      coffee break.

                      Comment


                      • #12
                        Originally posted by etwatson View Post
                        Why on Earth does samtools tell me I have 29,415,609 paired, unmapped reads while Picard tools tells me that 4,098,255 of those reads are actually UNPAIRED?

                        coffee break.
                        Ok, after my coffee break, I decided to lean on my awk crutch.

                        This did the job:
                        Code:
                        samtools -f4 file.srt.bam | awk 'BEGIN{OFS="\n"}{print "@"$1,$10,"+",$11}'> reads.fastq

                        Comment


                        • #13
                          Originally posted by etwatson View Post
                          This did the job:
                          Code:
                          samtools -f4 file.srt.bam | awk 'BEGIN{OFS="\n"}{print "@"$1,$10,"+",$11}'> reads.fastq
                          Just a caution: You can't get the original reads back by just extracting the 10th column of the SAM.

                          2 issues result in the 10th column sequences not corresponding precisely to the original reads. Hard clipping results in the sequence stored in the SAM file to be truncated, compared to the original FASTQ read. And secondary alignments result in the same read appearing more than once in the SAM file. See this figure from Heng Li's paper for details on clipping and how reads are stored:
                          Last edited by nstoler; 03-05-2015, 11:46 AM.

                          Comment


                          • #14
                            Originally posted by nstoler View Post
                            Just a caution: Hard clipping results in the sequence stored in the SAM file to be truncated, compared to the original FASTQ read. And secondary alignments result in the same read appearing more than once in the SAM file.
                            Now I'm confused. I am using the -f4 flag to extract unmapped reads. Are unmapped reads altered?

                            Comment


                            • #15
                              Originally posted by etwatson View Post
                              Now I'm confused. I am using the -f4 flag to extract unmapped reads. Are unmapped reads altered?
                              The SAM format does not store reads; it stores alignments. The original reads can be reconstructed from the alignments, but it's not as simple as just extracting the sequence in column 10.

                              For a good example why, take a look at this figure, paying attention to "hard clipping":

                              Comment

                              Latest Articles

                              Collapse

                              • seqadmin
                                Strategies for Sequencing Challenging Samples
                                by seqadmin


                                Despite advancements in sequencing platforms and related sample preparation technologies, certain sample types continue to present significant challenges that can compromise sequencing results. Pedro Echave, Senior Manager of the Global Business Segment at Revvity, explained that the success of a sequencing experiment ultimately depends on the amount and integrity of the nucleic acid template (RNA or DNA) obtained from a sample. “The better the quality of the nucleic acid isolated...
                                03-22-2024, 06:39 AM
                              • seqadmin
                                Techniques and Challenges in Conservation Genomics
                                by seqadmin



                                The field of conservation genomics centers on applying genomics technologies in support of conservation efforts and the preservation of biodiversity. This article features interviews with two researchers who showcase their innovative work and highlight the current state and future of conservation genomics.

                                Avian Conservation
                                Matthew DeSaix, a recent doctoral graduate from Kristen Ruegg’s lab at The University of Colorado, shared that most of his research...
                                03-08-2024, 10:41 AM

                              ad_right_rmr

                              Collapse

                              News

                              Collapse

                              Topics Statistics Last Post
                              Started by seqadmin, Yesterday, 06:37 PM
                              0 responses
                              8 views
                              0 likes
                              Last Post seqadmin  
                              Started by seqadmin, Yesterday, 06:07 PM
                              0 responses
                              8 views
                              0 likes
                              Last Post seqadmin  
                              Started by seqadmin, 03-22-2024, 10:03 AM
                              0 responses
                              49 views
                              0 likes
                              Last Post seqadmin  
                              Started by seqadmin, 03-21-2024, 07:32 AM
                              0 responses
                              67 views
                              0 likes
                              Last Post seqadmin  
                              Working...
                              X