  • cedance
    Senior Member
    • Feb 2011
    • 108

    bwa/picard issue

    Hi,

    I have a pair of paired-end fastq files (let's say forward.fq and reverse.fq) from which I clipped the adapters, trimmed for quality, and removed barcodes, and I now have the individual (balanced) files. Of these, I took one pair, let's say "pe1" and "pe2", and aligned it with bwa to produce a .SAM file (after indexing the corresponding genomic reference etc.).

    Then I used ViewSam module of Picard to write "Aligned" and "Unaligned" output files separately. After that when I tried to convert the Unaligned (or Aligned for that matter) file back to fastq (paired end) with "SamToFastq" module of picard, it spits out this error:

    Code:
    Exception in thread "main" net.sf.picard.PicardException: Found 185855 unpaired mates
            at net.sf.picard.sam.SamToFastq.doWork(SamToFastq.java:153)
            at net.sf.picard.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:156)
            at net.sf.picard.sam.SamToFastq.main(SamToFastq.java:112)
    I tried converting the sam file generated from bwa tool back to fastq and that works!! So this means the ISSUE is with the PICARD ViewSam? I tried validating the Unaligned sam file and got the error log file which consisted of entries like the one shown below 185855 times, which I guess is what the number in the first error represents. I understand what it says, maybe. But I don't understand how or why.
    Code:
    ERROR: Read name SOLEXA1_0503:5:25:1600:20006#0, Mate not found for paired read
    How do I fix this error or how did this error occur? I also tried "FixMateInformation" (module from Picard) for both Aligned and Unaligned reads separately and also tried to "MarkDuplicates" (module from Picard).

    I am using bwa 0.5.9 and Picard 1.4.1 (both latest from their download pages).

    I would appreciate any ideas on fixing this.
    Thank you.
    Last edited by cedance; 03-18-2011, 01:29 PM. Reason: Better Title
  • n00c
    Member
    • Nov 2009
    • 12

    #2
    Have you tried running Picard with VALIDATION_STRINGENCY=LENIENT?

    Comment

    • cedance
      Senior Member
      • Feb 2011
      • 108

      #3
      Hi,

      I just tried it. Got the same error!
      Code:
      Exception in thread "main" net.sf.picard.PicardException: Found 185855 unpaired mates

      Comment

      • Seq84
        Member
        • Feb 2011
        • 19

        #4
        Me too, I get the same error, even with VALIDATION_STRINGENCY=SILENT.

        Comment

        • pengchy
          Senior Member
          • Feb 2009
          • 116

          #5
          Hi


          Code:
          [Thu Oct 13 00:08:39 CST 2011] net.sf.picard.sam.SamToFastq done. Elapsed time: 0.01 minutes.
          Runtime.totalMemory()=12255232
          Exception in thread "main" net.sf.samtools.SAMFormatException: Error parsing SAM header. @SQ line missing LN tag. Line:
          @SQ     SN:S; ; Line number 2234
                  at net.sf.samtools.SAMTextHeaderCodec.reportErrorParsingLine(SAMTextHeaderCodec.java:230)
                  at net.sf.samtools.SAMTextHeaderCodec.access$100(SAMTextHeaderCodec.java:39)
                  at net.sf.samtools.SAMTextHeaderCodec$ParsedHeaderLine.requireTag(SAMTextHeaderCodec.java:306)
                  at net.sf.samtools.SAMTextHeaderCodec.parseSQLine(SAMTextHeaderCodec.java:199)
                  at net.sf.samtools.SAMTextHeaderCodec.decode(SAMTextHeaderCodec.java:96)
                  at net.sf.samtools.SAMTextReader.readHeader(SAMTextReader.java:198)
                  at net.sf.samtools.SAMTextReader.<init>(SAMTextReader.java:79)
                  at net.sf.samtools.SAMTextReader.<init>(SAMTextReader.java:88)
                  at net.sf.samtools.SAMFileReader.init(SAMFileReader.java:518)
                  at net.sf.samtools.SAMFileReader.<init>(SAMFileReader.java:142)
                  at net.sf.samtools.SAMFileReader.<init>(SAMFileReader.java:112)
                  at net.sf.picard.sam.SamToFastq.doWork(SamToFastq.java:121)
          I have checked the line 2234, the line indeed has the LN tag, how was the error produced? Thanks.

          Comment

          • nupurgupta
            Member
            • Aug 2010
            • 29

            #6
            I have the same error
            ' Found unmapped mates'
            Any solution to this? No fastq file was generated...

            Comment

            • Naarkhoo
              Member
              • Jan 2013
              • 11

              #7
              I have the same issue, as shown below.
              Any suggestions?

              INFO 2013-03-03 21:41:42 SamToFastq Processed 40,000,000 records. Elapsed time: 00:10:58s. Time for last 1,000,000: 14s. Last read position: chrX:151,532,960
              [Sun Mar 03 21:41:46 EST 2013] net.sf.picard.sam.SamToFastq done. Elapsed time: 11.40 minutes.
              Runtime.totalMemory()=4834525184
              FAQ: http://sourceforge.net/apps/mediawik...itle=Main_Page
              Exception in thread "main" net.sf.picard.PicardException: Found 3494637 unpaired mates
                      at net.sf.picard.sam.SamToFastq.doWork(SamToFastq.java:185)
                      at net.sf.picard.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:177)
                      at net.sf.picard.sam.SamToFastq.main(SamToFastq.java:119)

              Comment

              • swbarnes2
                Senior Member
                • May 2008
                • 910

                #8
                So you just filtered using, say, the 4 flag? So you will have a lot of reads where one end is mapped, and the other is not, right? That seems to be what the software is complaining about.
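To put numbers on the point above, here is a minimal Python sketch (not Picard or samtools code) that reads the FLAG column of unmapped paired reads and checks the mate-unmapped bit (0x8). A read with 0x4 set but 0x8 clear has a mapped mate, so a filter on the 4 flag alone keeps the read and leaves its mate behind. The function name `count_split_pairs` and the example records are made up for illustration.

```python
# SAM FLAG bits (values from the SAM specification).
PAIRED = 0x1
UNMAPPED = 0x4
MATE_UNMAPPED = 0x8


def count_split_pairs(sam_lines):
    """Count unmapped paired reads by whether their mate is unmapped too.

    Returns (kept_whole, split): reads whose mate would also pass a
    filter on the 4 flag, and reads whose mate is mapped and would not.
    """
    kept_whole = 0
    split = 0
    for line in sam_lines:
        if line.startswith("@"):  # header line
            continue
        flag = int(line.split("\t")[1])
        if not ((flag & PAIRED) and (flag & UNMAPPED)):
            continue
        if flag & MATE_UNMAPPED:
            kept_whole += 1
        else:
            split += 1
    return kept_whole, split


# Hypothetical records: r1 is a pair with both ends unmapped (flags 77/141),
# r2 has an unmapped first read (69) and a mapped second read (137).
EXAMPLE = [
    "@HD\tVN:1.0\tSO:unsorted",
    "r1\t77\t*\t0\t0\t*\t*\t0\t0\tACGT\tIIII",
    "r1\t141\t*\t0\t0\t*\t*\t0\t0\tTTGA\tIIII",
    "r2\t69\tchr1\t100\t0\t*\t=\t100\t0\tACGT\tIIII",
    "r2\t137\tchr1\t100\t37\t4M\t=\t100\t0\tGGCC\tIIII",
]

kept_whole, split = count_split_pairs(EXAMPLE)
print(kept_whole, split)
```

If the second number is non-zero for your file, those reads are the likely "unpaired mates". Filtering with `samtools view -f 12` (read unmapped and mate unmapped) instead of the 4 flag alone should keep both ends of each pair together.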

                Comment

                • aggp11
                  Member
                  • Jun 2011
                  • 87

                  #9
                  A friend recently experienced similar issues with Picard. So he found the following tool to work for him:



                  You'll have to use its bam2FastQ tool. Might be worth a try.

                  Comment

                  • Richard Finney
                    Senior Member
                    • Feb 2009
                    • 701

                    #10
                    Do some accounting. Are all reads paired, or just some? If all reads are paired, then try "samtools fixmate" first.

                    If that doesn't work, I hacked fixmate to deal with tougher situations ...

                    Code:
                    -bash-3.00$ cat bamfixunmaps.c
                    
                    #include <stdio.h>   /* fprintf, fileno, stdin, stdout */
                    #include <stdlib.h>
                    #include <string.h>
                    #include "bam.h"
                    
                    /*
                    usage: ./bamfixunmaps  inputsortedbyname.bam outputfixed.bam
                    
                    remember to re-sort the output bam by genomic location and re-index
                    
                    this takes input sorted by NAME ("samtools sort -n"),
                    
                    how to compile on my machine: fix -I to point to your samtools directory and specify the location of libbam.a and bam.h ...
                    
                    vi +19 bamfixunmaps.c ; gcc -Wall -O2 bamfixunmaps.c -o bamfixunmaps -I..  ../libbam.a -lz 
                    
                    Example fixes for this error message:
                    ERROR: Record 94341, Read name NCI-GA4_1:2:90:1195:129, Mate unmapped flag does not match read unmapped flag of mate
                    */
                    
                    
                    // currently, this function ONLY works if each read has one hit
                    void bam_mating_core(bamFile in, bamFile out)
                    {
                        bam_header_t *header;
                        bam1_t *b[2];
                        int curr, has_prev;
                    
                        header = bam_header_read(in);
                        bam_header_write(out, header);
                    
                        b[0] = bam_init1();
                        b[1] = bam_init1();
                        curr = 0; has_prev = 0;
                        while (bam_read1(in, b[curr]) >= 0) 
                        {
                        	bam1_t *cur = b[curr], *pre = b[1-curr];
                        	if ((cur->core.flag&(BAM_FUNMAP)) || (cur->core.tid < 0))
                        	{
                                 cur->core.qual = 0;
                                 cur->core.tid  = -1;
                                 cur->core.pos  = -1; // prints as 0 in sam
                                 cur->core.flag |= BAM_FUNMAP;  // turn on unmapped bit
                        	}
                        	if (has_prev) {
                        		if (strcmp(bam1_qname(cur), bam1_qname(pre)) == 0) { // identical pair name
                        			cur->core.mtid = pre->core.tid; cur->core.mpos = pre->core.pos;
                        			pre->core.mtid = cur->core.tid; pre->core.mpos = cur->core.pos;
                    // rpf
                        		if (pre->core.flag&BAM_FUNMAP) // pre is not mapped
                                    {   // set cur's "minfo"
                        		   cur->core.flag |= BAM_FMUNMAP;  // turn on mate unmapped bit
                                       cur->core.mtid  = -1;
                                       cur->core.mpos  = -1;
                                    }
                        		else
                                    {
                        		   cur->core.flag &= ~BAM_FMUNMAP; // turn off mate unmapped bit
                                    }
                        		if (cur->core.flag&BAM_FUNMAP)     // cur is not mapped
                                    {   // set pre's mate info
                        		   pre->core.flag |= BAM_FMUNMAP;  // turn on mate unmapped bit
                                       pre->core.mtid  = -1;
                                       pre->core.mpos  = -1;
                                    }
                        		else
                                    {
                        		   pre->core.flag &= ~BAM_FMUNMAP;  // turn off mate unmapped bit
                                    }
                    // rpf 
                        			if (pre->core.tid == cur->core.tid && !(cur->core.flag&(BAM_FUNMAP|BAM_FMUNMAP))
                        				&& !(pre->core.flag&(BAM_FUNMAP|BAM_FMUNMAP)))
                        			{
                        				uint32_t cur5, pre5;
                        				cur5 = (cur->core.flag&BAM_FREVERSE)? bam_calend(&cur->core, bam1_cigar(cur)) : cur->core.pos;
                        				pre5 = (pre->core.flag&BAM_FREVERSE)? bam_calend(&pre->core, bam1_cigar(pre)) : pre->core.pos;
                        				cur->core.isize = pre5 - cur5; pre->core.isize = cur5 - pre5;
                        			} else cur->core.isize = pre->core.isize = 0;
                        			if (pre->core.flag&BAM_FREVERSE) cur->core.flag |= BAM_FMREVERSE;
                        			else cur->core.flag &= ~BAM_FMREVERSE;
                        			if (cur->core.flag&BAM_FREVERSE) pre->core.flag |= BAM_FMREVERSE;
                        			else pre->core.flag &= ~BAM_FMREVERSE;
                        			if (cur->core.flag & BAM_FUNMAP) { pre->core.flag |= BAM_FMUNMAP; pre->core.flag &= ~BAM_FPROPER_PAIR; }
                        			if (pre->core.flag & BAM_FUNMAP) { cur->core.flag |= BAM_FMUNMAP; cur->core.flag &= ~BAM_FPROPER_PAIR; }
                        			bam_write1(out, pre);
                        			bam_write1(out, cur);
                        			has_prev = 0;
                        		} else { // unpaired or singleton
                        			pre->core.mtid = -1; pre->core.mpos = -1; pre->core.isize = 0;
                        			if (pre->core.flag & BAM_FPAIRED) {
                        				pre->core.flag |= BAM_FMUNMAP;
                        				pre->core.flag &= ~BAM_FMREVERSE & ~BAM_FPROPER_PAIR;
                        			}
                        			bam_write1(out, pre);
                        		}
                        	} else has_prev = 1;
                        	curr = 1 - curr;
                        }
                        if (has_prev) bam_write1(out, b[1-curr]);
                        bam_header_destroy(header);
                        bam_destroy1(b[0]);
                        bam_destroy1(b[1]);
                    
                    //    fprintf(stderr,"rpf message %ld fixed by set unmap to qual zero \n", count_unmap_qualNZs);
                    }
                    
                    int main(int argc, char *argv[])
                    {
                        bamFile in, out;
                        if (argc < 3) {
                        	fprintf(stderr, "bamfixunmaps <in.nameSrt.bam> <out.nameSrt.bam>\n");
                        	return 1;
                        }
                        in = (strcmp(argv[1], "-") == 0)? bam_dopen(fileno(stdin), "r") : bam_open(argv[1], "r");
                        out = (strcmp(argv[2], "-") == 0)? bam_dopen(fileno(stdout), "w") : bam_open(argv[2], "w");
                        bam_mating_core(in, out);
                        bam_close(in); bam_close(out);
                        return 0;
                    }
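The "do some accounting" step at the top of this post can be sketched in Python before reaching for fixmate or the C program. This is an assumption-laden illustration, not part of samtools or Picard: it reads SAM text, ignores secondary and supplementary records, and reports read names for which only one mate is present. `find_orphans` and the example records are hypothetical.

```python
from collections import defaultdict

# SAM FLAG bits (values from the SAM specification).
PAIRED = 0x1
FIRST_IN_PAIR = 0x40
SECOND_IN_PAIR = 0x80
SECONDARY = 0x100
SUPPLEMENTARY = 0x800


def find_orphans(sam_lines):
    """Return sorted names of paired reads with only one mate in the input."""
    seen = defaultdict(int)  # read name -> bitmask (1 = first seen, 2 = second seen)
    for line in sam_lines:
        if line.startswith("@"):  # header line
            continue
        fields = line.rstrip("\n").split("\t")
        name, flag = fields[0], int(fields[1])
        if not (flag & PAIRED):
            continue  # single-end reads have no mate to look for
        if flag & (SECONDARY | SUPPLEMENTARY):
            continue  # extra alignments of a read already counted
        if flag & FIRST_IN_PAIR:
            seen[name] |= 1
        if flag & SECOND_IN_PAIR:
            seen[name] |= 2
    return sorted(name for name, mask in seen.items() if mask != 3)


# Hypothetical input: r1 has both mates, r2 has only its first read.
EXAMPLE = [
    "r1\t77\t*\t0\t0\t*\t*\t0\t0\tACGT\tIIII",
    "r1\t141\t*\t0\t0\t*\t*\t0\t0\tTTGA\tIIII",
    "r2\t69\tchr1\t100\t0\t*\t=\t100\t0\tACGT\tIIII",
]

print(find_orphans(EXAMPLE))
```

The dictionary holds one entry per read name, so for tens of millions of reads it may be more practical to name-sort the file first and compare adjacent records, as the C program above does.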

                    Comment

                    • etwatson
                      Member
                      • Jun 2012
                      • 18

                      #11
                      picard - samtools discrepancy

                      I am also having this issue, and it appears to be a headache for many people.

                      I am not interested in paired-end information, since I am comparing unmapped reads to a library of de-novo assembled repeats identified from the raw reads (how much of the mapping issue is due to repetitive elements sampled in the reads?)

                      1) samtools view -f4 file.bam > unmapped.sam = 30,161,064 unmapped reads
                      2) samtools view -f1 unmapped.sam = 29,415,609 paired, unmapped reads
                      3) samtools view -F3 unmapped.sam = 752,455 unpaired, unmapped reads

                      29,415,609 + 752,455 = 30,161,064 unmapped reads. Great, all accounted for.

                      4) picard-tools SamToFastq on #3 gives me 752,455 reads. Great.
                      5) picard-tools SamToFastq on #2 gives me 25,317,354 reads and the below error:

                      SAM validation error: ERROR: Found 4,098,255 unpaired mates.

                      Why on Earth does samtools tell me I have 29,415,609 paired, unmapped reads while Picard tools tells me that 4,098,255 of those reads are actually UNPAIRED?

                      coffee break.
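A quick check of the figures quoted above, as a small Python sketch. The variable names are made up; the numbers are copied from steps 1 to 5. One thing worth keeping in mind: the paired flag (0x1) only records that a read was sequenced as part of a pair, not that its mate is present in the same file, so a read can be "paired" for samtools and still have no mate for Picard to find.

```python
# Figures copied from the steps above.
unmapped_total = 30_161_064     # step 1: samtools view -f4
paired_unmapped = 29_415_609    # step 2: samtools view -f1
unpaired_unmapped = 752_455     # step 3: samtools view -F3
picard_written = 25_317_354     # step 5: reads SamToFastq wrote
picard_orphans = 4_098_255      # step 5: "unpaired mates" in the error

# Picard's two numbers add up exactly to the samtools -f1 count.
picard_accounted = picard_written + picard_orphans
print(picard_accounted == paired_unmapped)

# The step 2 + step 3 sum compared with the step 1 total.
samtools_sum = paired_unmapped + unpaired_unmapped
print(samtools_sum - unmapped_total)
```

So Picard accounts for every record from step 2: it either wrote the read or counted it as missing its mate. Presumably the 4,098,255 are unmapped reads whose mates did map and were therefore left out by the -f4 filter. The second print shows the step 2 and step 3 figures summing to 7,000 more than the step 1 total, which suggests a typo in one of the quoted numbers.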

                      Comment

                      • etwatson
                        Member
                        • Jun 2012
                        • 18

                        #12
                        Originally posted by etwatson View Post
                        Why on Earth does samtools tell me I have 29,415,609 paired, unmapped reads while Picard tools tells me that 4,098,255 of those reads are actually UNPAIRED?

                        coffee break.
                        Ok, after my coffee break, I decided to lean on my awk crutch.

                        This did the job:
                        Code:
                        samtools view -f4 file.srt.bam | awk 'BEGIN{OFS="\n"}{print "@"$1,$10,"+",$11}'> reads.fastq

                        Comment

                        • nstoler
                          Junior Member
                          • May 2013
                          • 3

                          #13
                          Originally posted by etwatson View Post
                          This did the job:
                          Code:
                          samtools view -f4 file.srt.bam | awk 'BEGIN{OFS="\n"}{print "@"$1,$10,"+",$11}'> reads.fastq
                          Just a caution: You can't get the original reads back by just extracting the 10th column of the SAM.

                          Two issues keep the 10th-column sequences from corresponding precisely to the original reads. Hard clipping truncates the sequence stored in the SAM file compared to the original FASTQ read, and secondary alignments make the same read appear more than once in the SAM file. See this figure from Heng Li's paper for details on clipping and how reads are stored:
                          Summary: The Sequence Alignment/Map (SAM) format is a generic alignment format for storing read alignments against reference sequences, supporting short and long reads (up to 128 Mbp) produced by different sequencing platforms. It is flexible in ...
                          Last edited by nstoler; 03-05-2015, 11:46 AM.

                          Comment

                          • etwatson
                            Member
                            • Jun 2012
                            • 18

                            #14
                            Originally posted by nstoler View Post
                            Just a caution: Hard clipping results in the sequence stored in the SAM file to be truncated, compared to the original FASTQ read. And secondary alignments result in the same read appearing more than once in the SAM file.
                            Now I'm confused. I am using the -f4 flag to extract unmapped reads. Are unmapped reads altered?

                            Comment

                            • nstoler
                              Junior Member
                              • May 2013
                              • 3

                              #15
                              Originally posted by etwatson View Post
                              Now I'm confused. I am using the -f4 flag to extract unmapped reads. Are unmapped reads altered?
                              The SAM format does not store reads; it stores alignments. The original reads can be reconstructed from the alignments, but it's not as simple as just extracting the sequence in column 10.

                              For a good example why, take a look at this figure, paying attention to "hard clipping":
                              Summary: The Sequence Alignment/Map (SAM) format is a generic alignment format for storing read alignments against reference sequences, supporting short and long reads (up to 128 Mbp) produced by different sequencing platforms. It is flexible in ...
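As a rough illustration of what "not as simple as column 10" means, here is a minimal Python sketch of a per-record conversion. It is an assumption-based example, not how Picard's SamToFastq is implemented: it skips secondary and supplementary records, refuses records with hard clipping (the clipped bases are not in the file), and reverse-complements reads stored on the reverse strand. The function name, the /1 and /2 suffixes, and the example records are all illustrative choices.

```python
# Translation table for complementing a DNA sequence.
COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def sam_record_to_fastq(line):
    """Return a 4-line FASTQ entry for one SAM record, or None to skip it."""
    fields = line.rstrip("\n").split("\t")
    name, flag = fields[0], int(fields[1])
    cigar, seq, qual = fields[5], fields[9], fields[10]

    if flag & (0x100 | 0x800):
        return None  # secondary/supplementary: the read appears elsewhere
    if "H" in cigar:
        raise ValueError(f"{name}: hard-clipped, full read not stored")
    if flag & 0x10:
        # Stored as the reverse complement; restore the sequenced orientation.
        seq = seq.translate(COMPLEMENT)[::-1]
        qual = qual[::-1]
    if flag & 0x40:
        name += "/1"
    elif flag & 0x80:
        name += "/2"
    return f"@{name}\n{seq}\n+\n{qual}\n"


# Hypothetical records: a reverse-strand first read, a secondary copy of it,
# and a hard-clipped primary record.
REVERSE_READ = "r3\t83\tchr1\t100\t60\t4M\t=\t50\t-54\tAACG\tABCD"
SECONDARY_COPY = "r3\t339\tchr2\t700\t0\t4M\t=\t50\t0\tAACG\tABCD"
HARD_CLIPPED = "r4\t65\tchr1\t10\t60\t2H2M\t=\t90\t0\tAC\tII"

print(sam_record_to_fastq(REVERSE_READ), end="")
```

For unmapped reads pulled out with -f4 the sequence is normally stored as it was sequenced, so the awk one-liner will often give the right bases; it just does not check any of the cases above or mark which mate a read is.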

                              Comment
