Fastq: Paired end reads and mapping

  • Fastq: Paired end reads and mapping

    Hi,
    I took a pair of paired-end read files (PE1.fq and PE2.fq), mapped them to my reference database using bwa, and obtained output in SAM format. Then I split the aligned and unaligned reads from the SAM file using ViewSam from Picard.

    However, I see that for some reads, one end is mapped/aligned and the other end is unaligned. Is this normal?

    The problem at a later stage is that even after the reads are split into aligned and unaligned files, their flags are still set to "paired read". So if you try to convert either of the split files (Aligned.sam and Unaligned.sam) back to FASTQ format using Picard's SamToFastq module, it fails with an error saying that the other end of the paired read can't be found.

    Because of this issue, I wrote my own script to extract the FASTQ records from these files. However, I would like to know how common this situation is.

    Thank you.
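For readers who want to do the same, here is a minimal sketch of a SAM-to-FASTQ extraction in Python. It is not the original poster's script. It assumes plain-text SAM input, emits one record per primary alignment line, and appends `/1` or `/2` to the read name based on the first/second-in-pair flag bits (the suffix convention and the function name `sam_to_fastq` are assumptions made for illustration). Reads aligned to the reverse strand are reverse-complemented so the output matches the original read orientation.

```python
# Complement table for restoring reads that were stored reverse-complemented.
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def sam_to_fastq(sam_lines):
    """Yield one 4-line FASTQ record per primary alignment line in a SAM file."""
    for line in sam_lines:
        if line.startswith("@"):          # skip SAM header lines
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 11:              # not a complete alignment record
            continue
        name, flag = fields[0], int(fields[1])
        seq, qual = fields[9], fields[10]
        if flag & 0x900:                  # secondary (0x100) or supplementary (0x800)
            continue
        if flag & 0x10:                   # aligned to reverse strand: undo it
            seq = seq.translate(_COMPLEMENT)[::-1]
            qual = qual[::-1]
        if flag & 0x40:                   # first read in the pair
            name += "/1"
        elif flag & 0x80:                 # second read in the pair
            name += "/2"
        yield f"@{name}\n{seq}\n+\n{qual}\n"
```

Because each line is handled independently, this does not care whether the mate is present in the same file, which is the situation that trips up SamToFastq here.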

  • #2
    Try running Picard's FixMateInformation (or something like that) after ViewSam to correct the mate pair information in the resulting SAM files.

    I guess it is not unexpected to see fragments where only one read has been mapped - this could happen for example when the other read comes from some inserted sequence that is not present in the reference genome. We do see such read pairs in our own data.
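To see how often this happens in a given file, the SAM FLAG field can be decoded directly. The sketch below uses the standard SAM flag bits (0x1 paired, 0x4 read unmapped, 0x8 mate unmapped); the function name `pair_status` and the returned labels are made up for illustration.

```python
PAIRED = 0x1          # template has multiple segments (paired-end)
UNMAPPED = 0x4        # this read is unmapped
MATE_UNMAPPED = 0x8   # the mate of this read is unmapped


def pair_status(flag):
    """Classify a SAM FLAG value by the mapping state of the read and its mate."""
    if not flag & PAIRED:
        return "single-end"
    read_mapped = not flag & UNMAPPED
    mate_mapped = not flag & MATE_UNMAPPED
    if read_mapped and mate_mapped:
        return "both mapped"
    if read_mapped:
        return "mapped, mate unmapped"
    if mate_mapped:
        return "unmapped, mate mapped"
    return "both unmapped"
```

Counting the two middle categories over the second column of the SAM file gives the number of pairs with only one end aligned.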


    • #3
      Li,

      Thanks for your reply. I already tried fixing the mate information and duplicates, but Picard still could not extract the FASTQ for paired reads where one read was missing, and it ended in an error. I guess SamToFastq checks a flag that says whether a read is paired or not (and if it is, the other one must exist), but ViewSam doesn't necessarily rewrite this information for the reads that were separated. Anyway, I just extracted the FASTQ files from the SAM file myself; it wasn't difficult.

      However, it's nice to know that this can happen. Thanks once again!


      • #4
        FixMateInformation complains about incorrect mate pair information

        I was trying to use SRMA to perform local realignment of my RNA-seq data. It complained about some reads having bad mate information. I think this is because the mates of some of my reads mapped to chromosomes that are not in my BAM reference file (long story). I was hoping to use FixMateInformation to correct this issue, but I get the same complaint from Picard when I try to run it. Is there a way to override this check in Picard?

        java -jar picard-tools-1.47/FixMateInformation.jar INPUT=HS0639_7.bam OUTPUT=HS0639_7.bam
        [Fri Jun 17 16:24:03 PDT 2011] net.sf.picard.sam.FixMateInformation INPUT=[HS0639_7.rmdup.broad.sort.bam] OUTPUT=HS0639_7.rmdup.broad.sort.matefix.bam TMP_DIR=/tmp/rmorin VERBOSITY=INFO QUIET=false VALIDATION_STRINGENCY=STRICT COMPRESSION_LEVEL=5 MAX_RECORDS_IN_RAM=500000 CREATE_INDEX=false CREATE_MD5_FILE=false
        INFO 2011-06-17 16:24:03 FixMateInformation Sorting input into queryname order.
        [Fri Jun 17 16:24:03 PDT 2011] net.sf.picard.sam.FixMateInformation done. Elapsed time: 0.00 minutes.
        Runtime.totalMemory()=758054912
        Exception in thread "main" java.lang.RuntimeException: SAM validation error: ERROR: Record 938, Read name SOLEXA3_60:2:4:1712:630, Mate Alignment start should be 0 because reference name = *.
        at net.sf.samtools.SAMUtils.processValidationErrors(SAMUtils.java:334)
        at net.sf.samtools.BAMFileReader$BAMFileIterator.advance(BAMFileReader.java:469)
        at net.sf.samtools.BAMFileReader$BAMFileIterator.next(BAMFileReader.java:450)
        at net.sf.samtools.BAMFileReader$BAMFileIterator.next(BAMFileReader.java:417)
        at net.sf.samtools.SAMFileReader$AssertableIterator.next(SAMFileReader.java:629)
        at net.sf.samtools.SAMFileReader$AssertableIterator.next(SAMFileReader.java:607)
        at net.sf.picard.sam.FixMateInformation.doWork(FixMateInformation.java:146)
        at net.sf.picard.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:158)
        at net.sf.picard.cmdline.CommandLineProgram.instanceMainWithExit(CommandLineProgram.java:118)
        at net.sf.picard.sam.FixMateInformation.main(FixMateInformation.java:74)
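The validation message refers to two SAM columns: the mate reference name (RNEXT, column 7) is `*`, but the mate alignment start (PNEXT, column 8) is non-zero. As a rough sketch, assuming the file has been converted to text SAM, such records could be made self-consistent as shown below. The function name `fix_mate_start` is hypothetical, and this only addresses the single inconsistency named in the error; it leaves the FLAG field and all other columns untouched, so it may not be the right fix for every dataset.

```python
def fix_mate_start(sam_line):
    """Return the SAM line with PNEXT reset to 0 when RNEXT is '*'."""
    if sam_line.startswith("@"):                  # header lines pass through
        return sam_line
    fields = sam_line.rstrip("\n").split("\t")
    # Column 7 (index 6) is RNEXT, column 8 (index 7) is PNEXT.
    if len(fields) >= 11 and fields[6] == "*" and fields[7] != "0":
        fields[7] = "0"
    newline = "\n" if sam_line.endswith("\n") else ""
    return "\t".join(fields) + newline
```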


        • #5
          Try this option:
          Code:
          VALIDATION_STRINGENCY=SILENT
          .
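As a sketch of where the option goes, the helper below assembles the FixMateInformation command from post #4 with the extra `KEY=VALUE` argument appended. The helper name `fix_mate_command` is made up for illustration; the jar path and file names in the usage note are the ones that appear in the log above.

```python
def fix_mate_command(jar, input_bam, output_bam, stringency="SILENT"):
    """Build the argument list for a Picard FixMateInformation run."""
    return [
        "java", "-jar", jar,
        f"INPUT={input_bam}",
        f"OUTPUT={output_bam}",
        # Relax the SAM validation that aborted the run in STRICT mode.
        f"VALIDATION_STRINGENCY={stringency}",
    ]
```

Called with `"picard-tools-1.47/FixMateInformation.jar"`, `"HS0639_7.rmdup.broad.sort.bam"` and `"HS0639_7.rmdup.broad.sort.matefix.bam"`, the returned list can be passed to `subprocess.run(..., check=True)` or joined with spaces to get the shell command.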
          Last edited by nilshomer; 06-17-2011, 10:23 PM.


          • #6
            Originally posted by cedance:
            However, I see that for some reads, one end is mapped/aligned and the other end is unaligned. Is this normal?
            Yes, normal but undesirable. It can and does happen for good reasons.

            For example, with poor quality reads (or if you are mapping a different strain) one read might match within the thresholds, but the other might be too different.

            Another example is if you are mapping against an unfinished genome, one read might map to a contig but the partner would map to the unassembled region off the end of the contig.


            • #7
              Thanks Nils.
              Does VALIDATION_STRINGENCY=SILENT apply to running SRMA or will this only work for Picard tools? In other words, I'm wondering if FixMateInformation is a prerequisite for running SRMA successfully, or if I can simply run SRMA.

              Thanks.
              Ryan


              • #8
                It should work with all tools that use the Picard library (like SRMA). No, FixMateInformation is not a prerequisite for SRMA. In fact, you will lose mate information with SRMA. See: http://sourceforge.net/apps/mediawik...ng_information
