  • Fastq: Paired end reads and mapping

    Hi,
    I took a pair of paired-end reads (PE1.fq and PE2.fq) and mapped it to my reference database using bwa tool and obtained an ouput in SAM format. Then I split the Aligned and Unaligned reads from the sam file using ViewSam from Picard.

    However, I see that for some reads, one end is mapped/aligned and the other end is unaligned. Is this normal?

    The problem at a later stage is that even after the reads are split into aligned and unaligned, their flag is still set to "paired reads". So if you try to convert either of the split files (Aligned.sam and Unaligned.sam) back to FASTQ format using the "SamToFastq" module of Picard, it fails with an error saying that the other end of the paired read isn't found!

    Because of this issue, I just wrote a script to extract the FASTQ file from the SAM file myself. However, I would like to know how common this situation is.

    Thank you.
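
    For readers who hit the same SamToFastq error, here is a minimal sketch of the kind of script described above. It is not the poster's actual script. It assumes standard 11-column SAM records, uses the SAM flag bits 0x10 (reverse strand), 0x40 (first in pair) and 0x80 (second in pair), and skips secondary/supplementary records (0x100, 0x800). The function name `sam_line_to_fastq` is made up for illustration.

```python
# Sketch only: convert one SAM record to FASTQ without requiring its mate to be present.
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def sam_line_to_fastq(line):
    """Return (mate_number, fastq_record) for a SAM alignment line.

    Returns None for header lines and for secondary/supplementary records.
    mate_number is 1, 2, or 0 (0 = not flagged as first or second in pair).
    """
    if line.startswith("@"):  # header line
        return None
    fields = line.rstrip("\n").split("\t")
    name, flag, seq, qual = fields[0], int(fields[1]), fields[9], fields[10]
    if flag & 0x900:  # secondary (0x100) or supplementary (0x800) alignment
        return None
    if flag & 0x10:
        # The aligner stored the reverse complement; restore the original read.
        seq = seq.translate(_COMPLEMENT)[::-1]
        qual = qual[::-1]
    if flag & 0x40:
        mate = 1
    elif flag & 0x80:
        mate = 2
    else:
        mate = 0
    suffix = "/%d" % mate if mate else ""
    return mate, "@%s%s\n%s\n+\n%s\n" % (name, suffix, seq, qual)
```

    Because each record is handled on its own, a read whose mate ended up in the other split file is still written out; whether you want such orphan reads in your FASTQ is a separate decision.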

  • #2
    Try running picard FixMateInformation (or something like that) after ViewSam to correct the mate pair information in the resulting sam files.

    I guess it is not unexpected to see fragments where only one read has been mapped - this could happen for example when the other read comes from some inserted sequence that is not present in the reference genome. We do see such read pairs in our own data.
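
    To see how many such half-mapped pairs you have, you can look at the SAM flag of each record. A rough sketch, assuming only the standard flag bits 0x1 (paired), 0x4 (read unmapped) and 0x8 (mate unmapped); the function name is hypothetical:

```python
# Sketch only: classify a SAM record by the mapping status of the read and its mate.
PAIRED = 0x1
READ_UNMAPPED = 0x4
MATE_UNMAPPED = 0x8


def classify_pair_status(flag):
    """Return a short description of the pair's mapping status for one SAM flag."""
    if not flag & PAIRED:
        return "single-end"
    read_unmapped = bool(flag & READ_UNMAPPED)
    mate_unmapped = bool(flag & MATE_UNMAPPED)
    if read_unmapped and mate_unmapped:
        return "neither mapped"
    if read_unmapped:
        return "read unmapped, mate mapped"
    if mate_unmapped:
        return "read mapped, mate unmapped"
    return "both mapped"
```

    Counting the return values over the second column of the SAM file gives a quick estimate of how often only one end of a pair aligned.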



    • #3
      Li,

      Thanks for your reply. I already tried fixing the mate information and duplicates, and Picard still could not extract the FASTQ for paired reads where one read was missing; it ended in an error. I guess SamToFastq checks a flag that tells whether it's a paired read or not (and if it is, then the other one must exist), but ViewSam doesn't necessarily rewrite this information for the reads that were separated. Anyway, I just extracted the FASTQ files from the SAM file myself; it wasn't difficult.

      However, it's nice to know that this can happen. Thanks once again!



      • #4
        FixMateInformation complains about incorrect mate pair information

        I was trying to use SRMA to perform local realignment of my RNA-seq data. It complained about some reads having bad mate information. I think it is because the mates of some of my reads mapped to chromosomes that are not in my BAM reference file (long story). I was hoping to use FixMateInformation to correct this issue, but I get the same complaint from Picard when trying to run it. Is there a way to override this check in Picard?

        java -jar picard-tools-1.47/FixMateInformation.jar INPUT=HS0639_7.bam OUTPUT=HS0639_7.bam
        [Fri Jun 17 16:24:03 PDT 2011] net.sf.picard.sam.FixMateInformation INPUT=[HS0639_7.rmdup.broad.sort.bam] OUTPUT=HS0639_7.rmdup.broad.sort.matefix.bam TMP_DIR=/tmp/rmorin VERBOSITY=INFO QUIET=false VALIDATION_STRINGENCY=STRICT COMPRESSION_LEVEL=5 MAX_RECORDS_IN_RAM=500000 CREATE_INDEX=false CREATE_MD5_FILE=false
        INFO 2011-06-17 16:24:03 FixMateInformation Sorting input into queryname order.
        [Fri Jun 17 16:24:03 PDT 2011] net.sf.picard.sam.FixMateInformation done. Elapsed time: 0.00 minutes.
        Runtime.totalMemory()=758054912
        Exception in thread "main" java.lang.RuntimeException: SAM validation error: ERROR: Record 938, Read name SOLEXA3_60:2:4:1712:630, Mate Alignment start should be 0 because reference name = *.
        at net.sf.samtools.SAMUtils.processValidationErrors(SAMUtils.java:334)
        at net.sf.samtools.BAMFileReader$BAMFileIterator.advance(BAMFileReader.java:469)
        at net.sf.samtools.BAMFileReader$BAMFileIterator.next(BAMFileReader.java:450)
        at net.sf.samtools.BAMFileReader$BAMFileIterator.next(BAMFileReader.java:417)
        at net.sf.samtools.SAMFileReader$AssertableIterator.next(SAMFileReader.java:629)
        at net.sf.samtools.SAMFileReader$AssertableIterator.next(SAMFileReader.java:607)
        at net.sf.picard.sam.FixMateInformation.doWork(FixMateInformation.java:146)
        at net.sf.picard.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:158)
        at net.sf.picard.cmdline.CommandLineProgram.instanceMainWithExit(CommandLineProgram.java:118)
        at net.sf.picard.sam.FixMateInformation.main(FixMateInformation.java:74)
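
        Going by the wording of the error alone, the offending records have a mate reference name of "*" but a non-zero mate alignment start. A minimal sketch for listing such records in a SAM text stream (for example the output of `samtools view -h`) is below. It mirrors the error message, not Picard's actual validation code, and the 1-based record numbering here may not match the record number Picard reports. The function name is hypothetical.

```python
# Sketch only: list records whose mate reference is "*" but whose mate position is not 0.
def find_bad_mate_positions(sam_lines):
    """Return [(record_number, read_name), ...] for inconsistent mate fields.

    Column 7 (RNEXT) is the mate reference name and column 8 (PNEXT) the
    mate alignment start, per the SAM column layout.
    """
    bad = []
    record_number = 0
    for line in sam_lines:
        if line.startswith("@"):  # skip header lines
            continue
        record_number += 1
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 11:  # not a complete alignment record
            continue
        if fields[6] == "*" and fields[7] != "0":
            bad.append((record_number, fields[0]))
    return bad
```

        Running this over the file shows how many reads are affected before deciding whether to relax validation or to repair the records.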



        • #5
          Try this option:
          Code:
          VALIDATION_STRINGENCY=SILENT
          .
          Last edited by nilshomer; 06-17-2011, 10:23 PM.
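
          For anyone scripting this, the option is simply appended to the Picard command line as another KEY=VALUE argument. A small sketch that builds the argument list, assuming the jar path and file names shown in post #4; the helper name is made up, and nothing is executed here:

```python
# Sketch only: build the FixMateInformation command with validation silenced.
def picard_fix_mate_command(input_bam, output_bam,
                            jar="picard-tools-1.47/FixMateInformation.jar"):
    """Return the argument list for running FixMateInformation with SILENT validation."""
    return [
        "java", "-jar", jar,
        "INPUT=" + input_bam,
        "OUTPUT=" + output_bam,
        "VALIDATION_STRINGENCY=SILENT",  # default is STRICT, as shown in the log in post #4
    ]


cmd = picard_fix_mate_command(
    "HS0639_7.rmdup.broad.sort.bam",
    "HS0639_7.rmdup.broad.sort.matefix.bam",
)
print(" ".join(cmd))
```

          The list can be handed to `subprocess.run(cmd, check=True)`. Keep in mind that SILENT hides the validation errors rather than fixing the underlying records.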



          • #6
            Originally posted by cedance
            However, I see that for some reads, one end is mapped/aligned and the other end is unaligned. Is this normal?
            Yes, normal but undesirable. It can and does happen for good reasons.

            For example, with poor quality reads (or if you are mapping a different strain) one read might match within the thresholds, but the other might be too different.

            Another example is if you are mapping against an unfinished genome, one read might map to a contig but the partner would map to the unassembled region off the end of the contig.



            • #7
              Thanks Nils.
              Does VALIDATION_STRINGENCY=SILENT apply to running SRMA or will this only work for Picard tools? In other words, I'm wondering if FixMateInformation is a prerequisite for running SRMA successfully, or if I can simply run SRMA.

              Thanks.
              Ryan



              • #8
                It should work with all tools that use the Picard library (like SRMA). No, FixMateInformation is not a pre-requisite for SRMA. In fact, you will lose mate information with SRMA. See: http://sourceforge.net/apps/mediawik...ng_information
