Seqanswers Leaderboard Ad

Collapse

Announcement

Collapse
No announcement yet.
X
 
  • Filter
  • Time
  • Show
Clear All
new posts

  • #16
    Originally posted by offspring View Post
    It's included, lines 85-89 in the current version. However, the bug has been fixed in TopHat 2.0.9.
    That's what it looked like. However, I tried running this earlier today on TopHat 2.0.9 data. The script ran successfully. When I ran Picard on the merged file, I got "Mapped mate should have mate reference name" errors.

    Eventually, Picard terminated with this:
    Code:
    Exception in thread "main" net.sf.samtools.SAMException: Exception when processing alignment for BAM index HISEQ01:514:H812AADXX:1:2105:9732:29000 1/2 51b unmapped read.
    	at net.sf.samtools.BAMFileWriter.writeAlignment(BAMFileWriter.java:120)
    	at net.sf.samtools.SAMFileWriterImpl.addAlignment(SAMFileWriterImpl.java:168)
    	at net.sf.picard.sam.AddOrReplaceReadGroups.doWork(AddOrReplaceReadGroups.java:100)
    	at net.sf.picard.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:177)
    	at net.sf.picard.cmdline.CommandLineProgram.instanceMainWithExit(CommandLineProgram.java:119)
    	at net.sf.picard.sam.AddOrReplaceReadGroups.main(AddOrReplaceReadGroups.java:66)
    Caused by: net.sf.samtools.SAMException: Exception creating BAM index for record HISEQ01:514:H812AADXX:1:2105:9732:29000 1/2 51b unmapped read.
    	at net.sf.samtools.BAMIndexer.processAlignment(BAMIndexer.java:94)
    	at net.sf.samtools.BAMFileWriter.writeAlignment(BAMFileWriter.java:117)
    	... 5 more
    I went back to this thread and the only problem I could think of was that last step. If that's not the issue, do you know what may be? This is my first attempt to merge and process TopHat BAM files.
    Last edited by id0; 08-19-2013, 01:50 PM.

    Comment


    • #17
      Originally posted by id0 View Post
      That's what it looked like. However, I tried running this earlier today on TopHat 2.0.9 data. The script ran successfully. When I ran Picard on the merged file, I got "Mapped mate should have mate reference name" errors.

      Eventually, Picard terminated with this:
      Code:
      Exception in thread "main" net.sf.samtools.SAMException: Exception when processing alignment for BAM index HISEQ01:514:H812AADXX:1:2105:9732:29000 1/2 51b unmapped read.
      	at net.sf.samtools.BAMFileWriter.writeAlignment(BAMFileWriter.java:120)
      	at net.sf.samtools.SAMFileWriterImpl.addAlignment(SAMFileWriterImpl.java:168)
      	at net.sf.picard.sam.AddOrReplaceReadGroups.doWork(AddOrReplaceReadGroups.java:100)
      	at net.sf.picard.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:177)
      	at net.sf.picard.cmdline.CommandLineProgram.instanceMainWithExit(CommandLineProgram.java:119)
      	at net.sf.picard.sam.AddOrReplaceReadGroups.main(AddOrReplaceReadGroups.java:66)
      Caused by: net.sf.samtools.SAMException: Exception creating BAM index for record HISEQ01:514:H812AADXX:1:2105:9732:29000 1/2 51b unmapped read.
      	at net.sf.samtools.BAMIndexer.processAlignment(BAMIndexer.java:94)
      	at net.sf.samtools.BAMFileWriter.writeAlignment(BAMFileWriter.java:117)
      	... 5 more
      I went back to this thread and the only problem I could think of was that last step. If that's not the issue, do you know what may be? This is my first attempt to merge and process TopHat BAM files.
      Hmm, the mate unmapped problem looks like the culprit then indeed. You can check your unmapped.bam file with the following command:

      Code:
      samtools view -f 0x8 unmapped.bam
      If the file is OK, there should be at least a couple of read pairs (e.g. two reads with the same name) in there. I'll re-check what TopHat does and follow up with a new bug report to the developers if needed.

      I disabled this specific piece of fixup code for TopHat > 2.0.8. You can enable it manually by simply changing "False" to "True" on line 43 for now.

      EDIT: I can confirm that the bug in TopHat is still there. I've updated the script to make the fixup code unconditional again.
      Last edited by offspring; 08-20-2013, 05:21 AM.

      Comment


      • #18
        Originally posted by offspring View Post
        Hmm, the mate unmapped problem looks like the culprit then indeed. You can check your unmapped.bam file with the following command:

        Code:
        samtools view -f 0x8 unmapped.bam
        If the file is OK, there should be at least a couple of read pairs (e.g. two reads with the same name) in there. I'll re-check what TopHat does and follow up with a new bug report to the developers if needed.

        I disabled this specific piece of fixup code for TopHat > 2.0.8. You can enable it manually by simply changing "False" to "True" on line 43 for now.

        EDIT: I can confirm that the bug in TopHat is still there. I've updated the script to make the fixup code unconditional again.
        Looks like you already checked yourself, but just to confirm, I am not getting any reads with that flag. The BAM file has over 100,000,000 reads, so that should definitely be plenty.

        Thanks for the quick follow-up! I'll try the updated script and see how it goes.

        Also, I tried running Picard on the faulty merged file without CREATE_INDEX argument. It still spits out "Mapped mate should have mate reference name" errors, but completes without terminating. Indexing with samtools works. The merged file still has the same number of reads before and after Picard processing, so the erroneous reads are not being thrown away. The old script might still be useable.

        Comment


        • #19
          I am still getting Picard "Mapped mate should have mate reference name" errors with the new fixed unmapped.bam. Perhaps that particular step is not the culprit.

          Comment


          • #20
            Originally posted by id0 View Post
            I am still getting Picard "Mapped mate should have mate reference name" errors with the new fixed unmapped.bam. Perhaps that particular step is not the culprit.
            Yes, this error occurred for files generated with TopHat < 2.0.7. Basically I messed up the order in which the processing steps should be performed. The updated version should fix this.

            Comment


            • #21
              I downloaded the newest fix_tophat_unmapped_reads.py. I saw the changed order. I ran it on TopHat 2.0.9 data, but I am still getting the "Mapped mate should have mate reference name" errors.

              Sorry about the continued complaints.

              Comment


              • #22
                Unfortunately I can't reproduce this with the datasets I have available. Since the datasets are probably too large to share, could you run some tests and report the results? The following would be interesting for a start:

                Take the read recorded in the Picard exception, e.g. HISEQ01:514:H812AADXX:1:2105:9732:29000 in the example you posted earlier.

                Look for the read in both accepted_hits.bam and unfixed unmapped.bam as well as fixed unmapped.bam:

                Code:
                samtools view accepted_hits.bam | grep HISEQ01:514:H812AADXX:1:2105:9732:29000
                
                samtools view unmapped.bam | grep HISEQ01:514:H812AADXX:1:2105:9732:29000
                
                samtools view unmapped_fixup.bam | grep HISEQ01:514:H812AADXX:1:2105:9732:29000
                Hopefully we can pin down the problem.

                Comment


                • #23
                  Originally posted by offspring View Post
                  Unfortunately I can't reproduce this with the datasets I have available. Since the datasets are probably too large to share, could you run some tests and report the results? The following would be interesting for a start:

                  Take the read recorded in the Picard exception, e.g. HISEQ01:514:H812AADXX:1:2105:9732:29000 in the example you posted earlier.

                  Look for the read in both accepted_hits.bam and unfixed unmapped.bam as well as fixed unmapped.bam:

                  Code:
                  samtools view accepted_hits.bam | grep HISEQ01:514:H812AADXX:1:2105:9732:29000
                  
                  samtools view unmapped.bam | grep HISEQ01:514:H812AADXX:1:2105:9732:29000
                  
                  samtools view unmapped_fixup.bam | grep HISEQ01:514:H812AADXX:1:2105:9732:29000
                  Hopefully we can pin down the problem.
                  As expected, accepted_hits.bam does not match anything.

                  Column 5 in unmapped.bam is 255, but is changed to 0 in unmapped_fixup.bam.

                  I trimmed my starting FASTQs to just 100,000 reads for testing. The resulting BAMs are less than 10 MBs combined. I posted them here if you want to check for yourself:



                  (Whoever is reading this thread a few weeks from now, these will probably be broken links. Sorry.)

                  Comment


                  • #24
                    Can you reproduce the problem you described with the downsampled files you provided?

                    I am getting the following Picard exception with them:

                    Exception in thread "main" net.sf.picard.PicardException: Discordant contig lengths: read chr1 LN=195471971, ref chr1 LN=249250621
                    To me this seems to indicate a problem with how those files were derived from the originals.

                    Comment


                    • #25
                      I think we are just using different reference sequences. I was using the iGenomes version of mm10.

                      I trimmed the original FASTQs to generate those BAMs.

                      Which Picard tool are you using? I just tried using SortSam, which does not require a reference sequence. That still gives me the same "Mapped mate should have mate reference name" errors for the merged BAM.

                      Code:
                      java -Xmx16G -jar ${PICARD_ROOT}/SortSam.jar \
                      VERBOSITY=WARNING QUIET=true VALIDATION_STRINGENCY=LENIENT \
                      SORT_ORDER=coordinate \
                      INPUT=all.bam \
                      OUTPUT=all.sortsam.bam
                      These are the specific errors (just the first few):
                      Code:
                      Ignoring SAM validation error: ERROR: Record 113550, Read name HISEQ01:267:H77T6ADXX:1:1101:15356:2352, Mapped mate should have mate reference name
                      Ignoring SAM validation error: ERROR: Record 114728, Read name HISEQ01:267:H77T6ADXX:1:1101:6588:2773, Mapped mate should have mate reference name
                      Ignoring SAM validation error: ERROR: Record 115145, Read name HISEQ01:267:H77T6ADXX:1:1101:15768:2826, Mapped mate should have mate reference name
                      Ignoring SAM validation error: ERROR: Record 115811, Read name HISEQ01:267:H77T6ADXX:1:1101:12628:3135, Mapped mate should have mate reference name
                      Ignoring SAM validation error: ERROR: Record 116103, Read name HISEQ01:267:H77T6ADXX:1:1101:19872:3119, Mapped mate should have mate reference name
                      Last edited by id0; 09-30-2013, 07:31 AM.

                      Comment


                      • #26
                        You were right, with the mm10 genome I can reproduce the problem. I'll see what I can find.

                        Comment


                        • #27
                          The problematic reads are unmapped ones in unmapped.bam with the "mate is unmapped" flag not set. This means their mates would be expected to be mapped and in accepted_hits.bam; however they are not in there. This is a problem, as my script needs these mapped reads to update some fields in their unmapped mates. Because these fields cannot be updated, Picard barfs later on. The samtools output you posted earlier indicates this is also the problem in the original files.

                          Do you filter your fastq files in any way before handing them to TopHat?

                          Comment


                          • #28
                            The FASTQs were not filtered.

                            I am thinking this may be due to TopHat parameters like "--prefilter-multihits" to exclude multi-mapped reads. I believe those reads would not end up in either accepted_hits.bam or unmapped.bam.

                            Comment


                            • #29
                              Yes, that's quite possible. We don't use any options like this in our TopHat commandline, as far as I see.

                              Comment


                              • #30
                                Short update: the script posted earlier in this thread has seen some updates (Python 3 compatibility, better argument handling etc) and is now called tophat_recondition.

                                It can be found here: https://github.com/cbrueffer/tophat-recondition

                                Comment

                                Latest Articles

                                Collapse

                                • seqadmin
                                  Genetic Variation in Immunogenetics and Antibody Diversity
                                  by seqadmin



                                  The field of immunogenetics explores how genetic variations influence immune responses and susceptibility to disease. In a recent SEQanswers webinar, Oscar Rodriguez, Ph.D., Postdoctoral Researcher at the University of Louisville, and Ruben Martínez Barricarte, Ph.D., Assistant Professor of Medicine at Vanderbilt University, shared recent advancements in immunogenetics. This article discusses their research on genetic variation in antibody loci, antibody production processes,...
                                  11-06-2024, 07:24 PM
                                • seqadmin
                                  Choosing Between NGS and qPCR
                                  by seqadmin



                                  Next-generation sequencing (NGS) and quantitative polymerase chain reaction (qPCR) are essential techniques for investigating the genome, transcriptome, and epigenome. In many cases, choosing the appropriate technique is straightforward, but in others, it can be more challenging to determine the most effective option. A simple distinction is that smaller, more focused projects are typically better suited for qPCR, while larger, more complex datasets benefit from NGS. However,...
                                  10-18-2024, 07:11 AM

                                ad_right_rmr

                                Collapse

                                News

                                Collapse

                                Topics Statistics Last Post
                                Started by seqadmin, 11-08-2024, 11:09 AM
                                0 responses
                                211 views
                                0 likes
                                Last Post seqadmin  
                                Started by seqadmin, 11-08-2024, 06:13 AM
                                0 responses
                                157 views
                                0 likes
                                Last Post seqadmin  
                                Started by seqadmin, 11-01-2024, 06:09 AM
                                0 responses
                                80 views
                                0 likes
                                Last Post seqadmin  
                                Started by seqadmin, 10-30-2024, 05:31 AM
                                0 responses
                                27 views
                                0 likes
                                Last Post seqadmin  
                                Working...
                                X