  • Tophat2 high discordant alignments

    Hi,

    I am mapping paired-end RNA-seq data using tophat2, but the alignment summary shows a very high discordant alignment rate. The only tophat options I am specifying are -p 16 and -o "DIR". Below is the output from tophat2:


    PHP Code:
    Left reads:
              Input     :  88556961
               Mapped   :  76938162 (86.9% of input)
                of these:  20429665 (26.6%) have multiple alignments (622137 have >20)
    Right reads:
              Input     :  88556961
               Mapped   :  75252663 (85.0% of input)
                of these:  20114304 (26.7%) have multiple alignments (621700 have >20)
    Unpaired reads:
              Input     :     68008
               Mapped   :     56927 (83.7% of input)
                of these:      8045 (14.1%) have multiple alignments (9 have >20)
    85.9% overall read mapping rate.

    Aligned pairs:  65389463
         of these:  18622479 (28.5%) have multiple alignments
                    61775607 (94.5%) are discordant alignments
     4.1% concordant pair alignment rate

    The flagstat output I get is also below:

    PHP Code:
    341377625 + 0 in total (QC-passed reads + QC-failed reads)
    189129873 + 0 secondary
    0 + 0 supplimentary
    0 + 0 duplicates
    341377625 + 0 mapped (100.00%:-nan%)
    152190825 + 0 paired in sequencing
    76938162 + 0 read1
    75252663 + 0 read2
    263998 + 0 properly paired (0.17%:-nan%)
    130778926 + 0 with itself and mate mapped
    21411899 + 0 singletons (14.07%:-nan%)
    116104620 + 0 with mate mapped to a different chr
    80799382 + 0 with mate mapped to a different chr (mapQ>=5)
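
As a sanity check, the percentages in the two summaries can be recomputed from the raw counts. The sketch below is only arithmetic on the numbers quoted above. The formula used for the concordant rate (non-discordant aligned pairs divided by input pairs) is an assumption that happens to reproduce the reported figure, not something confirmed from the TopHat2 source.

```python
# Counts copied from the TopHat2 summary and the flagstat output in this post.
input_pairs = 88556961        # reads in each mate file given to TopHat2
aligned_pairs = 65389463
discordant_pairs = 61775607
paired_reads = 152190825      # flagstat: "paired in sequencing"
properly_paired = 263998
singletons = 21411899


def pct(part, whole):
    """Return part/whole as a percentage rounded to two decimals."""
    return round(100.0 * part / whole, 2)


discordant_rate = pct(discordant_pairs, aligned_pairs)
# Assumed formula: aligned pairs that are not discordant, over input pairs.
concordant_rate = pct(aligned_pairs - discordant_pairs, input_pairs)
proper_rate = pct(properly_paired, paired_reads)
singleton_rate = pct(singletons, paired_reads)

print(discordant_rate, concordant_rate, proper_rate, singleton_rate)
```

To one decimal place these agree with the 94.5% discordant and 4.1% concordant figures, and with flagstat's 0.17% properly paired and 14.07% singletons, so the two tools are describing the same problem.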

    I am using cutadapt to remove adapters and low-quality reads, and that is running fine. But when I pass the paired files on to tophat, the results don't look good. From what I have read, it is to do with the mate pairs no longer being in sync in the two fastq files. Is there a way around this and to get the number of discordant alignments down?

    I have also aligned the fastq files with tophat2 without passing them through cutadapt first. That alignment is fine, with a very low discordant alignment rate, so I'm guessing the fastq files are good and something is going wrong at or after the cutadapt step.
    Just as a note, I am not using Galaxy for the analysis.

    Thanks
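
One way to test the "out of sync" suspicion directly is to walk both trimmed files in parallel and compare read names. This is a minimal sketch: the function names are illustrative, and it assumes well-formed four-line FASTQ records whose mate names differ at most by a trailing /1 or /2 or by a comment after the first whitespace.

```python
import gzip
from itertools import zip_longest


def normalize_id(header):
    """Strip '@', any comment after whitespace, and a trailing /1 or /2."""
    name = header.strip().split()[0].lstrip("@")
    if name.endswith(("/1", "/2")):
        name = name[:-2]
    return name


def read_ids(path):
    """Yield the read ID of each 4-line FASTQ record (plain or gzipped)."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as handle:
        for line_no, line in enumerate(handle):
            if line_no % 4 == 0:  # header line of each record
                yield normalize_id(line)


def first_mismatch(r1_path, r2_path):
    """Return (record_index, id1, id2) for the first out-of-sync pair, or None."""
    pairs = zip_longest(read_ids(r1_path), read_ids(r2_path))
    for index, (id1, id2) in enumerate(pairs):
        if id1 != id2:  # also triggers when one file ends before the other
            return index, id1, id2
    return None
```

Calling `first_mismatch` on the two trimmed files returns `None` when every record lines up. Any other result gives the position where the files first diverge, which would be consistent with reads having been dropped from one file but not the other.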

  • #2
    Originally posted by ea11
    From what I have read, it is to do with the mate pairs no longer being in sync in the two fastq files. Is there a way around this and to get the number of discordant alignments down?
    Thanks
    Use a paired-end-aware trimmer such as Trimmomatic or BBDuk (from BBMap), which keeps the paired-end files in sync after trimming.

    That said, if you are happy with the cutadapt results and just want to fix the PE read order, you can do so with repair.sh from BBMap (example for paired-end reads in two files): http://seqanswers.com/forums/showpos...0&postcount=45
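
To make clear what "fixing the PE read order" involves, here is a rough Python sketch of the idea: match records by read name, emit the matched mates in the same order, and set aside reads whose mate is missing. It is not BBMap's implementation, the function names are illustrative, and it holds all of R2 in memory, so for files with tens of millions of reads a dedicated tool is the practical choice.

```python
def parse_fastq(lines):
    """Group an iterable of FASTQ lines into (read_id, record_lines) tuples."""
    record = []
    for line in lines:
        record.append(line.rstrip("\n"))
        if len(record) == 4:
            name = record[0].split()[0].lstrip("@")
            if name.endswith(("/1", "/2")):
                name = name[:-2]
            yield name, record
            record = []


def repair_pairs(r1_lines, r2_lines):
    """Return (paired_r1, paired_r2, singletons), keeping the R1 order."""
    r2_by_id = dict(parse_fastq(r2_lines))  # holds all of R2 in memory
    paired_r1, paired_r2, singletons = [], [], []
    for name, record in parse_fastq(r1_lines):
        mate = r2_by_id.pop(name, None)
        if mate is None:
            singletons.append(record)       # R1 read with no R2 mate
        else:
            paired_r1.append(record)
            paired_r2.append(mate)
    singletons.extend(r2_by_id.values())    # R2 reads with no R1 mate
    return paired_r1, paired_r2, singletons
```

Writing `paired_r1` and `paired_r2` back out as two files gives inputs in which record N of one file is always the mate of record N in the other, which is what a paired-end aligner expects.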



    • #3
      Thanks for the reply. I thought cutadapt did that with the -p option to specify paired-end data.
      I shall give BBDuk a try and see the results. I was not happy with the results of trimmomatic on my data, so staying away from that trimmer for now.

      Thanks
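
For reference, the -p option only keeps mates together when both input files go through a single cutadapt invocation. Running cutadapt once per file lets each run drop reads independently, which is one possible cause of the desynchronisation discussed here, though the thread does not confirm it. The sketch below builds such a command line without running it. The file names, adapter sequence, and cutoffs are placeholders, and it assumes a cutadapt version that supports -A (1.8 or later, as far as I know).

```python
import shlex


def cutadapt_paired_command(r1_in, r2_in, r1_out, r2_out,
                            adapter_r1, adapter_r2,
                            quality_cutoff=20, min_length=25):
    """Build one cutadapt call that trims and filters both mates together."""
    return [
        "cutadapt",
        "-a", adapter_r1,           # 3' adapter to remove from R1
        "-A", adapter_r2,           # 3' adapter to remove from R2
        "-q", str(quality_cutoff),  # trim low-quality 3' ends
        "-m", str(min_length),      # length filter, applied to the pair
        "-o", r1_out,               # trimmed R1
        "-p", r2_out,               # trimmed R2, written in step with R1
        r1_in, r2_in,
    ]


# Placeholder file names and adapter sequence, for illustration only.
command = cutadapt_paired_command(
    "sample_R1.fastq.gz", "sample_R2.fastq.gz",
    "trimmed_R1.fastq.gz", "trimmed_R2.fastq.gz",
    adapter_r1="AGATCGGAAGAGC", adapter_r2="AGATCGGAAGAGC",
)
print(shlex.join(command))
```

The printed line can be pasted into a shell or passed to `subprocess.run(command, check=True)`. Either way, both mates are filtered as a pair, so a read dropped from one output is dropped from the other as well.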



      • #4
        Just checking: you are not accidentally switching the R1/R2 files when you use them as input for tophat? That will produce discordant results for obvious reasons.



        • #5
          Nope, I am not. The R1 files come before the R2 files in the script.



          • #6
            BBMap will do spliced alignments, so after you use BBDuk you may want to give BBMap a try on the side while your TopHat2 runs are going.



            • #7
              Thanks, I shall have a read and see what the results look like with BBDuk/BBMap while my tophat jobs are running.
