  • Low hits in eXpress

    Hi,

    I'm working on RNA-seq data (Illumina HiSeq). After trimming with Trimmomatic, assembling de novo with CLC and MIRA, and adding the debris, I ran TopHat to map my raw reads against the transcriptome.

    Here's the code and the alignment summary:

    tophat -p 64 -o Path/to/Output /path/to/bowtie/database /path/to/Paired_forward.fq.gz /path/to/Paired_reverse.fq.gz

    The Output I get is:

    Left reads
    Input: 11157893
    Mapped: 10404612 (93.2% of Input)
    of These: 8239736 (79.2%) have multiple alignments (1167061 have >20)

    Right reads
    Input: 11157893
    Mapped: 10404612 (93.2% of Input)
    of These: 8239736 (79.2%) have multiple alignments (1167061 have >20)

    93.2% Overall read mapping rate.

    Aligned pairs: 10404612
    of These: 8239769 (79.2%) have multiple alignments
    10404497 (100.0%) are discordant alignments
    0.0% concordant pair alignment rate.
    This already doesn't look good to me.
    After sorting by read name with samtools (samtools sort -n), eXpress gives me only about 10-30 hits per sample.

    Does anybody have any idea what might be going wrong here?

    Thanks a lot!

  • #2
    Having 80% of your data multi-map (when 93+% of the data is mapping) seems to indicate that there is some issue with either the data you started with or the assembly that was generated.

    Was the quality of the sequences you started with good (reasonably clean FastQC results)? Is this data for an organism with no reference genome (or is a closely related genome available)? How did the assembly look (in terms of number of contigs, length of contigs, genes you expected to find)?

    • #3
      Originally posted by GenoMax
      Having 80% of your data multi-map (when 93+% of the data is mapping) seems to indicate that there is some issue with either the data you started with or the assembly that was generated.
      I think it's OK in this case, as he's mapping RNA-seq data to a transcriptome. I'm more concerned with the fact that almost 100% of the alignments are discordant, which indicates the reads don't go together, or pair ordering was broken.

      First, you need to re-pair the reads, or perhaps even validate that they are from the same library. Alternatively, you could reprocess them from the raw reads. Be sure to process read 1 and read 2 at the same time to keep them together. In fact, it might help if you posted all the command lines you used for processing, up to and including mapping.
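One way to check whether the two files are still in sync is to compare read names record by record. The following is a minimal Python sketch, assuming plain 4-line FASTQ records (optionally gzipped) and read names that match once anything after the first space and any trailing /1 or /2 is dropped. The function names are made up for illustration.

```python
import gzip
from itertools import zip_longest


def open_fastq(path):
    """Open a FASTQ file as text, transparently handling .gz files."""
    if str(path).endswith(".gz"):
        return gzip.open(path, "rt")
    return open(path, "rt")


def read_ids(path):
    """Yield the read name of each 4-line FASTQ record, without mate suffix."""
    with open_fastq(path) as handle:
        for line_number, line in enumerate(handle):
            if line_number % 4 != 0:
                continue  # only the first line of each record is a header
            parts = line[1:].split()  # drop '@' and any comment after a space
            name = parts[0] if parts else ""
            if name.endswith(("/1", "/2")):
                name = name[:-2]  # old-style mate suffix
            yield name


def first_pair_mismatch(forward_path, reverse_path):
    """Return (record_index, forward_id, reverse_id) for the first record
    whose names differ, or None if both files pair up completely."""
    pairs = zip_longest(read_ids(forward_path), read_ids(reverse_path))
    for index, (fwd, rev) in enumerate(pairs):
        if fwd != rev:
            return index, fwd, rev
    return None
```

A return value of None only says that the names line up. It would not catch the same file being passed for both mates, since a file is trivially in sync with itself.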

      Secondly, for mapping RNA to a transcriptome, you should not use Tophat2, which is designed for spliced alignments to a genome; it's better to use Bowtie2 or something else.
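As a rough illustration of the Bowtie2 route, the sketch below builds a paired-end command line in Python. The flags used (-p, -x, -1, -2, -S) are standard Bowtie2 options; the index prefix, file names, and thread count are placeholders, and the index is assumed to have been built from the transcriptome with bowtie2-build beforehand. Multi-hit reporting options such as -k or -a are left out here; whether they are needed depends on the downstream quantifier.

```python
def bowtie2_paired_command(index_prefix, forward_fastq, reverse_fastq,
                           output_sam, threads=8):
    """Build the argument list for an unspliced, paired-end Bowtie2 run."""
    return [
        "bowtie2",
        "-p", str(threads),    # number of alignment threads
        "-x", index_prefix,    # index prefix produced by bowtie2-build
        "-1", forward_fastq,   # mate 1 (forward) reads
        "-2", reverse_fastq,   # mate 2 (reverse) reads
        "-S", output_sam,      # SAM output file
    ]
```

Passing the returned list to subprocess.run(..., check=True) would launch the alignment. Keeping the mates as separate named arguments makes it a little harder to mix them up than in a long positional command line.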

      EDIT: Looking at the statistics again, I think the mapping command was wrong, and the same file was specified twice. Note that exactly the same number of reads mapped for read 1 as for read 2, with exactly the same number having multiple alignments and the same number having more than 20 alignments; that can't be a coincidence.
      Last edited by Brian Bushnell; 07-11-2015, 02:10 PM.

      • #4
        Brian: Oh yes, very good observation! I completely missed that. I will check it right away. Thanks.

        Max: The quality of the sequences was good. Yes, I sequenced the transcriptome of an ant species, and there are no genomes of closely related species available. That's why I had to do a de novo assembly. I agree that there seems to be a problem with the data, and I hope (although it would be a bit embarrassing) that it is what Brian suggested.

        • #5
          Indeed, I accidentally used the reverse reads twice rather than the forward reads once and the reverse reads once. Fixing this completely solved the problem. Thanks, guys!
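A small pre-flight check could catch this kind of mix-up before mapping. The sketch below is one possible approach in Python, assuming 4-line FASTQ records (optionally gzipped): it flags the inputs if both arguments resolve to the same path, or if the first records of both files carry identical sequences. The function names and the 1000-record sample size are arbitrary choices for illustration.

```python
import gzip
import os
from itertools import islice


def open_fastq(path):
    """Open a FASTQ file as text, transparently handling .gz files."""
    if str(path).endswith(".gz"):
        return gzip.open(path, "rt")
    return open(path, "rt")


def leading_sequences(path, n_records=1000):
    """Return the sequence lines of the first n_records FASTQ records."""
    with open_fastq(path) as handle:
        lines = islice(handle, 4 * n_records)
        # The sequence is the second line of every 4-line record.
        return [line.strip() for i, line in enumerate(lines) if i % 4 == 1]


def same_reads_given_twice(forward_path, reverse_path, n_records=1000):
    """True if both arguments point at the same file, or if the files
    start with identical sequences (for example, one is a copy)."""
    if os.path.realpath(forward_path) == os.path.realpath(reverse_path):
        return True
    forward = leading_sequences(forward_path, n_records)
    reverse = leading_sequences(reverse_path, n_records)
    return bool(forward) and forward == reverse
```

Genuine mates come from opposite ends of a fragment, so a long run of identical sequences in both files is very unlikely unless the same reads were supplied twice.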

          • #6
            Ouch!

            What did the new stats look like (compared to the ones posted above)?
