Seqanswers Leaderboard Ad

Collapse

Announcement

Collapse
No announcement yet.
X
 
  • Filter
  • Time
  • Show
Clear All
new posts

  • Problems using TopHat with RNA-seq data

    Hi

    I am totally new to all this stuff and promptly ran into problems using Tophat for aligning RNAseq-data.

    I have an invertebrate draft genome around 100megabases in 7000 scaffolds and I wanted to align PE RNAseq-data to the genome with Tophat. I wanted to use the output in Maker for annotation of the genome, because I have not yet done a de-novo-assembly.

    Tophat finished without any errors but the results have been a tiny accepted_hits.bam and a huge unmapped.bam.

    So what I did was:

    - build the genome index with bowtie2-build
    - trimmed the 76 bp paired-end reads with Trimmomatic (leading: 20 trailing: 20 minlen: 50)
    - ran Tophat with default parameters, except: -r 150 (insert size 250 - 300bp)


    I am not sure which of the many logs are important, some also seem to contain redundant information, but here are:

    - tophat.log:

    [2015-02-06 17:50:08] Beginning TopHat run (v2.0.0)
    -----------------------------------------------
    [2015-02-06 17:50:08] Checking for Bowtie
    Bowtie version: 2.0.0.6
    [2015-02-06 17:50:08] Checking for Samtools
    Samtools version: 0.1.18.0
    [2015-02-06 17:50:08] Checking for Bowtie index files
    [2015-02-06 17:50:08] Checking for reference FASTA file
    Warning: Could not find FASTA file H2genome.fa
    [2015-02-06 17:50:08] Reconstituting reference FASTA file from Bowtie index
    Executing: /local64/usr_local/bin/bowtie2-inspect H2genome > ./tophat_out/tmp/H2genome.fa
    [2015-02-06 17:50:15] Generating SAM header for H2genome
    format: fastq
    quality scale: phred33 (default)
    [2015-02-06 17:50:17] Preparing reads
    left reads: min. length=50, count=31002899
    right reads: min. length=50, count=31002899
    [2015-02-06 18:15:30] Mapping left_kept_reads against H2genome with Bowtie2
    [2015-02-06 18:39:35] Mapping left_kept_reads_seg1 against H2genome with Bowtie2 (1/3)
    [2015-02-06 18:44:16] Mapping left_kept_reads_seg2 against H2genome with Bowtie2 (2/3)
    [2015-02-06 18:49:13] Mapping left_kept_reads_seg3 against H2genome with Bowtie2 (3/3)
    [2015-02-06 18:56:30] Mapping right_kept_reads against H2genome with Bowtie2
    [2015-02-06 19:20:49] Mapping right_kept_reads_seg1 against H2genome with Bowtie2 (1/3)
    [2015-02-06 19:25:54] Mapping right_kept_reads_seg2 against H2genome with Bowtie2 (2/3)
    [2015-02-06 19:31:06] Mapping right_kept_reads_seg3 against H2genome with Bowtie2 (3/3)
    [2015-02-06 19:36:59] Searching for junctions via segment mapping
    [2015-02-06 19:41:42] Retrieving sequences for splices
    [2015-02-06 19:41:48] Indexing splices
    [2015-02-06 19:43:16] Mapping left_kept_reads_seg1 against segment_juncs with Bowtie2 (1/3)
    [2015-02-06 19:45:25] Mapping left_kept_reads_seg2 against segment_juncs with Bowtie2 (2/3)
    [2015-02-06 19:47:56] Mapping left_kept_reads_seg3 against segment_juncs with Bowtie2 (3/3)
    [2015-02-06 19:50:27] Joining segment hits
    [2015-02-06 20:06:22] Mapping right_kept_reads_seg1 against segment_juncs with Bowtie2 (1/3)
    [2015-02-06 20:08:41] Mapping right_kept_reads_seg2 against segment_juncs with Bowtie2 (2/3)
    [2015-02-06 20:11:01] Mapping right_kept_reads_seg3 against segment_juncs with Bowtie2 (3/3)
    [2015-02-06 20:13:48] Joining segment hits
    [2015-02-06 20:28:45] Reporting output tracks

    - reports.log:

    [samopen] SAM header is present: 6973 sequences.
    Loaded 7 junctions
    Reporting final accepted alignments...done.
    Printing junction BED track...done
    Printing insertions...done
    Printing deletions...done
    Found 5 junctions from happy spliced reads

    - bowtie.left_kept_reads.fixmap.log:

    49466 out of 31052365 reads have been filtered out
    31002899 reads; of these:
    31002899 (100.00%) were unpaired; of these:
    5185062 (16.72%) aligned 0 times
    23947202 (77.24%) aligned exactly 1 time
    1870635 (6.03%) aligned >1 times
    83.28% overall alignment rate

    - bowtie.right_kept_reads.fixmap.log:

    31002899 reads; of these:
    31002899 (100.00%) were unpaired; of these:
    5185062 (16.72%) aligned 0 times
    23947202 (77.24%) aligned exactly 1 time
    1870635 (6.03%) aligned >1 times
    83.28% overall alignment rate

    So it seems that most of the reads align to the genome, right (>80%)? Although, to me, it looks a little bit odd that the right reads have exactly the same values as the left reads?

    Also, in any of the "bowtie.right/left_kept_reads_seg.fixmap.log" files, the alignment rate is much lower.

    e.g. bowtie.right_kept_reads_seg2.fixmap.log:

    9740245 reads; of these:
    9740245 (100.00%) were unpaired; of these:
    6453402 (66.26%) aligned 0 times
    2647488 (27.18%) aligned exactly 1 time
    639355 (6.56%) aligned >1 times
    33.74% overall alignment rate

    So I am really interested to understand what has happened in my run and why Tophat has found almost no accepted hits and junctions?

    Any suggestions?

    Thanks in advance!

  • #2
    Sorry,

    could be a false alarm. I may have simple messed up my fastq files. I am rerunning the alignment right now.

    Sorry again

    Comment


    • #3
      Q20 is a pretty high threshold for quality trimming. Depending on your data quality, you may shorten your reads substantially, which will make them less sensitive to alternative splicing. Also, since you are (probably) doing a quantitative analysis, very aggressive trimming and discarding of reads shorter than 50bp can cause bias, as there is a correlation between bases/motifs and quality values.

      Comment

      Latest Articles

      Collapse

      • seqadmin
        Recent Advances in Sequencing Analysis Tools
        by seqadmin


        The sequencing world is rapidly changing due to declining costs, enhanced accuracies, and the advent of newer, cutting-edge instruments. Equally important to these developments are improvements in sequencing analysis, a process that converts vast amounts of raw data into a comprehensible and meaningful form. This complex task requires expertise and the right analysis tools. In this article, we highlight the progress and innovation in sequencing analysis by reviewing several of the...
        05-06-2024, 07:48 AM
      • seqadmin
        Essential Discoveries and Tools in Epitranscriptomics
        by seqadmin




        The field of epigenetics has traditionally concentrated more on DNA and how changes like methylation and phosphorylation of histones impact gene expression and regulation. However, our increased understanding of RNA modifications and their importance in cellular processes has led to a rise in epitranscriptomics research. “Epitranscriptomics brings together the concepts of epigenetics and gene expression,” explained Adrien Leger, PhD, Principal Research Scientist...
        04-22-2024, 07:01 AM

      ad_right_rmr

      Collapse

      News

      Collapse

      Topics Statistics Last Post
      Started by seqadmin, 05-14-2024, 07:03 AM
      0 responses
      17 views
      0 likes
      Last Post seqadmin  
      Started by seqadmin, 05-10-2024, 06:35 AM
      0 responses
      40 views
      0 likes
      Last Post seqadmin  
      Started by seqadmin, 05-09-2024, 02:46 PM
      0 responses
      50 views
      0 likes
      Last Post seqadmin  
      Started by seqadmin, 05-07-2024, 06:57 AM
      0 responses
      41 views
      0 likes
      Last Post seqadmin  
      Working...
      X