Unconfigured Ad

Collapse
X
 
  • Filter
  • Time
  • Show
Clear All
new posts
  • balaena
    Member
    • Jan 2015
    • 29

    Problems using TopHat with RNA-seq data

    Hi

    I am totally new to all this stuff and promptly ran into problems using Tophat for aligning RNAseq-data.

    I have an invertebrate draft genome around 100megabases in 7000 scaffolds and I wanted to align PE RNAseq-data to the genome with Tophat. I wanted to use the output in Maker for annotation of the genome, because I have not yet done a de-novo-assembly.

    Tophat finished without any errors but the results have been a tiny accepted_hits.bam and a huge unmapped.bam.

    So what I did was:

    - build the genome index with bowtie2-build
    - trimmed the 76 bp paired-end reads with Trimmomatic (leading: 20 trailing: 20 minlen: 50)
    - ran Tophat with default parameters, except: -r 150 (insert size 250 - 300bp)


    I am not sure which of the many logs are important, some also seem to contain redundant information, but here are:

    - tophat.log:

    [2015-02-06 17:50:08] Beginning TopHat run (v2.0.0)
    -----------------------------------------------
    [2015-02-06 17:50:08] Checking for Bowtie
    Bowtie version: 2.0.0.6
    [2015-02-06 17:50:08] Checking for Samtools
    Samtools version: 0.1.18.0
    [2015-02-06 17:50:08] Checking for Bowtie index files
    [2015-02-06 17:50:08] Checking for reference FASTA file
    Warning: Could not find FASTA file H2genome.fa
    [2015-02-06 17:50:08] Reconstituting reference FASTA file from Bowtie index
    Executing: /local64/usr_local/bin/bowtie2-inspect H2genome > ./tophat_out/tmp/H2genome.fa
    [2015-02-06 17:50:15] Generating SAM header for H2genome
    format: fastq
    quality scale: phred33 (default)
    [2015-02-06 17:50:17] Preparing reads
    left reads: min. length=50, count=31002899
    right reads: min. length=50, count=31002899
    [2015-02-06 18:15:30] Mapping left_kept_reads against H2genome with Bowtie2
    [2015-02-06 18:39:35] Mapping left_kept_reads_seg1 against H2genome with Bowtie2 (1/3)
    [2015-02-06 18:44:16] Mapping left_kept_reads_seg2 against H2genome with Bowtie2 (2/3)
    [2015-02-06 18:49:13] Mapping left_kept_reads_seg3 against H2genome with Bowtie2 (3/3)
    [2015-02-06 18:56:30] Mapping right_kept_reads against H2genome with Bowtie2
    [2015-02-06 19:20:49] Mapping right_kept_reads_seg1 against H2genome with Bowtie2 (1/3)
    [2015-02-06 19:25:54] Mapping right_kept_reads_seg2 against H2genome with Bowtie2 (2/3)
    [2015-02-06 19:31:06] Mapping right_kept_reads_seg3 against H2genome with Bowtie2 (3/3)
    [2015-02-06 19:36:59] Searching for junctions via segment mapping
    [2015-02-06 19:41:42] Retrieving sequences for splices
    [2015-02-06 19:41:48] Indexing splices
    [2015-02-06 19:43:16] Mapping left_kept_reads_seg1 against segment_juncs with Bowtie2 (1/3)
    [2015-02-06 19:45:25] Mapping left_kept_reads_seg2 against segment_juncs with Bowtie2 (2/3)
    [2015-02-06 19:47:56] Mapping left_kept_reads_seg3 against segment_juncs with Bowtie2 (3/3)
    [2015-02-06 19:50:27] Joining segment hits
    [2015-02-06 20:06:22] Mapping right_kept_reads_seg1 against segment_juncs with Bowtie2 (1/3)
    [2015-02-06 20:08:41] Mapping right_kept_reads_seg2 against segment_juncs with Bowtie2 (2/3)
    [2015-02-06 20:11:01] Mapping right_kept_reads_seg3 against segment_juncs with Bowtie2 (3/3)
    [2015-02-06 20:13:48] Joining segment hits
    [2015-02-06 20:28:45] Reporting output tracks

    - reports.log:

    [samopen] SAM header is present: 6973 sequences.
    Loaded 7 junctions
    Reporting final accepted alignments...done.
    Printing junction BED track...done
    Printing insertions...done
    Printing deletions...done
    Found 5 junctions from happy spliced reads

    - bowtie.left_kept_reads.fixmap.log:

    49466 out of 31052365 reads have been filtered out
    31002899 reads; of these:
    31002899 (100.00%) were unpaired; of these:
    5185062 (16.72%) aligned 0 times
    23947202 (77.24%) aligned exactly 1 time
    1870635 (6.03%) aligned >1 times
    83.28% overall alignment rate

    - bowtie.right_kept_reads.fixmap.log:

    31002899 reads; of these:
    31002899 (100.00%) were unpaired; of these:
    5185062 (16.72%) aligned 0 times
    23947202 (77.24%) aligned exactly 1 time
    1870635 (6.03%) aligned >1 times
    83.28% overall alignment rate

    So it seems that most of the reads align to the genome, right (>80%)? Although, to me, it looks a little bit odd that the right reads have exactly the same values as the left reads?

    Also, in any of the "bowtie.right/left_kept_reads_seg.fixmap.log" files, the alignment rate is much lower.

    e.g. bowtie.right_kept_reads_seg2.fixmap.log:

    9740245 reads; of these:
    9740245 (100.00%) were unpaired; of these:
    6453402 (66.26%) aligned 0 times
    2647488 (27.18%) aligned exactly 1 time
    639355 (6.56%) aligned >1 times
    33.74% overall alignment rate

    So I am really interested to understand what has happened in my run and why Tophat has found almost no accepted hits and junctions?

    Any suggestions?

    Thanks in advance!
  • balaena
    Member
    • Jan 2015
    • 29

    #2
    Sorry,

    could be a false alarm. I may have simple messed up my fastq files. I am rerunning the alignment right now.

    Sorry again

    Comment

    • Brian Bushnell
      Super Moderator
      • Jan 2014
      • 2709

      #3
      Q20 is a pretty high threshold for quality trimming. Depending on your data quality, you may shorten your reads substantially, which will make them less sensitive to alternative splicing. Also, since you are (probably) doing a quantitative analysis, very aggressive trimming and discarding of reads shorter than 50bp can cause bias, as there is a correlation between bases/motifs and quality values.

      Comment

      Latest Articles

      Collapse

      ad_right_rmr

      Collapse

      News

      Collapse

      Topics Statistics Last Post
      Started by SEQadmin2, 06-09-2026, 11:58 AM
      0 responses
      22 views
      0 reactions
      Last Post SEQadmin2  
      Started by SEQadmin2, 06-05-2026, 10:09 AM
      0 responses
      27 views
      0 reactions
      Last Post SEQadmin2  
      Started by SEQadmin2, 06-04-2026, 08:59 AM
      0 responses
      38 views
      0 reactions
      Last Post SEQadmin2  
      Started by SEQadmin2, 06-02-2026, 12:03 PM
      0 responses
      61 views
      0 reactions
      Last Post SEQadmin2  
      Working...