More problems with Tophat (v1.3.2)

  • More problems with Tophat (v1.3.2)

    I ran Tophat version 1.3.2 to align my deep sequencing data to my reference genome with the following command:

    tophat --solexa1.3-quals -p 6 -o /control/8905X1/tophat /rice_index/rice_index /8905X1/8905X1.txt

    It generated a file called accepted_hits.bam.
    I converted the file to a bed file using the bamToBed tool.
    I then split the BED file by chromosome. Here are the resulting files (size in bytes, timestamp, file name):
    331723028 May 11 09:26 accepted_hits_bed_Chr1
    119325730 May 11 09:26 accepted_hits_bed_Chr10
    123994839 May 11 09:26 accepted_hits_bed_Chr11
    121898474 May 11 09:27 accepted_hits_bed_Chr12
    416137943 May 11 09:27 accepted_hits_bed_Chr2
    334052893 May 11 09:27 accepted_hits_bed_Chr3
    161529836 May 11 09:27 accepted_hits_bed_Chr4
    347228298 May 11 09:28 accepted_hits_bed_Chr5
    189465060 May 11 09:28 accepted_hits_bed_Chr6
    200695493 May 11 09:28 accepted_hits_bed_Chr7
    171194241 May 11 09:28 accepted_hits_bed_Chr8
    759976461 May 11 09:29 accepted_hits_bed_Chr9
    2184791 May 11 09:29 accepted_hits_bed_ChrSy
    539606 May 11 09:29 accepted_hits_bed_ChrUn

    Tophat overloaded chromosome 9, which is one of the smaller chromosomes.
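
File sizes are only a rough proxy for how many alignments landed on each chromosome. A more direct check is to count the BED records per chromosome. The following is a minimal sketch, assuming a tab-separated BED file with the chromosome name in the first column; the function names `count_records_per_chrom` and `record_share` are hypothetical, not part of Tophat or bamToBed.

```python
from collections import Counter
from typing import Dict, Iterable


def count_records_per_chrom(bed_lines: Iterable[str]) -> Counter:
    """Count BED records per chromosome (first tab-separated column)."""
    counts: Counter = Counter()
    for line in bed_lines:
        line = line.strip()
        # Skip blank lines and the optional BED header/comment lines.
        if not line or line.startswith(("#", "track ", "browser ")):
            continue
        chrom = line.split("\t", 1)[0]
        counts[chrom] += 1
    return counts


def record_share(counts: Counter) -> Dict[str, float]:
    """Return each chromosome's fraction of all counted records."""
    total = sum(counts.values())
    if total == 0:
        return {}
    return {chrom: n / total for chrom, n in counts.items()}
```

To use it on a real file, pass an open file handle, for example `count_records_per_chrom(open(path))`, where `path` points at the BED output. A read that aligns to several places contributes one record per alignment, so these are alignment counts rather than unique read counts.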

    Unless this can be resolved, I recommend not using Tophat.

  • #2
    To all,
    I found the problem with my RNA-seq data. There was some rRNA contamination in my sample, which accounted for 5% of the total reads in the sample. The rRNA genes are located on chromosome 9. There are also some regions on chromosome 2 that show high homology to the rRNA genes on chromosome 9. This is the cause of the reads overloading chromosomes 9 and 2.

    The latest version of Tophat works fine.

    If other people are having problems with reads overloading a chromosome, check to see if there is rRNA contamination.
    Thanks all.
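
One way to run the check suggested above is to measure what fraction of the BED records overlap the annotated rRNA loci. This is only a sketch under stated assumptions: the rRNA coordinates must come from your own genome annotation, the `rrna_regions` mapping and the function name `rrna_fraction` are hypothetical, and intervals are treated as 0-based, half-open as in the BED format.

```python
from typing import Dict, Iterable, List, Tuple

Regions = Dict[str, List[Tuple[int, int]]]


def rrna_fraction(bed_lines: Iterable[str], rrna_regions: Regions) -> float:
    """Return the fraction of BED records overlapping any rRNA region."""
    total = 0
    hits = 0
    for line in bed_lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 3:
            continue  # blank line or header, not a BED record
        chrom, start, end = fields[0], int(fields[1]), int(fields[2])
        total += 1
        # Half-open intervals overlap when each starts before the other ends.
        if any(start < r_end and r_start < end
               for r_start, r_end in rrna_regions.get(chrom, [])):
            hits += 1
    return hits / total if total else 0.0
```

A high fraction concentrated on one or two chromosomes would be consistent with rRNA contamination rather than an aligner fault.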

    • #3
      With any RNA-seq data, you're almost always going to get rRNA contamination. It makes subsequent mapping quicker if you filter out the rRNA reads first (i.e. prior to any other mapping that is done).
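
The pre-filtering step could look roughly like the sketch below. It assumes you have already aligned the reads against rRNA sequences with a tool of your choice and collected the names of the reads that hit (that step is not shown), and that the input is plain four-line FASTQ. The function name `filter_fastq` and the `rrna_ids` set are hypothetical.

```python
from typing import Iterable, Iterator, Set, Tuple


def filter_fastq(fastq_lines: Iterable[str],
                 rrna_ids: Set[str]) -> Iterator[Tuple[str, ...]]:
    """Yield four-line FASTQ records whose read name is not in rrna_ids."""
    record = []
    for line in fastq_lines:
        record.append(line.rstrip("\n"))
        if len(record) == 4:
            # The read name is the header text after '@', up to the
            # first whitespace.
            parts = record[0][1:].split()
            name = parts[0] if parts else ""
            if name not in rrna_ids:
                yield tuple(record)
            record = []
```

The records that survive can be written back out, one line per element, and passed to the main alignment.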
