

Bowtie and Tophat




  • Bowtie and Tophat

    We are currently trying to analyse SOLiD RNA-seq data using tophat and/or bowtie.
    With default settings tophat aligns about 35% of reads, whereas bowtie with default settings aligns 53%!
    I'm assuming that tophat first runs bowtie and then tries to align the reads that span exon splice junctions, so how is it aligning fewer reads than bowtie alone?
    With some parameter adjustments (e.g. the --best flag) we can get up to 70% of reads mapped using bowtie alone, but then the splice junction reads are not mapped, and the --best flag cannot be set for bowtie when running it through tophat.
    Does anyone know if it is possible to get tophat to output only the splice junction reads (i.e. everything bowtie would not have aligned) so they can be added to the bowtie output, or is there a fairly simple way of altering the bowtie parameters when running it within tophat?

    As you might have guessed, I am only just beginning my journey into the wonderful world of bioinformatics, so go easy on me!

    Any help or advice would be gratefully received!


  • #2
    We also can't understand why tophat maps fewer reads than bowtie, despite using bowtie for the initial mapping.


    • #3

      Hopefully, this is the right place to post my question:

      I ran TopHat v1.2.0 on single-end reads (40 nt) generated with an Illumina GA IIx, using these options:

      tophat --max-multihits 2 \
      --segment-mismatches 2 \
      --library-type fr-unstranded \
      -p 4 \
      -o sample_tophat_out \
      hg19 sample.fastq

      After getting the results, I tried to collect some statistics about the runs. Interestingly, in the generated log folder for each of the samples, there are two files that contain statistics data such as:

      ==> fileq5okX8.log <==
      # reads processed: 28183908
      # reads with at least one reported alignment: 1081639 (3.84%)
      # reads that failed to align: 26764052 (94.96%)
      # reads with alignments suppressed due to -m: 338217 (1.20%)
      Reported 1219857 alignments to 1 output stream(s)

      ==> fileueW7Yw.log <==
      # reads processed: 28183908
      # reads with at least one reported alignment: 20657663 (73.30%)
      # reads that failed to align: 3984728 (14.14%)
      # reads with alignments suppressed due to -m: 3541517 (12.57%)
      Reported 23408344 alignments to 1 output stream(s)

      There is no documentation on TopHat's website on what these two files actually represent. As you can see, the summaries look pretty different - I am not sure why there are two files in the first place, nor do I understand why there is such a large difference between the % of aligned reads.

      There are also two additional files that seem to contain relevant data: reports.log and prep_reads.log. Does anyone know what the results presented in all these files are?
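The two summaries quoted above follow the layout of Bowtie's end-of-run report, so they can be compared side by side with a small helper. This is only a sketch: the function name `summarise_bowtie_log` is made up for illustration, and it assumes each log contains exactly the four `# reads ...` lines shown. It does not tell you which TopHat step produced each log; it only extracts the counts and checks that aligned + failed + suppressed adds up to the number of reads processed.

```shell
# Hypothetical helper: summarise one Bowtie-style log read from stdin.
# Assumes the "# reads ..." line layout quoted in the post above.
summarise_bowtie_log() {
    awk '
        /reads processed:/                 { total = $NF }          # count is the last field
        /at least one reported alignment:/ { aligned = $(NF-1) }    # count precedes "(x%)"
        /failed to align:/                 { failed = $(NF-1) }
        /suppressed due to -m:/            { suppressed = $(NF-1) }
        END {
            # The three categories should partition the processed reads.
            ok = (aligned + failed + suppressed == total + 0) ? "yes" : "no"
            printf "total=%d aligned=%d failed=%d suppressed=%d sum_ok=%s\n",
                   total, aligned, failed, suppressed, ok
        }'
}

# Example call (file name taken from the listing above):
#   summarise_bowtie_log < fileq5okX8.log
```

For both logs in the post the three categories do sum to 28183908, so each file looks like a complete report of a separate alignment pass over the same read set rather than two halves of a single run.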

      Thank you so much!


      • #4
        It looks like the files in the log folder are not very informative. It is quite easy to collect statistics about the mapping by looking at 'accepted_hits.bam' instead.
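One way to do this, sketched under a few assumptions: samtools is installed, the reads are single-end (so each read name identifies one read), and the output directory is `sample_tophat_out` as in the command in #3. The function names below are made up for illustration. Because a read can be reported at more than one location (up to 2 with `--max-multihits 2`), the sketch counts distinct read names rather than alignment records; spliced alignments are recognised by an `N` operation in the CIGAR string (column 6).

```shell
# Hypothetical helpers; both read SAM records from stdin.

# Number of distinct reads with at least one alignment.
count_mapped_reads() {
    grep -v '^@' | cut -f1 | sort -u | wc -l | tr -d ' '
}

# Number of distinct reads with at least one spliced alignment
# (CIGAR in column 6 contains an N, i.e. a skipped region such as an intron).
count_spliced_reads() {
    awk -F'\t' '!/^@/ && $6 ~ /N/ { print $1 }' | sort -u | wc -l | tr -d ' '
}

# Example calls, assuming samtools is available:
#   samtools view sample_tophat_out/accepted_hits.bam | count_mapped_reads
#   samtools view sample_tophat_out/accepted_hits.bam | count_spliced_reads
```

Dividing the first number by the total number of input reads (28183908 in the logs from #3) gives an overall mapping rate. The second number gives a rough idea of how many reads TopHat placed across junctions.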


        • #5
          This post helps a lot with the question here:


