  • Bowtie and Tophat

    We are currently trying to analyse SOLiD RNA-seq data using tophat and/or bowtie.
    With default settings tophat aligns about 35% of reads, whereas bowtie with default settings aligns 53%!
    I'm assuming that tophat first runs bowtie and then tries to align reads that span exon splice junctions, so how is it aligning fewer reads than bowtie alone?
    With some parameter adjustments (e.g. the --best flag) we can get up to 70% of reads mapped using bowtie alone, but then we cannot map the splice junction reads, and the --best flag cannot be set for bowtie when running it through tophat.
    Does anyone know if it is possible to get tophat to output only the splice junction reads (i.e. everything bowtie would not have aligned) so they can be added to the bowtie output, or is there a fairly simple way of altering the bowtie parameters when running it within tophat?

    As you might have guessed, I am only just beginning my journey into the wonderful world of bioinformatics, so go easy on me!

    Any help or advice would be gratefully received!

    Huw

  • #2
    We also can't understand why tophat maps fewer reads than bowtie, despite using bowtie for the initial mapping.


    • #3
      Hi,

      Hopefully, this is the right place to post my question:

      I ran TopHat v1.2.0 on single-end reads (40 nt) generated with an Illumina GA IIx. These were the options used:

      tophat --max-multihits 2 \
      --segment-mismatches 2 \
      --library-type fr-unstranded \
      -p 4 \
      -o sample_tophat_out \
      hg19 sample.fastq

      After getting the results, I tried to collect some statistics about the runs. Interestingly, the log folder generated for each sample contains two files with summary statistics such as:

      ==> fileq5okX8.log <==
      # reads processed: 28183908
      # reads with at least one reported alignment: 1081639 (3.84%)
      # reads that failed to align: 26764052 (94.96%)
      # reads with alignments suppressed due to -m: 338217 (1.20%)
      Reported 1219857 alignments to 1 output stream(s)

      ==> fileueW7Yw.log <==
      # reads processed: 28183908
      # reads with at least one reported alignment: 20657663 (73.30%)
      # reads that failed to align: 3984728 (14.14%)
      # reads with alignments suppressed due to -m: 3541517 (12.57%)
      Reported 23408344 alignments to 1 output stream(s)

      There is no documentation on TopHat's website about what these two files actually represent. As you can see, the summaries look quite different: I am not sure why there are two files in the first place, nor do I understand why there is such a large difference between the percentages of aligned reads.

      There are also two additional files that seem to contain relevant data: reports.log and prep_reads.log. Does anyone know what the results presented in all these files are?

      Thank you so much!
      Alexandra
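
For anyone comparing these log summaries by hand, here is a minimal Python sketch that parses the "# reads ..." lines quoted in post #3 and checks that the three categories add up to the number of reads processed. It says nothing about which TopHat stage wrote each file; the function names `parse_bowtie_summary` and `percent` are made up for this illustration, and the log text is copied from the first file above.

```python
import re

def parse_bowtie_summary(text):
    """Extract the read counts from bowtie-style '# reads ...' summary lines."""
    patterns = {
        "processed": r"# reads processed: (\d+)",
        "aligned": r"# reads with at least one reported alignment: (\d+)",
        "failed": r"# reads that failed to align: (\d+)",
        "suppressed": r"# reads with alignments suppressed due to -m: (\d+)",
    }
    counts = {}
    for key, pattern in patterns.items():
        match = re.search(pattern, text)
        if match:
            counts[key] = int(match.group(1))
    return counts

def percent(part, whole):
    """Percentage of `whole` represented by `part`, rounded to two decimals."""
    return round(100.0 * part / whole, 2)

# Summary copied from fileq5okX8.log as quoted in the post.
log_text = """\
# reads processed: 28183908
# reads with at least one reported alignment: 1081639 (3.84%)
# reads that failed to align: 26764052 (94.96%)
# reads with alignments suppressed due to -m: 338217 (1.20%)
Reported 1219857 alignments to 1 output stream(s)
"""

counts = parse_bowtie_summary(log_text)
accounted_for = counts["aligned"] + counts["failed"] + counts["suppressed"]
print(counts)
print("categories sum to reads processed:", accounted_for == counts["processed"])
print("aligned %:", percent(counts["aligned"], counts["processed"]))
```

Running the same function on the second file's text gives a side-by-side view of the two summaries; both report the same number of reads processed.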


      • #4
        It looks like the files in the log folder are not very informative. It is quite easy to collect mapping statistics by looking at 'accepted_hits.bam' instead.
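
One way to do this, sketched in Python under some assumptions: the BAM file is first converted to SAM text (for example with `samtools view accepted_hits.bam`), the reads are single-end so each read name identifies one read, and a read counts as mapped if the unmapped flag (0x4) is not set. The function name `count_mapped_reads` and the sample lines are invented for this illustration and are not real TopHat output.

```python
def count_mapped_reads(sam_lines):
    """Return (distinct mapped read names, total alignment records).

    Header lines (starting with '@') and records with the unmapped
    flag (0x4) set are ignored.
    """
    names = set()
    alignments = 0
    for line in sam_lines:
        if line.startswith("@"):
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 11:
            continue  # not a complete SAM record
        flag = int(fields[1])
        if flag & 0x4:
            continue  # read is unmapped
        alignments += 1
        names.add(fields[0])
    return len(names), alignments

# Made-up SAM records for illustration only: read2 has two alignments,
# read3 is unmapped.
sample = [
    "@HD\tVN:1.0\tSO:coordinate",
    "read1\t0\tchr1\t100\t255\t40M\t*\t0\t0\t*\t*",
    "read2\t0\tchr1\t200\t3\t40M\t*\t0\t0\t*\t*",
    "read2\t256\tchr2\t500\t3\t40M\t*\t0\t0\t*\t*",
    "read3\t4\t*\t0\t0\t*\t*\t0\t0\t*\t*",
]

mapped_reads, total_alignments = count_mapped_reads(sample)
print(mapped_reads, total_alignments)
```

Dividing the number of distinct mapped reads by the number of input reads gives a mapping rate. Counting read names rather than alignment records matters here because a multi-mapped read appears once per reported alignment.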


        • #5
          This post helps a lot with the question here:
