
  • bob-loblaw
    replied
    Another problem that just popped into my head: if tophat tries to align all reads first without looking for splicing, won't I just have the same problem as before, where a lot of human reads are falsely identified as bacterial? Or do you know whether tophat first tries to align everything without splicing, then with splicing, and returns only the best hit?


  • bob-loblaw
    replied
    Originally posted by chadn737
    The size limit on the index is a problem. You could go ahead and combine them and see if what they told you was true. If it is, you will only get an error message.

    As for using Tophat on bacterial reads: Tophat will try to align reads to the genome first, before looking for splicing. Ideally, all the bacterial reads will align to the bacterial genome in this first round and not be spliced. I won't say that won't happen, because inevitably some will have some sort of mismatch and show up spliced.

    Have you tried aligning reads to the bacterial genome and then to the human? Or has it only been human then bacterial?

    I haven't tried aligning reads to the bacterial genome and then to human, but originally we were using bowtie2 to map human reads (which mapped only a few thousand reads per file, compared to the tens of millions that tophat mapped for the same file). When we then ran bowtie2 to map bacterial reads, we got about 5 or 10 times as many bacterial reads mapped as we did after using tophat to align the human reads. (So few human reads were aligned by bowtie2 that it gives me an indication of what running bowtie2 for bacterial reads before tophat for human would produce.) Basically, I think that no matter which alignment we do first we'll have the same problem: if bacterial goes first we'll get a lot of false positives, and vice versa if human goes first. Thanks for all your help here! I'll definitely be trying a tophat run with a database of both human and bacterial as soon as I can!

    Finally, if I could ask you one more question: what about the no-discordant option that I mentioned in the OP? Do you think I should use that parameter when running tophat, or should I just go with the default settings?
    Last edited by bob-loblaw; 02-15-2013, 03:21 AM.
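
One way around the "whichever reference goes first wins" problem described above is to align every read to both references independently and then compare the hits per read. The sketch below is only an illustration of that idea, not something anyone in this thread ran: it assumes both aligners write the `AS:i` alignment-score tag in their SAM output (bowtie2 does), and that scores from the two runs are comparable, which may not hold between a spliced and an unspliced aligner. The function names are made up for the example. Mates of a pair share a read name here, so the best-scoring mate decides the call.

```python
def alignment_scores(sam_lines):
    """Map read name -> best AS:i score among its mapped SAM records."""
    best = {}
    for line in sam_lines:
        if line.startswith("@"):  # skip SAM header lines
            continue
        fields = line.rstrip("\n").split("\t")
        # Skip malformed records and records flagged as unmapped (0x4).
        if len(fields) < 11 or int(fields[1]) & 0x4:
            continue
        for tag in fields[11:]:  # optional tags start at column 12
            if tag.startswith("AS:i:"):
                score = int(tag[5:])
                name = fields[0]
                if name not in best or score > best[name]:
                    best[name] = score
                break
    return best


def assign_reads(human_scores, bacterial_scores):
    """Label each read by whichever reference gave the higher score."""
    calls = {}
    for name in set(human_scores) | set(bacterial_scores):
        h = human_scores.get(name)
        b = bacterial_scores.get(name)
        if b is None or (h is not None and h > b):
            calls[name] = "human"
        elif h is None or b > h:
            calls[name] = "bacterial"
        else:
            # Equal scores: assumption is to flag these rather than guess.
            calls[name] = "ambiguous"
    return calls
```

Reads labelled "ambiguous" are probably the ones worth inspecting by hand, since they are the candidates for the false positives discussed above.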


  • chadn737
    replied
    The size limit on the index is a problem. You could go ahead and combine them and see if what they told you was true. If it is, you will only get an error message.

    As for using Tophat on bacterial reads: Tophat will try to align reads to the genome first, before looking for splicing. Ideally, all the bacterial reads will align to the bacterial genome in this first round and not be spliced. I won't say that won't happen, because inevitably some will have some sort of mismatch and show up spliced.

    Have you tried aligning reads to the bacterial genome and then to the human? Or has it only been human then bacterial?


  • bob-loblaw
    replied
    Originally posted by chadn737
    Are you mapping your reads first to one and then the other, or to both at the same time? Ideally it shouldn't make a difference. The way you described it, where you map with tophat to human and then get fewer reads with bowtie2 mapping to bacterial genomes, makes me wonder whether you are mapping some of the bacterial reads to the human genome. It's similar to the problem of mapping reads to only part of the genome rather than the whole genome. Tophat, bowtie2, or any tool will try to map the read no matter what. Maybe a read is genuinely from one genome, but if that genome is absent, the aligner will settle for the best it can get from the reference you give it. Maybe combine your two references, map to both simultaneously, and see what results.

    First to one, then to the other. I had thought about this before, but when building the bacterial database we hit the maximum size of a reference database or index that bowtie2 can build (well, that's what I've been told; it was built just before I started this project). This is definitely something to look into though, thanks!

    If we were to map both human and bacterial reads simultaneously, we'd have to use tophat in order to map the human reads efficiently (human reads make up a large share of the reads in our samples). Do you (or anyone else who sees this post) know how using tophat to map bacterial reads would work out, given that tophat was designed to look for spliced reads?
    Last edited by bob-loblaw; 02-14-2013, 09:38 AM.


  • chadn737
    replied
    Are you mapping your reads first to one and then the other, or to both at the same time? Ideally it shouldn't make a difference. The way you described it, where you map with tophat to human and then get fewer reads with bowtie2 mapping to bacterial genomes, makes me wonder whether you are mapping some of the bacterial reads to the human genome. It's similar to the problem of mapping reads to only part of the genome rather than the whole genome. Tophat, bowtie2, or any tool will try to map the read no matter what. Maybe a read is genuinely from one genome, but if that genome is absent, the aligner will settle for the best it can get from the reference you give it. Maybe combine your two references, map to both simultaneously, and see what results.
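
For anyone wanting to try the combined-reference suggestion, a minimal sketch of the bookkeeping side is below. It assumes plain-text FASTA inputs and simply prefixes each sequence name with a genome label, so that hits in the resulting alignments can later be attributed to human or bacterial. The function names, labels, and the `|` separator are invented for this example; it does nothing about the index size limit mentioned elsewhere in the thread, so building an index from the combined file may still fail.

```python
def combine_references(sources, out_path):
    """Concatenate FASTA files, prefixing each sequence name with a label.

    sources: dict mapping a genome label (e.g. "human") to a FASTA path.
    Returns the number of sequences written.
    """
    n_seqs = 0
    with open(out_path, "w") as out:
        for label, fasta in sources.items():
            with open(fasta) as handle:
                for line in handle:
                    if line.startswith(">"):
                        # ">chr1 desc" becomes ">human|chr1 desc"
                        line = ">" + label + "|" + line[1:]
                        n_seqs += 1
                    # Guard against a missing final newline in an input file.
                    out.write(line if line.endswith("\n") else line + "\n")
    return n_seqs


def genome_of(reference_name):
    """Recover the genome label from a prefixed reference name."""
    return reference_name.split("|", 1)[0]
```

After mapping against an index built from the combined file, `genome_of` applied to the reference-name column of each alignment gives a per-read human/bacterial call without running two separate alignments.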


  • Optimizing tophat mapping for mixed RNA-Seq data

    Hi all,

    I’m currently using Tophat and bowtie2 to map 100bp PE RNA-Seq reads from a mixed human/bacterial sample. We’re more interested in the bacterial side of things, but there's plenty that we can learn from the human reads too. We originally used bowtie2 to map human reads to hg19, followed by a second bowtie2 run to map bacterial reads. We then switched to tophat for the human step, since it handles spliced reads, and redid the processing; as expected, a much larger number of human reads mapped. But when we repeated the bowtie2 run for bacterial reads, significantly fewer reads mapped.

    We’ve also repeated tophat with a few different settings to try to find what's optimal. The no-discordant option in tophat changes the results quite a lot, both for the number of human reads mapped and for the number of bacterial reads mapped. I haven’t looked into the biological outcomes of this yet, but the differences in the number of reads have me concerned, as do the differences between the bacterial reads that come out of files preprocessed with tophat on the default settings and those from the no-discordant run.
    I’ve compared the bacterial reads mapped by bowtie2 after a tophat run with default settings against those mapped after a tophat run with the no-discordant option, and they share only about 0.0007% of the bacterial reads, which is very odd.

    Basically, I’m wondering if anyone could shed light on why the different tophat parameters have such a huge impact on the number of reads that bowtie2 later identifies as bacterial?

    Also any general advice would be appreciated
    Thanks
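
For reference, a comparison like the 0.0007% figure above could be reproduced along the following lines. This is a hedged sketch, not the procedure used in the post: it assumes the two bowtie2 outputs are available as SAM text and that "shared" means the same read name is mapped in both runs. The helper names are hypothetical, and the fraction reported here is the Jaccard index (shared over union), which is only one of several ways such a percentage could be defined.

```python
def read_ids(sam_lines):
    """Collect the names of mapped reads from SAM-formatted lines."""
    ids = set()
    for line in sam_lines:
        if line.startswith("@"):  # skip SAM header lines
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 3:
            continue
        if int(fields[1]) & 0x4:  # FLAG bit 0x4 marks an unmapped read
            continue
        ids.add(fields[0])
    return ids


def overlap_report(ids_a, ids_b):
    """Summarise how two sets of mapped read names overlap."""
    shared = ids_a & ids_b
    union = ids_a | ids_b
    return {
        "only_a": len(ids_a - ids_b),
        "only_b": len(ids_b - ids_a),
        "shared": len(shared),
        "jaccard": len(shared) / len(union) if union else 0.0,
    }
```

Usage would be something like `overlap_report(read_ids(open("default.sam")), read_ids(open("no_discordant.sam")))`, with both file names being placeholders. A near-zero overlap is worth checking against read naming first (for example `/1` and `/2` suffixes differing between runs) before drawing any biological conclusion.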
